In bioinformatics, SIDD is short for Stress-Induced (DNA) Duplex Destabilization. It refers to DNA melting that is induced not by a promoter but purely by the superhelical (also called topological) nature of the DNA. [1] It is based on a statistical mechanics treatment of DNA developed by Craig J. Benham and Richard M. Fye. [2] This stress-induced unwinding has been shown to coincide with promoter regions of bacterial plasmids, and it may direct the global response of cells to changes in their external environment by affecting which genes are transcribed.
The computational model calculates the probability that each base pair in a given DNA sequence denatures, as well as the energy profile of the sequence. The technique takes its name from this energy profile: base pairs with lower energies are less stable (destabilized) than those with higher energies and are more likely to denature. Stress related to the linking number of the DNA (specifically its twist component) destabilizes the double helix (duplex); hence, Stress-Induced Duplex Destabilization.
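The sketch below is a deliberately simplified, hypothetical version of this kind of calculation, not Benham's published model. It assumes at most one open run, no twisting of the separated strands, a quadratic stress energy in the residual linking difference, and illustrative parameter values (nucleation cost `a`, per-base-pair opening costs `b_at` and `b_gc`, stiffness scale `k_scale`, helical repeat `bp_per_turn`, superhelical density `sigma`). It enumerates every allowed state, forms the partition function, and reports each base pair's opening probability p(x) together with -RT ln p(x), used here only as a stand-in for the destabilization energy.

```python
import math

RT = 0.616  # kcal/mol near 310 K (assumed value)


def toy_sidd_profile(seq, sigma=-0.055, a=10.84, b_at=0.258, b_gc=1.255,
                     bp_per_turn=10.4, k_scale=2220.0):
    """Toy SIDD-style profile: returns (p(x), -RT ln p(x)) for each base pair.

    Simplifications (assumptions, not the published model): at most one open
    run, open strands carry no twist, and all parameter values are illustrative.
    """
    n = len(seq)
    # Per-base-pair separation cost; AT pairs are cheaper to open than GC pairs.
    b = [b_at if c in "AT" else b_gc for c in seq.upper()]
    prefix = [0.0]
    for v in b:
        prefix.append(prefix[-1] + v)

    alpha = sigma * n / bp_per_turn  # imposed linking difference (turns)
    k = k_scale * RT / n             # torsional stiffness of the whole molecule

    def stress_energy(n_open):
        # Opening n_open bp relaxes n_open / bp_per_turn turns of negative stress.
        residual = alpha + n_open / bp_per_turn
        return 0.5 * k * residual ** 2

    # Enumerate the fully closed state and every single open run [i, j].
    e_closed = stress_energy(0)
    runs = [(i, j, a + prefix[j + 1] - prefix[i] + stress_energy(j - i + 1))
            for i in range(n) for j in range(i, n)]
    e_min = min([e_closed] + [e for _, _, e in runs])

    # Boltzmann weights, shifted by e_min to avoid overflow.
    z = math.exp(-(e_closed - e_min) / RT)
    cover = [0.0] * (n + 1)  # difference array: total weight of runs covering x
    for i, j, e in runs:
        w = math.exp(-(e - e_min) / RT)
        z += w
        cover[i] += w
        cover[j + 1] -= w

    probs, energies, running = [], [], 0.0
    for x in range(n):
        running += cover[x]
        p = min(max(running / z, 0.0), 1.0)  # clamp floating-point rounding noise
        probs.append(p)
        energies.append(-RT * math.log(p) if p > 0 else math.inf)
    return probs, energies
```

For example, `toy_sidd_profile("GC" * 20 + "AT" * 10 + "GC" * 20)` gives the central AT-rich block higher opening probabilities (and lower energies) than the flanking GC-rich blocks. Even this single-run toy enumerates O(N²) states, which hints at why the full calculation becomes expensive.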
Craig Benham has also developed an online applet that calculates the SIDD profile of input DNA sequences. [3] It also displays the denaturation probability of each base pair in the input sequence and reports the number and locations of denaturation runs.
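How the applet defines a run is not stated here. As a hypothetical illustration, a denaturation run can be taken as a maximal stretch of consecutive base pairs whose opening probability exceeds a cutoff; the function name and the 0.5 cutoff below are assumptions of this sketch.

```python
def find_denatured_runs(probs, cutoff=0.5):
    """Return inclusive (start, end) index pairs of maximal runs with p(x) > cutoff.

    The cutoff-based definition of a "run" is an assumption of this sketch.
    """
    runs = []
    start = None
    for x, p in enumerate(probs):
        if p > cutoff and start is None:
            start = x  # a run opens here
        elif p <= cutoff and start is not None:
            runs.append((start, x - 1))  # the run closed at the previous base pair
            start = None
    if start is not None:
        runs.append((start, len(probs) - 1))  # run extends to the sequence end
    return runs
```

For example, `find_denatured_runs([0.1, 0.6, 0.7, 0.2, 0.9, 0.8])` returns `[(1, 2), (4, 5)]`: two runs, located at base pairs 1-2 and 4-5.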
Because the full SIDD calculation requires a large amount of processing time, WebSIDD implements an accelerated algorithm proposed by Benham et al. in their 1999 paper. This accelerated algorithm truncates the partition function by ignoring the contributions of certain conformational states.[citation needed]
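The effect of truncation can be illustrated under an assumed rule: drop every state whose free energy exceeds the minimum by more than a threshold. The rule, the function name and the numbers below are illustrative and are not taken from the 1999 paper. High-energy states carry Boltzmann weights so small that leaving them out barely changes ln Z.

```python
import math


def log_partition(energies, rt, threshold=math.inf):
    """Return (ln Z, number of states kept), keeping only states within
    `threshold` of the lowest energy (the truncation rule is an assumption)."""
    e_min = min(energies)
    kept = [e for e in energies if e - e_min <= threshold]
    # Shift by e_min so the exponentials stay in a safe floating-point range.
    total = sum(math.exp(-(e - e_min) / rt) for e in kept)
    return -e_min / rt + math.log(total), len(kept)
```

With energies `[0.0, 1.0, 2.0, 20.0]`, `rt=0.616` and `threshold=5.0`, the highest-energy state is discarded and ln Z changes by less than 1e-12 relative to the untruncated sum.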
Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combines biology, chemistry, physics, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data. Bioinformatics has been used for in silico analyses of biological queries using computational and statistical techniques.
In biochemistry, denaturation is a process in which proteins or nucleic acids lose the quaternary, tertiary, and secondary structure present in their native state through the application of an external stress or compound, such as a strong acid or base, a concentrated inorganic salt, an organic solvent, agitation, radiation, or heat. If proteins in a living cell are denatured, cell activity is disrupted and the cell may die. Protein denaturation is also a consequence of cell death. Denatured proteins can exhibit a wide range of characteristics, from conformational change and loss of solubility to aggregation caused by the exposure of hydrophobic groups. The loss of solubility as a result of denaturation is called coagulation. Denatured proteins lose their 3D structure and therefore cannot function.
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data.
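A classic way to compute such an alignment for two sequences is the Needleman-Wunsch dynamic-programming algorithm. The sketch below performs global alignment with a linear gap penalty; the function name and the match, mismatch and gap scores are arbitrary illustrative choices.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))
```

Aligning `"GATTACA"` with `"GATACA"` scores 4 under these parameters: six matches and one gap, which appears as a hyphen in the shorter row.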
Grammar theory for modeling symbol strings originated in computational linguistics, in work aimed at understanding the structure of natural languages. Probabilistic context-free grammars (PCFGs) were applied to the probabilistic modeling of RNA structures almost 40 years after their introduction in computational linguistics.
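As a concrete, hypothetical illustration, the toy grammar below generates RNA strings with the rules S → x S (unpaired base), S → x S y S (base pair x-y enclosing a substructure) and S → ε (end). The inside algorithm, a CYK-style dynamic program, sums the probabilities of all derivations of a sequence. All rule and emission probabilities are illustrative assumptions, not trained parameters.

```python
import itertools

BASES = "ACGU"
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def inside_probability(seq, p_end=0.2, p_unpaired=0.5, p_pair=0.3, canonical_weight=10.0):
    """P(S =>* seq) for the toy grammar S -> x S | x S y S | eps."""
    if abs(p_end + p_unpaired + p_pair - 1.0) > 1e-9:
        raise ValueError("rule probabilities must sum to 1")
    single = {b: 0.25 for b in BASES}  # uniform unpaired emissions
    raw = {(x, y): canonical_weight if (x, y) in CANONICAL else 1.0
           for x, y in itertools.product(BASES, repeat=2)}
    norm = sum(raw.values())
    pair = {xy: w / norm for xy, w in raw.items()}  # pair emissions sum to 1

    n = len(seq)
    # inside[i][j] = probability that S derives seq[i:j]
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        inside[i][i] = p_end
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> x S: seq[i] unpaired, the rest derived by S
            total = p_unpaired * single[seq[i]] * inside[i + 1][j]
            # S -> x S y S: seq[i] pairs with seq[k]
            for k in range(i + 1, j):
                total += (p_pair * pair[(seq[i], seq[k])]
                          * inside[i + 1][k] * inside[k + 1][j])
            inside[i][j] = total
    return inside[0][n]
```

Because canonical pairs are weighted more heavily, `inside_probability("GC")` is larger than `inside_probability("GA")`: the G-C pairing derivation contributes more probability than a G-A pairing.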
In bioinformatics, BLAST is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. A BLAST search enables a researcher to compare a query protein or nucleotide sequence with a library or database of sequences and to identify database sequences that resemble the query above a certain threshold. For example, after discovering a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see whether humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on sequence similarity.
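BLAST begins by finding short word matches ("seeds") between the query and database sequences and then extends them into alignments. The sketch below shows only a simplified nucleotide version of that seeding step, using exact word matches and a hypothetical `word_size` parameter. It does not reproduce NCBI BLAST's scoring, extension, or statistics.

```python
from collections import defaultdict


def find_seed_hits(query, subject, word_size=4):
    """Return (query_pos, subject_pos) pairs where both share an exact word."""
    # Index every word of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    # Scan the subject and look each of its words up in the index.
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits
```

For `find_seed_hits("ACGTAC", "TTACGTA")` the shared words `ACGT` and `CGTA` produce the hits `[(0, 2), (1, 3)]`. A real search would then try to extend each hit in both directions.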
In biology, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to a biological function of the macromolecule. For example, an N-glycosylation site motif can be defined as Asn, followed by any residue except Pro, followed by either Ser or Thr, followed by any residue except Pro.
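The N-glycosylation rule above corresponds to the PROSITE-style pattern N-{P}-[ST]-{P}, which can be scanned with a regular expression. The sketch below uses Python's `re` module. The function name is hypothetical, and a zero-width lookahead is used so that overlapping sites are all reported.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue but Pro, Ser or Thr, any residue but Pro.
# Wrapping the pattern in a lookahead lets overlapping matches be found.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glyc_sites(protein):
    """Return (0-based start, matched residues) for each candidate site."""
    return [(m.start(), m.group(1)) for m in N_GLYC.finditer(protein)]
```

In `"MNGTANPSANSSA"` this finds `NGTA` at position 1 and `NSSA` at position 9. The Asn at position 5 is skipped because it is followed by Pro.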
A gap penalty is a method of scoring alignments of two or more sequences. When aligning sequences, introducing gaps can allow an alignment algorithm to match more terms than a gapless alignment can. However, keeping gaps to a minimum is important for producing a useful alignment, because too many gaps can make an alignment meaningless. Gap penalties adjust alignment scores based on the number and length of gaps. The five main types of gap penalties are constant, linear, affine, convex, and profile-based.
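The sketch below shows how the first four penalty shapes differ as a gap grows longer. The default values are arbitrary illustrations. The affine form uses the convention `open + extend * length`; some tools instead charge extension only from the second gap position. Profile-based penalties are omitted because they depend on position-specific information from a profile.

```python
import math


def constant_gap(length, penalty=-5.0):
    """Same penalty for any gap, regardless of its length."""
    return penalty if length > 0 else 0.0


def linear_gap(length, per_residue=-1.0):
    """Penalty proportional to gap length."""
    return per_residue * length


def affine_gap(length, open_penalty=-5.0, extend_penalty=-1.0):
    """One opening charge plus a per-residue charge (one common convention)."""
    return open_penalty + extend_penalty * length if length > 0 else 0.0


def convex_gap(length, open_penalty=-5.0, log_scale=-1.0):
    """Opening charge plus a term growing with log(length), so long gaps cost relatively less."""
    return open_penalty + log_scale * math.log(length) if length > 0 else 0.0
```

For a 100-residue gap, the affine penalty is -105 while the convex penalty is only about -9.6, so a convex scheme is more willing to accept one long gap than an affine one.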
A chaotropic agent is a molecule in water solution that can disrupt the hydrogen-bonding network between water molecules. This affects the stability of the native state of other molecules in the solution, mainly macromolecules, by weakening the hydrophobic effect. For example, a chaotropic agent reduces the order of the water molecules surrounding a protein, both in the bulk solvent and in the hydration shells around hydrophobic amino acids, and may cause the protein to denature.
Temperature gradient gel electrophoresis (TGGE) and denaturing gradient gel electrophoresis (DGGE) are forms of electrophoresis that use either a temperature gradient or a chemical gradient to denature the sample as it moves across an acrylamide gel. TGGE and DGGE can be applied to nucleic acids, such as DNA and RNA, and to proteins. TGGE relies on temperature-dependent changes in structure to separate nucleic acids. DGGE separates DNA fragments of the same size based on their different denaturation behavior, which is determined by their base-pair sequence. DGGE was the original technique, and TGGE is a refinement of it.
Triple-stranded DNA is a DNA structure in which three oligonucleotides wind around each other and form a triple helix. In triple-stranded DNA, the third strand binds to a B-form DNA double helix by forming Hoogsteen base pairs or reversed Hoogsteen hydrogen bonds.
Cis-regulatory elements (CREs) or Cis-regulatory modules (CRMs) are regions of non-coding DNA which regulate the transcription of neighboring genes. CREs are vital components of genetic regulatory networks, which in turn control morphogenesis, the development of anatomy, and other aspects of embryonic development, studied in evolutionary developmental biology.
Multiple sequence alignment (MSA) may refer to the process or the result of aligning three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship, sharing a linkage and descending from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of an alignment illustrate mutation events such as point mutations, which appear as differing characters in a single alignment column, and insertion or deletion mutations, which appear as hyphens in one or more of the sequences. Multiple sequence alignment is often used to assess the conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
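One simple, commonly used way to quantify per-column conservation is the Shannon entropy of each alignment column: a fully conserved column has entropy 0, and more variable columns have higher values. The sketch below assumes gaps are written as `-` and ignores them when counting residues. The function name and that gap treatment are choices made for this illustration.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Shannon entropy (bits) of each alignment column; gaps ('-') are ignored.

    0.0 means the column is fully conserved; an all-gap column yields nan.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    entropies = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        total = sum(counts.values())
        if total == 0:
            entropies.append(float("nan"))
            continue
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(max(0.0, h))  # normalize -0.0 to 0.0
    return entropies
```

For the alignment `["ACGT", "ACGA", "ACTT", "AC-T"]`, the first two columns are fully conserved (entropy 0), while the third and fourth columns score about 0.92 and 0.81 bits.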
Nucleic acid structure prediction is a computational method to determine secondary and tertiary nucleic acid structure from its sequence. Secondary structure can be predicted from one or several nucleic acid sequences. Tertiary structure can be predicted from the sequence, or by comparative modeling.
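One of the simplest secondary-structure prediction methods is the Nussinov algorithm, which uses dynamic programming to maximize the number of nested base pairs. It appears here only to illustrate the dynamic-programming idea; practical tools instead minimize free energy using nearest-neighbor parameters. The allowed pairs (Watson-Crick plus G-U) and the minimum hairpin loop of 3 unpaired bases are assumptions of this sketch.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max number of nested pairs, dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: re-derive which case produced each optimal value.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`, a three-pair stem closed by a three-base loop.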
DNA supercoiling refers to the amount of twist in a particular DNA molecule, which determines the amount of strain on it. A given molecule may be "positively supercoiled" or "negatively supercoiled". The amount of a molecule’s supercoiling affects a number of biological processes, such as compacting DNA and regulating access to the genetic code. Certain enzymes, such as topoisomerases, change the amount of DNA supercoiling to facilitate functions such as DNA replication and transcription. The amount of supercoiling in a given molecule is described by a mathematical formula that compares it to a reference state known as "relaxed B-form" DNA.
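The formula itself is not spelled out above. A commonly used form, given here as standard background rather than as this article's own definition, splits the linking number Lk into twist Tw and writhe Wr. It then measures supercoiling by the superhelical density σ relative to the relaxed linking number Lk₀ = N/h of an N-bp molecule with helical repeat h. The numbers in the worked example are hypothetical.

```latex
\mathrm{Lk} = \mathrm{Tw} + \mathrm{Wr}, \qquad
\mathrm{Lk}_0 = \frac{N}{h}, \qquad
\sigma = \frac{\mathrm{Lk} - \mathrm{Lk}_0}{\mathrm{Lk}_0}

% Illustrative numbers: N = 5250 bp, h = 10.5 bp/turn, Lk = 470
\mathrm{Lk}_0 = \frac{5250}{10.5} = 500, \qquad
\sigma = \frac{470 - 500}{500} = -0.06
```

A negative σ, as in this example, corresponds to negative supercoiling (underwinding) relative to relaxed B-form DNA.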
Nucleic acid thermodynamics is the study of how temperature affects the structure of double-stranded DNA (dsDNA). The melting temperature (Tm) is defined as the temperature at which half of the DNA strands are in the random coil or single-stranded (ssDNA) state. Tm depends on the length of the DNA molecule and its specific nucleotide sequence. DNA whose two strands have dissociated is said to have been denatured by the high temperature.
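The Wallace rule gives a quick illustration of how Tm depends on sequence composition. It is an empirical rule of thumb for short oligonucleotides: Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). It ignores salt and strand concentration as well as nearest-neighbor stacking, so it provides only a rough estimate; accurate predictions use thermodynamic nearest-neighbor models. The function name below is hypothetical.

```python
def wallace_tm(oligo):
    """Rough Tm in degrees C for a short DNA oligo: 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc
```

For example, `wallace_tm("ATGCATGC")` gives 24, and replacing A/T bases with G/C bases raises the estimate, reflecting the greater stability of GC pairs.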
Nucleic acid design is the process of generating a set of nucleic acid base sequences that will associate into a desired conformation. Nucleic acid design is central to the fields of DNA nanotechnology and DNA computing. It is necessary because there are many possible sequences of nucleic acid strands that will fold into a given secondary structure, but many of these sequences will have undesired additional interactions which must be avoided. In addition, there are many tertiary structure considerations which affect the choice of a secondary structure for a given design.
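Real design tools score candidate sequences with thermodynamic models. As a much cruder, hypothetical screen for undesired interactions, the sketch below flags any pair of strands (including a strand paired with itself) in which a length-`k` window of one strand is exactly complementary, in antiparallel orientation, to a window of the other. The k-mer length, the function names, and the notion of "intended" pairs are all assumptions of this sketch.

```python
from itertools import combinations_with_replacement

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def unintended_complements(strands, k=6, intended=frozenset()):
    """Return (name_a, name_b, kmer) triples where kmer in strand a can pair with strand b.

    `strands` maps names to DNA sequences; pairs listed in `intended`
    (as (name_a, name_b) tuples) are skipped.
    """
    found = []
    for name_a, name_b in combinations_with_replacement(sorted(strands), 2):
        if (name_a, name_b) in intended or (name_b, name_a) in intended:
            continue
        seq_a = strands[name_a].upper()
        rc_b = reverse_complement(strands[name_b])
        kmers = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
        for kmer in sorted(kmers):
            # kmer occurs in rc(b) exactly when it can base-pair with a window of b.
            if kmer in rc_b:
                found.append((name_a, name_b, kmer))
    return found
```

With strands `{"a": "AAAAAAAA", "b": "TTTTTTTT", "c": "GAGAGAGA"}` and `k=4`, only the a-b interaction is flagged. If that duplex is the desired one, passing `intended={("a", "b")}` leaves nothing to report.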
HMMER is a free and commonly used software package for sequence analysis written by Sean Eddy. It is generally used to identify homologous protein or nucleotide sequences and to perform sequence alignments. It detects homology by comparing a profile-HMM either to a single sequence or to a database of sequences. Sequences that score significantly better against the profile-HMM than against a null model are considered homologous to the sequences used to construct the profile-HMM. In the HMMER package, profile-HMMs are constructed from a multiple sequence alignment using the hmmbuild program. The profile-HMM implementation used in HMMER was based on the work of Krogh and colleagues. HMMER is a console utility ported to every major operating system, including different versions of Linux, Windows, and Mac OS.
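The idea of scoring "against a null model" can be illustrated with a log-odds score. HMMER's profile-HMMs also model insertions and deletions and report calibrated bit scores and E-values. The sketch below drops insert and delete states entirely, which leaves an ungapped, position-specific log-odds score. This is a simplification, not HMMER's algorithm, and the function name and toy profile are hypothetical.

```python
import math


def log_odds_score(seq, profile, background):
    """Sum over positions of log2(P(residue | profile column) / P(residue | null model)).

    Ungapped sketch: `profile` is a list of per-position residue distributions
    and `background` is the null-model residue distribution.
    """
    if len(seq) != len(profile):
        raise ValueError("ungapped sketch: sequence and profile must have equal length")
    return sum(math.log2(col[res] / background[res]) for res, col in zip(seq, profile))


background = {b: 0.25 for b in "ACGT"}
profile = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
```

Here `log_odds_score("AC", profile, background)` is positive (about 2.97 bits) because the sequence fits the profile better than the null model, while `"GG"` scores negatively.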
The Nucleic Acid Package (NUPACK) is a growing software suite for the analysis and design of nucleic acid systems. Jobs can be run online on the NUPACK webserver or NUPACK source code can be downloaded and compiled locally for non-commercial academic use. NUPACK algorithms are formulated in terms of nucleic acid secondary structure. In most cases, pseudoknots are excluded from the structural ensemble.
S/MARs (scaffold/matrix attachment regions), also called SARs or MARs, are sequences in the DNA of eukaryotic chromosomes where the nuclear matrix attaches. As architectural DNA components that organize the eukaryotic genome into functional units within the cell nucleus, S/MARs mediate the structural organization of chromatin within the nucleus. These elements constitute anchor points of the DNA for the chromatin scaffold and serve to organize chromatin into structural domains. Studies on individual genes led to the conclusion that the dynamic and complex organization of chromatin mediated by S/MAR elements plays an important role in regulating gene expression.
Denaturation mapping is a form of optical mapping, first described in 1966. It is used to characterize DNA molecules without the need for amplification or sequencing, and it relies on the difference in melting temperature between AT-rich and GC-rich regions. Although modern sequencing methods have reduced the need for denaturation mapping, it is still used for specific purposes, such as detecting large-scale structural variants.
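Denaturation maps reflect where AT-rich regions melt before GC-rich ones. A rough computational proxy for such a pattern, assumed here purely for illustration, is the fraction of A/T bases in a sliding window along the sequence. The function name and window size are hypothetical.

```python
def at_fraction_profile(seq, window=50):
    """Fraction of A/T bases in each length-`window` window along seq."""
    s = seq.upper()
    if not 0 < window <= len(s):
        raise ValueError("window must be between 1 and len(seq)")
    at = [1 if c in "AT" else 0 for c in s]
    current = sum(at[:window])
    profile = [current / window]
    # Slide the window one base at a time, updating the count incrementally.
    for i in range(window, len(s)):
        current += at[i] - at[i - window]
        profile.append(current / window)
    return profile
```

For `at_fraction_profile("AATTGGCC", window=4)` the profile falls from 1.0 to 0.0 along the sequence, matching the expectation that the AT-rich end would melt first.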