Pseudoknot

This example of a naturally occurring pseudoknot is found in the RNA component of human telomerase. Sequence from Chen and Greider (2005).
Three-dimensional structure of almost the same pseudoknot from telomerase RNA: (A) sticks, (B) backbone. The PDB file is based on PDB: 1YMO. Colors: A, U, C, G.

A pseudoknot is a nucleic acid secondary structure containing at least two stem-loop structures in which half of one stem is intercalated between the two halves of another stem. The pseudoknot was first recognized in the turnip yellow mosaic virus in 1982. [2] Pseudoknots fold into knot-shaped three-dimensional conformations but are not true topological knots. These structures are categorized as cross (X) topology within the circuit topology framework, which, in contrast to knot theory, is a contact-based approach.
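In circuit topology, the arrangement of two contacts is read off from the order of their four endpoints along the chain: series (S) when one contact ends before the other begins, parallel (P) when one lies inside the other, and cross (X) when they interleave, which is the pattern of a pseudoknot's two stems. The sketch below assumes each contact is given as a pair of sequence positions; `ct_relation` is a hypothetical helper, not the API of any circuit-topology package.

```python
from typing import Tuple

Contact = Tuple[int, int]  # (i, j): sequence positions of two bases in contact


def ct_relation(a: Contact, b: Contact) -> str:
    """Classify two contacts as series ("S"), parallel ("P") or cross ("X").

    Assumes the two contacts share no position.
    """
    # Put each contact in (low, high) order, then order the two contacts
    # by their first position so that i < k.
    (i, j), (k, l) = sorted([tuple(sorted(a)), tuple(sorted(b))])
    if len({i, j, k, l}) != 4:
        raise ValueError("contacts must not share a position")
    if j < k:
        return "S"  # i < j < k < l: one contact follows the other
    if l < j:
        return "P"  # i < k < l < j: one contact is nested inside the other
    return "X"      # i < k < j < l: the contacts interleave, as in a pseudoknot
```

For example, `ct_relation((1, 6), (3, 10))` returns `"X"`, while `ct_relation((1, 10), (3, 6))` returns `"P"`.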


Prediction and identification

The structural configuration of pseudoknots does not lend itself well to computational detection because of its context-sensitive, or "overlapping", nature. The base pairing in pseudoknots is not well nested: base pairs occur that "overlap" one another in sequence position, so that two pairs i·j and k·l satisfy i < k < j < l. This makes pseudoknots in RNA sequences difficult to predict with the standard method of dynamic programming, which uses a recursive scoring system to identify paired stems; consequently, most such algorithms cannot detect non-nested base pairs. The newer method of stochastic context-free grammars suffers from the same problem. Thus, popular secondary structure prediction methods such as Mfold and Pfold will not predict pseudoknot structures present in a query sequence; they will identify only the more stable of the two pseudoknot stems.
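Mfold minimises free energy with a nearest-neighbour model; as a much simpler stand-in, the Nussinov base-pair-maximisation recursion below shows why this family of dynamic programming algorithms cannot produce pseudoknots. Pairing base i with base k splits the interval into an inside part and an outside part that are solved independently, so no later pair can cross (i, k). The sketch assumes Watson–Crick plus G–U pairs and a minimum hairpin loop of three bases; `nussinov` and `PAIRS` are illustrative names, not any package's API.

```python
from typing import List, Tuple

# Allowed pairs (an assumption of this sketch): Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> List[Tuple[int, int]]:
    """Return one maximum set of 0-based base pairs (i, k) for an RNA sequence.

    best[i][j] is the largest number of pairs in seq[i..j]. Because pairing i
    with k splits the problem into seq[i+1..k-1] and seq[k+1..j], every
    structure produced is nested. O(N^3) time, O(N^2) memory.
    """
    seq = seq.upper()
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # option 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # option 2: pair i with k
                if (seq[i], seq[k]) in PAIRS:
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + best[i + 1][k - 1] + right)
            best[i][j] = score

    # Trace back one optimal structure from the filled table.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to hold a pair
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))  # optimum reached with i unpaired
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = best[k + 1][j] if k < j else 0
                if best[i][j] == 1 + best[i + 1][k - 1] + right:
                    pairs.append((i, k))
                    stack.append((i + 1, k - 1))  # inside the new pair
                    if k < j:
                        stack.append((k + 1, j))  # to the right of it
                    break
    return sorted(pairs)
```

Whatever sequence is supplied, no two returned pairs (i, j) and (k, l) satisfy i < k < j < l, so at most one of the two interleaved stems of an H-type pseudoknot can appear in its output.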

It is possible to identify a limited class of pseudoknots using dynamic programming, but these methods are not exhaustive and scale worse with sequence length than algorithms restricted to nested structures; the Rivas–Eddy algorithm, for example, requires O(N^6) time, compared with O(N^3) for standard nested-structure algorithms. [3] [4] The general problem of predicting lowest free energy structures with pseudoknots has been shown to be NP-complete. [5] [6]

Biological significance

Several important biological processes rely on RNA molecules that form pseudoknots, which are often RNAs with extensive tertiary structure. For example, the pseudoknot region of RNase P is one of the most conserved elements in all of evolution. The telomerase RNA component contains a pseudoknot that is critical for activity, [1] and several viruses use a pseudoknot structure to form a tRNA-like motif to infiltrate the host cell. [7]

Representing pseudoknots

Many types of pseudoknots exist, differing in how their stems cross and how many times they cross. To reflect this, pseudoknots are classed into H-, K-, L- and M-types, with each successive type adding a layer of stem intercalation. The simple telomerase P2b–P3 pseudoknot shown in this article, for instance, is an H-type pseudoknot. [8]

RNA secondary structure is usually represented in dot-bracket notation, in which matching round brackets () mark the base pairs of a stem and dots mark unpaired bases, such as those in loops. Because the stems of a pseudoknot interleave, the notation must be extended with additional bracket types, or even letters, so that different sets of stems can be distinguished. One such extension uses, in nesting order, ([{<ABCDE for opening and edcba>}]) for closing. [9] The structure for the two (slightly varying) telomerase examples, in this notation, is:

           (((.(((((........[[[[[[...))))).))).   .......]]]]]].
drawing  1 CGCGCGCUGUUUUUCUCGCUGACUUUCAGCGGGCGA---AAAAAAUGUCAGCU  50
ALIGN         |.|||||||||||||||||||||||||  .|.|   |||||| ||||||.
1ymo     1 ---GGGCUGUUUUUCUCGCUGACUUUCAGC--CCCAAACAAAAAA-GUCAGCA  47
              ((((((........[[[[[[...))))  ))........... ]]]]]].

Note that the U bulge near the 3′ end, normally present in telomerase RNA, was removed in the 1ymo solution model to enhance the stability of the pseudoknot. [10]
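Reading the extended notation back into base pairs only requires one stack per bracket type: each type stays nested on its own, while stems written with different symbols are free to interleave. The sketch below assumes the symbol set quoted above, with `(`, `[`, `{`, `<` and `A`–`E` opening and `)`, `]`, `}`, `>` and `a`–`e` closing, and `.` for unpaired bases; `parse_dot_bracket` is a hypothetical helper, not the API of any particular tool.

```python
from typing import Dict, List, Tuple

# Assumed alphabet of the extended notation: opener at index n closes with CLOSERS[n].
OPENERS = "([{<ABCDE"
CLOSERS = ")]}>abcde"
MATCH: Dict[str, str] = dict(zip(CLOSERS, OPENERS))  # closing symbol -> opening symbol


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the 0-based base pairs (i, j), i < j, encoded by an extended dot-bracket string.

    Each bracket type has its own stack, so pairs written with different
    symbols may cross (a pseudoknot) while each type on its own stays nested.
    """
    stacks: Dict[str, List[int]] = {o: [] for o in OPENERS}
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)  # remember where this stem strand opens
        elif ch in MATCH:
            stack = stacks[MATCH[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))  # innermost open bracket of this type
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    unclosed = sorted(p for stack in stacks.values() for p in stack)
    if unclosed:
        raise ValueError(f"unclosed brackets at positions {unclosed}")
    return sorted(pairs)
```

Applied to a hypothetical H-type structure, `parse_dot_bracket("((((....[[[[....))))....]]]]")` returns the four round-bracket pairs (0, 19) to (3, 16) and the four square-bracket pairs (8, 27) to (11, 24); the pairs (0, 19) and (8, 27) cross because 0 < 8 < 19 < 27.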

References

  1. Chen JL, Greider CW (7 June 2005). "Functional analysis of the pseudoknot structure in human telomerase RNA". Proceedings of the National Academy of Sciences of the United States of America. 102 (23): 8080–5. Bibcode:2005PNAS..102.8080C. doi:10.1073/pnas.0502259102. PMC 1149427. PMID 15849264.
  2. Staple DW, Butcher SE (June 2005). "Pseudoknots: RNA structures with diverse functions". PLOS Biol. 3 (6): e213. doi:10.1371/journal.pbio.0030213. PMC 1149493. PMID 15941360.
  3. Rivas E, Eddy SR (1999). "A dynamic programming algorithm for RNA structure prediction including pseudoknots". J Mol Biol. 285 (5): 2053–68.
  4. Dirks RM, Pierce NA (2004). "An algorithm for computing nucleic acid base-pairing probabilities including pseudoknots". J Comput Chem. 25: 1295–1304.
  5. Lyngsø RB, Pedersen CN (2000). "RNA pseudoknot prediction in energy-based models". J Comput Biol. 7 (3–4): 409–27.
  6. Lyngsø RB (2004). "Complexity of pseudoknot prediction in simple models". Paper presented at ICALP.
  7. Pleij CW, Rietveld K, Bosch L (1985). "A new principle of RNA folding based on pseudoknotting". Nucleic Acids Res. 13 (5): 1717–31. doi:10.1093/nar/13.5.1717. PMC 341107. PMID 4000943.
  8. Kucharík M, Hofacker IL, Stadler PF, Qin J (15 January 2016). "Pseudoknots in RNA folding landscapes". Bioinformatics. 32 (2): 187–94. doi:10.1093/bioinformatics/btv572. PMC 4708108. PMID 26428288.
  9. Antczak M, Popenda M, Zok T, Zurkowski M, Adamiak RW, Szachniuk M (15 April 2018). "New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation". Bioinformatics. 34 (8): 1304–1312. doi:10.1093/bioinformatics/btx783. PMC 5905660. PMID 29236971.
  10. Theimer CA, Blois CA, Feigon J (4 March 2005). "Structure of the human telomerase RNA pseudoknot reveals conserved tertiary interactions essential for function". Molecular Cell. 17 (5): 671–82. doi:10.1016/j.molcel.2005.01.017. PMID 15749017.