Pseudoknot

This example of a naturally occurring pseudoknot is found in the RNA component of human telomerase. Sequence from Chen and Greider (2005).
Three-dimensional structure of almost the same pseudoknot from telomerase RNA. (A) sticks (B) backbone. The structure is based on PDB: 1YMO.

A pseudoknot is a nucleic acid secondary structure containing at least two stem-loop structures in which half of one stem is intercalated between the two halves of another stem. The pseudoknot was first recognized in the turnip yellow mosaic virus in 1982. [2] Pseudoknots fold into knot-shaped three-dimensional conformations but are not true topological knots. These structures are categorized as cross (X) topology within the circuit topology framework, which, in contrast to knot theory, is a contact-based approach.

Prediction and identification

The structure of pseudoknots does not lend itself well to computational detection because their base pairing is context-sensitive, or "overlapping". The base pairs in a pseudoknot are not well nested; that is, some base pairs "overlap" one another in sequence position. This makes pseudoknots in RNA sequences more difficult to predict with the standard dynamic programming methods, which use a recursive scoring system to identify paired stems; consequently, most of these methods cannot detect non-nested base pairs. The newer method of stochastic context-free grammars suffers from the same problem. Thus, popular secondary structure prediction methods such as Mfold and Pfold will not predict pseudoknot structures present in a query sequence; they will identify only the more stable of the two pseudoknot stems.
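The "well nested" condition can be made concrete with a small check. The following is a minimal sketch in Python, assuming base pairs are given as 0-based index tuples; the function names `crossing_pairs` and `is_nested` and the two example pair lists are hypothetical and are not taken from any prediction package. Two base pairs (i, j) and (k, l) cross when i < k < j < l, and a structure with at least one such crossing is pseudoknotted.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return all couples of base pairs (i, j), (k, l) that satisfy i < k < j < l."""
    # Put each pair in (smaller, larger) order, then sort by opening position.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # Sorting guarantees i <= k, so a crossing means that k opens
        # inside (i, j) while its partner l closes outside it.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def is_nested(pairs):
    """True if no two base pairs cross, i.e. the structure has no pseudoknot."""
    return not crossing_pairs(pairs)


# Hypothetical toy structures (0-based positions), not real RNA data.
hairpin = [(0, 9), (1, 8), (2, 7)]            # a single stem-loop
h_type = [(0, 8), (1, 7), (5, 12), (6, 11)]   # second stem opens inside the first

print(is_nested(hairpin))   # no crossing pairs
print(is_nested(h_type))    # crossing pairs present
```

The pairwise comparison is quadratic in the number of base pairs, which is acceptable for illustration; it only detects crossings and does not predict anything.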

It is possible to identify a limited class of pseudoknots using dynamic programming, but these methods are not exhaustive and scale worse as a function of sequence length than non-pseudoknotted algorithms. [3] [4] The general problem of predicting lowest free energy structures with pseudoknots has been shown to be NP-complete. [5] [6]
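To illustrate why the standard recursion excludes pseudoknots, here is a minimal sketch of a Nussinov-style base-pair maximization in Python. This is a simplified stand-in chosen for brevity: it counts base pairs instead of using the thermodynamic energy models of tools such as Mfold, and the function name `nussinov_max_pairs`, the `min_loop` parameter and the example sequence are assumptions made for illustration. The key point is that every subproblem is a contiguous interval [i, j] that is split into independent sub-intervals, so a pair that reaches from inside one interval to outside it can never be scored.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style recursion).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    Runs in O(n^3) time and O(n^2) memory.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of nested pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    # case 2: j pairs with k, which splits [i, j] into two
                    # independent intervals, [i, k-1] and [k+1, j-1]
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))
```

Algorithms that admit restricted classes of pseudoknots have to track more than one interval at a time, which is where the additional cost in sequence length comes from.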

Biological significance

Several important biological processes rely on RNA molecules that form pseudoknots, which are often RNAs with extensive tertiary structure. For example, the pseudoknot region of RNase P is one of the most conserved elements in all of evolution. The telomerase RNA component contains a pseudoknot that is critical for activity, [1] and several viruses use a pseudoknot structure to form a tRNA-like motif to infiltrate the host cell. [7]

Representing pseudoknots

Many types of pseudoknots exist, differing in how and how many times their stems cross. To reflect this difference, pseudoknots are classed into H-, K-, L- and M-types, with each successive type adding a layer of step intercalation. The simple telomerase P2b-P3 pseudoknot shown in this article, for example, is an H-type pseudoknot. [8]

RNA secondary structure is usually represented by the dot-bracket notation, with pairing round brackets () indicating basepairs in a stem and dots representing loops. The interrupted stems of pseudoknots mean that such notation must be extended with extra brackets, or even letters, so that different sets of stems can be represented. One such extension uses, in nesting order, ([{<ABCDE for opening and edcba>}]) for closing. [9] The structure for the two (slightly varying) telomerase examples, in this notation, is:

           (((.(((((........))))).))).   ....]]]]]].
drawing  1 CGCGCGCUGUUUUUCUCGCUGACUUUCAGCGGGCGA---AAAAAAUGUCAGCU  50
ALIGN         |.|||||||||||||||||||||||||  .|.|   |||||| ||||||.
1ymo     1 ---GGGCUGUUUUUCUCGCUGACUUUCAGC--CCCAAACAAAAAA-GUCAGCA  47
              ((((((........))))  )).........]]]]]].

Note that the U bulge at the end is normally present in telomerase RNA. It was removed in the 1ymo solution model to enhance the stability of the pseudoknot. [10]
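The extended dot-bracket notation described in this section can be read with one stack per bracket type. The following is a minimal sketch in Python, assuming the opener and closer alphabets given above; the function name `parse_dot_bracket` and the example string are hypothetical, and the string is a toy H-type structure, not the telomerase structure.

```python
def parse_dot_bracket(structure):
    """Convert an extended dot-bracket string into a sorted list of base pairs.

    Each bracket type gets its own stack, so stems written with different
    bracket types may cross each other. Positions are 0-based.
    """
    openers = "([{<ABCDE"
    closers = ")]}>abcde"
    closer_to_opener = dict(zip(closers, openers))
    stacks = {opener: [] for opener in openers}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closer_to_opener:
            stack = stacks[closer_to_opener[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    unclosed = [opener for opener, stack in stacks.items() if stack]
    if unclosed:
        raise ValueError(f"unclosed bracket types: {unclosed}")
    return sorted(pairs)


# Hypothetical toy H-type pseudoknot: the "[ ]" stem opens inside the "( )" stem.
print(parse_dot_bracket("((..[[..))..]]"))
```

A plain dot-bracket string uses only the round brackets and therefore needs a single stack; the extra bracket types exist precisely so that crossing stems can each be matched independently.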

References

  1. Chen, JL; Greider, CW (7 June 2005). "Functional analysis of the pseudoknot structure in human telomerase RNA". Proceedings of the National Academy of Sciences of the United States of America. 102 (23): 8080–5. Bibcode:2005PNAS..102.8080C. doi:10.1073/pnas.0502259102. PMC 1149427. PMID 15849264.
  2. Staple, DW; Butcher, SE (June 2005). "Pseudoknots: RNA structures with diverse functions". PLOS Biol. 3 (6): e213. doi:10.1371/journal.pbio.0030213. PMC 1149493. PMID 15941360.
  3. Rivas, E; Eddy, S (1999). "A dynamic programming algorithm for RNA structure prediction including pseudoknots". J Mol Biol. 285 (5): 2053–2068.
  4. Dirks, RM; Pierce, NA (2004). "An algorithm for computing nucleic acid base-pairing probabilities including pseudoknots". J Comput Chem. 25: 1295–1304.
  5. Lyngsø, Rune B.; Pedersen, Christian N. S. (2000). "RNA Pseudoknot Prediction in Energy-Based Models". Journal of Computational Biology. 7 (3–4): 409–427. doi:10.1089/106652700750050862. ISSN 1066-5277.
  6. Lyngsø, Rune B. (2004). "Complexity of Pseudoknot Prediction in Simple Models". Automata, Languages and Programming. Vol. 3142. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 919–931. doi:10.1007/978-3-540-27836-8_77. ISBN 978-3-540-22849-3.
  7. Pleij, CW; Rietveld, K; Bosch, L (1985). "A new principle of RNA folding based on pseudoknotting". Nucleic Acids Res. 13 (5): 1717–31. doi:10.1093/nar/13.5.1717. PMC 341107. PMID 4000943.
  8. Kucharík, M; Hofacker, IL; Stadler, PF; Qin, J (15 January 2016). "Pseudoknots in RNA folding landscapes". Bioinformatics. 32 (2): 187–94. doi:10.1093/bioinformatics/btv572. PMC 4708108. PMID 26428288.
  9. Antczak, M; Popenda, M; Zok, T; Zurkowski, M; Adamiak, RW; Szachniuk, M (15 April 2018). "New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation". Bioinformatics. 34 (8): 1304–1312. doi:10.1093/bioinformatics/btx783. PMC 5905660. PMID 29236971.
  10. Theimer, CA; Blois, CA; Feigon, J (4 March 2005). "Structure of the human telomerase RNA pseudoknot reveals conserved tertiary interactions essential for function". Molecular Cell. 17 (5): 671–82. doi:10.1016/j.molcel.2005.01.017. PMID 15749017.