Nucleic acid structure prediction

Nucleic acid structure prediction is a computational method to determine secondary and tertiary nucleic acid structure from its sequence. Secondary structure can be predicted from one or several nucleic acid sequences. Tertiary structure can be predicted from the sequence, or by comparative modeling (when the structure of a homologous sequence is known).

Predicting nucleic acid secondary structure depends mainly on base pairing and base stacking interactions. Many molecules have several possible three-dimensional structures, so predicting these structures remains out of reach unless obvious sequence and functional similarity to a known class of nucleic acid molecules, such as transfer RNA (tRNA) or microRNA (miRNA), is observed. Many secondary structure prediction methods rely on variations of dynamic programming and are therefore unable to identify pseudoknots efficiently.

While the methods are similar, there are slight differences in the approaches to RNA and DNA structure prediction. In vivo, DNA structures are more likely to be duplexes with full complementarity between two strands, while RNA structures are more likely to fold into complex secondary and tertiary structures such as in the ribosome, spliceosome, or transfer RNA. This is partly because the extra oxygen in RNA (the 2′-hydroxyl group of the ribose sugar) increases the propensity for hydrogen bonding in the nucleic acid backbone. The energy parameters are also different for the two nucleic acids. The structure prediction methods can follow a completely theoretical approach, or a hybrid one incorporating experimental data. [1] [2]

Single sequence structure prediction

A common problem for researchers working with RNA is to determine the three-dimensional structure of the molecule given only a nucleic acid sequence. However, in the case of RNA much of the final structure is determined by the secondary structure or intra-molecular base pairing interactions of the molecule. This is shown by the high conservation of base pairings across diverse species.

The most stable structure

The secondary structure of small RNA molecules is largely determined by strong, local interactions such as hydrogen bonds and base stacking. Summing the free energy for such interactions should provide an approximation for the stability of a given structure. To predict the folding free energy of a given secondary structure, an empirical nearest-neighbor model is used. In the nearest-neighbor model, the free energy change for each motif depends on the sequence of the motif and of its closest base pairs. [3] The model and its minimal energy parameters for Watson–Crick pairs, GU pairs and loop regions were derived from empirical calorimetric experiments. The most up-to-date parameters were published in 2004, [4] although most software packages use the prior set assembled in 1999. [5]
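
The summation over neighbouring base pairs can be sketched as follows. This is a minimal illustration that covers only stacking between adjacent base pairs and ignores loop terms; the function name `stacking_energy`, the table layout, and the single energy value in `toy_table` are hypothetical placeholders, not measured nearest-neighbor parameters.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (i, j) with i < j, 0-based positions of a base pair


def stacking_energy(seq: str, pairs: List[Pair],
                    stack_table: Dict[Tuple[str, str], float]) -> float:
    """Sum one stacking term for every base pair (i, j) that sits
    directly on top of another pair (i + 1, j - 1)."""
    pair_set = set(pairs)
    total = 0.0
    for i, j in pairs:
        if (i + 1, j - 1) in pair_set:
            outer = seq[i] + seq[j]          # e.g. "GC"
            inner = seq[i + 1] + seq[j - 1]  # the pair stacked inside it
            total += stack_table[(outer, inner)]
    return total


# PLACEHOLDER value chosen for illustration only (not an experimental parameter).
toy_table = {("GC", "GC"): -1.0}

seq = "GGGAAACCC"
pairs = [(0, 8), (1, 7), (2, 6)]  # a three-pair helix closing a hairpin loop
total = stacking_energy(seq, pairs, toy_table)  # two stacks contribute
```

A real implementation would read a published parameter set and add terms for hairpin, bulge, internal and multibranch loops.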

The simplest way to find the lowest free energy structure would be to generate all possible structures and calculate the free energy for each, but the number of possible structures for a sequence increases exponentially with the length of the RNA: the number of secondary structures is approximately (1.8)^N, where N is the number of nucleotides. [6] For longer molecules, the number of possible secondary structures is huge: a sequence of 100 nucleotides has more than 10^25 possible secondary structures. [3]
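
The growth in the number of candidate structures can be explored with a short counting recurrence. The sketch below counts nested (pseudoknot-free) structures compatible with Watson–Crick and GU pairing; the function name, the pairing set, and the minimum hairpin loop length of 3 are assumptions made for illustration and are not taken from the cited sources.

```python
from functools import lru_cache

# Allowed base pairs: Watson-Crick plus GU wobble (an assumption of this sketch).
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"),
            ("C", "G"), ("G", "U"), ("U", "G")}


def count_structures(seq: str, min_loop: int = 3) -> int:
    """Count nested secondary structures of seq, including the fully
    unpaired one, with at least min_loop unpaired bases in every hairpin."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        if j - i <= min_loop:        # fragment too short to hold a pair
            return 1
        total = count(i + 1, j)      # case 1: base i is unpaired
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in CAN_PAIR:
                # case 2: i pairs with k, splitting the fragment in two
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)


# Print how the count grows with length for a simple repeating sequence.
for n in (10, 20, 30, 40):
    print(n, count_structures("GC" * (n // 2)))
```

The exact counts depend on the sequence and on the loop-length constraint, so they will not match the (1.8)^N estimate term for term; the point is only that exhaustive enumeration quickly becomes impractical.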

Dynamic programming algorithms

The most popular methods for predicting the secondary structure of RNA and DNA involve dynamic programming. [7] [8] One of the early attempts at predicting RNA secondary structure was made by Ruth Nussinov and co-workers, who developed a dynamic programming-based algorithm that maximized the length and number of a series of "blocks" (polynucleotide chains). [7] Each "block" required at least two nucleotides, which reduced the algorithm's storage requirements over single base-matching approaches. [7] Nussinov et al. later published an adapted approach with improved performance that increased the RNA size limit to ~1,000 bases by folding increasingly sized subsections while storing the results of prior folds, now known as the Nussinov algorithm. [8] In 1981, Michael Zuker and Patrick Stiegler proposed a refined approach with performance comparable to Nussinov et al.'s solution but with the additional ability to find "suboptimal" secondary structures. [9]

Dynamic programming algorithms provide a means to implicitly check all variants of possible RNA secondary structures without explicitly generating the structures. First, the lowest conformational free energy is determined for each possible sequence fragment, starting with the shortest fragments and then moving on to longer fragments. For longer fragments, recursion on the optimal free energy changes determined for shorter sequences speeds the determination of the lowest folding free energy. Once the lowest free energy of the complete sequence is calculated, the exact structure of the RNA molecule is determined by tracing back through the stored optimal values. [3]
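
The fill-then-trace-back pattern can be sketched with a simplified, Nussinov-style recursion. Note the substitution: this sketch maximizes the number of base pairs rather than minimizing a nearest-neighbor free energy, and the pairing set and minimum hairpin loop length of 3 are assumptions of the example.

```python
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"),
            ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for a nested structure of seq."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs in seq[i..j]

    def right_of(k, j):
        # Score of the fragment to the right of k (empty when k == j).
        return best[k + 1][j] if k + 1 <= j else 0

    # Fill the table from the shortest fragments to the longest.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # base i left unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in CAN_PAIR:
                    score = max(score, 1 + best[i + 1][k - 1] + right_of(k, j))
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if ((seq[i], seq[k]) in CAN_PAIR and
                    best[i][j] == 1 + best[i + 1][k - 1] + right_of(k, j)):
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break

    return (best[0][n - 1] if n else 0), "".join(structure)


pairs, fold = nussinov("GGGAAACCC")
print(pairs, fold)  # 3 (((...)))
```

The three nested loops give cubic running time in the sequence length, and the table stores one value per fragment, so no structure is ever enumerated explicitly.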

Dynamic programming algorithms are commonly used to detect base pairing patterns that are "well-nested", that is, form hydrogen bonds only to bases that do not overlap one another in sequence position. Secondary structures that fall into this category include double helices, stem-loops, and variants of the "cloverleaf" pattern found in transfer RNA molecules. These methods rely on pre-calculated parameters which estimate the free energy associated with certain types of base-pairing interactions, including Watson–Crick and Hoogsteen base pairs. Depending on the complexity of the method, either single base pairs or short two- or three-base segments may be considered, the latter to incorporate the effects of base stacking. This method cannot identify pseudoknots, which are not well nested, without substantial algorithmic modifications that are computationally very costly. [10]
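
Whether a set of base pairs is well nested can be checked directly from the pair positions. The sketch below is illustrative; the function name and the representation of a structure as a list of (i, j) index pairs are assumptions of the example.

```python
def has_crossing_pairs(pairs):
    """Return True if any two base pairs (i, j) and (k, l) cross,
    i.e. i < k < j < l, which is the signature of a pseudoknot."""
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for index, (i, j) in enumerate(ordered):
        # Later pairs start at or after i because the list is sorted.
        for k, l in ordered[index + 1:]:
            if i < k < j < l:
                return True
    return False


stem_loop = [(0, 8), (1, 7), (2, 6)]   # nested helix
two_hairpins = [(0, 3), (4, 7)]        # side-by-side, still nested
pseudoknot = [(0, 5), (3, 8)]          # second pair starts inside the first and ends outside

print(has_crossing_pairs(stem_loop))     # False
print(has_crossing_pairs(two_hairpins))  # False
print(has_crossing_pairs(pseudoknot))    # True
```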

Suboptimal structures

The accuracy of RNA secondary structure prediction from one sequence by free energy minimization is limited by several factors:

  1. The list of free energy values in the nearest-neighbor model is incomplete.
  2. Not all known RNAs fold in such a way as to conform to the thermodynamic minimum.
  3. Some RNA sequences have more than one biologically active conformation (e.g., riboswitches).

For this reason, the ability to predict structures which have similar low free energy can provide significant information. Such structures are termed suboptimal structures. MFOLD is one program that generates suboptimal structures. [11]

Predicting pseudoknots

One of the issues when predicting RNA secondary structure is that the standard free energy minimization and statistical sampling methods cannot find pseudoknots. [5] The major problem is that the usual dynamic programming algorithms, when predicting secondary structure, consider only nested interactions, in which the region enclosed by a base pair folds independently of the rest of the sequence, while pseudoknotted structures are formed by base pairs that cross one another. Rivas and Eddy published a dynamic programming algorithm for predicting pseudoknots. [10] However, this dynamic programming algorithm is very slow. The standard dynamic programming algorithm for free energy minimization scales as O(N^3) in time (N is the number of nucleotides in the sequence), while the Rivas and Eddy algorithm scales as O(N^6) in time. This has prompted several researchers to implement versions of the algorithm that restrict the classes of pseudoknots considered, resulting in performance gains. For example, the pknotsRG tool includes only the class of simple recursive pseudoknots and scales as O(N^4) in time. [12]
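
To give a sense of what these exponents mean in practice, the following worked comparison evaluates the three growth rates for a sequence of N = 100 nucleotides. Constant factors are ignored, so the figures are relative operation counts only, not running times.

```latex
\[
N = 100:\qquad
N^{3} = 10^{6},\qquad
N^{4} = 10^{8},\qquad
N^{6} = 10^{12}
\]
\[
\frac{N^{6}}{N^{3}} = N^{3} = 10^{6},
\qquad
\frac{N^{4}}{N^{3}} = N = 10^{2}
\]
```

Under this rough measure, the general pseudoknot algorithm needs about a million times more work than nested folding at this length, while the restricted class costs about a hundred times more.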

Other approaches for RNA secondary structure prediction

Another approach for RNA secondary structure determination is to sample structures from the Boltzmann ensemble, [13] [14] as exemplified by the program SFOLD. The program generates a statistical sample of all possible RNA secondary structures. The algorithm samples secondary structures according to the Boltzmann distribution. The sampling method offers an appealing solution to the problem of uncertainties in folding. [14]
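
The idea of sampling according to the Boltzmann distribution can be sketched on a small, explicitly listed set of structures. This is not how SFOLD works internally (it samples using partition function recursions rather than enumeration); the structures and free energies below are hypothetical values chosen only to make the example run.

```python
import math
import random

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def boltzmann_probabilities(energies, temperature_k=310.15):
    """Map each structure to exp(-dG / RT) / Z for energies in kcal/mol."""
    rt = GAS_CONSTANT * temperature_k
    e_min = min(energies.values())  # shift energies for numerical stability
    weights = {s: math.exp(-(e - e_min) / rt) for s, e in energies.items()}
    partition = sum(weights.values())
    return {s: w / partition for s, w in weights.items()}


def sample_structures(energies, n, seed=0):
    """Draw n structures with probability proportional to their Boltzmann weight."""
    probs = boltzmann_probabilities(energies)
    structures = list(probs)
    rng = random.Random(seed)
    return rng.choices(structures, weights=[probs[s] for s in structures], k=n)


# HYPOTHETICAL free energies (kcal/mol) for three candidate structures.
toy_energies = {
    "(((...)))": -3.0,
    "((.....))": -1.5,
    ".........": 0.0,
}

probs = boltzmann_probabilities(toy_energies)
sample = sample_structures(toy_energies, n=1000)
for structure, p in probs.items():
    print(structure, round(p, 3), sample.count(structure))
```

Because low-energy structures dominate the sample without excluding the alternatives, the frequencies in the sample give a direct estimate of how well determined the predicted fold is.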

Comparative secondary structure prediction

Figure: S. cerevisiae tRNA-PHE structure space. The energies and structures were calculated using RNAsubopt and the structure distances computed using RNAdistance.

Sequence covariation methods rely on the existence of a data set composed of multiple homologous RNA sequences with related but dissimilar sequences. These methods analyze the covariation of individual base sites in evolution; maintenance at two widely separated sites of a pair of base-pairing nucleotides indicates the presence of a structurally required hydrogen bond between those positions. The general problem of pseudoknot prediction has been shown to be NP-complete. [15]
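
One way to quantify covariation between two alignment columns is their mutual information; the choice of this particular measure, the function name, and the toy alignment are assumptions of this sketch, which also ignores gaps and phylogenetic weighting.

```python
import math
from collections import Counter


def mutual_information(alignment, col_i, col_j):
    """Mutual information (in bits) between two columns of an alignment
    given as a list of equal-length sequences."""
    n = len(alignment)
    freq_i = Counter(seq[col_i] for seq in alignment)
    freq_j = Counter(seq[col_j] for seq in alignment)
    freq_ij = Counter((seq[col_i], seq[col_j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: the first and last columns swap G-C for C-G together,
# while the middle columns are fully conserved.
toy_alignment = ["GAAAC", "CAAAG", "GAAAC", "CAAAG"]

print(mutual_information(toy_alignment, 0, 4))  # 1.0
print(mutual_information(toy_alignment, 1, 2))  # 0.0
```

Columns that change together while preserving complementarity score high, whereas fully conserved columns carry no covariation signal even if they are paired.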

In general, the problems of alignment and consensus structure prediction are closely related. Three different approaches to the prediction of consensus structures can be distinguished: [16]

  1. Folding of alignment
  2. Simultaneous sequence alignment and folding
  3. Alignment of predicted structures

Align then fold

A practical heuristic approach is to use multiple sequence alignment tools to produce an alignment of several RNA sequences, find the consensus sequence, and then fold it. The quality of the alignment determines the accuracy of the consensus structure model. Consensus sequences are folded using approaches similar to those used for individual structure prediction. The thermodynamic folding approach is exemplified by the RNAalifold program. [17] Other approaches are exemplified by the Pfold and ILM programs. Pfold implements stochastic context-free grammars (SCFGs). [18] ILM (iterated loop matching), unlike the other algorithms for folding of alignments, can return pseudoknotted structures. It uses a combination of thermodynamics and mutual information content scores. [19]

Align and fold

Evolution frequently preserves functional RNA structure better than RNA sequence. [17] Hence, a common biological problem is to infer a common structure for two or more highly diverged but homologous RNA sequences. In practice, sequence alignments become unsuitable and do not help to improve the accuracy of structure prediction when the sequence similarity of two sequences is less than 50%. [20]

Structure-based alignment programs improve the performance of these alignments, and most of them are variants of the Sankoff algorithm. [21] Basically, the Sankoff algorithm is a merger of sequence alignment and the Nussinov [7] (maximal-pairing) folding dynamic programming method. [22] The Sankoff algorithm itself is a theoretical exercise because it requires extreme computational resources (O(n^(3m)) in time and O(n^(2m)) in space, where n is the sequence length and m is the number of sequences). Some notable attempts at implementing restricted versions of Sankoff's algorithm are Foldalign, [23] [24] Dynalign, [25] [26] PMmulti/PMcomp, [22] Stemloc, [27] and Murlet. [28] In these implementations, the maximal length of the alignment or the variants of possible consensus structures are restricted. For example, Foldalign focuses on local alignments and restricts the possible length of the sequence alignment.

Fold then align

A less widely used approach is to fold the sequences using single sequence structure prediction methods and align the resulting structures using tree-based metrics. [29] The fundamental weakness with this approach is that single sequence predictions are often inaccurate, thus all further analyses are affected.

Tertiary structure prediction

Once the secondary structure of an RNA is known, the next challenge is to predict its tertiary structure. The biggest problem is to determine the structure of the regions between double-stranded helical regions. In addition, RNA molecules often contain post-transcriptionally modified nucleosides, which introduce new possible non-canonical interactions and thereby complicate tertiary structure prediction. [30] [31] [32] [33]

Three-dimensional structure prediction methods can use comparative modeling, which starts from a related known structure called the template. [34] The alternative strategy is de novo modeling of RNA 3D structure, [35] which uses physics-based principles such as molecular dynamics [36] or random sampling of the conformational landscape [37] followed by screening with a statistical potential for scoring. [38] These methods use either an all-atom representation [39] of the nucleic acid structure or a coarse-grained representation. [40] The low-resolution structures generated by many of these modeling methods are then subjected to high-resolution refinement. [41] Evaluations of standalone RNA 3D structure prediction methods indicate that machine learning-based methods effectively predict global RNA folds, while non-ML-based methods demonstrate higher precision in modeling intramolecular interactions and ligand binding sites. [42]

References

  1. Ponce-Salvatierra, Almudena; Astha; Merdas, Katarzyna; Chandran, Nithin; Ghosh, Pritha; Mukherjee, Sunandan; Bujnicki, Janusz M (2019-01-22). "Computational modeling of RNA 3D structure based on experimental data". Bioscience Reports. 39 (2): BSR20180430. doi:10.1042/bsr20180430. ISSN   0144-8463. PMC   6367127 . PMID   30670629.
  2. Magnus, Marcin; Matelska, Dorota; Łach, Grzegorz; Chojnowski, Grzegorz; Boniecki, Michal J; Purta, Elzbieta; Dawson, Wayne; Dunin-Horkawicz, Stanislaw; Bujnicki, Janusz M (2014-04-23). "Computational modeling of RNA 3D structures, with the aid of experimental restraints". RNA Biology. 11 (5): 522–536. doi:10.4161/rna.28826. ISSN   1547-6286. PMC   4152360 . PMID   24785264.
  3. Mathews D.H. (2006). "Revolutions in RNA secondary structure prediction". J. Mol. Biol. 359 (3): 526–532. doi:10.1016/j.jmb.2006.01.067. PMID 16500677.
  4. Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH (2004). "Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure". Proceedings of the National Academy of Sciences USA. 101 (19): 7287–7292. Bibcode:2004PNAS..101.7287M. doi: 10.1073/pnas.0401799101 . PMC   409911 . PMID   15123812.
  5. Mathews DH, Sabina J, Zuker M, Turner DH (1999). "Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure". J Mol Biol. 288 (5): 911–40. doi:10.1006/jmbi.1999.2700. PMID 10329189. S2CID 19989405.
  6. Zuker M.; Sankoff D. (1984). "RNA secondary structures and their prediction". Bull. Math. Biol. 46 (4): 591–621. doi:10.1016/s0092-8240(84)80062-2 (inactive 1 November 2024). S2CID 189885784.
  7. Nussinov R, Piecznik G, Grigg JR and Kleitman DJ (1978) Algorithms for loop matchings. SIAM Journal on Applied Mathematics.
  8. Nussinov R, Jacobson AB (1980). "Fast algorithm for predicting the secondary structure of single-stranded RNA". Proc Natl Acad Sci U S A. 77 (11): 6309–13. Bibcode:1980PNAS...77.6309N. doi:10.1073/pnas.77.11.6309. PMC 350273. PMID 6161375.
  9. Zuker M, Stiegler P (1981). "Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information". Nucleic Acids Res. 9 (1): 133–48. doi:10.1093/nar/9.1.133. PMC   326673 . PMID   6163133.
  10. Rivas E, Eddy SR (1999). "A dynamic programming algorithm for RNA structure prediction including pseudoknots". J Mol Biol. 285 (5): 2053–68. arXiv:physics/9807048. doi:10.1006/jmbi.1998.2436. PMID 9925784. S2CID 2228845.
  11. Zuker M (2003). "Mfold web server for nucleic acid folding and hybridization prediction". Nucleic Acids Research. 31 (13): 3406–3415. doi:10.1093/nar/gkg595. PMC   169194 . PMID   12824337.
  12. Reeder J.; Giegerich R. (2004). "Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics". BMC Bioinformatics. 5: 104. doi: 10.1186/1471-2105-5-104 . PMC   514697 . PMID   15294028.
  13. McCaskill JS (1990). "The equilibrium partition function and base pair binding probabilities for RNA secondary structure". Biopolymers. 29 (6–7): 1105–19. doi:10.1002/bip.360290621. hdl: 11858/00-001M-0000-0013-0DE3-9 . PMID   1695107. S2CID   12629688.
  14. Ding Y, Lawrence CE (2003). "A statistical sampling algorithm for RNA secondary structure prediction". Nucleic Acids Res. 31 (24): 7280–301. doi:10.1093/nar/gkg938. PMC 297010. PMID 14654704.
  15. Lyngsø RB, Pedersen CN (2000). "RNA pseudoknot prediction in energy-based models". J Comput Biol. 7 (3–4): 409–427. CiteSeerX   10.1.1.34.4044 . doi:10.1089/106652700750050862. PMID   11108471.
  16. Gardner P.P.; Giegerich, Robert (2004). "A comprehensive comparison of comparative RNA structure prediction approaches". BMC Bioinformatics. 5: 140. doi: 10.1186/1471-2105-5-140 . PMC   526219 . PMID   15458580.
  17. Hofacker IL, Fekete M, Stadler PF (2002). "Secondary structure prediction for aligned RNA sequences". J Mol Biol. 319 (5): 1059–66. CiteSeerX 10.1.1.73.479. doi:10.1016/S0022-2836(02)00308-X. PMID 12079347.
  18. Knudsen B, Hein J (2003). "Pfold: RNA secondary structure prediction using stochastic context-free grammars". Nucleic Acids Res. 31 (13): 3423–8. doi:10.1093/nar/gkg614. PMC   169020 . PMID   12824339.
  19. Ruan, J., Stormo, G.D. & Zhang, W. (2004) ILM: a web server for predicting RNA secondary structures with pseudoknots. Nucleic Acids Research, 32(Web Server issue), W146-149.
  20. Bernhart SH, Hofacker IL (2009). "From consensus structure prediction to RNA gene finding". Brief Funct Genomic Proteomic. 8 (6): 461–71. doi: 10.1093/bfgp/elp043 . PMID   19833701.
  21. Sankoff D (1985). "Simultaneous solution of the RNA folding, alignment and protosequence problems". SIAM Journal on Applied Mathematics. 45 (5): 810–825. CiteSeerX   10.1.1.665.4890 . doi:10.1137/0145048.
  22. Hofacker IL, Bernhart SH, Stadler PF (2004). "Alignment of RNA base pairing probability matrices". Bioinformatics. 20 (14): 2222–7. doi:10.1093/bioinformatics/bth229. PMID 15073017.
  23. Havgaard JH, Lyngso RB, Stormo GD, Gorodkin J (2005). "Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%". Bioinformatics. 21 (9): 1815–24. doi: 10.1093/bioinformatics/bti279 . PMID   15657094.
  24. Torarinsson E, Havgaard JH, Gorodkin J. (2007) Multiple structural alignment and clustering of RNA sequences. Bioinformatics.
  25. Mathews DH, Turner DH (2002). "Dynalign: an algorithm for finding the secondary structure common to two RNA sequences". J Mol Biol. 317 (2): 191–203. doi:10.1006/jmbi.2001.5351. PMID   11902836.
  26. Harmanci AO, Sharma G, Mathews DH, (2007), Efficient Pairwise RNA Structure Prediction Using Probabilistic Alignment Constraints in Dynalign, BMC Bioinformatics, 8(130).
  27. Holmes I. (2005) Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics. 2005 Mar 24;6:73.
  28. Kiryu H, Tabei Y, Kin T, Asai K (2007). "Murlet: A practical multiple alignment tool for structural RNA sequences". Bioinformatics. 23 (13): 1588–1598. doi: 10.1093/bioinformatics/btm146 . PMID   17459961.
  29. Shapiro BA and Zhang K (1990) Comparing Multiple RNA Secondary Structures Using Tree Comparisons Computer Applications in the Biosciences, vol. 6, no. 4, pp. 309–318.
  30. Shapiro BA, Yingling YG, Kasprzak W, Bindewald E. (2007) Bridging the gap in RNA structure prediction. Curr Opin Struct Biol.
  31. Major F, Turcotte M, Gautheret D, Lapalme G, Fillion E, Cedergren R (Sep 1991). "The combination of symbolic and numerical computation for three-dimensional modeling of RNA". Science. 253 (5025): 1255–60. Bibcode:1991Sci...253.1255F. doi:10.1126/science.1716375. PMID   1716375.
  32. Major F, Gautheret D, Cedergren R (Oct 1993). "Reproducing the three-dimensional structure of a tRNA molecule from structural constraints". Proc Natl Acad Sci U S A. 90 (20): 9408–12. Bibcode:1993PNAS...90.9408M. doi: 10.1073/pnas.90.20.9408 . PMC   47577 . PMID   8415714.
  33. Frellsen J, Moltke I, Thiim M, Mardia KV, Ferkinghoff-Borg J, Hamelryck T (2009). "A probabilistic model of RNA conformational space". PLOS Comput Biol. 5 (6): e1000406. Bibcode:2009PLSCB...5E0406F. doi: 10.1371/journal.pcbi.1000406 . PMC   2691987 . PMID   19543381.
  34. Rother, Magdalena; Rother, Kristian; Puton, Tomasz; Bujnicki, Janusz M. (2011-02-07). "ModeRNA: a tool for comparative modeling of RNA 3D structure". Nucleic Acids Research. 39 (10): 4007–4022. doi:10.1093/nar/gkq1320. ISSN   1362-4962. PMC   3105415 . PMID   21300639.
  35. Neocles B Leontis; Eric Westhof, eds. (2012). RNA 3D structure analysis and prediction. Springer. ISBN   9783642257407. OCLC   795570014.
  36. Vangaveti, Sweta; Ranganathan, Srivathsan V.; Chen, Alan A. (2016-10-04). "Advances in RNA molecular dynamics: a simulator's guide to RNA force fields". Wiley Interdisciplinary Reviews: RNA. 8 (2): e1396. doi:10.1002/wrna.1396. ISSN   1757-7004. PMID   27704698. S2CID   35501632.
  37. Chen, Shi-Jie (June 2008). "RNA Folding: Conformational Statistics, Folding Kinetics, and Ion Electrostatics". Annual Review of Biophysics. 37 (1): 197–214. doi:10.1146/annurev.biophys.37.032807.125957. ISSN   1936-122X. PMC   2473866 . PMID   18573079.
  38. Laing, Christian; Schlick, Tamar (June 2011). "Computational approaches to RNA structure prediction, analysis, and design". Current Opinion in Structural Biology. 21 (3): 306–318. doi:10.1016/j.sbi.2011.03.015. ISSN   0959-440X. PMC   3112238 . PMID   21514143.
  39. Zhao, Chenhan; Xu, Xiaojun; Chen, Shi-Jie (2017), "Predicting RNA Structure with Vfold", Functional Genomics, Methods in Molecular Biology, vol. 1654, Springer New York, pp. 3–15, doi:10.1007/978-1-4939-7231-9_1, ISBN   9781493972302, PMC   5762135 , PMID   28986779
  40. Boniecki, Michal J.; Lach, Grzegorz; Dawson, Wayne K.; Tomala, Konrad; Lukasz, Pawel; Soltysinski, Tomasz; Rother, Kristian M.; Bujnicki, Janusz M. (2015-12-19). "SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction". Nucleic Acids Research. 44 (7): e63. doi:10.1093/nar/gkv1479. ISSN   0305-1048. PMC   4838351 . PMID   26687716.
  41. Stasiewicz, Juliusz; Mukherjee, Sunandan; Nithin, Chandran; Bujnicki, Janusz M. (2019-03-21). "QRNAS: software tool for refinement of nucleic acid structures". BMC Structural Biology. 19 (1): 5. doi: 10.1186/s12900-019-0103-1 . ISSN   1472-6807. PMC   6429776 . PMID   30898165.
  42. Nithin, Chandran; Kmiecik, Sebastian; Błaszczyk, Roman; Nowicka, Julita; Tuszyńska, Irina (2024-06-25). "Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA–ligand interactions". Nucleic Acids Research. 52 (13): 7465–7486. doi: 10.1093/nar/gkae541 . ISSN   0305-1048. PMC   11260495 . PMID   38917327.
