ViennaRNA Package

Original author(s): Hofacker et al.
Developer(s): Institut für theoretische Chemie, Währingerstr
Stable release: v2.4.17 / 25 November 2020
Written in: C, Perl
Operating system: Linux, macOS, Windows
Size: 13.4 MB (source)
Type: Bioinformatics
Website: www.tbi.univie.ac.at/RNA/

The ViennaRNA Package is a set of standalone programs and libraries used for prediction and analysis of RNA secondary structures. [1] The source code for the package is distributed freely and compiled binaries are available for Linux, macOS and Windows platforms. The original paper has been cited over 2000 times.

Background

The three-dimensional structure of biological macromolecules such as proteins and nucleic acids plays a critical role in determining their function. [2] Decoding function from sequence is an experimentally and computationally challenging problem that has been widely studied. [3] [4] RNA forms complex secondary and tertiary structures, in contrast to DNA, which typically forms duplexes with full complementarity between two strands. This is partly because the additional 2′-hydroxyl group in the ribose of RNA increases the propensity for hydrogen bonding in the nucleic acid backbone. Base pairing and base stacking interactions in RNA play a critical role in the formation of the ribosome, the spliceosome, and tRNA.

Secondary structure prediction is commonly done using approaches such as dynamic programming, energy minimisation (to find the most stable structure), and the generation of suboptimal structures. A large number of structure prediction tools have been implemented.
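
As an illustration of the dynamic programming approach, the following is a minimal sketch of base-pair maximisation in the style of the Nussinov algorithm. It is not the energy model or the code used by the ViennaRNA Package: it simply counts nested pairs. The function name `max_base_pairs`, the set of allowed pairs, and the minimum hairpin loop size of three unpaired bases are assumptions made for the example.

```python
def max_base_pairs(seq, min_loop=3):
    """Return (pair_count, dot_bracket) for a nested structure with the most base pairs.

    Toy model: every allowed pair scores 1; no thermodynamic energies are used.
    """
    # Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0, ""

    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in allowed:
                    outer = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + best[i + 1][k - 1] + outer)
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to contain a pair
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in allowed:
                outer = best[k + 1][j] if k < j else 0
                if best[i][j] == 1 + best[i + 1][k - 1] + outer:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return best[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

Energy minimisation follows the same recursive decomposition but replaces the pair count with loop-based free energy contributions, which this sketch does not attempt.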

Development

The first version of the ViennaRNA Package was published by Hofacker et al. in 1994. [1] The package distributed tools to compute either minimum free energy structures or partition functions of RNA molecules, both using dynamic programming. Non-thermodynamic criteria, such as maximum matching and various versions of kinetic folding, were also implemented, along with an inverse folding heuristic to determine structurally neutral sequences. Additionally, the package contained a statistics suite with routines for cluster analysis, statistical geometry, and split decomposition.
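
The idea of a partition function computed by dynamic programming can be sketched with a toy model. The example below assumes that every base pair contributes the same fixed energy (`pair_energy`, a hypothetical parameter) and that hairpin loops need at least three unpaired bases; the real package uses a detailed loop-based energy model, so this is an illustration of the recursion only, not the package's algorithm.

```python
import math


def toy_partition_function(seq, pair_energy=-1.0, rt=0.61632, min_loop=3):
    """Sum Boltzmann weights over all nested secondary structures of seq.

    Toy model: each base pair contributes pair_energy (kcal/mol);
    rt is the gas constant times temperature (about 0.616 kcal/mol at 37 °C).
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    weight = math.exp(-pair_energy / rt)  # Boltzmann factor of a single pair

    # z[i][j] = partition function of the half-open segment seq[i:j];
    # empty and very short segments admit only the open chain, so they stay 1.
    z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = z[i + 1][j]  # position i is unpaired
            for k in range(i + min_loop + 1, j):  # position i pairs with k
                if (seq[i], seq[k]) in allowed:
                    total += weight * z[i + 1][k] * z[k + 1][j]
            z[i][j] = total
    return z[0][n]


if __name__ == "__main__":
    print(toy_partition_function("GGGAAAUCC"))
```

Each structure is counted exactly once because the first position of a segment is either unpaired or paired with exactly one partner. Setting `pair_energy=0.0` makes every weight equal to 1, so the function then returns the number of admissible structures.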

The package was made available as a library and as a set of standalone routines.

Version 2.0

Version 2.0 introduced a number of major changes: a new parametrized energy model (Turner 2004), [5] a restructuring of RNAlib to support concurrent computations in a thread-safe manner, improvements to the API, and several new auxiliary tools, for example tools to assess RNA-RNA interactions and restricted ensembles of structures. Other new features included additional output information, such as centroid structures and maximum expected accuracy structures derived from base pairing probabilities, z-scores for locally stable secondary structures, and support for input in FASTA format. The updates remain compatible with earlier versions and do not affect the computational efficiency of the core algorithms. [6]

Web server

The tools provided by the ViennaRNA Package are also available for public use through a web interface. [7] [8]

Tools

In addition to prediction and analysis tools, the ViennaRNA Package contains several scripts and utilities for plotting and input-output processing. A summary of the available programs is collected in the table below (an exhaustive list with examples can be found in the official documentation). [9]

| Program | Description |
| --- | --- |
| AnalyseDists | Analyse a distance matrix |
| AnalyseSeqs | Analyse a set of sequences of common length |
| Kinfold | Simulate kinetic folding of RNA secondary structures |
| RNA2Dfold | Compute MFE structure, partition function and representative sample structures of k,l neighborhoods |
| RNAaliduplex | Predict conserved RNA-RNA interactions between two alignments |
| RNAalifold | Calculate secondary structures for a set of aligned RNA sequences |
| RNAcofold | Calculate secondary structures of two RNAs with dimerization |
| RNAdistance | Calculate distances between RNA secondary structures |
| RNAduplex | Compute the structure upon hybridization of two RNA strands |
| RNAeval | Evaluate free energy of RNA sequences with given secondary structure |
| RNAfold | Calculate minimum free energy secondary structures and partition function of RNAs |
| RNAforester | Compare RNA secondary structures via forest alignment |
| RNAheat | Calculate the specific heat (melting curve) of an RNA sequence |
| RNAinverse | Find RNA sequences with given secondary structure (sequence design) |
| RNALalifold | Calculate locally stable secondary structures for a set of aligned RNAs |
| RNALfold | Calculate locally stable secondary structures of long RNAs |
| RNApaln | RNA alignment based on sequence base pairing propensities |
| RNApdist | Calculate distances between thermodynamic RNA secondary structure ensembles |
| RNAparconv | Convert energy parameter files from ViennaRNA 1.8 to 2.0 format |
| RNAPKplex | Predict RNA secondary structures including pseudoknots |
| RNAplex | Find targets of a query RNA |
| RNAplfold | Calculate average pair probabilities for locally stable secondary structures |
| RNAplot | Draw RNA secondary structures in PostScript, SVG, or GML |
| RNAsnoop | Find targets of a query H/ACA snoRNA |
| RNAsubopt | Calculate suboptimal secondary structures of RNAs |
| RNAup | Calculate the thermodynamics of RNA-RNA interactions |
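
To make the notion of a distance between secondary structures more concrete, the sketch below computes the base-pair distance between two structures written in dot-bracket notation, that is, the number of base pairs present in one structure but not in the other. This is only one of several possible distance measures and is not taken from the RNAdistance source; the function names are hypothetical and pseudoknot brackets are not supported.

```python
def parse_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(structure_a, structure_b):
    """Count base pairs that occur in exactly one of the two structures."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    # Symmetric difference of the two pair sets.
    return len(parse_dot_bracket(structure_a) ^ parse_dot_bracket(structure_b))


if __name__ == "__main__":
    print(base_pair_distance("(((...)))", "((.....))"))
```

A distance of zero means the two structures contain exactly the same base pairs.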

References

  1. Hofacker, I. L.; Fontana, W.; Stadler, P. F.; Bonhoeffer, L. S.; Tacker, M.; Schuster, P. (1994-02-01). "Fast folding and comparison of RNA secondary structures". Monatshefte für Chemie. 125 (2): 167–188. doi:10.1007/BF00818163. ISSN 0026-9247. S2CID 19344304.
  2. Vella, F. (1992). "Introduction to Protein Structure". Biochemical Education. 20 (2): 122. doi:10.1016/0307-4412(92)90132-6.
  3. Whisstock, James C.; Lesk, Arthur M. (2003-08-01). "Prediction of protein function from protein sequence and structure". Quarterly Reviews of Biophysics. 36 (3): 307–340. doi:10.1017/S0033583503003901. ISSN 1469-8994. PMID 15029827. S2CID 27123114.
  4. Lee, David; Redfern, Oliver; Orengo, Christine (2007). "Predicting protein function from sequence and structure". Nature Reviews Molecular Cell Biology. 8 (12): 995–1005. doi:10.1038/nrm2281. PMID 18037900. S2CID 14432468.
  5. Mathews, David H.; Disney, Matthew D.; Childs, Jessica L.; Schroeder, Susan J.; Zuker, Michael; Turner, Douglas H. (2004-05-11). "Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure". Proceedings of the National Academy of Sciences of the United States of America. 101 (19): 7287–7292. Bibcode:2004PNAS..101.7287M. doi:10.1073/pnas.0401799101. ISSN 0027-8424. PMC 409911. PMID 15123812.
  6. Lorenz, Ronny; Bernhart, Stephan H; Siederdissen, Christian Höner zu; Tafer, Hakim; Flamm, Christoph; Stadler, Peter F; Hofacker, Ivo L (2011-11-24). "ViennaRNA Package 2.0". Algorithms for Molecular Biology. 6 (1): 26. doi:10.1186/1748-7188-6-26. PMC 3319429. PMID 22115189.
  7. Gruber, Andreas R.; Lorenz, Ronny; Bernhart, Stephan H.; Neuböck, Richard; Hofacker, Ivo L. (2008-07-01). "The Vienna RNA Websuite". Nucleic Acids Research. 36 (suppl 2): W70–W74. doi:10.1093/nar/gkn188. ISSN 0305-1048. PMC 2447809. PMID 18424795.
  8. Hofacker, Ivo L. (2003-07-01). "Vienna RNA secondary structure server". Nucleic Acids Research. 31 (13): 3429–3431. doi:10.1093/nar/gkg599. ISSN 0305-1048. PMC 169005. PMID 12824340.
  9. "TBI - ViennaRNA Package 2". www.tbi.univie.ac.at. Retrieved 2016-01-11.
