NUPACK

Created by: The NUPACK Team at Caltech
URL: www.nupack.org
Commercial: No
Registration: Optional

The Nucleic Acid Package (NUPACK) is a growing software suite for the analysis and design of nucleic acid systems. [1] Jobs can be run online on the NUPACK web server, or the NUPACK source code can be downloaded and compiled locally for non-commercial academic use. [2] NUPACK algorithms are formulated in terms of nucleic acid secondary structure. In most cases, pseudoknots are excluded from the structural ensemble.


Secondary structure model

An example secondary structure drawing (left) and the corresponding polymer graph (right). Backbones are represented by thick colored lines and bases and base pairs are represented by thin black lines.

The secondary structure of multiple interacting strands is defined by a list of base pairs. [3] A polymer graph for a secondary structure can be constructed by ordering the strands around a circle, drawing the backbones in succession from 5’ to 3’ around the circumference with a nick between each strand, and drawing straight lines connecting paired bases. A secondary structure is pseudoknotted if every strand ordering corresponds to a polymer graph with crossing lines. A secondary structure is connected if no subset of the strands is free of the others. Algorithms are formulated in terms of ordered complexes, each corresponding to the structural ensemble of all connected polymer graphs with no crossing lines for a particular ordering of a set of strands. The free energy of an unpseudoknotted secondary structure is calculated using nearest-neighbor empirical parameters for RNA in 1 M Na+ [4] [5] or for DNA in user-specified Na+ and Mg++ concentrations; [6] [7] [8] additional parameters are employed for the analysis of pseudoknots (single RNA strands only). [9] [10]
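The polymer-graph definitions can be made concrete with a short Python sketch. It assumes strands are indexed 0 to S−1, bases are numbered from 0 at the 5’ end of each strand, and a base pair is given as two `(strand, base)` tuples; all function names are hypothetical and are not part of NUPACK. Because the backbones are laid end to end around a circle in the chosen strand order, two pairing lines cross exactly when their endpoints interleave along the circumference.

```python
from itertools import permutations

def crosses(p, q):
    """Chords (i, j) and (k, l) drawn inside a circle cross iff their endpoints interleave."""
    (i, j), (k, l) = sorted(p), sorted(q)
    return i < k < j < l or k < i < l < j

def circle_positions(strand_lengths, order):
    """Map each (strand, base) to its position around the circle for one strand ordering."""
    pos, offset = {}, 0
    for s in order:
        for b in range(strand_lengths[s]):
            pos[(s, b)] = offset + b
        offset += strand_lengths[s]
    return pos

def has_crossing(pairs, strand_lengths, order):
    """True if the polymer graph for this strand ordering has crossing lines."""
    pos = circle_positions(strand_lengths, order)
    chords = [(pos[a], pos[b]) for a, b in pairs]
    return any(crosses(chords[x], chords[y])
               for x in range(len(chords))
               for y in range(x + 1, len(chords)))

def is_pseudoknotted(pairs, strand_lengths):
    """Pseudoknotted iff every strand ordering yields a polymer graph with crossing lines."""
    return all(has_crossing(pairs, strand_lengths, order)
               for order in permutations(range(len(strand_lengths))))

def is_connected(pairs, n_strands):
    """Connected iff no subset of the strands is free of (unpaired to) the others."""
    neighbors = {s: set() for s in range(n_strands)}
    for (s1, _), (s2, _) in pairs:
        neighbors[s1].add(s2)
        neighbors[s2].add(s1)
    seen, stack = {0}, [0]
    while stack:
        for t in neighbors[stack.pop()] - seen:
            seen.add(t)
            stack.append(t)
    return len(seen) == n_strands

# Three strands A, B, C of length 2, with pairs A0-B1 and A1-C0.
pairs = [((0, 0), (1, 1)), ((0, 1), (2, 0))]
print(has_crossing(pairs, [2, 2, 2], (0, 1, 2)))  # True: ordering A, B, C has crossing lines
print(has_crossing(pairs, [2, 2, 2], (0, 2, 1)))  # False: ordering A, C, B does not
print(is_pseudoknotted(pairs, [2, 2, 2]))         # False
print(is_connected(pairs, 3))                     # True
```

The example shows why the definition quantifies over all orderings: one ordering of the same structure crosses and another does not, so the structure is not pseudoknotted. Trying every permutation is redundant (cyclic rotations give the same circle) and grows factorially, so this sketch only suits a handful of strands.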

Web server

Analysis

The Analysis page allows users to analyze the thermodynamic properties of a dilute solution of interacting nucleic acid strands in the absence of pseudoknots (e.g., a test tube of DNA or RNA strand species). [1] [3] For a dilute solution containing multiple strand species interacting to form multiple species of ordered complexes, NUPACK calculates for each ordered complex its partition function, minimum free energy (MFE) secondary structure, equilibrium base-pairing probabilities, and equilibrium concentration, including rigorous treatment of distinguishability issues that arise in the multi-stranded setting.
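To illustrate the kind of per-complex quantities involved (a partition function Q, a minimum free energy structure, and base-pairing probabilities), here is a brute-force Python sketch for a single strand. It is a toy, not NUPACK's method: it assumes a hypothetical energy model of −1 kcal/mol per Watson-Crick or wobble pair with at least 3 unpaired bases in a hairpin, instead of the nearest-neighbor parameters, and it enumerates every unpseudoknotted structure explicitly instead of using dynamic programming, so it only works for very short sequences.

```python
import math

R = 0.0019872       # gas constant, kcal/(mol K)
T = 310.15          # 37 °C in kelvin
RT = R * T
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
PAIR_ENERGY = -1.0  # toy free energy per base pair (kcal/mol), NOT a nearest-neighbor parameter
MIN_HAIRPIN = 3     # minimum number of unpaired bases enclosed by a pair

def structures(seq, i, j):
    """Yield every unpseudoknotted set of base pairs within seq[i..j] exactly once."""
    if i >= j:
        yield frozenset()
        return
    yield from structures(seq, i + 1, j)              # base i unpaired
    for k in range(i + MIN_HAIRPIN + 1, j + 1):       # base i paired with base k
        if (seq[i], seq[k]) in PAIRS:
            for inside in structures(seq, i + 1, k - 1):
                for outside in structures(seq, k + 1, j):
                    yield inside | outside | {(i, k)}

def analyze(seq):
    """Return Q, the MFE, one MFE structure, and pair probabilities by full enumeration."""
    ensemble = list(structures(seq, 0, len(seq) - 1))
    energies = [PAIR_ENERGY * len(s) for s in ensemble]
    weights = [math.exp(-dg / RT) for dg in energies]   # Boltzmann factors
    Q = sum(weights)                                    # partition function
    best = min(range(len(ensemble)), key=energies.__getitem__)
    probs = {}
    for s, w in zip(ensemble, weights):
        for pair in s:
            probs[pair] = probs.get(pair, 0.0) + w / Q  # P(pair) = sum of structure probabilities
    return Q, energies[best], ensemble[best], probs

Q, mfe, mfe_structure, probs = analyze("GGGAAAUCC")
print(f"Q = {Q:.3f}, MFE = {mfe} kcal/mol, MFE pairs = {sorted(mfe_structure)}")
```

Each structure contributes exp(−ΔG/RT) to Q, and a pair's equilibrium probability is the summed probability of the structures that contain it.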

Design

The Design page allows users to design sequences for one or more strands intended to adopt an unpseudoknotted target secondary structure at equilibrium. [1] Sequence design is formulated as an optimization problem with the goal of reducing the ensemble defect below a user-specified stop condition. [11] For a candidate sequence and a given target secondary structure, the ensemble defect is the average number of incorrectly paired nucleotides over the structural ensemble of the ordered complex. [12] For a target secondary structure with N nucleotides, the algorithm seeks to achieve an ensemble defect below N/100. Empirically, the design algorithm exhibits asymptotic optimality as N increases: for sufficiently large N, the cost of sequence design is typically only 4/3 times the cost of a single evaluation of the ensemble defect. [11]
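Given equilibrium base-pairing probabilities, the ensemble defect follows directly: it is N minus the sum, over all nucleotides, of the probability that each nucleotide is in its target state (paired to its target partner, or unpaired if the target leaves it unpaired). The Python sketch below assumes the target is written in dot-bracket notation and the pair probabilities are supplied as a symmetric N × N matrix; the function names and the `f_stop` parameter are illustrative assumptions, not NUPACK's API.

```python
def parse_dot_bracket(target):
    """Return partner[i] = j if base i pairs with base j in the target, else None."""
    stack, partner = [], [None] * len(target)
    for i, c in enumerate(target):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner

def ensemble_defect(pair_prob, target):
    """Expected number of nucleotides not in their target pairing state.

    pair_prob[i][j] is the equilibrium probability that bases i and j are paired
    (a symmetric N x N matrix with zero diagonal).
    """
    partner = parse_dot_bracket(target)
    n = len(target)
    correct = 0.0
    for i in range(n):
        if partner[i] is None:
            correct += 1.0 - sum(pair_prob[i])   # probability that base i is unpaired
        else:
            correct += pair_prob[i][partner[i]]  # probability that i pairs with its target partner
    return n - correct

def meets_stop_condition(defect, n, f_stop=0.01):
    """Stop once the ensemble defect is below f_stop * N (N/100 when f_stop = 0.01)."""
    return defect < f_stop * n

# Hairpin target "(...)" on 5 nucleotides; the target pair forms with probability 0.99.
P = [[0.0] * 5 for _ in range(5)]
P[0][4] = P[4][0] = 0.99
d = ensemble_defect(P, "(...)")
print(round(d, 4), meets_stop_condition(d, 5))  # 0.02 True
```

Here the two paired nucleotides are each correct with probability 0.99 and the three loop nucleotides are always unpaired, so the defect is 5 − 4.98 = 0.02, which is below the N/100 = 0.05 threshold.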

Utilities

The Utilities page allows users to evaluate, display, and annotate the equilibrium properties of a complex of interacting nucleic acid strands. [1] The page accepts sequence information, structure information, or both as input, and performs different functions depending on the information provided, including automatic layout and rendering of secondary structures with or without ideal helical geometry. In either case, the structure layout can be edited dynamically within the web application.

The Utilities page enables depicting secondary structures with ideal helical geometry for stacked base pairs, as for this complex of three RNA strands with A-form helices (left) or three DNA strands with B-form helices (right).

Implementation

The NUPACK web application [1] is programmed within the Ruby on Rails framework, employing Ajax and the Dojo Toolkit to implement dynamic features and interactive graphics. Plots and graphics are generated using NumPy and matplotlib. The site is supported on current versions of the web browsers Safari, Chrome, and Firefox. The NUPACK library of analysis and design algorithms is written in the programming language C. The dynamic programming algorithms are parallelized using the Message Passing Interface (MPI).

Terms of use

The NUPACK web server and NUPACK source code are provided for non-commercial research purposes only; because of this restriction, NUPACK is not free and open-source software.

Funding

NUPACK development is funded by the National Science Foundation via the Molecular Programming Project [13] and by the Beckman Institute [14] at the California Institute of Technology (Caltech).


References

  1. Zadeh, J.N., C.D. Steenberg, J.S. Bois, B.R. Wolfe, A.R. Khan, M.B. Pierce, R.M. Dirks, and N.A. Pierce, NUPACK: analysis and design of nucleic acid systems. Journal of Computational Chemistry.
  2. downloads
  3. Dirks, R.M., J.S. Bois, J.M. Schaeffer, E. Winfree, and N.A. Pierce, Thermodynamic analysis of interacting nucleic acid strands. SIAM Review, 2007. 49(1): p. 65-88.
  4. Serra, M.J. and D.H. Turner, Predicting thermodynamic properties of RNA. Methods in Enzymology, 1995. 259: p. 242-261.
  5. Mathews, D.H., J. Sabina, M. Zuker, and D.H. Turner, Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. Journal of Molecular Biology, 1999. 288: p. 911-940.
  6. SantaLucia, J., Jr., A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proceedings of the National Academy of Sciences of the United States of America, 1998. 95(4): p. 1460-1465.
  7. SantaLucia, J. and D. Hicks, The thermodynamics of DNA structural motifs. Annual Review of Biophysics and Biomolecular Structure, 2004. 33: p. 415-440.
  8. Koehler, R.T. and N. Peyret, Thermodynamic properties of DNA sequences: characteristic values for the human genome. Bioinformatics, 2005. 21(16): p. 3333-3339.
  9. Dirks, R.M. and N.A. Pierce, A partition function algorithm for nucleic acid secondary structure including pseudoknots. Journal of Computational Chemistry, 2003. 24: p. 1664-1677.
  10. Dirks, R.M. and N.A. Pierce, An algorithm for computing nucleic acid base-pairing probabilities including pseudoknots. Journal of Computational Chemistry, 2004. 25: p. 1295-1304.
  11. Zadeh, J.N., B.R. Wolfe, and N.A. Pierce, Nucleic acid sequence design via efficient ensemble defect optimization. Journal of Computational Chemistry.
  12. Dirks, R.M., M. Lin, E. Winfree, and N.A. Pierce, Paradigms for computational nucleic acid design. Nucleic Acids Research, 2004. 32(4): p. 1392-1403.
  13. Molecular Programming Project
  14. Beckman Institute