Nucleic acid design

Nucleic acid design can be used to create nucleic acid complexes with complicated secondary structures such as this four-arm junction. These four strands associate into this structure because it maximizes the number of correct base pairs, with A's matched to T's and C's matched to G's. Image from Mao, 2004.

Nucleic acid design is the process of generating a set of nucleic acid base sequences that will associate into a desired conformation. Nucleic acid design is central to the fields of DNA nanotechnology and DNA computing. [2] It is necessary because many possible sequences of nucleic acid strands will fold into a given secondary structure, but many of these sequences also have undesired additional interactions that must be avoided. In addition, many tertiary structure considerations affect the choice of a secondary structure for a given design. [3] [4]

Nucleic acid design has similar goals to protein design: in both, the sequence of monomers is rationally designed to favor the desired folded or associated structure and to disfavor alternate structures. However, nucleic acid design has the advantage of being a computationally much simpler problem, since the simplicity of Watson–Crick base pairing rules leads to simple heuristic methods that yield experimentally robust designs. Computational models for protein folding require tertiary structure information, whereas nucleic acid design can operate largely on the level of secondary structure. On the other hand, nucleic acid structures are less versatile than proteins in their functionality. [2] [5]

Nucleic acid design can be considered the inverse of nucleic acid structure prediction. In structure prediction, the structure is determined from a known sequence, while in nucleic acid design, a sequence is generated which will form a desired structure. [2]

Fundamental concepts

Chemical structure of DNA. Nucleic acid double helices will only form between two strands of complementary sequences, where the bases are matched into only A-T or G-C pairs.

The structure of nucleic acids consists of a sequence of nucleotides. There are four types of nucleotides, distinguished by which of the four nucleobases they contain: in DNA these are adenine (A), cytosine (C), guanine (G), and thymine (T). Two nucleic acid molecules bind to each other to form a double helix only if their sequences are complementary, that is, if they can form matching sequences of base pairs. Thus, in nucleic acids the sequence determines the pattern of binding and hence the overall structure. [5]
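The complementarity rule can be made concrete with a minimal Python sketch. The function names are illustrative, not from any particular library, and the sketch assumes DNA strands written 5' to 3' using only A, C, G and T. Because the two strands of a double helix run antiparallel, a strand can form a perfect duplex with its reverse complement, not with its plain complement.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 5'->3' DNA sequence.

    The partner strand runs antiparallel, so each base is complemented
    and the result is read backwards to keep the 5'->3' convention.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def fully_complementary(strand_a: str, strand_b: str) -> bool:
    """True if the two strands can form a perfectly matched duplex."""
    return len(strand_a) == len(strand_b) and reverse_complement(strand_a) == strand_b.upper()


print(reverse_complement("GATTACA"))              # TGTAATC
print(fully_complementary("GATTACA", "TGTAATC"))  # True
print(fully_complementary("GATTACA", "GATTACA"))  # False
```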

Nucleic acid design is the process by which, given a desired target structure or functionality, sequences are generated for nucleic acid strands that will self-assemble into that target structure. Nucleic acid design encompasses all levels of nucleic acid structure: primary, secondary, tertiary, and quaternary.

One of the greatest concerns in nucleic acid design is ensuring that the target structure has the lowest free energy (i.e. is the most thermodynamically favorable), while misformed structures have higher free energies and are therefore disfavored. [2] These goals can be achieved through a number of approaches, including heuristic, thermodynamic, and geometrical ones. Almost all nucleic acid design tasks are aided by computers, and software packages are available for many of these tasks.

Two considerations in nucleic acid design are that desired hybridizations should have melting temperatures in a narrow range, and that any spurious interactions should have very low melting temperatures (i.e. they should be very weak). [5] There is also a contrast between affinity-optimizing "positive design", which seeks to minimize the energy of the desired structure in an absolute sense, and specificity-optimizing "negative design", which considers the energy of the target structure relative to those of undesired structures. Algorithms that implement both kinds of design tend to perform better than those that consider only one type. [2]
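The difference between the two objectives can be illustrated with a toy calculation. The sketch below assumes that the free energies of the target structure and of a few competing structures are already known (the numbers are invented for illustration, not taken from any real sequence) and that the structures are populated according to a Boltzmann distribution. The positive score looks only at the target's own free energy; the negative score asks what fraction of the equilibrium population the target captures relative to the listed alternatives.

```python
import math

R = 0.0019872  # gas constant, kcal/(mol*K)
T = 310.15     # 37 degrees C expressed in kelvin


def positive_score(target_dg: float) -> float:
    """Affinity objective: a lower (more negative) target free energy is better."""
    return target_dg


def negative_score(target_dg: float, competitor_dgs: list[float]) -> float:
    """Specificity objective: equilibrium probability of the target structure.

    Assumes a Boltzmann distribution over the target and the listed competing
    structures only; a real ensemble contains far more structures.
    """
    weights = [math.exp(-dg / (R * T)) for dg in [target_dg, *competitor_dgs]]
    return weights[0] / sum(weights)


# Hypothetical designs, free energies in kcal/mol (invented for illustration).
design_a = {"target": -20.0, "competitors": [-19.5, -19.0]}  # strong but unspecific
design_b = {"target": -15.0, "competitors": [-8.0, -7.0]}    # weaker but specific

for name, d in [("A", design_a), ("B", design_b)]:
    print(name, positive_score(d["target"]),
          round(negative_score(d["target"], d["competitors"]), 3))
```

In this toy case design A wins on the positive objective, while design B places almost all of its population in the target structure and so wins on the negative one.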

Approaches

Heuristic methods

Heuristic methods use simple criteria that can be quickly evaluated to judge the suitability of different sequences for a given secondary structure. They have the advantages of being much less computationally expensive than the energy minimization algorithms needed for thermodynamic or geometrical modeling and of being easier to implement, at the cost of being less rigorous than these models.

Sequence symmetry minimization is the oldest approach to nucleic acid design and was first used to design immobile versions of branched DNA structures. It divides the nucleic acid sequence into overlapping subsequences of a fixed length, called the criterion length. Each of the 4^N possible subsequences of length N is allowed to appear only once in the sequence. This ensures that no undesired hybridization with a length greater than or equal to the criterion length can occur. [2] [3]
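The uniqueness rule can be checked with a short sketch. This is a simplified illustration of the criterion-length test described above, not Seeman's original program: it collects every overlapping subsequence of length N from a set of strands (the strand names and sequences are hypothetical) and reports any subsequence that occurs more than once. Also comparing each subsequence against the complements of the others would be a natural extension and is omitted here.

```python
from collections import defaultdict


def repeated_subsequences(strands: dict[str, str], n: int) -> dict[str, list[tuple[str, int]]]:
    """Find length-n subsequences that appear more than once across all strands.

    Returns a mapping from each repeated subsequence to the list of
    (strand name, 0-based start position) where it occurs. An empty result
    means every subsequence of criterion length n is unique.
    """
    occurrences = defaultdict(list)
    for name, seq in strands.items():
        for start in range(len(seq) - n + 1):
            occurrences[seq[start:start + n]].append((name, start))
    return {word: hits for word, hits in occurrences.items() if len(hits) > 1}


# Hypothetical strands checked with a criterion length of 4.
strands = {"s1": "ACGTTGCA", "s2": "GGATCCTA"}
print(repeated_subsequences(strands, 4))  # {}

strands["s3"] = "TTGCAAGG"  # reuses "TTGC" and "TGCA" from s1
print(repeated_subsequences(strands, 4))
```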

A related heuristic approach is to consider the "mismatch distance", meaning the number of positions in a certain frame where the bases are not complementary. A greater mismatch distance lessens the chance that a strong spurious interaction can happen. [5] This is related to the concept of Hamming distance in information theory. Another related but more involved approach is to use methods from coding theory to construct nucleic acid sequences with desired properties.
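One simple way to compute a mismatch distance between two equal-length strands is sketched below; the exact definition used here is an assumption for illustration. Strand y is laid antiparallel against strand x and slid across it one position at a time. In each frame, every position of x that is not part of a Watson–Crick pair (including positions left overhanging by the shift) counts as a mismatch, and the smallest count over all frames is returned. At zero shift this reduces to the Hamming distance between x and the reverse complement of y.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_distance(x: str, y: str) -> int:
    """Smallest number of unpaired positions of x over all antiparallel frames with y.

    Both strands are written 5'->3' and must have equal length. Positions of x
    that overhang the shifted y are counted as mismatches.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-length strands")
    n = len(x)
    y_antiparallel = y[::-1]  # read y 3'->5' so it lines up against x
    best = n
    for shift in range(-(n - 1), n):
        matched = 0
        for i in range(n):
            j = i - shift  # position in y_antiparallel facing x[i] in this frame
            if 0 <= j < n and COMPLEMENT[x[i]] == y_antiparallel[j]:
                matched += 1
        best = min(best, n - matched)
    return best


print(mismatch_distance("GATTACA", "TGTAATC"))  # 0: perfect complements
print(mismatch_distance("AAAA", "AAAA"))        # 4: no A-T pair is possible
```

A design procedure would require this value to stay above some threshold for every pair of strands that should not interact.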

Thermodynamic models

Information about the secondary structure of a nucleic acid complex along with its sequence can be used to predict the thermodynamic properties of the complex.

When thermodynamic models are used in nucleic acid design, there are usually two considerations: desired hybridizations should have melting temperatures in a narrow range, and any spurious interactions should have very low melting temperatures (i.e. they should be very weak). The Gibbs free energy of a perfectly matched nucleic acid duplex can be predicted using a nearest neighbor model. This model considers only the interactions between a nucleotide and its nearest neighbors on the nucleic acid strand, by summing the free energy of each of the overlapping two-nucleotide subwords of the duplex. This is then corrected for self-complementary monomers and for GC-content. Once the free energy is known, the melting temperature of the duplex can be determined. GC-content alone can also be used to estimate the free energy and melting temperature of a nucleic acid duplex. This is less accurate but also much less computationally costly. [5]
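A nearest-neighbor sum of the kind described above can be sketched in a few lines of Python. The stacking free energies below are the unified DNA parameters at 37 °C from SantaLucia (1998) as commonly tabulated; they should be checked against the original table before being relied on. The sketch applies only the initiation (terminal base pair) and self-complementarity corrections and ignores salt and strand-concentration effects. The second function swaps in the Wallace rule, a rough count-based melting temperature estimate for short oligonucleotides, to illustrate the cheaper GC-content approach.

```python
# Unified nearest-neighbor free energies (kcal/mol, 37 C), after SantaLucia (1998).
# Keys are 5'->3' dinucleotide steps on one strand; each entry also covers the
# reverse-complement step on the partner strand (e.g. "AA" stands for AA/TT).
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_TERMINAL_GC = 0.98     # initiation penalty per terminal G-C pair
INIT_TERMINAL_AT = 1.03     # initiation penalty per terminal A-T pair
SYMMETRY_CORRECTION = 0.43  # added for self-complementary duplexes

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def duplex_dg37(seq: str) -> float:
    """Predicted free energy of seq paired with its perfect complement."""
    seq = seq.upper()
    dg = 0.0
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in NN_DG37:  # e.g. "TT" is tabulated as its partner step "AA"
            step = reverse_complement(step)
        dg += NN_DG37[step]
    for end in (seq[0], seq[-1]):
        dg += INIT_TERMINAL_GC if end in "GC" else INIT_TERMINAL_AT
    if seq == reverse_complement(seq):
        dg += SYMMETRY_CORRECTION
    return dg


def wallace_tm(seq: str) -> float:
    """Rough melting temperature (C) from base counts; short oligos only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2.0 * at + 4.0 * gc


print(round(duplex_dg37("CGTTGA"), 2))  # -5.35
print(wallace_tm("CGTTGA"))             # 18.0
```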

Software for thermodynamic modeling of nucleic acids includes Nupack, [6] [7] mfold/UNAFold, [8] and Vienna. [9]

A related approach, inverse secondary structure prediction, uses stochastic local search to improve a nucleic acid sequence: it runs a structure prediction algorithm and then modifies the sequence to eliminate unwanted features. [5]
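The predict, compare and mutate loop can be illustrated with a deliberately simplified Python sketch. As a stand-in for a real thermodynamic structure predictor it uses the Nussinov base-pair-maximization algorithm, a swapped-in technique chosen for brevity rather than what practical design tools use, and it represents structures in dot-bracket notation. The function names, the minimum hairpin loop of three bases, and the acceptance rule (keep a mutation if it does not increase the mismatch with the target) are all assumptions of this example.

```python
import random

PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
MIN_LOOP = 3  # assumed minimum number of unpaired bases closing a hairpin


def fold(seq: str) -> str:
    """Nussinov-style prediction: maximize Watson-Crick pairs, return dot-bracket."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j left unpaired
            for k in range(i, j - MIN_LOOP):  # case: j paired with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:  # traceback to recover one optimal set of pairs
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.extend([(i, k - 1), (k + 1, j - 1)])
                    break
    return "".join(structure)


def pair_partners(target: str) -> dict[int, int]:
    """Map each paired position in a dot-bracket string to its partner."""
    stack, partner = [], {}
    for i, c in enumerate(target):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def distance(a: str, b: str) -> int:
    """Number of positions where two dot-bracket strings disagree."""
    return sum(x != y for x, y in zip(a, b))


def design(target: str, iterations: int = 1000, seed: int = 0) -> tuple[str, int]:
    """Stochastic local search for a sequence whose predicted fold matches target."""
    rng = random.Random(seed)
    partner = pair_partners(target)
    seq = [rng.choice("ACGT") for _ in target]
    for i, j in partner.items():
        if i < j:
            seq[j] = COMPLEMENT[seq[i]]  # start with intended pairs complementary
    predicted = fold("".join(seq))
    cost = distance(predicted, target)
    for _ in range(iterations):
        if cost == 0:
            break
        # Mutate a position where the prediction disagrees with the target.
        i = rng.choice([p for p in range(len(target)) if predicted[p] != target[p]])
        candidate = seq[:]
        candidate[i] = rng.choice("ACGT")
        if i in partner:  # keep intended partners complementary
            candidate[partner[i]] = COMPLEMENT[candidate[i]]
        new_predicted = fold("".join(candidate))
        new_cost = distance(new_predicted, target)
        if new_cost <= cost:  # accept moves that do not make things worse
            seq, predicted, cost = candidate, new_predicted, new_cost
    return "".join(seq), cost


print(fold("GGGGAAAACCCC"))  # ((((....))))
sequence, remaining = design("((((....))))")
print(sequence, fold(sequence), remaining)
```

The returned cost is the number of positions still disagreeing with the target; a value of 0 means the toy predictor folds the sequence into the desired structure.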

Geometrical models

A geometrical model of a DNA tetrahedron described in Goodman, 2005. Models of this type are useful for ensuring that tertiary structure constraints do not cause excessive strain to the molecule.

Geometrical models of nucleic acids are used to predict tertiary structure. This is important because designed nucleic acid complexes usually contain multiple junction points, which introduce geometric constraints into the system. These constraints stem from the basic structure of nucleic acids, mainly that the double helix formed by nucleic acid duplexes has a fixed helicity of about 10.4 base pairs per turn and is relatively stiff. Because of these constraints, designed complexes are sensitive to the relative orientation of the major and minor grooves at junction points. Geometrical modeling can detect strain stemming from misalignments in the structure, which the designer can then correct. [4] [11]
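The sensitivity to groove orientation follows directly from the helical repeat. The short calculation below assumes an ideal repeat of exactly 10.4 base pairs per turn (the figure quoted above) and computes the rotation accumulated between two junctions separated by a given number of base-pair steps, together with how far that rotation is from the nearest whole number of turns. A large residual indicates that the helix would have to over- or under-twist to bring its grooves into register at the next junction.

```python
BP_PER_TURN = 10.4  # approximate helical repeat of B-form DNA, as quoted in the text


def twist_degrees(n_bp: int) -> float:
    """Total rotation (degrees) accumulated over n_bp base-pair steps."""
    return n_bp * 360.0 / BP_PER_TURN


def phase_error_degrees(n_bp: int) -> float:
    """Signed angle (degrees) between n_bp steps of twist and the nearest whole turn.

    0 means the helix ends in the same orientation it started in; a larger
    magnitude means more over- or under-twisting would be needed.
    """
    twist = twist_degrees(n_bp)
    return twist - 360.0 * round(twist / 360.0)


for n in (10, 11, 21, 31, 32):
    print(n, round(phase_error_degrees(n), 1))
```

With this repeat, a spacing of 21 bp (close to two full turns) leaves a much smaller phase error than 10 or 11 bp.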

Geometric models of nucleic acids for DNA nanotechnology generally use reduced representations of the nucleic acid, because simulating every atom would be very computationally expensive for such large systems. Models with three pseudo-atoms per base pair, representing the two backbone sugars and the helix axis, have been reported to have a sufficient level of detail to predict experimental results. [11] However, models with five pseudo-atoms per base pair, explicitly including the backbone phosphates, are also used. [12]
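To make the idea of a reduced representation concrete, the sketch below builds a three-pseudo-atom-per-base-pair model of an ideal, straight duplex: one point on the helix axis and one point for each backbone sugar. The geometric constants (a rise of 0.34 nm per base pair, a backbone radius of 1 nm, the 10.4 bp repeat quoted above, and the angle between the two sugars of a pair) are illustrative assumptions, not the parameters of GIDEON or any other tool.

```python
import math

RISE_NM = 0.34          # assumed axial rise per base pair
RADIUS_NM = 1.0         # assumed distance of sugar pseudo-atoms from the axis
BP_PER_TURN = 10.4      # helical repeat quoted in the text
SUGAR_OFFSET_DEG = 150  # assumed angle between the two sugars of one base pair


def duplex_pseudo_atoms(n_bp: int) -> list[dict[str, tuple[float, float, float]]]:
    """Return, for each base pair, three pseudo-atoms: helix axis and two sugars."""
    model = []
    for k in range(n_bp):
        z = k * RISE_NM
        theta = math.radians(k * 360.0 / BP_PER_TURN)  # twist of base pair k
        phi = theta + math.radians(SUGAR_OFFSET_DEG)    # partner sugar angle
        model.append({
            "axis": (0.0, 0.0, z),
            "sugar_1": (RADIUS_NM * math.cos(theta), RADIUS_NM * math.sin(theta), z),
            "sugar_2": (RADIUS_NM * math.cos(phi), RADIUS_NM * math.sin(phi), z),
        })
    return model


helix = duplex_pseudo_atoms(21)
print(len(helix) * 3, "pseudo-atoms for 21 bp")
print(helix[-1]["axis"])
```

Even for this short duplex, the model needs only 63 points, far fewer than an all-atom representation.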

Software for geometrical modeling of nucleic acids includes GIDEON, [11] Tiamat, [13] Nanoengineer-1, and UNIQUIMER 3D. [14] Geometrical concerns are especially of interest in the design of DNA origami, because the sequence is predetermined by the choice of scaffold strand. Software specifically for DNA origami design has been made, including caDNAno [15] and SARSE. [16]

Applications

Nucleic acid design is used in DNA nanotechnology to design strands that self-assemble into a desired target structure. Examples include DNA machines, periodic two- and three-dimensional lattices, polyhedra, and DNA origami. [2] It can also be used to create sets of nucleic acid strands that are "orthogonal", or non-interacting with each other, so as to minimize or eliminate spurious interactions. This is useful in DNA computing, as well as for molecular barcoding applications in chemical biology and biotechnology. [5]
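One simple way to build an orthogonal set is greedy rejection sampling, sketched below under stated assumptions: random candidate strands are generated, and a candidate is kept only if none of its length-k windows could pair perfectly with a length-k window of any strand already accepted, or with another window of the candidate itself. The window length k, the strand length, and this exact-match acceptance rule are hypothetical choices for illustration; a thermodynamic check of the kind described earlier would be more rigorous.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def windows(seq: str, k: int) -> set[str]:
    """All overlapping length-k subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orthogonal_set(count: int, length: int, k: int, seed: int = 0,
                   max_tries: int = 100_000) -> list[str]:
    """Greedily collect random strands with no complementary k-windows between them.

    A candidate is rejected if the reverse complement of any of its k-windows
    occurs in an already-accepted strand or in the candidate itself. Fewer than
    `count` strands are returned if max_tries is exhausted.
    """
    rng = random.Random(seed)
    accepted: list[str] = []
    seen: set[str] = set()  # k-windows of all accepted strands
    for _ in range(max_tries):
        if len(accepted) == count:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        own = windows(cand, k)
        if any(reverse_complement(w) in seen or reverse_complement(w) in own for w in own):
            continue
        accepted.append(cand)
        seen |= own
    return accepted


for s in orthogonal_set(count=5, length=20, k=6):
    print(s)
```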

References

  1. Mao, Chengde (December 2004). "The Emergence of Complexity: Lessons from DNA". PLOS Biology. 2 (12): 2036–2038. doi:10.1371/journal.pbio.0020431. ISSN 1544-9173. PMC 535573. PMID 15597116.
  2. Dirks, Robert M.; Lin, Milo; Winfree, Erik; Pierce, Niles A. (2004). "Paradigms for computational nucleic acid design". Nucleic Acids Research. 32 (4): 1392–1403. doi:10.1093/nar/gkh291. PMC 390280. PMID 14990744.
  3. Seeman, N (1982). "Nucleic acid junctions and lattices". Journal of Theoretical Biology. 99 (2): 237–47. Bibcode:1982JThBi..99..237S. doi:10.1016/0022-5193(82)90002-9. PMID 6188926.
  4. Sherman, W; Seeman, N (2006). "Design of Minimally Strained Nucleic Acid Nanotubes". Biophysical Journal. 90 (12): 4546–57. Bibcode:2006BpJ....90.4546S. doi:10.1529/biophysj.105.080390. PMC 1471877. PMID 16581842.
  5. Brenneman, Arwen; Condon, Anne (2002). "Strand design for biomolecular computation". Theoretical Computer Science. 287: 39–58. doi:10.1016/S0304-3975(02)00135-4.
  6. Dirks, Robert M.; Bois, Justin S.; Schaeffer, Joseph M.; Winfree, Erik; Pierce, Niles A. (2007). "Thermodynamic Analysis of Interacting Nucleic Acid Strands". SIAM Review. 49 (1): 65–88. Bibcode:2007SIAMR..49...65D. CiteSeerX 10.1.1.523.4764. doi:10.1137/060651100.
  7. Zadeh, Joseph N.; Wolfe, Brian R.; Pierce, Niles A. (2011). "Nucleic acid sequence design via efficient ensemble defect optimization". Journal of Computational Chemistry. 32 (3): 439–452. doi:10.1002/jcc.21633. PMID 20717905. S2CID 1803200.
  8. Zuker, M. (2003). "Mfold web server for nucleic acid folding and hybridization prediction". Nucleic Acids Research. 31 (13): 3406–15. doi:10.1093/nar/gkg595. PMC 169194. PMID 12824337.
  9. Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL (2008). "The Vienna RNA websuite". Nucleic Acids Res. 36 (Web Server issue): W70–4. doi:10.1093/nar/gkn188. PMC 2447809. PMID 18424795.
  10. Goodman, R.P.; Schaap, I.A.T.; Tardin, C.F.; Erben, C.M.; Berry, R.M.; Schmidt, C.F.; Turberfield, A.J. (9 December 2005). "Rapid chiral assembly of rigid DNA building blocks for molecular nanofabrication". Science. 310 (5754): 1661–1665. Bibcode:2005Sci...310.1661G. doi:10.1126/science.1120367. ISSN 0036-8075. PMID 16339440. S2CID 13678773.
  11. Birac, Jeffrey J.; Sherman, William B.; Kopatsch, Jens; Constantinou, Pamela E.; Seeman, Nadrian C. (2006). "Architecture with GIDEON, a program for design in structural DNA nanotechnology". Journal of Molecular Graphics and Modelling. 25 (4): 470–80. doi:10.1016/j.jmgm.2006.03.005. PMC 3465968. PMID 16630733.
  12. "PAM3 and PAM5 Model Descriptions". Nanoengineer-1 documentation wiki. Nanorex. Retrieved 2010-04-15.
  13. Williams, Sean; Lund, Kyle; Lin, Chenxiang; Wonka, Peter; Lindsay, Stuart; Yan, Hao (2009). "Tiamat: A Three-Dimensional Editing Tool for Complex DNA Structures". DNA Computing. Lecture Notes in Computer Science. Vol. 5347. Springer Berlin / Heidelberg. pp. 90–101. doi:10.1007/978-3-642-03076-5_8. ISBN 978-3-642-03075-8. ISSN 0302-9743.
  14. Zhu, J.; Wei, B.; Yuan, Y.; Mi, Y. (2009). "UNIQUIMER 3D, a software system for structural DNA nanotechnology design, analysis and evaluation". Nucleic Acids Research. 37 (7): 2164–75. doi:10.1093/nar/gkp005. PMC 2673411. PMID 19228709.
  15. Douglas, S. M.; Marblestone, A. H.; Teerapittayanon, S.; Vazquez, A.; Church, G. M.; Shih, W. M. (2009). "Rapid prototyping of 3D DNA-origami shapes with caDNAno". Nucleic Acids Research. 37 (15): 5001–6. doi:10.1093/nar/gkp436. PMC 2731887. PMID 19531737.
  16. Andersen, Ebbe S.; Dong, Mingdong; Nielsen, Morten M.; Jahn, Kasper; Lind-Thomsen, Allan; Mamdouh, Wael; Gothelf, Kurt V.; Besenbacher, Flemming; Kjems, Jørgen (2008). "DNA Origami Design of Dolphin-Shaped Structures with Flexible Tails". ACS Nano. 2 (6): 1213–8. doi:10.1021/nn800215j. PMID 19206339.

Further reading