DNA synthesis is the natural or artificial creation of deoxyribonucleic acid (DNA) molecules. DNA is a macromolecule made up of nucleotide units, which are linked by covalent bonds and hydrogen bonds, in a repeating structure. DNA synthesis occurs when these nucleotide units are joined to form DNA; this can occur artificially (in vitro) or naturally (in vivo). Nucleotide units are made up of a nitrogenous base (cytosine, guanine, adenine or thymine), pentose sugar (deoxyribose) and phosphate group. Each unit is joined when a covalent bond forms between its phosphate group and the pentose sugar of the next nucleotide, forming a sugar-phosphate backbone. DNA is a complementary, double stranded structure as specific base pairing (adenine and thymine, guanine and cytosine) occurs naturally when hydrogen bonds form between the nucleotide bases.
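The pairing rule described above (adenine with thymine, guanine with cytosine) can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical, chosen only for illustration, and the sketch ignores modified or ambiguous bases.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5' to 3' direction."""
    return complement(strand)[::-1]


template = "ATGCGT"  # hypothetical example sequence
print(complement(template))          # TACGCA
print(reverse_complement(template))  # ACGCAT
```

Because the two strands of a double helix run in opposite directions, the reverse complement is usually the form wanted when writing out the partner strand.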
There are several different definitions for DNA synthesis: it can refer to DNA replication - DNA biosynthesis (in vivo DNA amplification), polymerase chain reaction - enzymatic DNA synthesis (in vitro DNA amplification) or gene synthesis - physically creating artificial gene sequences. Though each type of synthesis is very different, they do share some features. Nucleotides that have been joined to form polynucleotides can act as a DNA template for one form of DNA synthesis - PCR - to occur. DNA replication also uses a DNA template: the DNA double helix unwinds during replication, exposing unpaired bases to which new nucleotides can hydrogen bond. Gene synthesis, however, does not require a DNA template, and genes are assembled de novo.
DNA synthesis occurs in all eukaryotes and prokaryotes, as well as some viruses. The accurate synthesis of DNA is important in order to avoid mutations to DNA. In humans, mutations could lead to diseases such as cancer so DNA synthesis, and the machinery involved in vivo, has been studied extensively throughout the decades. In the future these studies may be used to develop technologies involving DNA synthesis, to be used in data storage.
In nature, DNA molecules are synthesised by all living cells through the process of DNA replication. This typically occurs as a part of cell division. Replication ensures that, at cell division, each daughter cell receives an accurate copy of the genetic material of the cell. In vivo DNA synthesis (DNA replication) depends on a complex set of enzymes which have evolved to act in a concerted fashion during the S phase of the cell cycle. In both eukaryotes and prokaryotes, DNA replication occurs when specific topoisomerases, helicases and gyrases (replication initiator proteins) uncoil the double-stranded DNA, exposing the nitrogenous bases. [1] These enzymes, along with accessory proteins, form a macromolecular machine which ensures accurate duplication of DNA sequences. Complementary base pairing takes place, forming a new double-stranded DNA molecule. This is known as semi-conservative replication, since one strand of each new DNA molecule comes from the 'parent' molecule.
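The semi-conservative pattern can be followed over several rounds with a small Python sketch. It is a toy bookkeeping model under stated assumptions: strands are represented only by the hypothetical labels "parent" and "new", and no sequence or enzyme behaviour is modelled.

```python
def replicate(duplexes):
    """One round of semi-conservative replication.

    Each duplex is a pair of strand labels. Both old strands are kept,
    and each is paired with a freshly made strand labelled "new".
    """
    next_generation = []
    for strand_a, strand_b in duplexes:
        next_generation.append((strand_a, "new"))
        next_generation.append((strand_b, "new"))
    return next_generation


population = [("parent", "parent")]
for generation in range(1, 4):
    population = replicate(population)
    with_parent = sum("parent" in duplex for duplex in population)
    # Columns: generation, total duplexes, duplexes holding an original strand
    print(generation, len(population), with_parent)
# 1 2 2
# 2 4 2
# 3 8 2
```

The number of duplexes doubles each round, while only two of them ever contain one of the original parental strands.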
Eukaryotic replication enzymes continually encounter DNA damage, which can perturb DNA replication. This damage is in the form of DNA lesions that arise spontaneously or due to DNA damaging agents. DNA replication machinery is therefore highly controlled in order to prevent collapse when encountering damage. [2] Control of the DNA replication system ensures that the genome is replicated only once per cycle; over-replication induces DNA damage. Deregulation of DNA replication is a key factor in genomic instability during cancer development. [3]
This highlights the specificity of DNA synthesis machinery in vivo. Various means exist to artificially stimulate the replication of naturally occurring DNA, or to create artificial gene sequences. However, DNA synthesis in vitro can be a very error-prone process.
Damaged DNA is subject to repair by several different enzymatic repair processes, where each individual process is specialized to repair particular types of damage. The DNA of humans is subject to damage from multiple natural sources and insufficient repair is associated with disease and premature aging. [4] Most DNA repair processes form single-strand gaps in DNA during an intermediate stage of the repair, and these gaps are filled in by repair synthesis. [4] The specific repair processes that require gap filling by DNA synthesis include nucleotide excision repair, base excision repair, mismatch repair, homologous recombinational repair, non-homologous end joining and microhomology-mediated end joining.
Reverse transcription is part of the replication cycle of particular virus families, including retroviruses. It involves copying RNA into double-stranded complementary DNA (cDNA), using reverse transcriptase enzymes. In retroviruses, viral RNA is inserted into a host cell nucleus. There, a viral reverse transcriptase enzyme adds DNA nucleotides onto the RNA sequence, generating cDNA that is inserted into the host cell genome by the enzyme integrase, encoding viral proteins. [5]
A polymerase chain reaction is a form of enzymatic DNA synthesis in the laboratory, using cycles of repeated heating and cooling of the reaction for DNA melting and enzymatic replication of the DNA.
DNA synthesis during PCR is very similar to that in living cells but requires very specific reagents and conditions. During PCR, DNA is chemically extracted from host chaperone proteins and then heated, causing thermal dissociation of the DNA strands. Two new complementary DNA strands are built on the original strands; these can be separated again to act as templates for further PCR products. The original DNA is multiplied through many rounds of PCR. [1] More than a billion copies of the original DNA strand can be made.
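The figure of more than a billion copies follows from repeated doubling, which can be checked with a short Python sketch. The function name and the `efficiency` parameter are assumptions made for illustration; the model is idealised and ignores the plateau that real reactions reach as reagents run out.

```python
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised copy number: each cycle multiplies the pool by (1 + efficiency).

    efficiency = 1.0 means every template is copied in every cycle.
    """
    return initial_templates * (1.0 + efficiency) ** cycles


for cycles in (10, 20, 30):
    print(cycles, int(pcr_copies(1, cycles)))
# 10 1024
# 20 1048576
# 30 1073741824
```

Under perfect doubling, a single template passes one billion copies at the thirtieth cycle; with a lower assumed efficiency, more cycles are needed to reach the same number.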
For many experiments, such as structural and evolutionary studies, scientists need to produce a large library of variants of a particular DNA sequence. Random mutagenesis takes place in vitro, when mutagenic replication with a low fidelity DNA polymerase is combined with selective PCR amplification to produce many copies of mutant DNA. [6]
RT-PCR differs from conventional PCR as it synthesizes cDNA from mRNA, rather than template DNA. The technique couples a reverse transcription reaction with PCR-based amplification, as an RNA sequence acts as a template for the enzyme, reverse transcriptase. RT-PCR is often used to test gene expression in particular tissue or cell types at various developmental stages or to test for genetic disorders. [7]
Artificial gene synthesis is the process of synthesizing a gene in vitro without the need for initial template DNA samples. In 2010 J. Craig Venter and his team were the first to use entirely synthesized DNA to create a self-replicating microbe, dubbed Mycoplasma laboratorium. [8]
Oligonucleotide synthesis is the chemical synthesis of sequences of nucleic acids. The majority of biological research and bioengineering involves synthetic DNA, which can include oligonucleotides, synthetic genes, or even chromosomes. Today, most synthetic DNA is custom-built using the phosphoramidite method developed by Marvin H. Caruthers. Oligos are synthesized from building blocks which replicate natural bases. Other techniques for synthesising DNA have been made commercially available, including Short Oligo Ligation Assembly. [9] The process has been automated since the late 1970s and can be used to form desired genetic sequences as well as for other uses in medicine and molecular biology. However, creating sequences chemically is impractical beyond 200-300 bases, and is an environmentally hazardous process. These oligos, of around 200 bases, can be connected using DNA assembly methods, creating larger DNA molecules. [10]
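One way to see why length matters in stepwise chemical synthesis is a simple yield model, sketched below in Python. This is an assumption added for illustration and is not stated in the text above: it supposes that each base addition succeeds with a fixed probability (the 99% used here is a hypothetical figure), so that the share of full-length product shrinks geometrically with length.

```python
def full_length_fraction(length: int, coupling_efficiency: float) -> float:
    """Fraction of chains still full length after (length - 1) coupling steps.

    Assumes every step succeeds independently with the same probability.
    """
    return coupling_efficiency ** (length - 1)


ASSUMED_EFFICIENCY = 0.99  # hypothetical per-step success rate

for length in (20, 100, 200, 300):
    fraction = full_length_fraction(length, ASSUMED_EFFICIENCY)
    print(f"{length:>4} bases: {fraction:.1%} full length")
```

Under this assumed efficiency, only a minority of chains reach 200 bases intact, which is consistent with building longer molecules by assembling shorter oligos.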
Some studies have explored the possibility of enzymatic synthesis using terminal deoxynucleotidyl transferase (TdT), a DNA polymerase that requires no template. However, this method is not yet as effective as chemical synthesis, and is not commercially available. [11]
With advances in artificial DNA synthesis, the possibility of DNA data storage is being explored. With its ultrahigh storage density and long-term stability, synthetic DNA is an interesting option to store large amounts of data. Although information can be retrieved very quickly from DNA through next generation sequencing technologies, de novo synthesis of DNA is a major bottleneck in the process. Only one nucleotide can be added per cycle, with each cycle taking seconds, so the overall synthesis is very time-consuming, as well as very error prone. However, if biotechnology improves, synthetic DNA could one day be used in data storage. [12]
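The idea of storing data in DNA can be sketched in Python by mapping two bits to each base. The mapping below is an arbitrary assumption for illustration; practical schemes, which are not described here, typically add error correction and avoid problematic sequences.

```python
# Hypothetical two-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA sequence, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Recover the original bytes from a sequence produced by encode()."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"DNA"
sequence = encode(message)
print(sequence)       # CACACATGCAAC
print(len(sequence))  # 12 bases, so 12 synthesis cycles at one base per cycle
assert decode(sequence) == message
```

Since one nucleotide is added per cycle, the number of synthesis cycles in this sketch equals the sequence length, which is why writing speed rather than reading speed limits the approach.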
It has been reported that new nucleobase pairs can be synthesized, in addition to the natural A-T (adenine - thymine) and G-C (guanine - cytosine) pairs. Synthetic nucleotides can be used to expand the genetic alphabet and allow specific modification of DNA sites. Even just a third base pair would expand the number of amino acids that can be encoded by DNA from the existing 20 amino acids to a possible 172. [8] Hachimoji DNA is built from eight nucleotide letters, forming four possible base pairs. It therefore doubles the number of letters available in natural DNA, raising the theoretical information content of each position from two bits to three. In studies, RNA has even been produced from hachimoji DNA. This technology could also be used to allow data storage in DNA. [13]
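The figure of 172 can be reproduced with simple codon arithmetic, sketched below in Python. The derivation is one reading of the number and rests on assumptions: codons stay three letters long, and every codon gained from the third base pair is assigned to a new amino acid on top of the existing 20.

```python
def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons for a given number of nucleotide letters."""
    return alphabet_size ** codon_length


natural = codon_count(4)        # four letters: A, T, G, C
expanded = codon_count(6)       # six letters after adding one extra base pair
new_codons = expanded - natural

print(natural, expanded, new_codons)  # 64 216 152
print(20 + new_codons)                # 172
```

The same function gives 512 three-letter codons for an eight-letter alphabet such as hachimoji DNA.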
A base pair (bp) is a fundamental unit of double-stranded nucleic acids consisting of two nucleobases bound to each other by hydrogen bonds. They form the building blocks of the DNA double helix and contribute to the folded structure of both DNA and RNA. Dictated by specific hydrogen bonding patterns, "Watson–Crick" base pairs allow the DNA helix to maintain a regular helical structure that is subtly dependent on its nucleotide sequence. The complementary nature of this base-paired structure provides a redundant copy of the genetic information encoded within each strand of DNA. The regular structure and data redundancy provided by the DNA double helix make DNA well suited to the storage of genetic information, while base-pairing between DNA and incoming nucleotides provides the mechanism through which DNA polymerase replicates DNA and RNA polymerase transcribes DNA into RNA. Many DNA-binding proteins can recognize specific base-pairing patterns that identify particular regulatory regions of genes.
In molecular biology, DNA replication is the biological process of producing two identical replicas of DNA from one original DNA molecule. DNA replication occurs in all living organisms acting as the most essential part of biological inheritance. This is essential for cell division during growth and repair of damaged tissues, while it also ensures that each of the new cells receives its own copy of the DNA. The cell possesses the distinctive property of division, which makes replication of DNA essential.
The polymerase chain reaction (PCR) is a method widely used to make millions to billions of copies of a specific DNA sample rapidly, allowing scientists to amplify a very small sample of DNA sufficiently to enable detailed study. PCR was invented in 1983 by American biochemist Kary Mullis at Cetus Corporation. Mullis and biochemist Michael Smith, who had developed other essential ways of manipulating DNA, were jointly awarded the Nobel Prize in Chemistry in 1993.
A primer is a short, single-stranded nucleic acid used by all living organisms in the initiation of DNA synthesis. A synthetic primer may also be referred to as an oligo, short for oligonucleotide. DNA polymerase enzymes are only capable of adding nucleotides to the 3’-end of an existing nucleic acid, so a primer must be bound to the template before DNA polymerase can begin a complementary strand. DNA polymerase adds nucleotides after binding to the RNA primer and synthesizes the whole strand. Later, the RNA primers must be accurately removed and replaced with DNA nucleotides, leaving a gap known as a nick that is sealed by an enzyme called ligase. The removal of the RNA primer requires several enzymes, such as Fen1, Lig1, and others that work in coordination with DNA polymerase, to ensure the removal of the RNA nucleotides and the addition of DNA nucleotides. Living organisms use solely RNA primers, while laboratory techniques in biochemistry and molecular biology that require in vitro DNA synthesis usually use DNA primers, since they are more temperature stable. Primers can be designed in the laboratory for specific reactions such as polymerase chain reaction (PCR). When designing PCR primers, specific measures must be taken into consideration, such as the melting temperature of the primers and the annealing temperature of the reaction itself. Moreover, the binding sequence of the primer in vitro has to be specifically chosen; this is done using the Basic Local Alignment Search Tool (BLAST), which scans the DNA and finds specific and unique regions for the primer to bind.
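As an illustration of the melting-temperature consideration, the Python sketch below applies the Wallace rule, a common rule of thumb for short oligos that estimates 2 °C for each A or T and 4 °C for each G or C. The choice of this rule is an assumption made here for illustration, as is the example primer; dedicated design tools use more detailed thermodynamic models.

```python
def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees Celsius: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def gc_fraction(primer: str) -> float:
    """Fraction of the primer's bases that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


primer = "ATGCGTACCGGATTCAGC"  # hypothetical 18-base primer
print(wallace_tm(primer))              # 56
print(round(gc_fraction(primer), 2))   # 0.56
```

Estimates of this kind are mainly useful for checking that the two primers in a pair have similar melting temperatures before an annealing temperature is chosen.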
A reverse transcriptase (RT) is an enzyme used to convert RNA genome to DNA, a process termed reverse transcription. Reverse transcriptases are used by viruses such as HIV and hepatitis B to replicate their genomes, by retrotransposon mobile genetic elements to proliferate within the host genome, and by eukaryotic cells to extend the telomeres at the ends of their linear chromosomes. Contrary to a widely held belief, the process does not violate the flows of genetic information as described by the classical central dogma, as transfers of information from RNA to DNA are explicitly held possible.
In genetics and biochemistry, sequencing means to determine the primary structure of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence which succinctly summarizes much of the atomic-level structure of the sequenced molecule.
Transcription is the process of copying a segment of DNA into RNA. The segments of DNA transcribed into RNA molecules that can encode proteins produce messenger RNA (mRNA). Other segments of DNA are transcribed into RNA molecules called non-coding RNAs (ncRNAs).
Protein engineering is the process of developing useful or valuable proteins through the design and production of unnatural polypeptides, often by altering amino acid sequences found in nature. It is a young discipline, with much research taking place into the understanding of protein folding and recognition for protein design principles. It has been used to improve the function of many enzymes for industrial catalysis. It is also a product and services market, with an estimated value of $168 billion by 2017.
A DNA polymerase is a member of a family of enzymes that catalyze the synthesis of DNA molecules from nucleoside triphosphates, the molecular precursors of DNA. These enzymes are essential for DNA replication and usually work in groups to create two identical DNA duplexes from a single original DNA duplex. During this process, DNA polymerase "reads" the existing DNA strands to create two new strands that match the existing ones. These enzymes catalyze the chemical reaction
DNA polymerase I is an enzyme that participates in the process of prokaryotic DNA replication. Discovered by Arthur Kornberg in 1956, it was the first known DNA polymerase. It was initially characterized in E. coli and is ubiquitous in prokaryotes. In E. coli and many other bacteria, the gene that encodes Pol I is known as polA. The E. coli Pol I enzyme is composed of 928 amino acids, and is an example of a processive enzyme — it can sequentially catalyze multiple polymerisation steps without releasing the single-stranded template. The physiological function of Pol I is mainly to support repair of damaged DNA, but it also contributes to connecting Okazaki fragments by deleting RNA primers and replacing the ribonucleotides with DNA.
Site-directed mutagenesis is a molecular biology method that is used to make specific and intentional mutating changes to the DNA sequence of a gene and any gene products. Also called site-specific mutagenesis or oligonucleotide-directed mutagenesis, it is used for investigating the structure and biological activity of DNA, RNA, and protein molecules, and for protein engineering.
In molecular biology, a library is a collection of genetic material fragments that are stored and propagated in a population of microbes through the process of molecular cloning. There are different types of DNA libraries, including cDNA libraries, genomic libraries and randomized mutant libraries. DNA library technology is a mainstay of current molecular biology, genetic engineering, and protein engineering, and the applications of these libraries depend on the source of the original DNA fragments. There are differences in the cloning vectors and techniques used in library preparation, but in general each DNA fragment is uniquely inserted into a cloning vector and the pool of recombinant DNA molecules is then transferred into a population of bacteria or yeast such that each organism contains on average one construct. As the population of organisms is grown in culture, the DNA molecules contained within them are copied and propagated.
Nuclear DNA (nDNA), or nuclear deoxyribonucleic acid, is the DNA contained within each cell nucleus of a eukaryotic organism. It encodes for the majority of the genome in eukaryotes, with mitochondrial DNA and plastid DNA coding for the rest. It adheres to Mendelian inheritance, with information coming from two parents, one male and one female—rather than matrilineally as in mitochondrial DNA.
Taq polymerase is a thermostable DNA polymerase I named after the thermophilic eubacterial microorganism Thermus aquaticus, from which it was originally isolated by Chien et al. in 1976. Its name is often abbreviated to Taq or Taq pol. It is frequently used in the polymerase chain reaction (PCR), a method for greatly amplifying the quantity of short segments of DNA.
Rolling circle replication (RCR) is a process of unidirectional nucleic acid replication that can rapidly synthesize multiple copies of circular molecules of DNA or RNA, such as plasmids, the genomes of bacteriophages, and the circular RNA genome of viroids. Some eukaryotic viruses also replicate their DNA or RNA via the rolling circle mechanism.
The replisome is a complex molecular machine that carries out replication of DNA. The replisome first unwinds double stranded DNA into two single strands. For each of the resulting single strands, a new complementary sequence of DNA is synthesized. The total result is formation of two new double stranded DNA sequences that are exact copies of the original double stranded DNA sequence.
The history of the polymerase chain reaction (PCR) has variously been described as a classic "Eureka!" moment, or as an example of cooperative teamwork between disparate researchers. Following is a list of events before, during, and after its development:
Recombinant DNA (rDNA), or molecular cloning, is the process by which a single gene, or segment of DNA, is isolated and amplified. Recombinant DNA is also known as in vitro recombination. A cloning vector is a DNA molecule that carries foreign DNA into a host cell, where it replicates, producing many copies of itself along with the foreign DNA. There are many types of cloning vectors, such as plasmids and phages. To recombine the vector and the foreign DNA, both are first cut by digestion, and the foreign DNA is then ligated into the vector with the enzyme DNA ligase. The resulting DNA is introduced into bacterial cells by transformation.
Xeno nucleic acids (XNA) are synthetic nucleic acid analogues that have a different backbone from the ribose and deoxyribose found in the nucleic acids of naturally occurring RNA and DNA.
This glossary of cellular and molecular biology is a list of definitions of terms and concepts commonly used in the study of cell biology, molecular biology, and related disciplines, including molecular genetics, biochemistry, and microbiology. It is split across two articles: