Synthetic genomes

A synthetic genome is a synthetically built genome whose formation involves either genetic modification of pre-existing life forms or artificial gene synthesis to create new DNA or entire lifeforms. [1] [2] [3] The field that studies synthetic genomes is called synthetic genomics.

Recombinant DNA technology

Soon after the discovery of restriction endonucleases and ligases, the field of genetics began using these molecular tools to assemble artificial sequences from smaller fragments of synthetic or naturally occurring DNA. The advantage of the recombinatory approach over continuous DNA synthesis stems from the inverse relationship between the length of a synthetic DNA molecule and the percent purity of molecules of that length. In other words, as longer sequences are synthesized, the proportion of error-containing clones increases because of the inherent error rates of current technologies. [4] Although recombinant DNA technology is more commonly used in the construction of fusion proteins and plasmids, several techniques with larger capacities have emerged, allowing for the construction of entire genomes. [5]
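The relationship between length and purity can be illustrated with a very simple model. The sketch below assumes that every base is incorporated incorrectly with the same independent probability, so the expected fraction of error-free molecules of length n is (1 - e)^n. The function name and the independence assumption are illustrative and are not taken from the cited source.

```python
def error_free_fraction(length_nt: int, per_base_error: float) -> float:
    """Expected fraction of molecules with no errors.

    Assumes (hypothetically) that each base is wrong with the same
    independent probability `per_base_error`.
    """
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    # Every one of the length_nt positions must be correct.
    return (1.0 - per_base_error) ** length_nt
```

For example, comparing error_free_fraction(60, 0.002) with error_free_fraction(5000, 0.002) shows how quickly the error-free fraction falls as length grows, which is why short synthetic fragments are assembled rather than synthesizing one long molecule directly. The error rate of 0.002 is an arbitrary illustrative value.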

Polymerase cycling assembly

Polymerase Cycling Assembly. Blue arrows represent oligonucleotides 40 to 60 bp with overlapping regions of about 20 bp. The cycle is repeated until the final genome is constructed.

Polymerase cycling assembly (PCA) uses a series of oligonucleotides (or oligos), approximately 40 to 60 nucleotides long, that altogether constitute both strands of the DNA being synthesized. These oligos are designed such that a single oligo from one strand contains a length of approximately 20 nucleotides at each end that is complementary to sequences of two different oligos on the opposite strand, thereby creating regions of overlap. The entire set is processed through cycles of: (a) hybridization at 60 °C; (b) elongation via Taq polymerase and a standard ligase; and (c) denaturation at 95 °C, forming progressively longer contiguous strands and ultimately resulting in the final genome. [6] PCA was used to generate the first synthetic genome in history, that of the Phi X 174 virus. [7]
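The oligo layout described above can be sketched in code. This is a minimal illustration, not the protocol from the cited papers: it tiles a target sequence with windows of a fixed length, places consecutive windows on alternating strands, and lets neighbours share a fixed overlap. The function names, the default lengths (60 nt oligos, 20 nt overlaps), and the choice to let the last oligo be shorter are all assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_pca_oligos(target: str, oligo_len: int = 60, overlap: int = 20):
    """Tile `target` (top strand, 5'->3') with alternating-strand oligos.

    Returns a list of (start, strand, sequence) tuples, where `start` is the
    0-based position of the window on the top strand and `sequence` is
    written 5'->3' for the strand the oligo belongs to.
    """
    target = target.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("target must be a non-empty A/C/G/T string")
    if overlap < 1 or 2 * overlap > oligo_len:
        raise ValueError("need 1 <= overlap <= oligo_len / 2")

    step = oligo_len - overlap  # distance between consecutive window starts
    oligos = []
    start, index = 0, 0
    while True:
        end = min(start + oligo_len, len(target))
        window = target[start:end]
        # Even-numbered oligos lie on the top strand, odd-numbered ones on
        # the bottom strand, so each overlap pairs two opposite-strand oligos.
        if index % 2 == 0:
            oligos.append((start, "+", window))
        else:
            oligos.append((start, "-", reverse_complement(window)))
        if end == len(target):
            break
        start += step
        index += 1
    return oligos
```

With 60 nt oligos and 20 nt overlaps, each interior oligo leaves a 20 nt single-stranded gap between its two partners, which the polymerase fills during elongation; with 40 nt oligos and 20 nt overlaps the two strands are covered completely.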

Gibson assembly method

Gibson Assembly Method. The blue arrows represent DNA cassettes, which could be any size, 6 kb each for example. The orange segments represent areas of identical DNA sequences. This process can be carried out with multiple initial cassettes.

The Gibson assembly method, designed by Daniel Gibson during his time at the J. Craig Venter Institute, requires a set of double-stranded DNA cassettes that constitute the entire genome being synthesized. Note that cassettes differ from contigs by definition, in that these sequences contain regions of homology to other cassettes for the purposes of recombination. In contrast to Polymerase Cycling Assembly, Gibson Assembly is a single-step, isothermal reaction with a larger sequence-length capacity; it is therefore used in place of Polymerase Cycling Assembly for genomes larger than 6 kb.

A T5 exonuclease performs a chew-back reaction at the terminal segments, working in the 5' to 3' direction, thereby producing complementary overhangs. The overhangs hybridize to each other, a Phusion DNA polymerase fills in any missing nucleotides, and the nicks are sealed with a ligase. However, the size of the genomes that can be synthesized using this method alone is limited because, as DNA cassettes increase in length, they require propagation in vitro in order to continue hybridizing; accordingly, Gibson assembly is often used in conjunction with Transformation-Associated Recombination (see below) to synthesize genomes several hundred kilobases in size. [8]
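The outcome of joining cassettes through their shared terminal sequences can be modelled in silico. The sketch below does not simulate the exonuclease, polymerase, or ligase steps; it only merges an ordered list of cassettes wherever the end of one is identical to the start of the next. The function names and the 20 nt default minimum overlap are assumptions chosen for illustration.

```python
def terminal_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 when no shared terminal sequence of at least `min_overlap`
    bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble_cassettes(cassettes, min_overlap: int = 20) -> str:
    """Merge ordered cassettes (top strand, 5'->3') through shared ends."""
    if not cassettes:
        raise ValueError("at least one cassette is required")
    product = cassettes[0].upper()
    for cassette in cassettes[1:]:
        cassette = cassette.upper()
        k = terminal_overlap(product, cassette, min_overlap)
        if k == 0:
            raise ValueError("adjacent cassettes share no terminal homology")
        # Keep the shared region once and append the rest of the cassette.
        product += cassette[k:]
    return product
```

Because the function takes the longest matching end, repetitive sequences near cassette boundaries could be merged incorrectly; in practice overlaps are designed to be unique, and the same caveat applies to this toy model.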

Transformation-associated recombination

Gap Repair Cloning. The blue arrows represent DNA contigs. Segments of the same colour represent complementary or identical sequences. Specialized primers with extensions are used in a polymerase chain reaction to generate regions of homology at the terminal ends of the DNA contigs.

The goal of transformation-associated recombination (TAR) technology in synthetic genomics is to combine DNA contigs by means of homologous recombination performed by the Yeast Artificial Chromosome (YAC). Of importance is the CEN element within the YAC vector, which corresponds to the yeast centromere. This sequence gives the vector the ability to behave in a chromosomal manner, thereby allowing it to perform homologous recombination. [9]

Transformation-Associated Recombination. Cross over events occur between regions of homology across the cassettes and YAC vector, thereby connecting the smaller DNA sequences into one larger contig.

First, gap repair cloning is performed to generate regions of homology flanking the DNA contigs. Gap Repair Cloning is a particular form of the Polymerase Chain Reaction in which specialized primers with extensions beyond the sequence of the DNA target are utilized. [10] Then, the DNA cassettes are exposed to the YAC vector, which drives the process of homologous recombination, thereby connecting the DNA cassettes. Polymerase Cycling Assembly and TAR technology were used together to construct the roughly 600 kb Mycoplasma genitalium genome in 2008, the first chemically synthesized bacterial genome. [11] Similar steps were taken in synthesizing the larger Mycoplasma mycoides genome a few years later. [12]
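The idea of primers with extensions beyond the target can be sketched as follows. Each primer consists of a tail that matches the neighbouring sequence (a vector arm or an adjacent contig) followed by a region that anneals to the contig itself, so the PCR product ends up flanked by the desired homology. The function names and the 20 nt annealing length are illustrative assumptions, not values from the cited source.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_extension_primers(contig: str, upstream_homology: str,
                             downstream_homology: str, anneal_len: int = 20):
    """Return (forward, reverse) primers, both written 5'->3'.

    All inputs are top-strand sequences. The homology arguments are the
    sequences that should flank the contig after amplification.
    """
    contig = contig.upper()
    upstream_homology = upstream_homology.upper()
    downstream_homology = downstream_homology.upper()
    if anneal_len < 1 or len(contig) < anneal_len:
        raise ValueError("contig is shorter than the annealing region")
    # Forward primer: upstream tail, then the start of the contig.
    forward = upstream_homology + contig[:anneal_len]
    # Reverse primer: reverse complement of the contig end plus downstream tail.
    reverse = reverse_complement(contig[-anneal_len:] + downstream_homology)
    return forward, reverse


def expected_amplicon(contig: str, upstream_homology: str,
                      downstream_homology: str) -> str:
    """Top strand of the PCR product: the contig flanked by both tails."""
    return (upstream_homology + contig + downstream_homology).upper()
```

The flanked products can then recombine with any sequence that carries the same tails, which is the property the recombination step relies on.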

General creation of synthetic genomes

It is difficult to directly synthesize oligonucleotides larger than ~200 base pairs and maintain high fidelity. [13] Therefore, smaller oligonucleotides (around 5-20 base pairs) are combined to create genome-size sequences. Earlier methods of stitching the smaller strands together used T4 polynucleotide ligase. Modern techniques, such as PCA- and PCR-based methods, have improved on this approach, increasing speed and fidelity. To further increase fidelity, PCA-based methods can include an error-reversal step in which nucleases recognize and cut mismatched base pairs. [14] Recognition is possible because errors usually cause structural bulges and abnormalities in the DNA. [15] A 4-Mb E. coli genome reported in May 2019 holds the record for the largest synthetic genome. [16]

References

  1. Yong, Ed. "The Mysterious Thing About a Marvelous New Synthetic Cell". The Atlantic. Retrieved 2017-09-12.
  2. "Here's what we could really learn from a synthetic human genome". STAT. 2016-06-02. Retrieved 2017-09-12.
  3. "The synthetic human genome could be around the corner - ExtremeTech". ExtremeTech. 2016-05-19. Retrieved 2017-09-12.
  4. Montague, Michael G; Lartigue, Carole; Vashee, Sanjay (2012). "Synthetic genomics: potential and limitations". Current Opinion in Biotechnology. 23 (5): 659–665. doi:10.1016/j.copbio.2012.01.014. PMID 22342755.
  5. Gibson, Daniel (2011). Synthetic Biology, Part B: Computer Aided Design and DNA Assembly; Chapter Fifteen - Enzymatic Assembly of Overlapping DNA Fragments. Academic Press. pp. 349–361. ISBN 978-0-12-385120-8.
  6. Stemmer, Willem P. C.; Crameri, Andreas; Ha, Kim D.; Brennan, Thomas M.; Heyneker, Herbert L. (1995-10-16). "Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides". Gene. 164 (1): 49–53. doi:10.1016/0378-1119(95)00511-4. PMID 7590320.
  7. Smith, Hamilton O.; Hutchison, Clyde A.; Pfannkoch, Cynthia; Venter, J. Craig (2003-12-23). "Generating a synthetic genome by whole genome assembly: φX174 bacteriophage from synthetic oligonucleotides". Proceedings of the National Academy of Sciences. 100 (26): 15440–15445. Bibcode:2003PNAS..10015440S. doi:10.1073/pnas.2237126100. ISSN 0027-8424. PMC 307586. PMID 14657399.
  8. Gibson, Daniel G; Young, Lei; Chuang, Ray-Yuan; Venter, J Craig; Hutchison, Clyde A; Smith, Hamilton O (2009-04-12). "Enzymatic assembly of DNA molecules up to several hundred kilobases". Nature Methods. 6 (5): 343–345. doi:10.1038/nmeth.1318. PMID 19363495. S2CID 1351008.
  9. Kouprina, Natalay; Larionov, Vladimir (2003-12-01). "Exploiting the yeast Saccharomyces cerevisiae for the study of the organization and evolution of complex genomes". FEMS Microbiology Reviews. 27 (5): 629–649. doi:10.1016/S0168-6445(03)00070-6. ISSN 1574-6976. PMID 14638416.
  10. Marsischky, Gerald; LaBaer, Joshua (2004-10-15). "Many Paths to Many Clones: A Comparative Look at High-Throughput Cloning Methods". Genome Research. 14 (10b): 2020–2028. doi:10.1101/gr.2528804. ISSN 1088-9051. PMID 15489321.
  11. Gibson, Daniel G.; Benders, Gwynedd A.; Andrews-Pfannkoch, Cynthia; Denisova, Evgeniya A.; Baden-Tillson, Holly; Zaveri, Jayshree; Stockwell, Timothy B.; Brownley, Anushka; Thomas, David W. (2008-02-29). "Complete Chemical Synthesis, Assembly, and Cloning of a Mycoplasma genitalium Genome". Science. 319 (5867): 1215–1220. Bibcode:2008Sci...319.1215G. doi:10.1126/science.1151721. ISSN 0036-8075. PMID 18218864. S2CID 8190996.
  12. Gibson, Daniel G.; Glass, John I.; Lartigue, Carole; Noskov, Vladimir N.; Chuang, Ray-Yuan; Algire, Mikkel A.; Benders, Gwynedd A.; Montague, Michael G.; Ma, Li (2010-07-02). "Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome". Science. 329 (5987): 52–56. Bibcode:2010Sci...329...52G. doi:10.1126/science.1190719. ISSN 0036-8075. PMID 20488990.
  13. Matsudaira, Paul (1989), Wittmann-Liebold, Brigitte (ed.), "Initial and Repetitive Yields from Proteins Blotted on PVDF Membranes", Methods in Protein Sequence Analysis: Proceedings of the 7th International Conference, Berlin, July 3–8, 1988, Berlin, Heidelberg: Springer, pp. 234–239, doi:10.1007/978-3-642-73834-0_30, ISBN 978-3-642-73834-0, retrieved 2021-04-21
  14. Zhang, Weimin; Mitchell, Leslie A.; Bader, Joel S.; Boeke, Jef D. (2020-06-20). "Synthetic Genomes". Annual Review of Biochemistry. 89 (1): 77–101. doi:10.1146/annurev-biochem-013118-110704. ISSN 0066-4154. PMID 32569517. S2CID 219986041.
  15. Davis, Leonard G.; Dibner, Mark D.; Battey, James F. (1986-01-01). "Restriction Endonucleases (REs) and Their Use". Basic Methods in Molecular Biology. Elsevier. pp. 51–57. doi:10.1016/B978-0-444-01082-7.50021-7. ISBN 978-0-444-01082-7.
  16. Fredens, Julius; Wang, Kaihang; de la Torre, Daniel; Funke, Louise F. H.; Robertson, Wesley E.; Christova, Yonka; Chia, Tiongsun; Schmied, Wolfgang H.; Dunkelmann, Daniel L.; Beránek, Václav; Uttamapinant, Chayasith (May 2019). "Total synthesis of Escherichia coli with a recoded genome". Nature. 569 (7757): 514–518. Bibcode:2019Natur.569..514F. doi:10.1038/s41586-019-1192-5. ISSN 1476-4687. PMC 7039709. PMID 31092918.