Optical mapping [1] is a technique for constructing ordered, genome-wide, high-resolution restriction maps, called "optical maps", from single, stained molecules of DNA. Mapping the locations of restriction enzyme sites along the unknown DNA of an organism yields a spectrum of DNA fragments that collectively serves as a unique "fingerprint" or "barcode" for that sequence. Originally developed by Dr. David C. Schwartz and his lab at NYU in the 1990s, [2] this method has since been integral to the assembly process of many large-scale sequencing projects for both microbial and eukaryotic genomes. Later technologies use DNA melting, [3] competitive DNA binding [4] or enzymatic labelling [5] [6] to create optical maps.
The modern optical mapping platform works as follows: [7]
In the earliest system, DNA molecules were fixed in molten agarose between a cover slip and a microscope slide. The restriction enzyme was pre-mixed with the molten agarose before DNA placement, and cleavage was triggered by the addition of magnesium.
Rather than being immobilized within a gel matrix, DNA molecules were held in place by electrostatic interactions on a positively charged surface. Resolution improved such that fragments from ~30 kb down to as small as 800 bp could be sized.
Automating the workflow involved the development and integration of an automated spotting system to spot multiple single molecules on a slide (like a microarray) for parallel enzymatic processing, automated fluorescence microscopy for image acquisition, image-processing (machine vision) software to handle images, algorithms for optical map construction, and cluster computing for processing large amounts of data.
Because microarrays spotted with single molecules did not work well for large genomic DNA molecules, microfluidic devices fabricated by soft lithography, each with a series of parallel microchannels, were developed.
An improvement on optical mapping, called "Nanocoding", [8] has potential to boost throughput by trapping elongated DNA molecules in nanoconfinements.
The advantage of optical mapping (OM) over traditional mapping techniques is that it preserves the order of the DNA fragments, whereas in conventional restriction mapping the order must be reconstructed. In addition, since maps are constructed directly from genomic DNA molecules, cloning or PCR artifacts are avoided. However, each OM process is still affected by false positive and false negative sites, because not all restriction sites are cleaved in each molecule and some sites may be incorrectly cut. In practice, multiple optical maps are created from molecules of the same genomic region, and an algorithm is used to determine the best consensus map. [9]
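The consensus step can be illustrated with a minimal sketch. It is not the published algorithm: it assumes the single-molecule maps have already been aligned to a common coordinate system, and the function name `consensus_map` and the parameters `tolerance` and `min_support` are hypothetical choices made only for illustration. Cut sites reported by several molecules at nearly the same position are merged, and sites seen in too few molecules are discarded as probable false cuts.

```python
def consensus_map(molecule_maps, tolerance=500, min_support=0.5):
    """Merge cut-site positions (bp) from several single-molecule maps.

    Assumption: all maps share one coordinate system (already aligned).
    Sites within `tolerance` bp of the previous pooled site are treated as
    the same site; a site is kept only if at least `min_support` of the
    molecules report it.
    """
    # Pool every observed cut, remembering which molecule reported it.
    pooled = sorted(
        (pos, idx) for idx, sites in enumerate(molecule_maps) for pos in sites
    )

    # Group neighbouring positions into clusters (single-linkage).
    clusters = []
    for pos, idx in pooled:
        if clusters and pos - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append((pos, idx))
        else:
            clusters.append([(pos, idx)])

    # Keep clusters supported by enough distinct molecules.
    consensus = []
    for cluster in clusters:
        supporters = {idx for _, idx in cluster}
        if len(supporters) / len(molecule_maps) >= min_support:
            mean_pos = sum(pos for pos, _ in cluster) / len(cluster)
            consensus.append(round(mean_pos))
    return consensus


maps = [
    [10000, 25000, 40000],
    [10200, 40100],                # missed cut near 25 kb (false negative)
    [9900, 25100, 33000, 39900],   # spurious cut at 33 kb (false positive)
]
print(consensus_map(maps))  # [10033, 25050, 40000]
```

In this toy input the missed cut is recovered because two of the three molecules still report it, while the spurious cut is dropped because only one molecule supports it.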
There are a variety of approaches to identifying large-scale genomic variations (such as indels, duplications, inversions, translocations) between genomes. Other categories of methods include using microarrays, pulsed-field gel electrophoresis, cytogenetics and paired-end tags.
Initially, the optical mapping system was used to construct whole-genome restriction maps of bacteria, parasites, and fungi. [10] [11] [12] It has also been used to scaffold and validate bacterial genomes. [13] To serve as scaffolds for assembly, assembled sequence contigs can be scanned for restriction sites in silico using the known sequence data, and the resulting in silico maps aligned to the genomic optical map. The commercial company OpGen has provided optical maps for microbial genomes. For larger eukaryotic genomes, only the David C. Schwartz lab (now at the University of Wisconsin–Madison) has produced optical maps, for mouse, [14] human, [15] rice, [16] and maize. [17]
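As a rough illustration of the in silico scanning step, the following sketch digests a contig with a single restriction enzyme and reports the ordered fragment lengths that would be compared against an optical map. The function name `in_silico_digest`, the `cut_offset` parameter and the toy contig are assumptions made for this example; it handles only a linear sequence and a fixed, unambiguous recognition site, and it does not perform the alignment itself.

```python
import re


def in_silico_digest(sequence, site, cut_offset=0):
    """Return (cut positions, ordered fragment lengths) for a linear sequence.

    `site` is the recognition sequence and `cut_offset` is the number of
    bases from the start of the site to the cut (1 for EcoRI, G^AATTC).
    """
    sequence = sequence.upper()
    site = site.upper()
    # A lookahead pattern finds overlapping occurrences of the site as well.
    pattern = f"(?={re.escape(site)})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, sequence)]
    # Fragment lengths are the gaps between successive boundaries.
    bounds = [0] + cuts + [len(sequence)]
    fragments = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return cuts, fragments


# Toy contig with two EcoRI sites (GAATTC).
contig = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
cuts, fragments = in_silico_digest(contig, "GAATTC", cut_offset=1)
print(cuts)       # [5, 19]
print(fragments)  # [5, 14, 7]
```

The ordered list of fragment lengths is the in silico counterpart of the "barcode" measured on a real molecule, which is why the two can be aligned.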
Optical sequencing is a single-molecule DNA sequencing technique that follows the sequencing-by-synthesis approach and uses optical mapping technology. [18] [19] Similar to other single-molecule sequencing approaches such as SMRT sequencing, this technique analyzes a single DNA molecule rather than amplifying the initial sample and sequencing multiple copies of the DNA. During synthesis, fluorochrome-labeled nucleotides are incorporated by DNA polymerases and tracked by fluorescence microscopy. This technique was originally proposed by David C. Schwartz and Arvind Ramanathan in 2003.
The following is an overview of each cycle in the optical sequencing process. [20]
Step 1: DNA barcoding
Cells are lysed to release genomic DNA. These DNA molecules are untangled, placed onto an optical mapping surface containing microfluidic channels, and allowed to flow through the channels. The molecules are then barcoded by restriction enzymes to allow for genomic localization through the technique of optical mapping. See the above section on "Technology" for those steps.
Step 2: Template nicking
DNase I is added to randomly nick the mounted DNA molecules. A wash is then performed to remove the DNase I. The mean number of nicks that occur per template is dependent on the concentration of DNase I as well as the incubation time.
Step 3: Gap formation
T7 exonuclease is added; it uses the nicks in the DNA molecules as starting points and expands them into gaps in a 5'–3' direction. The amount of T7 exonuclease must be carefully controlled to avoid overly high levels of double-stranded breaks.
Step 4: Fluorochrome incorporation
DNA polymerase is used to incorporate fluorochrome-labelled nucleotides (FdNTPs) into the multiple gapped sites along each DNA molecule. During each cycle, the reaction mixture contains a single type of FdNTP and allows for multiple additions of that nucleotide type. Various washes are then performed to remove unincorporated FdNTPs in preparation for imaging and the next cycle of FdNTP addition.
Step 5: Imaging
This step counts the number of incorporated fluorochrome-labeled nucleotides at the gap regions using fluorescence microscopy.
Step 6: Photobleaching
The laser illumination that is used to excite the fluorochrome is also used here to destroy the fluorochrome signal. This essentially resets the fluorochrome counter and prepares it for the next cycle. This step is a unique aspect of optical sequencing, as the fluorochrome label is not actually removed from the nucleotide after its incorporation. Not removing the label makes sequencing more economical, but it means that fluorochrome-labeled nucleotides must be incorporated consecutively, which can cause problems because of the bulkiness of the labels.
Step 7: Repeat steps 4–6
Steps 4–6 are repeated, with step 4 using a reaction mixture that contains a different fluorochrome-labeled nucleotide (FdNTP) each time. This is repeated until the desired region is sequenced.
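The cycle logic of steps 4–7 can be sketched as a small simulation. This is an idealized illustration, not the published protocol: it assumes perfect incorporation, error-free counting of fluorochromes, complete photobleaching between cycles, and a fixed nucleotide order, and the names `simulate_cycles`, `decode` and `cycle_order` are hypothetical. Here `target` stands for the strand being synthesized at one gap, that is, the complement of the exposed template.

```python
def simulate_cycles(target, cycle_order="ACGT", max_cycles=40):
    """Simulate single-nucleotide cycles at one gapped site.

    Each cycle offers one nucleotide type (step 4); the polymerase adds as
    many consecutive copies as the target allows, and imaging records that
    count (step 5). Photobleaching (step 6) is modelled implicitly: every
    count starts again from zero in the next cycle.
    """
    position = 0
    counts = []
    for cycle in range(max_cycles):
        if position >= len(target):
            break  # the desired region has been fully synthesized
        base = cycle_order[cycle % len(cycle_order)]
        run = 0
        while position < len(target) and target[position] == base:
            run += 1
            position += 1
        counts.append((base, run))
    return counts


def decode(counts):
    """Rebuild the synthesized sequence from the per-cycle counts."""
    return "".join(base * run for base, run in counts)


counts = simulate_cycles("AACGGGT")
print(counts)          # [('A', 2), ('C', 1), ('G', 3), ('T', 1)]
print(decode(counts))  # AACGGGT
```

A cycle in which the offered nucleotide does not match the next base simply records a count of zero, which is why the order of nucleotides offered affects how many cycles a region needs.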
Selecting an appropriate DNA polymerase is critical to the efficiency of the base addition step, and the polymerase must meet several criteria:
In addition, the polymerase's preference for different fluorochromes, the linker length on fluorochrome-nucleotides, and buffer composition are important factors to consider when optimizing the base addition process and maximizing the number of consecutive FdNTP incorporations.
Single-molecule analysis
Since only a minimal DNA sample is required, the time-consuming and costly amplification step is avoided, which streamlines the sample preparation process.
Large DNA molecule templates (~500 kb) vs. short DNA molecule templates (< 1 kb)
While most next-generation sequencing technologies aim for massive numbers of small sequence reads, these small reads make de novo sequencing and the resolution of repetitive genome regions difficult. Optical sequencing uses large DNA molecule templates (~500 kb) for sequencing, and these offer several advantages over small templates: