DNA sequencers | |
---|---|
Manufacturers | Roche, Illumina, Life Technologies, Beckman Coulter, Pacific Biosciences, MGI/BGI, Oxford Nanopore Technologies |
A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This is then reported as a text string, called a read. Some DNA sequencers can also be considered optical instruments, as they analyze light signals originating from fluorochromes attached to nucleotides.
The first automated DNA sequencer, invented by Lloyd M. Smith, was introduced by Applied Biosystems in 1987. [1] It used the Sanger sequencing method, a technology which formed the basis of the "first generation" of DNA sequencers [2] [3] and enabled the completion of the Human Genome Project in 2001. [4] These first-generation DNA sequencers are essentially automated electrophoresis systems that detect the migration of labelled DNA fragments. They can therefore also be used for genotyping genetic markers where only the length of a DNA fragment needs to be determined (e.g. microsatellites, AFLPs).
The Human Genome Project spurred the development of cheaper, higher-throughput and more accurate platforms, known as next-generation sequencers (NGS), to sequence the human genome. These include the 454, SOLiD and Illumina DNA sequencing platforms. Next-generation sequencing machines have increased the rate of DNA sequencing substantially compared with the previous Sanger methods. DNA samples can be prepared automatically in as little as 90 minutes, [5] while a human genome can be sequenced at 15-fold coverage in a matter of days. [6]
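The coverage figure quoted above can be related to read count and read length with simple arithmetic. The following is a minimal sketch, assuming the usual definition of mean coverage as total sequenced bases divided by genome length; the function names and the genome size of 3.1 billion bases are illustrative assumptions, not values taken from the cited sources.

```python
import math


def mean_coverage(read_count: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return read_count * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads whose combined length reaches the target depth."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    total_bases = target_coverage * genome_size_bp
    return math.ceil(total_bases / read_length_bp)


if __name__ == "__main__":
    # Assumed genome size (roughly human-sized) and an assumed 100 bp read length.
    genome = 3_100_000_000
    print(reads_needed(15, 100, genome))  # 465000000
```

Mean coverage says nothing about how evenly reads are distributed, so real projects typically plan for more reads than this lower bound suggests.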
More recent, third-generation DNA sequencers such as PacBio SMRT and Oxford Nanopore can sequence much longer molecules than short-read technologies such as Illumina SBS or MGI Tech's DNBSEQ.
Because of limitations in DNA sequencer technology, the reads produced by many of these technologies are short compared to the length of a genome, so the reads must be assembled into longer contigs. [7] The data may also contain errors, caused by limitations in the DNA sequencing technique or by errors during PCR amplification. DNA sequencer manufacturers use a number of different methods to detect which DNA bases are present, and the specific protocols applied in different sequencing platforms affect the final data that is generated. Comparing data quality and cost across technologies can therefore be a daunting task. Each manufacturer provides its own way of reporting sequencing errors and quality scores, and errors and scores from different platforms cannot always be compared directly. Since these systems rely on different DNA sequencing approaches, choosing the best DNA sequencer and method will typically depend on the experiment objectives and available budget. [2]
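The text above does not say which scoring schemes the manufacturers use. One widely used convention for expressing a per-base error probability is the Phred scale, Q = -10 log10(P). The sketch below assumes that convention purely for illustration; the function names are hypothetical, and a score computed this way on one platform is not necessarily calibrated the same way as a score from another.

```python
import math


def phred_to_error_prob(quality: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-quality / 10)


def error_prob_to_phred(error_prob: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred-scaled score."""
    if not 0 < error_prob <= 1:
        raise ValueError("error_prob must be in the interval (0, 1]")
    return -10 * math.log10(error_prob)


if __name__ == "__main__":
    # A per-base accuracy of 99.9% corresponds to an error probability of 0.001.
    print(round(error_prob_to_phred(0.001), 6))  # 30.0
```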
The first DNA sequencing methods were developed by Gilbert (1973) [8] and Sanger (1975). [9] Gilbert introduced a sequencing method based on chemical modification of DNA followed by cleavage at specific bases, whereas Sanger's technique is based on dideoxynucleotide chain termination. The Sanger method became popular due to its increased efficiency and low radioactivity. The first automated DNA sequencer was the AB370A, introduced in 1986 by Applied Biosystems. The AB370A could sequence 96 samples simultaneously, produce 500 kilobases per day, and reach read lengths of up to 600 bases. This was the beginning of the "first generation" of DNA sequencers, [2] [3] which implemented Sanger sequencing, fluorescent dideoxy nucleotides and slab gels (polyacrylamide gel sandwiched between glass plates). The next major advance was the release in 1995 of the AB310, which used a linear polymer in a capillary in place of the slab gel for DNA strand separation by electrophoresis. These techniques formed the basis for the completion of the Human Genome Project in 2001. [4]
The Human Genome Project spurred the development of cheaper, higher-throughput and more accurate platforms known as next-generation sequencers (NGS). In 2005, 454 Life Sciences released the 454 sequencer, followed in 2006 by the Solexa Genome Analyzer and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) by Agencourt. Applied Biosystems acquired Agencourt in 2006, and in 2007 Roche bought 454 Life Sciences, while Illumina purchased Solexa. Ion Torrent entered the market in 2010 and was acquired by Life Technologies (now Thermo Fisher Scientific). BGI started manufacturing sequencers in China, under its MGI arm, after acquiring Complete Genomics. These are still the most common NGS systems due to their competitive cost, accuracy, and performance.
More recently, a third generation of DNA sequencers was introduced. The sequencing methods applied by these sequencers do not require DNA amplification (polymerase chain reaction, PCR), which speeds up sample preparation before sequencing and reduces errors. In addition, sequencing data is collected in real time from the reactions caused by the addition of nucleotides to the complementary strand. Two companies introduced different approaches in their third-generation sequencers. Pacific Biosciences sequencers use a method called single-molecule real-time (SMRT) sequencing, in which sequencing data is produced from light, captured by a camera, that is emitted when a polymerase adds a fluorescently labelled nucleotide to the complementary strand. Oxford Nanopore Technologies develops third-generation sequencers that use electronic systems based on nanopore sensing technologies.
DNA sequencers have been developed, manufactured, and sold by the following companies, among others.
The 454 DNA sequencer was the first next-generation sequencer to become commercially successful. [10] It was developed by 454 Life Sciences, which was purchased by Roche in 2007. The 454 system detects the pyrophosphate released by the DNA polymerase reaction when a nucleotide is added to the template strand.
Roche manufactured two systems based on its pyrosequencing technology: the GS FLX+ and the GS Junior System. [11] The GS FLX+ System promised read lengths of approximately 1000 base pairs, while the GS Junior System promised 400 base pair reads. [12] [13] A predecessor to the GS FLX+, the 454 GS FLX Titanium system, was released in 2008, achieving an output of 0.7 Gb of data per run, 99.9% accuracy after quality filtering, and a read length of up to 700 bp. In 2009, Roche launched the GS Junior, a benchtop version of the 454 sequencer with read lengths of up to 400 bp and simplified library preparation and data processing.
One of the advantages of 454 systems is their running speed. Manpower can be reduced with automation of library preparation and semi-automation of emulsion PCR. A disadvantage of the 454 system is that it is prone to errors when estimating the number of bases in a long string of identical nucleotides. This is referred to as a homopolymer error and occurs when there are 6 or more identical bases in a row. [14] Another disadvantage is that the reagents are relatively expensive compared with those of other next-generation sequencers.
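As an illustration of the homopolymer issue, the following minimal sketch scans a read for runs of identical bases at or above the length of 6 mentioned above, which are the regions where a 454 base count would be least reliable. The function name and the example read are hypothetical; this is a way of flagging risky regions, not a model of how the 454 software works.

```python
from itertools import groupby


def homopolymer_runs(read: str, min_length: int = 6) -> list:
    """Return (start, base, length) for every run of identical bases
    that is at least min_length long. Positions are 0-based."""
    runs = []
    position = 0
    for base, group in groupby(read):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


if __name__ == "__main__":
    # Hypothetical read: a 7-base T run (flagged) and a 5-base A run (not flagged).
    read = "ACG" + "T" * 7 + "G" + "A" * 5 + "C"
    print(homopolymer_runs(read))  # [(3, 'T', 7)]
```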
In 2013 Roche announced that it would shut down development of 454 technology, which had become noncompetitive, and phase out 454 machines completely in 2016. [15] [16]
Roche produces a number of software tools which are optimised for the analysis of 454 sequencing data. [17]
Illumina produces a number of next-generation sequencing machines using technology acquired from Manteia Predictive Medicine and developed by Solexa. [19] These include the HiSeq, Genome Analyzer IIx, MiSeq and the HiScanSQ, which can also process microarrays. [20]
The technology leading to these DNA sequencers was first released by Solexa in 2006 as the Genome Analyzer. [10] Illumina purchased Solexa in 2007. The Genome Analyzer uses a sequencing-by-synthesis method. The first model produced 1 Gb per run. During 2009 the output was increased from 20 Gb per run in August to 50 Gb per run in December. In 2010 Illumina released the HiSeq 2000 with an output of 200 Gb and later 600 Gb per run, with a run taking 8 days. At its release the HiSeq 2000 provided one of the cheapest sequencing platforms, at $0.02 per million bases as costed by the Beijing Genomics Institute.
In 2011 Illumina released a benchtop sequencer called the MiSeq. At its release the MiSeq could generate 1.5 Gb per run with paired-end 150 bp reads. A sequencing run can be performed in 10 hours when using automated DNA sample preparation. [10]
The Illumina HiSeq uses two software tools to calculate the number and position of DNA clusters to assess the sequencing quality: the HiSeq control system and the real-time analyzer. These methods help to assess if nearby clusters are interfering with each other. [10]
Life Technologies (now Thermo Fisher Scientific) produces DNA sequencers under the Applied Biosystems and Ion Torrent brands. Applied Biosystems makes the SOLiD next-generation sequencing platform, [21] and Sanger-based DNA sequencers such as the 3500 Genetic Analyzer. [22] Under the Ion Torrent brand, the company produces four next-generation sequencers: the Ion PGM System, Ion Proton System, Ion S5 and Ion S5xl systems. [23] The company is also believed to be developing a new capillary DNA sequencer called SeqStudio, to be released in early 2018. [24]
The SOLiD system was acquired by Applied Biosystems in 2006. SOLiD applies sequencing by ligation and dual base encoding. The first SOLiD system was launched in 2007, generating read lengths of 35 bp and 3 Gb of data per run. After five upgrades, the 5500xl sequencing system was released in 2010, considerably increasing read length to 85 bp, improving accuracy up to 99.99% and producing 30 Gb per 7-day run. [10]
The limited read length of the SOLiD has remained a significant shortcoming [25] and has to some extent limited its use to experiments where read length is less vital, such as resequencing and transcriptome analysis and, more recently, ChIP-Seq and methylation experiments. [10] The DNA sample preparation time for SOLiD systems has become much shorter with the automation of sequencing library preparation, for example with the Tecan system. [10]
The colour space data produced by the SOLiD platform can be decoded into DNA bases for further analysis; however, software that considers the original colour space information can give more accurate results. Life Technologies has released BioScope, [26] a data analysis package for resequencing, ChIP-Seq and transcriptome analysis. It uses the MaxMapper algorithm to map the colour space reads.
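The relationship between colour space and base space can be sketched in a few lines. The example below assumes the commonly described two-base encoding, in which each colour (0 to 3) encodes the transition between two adjacent bases and decoding needs a known first base; numbering the bases A=0, C=1, G=2, T=3 makes each colour the XOR of its two bases. The function names are hypothetical, and this is not the BioScope or MaxMapper implementation.

```python
BASES = "ACGT"  # assumed numbering: A=0, C=1, G=2, T=3


def encode_colour_space(sequence: str) -> list:
    """Encode each adjacent pair of bases as one colour (0-3)."""
    codes = [BASES.index(base) for base in sequence]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode_colour_space(first_base: str, colours: list) -> str:
    """Decode colours back to bases, starting from a known first base."""
    current = BASES.index(first_base)
    decoded = [first_base]
    for colour in colours:
        current ^= colour  # each colour determines the next base from the previous one
        decoded.append(BASES[current])
    return "".join(decoded)


if __name__ == "__main__":
    colours = encode_colour_space("ATGGC")
    print(colours)                            # [3, 1, 0, 3]
    print(decode_colour_space("A", colours))  # ATGGC
    # A single miscalled colour changes every base after it when decoded naively.
    print(decode_colour_space("A", [3, 0, 0, 3]))  # ATTTA
```

The last line shows why naive decoding is fragile: one wrong colour corrupts the rest of the decoded read, which is one reason tools that work on the original colour calls can do better.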
Beckman Coulter (now Danaher) has previously manufactured chain termination and capillary electrophoresis-based DNA sequencers under the model name CEQ, including the CEQ 8000. [27] The company now produces the GeXP Genetic Analysis System, which uses dye terminator sequencing. [28] This method uses a thermocycler in much the same way as PCR to denature, anneal, and extend DNA fragments, amplifying the sequenced fragments. [29] [30]
Pacific Biosciences produces the PacBio RS and Sequel sequencing systems, which use a single-molecule real-time (SMRT) sequencing method. [31] These systems can produce read lengths of multiple thousands of base pairs. The higher raw read error rate is corrected using either circular consensus, in which the same strand is read over and over again, or optimized assembly strategies. [32] Scientists have reported 99.9999% accuracy with these strategies. [33] The Sequel system was launched in 2015 with increased capacity and a lower price. [34] [35]
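The idea behind circular consensus, reading the same strand repeatedly so that random errors are outvoted, can be illustrated with a per-position majority vote. This is a deliberately simplified sketch: it assumes the passes are already aligned and of equal length and contain only substitution errors, whereas real consensus callers must also handle insertions and deletions. The function name and example passes are hypothetical.

```python
from collections import Counter


def majority_consensus(passes: list) -> str:
    """Call the most frequent base at each position across repeated passes."""
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(single_pass) != length for single_pass in passes):
        raise ValueError("all passes must have the same length in this sketch")
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent base in this column
        for column in zip(*passes)
    )


if __name__ == "__main__":
    # Five hypothetical passes over the same molecule, four of them with one error.
    passes = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC", "ACGTAG"]
    print(majority_consensus(passes))  # ACGTAC
```

Because the errors in each pass fall at different positions, the consensus is correct even though four of the five passes are individually wrong.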
Oxford Nanopore Technologies' MinION sequencer applies nanopore sequencing technology, which is still evolving, to nucleic acid analysis. [37] The device is four inches long and gets power from a USB port. MinION decodes DNA directly as the molecule is drawn at a rate of 450 bases/second through a nanopore suspended in a membrane. [38] Changes in electric current indicate which base is present. Initially, the device was 60 to 85 percent accurate, compared with 99.9 percent in conventional machines. [39] Even inaccurate results may prove useful because the device produces long reads. [40] In early 2021, researchers from the University of British Columbia used special molecular tags to reduce the device's error rate from 5 to 15 percent to less than 0.005 percent, even when sequencing many long stretches of DNA at a time. [41] Two further products are based on the MinION. The first is the GridION, a slightly larger sequencer that processes up to five MinION flow cells at once. The second is the PromethION, which uses as many as 100,000 pores in parallel and is better suited to high-volume sequencing. [42]
MGI produces high-throughput sequencers for scientific research and clinical applications, such as the DNBSEQ-G50, DNBSEQ-G400, and DNBSEQ-T7, based on its proprietary DNBSEQ technology. [43] The technology combines DNA nanoball sequencing with combinatorial probe anchor synthesis: DNA nanoballs (DNBs) are loaded onto a patterned array chip via the fluidic system, and a sequencing primer is then added to hybridize to the adaptor region of the DNBs. DNBSEQ-T7 can generate short reads at a very large scale, up to 60 human genomes per day. [44] DNBSEQ-T7 was used to generate 150 bp paired-end reads at 30X depth in COVID-19 research aimed at identifying genetic variants that predispose patients to severe illness. [45] Using a novel technique, researchers from China National GeneBank sequenced PCR-free libraries on MGI's PCR-free DNBSEQ arrays to obtain, for the first time, true PCR-free whole genome sequencing. [46] MGISEQ-2000 was used in single-cell RNA sequencing to study the underlying pathogenesis and recovery in COVID-19 patients, as published in Nature Medicine. [47]
As of December 2019, DNA sequencing technology offerings were dominated by Illumina, followed by PacBio, MGI and Oxford Nanopore.
Sequencer | Ion Torrent PGM [5] [49] [50] | 454 GS FLX [10] | HiSeq 2000 [5] [10] | SOLiDv4 [10] | PacBio [5] [51] | Sanger 3730xl [10] | MGI DNBSEQ-G400 [52] |
---|---|---|---|---|---|---|---|
Manufacturer | Ion Torrent (Life Technologies) | 454 Life Sciences (Roche) | Illumina | Applied Biosystems (Life Technologies) | Pacific Biosciences | Applied Biosystems (Life Technologies) | MGI |
Sequencing Chemistry | Ion semiconductor sequencing | Pyrosequencing | Polymerase-based sequence-by-synthesis | Ligation-based sequencing | Phospholinked fluorescent nucleotides | Dideoxy chain termination | Polymerase-based sequence-by-synthesis |
Amplification approach | Emulsion PCR | Emulsion PCR | Bridge amplification | Emulsion PCR | Single-molecule; no amplification | PCR | DNA nanoball (DNB) generation |
Data output per run | 100–200 Mb | 0.7 Gb | 600 Gb | 120 Gb | 0.5–1.0 Gb | 1.9–84 kb | 1440 Gb / 1500–1800M reads |
Accuracy | 99% | 99.9% | 99.9% | 99.94% | 88.0% (>99.9999% CCS or HGAP) | 99.999% | 99.90% |
Time per run | 2 hours | 24 hours | 3–10 days | 7–14 days | 2–4 hours | 20 minutes–3 hours | 3–5 days |
Read length | 200–400 bp | 700 bp | 100x100 bp paired end | 50x50 bp paired end | 14,000 bp (N50) | 400–900 bp | 100/150/200 bp paired end |
Cost per run | US$350 | US$7,000 | US$6,000 (30x human genome) | US$4,000 | US$125–300 | US$4 (single read/reaction) | N/A |
Cost per Mb | US$1.00 | US$10 | US$0.07 | US$0.13 | US$0.13–0.60 | US$2400 | US$0.007 |
Cost per instrument | US$80,000 | US$500,000 | US$690,000 | US$495,000 | US$695,000 | US$95,000 | N/A |
In bioinformatics, sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence. This is needed because DNA sequencing technology might not be able to 'read' whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically, the short fragments (reads) result from shotgun sequencing of genomic DNA or gene transcripts (ESTs).
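The merging step described above can be illustrated with a toy greedy assembler that repeatedly joins the two fragments with the longest exact suffix-prefix overlap. This is a minimal sketch under strong assumptions: reads are error-free, all come from the same strand, and repeats are absent. Production assemblers typically use overlap or de Bruijn graphs and tolerate sequencing errors. The function names and example reads are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`,
    or 0 if no overlap of at least min_overlap exists."""
    max_size = min(len(left), len(right))
    for size in range(max_size, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list, min_overlap: int = 3) -> list:
    """Merge reads pairwise by best overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # remaining contigs do not overlap
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
    print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Reads that share no sufficiently long overlap are returned as separate contigs, which mirrors how gaps in coverage leave an assembly fragmented.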
DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.
Metagenomics is the study of genetic material recovered directly from environmental or clinical samples by a method called sequencing. The broad field may also be referred to as environmental genomics, ecogenomics, community genomics or microbiomics.
454 Life Sciences was a biotechnology company based in Branford, Connecticut that specialized in high-throughput DNA sequencing. It was acquired by Roche in 2007 and shut down by Roche in 2013 when its technology became noncompetitive, although production continued until mid-2016.
Illumina, Inc. is an American biotechnology company, headquartered in San Diego, California, and it serves more than 155 countries. Incorporated on April 1, 1998, Illumina develops, manufactures, and markets integrated systems for the analysis of genetic variation and biological function. The company provides a line of products and services that serves the sequencing, genotyping and gene expression, and proteomics markets.
SOLiD (Sequencing by Oligonucleotide Ligation and Detection) is a next-generation DNA sequencing technology developed by Life Technologies and has been commercially available since 2006. This next generation technology generates 10⁸ to 10⁹ small sequence reads at one time. It uses 2 base encoding to decode the raw data generated by the sequencing platform into sequence data.
Methylated DNA immunoprecipitation is a large-scale purification technique in molecular biology that is used to enrich for methylated DNA sequences. It consists of isolating methylated DNA fragments via an antibody raised against 5-methylcytosine (5mC). This technique was first described by Weber M. et al. in 2005 and has helped pave the way for viable methylome-level assessment efforts, as the purified fraction of methylated DNA can be input to high-throughput DNA detection methods such as high-resolution DNA microarrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). Nonetheless, understanding of the methylome remains rudimentary; its study is complicated by the fact that, like other epigenetic properties, patterns vary from cell-type to cell-type.
Cap analysis of gene expression (CAGE) is a gene expression technique used in molecular biology to produce a snapshot of the 5′ end of the messenger RNA population in a biological sample. The small fragments from the very beginnings of mRNAs are extracted, reverse-transcribed to cDNA, PCR amplified and sequenced. CAGE was first published by Hayashizaki, Carninci and co-workers in 2003. CAGE has been extensively used within the FANTOM research projects.
In bioinformatics, hybrid genome assembly refers to utilizing various sequencing technologies to achieve the task of assembling a genome from fragmented, sequenced DNA resulting from shotgun sequencing. Genome assembly presents one of the most challenging tasks in genome sequencing as most modern DNA sequencing technologies can only produce reads that are, on average, 25-300 base pairs in length. This is orders of magnitude smaller than the average size of a genome. This assembly is computationally difficult and has some inherent challenges, one of these challenges being that genomes often contain complex tandem repeats of sequences that can be thousands of base pairs in length. These repeats can be long enough that second generation sequencing reads are not long enough to bridge the repeat, and, as such, determining the location of each repeat in the genome can be difficult. Resolving these tandem repeats can be accomplished by utilizing long third generation sequencing reads, such as those obtained using the PacBio RS DNA sequencer. These sequences are, on average, 10,000-15,000 base pairs in length and are long enough to span most repeated regions. Using a hybrid approach to this process can increase the fidelity of assembling tandem repeats by being able to accurately place them along a linear scaffold and make the process more computationally efficient.
Massive parallel sequencing or massively parallel sequencing is any of several high-throughput approaches to DNA sequencing using the concept of massively parallel processing; it is also called next-generation sequencing (NGS) or second-generation sequencing. Some of these technologies emerged between 1993 and 1998 and have been commercially available since 2005. These technologies use miniaturized and parallelized platforms for sequencing of 1 million to 43 billion short reads per instrument run.
The $1,000 genome refers to an era of predictive and personalized medicine during which the cost of fully sequencing an individual's genome (WGS) is roughly one thousand USD. It is also the title of a book by British science writer and founding editor of Nature Genetics, Kevin Davies. By late 2015, the cost to generate a high-quality "draft" whole human genome sequence was just below $1,500.
Illumina dye sequencing is a technique used to determine the series of base pairs in DNA, also known as DNA sequencing. The reversible terminated chemistry concept was invented by Bruno Canard and Simon Sarfati at the Pasteur Institute in Paris. It was developed by Shankar Balasubramanian and David Klenerman of Cambridge University, who subsequently founded Solexa, a company later acquired by Illumina. This sequencing method is based on reversible dye-terminators that enable the identification of single nucleotides as they are washed over DNA strands. It can also be used for whole-genome and region sequencing, transcriptome analysis, metagenomics, small RNA discovery, methylation profiling, and genome-wide protein-nucleic acid interaction analysis.
In DNA sequencing, a read is an inferred sequence of base pairs corresponding to all or part of a single DNA fragment. A typical sequencing experiment involves fragmentation of the genome into millions of molecules, which are size-selected and ligated to adapters. The set of fragments is referred to as a sequencing library, which is sequenced to produce a set of reads.
Reduced representation bisulfite sequencing (RRBS) is an efficient and high-throughput technique for analyzing the genome-wide methylation profiles on a single nucleotide level. It combines restriction enzymes and bisulfite sequencing to enrich for areas of the genome with a high CpG content. Due to the high cost and depth of sequencing to analyze methylation status in the entire genome, Meissner et al. developed this technique in 2005 to reduce the amount of nucleotides required to sequence to 1% of the genome. The fragments that comprise the reduced genome still include the majority of promoters, as well as regions such as repeated sequences that are difficult to profile using conventional bisulfite sequencing approaches.
Sir David Klenerman is a British biophysical chemist and a professor of biophysical chemistry at the Department of Chemistry at the University of Cambridge and a Fellow of Christ's College, Cambridge.
Third-generation sequencing is a class of DNA sequencing methods currently under active development.
Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.
Clinical metagenomic next-generation sequencing (mNGS) is the comprehensive analysis of microbial and host genetic material in clinical samples from patients by next-generation sequencing. It uses the techniques of metagenomics to identify and characterize the genome of bacteria, fungi, parasites, and viruses without the need for a prior knowledge of a specific pathogen directly from clinical specimens. The capacity to detect all the potential pathogens in a sample makes metagenomic next generation sequencing a potent tool in the diagnosis of infectious disease especially when other more directed assays, such as PCR, fail. Its limitations include clinical utility, laboratory validity, sense and sensitivity, cost and regulatory considerations.
The Korean Genome Project (KGP) is the largest genome project in Korea; as of April 2021 it included over 10,000 human genomes sequenced in Korea.