Ion semiconductor sequencing

Ion semiconductor sequencing is a method of DNA sequencing based on the detection of hydrogen ions that are released during the polymerization of DNA. This is a method of "sequencing by synthesis", during which a complementary strand is built based on the sequence of a template strand.

An Ion Proton semiconductor sequencer

A microwell containing a template DNA strand to be sequenced is flooded with a single species of deoxyribonucleotide triphosphate (dNTP). If the introduced dNTP is complementary to the leading template nucleotide, it is incorporated into the growing complementary strand. This causes the release of a hydrogen ion that triggers an ion-sensitive field-effect transistor (ISFET) sensor, which indicates that a reaction has occurred. If homopolymer repeats are present in the template sequence, multiple dNTP molecules will be incorporated in a single cycle. This leads to a corresponding number of released hydrogen ions and a proportionally higher electronic signal.

This technology differs from other sequencing-by-synthesis technologies in that no modified nucleotides or optics are used. Ion semiconductor sequencing may also be referred to as Ion Torrent sequencing, pH-mediated sequencing, silicon sequencing, or semiconductor sequencing.

Technology development history

The technology was licensed from DNA Electronics Ltd, [1] [2] developed by Ion Torrent Systems Inc., and released in February 2010. [3] Ion Torrent has marketed its machine as a rapid, compact and economical sequencer that can be used as a benchtop machine in a large number of laboratories. [4] In November 2010, Roche's 454 Life Sciences was reported to be partnering with DNA Electronics on the development of a long-read, high-density semiconductor sequencing platform using this technology. [5]

Technology

Figure: The incorporation of deoxyribonucleotide triphosphate into a growing DNA strand causes the release of hydrogen and pyrophosphate.
Figure: The release of hydrogen ions indicates whether zero, one or more nucleotides were incorporated.
Figure: Released hydrogen ions are detected by an ion sensor. Multiple incorporations lead to a corresponding number of released hydrogen ions and a corresponding signal intensity.

Sequencing chemistry

In nature, the incorporation of a deoxyribonucleoside triphosphate (dNTP) into a growing DNA strand involves the formation of a covalent bond and the release of pyrophosphate and a positively charged hydrogen ion. [1] [3] [6] A dNTP will only be incorporated if it is complementary to the leading unpaired template nucleotide. Ion semiconductor sequencing exploits these facts by determining if a hydrogen ion is released upon providing a single species of dNTP to the reaction.

Each microwell on a semiconductor chip contains DNA polymerase and many copies of one single-stranded template DNA molecule to be sequenced. The microwells are sequentially flooded with unmodified A, C, G or T dNTP. [3] [7] [8] If an introduced dNTP is complementary to the next unpaired nucleotide on the template strand, it is incorporated into the growing complementary strand by the DNA polymerase. [9] If the introduced dNTP is not complementary, there is no incorporation and no biochemical reaction. The hydrogen ion that is released in the reaction changes the pH of the solution, which is detected by an ISFET. [1] [3] [7] The unattached dNTP molecules are washed out before the next cycle, when a different dNTP species is introduced. [7]
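The flow cycle described above can be sketched as a small simulation. This is an illustrative model only, not the vendor's software: the function name `simulate_flows`, the flow order "TACG" and the idealised assumption that every complementary dNTP is incorporated in full are all assumptions made for the example. The template is written in the order in which the polymerase encounters it.

```python
# Watson-Crick pairing: which dNTP is incorporated opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_flows(template, flow_order="TACG"):
    """Return a list of (dNTP, incorporations) pairs, one per flow.

    Hypothetical helper: `flow_order` is an assumed cyclic order in which
    the four dNTP species are flooded over the well.
    """
    needed = [COMPLEMENT[base] for base in template]
    missing = set(needed) - set(flow_order)
    if missing:
        raise ValueError(f"flow order never supplies: {sorted(missing)}")

    flows = []
    position = 0  # index of the next unpaired template nucleotide
    flow_index = 0
    while position < len(needed):
        dntp = flow_order[flow_index % len(flow_order)]
        incorporated = 0
        # A homopolymer run is extended completely within a single flow.
        while position < len(needed) and needed[position] == dntp:
            incorporated += 1
            position += 1
        # Zero incorporations means no hydrogen ions are released.
        flows.append((dntp, incorporated))
        flow_index += 1
    return flows


if __name__ == "__main__":
    for dntp, count in simulate_flows("ATTTCG"):
        print(dntp, count)
```

In this model the number of incorporations in each flow stands in for the number of hydrogen ions released, so a run of three identical template bases produces a count of 3 in one flow and a non-complementary flow produces 0.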

Signal detection

Beneath the layer of microwells is an ion-sensitive layer, below which is an ISFET ion sensor. [4] All layers are contained within a CMOS semiconductor chip, similar to that used in the electronics industry. [4] [10]

Each chip contains an array of microwells with corresponding ISFET detectors. [7] Hydrogen ions released in a well trigger that well's ISFET sensor. The series of electrical pulses transmitted from the chip to a computer is translated into a DNA sequence, with no intermediate signal conversion required. [7] [11] Because nucleotide incorporation events are measured directly by electronics, labeled nucleotides and optical measurements are not needed. [4] [10] Signal processing and DNA assembly can then be carried out in software.
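As a rough illustration of how per-flow signals could be turned into a sequence in software, the sketch below rounds each signal to the nearest whole number of incorporations. It assumes the signals have already been normalised so that one incorporation reads as about 1.0, and it assumes a cyclic flow order of "TACG"; the function name `call_bases` is hypothetical, and real base callers apply considerably more signal correction than this.

```python
def call_bases(signals, flow_order="TACG"):
    """Translate normalised per-flow signals into the synthesised strand.

    Assumptions for this sketch: a signal of ~1.0 corresponds to one
    incorporated nucleotide, and flows cycle through `flow_order`.
    """
    read = []
    for flow_index, signal in enumerate(signals):
        # Round to the nearest whole number of incorporations, never below 0.
        count = max(0, int(signal + 0.5))
        dntp = flow_order[flow_index % len(flow_order)]
        read.append(dntp * count)
    return "".join(read)


if __name__ == "__main__":
    # Seven flows: T, A, C, G, T, A, C
    print(call_bases([1.04, 2.91, 0.08, 0.97, 0.12, -0.05, 1.10]))
```

The returned string is the newly synthesised strand, which is complementary to the template in the well.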

Sequencing characteristics

The per-base accuracy achieved on the Ion Torrent ion semiconductor sequencer as of February 2011 was 99.6%, based on 50-base reads, with 100 Mb per run. [12] The read length as of February 2011 was 100 base pairs. [12] The accuracy for homopolymer repeats 5 bases in length was 98%. [12] Later releases show a read length of 400 base pairs. [13] These figures have not yet been independently verified outside of the company.
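To put the quoted per-base accuracy in context, the following sketch converts it into two derived quantities: the chance that a whole read contains no errors, and the equivalent Phred-scaled quality value. Both calculations assume that errors are independent and equally likely at every position, which is a simplification; the function names are hypothetical and the Phred scale is a general convention rather than something taken from the figures above.

```python
import math


def error_free_read_probability(per_base_accuracy, read_length):
    """Probability that every base in a read is correct.

    Assumes independent, identically distributed per-base errors.
    """
    return per_base_accuracy ** read_length


def phred_quality(per_base_accuracy):
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10 * math.log10(1 - per_base_accuracy)


if __name__ == "__main__":
    print(round(error_free_read_probability(0.996, 50), 3))
    print(round(phred_quality(0.996), 1))
```

Under these assumptions, 99.6% per-base accuracy over 50 bases corresponds to roughly 82% of reads being entirely error-free, and to a quality value of about Q24.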

Strengths

The major benefits of ion semiconductor sequencing are rapid sequencing speed and low upfront and operating costs. [8] [11] This has been enabled by the avoidance of modified nucleotides and optical measurements.

Because the system records natural polymerase-mediated nucleotide incorporation events, sequencing can occur in real time. In practice, however, the sequencing rate is limited by the cycling of substrate nucleotides through the system. [14] Ion Torrent Systems Inc., the developer of the technology, claims that each incorporation measurement takes 4 seconds and each run takes about one hour, during which 100-200 nucleotides are sequenced. [11] [15] If the semiconductor chips are improved (as predicted by Moore’s law), the number of reads per chip (and therefore per run) should increase. [11]

At launch, a pH-mediated sequencer was priced at around US$50,000, excluding sample preparation equipment and a server for data analysis. [8] [11] [15] The cost per run, at roughly $1,000, is also significantly lower than that of alternative automated sequencing methods. [8] [12]

Limitations

If homopolymer repeats (e.g. TTTTT) are present on the template strand (the strand to be sequenced), then multiple introduced nucleotides are incorporated and more hydrogen ions are released in a single cycle. This results in a greater pH change and a proportionally greater electronic signal. [11] This is a limitation of the system in that it is difficult to enumerate long repeats. This limitation is shared by other techniques that detect single nucleotide additions, such as pyrosequencing. [16] Signals generated from a high repeat number are difficult to differentiate from those of a similar but different number; e.g., homopolymer repeats of length 7 are difficult to differentiate from those of length 8.
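The difficulty with long repeats can be made concrete with a short calculation. If the signal is strictly proportional to the number of incorporations (an idealisation of the behaviour described above), then going from a repeat of length n to length n + 1 raises the signal by a fraction 1/n, so the relative difference shrinks as the repeat grows. The function name `relative_signal_step` is hypothetical.

```python
def relative_signal_step(n):
    """Fractional signal increase from a homopolymer of length n to n + 1.

    Assumes an idealised signal that is exactly proportional to the
    number of incorporated nucleotides.
    """
    if n < 1:
        raise ValueError("homopolymer length must be at least 1")
    return 1 / n


if __name__ == "__main__":
    for length in (1, 2, 5, 7):
        step = relative_signal_step(length)
        print(f"{length} -> {length + 1}: signal rises by {step:.0%}")
```

In this model the step from 1 to 2 incorporations doubles the signal, whereas the step from 7 to 8 changes it by only about 14%, which leaves much less room for measurement noise.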

Another limitation of this system is the short read length compared to other sequencing methods such as Sanger sequencing or pyrosequencing. Longer read lengths are beneficial for de novo genome assembly. Ion Torrent semiconductor sequencers produce an average read length of approximately 400 nucleotides per read. [3] [8]

The throughput is currently lower than that of other high-throughput sequencing technologies, although the developers hope to change this by increasing the density of the chip. [3]

Application

The developers of Ion Torrent semiconductor sequencing have marketed it as a rapid, compact and economical sequencer that can be used as a benchtop machine in a large number of laboratories. [3] [4] The company hopes that its system will take sequencing outside of specialized centers and into the reach of hospitals and smaller laboratories. [17] A January 2011 New York Times article, "Taking DNA Sequencing to the Masses", underlines these ambitions. [17]

Because alternative sequencing methods can achieve greater read lengths (and are therefore better suited to whole-genome analysis), this technology may be best suited to small-scale applications such as microbial genome sequencing, microbial transcriptome sequencing, targeted sequencing, amplicon sequencing, or quality testing of sequencing libraries. [3] [8] [18]

References

  1. Bio-IT World, Davies, K. Powering Preventative Medicine. Archived 2016-06-06 at the Wayback Machine. Bio-IT World 2011.
  2. GenomeWeb. DNA Electronics Licenses IP to Ion Torrent. August 2010.
  3. Rusk, N. (2011). "Torrents of sequence". Nat Meth 8(1): 44-44.
  4. Ion Torrent Official Webpage. Archived 2012-11-06 at the Wayback Machine.
  5. GenomeWeb. Roche Partners with DNA Electronics to Help Migrate 454 Platform to Electrochemical Detection. November 2010.
  6. Purushothaman, S, Toumazou, C, Ou, C-P. Protons and single nucleotide polymorphism detection: a simple use for the ion sensitive field effect transistor.
  7. Pennisi, E (2010). "Semiconductors inspire new sequencing technologies". Science. 327 (5970): 1190. Bibcode:2010Sci...327.1190P. doi:10.1126/science.327.5970.1190. PMID 20203024.
  8. Perkel, J., "Making contact with sequencing's fourth generation". Archived 2013-12-27 at the Wayback Machine. Biotechniques, 2011.
  9. Alberts B, Molecular Biology of the Cell. 5th Edition ed. 2008, New York: Garland Science.
  10. Karow, J. (2009) Ion Torrent Patent App Suggests Sequencing Tech Using Chemical-Sensitive Field-Effect Transistors. In Sequence.
  11. Bio-IT World, Davies, K. It’s "Watson Meets Moore" as Ion Torrent Introduces Semiconductor Sequencing. Archived 2015-08-02 at the Wayback Machine. Bio-IT World 2010.
  12. Karow, J. (2009) At AGBT, Ion Torrent Customers Provide First Feedback; Life Tech Outlines Platform's Growth. In Sequence.
  13. https://tools.lifetechnologies.com/content/sfs/brochures/Small-Genome-Ecoli-De-Novo-App-Note.pdf Archived 2014-08-30 at the Wayback Machine.
  14. Eid, J., et al., "Real-time DNA sequencing from single polymerase molecules". Science, 2009. 323(5910): p. 133-8.
  15. Karow, J. (2010) Ion Torrent Systems Presents $50,000 Electronic Sequencer at AGBT. In Sequence.
  16. Metzker, M.L., "Emerging technologies in DNA sequencing". Genome Res, 2005. 15(12): p. 1767-76.
  17. Pollack, A., Taking DNA Sequencing to the Masses, in New York Times. 2011: New York.
  18. Chiosea, SI; Williams, L; Griffith, CC; Thompson, LD; Weinreb, I; Bauman, JE; Luvison, A; Roy, S; Seethala, RR; Nikiforova, MN (June 2015). "Molecular characterization of apocrine salivary duct carcinoma". The American Journal of Surgical Pathology. 39 (6): 744–52. doi:10.1097/pas.0000000000000410. PMID 25723113. S2CID 34106002.