Carlson curve

Total cost of sequencing a human genome over time as calculated by the NHGRI.

The Carlson curve describes the rate of DNA sequencing, or the cost per sequenced base, as a function of time. [1] It is the biotechnological equivalent of Moore's law. Carlson predicted that the doubling time of DNA sequencing technologies (measured by cost and performance) would be at least as fast as that of Moore's law. [2]
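The comparison with Moore's law can be made concrete by estimating a halving time from two cost measurements. The following is a minimal sketch that assumes a purely exponential decline between the two points; the function name `halving_time` and the example figures are hypothetical and are not NHGRI data. The two-year reference value is the commonly quoted doubling period for Moore's law.

```python
import math

# Commonly quoted doubling period for Moore's law, in years.
MOORE_DOUBLING_YEARS = 2.0


def halving_time(t0, cost0, t1, cost1):
    """Return the time needed for cost to halve, in the units of t0 and t1.

    Assumes cost(t) = cost0 * 2 ** (-(t - t0) / T) and solves for T.
    """
    if t1 <= t0:
        raise ValueError("t1 must be later than t0")
    if cost0 <= 0 or cost1 <= 0:
        raise ValueError("costs must be positive")
    if cost1 >= cost0:
        raise ValueError("cost must decrease between the two points")
    return (t1 - t0) * math.log(2) / math.log(cost0 / cost1)


# Hypothetical illustration: a 1000-fold cost decrease over 10 years.
example_halving = halving_time(2000.0, 1000.0, 2010.0, 1.0)

# A halving time shorter than two years means the decline is faster
# than the Moore's-law reference rate.
outpaces_moore = example_halving < MOORE_DOUBLING_YEARS

print(f"halving time: {example_halving:.2f} years")
print(f"faster than Moore's law: {outpaces_moore}")
```

Because the estimate uses only two points, it describes the average rate over the interval and hides any change in pace within it.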

History

The term was coined by The Economist [3] and is named after author Rob Carlson. [1]

Carlson curves illustrate the rapid (in some cases faster than exponential) decreases in cost, and increases in performance, of a variety of technologies, including DNA sequencing, DNA synthesis, and a range of physical and computational tools used in protein production and in determining protein structures.

Next generation sequencing

Sequencing floor at BGI Hong Kong, showing the Illumina HiSeq 2000 sequencers.

Moore's law began to be profoundly outpaced in January 2008, when sequencing centers transitioned from Sanger sequencing to newer DNA sequencing technologies: [4] 454 sequencing, with an average read length of 300-400 bases (10-fold coverage), and Illumina and SOLiD sequencing, with an average read length of 50-100 bases (30-fold coverage).

Related Research Articles

In genetics and biochemistry, sequencing means to determine the primary structure of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence which succinctly summarizes much of the atomic-level structure of the sequenced molecule.

In genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random shot grouping of a shotgun.

Moore's law Observation on the growth of integrated circuit capacity

Moore's law is the observation that the number of transistors in a dense integrated circuit (IC) doubles about every two years. Moore's law is an observation and projection of a historical trend. Rather than a law of physics, it is an empirical relationship linked to gains from experience in production.

Genomics Discipline in genetics

Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dimensional structural configuration. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and influence on the organism. Genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues as well as control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through uses of high throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems such as the brain.

DNA sequencer

A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This is then reported as a text string, called a read. Some DNA sequencers can be also considered optical instruments as they analyze light signals originating from fluorochromes attached to nucleotides.

In bioinformatics, sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence. This is needed because DNA sequencing technology cannot read whole genomes in one go, but rather reads small pieces of between 20 and 30,000 bases, depending on the technology used. Typically the short fragments, called reads, result from shotgun sequencing of genomic DNA or gene transcripts (ESTs).

Nanopore sequencing DNA / RNA sequencing technique

Nanopore sequencing is a third-generation approach used in the sequencing of biopolymers, specifically polynucleotides in the form of DNA or RNA.

DNA sequencing Process of determining the order of nucleotides in DNA molecules

DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.

Sanger sequencing Method of DNA sequencing developed in 1977

Sanger sequencing is a method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. After first being developed by Frederick Sanger and colleagues in 1977, it became the most widely used sequencing method for approximately 40 years. It was first commercialized by Applied Biosystems in 1986. More recently, higher volume Sanger sequencing has been replaced by "Next-Gen" sequencing methods, especially for large-scale, automated genome analyses. However, the Sanger method remains in wide use, for smaller-scale projects, and for validation of Next-Gen results. It still has the advantage over short-read sequencing technologies in that it can produce DNA sequence reads of > 500 nucleotides.

Illumina, Inc. is an American company. Incorporated on April 1, 1998, Illumina develops, manufactures, and markets integrated systems for the analysis of genetic variation and biological function. The company provides a line of products and services that serves the sequencing, genotyping and gene expression, and proteomics markets. Its headquarters are located in San Diego, California.

ABI Solid Sequencing

SOLiD (Sequencing by Oligonucleotide Ligation and Detection) is a next-generation DNA sequencing technology developed by Life Technologies and has been commercially available since 2006. This next-generation technology generates 10^8 to 10^9 small sequence reads at one time. It uses two-base encoding to decode the raw data generated by the sequencing platform into sequence data.

Single-molecule real-time (SMRT) sequencing is a parallelized single molecule DNA sequencing method. Single-molecule real-time sequencing utilizes a zero-mode waveguide (ZMW). A single DNA polymerase enzyme is affixed at the bottom of a ZMW with a single molecule of DNA as a template. The ZMW is a structure that creates an illuminated observation volume that is small enough to observe only a single nucleotide of DNA being incorporated by DNA polymerase. Each of the four DNA bases is attached to one of four different fluorescent dyes. When a nucleotide is incorporated by the DNA polymerase, the fluorescent tag is cleaved off and diffuses out of the observation area of the ZMW where its fluorescence is no longer observable. A detector detects the fluorescent signal of the nucleotide incorporation, and the base call is made according to the corresponding fluorescence of the dye.

Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell. Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence. Epigenomic maintenance is a continuous process and plays an important role in the stability of eukaryotic genomes by taking part in crucial biological mechanisms such as DNA repair. Plant flavones are said to inhibit epigenomic marks that cause cancers. Two of the best-characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as differentiation, development, and tumorigenesis. The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.

Hybrid genome assembly

In bioinformatics, hybrid genome assembly refers to utilizing various sequencing technologies to achieve the task of assembling a genome from fragmented, sequenced DNA resulting from shotgun sequencing. Genome assembly presents one of the most challenging tasks in genome sequencing as most modern DNA sequencing technologies can only produce reads that are, on average, 25-300 base pairs in length. This is orders of magnitude smaller than the average size of a genome. This assembly is computationally difficult and has some inherent challenges, one of these challenges being that genomes often contain complex tandem repeats of sequences that can be thousands of base pairs in length. These repeats can be long enough that second generation sequencing reads are not long enough to bridge the repeat, and, as such, determining the location of each repeat in the genome can be difficult. Resolving these tandem repeats can be accomplished by utilizing long third generation sequencing reads, such as those obtained using the PacBio RS DNA sequencer. These sequences are, on average, 10,000-15,000 base pairs in length and are long enough to span most repeated regions. Using a hybrid approach to this process can increase the fidelity of assembling tandem repeats by being able to accurately place them along a linear scaffold and make the process more computationally efficient.

Pacific Biosciences

Pacific Biosciences of California, Inc. is an American biotechnology company founded in 2004 that develops and manufactures systems for gene sequencing and some novel real time biological observation. PacBio describes its platform as single-molecule real-time sequencing (SMRT), based on the properties of zero-mode waveguides.

Transmission electron microscopy DNA sequencing

Transmission electron microscopy DNA sequencing is a single-molecule sequencing technology that uses transmission electron microscopy techniques. The method was conceived and developed in the 1960s and 70s, but lost favor when the extent of damage to the sample was recognized.

Ion semiconductor sequencing

Ion semiconductor sequencing is a method of DNA sequencing based on the detection of hydrogen ions that are released during the polymerization of DNA. This is a method of "sequencing by synthesis", during which a complementary strand is built based on the sequence of a template strand.

Massive parallel sequencing or massively parallel sequencing is any of several high-throughput approaches to DNA sequencing using the concept of massively parallel processing; it is also called next-generation sequencing (NGS) or second-generation sequencing. Some of these technologies emerged in 1994-1998 and have been commercially available since 2005. These technologies use miniaturized and parallelized platforms for sequencing of 1 million to 43 billion short reads per instrument run.

Coverage in DNA sequencing is the number of unique reads that include a given nucleotide in the reconstructed sequence. Deep sequencing refers to the general concept of aiming for high number of unique reads of each region of a sequence.
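The definition of coverage can be illustrated with a small sketch that counts, for each position of a reference, how many reads include it. The sketch assumes the reads are already deduplicated and aligned, and represents each one as a hypothetical `(start, length)` pair with a 0-based start. The function name `per_base_coverage` and the toy reads are illustrative only.

```python
def per_base_coverage(reference_length, reads):
    """Count how many reads include each reference position.

    reads: iterable of (start, length) pairs, 0-based start positions.
    Portions of a read that fall outside the reference are ignored.
    """
    depth = [0] * reference_length
    for start, length in reads:
        first = max(start, 0)
        last = min(start + length, reference_length)
        for pos in range(first, last):
            depth[pos] += 1
    return depth


# Toy example: three reads over an 8-base reference.
reads = [(0, 4), (2, 4), (3, 5)]
depth = per_base_coverage(8, reads)

# Mean coverage is the total number of aligned bases divided by
# the reference length.
mean_coverage = sum(depth) / len(depth)

print(depth)
print(mean_coverage)
```

In this sketch, deep sequencing corresponds to every entry of `depth` being large, rather than only the mean.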

Third-generation sequencing is a class of DNA sequencing methods currently under active development.

References

  1. Robert H. Carlson (April 2011). Biology Is Technology: The Promise, Peril, and New Business of Engineering Life. Cambridge, MA: Harvard University Press.
  2. Robert Carlson (September 2003). "The Pace and Proliferation of Biological Technologies". Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science. 1 (3): 203–214. doi:10.1089/153871303769201851. PMID 15040198.
  3. "Life 2.0". The Economist. August 31, 2006.
  4. "DNA Sequencing Costs". National Human Genome Research Institute.