Duplex sequencing

Duplex sequencing overview: Duplex-tagged libraries containing sequencing adapters are amplified, resulting in two types of products, each originating from a single strand of DNA. After the PCR products are sequenced, the reads are divided into tag families based on genomic position, duplex tags, and the neighboring sequencing adapter. Sequence tag α is the reverse complement of sequence tag β and vice versa.

Duplex sequencing is a library preparation and analysis method for next-generation sequencing (NGS) platforms that employs random tagging of double-stranded DNA to detect mutations with higher accuracy and lower error rates.

This method uses degenerate molecular tags in addition to sequencing adapters to recognize reads originating from each strand of DNA. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors produce mutations in only one strand and can thus be discounted as technical errors. Duplex sequencing can theoretically detect mutations at frequencies as low as 5 × 10⁻⁸, which is more than 10,000 times more accurate than conventional next-generation sequencing methods. [1] [2]

The estimated error rate of standard next-generation sequencing platforms is 10⁻² to 10⁻³ per base call. At this error rate, the billions of base calls produced by an NGS run contain millions of errors. Errors are introduced during sample preparation and sequencing, for example by the polymerase chain reaction (PCR), by the sequencing reaction, and by image analysis. While this error rate is acceptable in some applications, such as detection of clonal variants, it is a major limitation for applications that require higher accuracy to detect low-frequency variants, such as detection of intra-organismal mosaicism, subclonal variants in genetically heterogeneous cancers, or circulating tumor DNA. [3] [4] [5]
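
The arithmetic behind this claim is simple. As a minimal sketch, assuming a hypothetical run that produces one billion base calls (the run size is an illustrative assumption, not a figure from the cited sources), the expected number of erroneous calls at the quoted error rates can be computed as follows:

```python
def expected_errors(base_calls: int, error_rate: float) -> float:
    """Expected number of wrong base calls for a given per-base error rate."""
    return base_calls * error_rate


# Hypothetical run size, chosen only for illustration.
run_size = 1_000_000_000

low = expected_errors(run_size, 1e-3)   # optimistic end of the quoted range
high = expected_errors(run_size, 1e-2)  # pessimistic end of the quoted range

print(f"{low:,.0f} to {high:,.0f} expected erroneous base calls")
```

Under these assumptions the run contains on the order of one million to ten million erroneous base calls, which is why a rare true variant is easily lost among errors.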

Several library preparation strategies have been developed to increase the accuracy of NGS platforms, such as molecular barcoding and circular consensus sequencing. [6] [7] [8] [9] As with standard NGS, the data generated by these methods originate from a single strand of DNA, so errors introduced during PCR amplification, tissue processing, DNA extraction, hybridization capture (where used), or DNA sequencing itself can still be mistaken for true variants. The duplex sequencing method addresses this problem by taking advantage of the complementary nature of the two strands of DNA and confirming only variants that are present in both strands. Because the probability of two complementary errors arising at the same location in both strands is exceedingly low, duplex sequencing increases the accuracy of sequencing significantly. [1] [6] [8] [10]

Experimental workflow

Duplex sequencing tagged adapters can be used in combination with the majority of NGS adapters. In the figures and workflow section of this article, Illumina sequencing adapters are used as an example following the original published protocol. [1] [2]

Duplex sequencing library preparation workflow: Two adapter oligos go through several steps (Annealing, Synthesis, dT-tailing) to generate double-stranded unique tags with 3'-dT-overhangs. Then the duplex tag adapters ligate to the double-stranded DNA templates. Finally, Illumina sequencing adapters are inserted into the tagged-DNA fragments and form the final libraries containing DS adapters, Illumina sequencing adapters, and template DNA.

Adapter annealing

Two oligonucleotides are used for this step (Figure 1: Adapter oligos). One of the oligonucleotides contains a 12-nucleotide single-stranded random tag sequence followed by a fixed 5-nucleotide sequence (black sequence in Figure 1). In this step, the oligonucleotides are annealed at their complementary region by incubation under the conditions required for annealing. [1] [2]

Adapter synthesis

The adapters that annealed successfully are extended and synthesized by a DNA polymerase to complete a double-stranded adapter containing complementary tags (Figure 1). [1] [2]

3’-dT-tailing

The extended double-stranded adapters are cleaved by HpyCH4III at a specific restriction site located on the 3’ side of the tag sequence, producing a 3’-dT overhang that is ligated to the 3’-dA overhang on the DNA library fragments in the adapter ligation step (Figure 1). [1] [2]

Library preparation

Double-stranded DNA is sheared by sonication, enzymatic digestion, or nebulization. Fragments are size-selected using Ampure XP beads. Gel-based size selection is not recommended, since it can cause melting of the DNA double strands and DNA damage due to UV exposure. The size-selected DNA fragments are then subjected to 3’-end dA-tailing. [1] [2]

Adapter ligation

In this step, tagged adapters are ligated to both ends of the double-stranded DNA library fragments by joining the adapters' 3’-dT tails to the fragments' 3’-dA tails. This process results in double-stranded library fragments that carry two random tags (α and β), one on each side, that are the reverse complement of each other (Figures 1 and 2). The DNA-to-adapter ratio is crucial in determining the success of ligation. [1] [2]

Insertion of sequencing adapters to tagged libraries

In the last step of duplex sequencing library preparation, Illumina sequencing adapters are added to the tagged double-stranded libraries by PCR amplification with primers that contain the sequencing adapters. During PCR amplification, both complementary strands of DNA are amplified, generating two types of PCR products. Product 1 derives from strand 1 and has a unique tag sequence (called α in Figure 2) next to Illumina adapter 1; product 2 derives from strand 2 and has a unique tag (called β in Figure 2) next to Illumina adapter 1. (In each strand, tag α is the reverse complement of tag β and vice versa.) The libraries containing duplex tags and Illumina adapters are sequenced using the Illumina TruSeq system. Reads originating from each single strand of DNA form a group of reads (a tag family) that share the same tag. The detected families of reads are used in the next step to analyze the sequencing data. [1] [2]

Considerations

Efficiency of adapter ligation

Efficient adapter ligation is essential for successful duplex sequencing. An excess of library DNA or of adapters upsets the DNA-to-adapter balance, resulting in inefficient ligation or in an excess of primer dimers, respectively. It is therefore important to keep the molar ratio of DNA to adapter at the optimal value (0.05). [2]
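
As a worked illustration of this ratio, the sketch below converts a DNA mass to picomoles and computes the adapter amount that gives a DNA-to-adapter molar ratio of 0.05. The function names, the example mass and fragment length, and the common approximation of 660 g/mol per base pair of double-stranded DNA are assumptions introduced here for illustration; they are not taken from the published protocol.

```python
AVG_BP_MASS = 660.0          # g/mol per base pair of dsDNA (common approximation)
DNA_TO_ADAPTER_RATIO = 0.05  # molar ratio of DNA to adapter quoted in the text


def dsdna_pmol(mass_ng: float, mean_length_bp: float) -> float:
    """Convert a mass of double-stranded DNA (ng) to picomoles of fragments."""
    return mass_ng * 1000.0 / (AVG_BP_MASS * mean_length_bp)


def adapter_pmol(dna_pmol: float, ratio: float = DNA_TO_ADAPTER_RATIO) -> float:
    """Picomoles of adapter needed so that DNA / adapter equals `ratio`."""
    return dna_pmol / ratio


# Hypothetical input: 750 ng of sheared DNA with a mean length of 300 bp.
dna = dsdna_pmol(750.0, 300.0)
print(f"DNA: {dna:.2f} pmol, adapter: {adapter_pmol(dna):.2f} pmol")
```

A ratio of 0.05 corresponds to a 20-fold molar excess of adapter over DNA fragments.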

Tag family size

The efficiency of duplex sequencing depends on the final number of duplex consensus sequences (DCSs), which is directly related to the number of reads in each family (the family size). If the family size is too small, the DCS cannot be assembled; if too many reads share the same tag, the data yield is low. Family size is determined by the amount of DNA template used for PCR amplification and by the fraction of a sequencing lane dedicated to the library. The optimal tag family size is between 6 and 12 members. To obtain the optimal family size, the amount of DNA template and the dedicated sequencing lane fraction need to be adjusted. The following formula takes into account the most important variables that affect depth of coverage: N = 40DG / R, where N is the number of reads, D is the desired depth of coverage, G is the size of the DNA target in base pairs, and R is the final read length.
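
The formula can be evaluated directly. The following is a minimal sketch; the function name and the example values for D, G, and R are hypothetical and chosen only to show the calculation.

```python
import math


def reads_required(depth: float, target_bp: int, read_length: int) -> int:
    """Number of reads N = 40 * D * G / R, rounded up to a whole read."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(40 * depth * target_bp / read_length)


# Hypothetical example: 1,000x desired depth over a 10 kb target
# with a final read length of 100 nucleotides.
n_reads = reads_required(depth=1000, target_bp=10_000, read_length=100)
print(f"{n_reads:,} reads")
```

With these example values the formula gives N = 40 × 1,000 × 10,000 / 100 = 4,000,000 reads.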

Computational workflow

Filtering and trimming

Each duplex sequencing read contains a fixed 5-nucleotide sequence (shown in black in the figures) located immediately downstream of the 12-nucleotide tag sequence. Reads are discarded if they lack the expected 5-nucleotide sequence or have more than nine identical or ambiguous bases within a tag. The two 12-nucleotide tags at each end of the reads are combined and moved to the read header. Two families of reads are formed that originate from the two strands of DNA: one family contains reads with an αβ header originating from strand 1, and the second contains reads with a βα header originating from strand 2 (Figure 2). The reads are then trimmed by removing the fixed 5-nucleotide sequence and 4 error-prone nucleotides located at the sites of ligation and end repair. [1] [2] The remaining reads are assembled into consensus sequences by SSCS and DCS assembly.
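
The filtering and trimming rules can be sketched as follows. This is an illustration under stated assumptions, not the published pipeline: it assumes each read begins with the 12-nucleotide tag, followed by the fixed sequence and then the 4 nucleotides to be trimmed; the fixed sequence `FIXED_SEQ` is a placeholder; and "more than nine identical or ambiguous bases" is interpreted as any single symbol (including N) occurring more than nine times in a tag. All function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple

TAG_LEN = 12         # length of each random tag
FIXED_SEQ = "TGACT"  # placeholder for the fixed 5-nucleotide sequence (assumption)
END_TRIM = 4         # error-prone nucleotides removed after the fixed sequence
MAX_SAME = 9         # a tag may contain at most this many copies of one symbol


def tag_passes(tag: str) -> bool:
    """Accept a tag unless one symbol (A, C, G, T or N) occurs more than nine times."""
    if len(tag) != TAG_LEN:
        return False
    return max(Counter(tag.upper()).values()) <= MAX_SAME


def split_read(read: str) -> Optional[Tuple[str, str]]:
    """Return (tag, trimmed sequence), or None if the read fails a filter."""
    tag = read[:TAG_LEN]
    fixed = read[TAG_LEN:TAG_LEN + len(FIXED_SEQ)]
    if fixed != FIXED_SEQ or not tag_passes(tag):
        return None
    start = TAG_LEN + len(FIXED_SEQ) + END_TRIM
    return tag, read[start:]


def tag_pair_to_header(read1: str, read2: str) -> Optional[Tuple[str, str, str]]:
    """Combine the tags of a read pair into one 24-nucleotide header."""
    first, second = split_read(read1), split_read(read2)
    if first is None or second is None:
        return None
    (tag1, seq1), (tag2, seq2) = first, second
    return tag1 + tag2, seq1, seq2


# Toy read pair: tag + fixed sequence + 4 trimmed nucleotides + template.
r1 = "ACGTACGTACGT" + FIXED_SEQ + "GGCA" + "TTAGGCCA"
r2 = "TTGCAATGCCGA" + FIXED_SEQ + "CATG" + "AACCGGTT"
print(tag_pair_to_header(r1, r2))
```

For a read pair from the opposite strand the two tags appear in the other order, so the same molecule yields one αβ header and one βα header.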

SSCS assembly

Trimmed sequences from the previous step are aligned to the reference genome using the Burrows–Wheeler Aligner (BWA), and unmapped reads are removed. Aligned reads that have the same 24-nucleotide tag sequence and genomic region are detected and grouped (families αβ and βα in Figure 2). Each group represents a “tag family.” Tag families with fewer than three members are not analyzed. To remove errors that arise during PCR amplification or sequencing, mutations that are supported by fewer than 70% of the members (reads) are filtered out of the analysis. A consensus sequence is then generated for each family using the identical bases at each position of the remaining reads. This consensus sequence is called the single-strand consensus sequence (SSCS). It improves the accuracy of NGS about 20-fold; however, it relies on sequencing information from a single strand of DNA and is therefore sensitive to errors introduced before or during the first round of PCR amplification. [1] [2]
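
The family-size and 70% rules can be sketched as a per-position majority vote. This is a simplified illustration rather than the published implementation: it assumes the reads in a family are already aligned and of equal length, and it writes `N` at positions where no base reaches the threshold, which is a choice made here for the example. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

MIN_FAMILY_SIZE = 3  # families with fewer members are not analyzed
MIN_AGREEMENT = 0.7  # fraction of reads that must support a base


def group_families(
    reads: Iterable[Tuple[str, int, str]]
) -> Dict[Tuple[str, int], List[str]]:
    """Group (tag, position, sequence) records into tag families."""
    families: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for tag, position, sequence in reads:
        families[(tag, position)].append(sequence)
    return families


def single_strand_consensus(family: List[str]) -> Optional[str]:
    """Build an SSCS, or return None when the family is too small."""
    if len(family) < MIN_FAMILY_SIZE:
        return None
    consensus = []
    # zip(*family) walks the aligned reads column by column.
    for column in zip(*family):
        base, count = Counter(column).most_common(1)[0]
        supported = count / len(column) >= MIN_AGREEMENT
        consensus.append(base if supported else "N")
    return "".join(consensus)


# Toy families: an isolated error is outvoted in a family of four reads,
# while two of three reads (67%) is below the 70% threshold.
print(single_strand_consensus(["ACGT", "ACGT", "ACGT", "ACTT"]))
print(single_strand_consensus(["ACGT", "ACGT", "ATGT"]))
```

In the first toy family the third position is supported by 3 of 4 reads (75%) and is kept; in the second, the second position is supported by only 2 of 3 reads and is masked.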

DCS assembly

The SSCSs from the previous step are realigned to the reference genome. SSCS families that have complementary tags are then paired (families αβ and βα in Figure 2); these sequences originate from the two complementary strands of the same DNA molecule. High-confidence sequences are selected based on perfectly matching base calls between the two families. The final sequence is called the duplex consensus sequence (DCS). True mutations are those that match perfectly between complementary SSCSs. This step filters out the remaining errors that arose during the first round of PCR amplification or during sample preparation. [1] [2]
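
The pairing of αβ and βα families can be sketched as follows. The example assumes that both SSCSs have already been aligned and are written in the same reference orientation, that all families shown map to the same genomic position, and that disagreeing positions are masked with `N`; these simplifications and the function names are assumptions made for illustration.

```python
from typing import Dict

TAG_LEN = 12  # length of each half (α or β) of the 24-nucleotide header


def partner_tag(header: str) -> str:
    """Swap the two tag halves: an αβ header becomes βα and vice versa."""
    return header[TAG_LEN:] + header[:TAG_LEN]


def duplex_consensus(sscs_a: str, sscs_b: str) -> str:
    """Keep a base only where both strand consensuses agree; otherwise write N."""
    return "".join(
        a if a == b and a != "N" else "N" for a, b in zip(sscs_a, sscs_b)
    )


def build_dcs(sscs_by_header: Dict[str, str]) -> Dict[str, str]:
    """Pair each αβ SSCS with its βα partner and return one DCS per pair."""
    result = {}
    for header, sequence in sscs_by_header.items():
        partner = partner_tag(header)
        # `header < partner` makes sure each pair is processed only once.
        if partner in sscs_by_header and header < partner:
            result[header] = duplex_consensus(sequence, sscs_by_header[partner])
    return result


alpha, beta = "AAAACCCCGGGG", "TTTTGGGGCCCC"
sscs = {
    alpha + beta: "ACGTTGCA",  # strand 1 consensus
    beta + alpha: "ACGATGCA",  # strand 2 consensus, differs at one position
}
print(build_dcs(sscs))
```

The position at which the two strand consensuses disagree is masked, so a change seen on only one strand never reaches the final DCS.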

Advantages

Decreasing error rate of sequencing

The high error rate (0.01 to 0.001) of standard NGS platforms, introduced during sample preparation or sequencing, is a major limitation for the detection of variants present in a small fraction of cells. Because the duplex tagging system uses information from both strands of DNA, duplex sequencing reduces the sequencing error rate about 10-million-fold when the SSCS and DCS methods are used together. [1] [2] [10]

Increasing accuracy of variant calling

It is challenging to identify rare variants accurately using standard NGS methods, which have an error rate of 10⁻² to 10⁻³. Errors that occur early during sample preparation can be detected as rare variants. An example is the C>A/G>T transversion, which is detected at low frequencies in deep sequencing or targeted capture data and arises from DNA oxidation during sample preparation. [11] These false-positive variants are filtered out by the duplex sequencing method, since a mutation must match exactly in both strands of DNA to be validated as a true mutation. Duplex sequencing can theoretically detect mutations with frequencies as low as 10⁻⁸, compared to the 10⁻² rate of standard NGS methods. [1] [2] [10]

Applicable to majority of NGS platforms

Another advantage of duplex sequencing is that it can be used in combination with the majority of NGS platforms without making significant changes to the standard protocols.

Limitations

Cost

Because duplex sequencing builds each consensus from multiple reads of both strands of DNA, it needs a much higher sequencing depth than standard NGS and is therefore costly. At present, the expense limits its application to targeted and amplicon sequencing and makes it impractical for whole-genome sequencing. However, applying duplex sequencing to larger DNA targets will become more feasible as the cost of NGS decreases.

Practical application

Duplex sequencing is a relatively new method, and its efficiency has been studied in only a limited number of applications, such as detecting point mutations using targeted capture sequencing. [12] More studies are needed to extend the application and feasibility of duplex sequencing to more complex samples with larger numbers of mutations, indels, and copy number variations.

Applications

Detection of variants with low frequencies

The significant increase in sequencing accuracy provided by duplex sequencing has had an important impact on applications such as detection of rare human genetic variants, detection of subclonal mutations involved in mechanisms of resistance to therapy in genetically heterogeneous cancers, screening of variants in circulating tumor DNA as a non-invasive biomarker, and prenatal screening for genetic abnormalities in a fetus.

Copy number detection

Another application for duplex sequencing is in the detection of DNA/RNA copy numbers by estimating the relative frequency of variants. A method for counting PCR template molecules with application to next-generation sequencing is an example. [1]

Analysis and software

A list of required tools and packages for SSCS and DCS analysis can be found online.

References

  1. M. W. Schmitt, S. R. Kennedy, J. J. Salk, et al. “Detection of ultra-rare mutations by next-generation sequencing”. Proc. Natl. Acad. Sci., vol. 109, no. 36, 2012. PMID 22853953.
  2. S. R. Kennedy, M. W. Schmitt, E. J. Fox, B. F. Kohrn, et al. “Detecting ultra low-frequency mutations by Duplex Sequencing”. Nature Protoc., vol. 9, no. 11, pp. 2586–606, 2014. PMID 25299156.
  3. T. E. Druley, F. L. M. Vallania, D. J. Wegner, et al. “Quantification of rare allelic variants from pooled genomic DNA”. Nature Methods, vol. 6, no. 4, pp. 263–265, 2009. PMID 19252504.
  4. N. McGranahan and C. Swanton. “Biological and Therapeutic Impact of Intratumor Heterogeneity in Cancer Evolution”. Cancer Cell, vol. 27, no. 1, pp. 15–26, 2015. PMID 25584892.
  5. C. Bettegowda, M. Sausen, R. J. Leary, et al. “Detection of Circulating Tumor DNA in Early- and Late-Stage Human Malignancies”. Sci Transl Med, vol. 6, no. 224, p. 224ra24, 2014. PMID 24553385.
  6. B. E. Miner, R. J. Stöger, A. F. Burden, et al. “Molecular barcodes detect redundancy and contamination in hairpin-bisulfite PCR”. Nucleic Acids Res, vol. 32, no. 17, p. e135, 2004. PMID 15459281.
  7. M. L. McCloskey, R. Stoger, R. S. Hansen, et al. “Encoding PCR products with batch-stamps and barcodes”. Biochem. Genet., vol. 45, no. 11–12, pp. 761–767, 2007. PMID 17955361.
  8. D. I. Lou, J. A. Hussmann, R. M. Mcbee, et al. “High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing”. Proc Natl Acad Sci U S A, vol. 110, no. 49, pp. 19872–19877, 2013. PMID 24243955.
  9. A. Y. Maslov, W. Quispe-Tintaya, T. Gorbacheva, R. R. White, and J. Vijg. “High-throughput sequencing in mutation detection: A new generation of genotoxicity tests?”. Mutat. Res., vol. 776, pp. 136–43, 2015. PMID 25934519.
  10. E. J. Fox, K. S. Reid-Bayliss, M. J. Emond, et al. “Accuracy of Next Generation Sequencing Platforms”. Next Gener Seq Appl., pp. 1–9, 2015. PMID 25699289.
  11. M. Costello, T. J. Pugh, T. J. Fennell, et al. “Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation”. Nucleic Acids Res., vol. 41, no. 6, pp. 1–12, 2013. PMID 23303777.
  12. M. W. Schmitt, E. J. Fox, M. J. Prindle, et al. “Sequencing small genomic targets with high efficiency and extreme accuracy”. Nat Methods, vol. 12, no. 5, pp. 423–425, 2015. PMID 2584963.