DNA nanoball sequencing

Figure: Workflow for DNA nanoball sequencing (flowchart of library construction and sequencing for BGISEQ-500).

DNA nanoball sequencing is a high-throughput sequencing technology that is used to determine the entire genomic sequence of an organism. The method uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs. Fluorescent nucleotides bind to complementary nucleotides and are then polymerized to anchor sequences bound to known sequences on the DNA template. The base order is determined via the fluorescence of the bound nucleotides. [2] This DNA sequencing method allows large numbers of DNA nanoballs to be sequenced per run at lower reagent costs than other next-generation sequencing platforms. [3] However, a limitation of this method is that it generates only short sequences of DNA, which makes mapping its reads to a reference genome challenging. [2] After purchasing Complete Genomics, the Beijing Genomics Institute (BGI) refined DNA nanoball sequencing to sequence nucleotide samples on their own platform. [4] [5]

Procedure

DNA nanoball sequencing involves isolating the DNA that is to be sequenced, shearing it into small fragments of 100–350 base pairs (bp), ligating adapter sequences to the fragments, and circularizing the fragments. The circular fragments are copied by rolling circle replication, resulting in many single-stranded copies of each fragment. The DNA copies concatenate head to tail in a long strand and are compacted into a DNA nanoball. The nanoballs are then adsorbed onto a sequencing flow cell. The color of the fluorescence at each interrogated position is recorded by a high-resolution camera. Bioinformatics is used to analyze the fluorescence data and make a base call, and to map or quantify the 50 bp, 100 bp, or 150 bp single- or paired-end reads. [6] [2]

DNA Isolation, fragmentation, and size capture

Cells are lysed and DNA is extracted from the cell lysate. The high-molecular-weight DNA, often several megabase pairs long, is fragmented by physical or enzymatic methods that break the DNA double strands at random intervals. Bioinformatic mapping of the sequencing reads is most efficient when the sample DNA fragments fall within a narrow length range. [7] For small RNA sequencing, fragments of the ideal length are selected by gel electrophoresis; [8] for sequencing of larger fragments, DNA fragments are separated by bead-based size selection. [9]

Attaching adapter sequences

Adapter DNA sequences must be attached to the unknown DNA fragment so that DNA segments with known sequences flank the unknown DNA. In the first round of adapter ligation, right (Ad153_right) and left (Ad153_left) adapters are attached to the right and left flanks of the fragmented DNA, and the DNA is amplified by PCR. A splint oligo then hybridizes to the ends of the fragments, which are ligated to form a circle. An exonuclease is added to remove all remaining linear single-stranded and double-stranded DNA products. The result is a completed circular DNA template. [2]

Rolling circle replication

Once a single-stranded circular DNA template containing sample DNA ligated to two unique adapter sequences has been generated, the full sequence is amplified into a long string of DNA. This is accomplished by rolling circle replication with the Phi 29 DNA polymerase, which binds and replicates the DNA template. The newly synthesized strand is released from the circular template, resulting in a long single-stranded DNA comprising several head-to-tail copies of the circular template. [10] The resulting nanoparticle self-assembles into a tight ball of DNA approximately 300 nanometers (nm) across. Nanoballs remain separated from each other because they are negatively charged and naturally repel each other, which reduces tangling between different single-stranded DNA lengths. [2]

Figure: DNA nanoball creation and adsorption to the patterned array flowcell.

DNA nanoball patterned array

To obtain DNA sequence, the DNA nanoballs are attached to a patterned array flow cell. The flow cell is a silicon wafer coated with silicon dioxide, titanium, hexamethyldisilazane (HMDS), and a photoresist material. The DNA nanoballs are added to the flow cell and selectively bind to the positively-charged aminosilane in a highly ordered pattern, allowing a very high density of DNA nanoballs to be sequenced. [2] [11]

Imaging

After each DNA nucleotide incorporation step, the flow cell is imaged to determine which nucleotide base bound to the DNA nanoball. The fluorophore is excited with a laser at a specific wavelength of light. The fluorescence emitted from each DNA nanoball is captured by a high-resolution CCD camera. The image is then processed to remove background noise and assess the intensity of each point. The color of each DNA nanoball corresponds to a base at the interrogated position, and a computer records the base position information. [2]
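The idea that the color of each spot maps to a base can be illustrated with a toy base caller. This is a minimal sketch and not the instrument's actual algorithm: it assumes four background-corrected channel intensities per nanoball per cycle, a hypothetical channel order of A, C, G, T, and a hypothetical `min_ratio` threshold below which the call is reported as `N`.

```python
# Toy base caller: pick the brightest of four fluorescence channels.
# The channel order and the ambiguity threshold are illustrative assumptions.
BASES = "ACGT"  # hypothetical mapping of channel index to base


def call_base(intensities, min_ratio=1.5):
    """Return the base for the brightest channel, or 'N' if the call is ambiguous."""
    if len(intensities) != 4:
        raise ValueError("expected one intensity per channel (4 values)")
    ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
    best = intensities[ranked[0]]
    second = intensities[ranked[1]]
    # Reject dark spots and spots where two channels are similarly bright.
    if best <= 0 or best < min_ratio * second:
        return "N"
    return BASES[ranked[0]]


def call_read(cycles):
    """Concatenate per-cycle base calls for one nanoball."""
    return "".join(call_base(c) for c in cycles)


example_cycles = [
    [900, 10, 5, 8],    # channel 0 dominates
    [3, 4, 850, 9],     # channel 2 dominates
    [300, 290, 2, 1],   # two channels too close to separate
]
example_read = call_read(example_cycles)
```

Real base callers also model cross-talk between channels and signal decay over cycles, which this sketch leaves out.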

Sequencing data format

The data generated from the DNA nanoballs are written as standard FASTQ files with contiguous bases (no gaps). These files can be used in any data analysis pipeline that is configured to read single-end or paired-end FASTQ files.

For example:

Read 1, from a 100 bp paired-end run: [12]

 @CL100011513L1C001R013_126365/1
 CTAGGCAACTATAGGTCTCAGTTAAGTCAAATAAAATTCACATCAAATTTTTACTCCCACCATCCCAACACTTTCCTGCCTGGCATATGCCGTGTCTGCC
 +
 FFFFFFFFFFFGFGFFFFFF;FFFFFFFGFGFGFFFFFF;FFFFGFGFGFFEFFFFFEDGFDFF@FCFGFGCFFFFFEFFEGDFDFFFFFGDAFFEFGFF

Corresponding Read 2:

 @CL100011513L1C001R013_126365/2
 TGTCTACCATATTCTACATTCCACACTCGGTGAGGGAAGGTAGGCACATAAAGCAATGGCAGTACGGTGTAATACATGCTAATGTAGAGTAAGCACTCAG
 +
 3E9E<ADEBB:D>E?FD<<@EFE>>ECEF5CE:B6E:CEE?6B>B+@??31/FD:0?@:E9<3FE2/A:/8>9CB&=E<7:-+>;29:7+/5D9)?5F/:
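Records like the two above can be read with a few lines of Python. The following is a minimal sketch, assuming four lines per record and Phred+33 quality encoding (an assumption made for illustration, not stated in the source). The helper names `read_fastq`, `mean_phred`, and `mate_key` are hypothetical, and the record in `toy` is made up.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Average quality score, assuming Phred+offset ASCII encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def mate_key(name):
    """Strip a trailing /1 or /2 so both mates of a pair share one key."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


# A made-up record, used only to exercise the parser.
toy = "@TOYREAD_1/1\nACGT\n+\nFFF;\n"
records = list(read_fastq(io.StringIO(toy)))
```

In a paired-end run, Read 1 and Read 2 of the same nanoball differ only in the `/1` or `/2` suffix, so `mate_key` returns the same value for both.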

Informatics Tips

Reference Genome Alignment

Default parameters for the popular aligners are sufficient.

Read Names

In the FASTQ file created by BGI/MGI sequencers using DNA nanoballs on a patterned array flowcell, the read names look like this:

Figure: Anatomy of a BGI sequencer read name.
Figure: Anatomy of an MGI sequencer read name.

BGISEQ-500: CL100025298L1C002R050_244547

MGISEQ-2000: V100006430L1C001R018613883

Read names can be parsed to extract three variables describing the physical location of the read on the patterned array: (1) tile/region, (2) x coordinate, and (3) y coordinate. Note that, due to the order of these variables, these read names cannot be natively parsed by Picard MarkDuplicates in order to identify optical duplicates. However, as there are no optical duplicates on this platform, this poses no problem for Picard-based data analysis.
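As a rough illustration, the two example read names can be split with a regular expression. This is a sketch under stated assumptions: the field labels (`flowcell`, `lane`, `column`, `row`, `index`) are inferred only from the `L`, `C`, and `R` markers visible in the names, and how they correspond to the tile/region, x, and y variables described above is not established here. The pattern and the function `parse_read_name` are hypothetical.

```python
import re

# Hypothetical pattern inferred from the two example names; the field labels
# are assumptions, not the vendor's documented layout.
READ_NAME = re.compile(
    r"^(?P<flowcell>[A-Z]+\d+)"   # e.g. CL100025298 or V100006430
    r"L(?P<lane>\d)"              # lane digit after 'L'
    r"C(?P<column>\d{3})"         # three digits after 'C'
    r"R(?P<row>\d{3})"            # three digits after 'R'
    r"_?(?P<index>\d+)"           # trailing number, with or without '_'
    r"(?:/(?P<mate>[12]))?$"      # optional /1 or /2 mate suffix
)


def parse_read_name(name):
    """Split a BGI/MGI-style read name into its labelled parts."""
    match = READ_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised read name: {name!r}")
    fields = match.groupdict()
    for key in ("lane", "column", "row", "index"):
        fields[key] = int(fields[key])
    return fields


bgiseq = parse_read_name("CL100025298L1C002R050_244547")
mgiseq = parse_read_name("V100006430L1C001R018613883")
```

Names that do not follow this layout raise a `ValueError` rather than being silently mis-parsed.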

Duplicates

Because DNA nanoballs remain confined to their spots on the patterned array, there are no optical duplicates to contend with during bioinformatics analysis of sequencing reads. It is suggested to run Picard MarkDuplicates as follows:

java -jar picard.jar MarkDuplicates I=input.bam O=marked_duplicates.bam M=marked_dup_metrics.txt READ_NAME_REGEX=null

A test with Picard-friendly, reformatted read names demonstrates the absence of this class of duplicate read:

Figure: Test of Picard MarkDuplicates varying the OPTICAL_DUPLICATE_PIXEL_DISTANCE parameter.

The single read marked as an optical duplicate is most assuredly artefactual. In any case, the effect on the estimated library size is negligible.

Advantages

DNA nanoball sequencing technology offers some advantages over other sequencing platforms. One advantage is the absence of optical duplicates: DNA nanoballs remain in place on the patterned array and do not interfere with neighboring nanoballs.

Other advantages of DNA nanoball sequencing include the use of the high-fidelity Phi 29 DNA polymerase [10] to ensure accurate amplification of the circular template, the compaction of several hundred copies of the circular template into a small area, which yields an intense signal, and the attachment of the fluorophore to the probe at a long distance from the ligation point, which improves ligation. [2]

Disadvantages

The main disadvantage of DNA nanoball sequencing is the short read length of the DNA sequences obtained with this method. [2] Short reads, especially from DNA rich in repeats, may map to two or more regions of the reference genome. A second disadvantage of this method is that multiple rounds of PCR have to be used. This can introduce PCR bias and possibly amplify contaminants in the template construction phase. [2] However, these disadvantages are common to all short-read sequencing platforms and are not specific to DNA nanoballs.
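The mapping ambiguity can be seen in a toy example. The sketch below uses a made-up reference containing two copies of a `CAG` repeat and exact string matching instead of a real aligner. The sequences and the helper `find_all` are hypothetical and serve only to show why a short read inside a repeat has several candidate positions while a longer read that reaches a unique flank has one.

```python
def find_all(reference, read):
    """Return every 0-based position where read matches reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping hits
    return positions


# Made-up reference: two identical repeat blocks separated by unique sequence.
reference = "TTGACA" + "CAG" * 5 + "GGATCC" + "CAG" * 5 + "AATTCG"

short_read = "CAGCAG"        # lies entirely inside the repeat
long_read = "CAGCAGGGATCC"   # extends into the unique flank

short_hits = find_all(reference, short_read)
long_hits = find_all(reference, long_read)
```

Here `short_hits` contains several positions spread over both repeat blocks, while `long_hits` contains a single position, which is why longer or paired-end reads are easier to place in repetitive regions.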

Applications

DNA nanoball sequencing has been used in several studies. Lee et al. used this technology to find mutations that were present in a lung cancer and compared them to normal lung tissue. [13] They were able to identify over 50,000 single nucleotide variants. Roach et al. used DNA nanoball sequencing to sequence the genomes of a family of four relatives and were able to identify SNPs that may be responsible for a Mendelian disorder and to estimate the inter-generation mutation rate. [14] The Institute for Systems Biology has used this technology to sequence 615 complete human genome samples as part of a survey studying neurodegenerative diseases, and the National Cancer Institute is using DNA nanoball sequencing to sequence 50 tumours and matched normal tissues from pediatric cancers. [citation needed]

Significance

Massively parallel next-generation sequencing platforms like DNA nanoball sequencing may contribute to the diagnosis and treatment of many genetic diseases. The cost of sequencing an entire human genome has fallen from about one million dollars in 2008 to $4400 in 2010 with the DNA nanoball technology. [15] Sequencing the entire genomes of patients with heritable diseases or cancer has identified mutations associated with these diseases, opening up strategies such as targeted therapeutics for at-risk people and genetic counseling. [15] As the price of sequencing an entire human genome approaches the $1000 mark, genomic sequencing of every individual may become feasible as part of normal preventative medicine. [15]

References

  1. Huang, Jie; Liang, Xinming; Xuan, Yuankai; Geng, Chunyu; Li, Yuxiang; Lu, Haorong; Qu, Shoufang; Mei, Xianglin; Chen, Hongbo; Yu, Ting; Sun, Nan; Rao, Junhua; Wang, Jiahao; Zhang, Wenwei; Chen, Ying; Liao, Sha; Jiang, Hui; Liu, Xin; Yang, Zhaopeng; Mu, Feng; Gao, Shangxian (2017). "A reference human genome dataset of the BGISEQ-500 sequencer". GigaScience. 6 (5): 1–9. doi:10.1093/gigascience/gix024. ISSN 2047-217X. PMC 5467036. PMID 28379488.
  2. Drmanac, R.; Sparks, A. B.; Callow, M. J.; Halpern, A. L.; Burns, N. L.; Kermani, B. G.; Carnevali, P.; Nazarenko, I.; et al. (2009). "Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays". Science. 327 (5961): 78–81. Bibcode:2010Sci...327...78D. doi:10.1126/science.1181498. PMID 19892942. S2CID 17309571.
  3. Porreca, Gregory J (2010). "Genome sequencing on nanoballs". Nature Biotechnology. 28 (1): 43–4. doi:10.1038/nbt0110-43. PMID 20062041. S2CID 54557996.
  4. "BGI-Shenzhen Completes Acquisition of Complete Genomics" (Press release). PR Newswire.
  5. "Revolocity™ Whole Genome Sequencing Technology Overview" (PDF). Complete Genomics. Retrieved 18 November 2017.
  6. Huang, J. (2017). "A reference human genome dataset of the BGISEQ-500 sequencer". GigaScience. 6 (5): 1–9. doi:10.1093/gigascience/gix024. PMC 5467036. PMID 28379488.
  7. Fullwood, M. J.; Wei, C.-L.; Liu, E. T.; Ruan, Y. (2009). "Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses". Genome Research. 19 (4): 521–32. doi:10.1101/gr.074906.107. PMC 3807531. PMID 19339662.
  8. Fehlmann, T. (2016). "cPAS-based sequencing on the BGISEQ-500 to explore small non-coding RNAs". Clin Epigenetics. 8: 123. doi:10.1186/s13148-016-0287-1. PMC 5117531. PMID 27895807.
  9. Muller, W. (1982). "Size Fractionation of DNA Fragments Ranging from 20 to 30000 Base Pairs by Liquid/Liquid Chromatography". Eur J Biochem. 128 (1): 231–238. doi:10.1111/j.1432-1033.1982.tb06956.x. PMID 7173204.
  10. Blanco, Luis; Bernad, Antonio; Lázaro, José M.; Martin, Gil; Garmendia, Cristina; Salas, Margarita (1989). "Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode of DNA replication". The Journal of Biological Chemistry. 264 (15): 8935–40. doi:10.1016/S0021-9258(18)81883-X. PMID 2498321.
  11. Chrisey, L.; Lee, GU; O'Ferrall, CE (1996). "Covalent attachment of synthetic DNA to self-assembled monolayer films". Nucleic Acids Research. 24 (15): 3031–9. doi:10.1093/nar/24.15.3031. PMC 146042. PMID 8760890.
  12. "An updated reference human genome dataset of the BGISEQ-500 sequencer". GigaDB. Retrieved 22 March 2017.
  13. Lee, William; Jiang, Zhaoshi; Liu, Jinfeng; Haverty, Peter M.; Guan, Yinghui; Stinson, Jeremy; Yue, Peng; Zhang, Yan; et al. (2010). "The mutation spectrum revealed by paired genome sequences from a lung cancer patient". Nature. 465 (7297): 473–7. Bibcode:2010Natur.465..473L. doi:10.1038/nature09004. PMID 20505728. S2CID 4354035.
  14. Roach, J. C.; Glusman, G.; Smit, A. F. A.; Huff, C. D.; Hubley, R.; Shannon, P. T.; Rowen, L.; Pant, K. P.; et al. (2010). "Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing". Science. 328 (5978): 636–9. Bibcode:2010Sci...328..636R. doi:10.1126/science.1186802. PMC 3037280. PMID 20220176.
  15. Speicher, Michael R; Geigl, Jochen B; Tomlinson, Ian P (2010). "Effect of genome-wide association studies, direct-to-consumer genetic testing, and high-speed sequencing technologies on predictive genetic counselling for cancer risk". The Lancet Oncology. 11 (9): 890–8. doi:10.1016/S1470-2045(09)70359-6. PMID 20537948.