Multiple Annealing and Looping Based Amplification Cycles (MALBAC) is a quasilinear whole genome amplification method. Unlike conventional DNA amplification methods that are non-linear or exponential (in each cycle, DNA copied can serve as template for subsequent cycles), MALBAC utilizes special primers that allow amplicons to have complementary ends and therefore to loop, preventing DNA from being copied exponentially. This results in amplification of only the original genomic DNA and therefore reduces amplification bias. MALBAC is “used to create overlapped shotgun amplicons covering most of the genome”. [1] For next generation sequencing, MALBAC is followed by regular PCR which is used to further amplify amplicons.
Prior to MALBAC, a single cell is isolated by one of several methods, including laser capture microdissection, microfluidic devices, flow cytometry, or micropipetting, and is then lysed. MALBAC single-cell whole-genome amplification involves 5 cycles of quenching, extending, melting, and looping.
The major advantage of MALBAC is that DNA is amplified almost linearly. Specialized primers enable amplicons to loop, which prevents them from being amplified further in subsequent cycles of MALBAC. These primers are 35 nucleotides long, with 8 variable nucleotides that hybridize to the templates and 27 common nucleotides. [1] The common nucleotide sequence is GTG AGT GAT GGT TGA GGT AGT GTG GAG. The 8 variable nucleotides anneal randomly to the single stranded genomic DNA molecule. After one extension, a semi-amplicon is made: an amplicon containing the common nucleotide sequence on only the 5’ end. This semi-amplicon is used as a template for another round of extension, which results in a full-amplicon, an amplicon whose 3’ end is complementary to the sequence on its 5’ end.
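The primer layout and the looping condition described above can be sketched in a few lines of Python. This is a minimal illustration, not a primer design tool: the function names, the fixed random seed, and the toy insert sequence are assumptions made for the example, and the toy full-amplicon leaves out the 8 variable nucleotides next to each common sequence. Only the 27-nucleotide common sequence and the 27 + 8 layout come from the text.

```python
import random

# Common 27-nt sequence quoted in the text (spaces removed).
COMMON = "GTGAGTGATGGTTGAGGTAGTGTGGAG"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def make_primer(rng: random.Random) -> str:
    """Build one 35-nt primer: 27 common nt followed by 8 random nt."""
    variable = "".join(rng.choice("ACGT") for _ in range(8))
    return COMMON + variable


def full_amplicon(insert: str) -> str:
    """Toy full-amplicon: common sequence at the 5' end and its
    reverse complement at the 3' end (variable bases omitted)."""
    return COMMON + insert + reverse_complement(COMMON)


def ends_can_loop(amplicon: str) -> bool:
    """True if the 3' end is the reverse complement of the 5' end."""
    n = len(COMMON)
    return amplicon[-n:] == reverse_complement(amplicon[:n])


rng = random.Random(0)  # hypothetical seed, only for reproducibility
primer = make_primer(rng)
amplicon = full_amplicon("ACGTTGCA" * 5)  # hypothetical genomic insert
print(len(primer), ends_can_loop(amplicon))
```

A semi-amplicon, which carries the common sequence only at its 5’ end, fails the `ends_can_loop` check, which matches the statement that only full-amplicons loop.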
MALBAC primers have variable components which allow them to randomly bind to the template DNA. This means that on a single fragment at any cycle, there could be multiple primers annealed to the fragment. A DNA polymerase such as one derived from Bacillus stearothermophilus (Bst polymerase) is able to displace the 5’ end of another upstream strand growing in the same direction. [2]
Bst DNA polymerase has an error rate of 1/10000 bases. [3]
At the end of PCR, picograms of genetic material have been amplified to micrograms of DNA, yielding enough DNA to be sequenced.
MALBAC offers an unbiased approach to the amplification of DNA from a single cell. This method of single cell sequencing has a vast number of applications, many of which have yet to be exploited. MALBAC may aid in the analysis of forensic specimens, in pre-natal screening for genetic diseases, in understanding the development of reproductive cells, or in elucidating the complexity of a tumour. [1] [4] At its foundation, this technology allows researchers to observe the frequency with which mutations accumulate in single cells. [1] Moreover, it permits the detection of chromosomal abnormalities and gene copy number variations (CNVs) within and between cells, and further facilitates the detection of uncommon mutations that result in single nucleotide polymorphisms (SNPs). [1]
In the field of cancer research, MALBAC has many applications. It may be used to examine intratumor heterogeneity, to identify genes which may confer an aggressive or metastatic phenotype, or to evaluate the potential for a tumour to develop drug resistance. [4] [5] A pioneering application of MALBAC was published in a December 2012 issue of Science and described the use of this technology to measure the mutation rate of the colon cancer cell line SW480. [1] By sequencing the amplified DNA of three kindred colon cancer cells in parallel with unrelated colon cancer cells from a different lineage, SNPs were identified with no false positives detected. [1] It was also observed that purine-pyrimidine transversions occurred at a high frequency among the SNPs. [1] The characterization of copy number and single nucleotide variations of single colon cancer cells highlighted the heterogeneity present within a tumour. [1]
MALBAC has been applied as a method to examine the genetic diversity amongst reproductive cells. By sequencing the genomes of 99 individual human sperm cells from an anonymous donor, MALBAC was used to examine genetic recombination events involving single gametes and ultimately provide insight into the dynamics of genetic recombination and its contribution to male infertility. [6] Additionally, within an individual sperm, MALBAC identified duplicated or missing chromosomes, as well as SNPs or CNVs which could negatively affect fertility. [6]
MALBAC has resulted in many significant advances over other single cell sequencing techniques, foremost that it can report 93% of the genome of a single human cell. [1] Some advantages of this technology include reduced amplification bias and increased genome coverage, the requirement for very little template DNA, and low rates of false positive and false negative mutations. [4] [6]
MALBAC is a form of whole genome amplification which reduces the bias associated with exponential PCR amplification by using a quasilinear phase of pre-amplification. [1] MALBAC utilizes five cycles of pre-amplification and primers containing a 27 nucleotide common sequence and an 8 nucleotide variable sequence to produce fragments of amplified DNA (amplicons) which loop back on themselves to prevent additional copying and cross-hybridization. [1] [7] These loops cannot be used as a template for amplification during MALBAC and therefore reduce the amplification bias commonly associated with the uneven exponential amplification of DNA fragments by polymerase chain reaction. [1] MALBAC has been described as having better amplification uniformity than other methods of single cell sequencing, such as multiple displacement amplification (MDA). [1] [5] MDA does not utilize DNA looping and amplifies DNA in an exponential fashion, resulting in bias. [1] Accordingly, the amplification bias associated with other single cell sequencing methods results in low coverage of the genome. [1] [5] The reduced bias associated with MALBAC has generated better genome sequence coverage than other single cell sequencing methods.
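The difference between looping pre-amplification and exponential copying can be made concrete with a toy counting model. The sketch below rests on strong simplifying assumptions that are not taken from the cited studies: a single genomic template, exactly one copy made per available template per cycle, and perfect looping of every full amplicon. The function names are hypothetical. In a real reaction many primers anneal to each template, so absolute numbers are far larger; only the growth pattern is illustrated.

```python
from typing import Tuple


def malbac_toy(cycles: int) -> Tuple[int, int]:
    """Return (semi_amplicons, full_amplicons) after `cycles` cycles.

    Assumes one genomic template and one copy per template per cycle.
    """
    semi, full = 0, 0
    for _ in range(cycles):
        new_full = semi  # each existing semi-amplicon yields one full amplicon
        new_semi = 1     # the genomic template yields one new semi-amplicon
        semi += new_semi
        full += new_full  # full amplicons loop and are never copied again
    return semi, full


def exponential_toy(cycles: int) -> int:
    """Molecule count when every product is also a template (start with one)."""
    return 2 ** cycles


for n in (5, 20):
    print(n, malbac_toy(n), exponential_toy(n))
```

Under these assumptions the looping scheme grows polynomially with cycle number while the exponential scheme doubles each cycle, so any small difference in per-cycle efficiency between loci compounds far less in the looping scheme.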
MALBAC can be used to amplify and subsequently sequence DNA when only one or a few cells are available, such as in the analysis of circulating tumour cells, pre-natal screens or forensic samples. [4] [7] Only a small amount of starting template (picograms of DNA) is required to initiate the process, and therefore it is an ideal method for the sequencing of a single human cell. [1]
Single cell sequencing often has a high rate of false negative mutations. [1] A false negative mutation rate is defined as the probability of not detecting a real mutation, and this may occur due to amplification bias resulting from the loss, or drop-out, of an allele. [8] The sequence coverage uniformity of MALBAC in comparison to other single cell sequencing techniques has enhanced the detection of SNPs and reduced the allele dropout rate. [1] Allelic dropout occurs when one allele of a heterozygote fails to amplify, resulting in identification of a ‘false homozygote.’ This may occur due to a low concentration of DNA template, or the uneven amplification of template resulting in one allele of a heterozygote being copied more than the other. [8] The allele dropout rate of MALBAC has been shown to be much lower (approximately 1%) than that of MDA (approximately 65%). Whereas MDA has been shown to have a 41% SNP detection efficiency in comparison with bulk sequencing, MALBAC has been reported to have an SNP detection efficiency of 76%. [1] MALBAC has also been reported to have a low false positive rate. False positive mutations generated by MALBAC largely result from errors introduced by DNA polymerase during the first cycle of amplification that are further propagated during subsequent cycles. These false positives can be eliminated by sequencing 2-3 cells within a lineage derived from a single cell to verify the presence of a SNP, and by sequencing unrelated cells from a separate lineage to rule out sequencing and amplification errors. [1]
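The kindred-cell filtering strategy amounts to simple set logic: keep a candidate variant only if every kindred cell reports it, and discard it if any unrelated cell also reports it. The following Python sketch illustrates that logic only. The function name, the tuple representation of a variant, and all of the example calls are hypothetical and are not data from the cited study.

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str]  # (chromosome, position, alternate base)


def filter_variants(
    kindred: Iterable[Set[Variant]],
    unrelated: Iterable[Set[Variant]],
) -> Set[Variant]:
    """Keep variants seen in all kindred cells and in no unrelated cell."""
    kindred = list(kindred)
    if not kindred:
        raise ValueError("at least one kindred cell is required")
    shared = set.intersection(*map(set, kindred))
    background = set().union(*map(set, unrelated))
    return shared - background


# Hypothetical calls from three kindred cells and one unrelated cell.
kindred_calls = [
    {("chr1", 100, "T"), ("chr2", 200, "G"), ("chr3", 300, "A")},
    {("chr1", 100, "T"), ("chr3", 300, "A")},
    {("chr1", 100, "T"), ("chr3", 300, "A"), ("chr4", 400, "C")},
]
unrelated_calls = [{("chr3", 300, "A")}]

print(filter_variants(kindred_calls, unrelated_calls))
```

In this example the calls unique to one cell are dropped as likely amplification errors, and the call shared with the unrelated cell is dropped as a likely systematic artifact, leaving a single confirmed variant.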