Microfluidic whole genome haplotyping

Microfluidic whole genome haplotyping is a technique for the physical separation of individual chromosomes from a metaphase cell followed by direct resolution of the haplotype for each allele.

Background

Whole genome haplotyping

Whole genome haplotyping is the process of resolving personal haplotypes on a whole genome basis. [1] Current methods of next generation sequencing can identify heterozygous loci, but they are not well suited to determining which polymorphisms lie on the same (in cis) or the allelic (in trans) strand of DNA. Haplotype information contributes to the understanding of the potential functional effects of variants in cis or in trans. More often, haplotypes are resolved by inference, either through comparison with parental genotypes or from population samples using statistical methods that estimate linkage disequilibrium between markers. Direct haplotyping is possible through isolation of chromosomes or chromosome segments. Most molecular biology techniques for haplotyping can accurately determine haplotypes for only a limited region of the genome. Whole genome direct haplotyping resolves haplotypes at the whole genome level, usually through the isolation of individual chromosomes.
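The ambiguity that phasing resolves can be illustrated with a minimal Python sketch. It assumes unphased genotypes are given as pairs of alleles per locus; the function name and the example alleles are hypothetical. For k heterozygous loci, 2^(k-1) distinct haplotype pairs are consistent with the same genotype calls.

```python
from itertools import product

def consistent_haplotype_pairs(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    genotypes: list of 2-tuples of alleles, one tuple per locus.
    """
    pairs = set()
    # At each locus, choose which allele is placed on the first haplotype.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as one pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# Hypothetical example: three heterozygous loci.
unphased = [("A", "G"), ("C", "T"), ("G", "T")]
for hap1, hap2 in consistent_haplotype_pairs(unphased):
    print(hap1, hap2)
```

Genotype calls alone cannot distinguish among the pairs printed here; direct haplotyping identifies the one that is physically present.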

Haplotype

A haplotype (from Ancient Greek ἁπλόος (haplóos), “single, simple”) is a contiguous section of closely linked segments of DNA within the larger genome that tend to be inherited together as a unit on a single chromosome. Haplotypes have no defined size and can refer to anything from a few closely linked loci up to an entire chromosome. The term is also used to describe groups of single-nucleotide polymorphisms (SNPs) that are statistically associated. Much of the knowledge of SNP association comes from the International HapMap Project, which has proved a powerful resource by developing a publicly accessible database of human genetic variation.

Phasing

Phasing is the process of identifying the individual complement of homologous chromosomes. Methods for phasing include pedigree analysis, allele-specific PCR, linking emulsion PCR haplotype analysis, [2] polony PCR, [3] sperm typing, bacterial artificial chromosome cloning, construction of somatic cell hybrids, and atomic force microscopy, among others. Haplotype phasing can also be achieved through computational inference methods.
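Pedigree analysis, the first method listed above, can be sketched for a single locus in a parent-child trio. This is a simplified illustration, not a description of any particular tool; the function name and the genotype encoding are assumptions.

```python
def phase_by_trio(child, mother, father):
    """Phase one child genotype using parental genotypes at the same locus.

    Each genotype is a 2-tuple of alleles. Returns a tuple
    (maternal_allele, paternal_allele), or None when the trio cannot
    resolve the phase or is not consistent with Mendelian inheritance.
    """
    a, b = child
    if a == b:
        return (a, b)  # homozygous child: nothing to phase
    options = set()
    # Try both assignments of the child's alleles to the two parents.
    for maternal, paternal in ((a, b), (b, a)):
        if maternal in mother and paternal in father:
            options.add((maternal, paternal))
    return options.pop() if len(options) == 1 else None

print(phase_by_trio(("A", "G"), ("A", "A"), ("G", "G")))
print(phase_by_trio(("A", "G"), ("A", "G"), ("A", "G")))
```

The second call shows the main weakness of this approach: when the child and both parents are heterozygous, the locus stays unphased. Parental samples must also be available.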

Microfluidics

Microfluidics refers to the use of micro-sized channels on a micro-electro-mechanical system (MEMS). [4] Microfluidic channels have a diameter of 10–100 μm, making it possible to manipulate and analyze minute volumes. The technology combines engineering, physics, chemistry, biology, and optics. Over the past decades it has revolutionized micro- and nanoscale biology, genetics and proteomics. Microfluidic devices can combine several analytical steps into one device, which is why the technology is sometimes called "lab on a chip". Many current molecular biology methods use some form of MEMS, including microarray technology and next generation sequencing instruments.

Microfluidic direct deterministic phasing

Principle

Direct deterministic phasing of individual chromosomes can be achieved by isolating single chromosomes for genetic analysis through the use of a microfluidic device. [5]

Methods

Workflow of microfluidic whole genome chromosome isolation and amplification. Not to scale.

A single metaphase cell is isolated from solution. The chromosomes are then released from the nucleus, and the cytoplasm is digested enzymatically. Next, the chromosome suspension is directed towards multiple partitioning channels, into which the chromosomes are physically steered by a series of valves. In the first description of this technique, Fan et al. wrote a custom MATLAB program to control this process. Once separated, the chromosomes are prepared for amplification by sequential addition and washout of trypsin, denaturation buffer and neutralization solution. The DNA is then ready for further processing. Because the amount of DNA is so small, amplification must be performed using kits specialized for very small initial DNA quantities. The amplified DNA is flushed out of the microfluidic device and solubilized by the addition of a buffer. It can then be analyzed by various methods.

Once the chromosomes have been isolated and amplified, any molecular haplotyping method can be applied as long as the chromosomes remain distinct. This can be accomplished by keeping them physically separated, or by identifying each sample through genotyping. Once each chromosome has been identified, each pair of homologs can be assigned to one of two haploid genomes.
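The bookkeeping in this last step can be sketched in Python. This is a simplified illustration and not the analysis used in the original publication. It assumes each amplified chamber has already been genotyped, that the chromosome of every SNP is known, and that a chamber holding more than one chromosome is simply skipped. All names and identifiers are hypothetical.

```python
from collections import defaultdict

def assign_homologs(chamber_calls, snp_chromosome):
    """Group per-chamber allele calls into two haploid genomes.

    chamber_calls: {chamber_id: {snp_id: allele}} from each amplified chamber
    snp_chromosome: {snp_id: chromosome name}
    Returns (haploid_a, haploid_b), each {chromosome: {snp_id: allele}}.
    """
    by_chromosome = defaultdict(list)
    for chamber_id in sorted(chamber_calls):
        calls = chamber_calls[chamber_id]
        if not calls:
            continue  # empty chamber or failed amplification
        chromosomes = {snp_chromosome[snp] for snp in calls}
        if len(chromosomes) != 1:
            continue  # more than one chromosome in this chamber: skip it
        by_chromosome[chromosomes.pop()].append(calls)

    haploid_a, haploid_b = {}, {}
    for chrom, homologs in by_chromosome.items():
        if len(homologs) != 2:
            continue  # both homologs are needed to form a pair
        haploid_a[chrom], haploid_b[chrom] = homologs
    return haploid_a, haploid_b

# Hypothetical calls from four chambers.
calls = {
    "c1": {"snp1": "A", "snp2": "C"},
    "c2": {"snp1": "G", "snp2": "T"},
    "c3": {"snp3": "T"},
    "c4": {},
}
snp_chr = {"snp1": "chr1", "snp2": "chr1", "snp3": "chr2"}
hap_a, hap_b = assign_homologs(calls, snp_chr)
print(hap_a, hap_b)
```

Which homolog lands in which haploid set is arbitrary for each chromosome. Phase is resolved within a chromosome, but parental origin across chromosomes would need additional information.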

Applications

Microfluidic direct deterministic phasing allows all the chromosomes to be isolated in the same experiment. This feature suggests possible applications in clinical, research and personal genomics. Possible clinical applications include phasing of multiple mutations when parental samples are unavailable, preimplantation genetic diagnosis, prenatal diagnosis and the characterization of cancer cells.

Whole genome haplotyping through microfluidics could increase the rate of discovery within the HapMap project, and provides an opportunity for corroboration and error detection within the existing database. It could further inform genetic association studies.

As methods for amplification of small amounts of DNA improve, single chromosome sequencing is possible using microfluidics to separate each individual chromosome. A cost-effective approach may be to barcode each individual chromosome and perform parallel resequencing of the entire individual genome. The amplification of each chromosome separately also provides a mechanism to potentially fill in some of the gaps that remain in the human reference genome. Single chromosome sequencing will allow for unmapped sequences to be associated with a single chromosome. Additionally, single chromosome sequencing will be more accurate in the identification of copy number variants and repetitive sequences.

Limitations

As of January 2011, only one publication has described use of this technique. [5] The scientific community awaits further validation of this method and of its efficacy in isolating and amplifying analyzable amounts of DNA. While this method does streamline the process of chromosome isolation, certain steps, such as the initial isolation of a metaphase cell, remain difficult and labour intensive. Automated techniques for metaphase cell separation would improve throughput. In addition, this method is only applicable to cells in metaphase, which inherently limits the technique to cell types and tissues that undergo mitosis. Single cell analysis does not account for the possibility of mosaicism; therefore, applications in cancer diagnosis and research would necessarily require processing of multiple cells. Finally, since this entire process is based on amplification from a single cell, the accuracy of any genetic analysis is limited by the ability of commercially available platforms to produce sufficient amounts of unbiased and error-free amplicon.

Alternative methods of whole genome haplotyping

Chromosome microdissection

Chromosome microdissection is another process for isolating single chromosomes for genetic analysis. As with the above technique, microdissection begins with metaphase cells. The nucleus is lysed mechanically on a glass slide and part of the genetic material is partitioned under a microscope. The actual microdissection of genetic material was initially accomplished through the careful use of a fine needle; today, computer-directed lasers are available. The genomic area isolated can range from part of a single chromosome up to several chromosomes. To accomplish whole genome haplotyping, the microdissected genomic section is amplified and genotyped or sequenced. As with the microfluidic technique, specialized amplification platforms are necessary to address the problem of a small initial DNA sample. [6] [7] [8]

Large insert cloning

Randomly partitioning a complete diploid fosmid library into pools of equal size presents an alternative method for haplotype phasing. In the proof of principle description of this technique, [9] 115 pools were created, each containing ~5000 unique clones from the original fosmid library and covering roughly 3% of the genome. Because each pool covers so little of the genome and each clone is a random sample of the diploid genome, the DNA a pool contains at any given locus comes from a single homolog 99.1% of the time. Amplification and analysis of each pool provide haplotype resolution limited only by the size of the fosmid insert.
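The pool size quoted above can be checked with a short back-of-the-envelope calculation. The clone and pool counts come from the text. The insert size (about 40 kb, typical for fosmids) and the diploid genome size (about 6 Gb) are assumptions added here for illustration, and the sketch does not attempt to reproduce the reported 99.1% figure.

```python
CLONES_PER_POOL = 5000      # from the text (~5000 clones per pool)
POOLS = 115                 # from the text
INSERT_BP = 40_000          # assumption: typical fosmid insert of about 40 kb
DIPLOID_GENOME_BP = 6.0e9   # assumption: about 2 x 3 Gb for a human diploid genome

# Total DNA carried by the clones in one pool.
pool_bp = CLONES_PER_POOL * INSERT_BP
# Share of the diploid genome represented in one pool.
pool_fraction = pool_bp / DIPLOID_GENOME_BP
# Physical coverage of the diploid genome summed over all pools.
library_coverage = POOLS * pool_fraction

print(f"DNA per pool: {pool_bp / 1e6:.0f} Mb")
print(f"fraction of diploid genome per pool: {pool_fraction:.1%}")
print(f"physical coverage across all pools: {library_coverage:.1f}x")
```

Under these assumptions each pool holds about 200 Mb of insert DNA, in line with the roughly 3% per pool stated above. That sparseness is what makes it unlikely for clones from both homologs to overlap at the same locus within a pool.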

References

  1. The next phase in human genetics. Bansal V, et al. Nat Biotechnol. 2011 Jan;29(1):38-9.
  2. Linking emulsion PCR haplotype analysis. Wetmur JG, Chen J. Methods Mol Biol. 2011;687:165-75. PMID 20967607.
  3. Long-range polony haplotyping of individual human chromosome molecules. Zhang K, et al. Nat Genet. 2006;38:382-87.
  4. Microfluidics for biological applications. Finehout E, Tian WC. Springer US. 2009.
  5. Whole-genome molecular haplotyping of single cells. Fan HC, et al. Nat Biotechnol. 2011.
  6. Whole genome amplification from microdissected chromosomes. Hockner M, et al. Cytogenetic and Genome Research. 2009;125:98-102.
  7. Direct determination of molecular haplotypes by chromosome microdissection. Ma L, et al. Nature Methods. 7(4):299-301.
  8. Chromosome-specific segmentation revealed by structural analysis of individually isolated chromosomes. Kitada K, et al. Genes, Chromosomes and Cancer. 2011 Apr;50(4):217-227.
  9. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Kitzman JO, et al. Nature Biotechnology. 29(1):59-63.