Bisulfite [1] sequencing (also known as bisulphite sequencing) is the use of bisulfite treatment of DNA before routine sequencing to determine the pattern of methylation. DNA methylation was the first discovered epigenetic mark, and remains the most studied. In animals it predominantly involves the addition of a methyl group to the carbon-5 position of cytosine residues of the dinucleotide CpG, and is implicated in repression of transcriptional activity.
Treatment of DNA with bisulfite converts cytosine residues to uracil, but leaves 5-methylcytosine residues unaffected. Therefore, in DNA that has been treated with bisulfite, the only cytosines that remain are methylated ones. Thus, bisulfite treatment introduces specific changes in the DNA sequence that depend on the methylation status of individual cytosine residues, yielding single-nucleotide resolution information about the methylation status of a segment of DNA. Various analyses can be performed on the altered sequence to retrieve this information. The objective of this analysis is therefore reduced to differentiating between single nucleotide polymorphisms (cytosines and thymines) resulting from bisulfite conversion (Figure 1).
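The conversion rule can be illustrated in silico. The following is a minimal sketch, assuming complete conversion and a known set of methylated positions; the function name `bisulfite_convert` and the example sequence are hypothetical and chosen only for illustration.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the sequence as it reads after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U and amplified as T. A C whose
    0-based index is listed in methylated_positions is left as C.
    Assumes complete conversion (an idealization).
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


original = "ACGTCCGA"
# Only the cytosine at index 1 is treated as methylated in this example.
converted = bisulfite_convert(original, methylated_positions={1})
print(original, "->", converted)
```

Every cytosine that survives in the output marks a methylated position, which is the information the downstream analyses recover.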
Bisulfite sequencing applies routine sequencing methods on bisulfite-treated genomic DNA to determine methylation status at CpG dinucleotides. Other non-sequencing strategies are also employed to interrogate the methylation at specific loci or at a genome-wide level. All strategies assume that bisulfite-induced conversion of unmethylated cytosines to uracil is complete, and this serves as the basis of all subsequent techniques. Ideally, the method used would determine the methylation status separately for each allele. Alternative methods to bisulfite sequencing include Combined Bisulphite Restriction Analysis and methylated DNA immunoprecipitation (MeDIP).
Methodologies to analyze bisulfite-treated DNA are continuously being developed. To summarize these rapidly evolving methodologies, numerous review articles have been written. [2] [3] [4] [5]
The methodologies can be generally divided into strategies based on methylation-specific PCR (MSP) (Figure 4), and strategies employing polymerase chain reaction (PCR) performed under non-methylation-specific conditions (Figure 3). Microarray-based methods also use PCR under non-methylation-specific conditions.
The first reported method of methylation analysis using bisulfite-treated DNA utilized PCR and standard dideoxynucleotide DNA sequencing to directly determine the nucleotides resistant to bisulfite conversion. [6] Primers are designed to be strand-specific as well as bisulfite-specific (i.e., primers containing non-CpG cytosines such that they are not complementary to non-bisulfite-treated DNA), flanking (but not involving) the methylation site of interest. Therefore, it will amplify both methylated and unmethylated sequences, in contrast to methylation-specific PCR. All sites of unmethylated cytosines are displayed as thymines in the resulting amplified sequence of the sense strand, and as adenines in the amplified antisense strand. By incorporating high throughput sequencing adaptors into the PCR primers, PCR products can be sequenced with massively parallel sequencing. Alternatively, and labour-intensively, PCR product can be cloned and sequenced. Nested PCR methods can be used to enhance the product for sequencing.
All subsequent DNA methylation analysis techniques using bisulfite-treated DNA are based on this report by Frommer et al. (Figure 2). [6] Although most other modalities are not true sequencing-based techniques, the term "bisulfite sequencing" is often used to describe bisulfite-conversion DNA methylation analysis techniques in general.
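The read-out step of direct sequencing described above can be sketched as a comparison between the untreated reference and the sequenced sense strand. This is an illustration only: it assumes an ungapped read of the same length as the reference and complete conversion, and the name `call_cpg_methylation` is hypothetical.

```python
def call_cpg_methylation(reference, read):
    """Call methylation at each CpG cytosine of the sense strand.

    reference: untreated genomic sequence.
    read: sequence of the bisulfite-treated, PCR-amplified sense strand.
    Returns {position: True (methylated), False (unmethylated), or
    None (uninformative base)} keyed by the 0-based index of the C.
    """
    if len(reference) != len(read):
        raise ValueError("this sketch expects an ungapped read of equal length")
    ref, rd = reference.upper(), read.upper()
    calls = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            if rd[i] == "C":
                calls[i] = True    # resisted conversion
            elif rd[i] == "T":
                calls[i] = False   # converted, so it was unmethylated
            else:
                calls[i] = None    # sequencing error or a true variant
    return calls


calls = call_cpg_methylation("ACGTCCGA", "ACGTTTGA")
print(calls)
```

For a read from the amplified antisense strand, the same logic would apply to G-to-A differences instead of C-to-T.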
Pyrosequencing has also been used to analyze bisulfite-treated DNA without using methylation-specific PCR. [7] [8] Following PCR amplification of the region of interest, pyrosequencing is used to determine the bisulfite-converted sequence of specific CpG sites in the region. The ratio of C-to-T at individual sites can be determined quantitatively based on the amount of C and T incorporation during the sequence extension. The main limitation of this method is the cost of the technology. However, pyrosequencing lends itself well to extension to high-throughput screening methods.
A variant of this technique, described by Wong et al., uses allele-specific primers that incorporate single-nucleotide polymorphisms into the sequence of the sequencing primer, thus allowing for separate analysis of maternal and paternal alleles. [9] This technique is of particular usefulness for genomic imprinting analysis.
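The quantitation used in pyrosequencing reduces to a ratio of incorporation signals at each CpG site. A minimal sketch follows, assuming the C and T signals are already background-corrected and proportional to template abundance; the function name and input values are hypothetical.

```python
def percent_methylation(c_signal, t_signal):
    """Percent methylation at one CpG from C and T incorporation signals."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no signal at this site")
    return 100.0 * c_signal / total


# Hypothetical signals for three CpG sites in one amplicon.
site_signals = [(30, 70), (0, 50), (45, 5)]
levels = [percent_methylation(c, t) for c, t in site_signals]
print(levels)
```

In the allele-specific variant, the same calculation would be run separately on the signals obtained with each allele-specific sequencing primer.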
Methylation-sensitive single-strand conformation analysis (MS-SSCA) is based on the single-strand conformation polymorphism analysis (SSCA) method developed for single-nucleotide polymorphism (SNP) analysis. [10] SSCA differentiates between single-stranded DNA fragments of identical size but distinct sequence based on differential migration in non-denaturing electrophoresis. In MS-SSCA, this is used to distinguish between bisulfite-treated, PCR-amplified regions containing the CpG sites of interest. Although SSCA lacks sensitivity when only a single nucleotide difference is present, bisulfite treatment frequently makes a number of C-to-T conversions in most regions of interest, and the resulting sensitivity approaches 100%. MS-SSCA also provides semi-quantitative analysis of the degree of DNA methylation based on the ratio of band intensities. However, this method is designed to assess all CpG sites as a whole in the region of interest rather than individual methylation sites.
A further method to differentiate converted from unconverted bisulfite-treated DNA is using high-resolution melting analysis (HRM), a quantitative PCR-based technique initially designed to distinguish SNPs. [11] The PCR amplicons are analyzed directly by temperature ramping and resulting liberation of an intercalating fluorescent dye during melting. The degree of methylation, as represented by the C-to-T content in the amplicon, determines the rapidity of melting and consequent release of the dye. This method allows direct quantitation in a single-tube assay, but assesses methylation in the amplified region as a whole rather than at specific CpG sites.
MS-SnuPE employs the primer extension method initially designed for analyzing single-nucleotide polymorphisms. [12] DNA is bisulfite-converted, and bisulfite-specific primers are annealed to the sequence up to the base pair immediately before the CpG of interest. The primer is allowed to extend by one base into the C (or T) using DNA polymerase and chain-terminating dideoxynucleotides, and the ratio of C to T is determined quantitatively.
A number of methods can be used to determine this C:T ratio. Initially, MS-SnuPE relied on radioactive ddNTPs as the reporter of the primer extension. Fluorescence-based methods or pyrosequencing can also be used. [13] Alternatively, matrix-assisted laser desorption ionization/time-of-flight (MALDI-TOF) mass spectrometry can be used to differentiate between the two polymorphic primer extension products, in essence based on the GOOD assay designed for SNP genotyping. Ion pair reverse-phase high-performance liquid chromatography (IP-RP-HPLC) has also been used to distinguish primer extension products. [14]
A recently described method by Ehrich et al. further takes advantage of bisulfite-conversions by adding a base-specific cleavage step to enhance the information gained from the nucleotide changes. [15] By first using in vitro transcription of the region of interest into RNA (by adding an RNA polymerase promoter site to the PCR primer in the initial amplification), RNase A can be used to cleave the RNA transcript at base-specific sites. As RNase A cleaves RNA specifically at cytosine and uracil ribonucleotides, base-specificity is achieved by incorporating cleavage-resistant dTTP when cytosine-specific (C-specific) cleavage is desired, and incorporating dCTP when uracil-specific (U-specific) cleavage is desired. The cleaved fragments can then be analyzed by MALDI-TOF. Bisulfite treatment results in either introduction/removal of cleavage sites by C-to-U conversions or shift in fragment mass by G-to-A conversions in the amplified reverse strand. C-specific cleavage will cut specifically at all methylated CpG sites. By analyzing the sizes of the resulting fragments, it is possible to determine the specific pattern of DNA methylation of CpG sites within the region, rather than determining the extent of methylation of the region as a whole. This method demonstrated efficacy for high-throughput screening, allowing for interrogation of numerous CpG sites in multiple tissues in a cost-efficient manner.
This alternative method of methylation analysis also uses bisulfite-treated DNA but avoids the need to sequence the area of interest. [16] Instead, the primer pairs are themselves designed to be "methylated-specific" by including sequences complementing only unconverted 5-methylcytosines, or, conversely, "unmethylated-specific", complementing thymines converted from unmethylated cytosines. Methylation is determined by the ability of the specific primer to achieve amplification. This method is particularly useful to interrogate CpG islands with possibly high methylation density, as increased numbers of CpG pairs in the primer increase the specificity of the assay. Placing the CpG pair at the 3'-end of the primer also improves the sensitivity. The initial report using MSP described sufficient sensitivity to detect methylation of 0.1% of alleles. In general, MSP and its related protocols are considered to be the most sensitive when interrogating the methylation status at a specific locus.
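The difference between the two primer sets can be made concrete by deriving the converted template at a primer site under both methylation states. The sketch below is illustrative only: it assumes methylation occurs solely at CpG cytosines, considers only the top strand, and uses a hypothetical function name and sequence. A forward primer would match the corresponding converted sequence.

```python
def converted_primer_sites(site):
    """Return (methylated, unmethylated) converted top-strand sequences.

    Non-CpG cytosines read as T in both versions. CpG cytosines read
    as C when methylated and as T when unmethylated. A C at the very
    end of the site is treated as non-CpG because its neighbour is
    not visible in this sketch.
    """
    s = site.upper()
    methylated, unmethylated = [], []
    for i, base in enumerate(s):
        if base != "C":
            methylated.append(base)
            unmethylated.append(base)
            continue
        in_cpg = s[i + 1:i + 2] == "G"
        methylated.append("C" if in_cpg else "T")
        unmethylated.append("T")
    return "".join(methylated), "".join(unmethylated)


m_site, u_site = converted_primer_sites("TCGACCGTACG")
# Positions where the two primers differ are the ones that discriminate.
discriminating = sum(a != b for a, b in zip(m_site, u_site))
print(m_site, u_site, discriminating)
```

The count of differing positions shows why primers spanning more CpG pairs are more specific: each additional CpG adds one more mismatch against the wrong template.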
The MethyLight method is based on MSP, but provides a quantitative analysis using quantitative PCR. [17] Methylated-specific primers are used, and a methylated-specific fluorescence reporter probe is also used that anneals to the amplified region. In alternative fashion, the primers or probe can be designed without methylation specificity if discrimination is needed between the CpG pairs within the involved sequences. Quantitation is made in reference to a methylated reference DNA. A modification to this protocol to increase the specificity of the PCR for successfully bisulfite-converted DNA (ConLight-MSP) uses an additional probe to bisulfite-unconverted DNA to quantify this non-specific amplification. [18]
Further methodology using MSP-amplified DNA analyzes the products using melting curve analysis (Mc-MSP). [19] This method amplifies bisulfite-converted DNA with both methylated-specific and unmethylated-specific primers, and determines the quantitative ratio of the two products by comparing the differential peaks generated in a melting curve analysis. A high-resolution melting analysis method that uses both quantitative PCR and melting analysis has been introduced, in particular for sensitive detection of low-level methylation. [20]
Microarray-based methods are a logical extension of the technologies available to analyze bisulfite-treated DNA to allow for genome-wide analysis of methylation. [21] Oligonucleotide microarrays are designed using pairs of oligonucleotide hybridization probes targeting CpG sites of interest. One is complementary to the unaltered methylated sequence, and the other is complementary to the C-to-U-converted unmethylated sequence. The probes are also bisulfite-specific to prevent binding to DNA incompletely converted by bisulfite. The Illumina Methylation Assay is one such assay that applies the bisulfite sequencing technology on a microarray level to generate genome-wide methylation data.
Bisulfite sequencing is used widely across mammalian genomes; however, complications have arisen with the discovery of a new mammalian DNA modification, 5-hydroxymethylcytosine. [22] [23] 5-Hydroxymethylcytosine converts to cytosine-5-methylsulfonate upon bisulfite treatment, which then reads as a C when sequenced. [24] Therefore, bisulfite sequencing cannot discriminate between 5-methylcytosine and 5-hydroxymethylcytosine. This means that the output from bisulfite sequencing can no longer be defined as solely DNA methylation, as it is the composite of 5-methylcytosine and 5-hydroxymethylcytosine. Tet-assisted oxidative bisulfite sequencing, developed by Chuan He at the University of Chicago, can now distinguish between the two modifications at single-base resolution. [25]
Bisulfite sequencing relies on the conversion of every single unmethylated cytosine residue to uracil. If conversion is incomplete, the subsequent analysis will incorrectly interpret the unconverted unmethylated cytosines as methylated cytosines, resulting in false positive results for methylation. Only cytosines in single-stranded DNA are susceptible to attack by bisulfite; therefore, denaturation of the DNA undergoing analysis is critical. [2] It is important to ensure that reaction parameters such as temperature and salt concentration are suitable to maintain the DNA in a single-stranded conformation and allow for complete conversion. Embedding the DNA in agarose gel has been reported to improve the rate of conversion by keeping strands of DNA physically separate. [26] Incomplete conversion rates can be estimated and adjusted for after sequencing by including an internal control in the sequencing library, such as lambda phage DNA (which is known to be unmethylated), or by aligning bisulfite sequencing reads to a known unmethylated region in the organism, such as the chloroplast genome. [27]
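The control-based estimate and adjustment can be sketched as follows. This is one simple correction model, not a standard prescribed by the sources cited here: it assumes that methylated cytosines are never converted and that the conversion rate measured on the unmethylated control applies uniformly to the sample. Function names and read counts are hypothetical.

```python
def conversion_rate(control_c_reads, control_t_reads):
    """Fraction of control cytosines read as T (i.e. converted)."""
    total = control_c_reads + control_t_reads
    if total <= 0:
        raise ValueError("no reads over control cytosines")
    return control_t_reads / total


def adjust_methylation(observed, rate):
    """Correct an observed methylation fraction for incomplete conversion.

    Model: observed = true + (1 - true) * (1 - rate), because a
    fraction (1 - rate) of unmethylated cytosines still reads as C.
    Solving for true gives (observed - (1 - rate)) / rate.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("conversion rate must be in (0, 1]")
    corrected = (observed - (1.0 - rate)) / rate
    return min(1.0, max(0.0, corrected))  # clip sampling noise


rate = conversion_rate(control_c_reads=5, control_t_reads=995)
print(rate, adjust_methylation(0.50, rate))
```

With a conversion rate close to 1 the correction is small, which is why the control is mainly used to flag libraries whose conversion failed.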
A major challenge in bisulfite sequencing is the degradation of DNA that takes place concurrently with the conversion. The conditions necessary for complete conversion, such as long incubation times, elevated temperature, and high bisulfite concentration, can lead to the degradation of about 90% of the incubated DNA. [28] Given that the starting amount of DNA is often limited, such extensive degradation can be problematic. The degradation occurs as depurinations resulting in random strand breaks. [29] Therefore, the longer the desired PCR amplicon, the more limited the number of intact template molecules will likely be. This could lead to the failure of the PCR amplification, or the loss of quantitatively accurate information on methylation levels resulting from the limited sampling of template molecules. Thus, it is important to assess the amount of DNA degradation resulting from the reaction conditions employed, and consider how this will affect the desired amplicon. Techniques can also be used to minimize DNA degradation, such as cycling the incubation temperature. [29]
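The relationship between amplicon length and the number of intact templates can be illustrated with a simple model. The sketch below assumes that strand breaks occur independently at each base with the same probability, which is a simplification introduced here and not taken from the cited reports; the function name and the break probability are hypothetical.

```python
def intact_fraction(amplicon_length, break_prob_per_base):
    """Expected fraction of templates with no break inside the amplicon.

    Assumes independent breaks with equal probability at every base.
    """
    if not 0.0 <= break_prob_per_base <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return (1.0 - break_prob_per_base) ** amplicon_length


# Hypothetical break probability of 1% per base after treatment.
for length in (100, 200, 400):
    print(length, intact_fraction(length, 0.01))
```

Under this model the intact fraction falls geometrically with length, which is consistent with the advice to keep amplicons short when template is limiting.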
In 2020, New England Biolabs developed NEBNext Enzymatic Methyl-seq, an alternative enzymatic approach to minimize DNA damage. [30]
A potentially significant problem following bisulfite treatment is incomplete desulfonation of pyrimidine residues due to inadequate alkalization of the solution. This may inhibit some DNA polymerases, rendering subsequent PCR difficult. However, this situation can be avoided by monitoring the pH of the solution to ensure that desulfonation will be complete. [2]
A final concern is that bisulfite treatment greatly reduces the level of complexity in the sample, which can be problematic if multiple PCR reactions are to be performed. [5] Primer design is more difficult, and inappropriate cross-hybridization is more frequent.
The advances in bisulfite sequencing have led to the possibility of applying them at a genome-wide scale, where, previously, global measurement of DNA methylation was feasible only using other techniques, such as restriction landmark genomic scanning. The mapping of the human epigenome is seen by many scientists as the logical follow-up to the completion of the Human Genome Project. [31] [32] This epigenomic information will be important in understanding how the function of the genetic sequence is implemented and regulated. Since the epigenome is less stable than the genome, it is thought to be important in gene-environment interactions. [33]
Epigenomic mapping is inherently more complex than genome sequencing, however, since the epigenome is much more variable than the genome. One's epigenome varies with age, differs between tissues, is altered by environmental factors, and shows aberrations in diseases. Such rich epigenomic mapping, however, representing different ages, tissue types, and disease states, would yield valuable information on the normal function of epigenetic marks as well as the mechanisms leading to aging and disease.
Direct benefits of epigenomic mapping include probable advances in cloning technology. It is believed that failures to produce cloned animals with normal viability and lifespan result from inappropriate patterns of epigenetic marks. Also, aberrant methylation patterns are well characterized in many cancers. Global hypomethylation results in decreased genomic stability, while local hypermethylation of tumour suppressor gene promoters often accounts for their loss of function. Specific patterns of methylation are indicative of specific cancer types, have prognostic value, and can help to guide the best course of treatment. [32]
Large-scale epigenome mapping efforts are under way around the world and have been organized under the Human Epigenome Project. [33] This is based on a multi-tiered strategy, whereby bisulfite sequencing is used to obtain high-resolution methylation profiles for a limited number of reference epigenomes, while less thorough analysis is performed on a wider spectrum of samples. This approach is intended to maximize the insight gained from a given amount of resources, as high-resolution genome-wide mapping remains a costly undertaking.
Gene-set analysis (for example, using tools like DAVID and GoSeq) has been shown to be severely biased when applied to high-throughput methylation data (e.g. genome-wide bisulfite sequencing); it has been suggested that this can be corrected using sample label permutations or using a statistical model to control for differences in the numbers of CpG probes / CpG sites that target each gene. [34]
5-Methylcytosine and 5-hydroxymethylcytosine both read as a C in bisulfite sequencing. [24] Oxidative bisulfite sequencing is a method to discriminate between 5-methylcytosine and 5-hydroxymethylcytosine at single base resolution. The method employs a specific (Tet-assisted) chemical oxidation of 5-hydroxymethylcytosine to 5-formylcytosine, which subsequently converts to uracil during bisulfite treatment. [35] The only base that then reads as a C is 5‑methylcytosine, giving a map of the true methylation status in the DNA sample. Levels of 5‑hydroxymethylcytosine can also be quantified by measuring the difference between bisulfite and oxidative bisulfite sequencing.
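The subtraction described above can be written out per site. This is a minimal sketch, assuming both experiments were run on the same sample and that each yields a methylation fraction between 0 and 1 at the same cytosine; the function name and values are hypothetical.

```python
def estimate_5mc_5hmc(bs_level, oxbs_level):
    """Split a bisulfite signal into 5mC and 5hmC estimates.

    bs_level: fraction read as C after standard bisulfite (5mC + 5hmC).
    oxbs_level: fraction read as C after oxidative bisulfite (5mC only).
    Negative differences, which can arise from sampling noise, are
    clipped to zero.
    """
    five_mc = oxbs_level
    five_hmc = max(0.0, bs_level - oxbs_level)
    return five_mc, five_hmc


five_mc, five_hmc = estimate_5mc_5hmc(bs_level=0.75, oxbs_level=0.5)
print(five_mc, five_hmc)
```

Because the estimate is a difference of two measurements, its uncertainty is larger than that of either measurement alone, so low 5hmC levels need deep coverage to resolve.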
The polymerase chain reaction (PCR) is a method widely used to make millions to billions of copies of a specific DNA sample rapidly, allowing scientists to amplify a very small sample of DNA sufficiently to enable detailed study. PCR was invented in 1983 by American biochemist Kary Mullis at Cetus Corporation. Mullis and biochemist Michael Smith, who had developed other essential ways of manipulating DNA, were jointly awarded the Nobel Prize in Chemistry in 1993.
5-Methylcytosine is a methylated form of the DNA base cytosine (C) that regulates gene transcription and takes several other biological roles. When cytosine is methylated, the DNA maintains the same sequence, but the expression of methylated genes can be altered. 5-Methylcytosine is incorporated in the nucleoside 5-methylcytidine.
The CpG sites or CG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands.
DNA methylation is a biological process by which methyl groups are added to the DNA molecule. Methylation can change the activity of a DNA segment without changing the sequence. When located in a gene promoter, DNA methylation typically acts to repress gene transcription. In mammals, DNA methylation is essential for normal development and is associated with a number of key processes including genomic imprinting, X-chromosome inactivation, repression of transposable elements, aging, and carcinogenesis.
In biology, the epigenome of an organism is the collection of chemical changes to its DNA and histone proteins that affects when, where, and how the DNA is expressed; these changes can be passed down to an organism's offspring via transgenerational epigenetic inheritance. Changes to the epigenome can result in changes to the structure of chromatin and changes to the function of the genome. The human epigenome, including DNA methylation and histone modification, is maintained through cell division. The epigenome is essential for normal development and cellular differentiation, enabling cells with the same genetic code to perform different functions. The human epigenome is dynamic and can be influenced by environmental factors such as diet, stress, and toxins.
The bisulfite ion (IUPAC-recommended nomenclature: hydrogensulfite) is the ion HSO3−. Salts containing the HSO3− ion are also known as "sulfite lyes". Sodium bisulfite is used interchangeably with sodium metabisulfite (Na2S2O5). Sodium metabisulfite dissolves in water to give a solution of Na+HSO3−.
Methylation specific oligonucleotide microarray, also known as MSO microarray, was developed as a technique to map epigenetic methylation changes in DNA of cancer cells.
The versatility of polymerase chain reaction (PCR) has led to modifications of the basic protocol being used in a large number of variant techniques designed for various purposes. This article summarizes many of the most common variations currently or formerly used in molecular biology laboratories; familiarity with the fundamental premise by which PCR works and corresponding terms and concepts is necessary for understanding these variant techniques.
Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell. Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence. Epigenomic maintenance is a continuous process and plays an important role in stability of eukaryotic genomes by taking part in crucial biological mechanisms like DNA repair. Plant flavones are said to be inhibiting epigenomic marks that cause cancers. Two of the most characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as in differentiation/development and tumorigenesis. The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.
The Illumina Methylation Assay using the Infinium I platform uses 'BeadChip' technology to generate a comprehensive genome-wide profiling of human DNA methylation. Similar to bisulfite sequencing and pyrosequencing, this method quantifies methylation levels at various loci within the genome. This assay is used for methylation probes on the Illumina Infinium HumanMethylation27 BeadChip. Probes on the 27k array target regions of the human genome to measure methylation levels at 27,578 CpG dinucleotides in 14,495 genes. In 2008, Illumina released the Infinium HumanMethylation450 BeadChip array, which targets over 450,000 methylation sites. In 2016, the Infinium MethylationEPIC BeadChip ("EPIC") was released, which interrogates over 850,000 methylation sites across the human genome.
Methylated DNA immunoprecipitation is a large-scale purification technique in molecular biology that is used to enrich for methylated DNA sequences. It consists of isolating methylated DNA fragments via an antibody raised against 5-methylcytosine (5mC). This technique was first described by Weber M. et al. in 2005 and has helped pave the way for viable methylome-level assessment efforts, as the purified fraction of methylated DNA can be input to high-throughput DNA detection methods such as high-resolution DNA microarrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). Nonetheless, understanding of the methylome remains rudimentary; its study is complicated by the fact that, like other epigenetic properties, patterns vary from cell-type to cell-type.
5-Hydroxymethylcytosine (5hmC) is a DNA pyrimidine nitrogen base derived from cytosine. It is potentially important in epigenetics, because the hydroxymethyl group on the cytosine can possibly switch a gene on and off. It was first seen in bacteriophages in 1952. However, in 2009 it was found to be abundant in human and mouse brains, as well as in embryonic stem cells. In mammals, it can be generated by oxidation of 5-methylcytosine, a reaction mediated by TET enzymes.
Combined Bisulfite Restriction Analysis is a molecular biology technique that allows for the sensitive quantification of DNA methylation levels at a specific genomic locus on a DNA sequence in a small sample of genomic DNA. The technique is a variation of bisulfite sequencing, and combines bisulfite conversion based polymerase chain reaction with restriction digestion. Originally developed to reliably handle minute amounts of genomic DNA from microdissected paraffin-embedded tissue samples, the technique has since seen widespread usage in cancer research and epigenetics studies.
Reduced representation bisulfite sequencing (RRBS) is an efficient and high-throughput technique for analyzing the genome-wide methylation profiles on a single nucleotide level. It combines restriction enzymes and bisulfite sequencing to enrich for areas of the genome with a high CpG content. Due to the high cost and depth of sequencing to analyze methylation status in the entire genome, Meissner et al. developed this technique in 2005 to reduce the amount of nucleotides required to sequence to 1% of the genome. The fragments that comprise the reduced genome still include the majority of promoters, as well as regions such as repeated sequences that are difficult to profile using conventional bisulfite sequencing approaches.
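The enrichment step can be sketched as an in silico digest followed by size selection. The example assumes the restriction enzyme is MspI, which cuts C^CGG and is commonly used for RRBS; the enzyme choice, the function name, and the size limits are assumptions for illustration and are not stated in the text above.

```python
def mspi_fragments(seq, min_len=0, max_len=None):
    """Digest seq at every CCGG site (cut after the first C) and
    return the fragments that pass the size selection.

    Note: the first and last fragments are bounded by a sequence end,
    not by two cut sites.
    """
    s = seq.upper()
    cuts = []
    start = s.find("CCGG")
    while start != -1:
        cuts.append(start + 1)             # cut between C and CGG
        start = s.find("CCGG", start + 1)  # allow overlapping sites
    bounds = [0] + cuts + [len(s)]
    fragments = [s[a:b] for a, b in zip(bounds, bounds[1:])]
    return [f for f in fragments
            if len(f) >= min_len and (max_len is None or len(f) <= max_len)]


all_frags = mspi_fragments("AACCGGTTTTCCGGAA")
selected = mspi_fragments("AACCGGTTTTCCGGAA", min_len=4, max_len=10)
print(all_frags, selected)
```

Each internal fragment begins with CGG, so every selected fragment carries at least one CpG at its end, which is how the digest enriches for CpG-rich regions.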
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.
Marianne Frommer is an Australian geneticist. She was born in Hong Kong and educated at the University of Sydney – BSc(Hons) 1969 and PhD in 1976. She is best known for developing a protocol to map DNA methylation by bisulphite genomic sequencing.
Whole genome bisulfite sequencing is a next-generation sequencing technology used to determine the DNA methylation status of single cytosines by treating the DNA with sodium bisulfite before high-throughput DNA sequencing. The DNA methylation status at various genes can reveal information regarding gene regulation and transcriptional activities. This technique was developed in 2009 along with reduced representation bisulfite sequencing after bisulfite sequencing became the gold standard for DNA methylation analysis.
In epitranscriptomic sequencing, most methods focus on either (1) enrichment and purification of the modified RNA molecules before running on the RNA sequencer, or (2) improving or modifying bioinformatics analysis pipelines to call the modification peaks. Most methods have been adapted and optimized for mRNA molecules, except for modified bisulfite sequencing for profiling 5-methylcytidine which was optimized for tRNAs and rRNAs.
The GlaI hydrolysis and Ligation Adapter Dependent PCR (GLAD-PCR) assay is a novel method to determine R(5mC)GY sites produced in the course of de novo DNA methylation by the DNMT3A and DNMT3B DNA methyltransferases. The GLAD-PCR assay does not require bisulfite treatment of the DNA.
Nucleosome Occupancy and Methylome Sequencing (NOMe-seq) is a genomics technique used to simultaneously detect nucleosome positioning and DNA methylation... This method is an extension of bisulfite sequencing, which is the gold standard for determining DNA methylation. NOMe-seq relies on the methyltransferase M.CviPI, which methylates cytosines in GpC dinucleotides unbound by nucleosomes or other proteins, creating a nucleosome footprint. The mammalian genome naturally contains DNA methylation, but only at CpG sites, so GpC methylation can be differentiated from genomic methylation after bisulfite sequencing. This allows simultaneous analysis of the nucleosome footprint and endogenous methylation on the same DNA molecules. In addition to nucleosome footprinting, NOMe-seq can determine locations bound by transcription factors. Nucleosomes are bound by 147 base pairs of DNA, whereas transcription factors or other proteins will only bind a region of approximately 10-80 base pairs. Following treatment with M.CviPI, nucleosome and transcription factor sites can be differentiated based on the size of the unmethylated GpC region.
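Separating the two signals depends on the sequence context of each cytosine. The sketch below classifies cytosines on one strand as GpC (accessibility signal), CpG (endogenous methylation), or ambiguous when a cytosine sits in both contexts; the category labels and function name are hypothetical, and handling of ambiguous sites varies between analyses.

```python
def classify_cytosines(seq):
    """Map each cytosine index to 'GpC', 'CpG', 'ambiguous' or 'other'."""
    s = seq.upper()
    contexts = {}
    for i, base in enumerate(s):
        if base != "C":
            continue
        is_gpc = i > 0 and s[i - 1] == "G"           # G precedes the C
        is_cpg = i + 1 < len(s) and s[i + 1] == "G"  # G follows the C
        if is_gpc and is_cpg:
            contexts[i] = "ambiguous"  # GCG: signals cannot be separated
        elif is_gpc:
            contexts[i] = "GpC"
        elif is_cpg:
            contexts[i] = "CpG"
        else:
            contexts[i] = "other"
    return contexts


contexts = classify_cytosines("AGCATCGTGCGA")
print(contexts)
```

Methylation calls at GpC positions would then be read as a footprint of protein occupancy, while calls at CpG positions report endogenous methylation.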