Methylated DNA immunoprecipitation (MeDIP or mDIP) is a large-scale (chromosome- or genome-wide) purification technique in molecular biology that is used to enrich for methylated DNA sequences. It consists of isolating methylated DNA fragments via an antibody raised against 5-methylcytosine (5mC). This technique was first described by Weber M. et al. [1] in 2005 and has helped pave the way for viable methylome-level assessment efforts, as the purified fraction of methylated DNA can be input to high-throughput DNA detection methods such as high-resolution DNA microarrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). Nonetheless, understanding of the methylome remains rudimentary; its study is complicated by the fact that, like other epigenetic properties, methylation patterns vary from cell type to cell type.
DNA methylation, referring to the reversible methylation of the 5 position of cytosine by methyltransferases, is a major epigenetic modification in multicellular organisms. [2] In mammals, this modification primarily occurs at CpG sites, which in turn tend to cluster in regions called CpG islands. [3] There is a small fraction of CpG islands that can overlap or be in close proximity to promoter regions of transcription start sites. The modification may also occur at other sites, [4] but methylation at either of these sites can repress gene expression by either interfering with the binding of transcription factors or modifying chromatin structure to a repressive state. [5]
Studies of disease conditions have largely fueled the effort to understand the role of DNA methylation. Currently, the major research interest lies in investigating disease conditions such as cancer to identify regions of the DNA that have undergone extensive methylation changes. The genes contained in these regions are of functional interest as they may offer a mechanistic explanation for the underlying genetic causes of a disease. For instance, the abnormal methylation pattern of cancer cells [6] [7] [8] was initially shown to be a mechanism through which tumor suppressor-like genes are silenced, [9] although it was later observed that a much broader range of gene types is affected. [10] [11] [12]
There are two approaches to methylation analysis: typing and profiling technologies. Typing technologies are targeted towards a small number of loci across many samples, and involve the use of techniques such as PCR, restriction enzymes, and mass spectrometry. Profiling technologies such as MeDIP are targeted towards a genome- or methylome-wide level assessment of methylation; this includes restriction landmark genomic scanning (RLGS), [13] and bisulfite conversion-based methods, which rely on the treatment of DNA with bisulfite to convert unmethylated cytosine residues to uracil. [14] [15] [16] [17]
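The bisulfite conversion step mentioned above can be illustrated in silico. The following is a minimal sketch, not a description of any published pipeline: the function name `bisulfite_convert` and the toy sequence are assumptions made for illustration, and the code models only one strand, reading converted uracil as thymine (as it would appear after amplification and sequencing).

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment of a single DNA strand (illustrative only).

    Unmethylated cytosines are converted to uracil, which is read as
    thymine after amplification; methylated cytosines remain as C.
    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")  # unmethylated C -> U, read as T
        else:
            converted.append(base)  # methylated C and all other bases unchanged
    return "".join(converted)


# Toy example: the cytosine at index 1 is methylated, those at 4 and 5 are not.
original = "ACGTCCGA"
print(bisulfite_convert(original, methylated_positions=[1]))
```

Comparing the converted sequence with the original shows which cytosines were protected by methylation. It also shows why converted DNA has reduced sequence complexity: most cytosines become thymines.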
Other methods for mapping and profiling the methylome have been effective, but they have limitations that can affect resolution, level of throughput, or experimental variation. For instance, RLGS is limited by the number of restriction sites in the genome that can be targets for the restriction enzyme; typically, a maximum of ~4100 landmarks can be assessed. [18] Bisulfite sequencing-based methods, despite possible single-nucleotide resolution, have a drawback: the conversion of unmethylated cytosine to uracil can be unstable. [19] In addition, when bisulfite conversion is coupled with DNA microarrays to detect bisulfite-converted sites, the reduced sequence complexity of the DNA is a problem. Microarrays capable of comprehensively profiling the whole genome become difficult to design as fewer unique probes are available. [20]
The following sections outline the method of MeDIP coupled with either high-resolution array hybridization or high-throughput sequencing. The description of each DNA detection method also briefly covers post-laboratory processing and analysis. Different post-processing of the raw data is required depending on the technology used to identify the methylated sequences. This is analogous to data generated using ChIP-chip and ChIP-seq.
Genomic DNA is extracted from the cells and purified. The purified DNA is then subjected to sonication to shear it into random fragments. This sonication process is quick, simple, and avoids restriction enzyme biases. The resulting fragments range from 300 to 1000 base pairs (bp) in length, although they are typically between 400 and 600 bp. [21] The short length of these fragments is important in obtaining adequate resolution, improving the efficiency of the downstream immunoprecipitation step, and reducing fragment-length effects or biases. The size of the fragment also affects the binding of the 5-methyl-cytidine (5mC) antibody, because the antibody needs more than just a single 5mC for efficient binding. [22] To further improve binding affinity of the antibodies, the DNA fragments are denatured to produce single-stranded DNA. Following denaturation, the DNA is incubated with monoclonal 5mC antibodies. The classical immunoprecipitation technique is then applied: magnetic beads conjugated to anti-mouse-IgG are used to bind the anti-5mC antibodies, and unbound DNA is removed in the supernatant. To purify the DNA, proteinase K is added to digest the antibodies and release the DNA, which can be collected and prepared for DNA detection.
For more details regarding the experimental steps, see [1] [19] [23] [24].
A fraction of the input DNA obtained after the sonication step above is labeled with cyanine-5 (Cy5; red) deoxycytidine triphosphate while the methylated DNA, enriched after the immunoprecipitation step, is labeled with cyanine-3 (Cy3; green). The labeled DNA samples are cohybridized on a 2-channel, high-density genomic microarray to probe for presence and relative quantities. The purpose of this comparison is to identify sequences that show significant differences in hybridization levels, thereby confirming that the sequence of interest is enriched. Array-based identification of MeDIP sequences is limited by the array design; as a result, the resolution is restricted to the probes on the array. As with most array technologies, additional standard signal-processing steps are required to correct for hybridization issues such as noise.
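The two-channel comparison described above is commonly summarized per probe as a log ratio of the enriched signal to the input signal. The sketch below assumes that convention; the function name `log2_enrichment`, the pseudocount, and the probe intensities are hypothetical and do not come from any particular array platform or analysis package.

```python
import math


def log2_enrichment(cy3_ip, cy5_input, pseudocount=1.0):
    """Return log2(IP / input) for one probe.

    cy3_ip    : intensity of the immunoprecipitated (Cy3) channel
    cy5_input : intensity of the input (Cy5) channel
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((cy3_ip + pseudocount) / (cy5_input + pseudocount))


# Hypothetical probe intensities: (Cy3 IP signal, Cy5 input signal).
probes = {
    "probe_A": (1599.0, 199.0),  # strongly enriched in the IP channel
    "probe_B": (299.0, 299.0),   # equal signal in both channels
}

for name, (ip_signal, input_signal) in probes.items():
    print(name, round(log2_enrichment(ip_signal, input_signal), 2))
```

Positive values indicate probes whose sequences are enriched in the immunoprecipitated fraction. In practice the raw intensities would first need the normalization and noise correction mentioned above.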
The MeDIP-seq approach, i.e. the coupling of MeDIP with next-generation, short-read sequencing technologies such as 454 pyrosequencing or Illumina (Solexa), was first described by Down et al. in 2008. [20] The high-throughput sequencing of the methylated DNA fragments produces a large number of short reads (36-50 bp [26] or 400 bp, [27] depending on the technology). The short reads are aligned to a reference genome using alignment software such as Mapping and Assembly with Quality (Maq), which uses a Bayesian approach, along with base and mapping qualities, to model error probabilities for the alignments. [28] The reads can then be extended to represent the ~400 to 700 bp fragments from the sonication step. The coverage of these extended reads can be used to estimate the methylation level of the region. A genome browser such as Ensembl can also be used to visualize the data.
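The read-extension and coverage idea can be sketched as follows. This is an illustrative outline only, not the procedure used by Maq or by any published MeDIP-seq pipeline: the function names, the fixed fragment length of 500 bp, the bin size, and the read coordinates are all assumptions chosen for the example.

```python
from collections import Counter


def extend_read(start, read_length, strand, fragment_length):
    """Extend an aligned read to the assumed fragment length.

    Returns a half-open interval (frag_start, frag_end). Forward-strand
    reads are extended to the right of their start; reverse-strand reads
    are extended to the left of their end.
    """
    if strand == "+":
        return start, start + fragment_length
    read_end = start + read_length
    return max(0, read_end - fragment_length), read_end


def binned_coverage(reads, read_length, fragment_length, bin_size):
    """Count how many extended fragments overlap each fixed-size bin."""
    counts = Counter()
    for start, strand in reads:
        frag_start, frag_end = extend_read(start, read_length, strand, fragment_length)
        first_bin = frag_start // bin_size
        last_bin = (frag_end - 1) // bin_size
        for bin_index in range(first_bin, last_bin + 1):
            counts[bin_index] += 1
    return dict(counts)


# Hypothetical alignments on one chromosome: (0-based start, strand).
reads = [(1000, "+"), (1400, "-")]
coverage = binned_coverage(reads, read_length=36, fragment_length=500, bin_size=500)
print(coverage)
```

Bins covered by many extended fragments are candidates for methylated regions, although raw coverage is also affected by the density of CpG sites in the region.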
Validation of the approach to assess quality and accuracy of the data can be done with quantitative PCR. This is done by comparing a sequence from the MeDIP sample against an unmethylated control sequence. The samples are then run on a gel and the band intensities are compared. [19] The relative intensity serves as the guide for finding enrichment. The results can also be compared with MeDIP-chip results to help determine coverage needed.
Estimates of DNA methylation levels from MeDIP data can be confounded by the varying density of methylated CpG sites across the genome. This can be problematic for analyzing CpG-poor (lower density) regions. One reason for this density issue is its effect on the efficiency of immunoprecipitation. In their study, Down et al. [20] developed a tool to estimate absolute methylation levels from data generated by MeDIP by modeling the density of methylated CpG sites. This tool is called Bayesian tool for methylation analysis (Batman). The study reports coverage of ~90% of all CpG sites in promoters, gene-coding regions, islands, and regulatory elements where methylation levels can be estimated; this is almost 20 times better coverage than any previous method.
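To make the notion of CpG density concrete, the sketch below counts CpG dinucleotides in fixed windows of a sequence. It is not the Batman model, which is a Bayesian method described in the cited study; the function names, window size, and toy sequence are assumptions for illustration only.

```python
def cpg_count(sequence):
    """Count CpG dinucleotides (a C followed by a G, read 5' to 3')."""
    seq = sequence.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def windowed_cpg_density(sequence, window_size):
    """Return CpG count per base for consecutive non-overlapping windows.

    CpG sites that straddle a window boundary are not counted in this
    simplified sketch.
    """
    densities = []
    for start in range(0, len(sequence), window_size):
        window = sequence[start:start + window_size]
        densities.append(cpg_count(window) / len(window))
    return densities


# Toy sequence: a CpG-rich window followed by a CpG-poor window.
print(windowed_cpg_density("CGCGAAAA", window_size=4))
```

A window with a low density offers the antibody few potential 5mC sites, so low MeDIP signal there does not by itself imply low methylation. This is the confounding effect that density-aware models aim to correct.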
MeDIP-seq and MeDIP-chip are both genome-wide approaches that share the aim of obtaining a functional map of the methylome. Once regions of DNA methylation are identified, a number of bioinformatics analyses can be applied to answer certain biological questions. One obvious step is to investigate the genes contained in these regions and the functional significance of their repression. For example, silencing of tumour-suppressor genes in cancer can be attributed to DNA methylation. [29] By identifying mutational events leading to hypermethylation and subsequent repression of known tumour-suppressor genes, one can more specifically characterize the factors contributing to the cause of the disease. Alternatively, one can identify genes that are known to be normally methylated but, as a result of some mutation event, are no longer silenced.
One can also investigate whether an epigenetic regulator such as DNA methyltransferase (DNMT) has been affected; [21] in these cases, enrichment may be more limited.
Gene-set analysis (for example using tools like DAVID and GoSeq) has been shown to be severely biased when applied to high-throughput methylation data (e.g. MeDIP-seq and MeDIP-chip); it has been suggested that this can be corrected using sample label permutations or using a statistical model to control for differences in the numbers of CpG probes / CpG sites that target each gene. [30]
The limitations to note when using MeDIP are typical experimental factors. These include the quality and cross-reactivity of the 5mC antibodies used in the procedure. Furthermore, DNA detection methods (i.e. array hybridization and high-throughput sequencing) typically involve well-established limitations. For array-based procedures in particular, as mentioned above, the sequences being analyzed are limited to the specific array design used.
Most typical limitations of high-throughput, next-generation sequencing apply. The problem of alignment accuracy in repetitive regions of the genome results in less accurate analysis of methylation in those regions. Also, as mentioned above, short reads (e.g. 36-50 bp from an Illumina Genome Analyzer) represent only part of a sheared fragment when aligned to the genome; therefore, the exact methylation site can fall anywhere within a window that is a function of the fragment size. [19] In this respect, bisulfite sequencing has much higher resolution (down to a single CpG site; single nucleotide level). However, this level of resolution may not be required for most applications, as the methylation status of CpG sites within < 1000 bp has been shown to be significantly correlated. [20]
In biology, epigenetics refers to stable heritable traits that cannot be explained by changes in DNA sequence, and to the study of stable changes in cell function that do not involve a change to the DNA sequence. The Greek prefix epi- in epigenetics implies features that are "on top of" or "in addition to" the traditional genetic mechanism of inheritance. Epigenetics usually involves a change that is not erased by cell division, and affects the regulation of gene expression. Such effects on cellular and physiological phenotypic traits may result from environmental factors, or be part of normal development. They can lead to cancer.
The CpG sites or CG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands.
Regulation of gene expression, or gene regulation, includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products. Sophisticated programs of gene expression are widely observed in biology, for example to trigger developmental pathways, respond to environmental stimuli, or adapt to new food sources. Virtually any step of gene expression can be modulated, from transcriptional initiation, to RNA processing, and to the post-translational modification of a protein. Often, one gene regulator controls another, and so on, in a gene regulatory network.
DNA methylation is a biological process by which methyl groups are added to the DNA molecule. Methylation can change the activity of a DNA segment without changing the sequence. When located in a gene promoter, DNA methylation typically acts to repress gene transcription. In mammals, DNA methylation is essential for normal development and is associated with a number of key processes including genomic imprinting, X-chromosome inactivation, repression of transposable elements, aging, and carcinogenesis.
An epigenome consists of a record of the chemical changes to the DNA and histone proteins of an organism; these changes can be passed down to an organism's offspring via transgenerational stranded epigenetic inheritance. Changes to the epigenome can result in changes to the structure of chromatin and changes to the function of the genome.
Methylation specific oligonucleotide microarray, also known as MSO microarray, was developed as a technique to map epigenetic methylation changes in DNA of cancer cells.
Bisulfite sequencing (also known as bisulphite sequencing) is the use of bisulfite treatment of DNA before routine sequencing to determine the pattern of methylation. DNA methylation was the first discovered epigenetic mark, and remains the most studied. In animals it predominantly involves the addition of a methyl group to the carbon-5 position of cytosine residues of the dinucleotide CpG, and is implicated in repression of transcriptional activity.
Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell. Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence. Epigenomic maintenance is a continuous process and plays an important role in stability of eukaryotic genomes by taking part in crucial biological mechanisms like DNA repair. Plant flavones are said to be inhibiting epigenomic marks that cause cancers. Two of the most characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as in differentiation/development and tumorigenesis. The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.
The Illumina Methylation Assay using the Infinium I platform uses 'BeadChip' technology to generate a comprehensive genome-wide profiling of human DNA methylation. Similar to bisulfite sequencing and pyrosequencing, this method quantifies methylation levels at various loci within the genome. This assay is used for methylation probes on the Illumina Infinium HumanMethylation27 BeadChip. Probes on the 27k array target regions of the human genome to measure methylation levels at 27,578 CpG dinucleotides in 14,495 genes. The Infinium HumanMethylation450 BeadChip array targets > 450,000 methylation sites. In 2016, the Infinium MethylationEPIC BeadChip was released, which interrogates over 850,000 methylation sites across the human genome.
Combined Bisulfite Restriction Analysis is a molecular biology technique that allows for the sensitive quantification of DNA methylation levels at a specific genomic locus on a DNA sequence in a small sample of genomic DNA. The technique is a variation of bisulfite sequencing, and combines bisulfite conversion based polymerase chain reaction with restriction digestion. Originally developed to reliably handle minute amounts of genomic DNA from microdissected paraffin-embedded tissue samples, the technique has since seen widespread usage in cancer research and epigenetics studies.
Bayesian tool for methylation analysis, also known as BATMAN, is a statistical tool for analysing methylated DNA immunoprecipitation (MeDIP) profiles. It can be applied to large datasets generated using either oligonucleotide arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq), providing a quantitative estimation of absolute methylation state in a region of interest.
Cancer epigenetics is the study of epigenetic modifications to the DNA of cancer cells that do not involve a change in the nucleotide sequence, but instead involve a change in the way the genetic code is expressed. Epigenetic mechanisms are necessary to maintain normal sequences of tissue-specific gene expression and are crucial for normal development. They may be just as important as, if not more important than, genetic mutations in a cell's transformation to cancer. The disturbance of epigenetic processes in cancers can lead to a loss of expression of genes that occurs about 10 times more frequently by transcription silencing than by mutations. As Vogelstein et al. point out, in a colorectal cancer there are usually about 3 to 6 driver mutations and 33 to 66 hitchhiker or passenger mutations. However, in colon tumors compared to adjacent normal-appearing colonic mucosa, there are about 600 to 800 heavily methylated CpG islands in the promoters of genes in the tumors, while these CpG islands are not methylated in the adjacent mucosa. Manipulation of epigenetic alterations holds great promise for cancer prevention, detection, and therapy. In different types of cancer, a variety of epigenetic mechanisms can be perturbed, such as the silencing of tumor suppressor genes and activation of oncogenes by altered CpG island methylation patterns, histone modifications, and dysregulation of DNA binding proteins. Several medications with epigenetic impact are now used in a number of these diseases.
Reduced representation bisulfite sequencing (RRBS) is an efficient and high-throughput technique for analyzing the genome-wide methylation profiles on a single nucleotide level. It combines restriction enzymes and bisulfite sequencing to enrich for areas of the genome with a high CpG content. Due to the high cost and depth of sequencing to analyze methylation status in the entire genome, Meissner et al. developed this technique in 2005 to reduce the amount of nucleotides required to sequence to 1% of the genome. The fragments that comprise the reduced genome still include the majority of promoters, as well as regions such as repeated sequences that are difficult to profile using conventional bisulfite sequencing approaches.
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment. For example, in cancer, sequencing the DNA of individual cells can give information about mutations carried by small populations of cells. In development, sequencing the RNAs expressed by individual cells can give insight into the existence and behavior of different cell types. In microbial systems, a population of the same species can appear genetically clonal. Still, single-cell sequencing of RNA or epigenetic modifications can reveal cell-to-cell variability that may help populations rapidly adapt to survive in changing environments.
Whole genome bisulfite sequencing is a next-generation sequencing technology used to determine the DNA methylation status of single cytosines by treating the DNA with sodium bisulfite before high-throughput DNA sequencing. The DNA methylation status at various genes can reveal information regarding gene regulation and transcriptional activities. This technique was developed in 2009 along with reduced representation bisulfite sequencing after bisulfite sequencing became the gold standard for DNA methylation analysis.
DNA methylation in cancer plays a variety of roles, helping to shift the gene expression pattern of healthy cells toward that of cancer cells or other diseased cells. One of the most widely studied forms of DNA methylation dysregulation is promoter hypermethylation, in which the CpG islands in promoter regions are methylated, contributing to or causing gene silencing.
In epitranscriptomic sequencing, most methods focus on either (1) enrichment and purification of the modified RNA molecules before running on the RNA sequencer, or (2) improving or modifying bioinformatics analysis pipelines to call the modification peaks. Most methods have been adapted and optimized for mRNA molecules, except for modified bisulfite sequencing for profiling 5-methylcytidine which was optimized for tRNAs and rRNAs.
An epigenome-wide association study (EWAS) is an examination of a genome-wide set of quantifiable epigenetic marks, such as DNA methylation, in different individuals to derive associations between epigenetic variation and a particular identifiable phenotype/trait. When patterns change such as DNA methylation at specific loci, discriminating the phenotypically affected cases from control individuals, this is considered an indication that epigenetic perturbation has taken place that is associated, causally or consequentially, with the phenotype.
Single cell epigenomics is the study of epigenomics in individual cells by single cell sequencing. Since 2013, methods have been created including whole-genome single-cell bisulfite sequencing to measure DNA methylation, whole-genome ChIP-sequencing to measure histone modifications, whole-genome ATAC-seq to measure chromatin accessibility and chromosome conformation capture.
Nucleosome Occupancy and Methylome Sequencing (NOMe-seq) is a genomics technique used to simultaneously detect nucleosome positioning and DNA methylation... This method is an extension of bisulfite sequencing, which is the gold standard for determining DNA methylation. NOMe-seq relies on the methyltransferase M.CviPI, which methylates cytosines in GpC dinucleotides unbound by nucleosomes or other proteins, creating a nucleosome footprint. The mammalian genome naturally contains DNA methylation, but only at CpG sites, so GpC methylation can be differentiated from genomic methylation after bisulfite sequencing. This allows simultaneous analysis of the nucleosome footprint and endogenous methylation on the same DNA molecules. In addition to nucleosome footprinting, NOMe-seq can determine locations bound by transcription factors. Nucleosomes are bound by 147 base pairs of DNA whereas transcription factors or other proteins will only bind a region of approximately 10-80 base pairs. Following treatment with M.CviPI, nucleosome and transcription factor sites can be differentiated based on the size of the unmethylated GpC region.