ChIP-exo is a chromatin immunoprecipitation-based method for mapping the locations at which a protein of interest (such as a transcription factor) binds to the genome. It is a modification of the ChIP-seq protocol that improves the resolution of binding sites from hundreds of base pairs to almost one base pair. It uses exonucleases to degrade strands of the protein-bound DNA in the 5'-3' direction to within a small number of nucleotides of the protein binding site. The nucleotides of the exonuclease-treated ends are determined using some combination of DNA sequencing, microarrays, and PCR. These sequences are then mapped to the genome to identify the locations at which the protein binds.
Chromatin immunoprecipitation (ChIP) techniques have been in use since 1984 [1] to detect protein-DNA interactions. There have been many variations on ChIP to improve the quality of results. One such improvement, ChIP-on-chip (ChIP-chip), combines ChIP with microarray technology. This technique has limited sensitivity and specificity, especially in vivo where microarrays are constrained by thousands of proteins present in the nuclear compartment, resulting in a high rate of false positives. [2] Next came ChIP-sequencing (ChIP-seq), which combines ChIP with high-throughput sequencing. [3] However, because sheared DNA fragments are heterogeneous in length, binding sites can be mapped only to within ±300 base pairs, limiting specificity. In addition, contaminating DNA presents a grave problem since so few genetic loci are cross-linked to the protein of interest, making any non-specific genomic DNA a significant source of background noise. [4]
To address these problems, Rhee and Pugh revised the classic nuclease protection assay to develop ChIP-exo. [5] This new ChIP technique relies on a lambda exonuclease that degrades only, and all, unbound double-stranded DNA in the 5′-3′ direction. Briefly, a protein of interest (engineering one with an epitope tag can be useful for immunoprecipitation) is crosslinked in vivo to its natural binding locations across a genome using formaldehyde.
Cells are then collected, broken open, and the chromatin sheared and solubilized by sonication. An antibody is then used to immunoprecipitate the protein of interest, along with the crosslinked DNA. DNA PCR adaptors are then ligated to the ends, which serve as a priming point for second strand DNA synthesis after the exonuclease digestion. Lambda exonuclease then digests double-stranded DNA from the 5′ end until digestion is blocked at the border of the protein-DNA covalent interaction. Most contaminating DNA is degraded by the addition of a second single-strand specific exonuclease. After the cross-linking is reversed, the primers to the PCR adaptors are extended to form double-stranded DNA, and a second adaptor is ligated to 5′ ends to demarcate the precise location of exonuclease digestion cessation. The library is then amplified by PCR, and the products are identified by high-throughput sequencing. This method allows for resolution of up to a single base pair for any protein binding site within any genome, which is a much higher resolution than either ChIP-chip or ChIP-seq.
ChIP-exo has been shown to give up to single base pair resolution in identifying protein binding locations. This is in contrast to ChIP-seq, which can locate a protein's binding site only to within ±300 base pairs. [4]
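Because the exonuclease stops at the border of the cross-linked protein on each strand, the 5′ ends of mapped reads tend to pile up on either side of the binding site: forward-strand reads at the left border and reverse-strand reads at the right border. The following is a minimal sketch of that idea, not the peak-calling procedure used by Rhee and Pugh. The function name `estimate_binding_site`, the input format (a list of 5′-end positions with strand), and the use of the single most frequent position per strand are all assumptions made for illustration.

```python
from collections import Counter


def estimate_binding_site(read_starts):
    """Estimate one binding site from 5' read-end positions (illustrative only).

    read_starts: iterable of (five_prime_position, strand) tuples,
    where strand is '+' or '-'. This input format is hypothetical.
    """
    plus = Counter(pos for pos, strand in read_starts if strand == "+")
    minus = Counter(pos for pos, strand in read_starts if strand == "-")
    if not plus or not minus:
        return None  # borders on both strands are needed

    # Most frequent 5' end on each strand; ties resolved toward the outer edge.
    left = max(plus, key=lambda p: (plus[p], -p))
    right = max(minus, key=lambda p: (minus[p], p))
    if right < left:
        return None  # borders are not in the expected orientation

    return {
        "left_border": left,
        "right_border": right,
        "midpoint": (left + right) // 2,
        "width": right - left + 1,
    }


reads = [(100, "+"), (100, "+"), (101, "+"), (120, "-"), (120, "-"), (119, "-")]
site = estimate_binding_site(reads)
print(site)
```

A real analysis would work on aligned reads across the whole genome and would need to handle background signal and multiple nearby sites, which this sketch ignores.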
Contamination of non-protein-bound DNA fragments can result in a high rate of false positives and negatives in ChIP experiments. The addition of exonucleases to the process not only improves resolution of binding-site calling, but removes contaminating DNA from the solution before sequencing. [4]
Proteins that are inefficiently bound to a nucleotide fragment are more likely to be detected by ChIP-exo. This has allowed, for example, the recognition of more CTCF transcription factor binding sites than previously discovered. [5]
Due to the higher resolution and reduced background, less depth of sequencing coverage is needed when using ChIP-exo. [4]
If a protein-DNA complex has multiple locations of cross-linking within a single binding event, then it can appear as though there are multiple distinct binding events. This likely results from these proteins being denatured and cross-linking at one of the available binding sites within the same event. The exonuclease would then stop at one of the bound sites, depending on which site the protein is cross-linked to. [5]
As with any ChIP-based method, a suitable antibody for the protein of interest needs to be available in order to use this technique.
Rhee and Pugh introduced ChIP-exo by performing analyses on a small collection of transcription factors: Reb1, Gal4, Phd1, and Rap1 in yeast, and CTCF in human. Reb1 sites were often found in clusters and these clusters had ~10-fold higher occupancy than expected. Secondary sites in clusters were found ~40 bp from a primary binding site. Binding motifs of Gal4 showed a strong preference for three of the four nucleotides, suggesting a negative interaction between Gal4 and the excluded nucleotide. Phd1 recognizes three different motifs, which explains previous reports of the ambiguity of Phd1's binding motif. Rap1 was found to recognize four motifs.
Ribosomal protein genes bound by this protein had a tendency to use a particular motif with a stronger consensus sequence. Other genes often used clusters of weaker consensus motifs, possibly to achieve a similar occupancy. Binding motifs of CTCF employed four "modules". Half of the bound CTCF sites used modules 1 and 2, while the rest used some combination of the four. It is believed that CTCF uses its zinc fingers to recognize different combinations of these modules. [5]
Rhee and Pugh analyzed pre-initiation complex (PIC) structure and organization in Saccharomyces genomes. Using ChIP-exo, they were able to, among other discoveries, precisely identify TATA-like features in promoters reported to be TATA-less. [6]
ChIP-on-chip is a technology that combines chromatin immunoprecipitation ('ChIP') with DNA microarray ("chip"). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of the cistrome, the sum of binding sites, for DNA-binding proteins on a genome-wide basis. Whole-genome analysis can be performed to determine the locations of binding sites for almost any protein of interest. As the name of the technique suggests, such proteins are generally those operating in the context of chromatin. The most prominent representatives of this class are transcription factors, replication-related proteins, like origin recognition complex protein (ORC), histones, their variants, and histone modifications.
Chromosome conformation capture techniques are a set of molecular biology methods used to analyze the spatial organization of chromatin in a cell. These methods quantify the number of interactions between genomic loci that are nearby in 3-D space, but may be separated by many nucleotides in the linear genome. Such interactions may result from biological functions, such as promoter-enhancer interactions, or from random polymer looping, where undirected physical motion of chromatin causes loci to collide. Interaction frequencies may be analyzed directly, or they may be converted to distances and used to reconstruct 3-D structures.
SOLiD (Sequencing by Oligonucleotide Ligation and Detection) is a next-generation DNA sequencing technology developed by Life Technologies and has been commercially available since 2006. This next-generation technology generates 10^8 to 10^9 small sequence reads at one time. It uses 2-base encoding to decode the raw data generated by the sequencing platform into sequence data.
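The idea behind 2-base encoding can be illustrated with a short sketch. It assumes the commonly described color-space scheme, in which each color (0 to 3) represents a pair of adjacent bases and equals the XOR of the two bases' 2-bit codes (A=0, C=1, G=2, T=3); this mapping is not stated in the text above. The function names are hypothetical, and the sketch ignores sequencing errors and quality values.

```python
BASES = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3


def encode_colors(seq):
    """Return (first_base, colors) for a base sequence (illustrative only)."""
    codes = [BASES.index(b) for b in seq]
    # Each color encodes the transition between two adjacent bases.
    return seq[0], [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and a list of colors."""
    current = BASES.index(first_base)
    out = [first_base]
    for color in colors:
        current ^= color  # known base + color determines the next base
        out.append(BASES[current])
    return "".join(out)


first, colors = encode_colors("ATGGC")
decoded = decode_colors(first, colors)
print(first, colors, decoded)
```

Because every base after the first is derived from the previous one, decoding depends on knowing the first base, and a single wrong color changes every base that follows it.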
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.
DNA adenine methyltransferase identification, often abbreviated DamID, is a molecular biology protocol used to map the binding sites of DNA- and chromatin-binding proteins in eukaryotes. DamID identifies binding sites by expressing the proposed DNA-binding protein as a fusion protein with DNA methyltransferase. Binding of the protein of interest to DNA localizes the methyltransferase in the region of the binding site. Adenine methylation does not occur naturally in eukaryotes and therefore adenine methylation in any region can be concluded to have been caused by the fusion protein, implying the region is located near a binding site. DamID is an alternate method to ChIP-on-chip or ChIP-seq.
RIP-chip is a molecular biology technique which combines RNA immunoprecipitation with a microarray. The purpose of this technique is to identify which RNA sequences interact with a particular RNA binding protein of interest in vivo. It can also be used to determine relative levels of gene expression, to identify subsets of RNAs which may be co-regulated, or to identify RNAs that may have related functions. This technique provides insight into the post-transcriptional gene regulation which occurs between RNA and RNA binding proteins.
Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell. Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence. Epigenomic maintenance is a continuous process and plays an important role in stability of eukaryotic genomes by taking part in crucial biological mechanisms like DNA repair. Plant flavones are said to be inhibiting epigenomic marks that cause cancers. Two of the most characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as in differentiation/development and tumorigenesis. The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.
Chromatin Interaction Analysis by Paired-End Tag Sequencing is a technique that incorporates chromatin immunoprecipitation (ChIP)-based enrichment, chromatin proximity ligation, Paired-End Tags, and High-throughput sequencing to determine de novo long-range chromatin interactions genome-wide.
Chromatin immunoprecipitation (ChIP) is a type of immunoprecipitation experimental technique used to investigate the interaction between proteins and DNA in the cell. It aims to determine whether specific proteins are associated with specific genomic regions, such as transcription factors on promoters or other DNA binding sites, and possibly define cistromes. ChIP also aims to determine the specific location in the genome that various histone modifications are associated with, indicating the target of the histone modifiers. ChIP is crucial for the advancements in the field of epigenomics and learning more about epigenetic phenomena.
H3K4me3 is an epigenetic modification to the DNA packaging protein Histone H3 that indicates tri-methylation at the 4th lysine residue of the histone H3 protein and is often involved in the regulation of gene expression. The name denotes the addition of three methyl groups (trimethylation) to the lysine 4 on the histone H3 protein.
Selective microfluidics-based ligand enrichment followed by sequencing (SMiLE-seq) is a technique developed for the rapid identification of DNA binding specificities and affinities of full length monomeric and dimeric transcription factors in a fast and semi-high-throughput fashion.
CUT&RUN sequencing, also known as cleavage under targets and release using nuclease, is a method used to analyze protein interactions with DNA. CUT&RUN sequencing combines antibody-targeted controlled cleavage by micrococcal nuclease with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. Currently, ChIP-Seq is the most common technique utilized to study protein–DNA relations; however, it suffers from a number of practical and economic limitations that CUT&RUN sequencing does not.
H3K9me3 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the tri-methylation at the 9th lysine residue of the histone H3 protein and is often associated with heterochromatin.
H3K79me2 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the di-methylation at the 79th lysine residue of the histone H3 protein. H3K79me2 is detected in the transcribed regions of active genes.
CUT&Tag-sequencing, also known as cleavage under targets and tagmentation, is a method used to analyze protein interactions with DNA. CUT&Tag-sequencing combines antibody-targeted controlled cleavage by a protein A-Tn5 fusion with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. Currently, ChIP-Seq is the most common technique utilized to study protein–DNA relations; however, it suffers from a number of practical and economic limitations that CUT&RUN and CUT&Tag sequencing do not. CUT&Tag sequencing is an improvement over CUT&RUN because it does not require cells to be lysed or chromatin to be fractionated. CUT&RUN is not suitable for single-cell platforms, so CUT&Tag is advantageous for these.
ChIL sequencing (ChIL-seq), also known as Chromatin Integration Labeling sequencing, is a method used to analyze protein interactions with DNA. ChIL-sequencing combines antibody-targeted controlled cleavage by Tn5 transposase with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. Currently, ChIP-Seq is the most common technique utilized to study protein–DNA relations; however, it suffers from a number of practical and economic limitations that ChIL-Sequencing does not. ChIL-Seq is a precise technique that reduces sample loss and could be applied to single cells.
H3K36me2 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the di-methylation at the 36th lysine residue of the histone H3 protein.
MNase-seq, short for micrococcal nuclease digestion with deep sequencing, is a molecular biological technique that was pioneered in 2006 to measure nucleosome occupancy in the C. elegans genome, and was subsequently applied to the human genome in 2008. The term ‘MNase-seq’, however, was not coined until a year later, in 2009. Briefly, this technique relies on the use of the non-specific endo-exonuclease micrococcal nuclease, an enzyme derived from the bacterium Staphylococcus aureus, to bind and cleave protein-unbound regions of DNA on chromatin. DNA bound to histones or other chromatin-bound proteins may remain undigested. The uncut DNA is then purified from the proteins and sequenced through one or more of the various next-generation sequencing methods.
H3K36me is an epigenetic modification to the DNA packaging protein Histone H3, specifically, the mono-methylation at the 36th lysine residue of the histone H3 protein.
Proximity ligation-assisted chromatin immunoprecipitation sequencing (PLAC-seq) is a chromatin conformation capture (3C)-based technique to detect and quantify genomic chromatin structure from a protein-centric approach. PLAC-seq combines in situ Hi-C and chromatin immunoprecipitation (ChIP), which allows for the identification of long-range chromatin interactions at a high resolution with low sequencing costs. Mapping long-range 3-dimensional (3D) chromatin interactions is important in identifying transcription enhancers and non-coding variants that can be linked to human diseases.