CUT&RUN sequencing, also known as cleavage under targets and release using nuclease, is a method used to analyze protein interactions with DNA. CUT&RUN sequencing combines antibody-targeted controlled cleavage by micrococcal nuclease with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. ChIP-Seq is currently the most common technique used to study protein–DNA relations; however, it suffers from a number of practical and economic limitations that CUT&RUN sequencing does not.
CUT&RUN sequencing can be used to examine gene regulation or to analyze transcription factor and other chromatin-associated protein binding. Protein-DNA interactions regulate gene expression and are responsible for many biological processes and disease states. This epigenetic information is complementary to genotype and expression analysis. CUT&RUN is an alternative to the current standard of ChIP-seq. ChIP-seq is limited by the cross-linking step in its protocols, which can promote epitope masking and generate false-positive binding sites. [1] [2] ChIP-seq also suffers from suboptimal signal-to-noise ratios and poor resolution. [3] CUT&RUN sequencing has the advantage of being a simpler technique with lower costs, because its high signal-to-noise ratio means that less sequencing depth is required. [4]
Specific DNA sites in direct physical interaction with proteins such as transcription factors can be isolated by Protein-A (pA) conjugated micrococcal nuclease (MNase) bound to a protein of interest. MNase-mediated cleavage produces a library of target DNA sites bound to a protein of interest in situ. Sequencing of prepared DNA libraries and comparison to whole-genome sequence databases allows researchers to analyze the interactions between target proteins and DNA, as well as differences in epigenetic chromatin modifications. The CUT&RUN method may therefore be applied to a range of proteins and modifications, including transcription factors, polymerases, structural proteins, protein modifications, and DNA modifications.
CUT&RUN is an adaptation and improvement on chromatin endogenous cleavage (ChEC) which uses a DNA-binding protein genetically fused to micrococcal nuclease (MNase). These transcription factor-MNase fusion proteins can cleave DNA around the DNA-binding site of the protein of interest. [5] In the adapted process, purified MNase is tagged with Protein A (pA) which targets an antibody that has been added to the cell and is specific for the DNA-binding protein that is of interest. There are seven general steps to the CUT&RUN process.
The first step is the hypotonic lysis of the cells of interest to isolate the nuclei. The nuclei are then centrifuged, washed in a buffer solution, and complexed with lectin-coated magnetic beads. The lectin-nuclei complex is then resuspended with an antibody targeted at the protein of interest. The antibody and nuclei are incubated in the buffer for approximately 2 hours before the nuclei are washed in buffer to remove unbound antibodies. Next, the nuclei are resuspended in the buffer with Protein-A-MNase and incubated for 1 hour. The nuclei are then washed in buffer again to remove any unbound Protein-A-MNase. Next, the tubes containing the nuclei are placed in a metal block in ice-water, and CaCl2 is added to initiate the calcium-dependent nuclease activity of MNase, which cleaves the DNA around the DNA-binding protein. The Protein-A-MNase reaction is quenched by adding chelating agents (EDTA and EGTA). The cleaved DNA fragments are then liberated into the supernatant by incubating the nuclei for an hour before the nuclei are pelleted by centrifugation. The DNA fragments are then extracted from the supernatant and can be used to construct a sequencing library.
Unlike ChIP-Seq, CUT&RUN requires no size selection before sequencing. A single sequencing run can scan for genome-wide associations with high resolution, owing to the low background achieved by performing the reaction in situ. ChIP-Seq, by contrast, requires ten times the sequencing depth because of the intrinsically high background associated with the method. [6] The data are then collected and analyzed using software that aligns sample sequences to a known genomic sequence to identify the CUT&RUN DNA fragments. [4]
There are detailed CUT&RUN workflows available in an open-access methods repository.
CUT&RUN sequencing provides low levels of background signal because in situ profiling retains the in vivo 3D conformations of transcription factor-DNA interactions, so antibodies access only exposed surfaces. Sensitivity of sequencing depends on the depth of the sequencing run (i.e. the number of mapped sequence tags), the size of the genome and the distribution of the target factor. The sequencing depth is directly correlated with cost and negatively correlated with background. Therefore, low-background CUT&RUN sequencing is inherently more cost-effective than high-background ChIP-seq.
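The relationship between sequencing depth, genome size and read count can be illustrated with the standard mean-coverage estimate (mapped reads multiplied by fragment length, divided by genome size). The following Python sketch is only an illustration: the function names, the 150 bp fragment length, the 3 Gb genome size and the read count are assumptions chosen for the example, not values from the CUT&RUN protocol.

```python
import math


def mean_coverage(mapped_reads, fragment_length_bp, genome_size_bp):
    """Average number of times each base is covered by a mapped fragment."""
    return mapped_reads * fragment_length_bp / genome_size_bp


def reads_for_coverage(target_coverage, fragment_length_bp, genome_size_bp):
    """Smallest number of mapped reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / fragment_length_bp)


# Illustrative (assumed) numbers: 5 million mapped 150 bp fragments
# on a genome of roughly 3 billion base pairs.
GENOME_BP = 3_000_000_000
low_background_reads = 5_000_000
coverage = mean_coverage(low_background_reads, 150, GENOME_BP)

# A method needing ten times the depth needs ten times the reads,
# and therefore roughly ten times the sequencing cost.
high_background_reads = 10 * low_background_reads

print(f"mean coverage: {coverage:.2f}x")
print(f"reads at ten-fold depth: {high_background_reads:,}")
```

Mean coverage describes depth only. It does not capture how the target factor is distributed across the genome, which also affects sensitivity.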
A number of research projects have already made use of CUT&RUN.
In humans, researchers looking at fetal globin gene promoters have used CUT&RUN to investigate the involvement of the protein BCL11A in mediating the function of the HBBP1 gene region, [12] [13] highlighting a potential target for therapeutic genome editing for hemoglobinopathies.
A research group has used CUT&RUN to identify intermediates involved in nucleosome disruption during DNA transcription, [14] validating a general strategy for structural epigenomics.
In humans and in African green monkeys, researchers using CUT&RUN determined that the CENP-B protein (an important protein in centromere formation) and binding sites are specific to great ape centromeres, [15] addressing the paradox that CENP-B, which is required for artificial centromere function, is non-essential.
As with many high-throughput sequencing approaches, CUT&RUN-seq generates extremely large data sets, for which appropriate computational analysis methods are required. To predict DNA-binding sites from CUT&RUN-seq read count data, peak calling methods have been developed.
Peak calling is a process where an algorithm is used to predict the regions of the genome that a transcription factor binds to by finding regions of the genome that have many mapped reads from a ChIP-seq or CUT&RUN-seq experiment. MACS is a particularly popular peak calling algorithm for ChIP-seq data. [16] SEACR is a highly selective peak caller that definitively validates the accuracy of CUT&RUN for datasets with known true negatives. [17]
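The basic idea of peak calling, finding genomic regions with many mapped reads, can be sketched in a few lines of Python. This is a deliberately simplified fixed-bin threshold approach and is not the algorithm used by MACS or SEACR; the function name `call_peaks`, the bin size, the read threshold and the read positions are all hypothetical values chosen for illustration.

```python
from collections import Counter


def call_peaks(read_starts, bin_size=100, min_reads=5):
    """Return (start, end, read_count) intervals built from enriched bins.

    Reads are counted in fixed-width bins; bins with at least `min_reads`
    reads are kept, and directly adjacent enriched bins are merged.
    """
    counts = Counter(pos // bin_size for pos in read_starts)
    enriched = sorted(b for b, n in counts.items() if n >= min_reads)

    peaks = []
    for b in enriched:
        start, end = b * bin_size, (b + 1) * bin_size
        if peaks and peaks[-1][1] == start:
            # This bin touches the previous peak, so extend that peak.
            prev_start, _, prev_count = peaks[-1]
            peaks[-1] = (prev_start, end, prev_count + counts[b])
        else:
            peaks.append((start, end, counts[b]))
    return peaks


# Hypothetical mapped read start positions on one chromosome.
reads = [1010, 1020, 1035, 1050, 1075, 1090,
         1110, 1120, 1130, 1140, 1150,
         5000, 7200]

print(call_peaks(reads, bin_size=100, min_reads=5))
# → [(1000, 1200, 11)]
```

Real peak callers replace the fixed threshold with a statistical model of the background or a comparison against a control sample, which is what allows them to separate true binding sites from noise.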
To identify the causal DNA-binding motif for CUT&RUN-seq peak calls, one can apply the MEME motif-finding program to the CUT&RUN sequences. This involves using a position-specific scoring matrix (PSSM) along with the Motif Alignment and Search Tool (MAST) to identify motifs in a reference genome that match the acquired sequence reads. [4] This process allows the identification of the transcription-factor binding motif or, if the binding motif was previously known, can act to confirm the success of the experiment. [18]
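How a position-specific scoring matrix scores candidate sites can be shown with a small Python sketch. It does not reproduce MEME or MAST; it only builds a log-odds PSSM from a handful of aligned sites and slides it along a sequence. The aligned sites, the scanned sequence, the uniform background of 0.25 and the pseudocount are assumptions made for the example.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM (one dict per column) from equal-length sites."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm


def best_match(pssm, sequence):
    """Return (position, matched_site, score) of the highest-scoring window."""
    width = len(pssm)
    scored = [
        (sum(pssm[i][sequence[pos + i]] for i in range(width)), pos)
        for pos in range(len(sequence) - width + 1)
    ]
    score, pos = max(scored)
    return pos, sequence[pos:pos + width], score


# Hypothetical aligned binding sites and a hypothetical sequence to scan.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTAA"]
pssm = build_pssm(sites)
pos, site, score = best_match(pssm, "CCGATTGACTCAGGTA")
print(pos, site, round(score, 2))
```

A positive score means the window resembles the aligned sites more than it resembles the assumed background. Dedicated tools additionally estimate the statistical significance of each match.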
The primary limitation of CUT&RUN-seq is the likelihood of over-digestion of DNA due to inappropriate timing of the calcium-dependent MNase reaction. A similar limitation exists for contemporary ChIP-Seq protocols, where enzymatic or sonication-based DNA shearing must be optimized. As with ChIP-Seq, a good-quality antibody targeting the protein of interest is required.
ChIP-on-chip is a technology that combines chromatin immunoprecipitation ('ChIP') with DNA microarray ("chip"). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of the cistrome, the sum of binding sites, for DNA-binding proteins on a genome-wide basis. Whole-genome analysis can be performed to determine the locations of binding sites for almost any protein of interest. As the name of the technique suggests, such proteins are generally those operating in the context of chromatin. The most prominent representatives of this class are transcription factors, replication-related proteins, like origin recognition complex protein (ORC), histones, their variants, and histone modifications.
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.
DNA adenine methyltransferase identification, often abbreviated DamID, is a molecular biology protocol used to map the binding sites of DNA- and chromatin-binding proteins in eukaryotes. DamID identifies binding sites by expressing the proposed DNA-binding protein as a fusion protein with DNA methyltransferase. Binding of the protein of interest to DNA localizes the methyltransferase in the region of the binding site. Adenine methylation does not occur naturally in eukaryotes and therefore adenine methylation in any region can be concluded to have been caused by the fusion protein, implying the region is located near a binding site. DamID is an alternate method to ChIP-on-chip or ChIP-seq.
Chromatin immunoprecipitation (ChIP) is a type of immunoprecipitation experimental technique used to investigate the interaction between proteins and DNA in the cell. It aims to determine whether specific proteins are associated with specific genomic regions, such as transcription factors on promoters or other DNA binding sites, and possibly define cistromes. ChIP also aims to determine the specific location in the genome that various histone modifications are associated with, indicating the target of the histone modifiers. ChIP is crucial for the advancements in the field of epigenomics and learning more about epigenetic phenomena.
H3K27ac is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates acetylation of the lysine residue at N-terminal position 27 of the histone H3 protein.
H3K9me3 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the tri-methylation at the 9th lysine residue of the histone H3 protein and is often associated with heterochromatin.
H3K9me2 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the di-methylation at the 9th lysine residue of the histone H3 protein. H3K9me2 is strongly associated with transcriptional repression. H3K9me2 levels are higher at silent genes than at active genes in a 10 kb region surrounding the transcriptional start site. H3K9me2 represses gene expression both passively, by prohibiting acetylation and therefore the binding of RNA polymerase or its regulatory factors, and actively, by recruiting transcriptional repressors. H3K9me2 has also been found in megabase blocks, termed Large Organised Chromatin K9 domains (LOCKS), which are primarily located within gene-sparse regions but also encompass genic and intergenic intervals. Its synthesis is catalyzed by G9a, G9a-like protein, and PRDM2. H3K9me2 can be removed by a wide range of histone lysine demethylases (KDMs) including KDM1, KDM3, KDM4 and KDM7 family members. H3K9me2 is important for various biological processes including cell lineage commitment, the reprogramming of somatic cells to induced pluripotent stem cells, regulation of the inflammatory response, and addiction to drug use.
H3K36me3 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the tri-methylation at the 36th lysine residue of the histone H3 protein and is often associated with gene bodies.
H3K79me2 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the di-methylation at the 79th lysine residue of the histone H3 protein. H3K79me2 is detected in the transcribed regions of active genes.
H2BK5ac is an epigenetic modification to the DNA packaging protein Histone H2B. It is a mark that indicates the acetylation at the 5th lysine residue of the histone H2B protein. H2BK5ac is involved in maintaining stem cells and colon cancer.
H4K12ac is an epigenetic modification to the DNA packaging protein histone H4. It is a mark that indicates the acetylation at the 12th lysine residue of the histone H4 protein. H4K12ac is involved in learning and memory. It is possible that restoring this modification could reduce age-related decline in memory.
H3K14ac is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the acetylation at the 14th lysine residue of the histone H3 protein.
H3K36ac is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the acetylation at the 36th lysine residue of the histone H3 protein.
CUT&Tag-sequencing, also known as cleavage under targets and tagmentation, is a method used to analyze protein interactions with DNA. CUT&Tag-sequencing combines antibody-targeted controlled cleavage by a protein A-Tn5 fusion with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. ChIP-Seq is currently the most common technique used to study protein–DNA relations; however, it suffers from a number of practical and economic limitations that CUT&RUN and CUT&Tag sequencing do not. CUT&Tag sequencing is an improvement over CUT&RUN because it does not require cells to be lysed or chromatin to be fractionated. CUT&RUN is not suitable for single-cell platforms, so CUT&Tag is advantageous for these.
ChIL sequencing (ChIL-seq), also known as Chromatin Integration Labeling sequencing, is a method used to analyze protein interactions with DNA. ChIL-sequencing combines antibody-targeted controlled cleavage by Tn5 transposase with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. ChIP-Seq is currently the most common technique used to study protein–DNA relations; however, it suffers from a number of practical and economic limitations that ChIL-sequencing does not. ChIL-seq is a precise technique that reduces sample loss and could be applied to single cells.
H3K36me2 is an epigenetic modification to the DNA packaging protein Histone H3. It is a mark that indicates the di-methylation at the 36th lysine residue of the histone H3 protein.
MNase-seq, short for micrococcal nuclease digestion with deep sequencing, is a molecular biological technique that was pioneered in 2006 to measure nucleosome occupancy in the C. elegans genome and was subsequently applied to the human genome in 2008, although the term ‘MNase-seq’ was not coined until a year later, in 2009. Briefly, this technique relies on the use of the non-specific endo-exonuclease micrococcal nuclease, an enzyme derived from the bacterium Staphylococcus aureus, to bind and cleave protein-unbound regions of DNA on chromatin. DNA bound to histones or other chromatin-bound proteins may remain undigested. The uncut DNA is then purified from the proteins and sequenced through one or more of the various next-generation sequencing methods.
H3R17me2 is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the di-methylation at the 17th arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H3R26me2 is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the di-methylation at the 26th arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.
H3R8me2 is an epigenetic modification to the DNA packaging protein histone H3. It is a mark that indicates the di-methylation at the 8th arginine residue of the histone H3 protein. In epigenetics, arginine methylation of histones H3 and H4 is associated with a more accessible chromatin structure and thus higher levels of transcription. The existence of arginine demethylases that could reverse arginine methylation is controversial.