Cross-linking and immunoprecipitation (CLIP, or CLIP-seq) is a method used in molecular biology that combines UV crosslinking with immunoprecipitation to identify the RNA binding sites of proteins on a transcriptome-wide scale, thereby increasing our understanding of post-transcriptional regulatory networks. [1] [2] [3] CLIP can be used either with antibodies against endogenous proteins, or with common peptide tags (including FLAG, V5, HA, and others) or affinity purification, which makes it possible to profile model organisms or RNA-binding proteins (RBPs) that otherwise lack suitable antibodies. [4]
CLIP begins with the in vivo cross-linking of RNA-protein complexes using ultraviolet light (UV). Upon UV exposure, covalent bonds are formed between proteins and nucleic acids that are in close proximity (on the order of Angstroms apart). [5] The cross-linked cells are then lysed, RNA is fragmented, and the protein of interest is isolated via immunoprecipitation. To allow for priming of reverse transcription, RNA adapters are ligated to the 3' ends, and RNA fragments are labelled to enable the analysis of the RNA-protein complexes after they have been separated from free RNA using gel electrophoresis and membrane transfer. Proteinase K digestion is then performed to remove protein from the crosslinked RNA, which leaves a few amino acids at the crosslink site. This often leads to truncation of cDNAs at the crosslinked nucleotide, which is exploited in variants such as iCLIP to increase the resolution of the method. [6] cDNA is then synthesized via RT-PCR and subjected to high-throughput sequencing, after which the reads are mapped back to the transcriptome and analysed computationally to study the interaction sites. [2]
CLIP was originally undertaken to study the RNA interactions of the neuron-specific RNA-binding proteins and splicing factors NOVA1 and NOVA2 in the mouse brain, identifying RNA binding sites that contained the expected Nova-binding motifs. Sequencing of the cDNA library identified many positions close to alternative exons, several of which were found to require Nova1/2 for their brain-specific splicing patterns. [1] In 2008, CLIP was combined with high-throughput sequencing (termed "HITS-CLIP") to generate genome-wide protein-RNA interaction maps for Nova. [7] Since then a number of other RNA-binding proteins have been studied with CLIP, including PTBP1, [8] RbFox2 (where it was referred to as "CLIP-seq"), [9] SFRS1, [10] Argonaute, [11] [12] [13] hnRNP C, [6] the Fragile-X mental retardation protein FMRP, [14] [15] Ptbp2 (in the mouse brain), [16] Mbnl2, [17] and the nElavl proteins (the neuron-specific Hu proteins). [18] CLIP has also been applied to RNA-binding proteins from all kingdoms of life, including prokaryotes. [19] CLIP analysis of the RNA-binding protein Argonaute led to the identification of microRNA targets [20] by decoding microRNA-mRNA and protein-RNA interaction maps in the mouse brain [11] [21] and subsequently in budding yeast (Saccharomyces cerevisiae), [22] Caenorhabditis elegans, [23] embryonic stem cells [24] and tissue culture cells. [25]
HITS-CLIP combines UV cross-linking and immunoprecipitation with high-throughput sequencing to identify binding sites of RNA-binding proteins. [5] HITS-CLIP also introduced the addition of dinucleotide barcodes to primers, providing the ability to sequence and then deconvolute multiple experiments simultaneously. [7] With analysis of cross-linking induced mutation sites (CIMS) at high sequencing depths, crosslink sites can be differentiated from other sources of sequence variation. [28]
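The barcode deconvolution step described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each read starts with a two-nucleotide barcode and that the barcode-to-sample table is known in advance; the function name, the sample names and the barcode position are hypothetical and are not taken from any published HITS-CLIP pipeline.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len=2):
    """Group reads by their leading barcode and strip the barcode.

    Assumption: the barcode occupies the first `barcode_len` bases of
    each read. Reads with an unknown barcode go to "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical pooled reads from two experiments sequenced together.
reads = ["ACGGTTCA", "TGCCATGA", "ACTTTGGA", "GGAAACCC"]
samples = demultiplex(reads, {"AC": "replicate_1", "TG": "replicate_2"})
```

In practice, demultiplexing tools usually also tolerate sequencing errors in the barcode, which this sketch does not attempt.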
PAR-CLIP (photoactivatable ribonucleoside–enhanced cross-linking and immunoprecipitation) is also used for identifying the binding sites of cellular RNA-binding proteins (RBPs) and microRNA-containing ribonucleoprotein complexes (miRNPs). [25] The method relies on the incorporation of photoreactive ribonucleoside analogs, such as 4-thiouridine (4-SU) and 6-thioguanosine (6-SG) into nascent RNA transcripts by living cells. Irradiation of the cells by UV light of 365 nm induces efficient cross-linking of photoreactive nucleoside-labeled cellular RNAs to interacting RBPs. Immunoprecipitation of the RBP of interest is followed by the isolation of the cross-linked and co-immunoprecipitated RNA. The isolated RNA is converted into a cDNA library and deep sequenced using high-throughput sequencing technology. Cross-linking the 4-SU and 6-SG analogs results in thymidine to cytidine, and guanosine to adenosine transitions respectively. As a result, PAR-CLIP can identify binding site locations with high accuracy.
However, PAR-CLIP is limited mainly to cultured cells, and nucleoside cytotoxicity is a concern; [2] it has been reported that 4-SU inhibits ribosomal RNA synthesis, induces a nucleolar stress response, and reduces cell proliferation. [29] PAR-CLIP has been employed to determine the transcriptome-wide binding sites of several known RBPs and microRNA-containing ribonucleoprotein complexes at high resolution. These include the miRNA-targeting AGO and TNRC6 proteins. [21]
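As a rough illustration of how the thymidine-to-cytidine transitions produced by 4-SU crosslinking might be located in sequencing data, the following Python sketch compares each aligned read with the reference sequence it covers. It assumes ungapped alignments given as (reference segment, read, start offset) tuples; the function names and the example sequences are hypothetical, and real PAR-CLIP analysis tools additionally model sequencing errors and background variation.

```python
from collections import Counter


def t_to_c_positions(reference, read, offset=0):
    """Return reference coordinates where the reference has T and the read has C.

    Assumption: `reference` and `read` are ungapped and equal in length,
    and `offset` is the reference coordinate of their first base.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    return [
        offset + i
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "T" and read_base == "C"
    ]


def conversion_counts(alignments):
    """Count how many reads support a T-to-C conversion at each position."""
    counts = Counter()
    for reference, read, offset in alignments:
        counts.update(t_to_c_positions(reference, read, offset))
    return counts


# Hypothetical alignments covering the same region starting at position 100.
alignments = [
    ("GATTACAT", "GACTACAC", 100),
    ("GATTACAT", "GACTACAT", 100),
]
counts = conversion_counts(alignments)
```

Positions supported by conversions in many independent reads are the candidates for crosslink sites; the threshold for calling a site is a design choice left out of this sketch.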
iCLIP (individual nucleotide–resolution crosslinking and immunoprecipitation) is a variant of CLIP that enabled amplification of truncated cDNAs, which are produced when reverse transcription stops prematurely at the cross-link site. [6] Other approaches to identify protein-RNA crosslink sites include mutational analysis of read-through cDNAs, such as nucleotide transitions in PAR-CLIP, [25] or rare errors introduced by reverse transcriptase when it reads through the crosslink sites in standard HITS-CLIP methods, termed Crosslink induced mutation site (CIMS) analysis. [30]
iCLIP also added a random sequence (unique molecular identifier, UMI) along with experimental barcodes to the primer used for reverse transcription, thereby barcoding unique cDNAs to minimise any errors or quantitative biases of PCR, and thus improving the quantification of binding events. Enabling amplification of truncated cDNAs led to identification of the sites of RNA-protein interactions at high resolution by analysing the starting position of truncated cDNAs, as well as their precise quantification using UMIs with software called "iCount". These innovations of iCLIP were adopted by later variants of CLIP such as eCLIP and irCLIP. [4] Another modification of iCLIP, miCLIP, identifies methylated RNA sites with use of mutant enzyme or modification-specific antibody. [31] [32] [2] The quantitative nature of iCLIP enabled comparison across samples at the level of full RNAs, [33] or to study competitive binding of multiple RNA-binding proteins [34] or subtle changes in binding of a mutant protein at the level of binding peaks. [35]
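The role of UMIs in quantification can be sketched in a few lines of Python. The example assumes each mapped read has already been reduced to a (chromosome, strand, cDNA start, UMI) tuple, and it treats reads that agree on all four values as PCR duplicates of one cDNA. This is a simplified illustration under those assumptions, not the algorithm implemented in iCount, and the function name is hypothetical.

```python
from collections import Counter


def count_unique_cdnas(reads):
    """Count unique cDNAs per mapped start position.

    Assumption: reads sharing chromosome, strand, start position and UMI
    are PCR duplicates of a single cDNA and are counted once.
    """
    unique = set(reads)  # collapses exact duplicates
    return Counter(
        (chrom, strand, start) for chrom, strand, start, _umi in unique
    )


# Hypothetical reads: the first two are PCR duplicates of the same cDNA.
reads = [
    ("chr1", "+", 100, "ACGTA"),
    ("chr1", "+", 100, "ACGTA"),
    ("chr1", "+", 100, "TTGCA"),
    ("chr1", "+", 250, "ACGTA"),
]
cdna_counts = count_unique_cdnas(reads)
```

Exact matching is the simplest choice; production tools typically also merge UMIs that differ by a sequencing error, which would slightly lower the counts.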
eCLIP (enhanced CrossLinking and ImmunoPrecipitation followed by high-throughput sequencing) is also used to map RBP binding sites on RNAs transcriptome-wide. [36] eCLIP was designed to improve upon iCLIP by increasing the efficiency of converting purified RNA fragments into a cDNA library. At its publication, eCLIP was reported to increase this efficiency by >1000-fold, which not only decreases wasted sequencing of PCR duplicate molecules, but also dramatically decreases experimental failures during the CLIP procedure. Additionally, the amplification in eCLIP is comparable to that in RNA-seq, enabling rigorous quantitative normalization against paired input controls (to remove background at ribosomal and other highly abundant RNAs) as well as quantitative comparison across peaks and samples, which makes it possible to detect allele-specific binding or differential RNA binding between conditions.
As in other CLIP methods, eCLIP relies on RBP-RNA interactions covalently linked by UV crosslinking of live cells. Cells are then lysed, and RNA is fragmented using limited RNase treatment. A specific RBP (and its bound RNA) is then immunoprecipitated using an antibody that specifically recognizes the targeted RBP. After ligation of a 3' RNA adapter, immunoprecipitated material (as well as a paired input sample) is run on denaturing protein gels and transferred to nitrocellulose membranes. A region from the protein size to 75 kDa above is cut from the membrane and treated with Proteinase K to release RNA. After cleanup, RNA is reverse transcribed to ssDNA, after which a second adapter is ligated. By ligating the second adapter to cDNAs, eCLIP can identify truncated cDNAs, similar to iCLIP, and thereby study RNA-protein interaction sites with high resolution. PCR amplification is then used to obtain sufficient material for high-throughput sequencing. eCLIP can also be used to identify miRNA targets and profile RNA modifications such as m6A.
eCLIP datasets have been produced for over 150 RBPs with validated commercially available antibodies. [37]
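The normalization of an eCLIP peak against its paired input control can be illustrated with a simple enrichment calculation. The Python sketch below computes a log2 fold enrichment from read counts scaled by library size; the pseudocount, the function name and the example numbers are assumptions made for illustration, and the published eCLIP analysis also applies statistical tests that are not shown here.

```python
import math


def log2_enrichment(ip_reads, input_reads, ip_total, input_total, pseudocount=1.0):
    """Log2 ratio of the read fraction in the IP sample to that in the input.

    Assumption: a pseudocount is added to both counts to avoid division
    by zero, and each count is scaled by its library's total usable reads.
    """
    ip_fraction = (ip_reads + pseudocount) / ip_total
    input_fraction = (input_reads + pseudocount) / input_total
    return math.log2(ip_fraction / input_fraction)


# Hypothetical peak: 79 IP reads and 9 input reads, 1 million reads per library.
enrichment = log2_enrichment(79, 9, 1_000_000, 1_000_000)
```

With these numbers the peak is roughly eight-fold enriched over input, that is, a log2 enrichment of about 3. Peaks in highly abundant RNAs such as ribosomal RNA tend to have high input counts and therefore low enrichment.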
sCLIP (simple CLIP) is a technique that requires lower amounts of input RNA and omits radio-labeling of the immunoprecipitated RNA. The method is based on linear amplification of the immunoprecipitated RNA and thereby improves the complexity of the sequencing library despite significantly reducing the amount of input material and omitting several purification steps. Additionally, it permits radiolabel-free visualization of immunoprecipitated RNA by using a highly sensitive biotin-based labeling technique. Together with a bioinformatic platform, this method is designed to provide deep insights into RNA–protein interactomes in biomedical science, where the amount of starting material is often limited (e.g. in the case of precious clinical samples). [38] Additional iCLIP variants have also been developed that retain the individual nucleotide resolution but differ in one or more steps from the original iCLIP method. These include iCLIP2, irCLIP, iiCLIP, and iCLIP1.5, to name a few.
As a modification of CLIP, methylated RNA sites were identified with the use of mutant enzyme or modification-specific antibody with the methods termed miCLIP or m6A-CLIP. [31] [32] [39] [2]
RNA-binding proteins are frequently components of multi-protein complexes, and RNAs from various genes are present in cells at a range of abundances. It is therefore common that RNAs bound to co-purified proteins, or sticking non-specifically to beads, are isolated when immunoprecipitating a specific protein. The specificity of data obtained using early immunoprecipitation methods such as RIP has been demonstrated to depend on the reaction conditions of the experiment, such as protein concentrations and ionic conditions, and reassociation of RNA-binding proteins following cell lysis could lead to detection of artificial interactions. [40] Formaldehyde crosslinking methods have been used to preserve RNA-protein interactions, but these also generate protein-protein cross-links. By employing UV crosslinking, which is specific to direct protein-RNA contacts, CLIP avoids protein-protein cross-links and ensures high specificity, while also obtaining positional information on the sites of protein-RNA interactions.
Since UV crosslinking creates a covalent bond, the crosslinked RNA fragments retain a short peptide after Proteinase K digestion, which can be exploited to identify the crosslink site. Reverse transcription most often truncates at the crosslink sites, creating truncated cDNAs that are exploited by iCLIP, while read-through cDNAs often contain mutations at the crosslink site (see HITS-CLIP and PAR-CLIP). [2]
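The way truncated cDNAs point to a crosslink site can be sketched as follows. The Python example assumes a commonly used convention in which the crosslinked nucleotide is taken to be the one immediately upstream of the cDNA start on the RNA, and it assumes 0-based, half-open alignment coordinates; both the convention as coded here and the function name are illustrative assumptions rather than a description of a specific tool.

```python
def crosslink_position(start, end, strand):
    """Infer the crosslink position from a truncated cDNA alignment.

    Assumptions: coordinates are 0-based and half-open, [start, end),
    and the crosslink lies one nucleotide upstream of the cDNA start
    in the orientation of the RNA.
    """
    if strand == "+":
        # The cDNA starts at `start`; upstream is the preceding base.
        return start - 1
    if strand == "-":
        # On the minus strand the cDNA starts at `end - 1`; upstream is `end`.
        return end
    raise ValueError("strand must be '+' or '-'")


# Hypothetical read aligned to positions 100-139 of a chromosome.
plus_site = crosslink_position(100, 140, "+")
minus_site = crosslink_position(100, 140, "-")
```

Read-through cDNAs do not follow this rule, which is why mutation-based analyses are used for them instead.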
All CLIP library generation protocols require moderate quantities of cells or tissue (50–100 mg), numerous enzymatic steps, and customised computational analyses. [12] [41] Certain steps are difficult to optimize and frequently have low efficiencies. For example, overdigestion with RNase can decrease the number of identified binding sites and thus needs to be optimised. [27] Crosslinking efficiency also varies between proteins, [42] and nucleotide bias of crosslinking has been reported, [43] for example by comparing cross-linking sites and motifs enriched when protein-RNA complexes are studied in vivo in living cells and in vitro, [44] though methods are being developed to minimise such bias for enriched motif discovery. [45] Computationally predicted miRNA targets derived from TargetScan are comparable to CLIP in identifying miRNA targets, raising questions as to its utility relative to existing predictions. [46] Because CLIP methods rely on immunoprecipitation, crosslinked RNA could in some cases affect antibody-epitope interactions. Finally, significant differences have been observed. Therefore, raw CLIP results require further computational analyses to thoroughly investigate RNA-protein binding site interactions within the cell.
Polyadenylation is the addition of a poly(A) tail to an RNA transcript, typically a messenger RNA (mRNA). The poly(A) tail consists of multiple adenosine monophosphates; in other words, it is a stretch of RNA that has only adenine bases. In eukaryotes, polyadenylation is part of the process that produces mature mRNA for translation. In many bacteria, the poly(A) tail promotes degradation of the mRNA. It, therefore, forms part of the larger process of gene expression.
Immunoprecipitation (IP) is the technique of precipitating a protein antigen out of solution using an antibody that specifically binds to that particular protein. This process can be used to isolate and concentrate a particular protein from a sample containing many thousands of different proteins. Immunoprecipitation requires that the antibody be coupled to a solid substrate at some point in the procedure.
RNA-binding proteins (RBPs) are proteins that bind to double- or single-stranded RNA in cells and participate in forming ribonucleoprotein complexes. RBPs contain various structural motifs, such as the RNA recognition motif (RRM), dsRNA binding domain, zinc finger and others. They are cytoplasmic and nuclear proteins. However, since most mature RNA is exported from the nucleus relatively quickly, most RBPs in the nucleus exist as complexes of protein and pre-mRNA called heterogeneous ribonucleoprotein particles (hnRNPs). RBPs have crucial roles in various cellular processes such as cellular function, transport and localization. They play an especially important role in the post-transcriptional control of RNAs, including splicing, polyadenylation, mRNA stabilization, mRNA localization and translation. Eukaryotic cells express diverse RBPs with unique RNA-binding activity and protein–protein interactions. According to the Eukaryotic RBP Database (EuRBPDB), there are 2961 genes encoding RBPs in humans. During evolution, the diversity of RBPs greatly increased with the increase in the number of introns. This diversity enabled eukaryotic cells to utilize RNA exons in various arrangements, giving rise to a unique RNP (ribonucleoprotein) for each RNA. Although RBPs have a crucial role in the post-transcriptional regulation of gene expression, relatively few RBPs have been studied systematically. It has now become clear that RNA–RBP interactions play important roles in many biological processes across organisms.
RNA-binding protein Nova-1 is a protein that in humans is encoded by the NOVA1 gene.
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.
Post-transcriptional regulation is the control of gene expression at the RNA level. It occurs once the RNA polymerase has been attached to the gene's promoter and is synthesizing the nucleotide sequence. Therefore, as the name indicates, it occurs between the transcription phase and the translation phase of gene expression. These controls are critical for the regulation of many genes across human tissues. It also plays a big role in cell physiology, being implicated in pathologies such as cancer and neurodegenerative diseases.
RIP-chip is a molecular biology technique which combines RNA immunoprecipitation with a microarray. The purpose of this technique is to identify which RNA sequences interact with a particular RNA binding protein of interest in vivo. It can also be used to determine relative levels of gene expression, to identify subsets of RNAs which may be co-regulated, or to identify RNAs that may have related functions. This technique provides insight into the post-transcriptional gene regulation which occurs between RNA and RNA binding proteins.
Robert Bernard Darnell is an American neurooncologist and neuroscientist, founding director and former CEO of the New York Genome Center, the Robert and Harriet Heilbrunn Professor of Cancer Biology at The Rockefeller University, and an Investigator of the Howard Hughes Medical Institute. His research into rare autoimmune brain diseases led to the invention of the HITS-CLIP method to study RNA regulation, and he is developing ways to explore the regulatory portions—known as the "dark matter"—of the human genome.
Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell. Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence. Epigenomic maintenance is a continuous process and plays an important role in stability of eukaryotic genomes by taking part in crucial biological mechanisms like DNA repair. Plant flavones are said to be inhibiting epigenomic marks that cause cancers. Two of the most characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as in differentiation/development and tumorigenesis. The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.
Chromatin immunoprecipitation (ChIP) is a type of immunoprecipitation experimental technique used to investigate the interaction between proteins and DNA in the cell. It aims to determine whether specific proteins are associated with specific genomic regions, such as transcription factors on promoters or other DNA binding sites, and possibly define cistromes. ChIP also aims to determine the specific location in the genome that various histone modifications are associated with, indicating the target of the histone modifiers. ChIP is crucial for the advancements in the field of epigenomics and learning more about epigenetic phenomena.
PAR-CLIP is a biochemical method for identifying the binding sites of cellular RNA-binding proteins (RBPs) and microRNA-containing ribonucleoprotein complexes (miRNPs). The method relies on the incorporation of ribonucleoside analogs that are photoreactive, such as 4-thiouridine (4-SU) and 6-thioguanosine (6-SG), into nascent RNA transcripts by living cells. Irradiation of the cells by ultraviolet light of 365 nm wavelength induces efficient crosslinking of photoreactive nucleoside–labeled cellular RNAs to interacting RBPs. Immunoprecipitation of the RBP of interest is followed by isolation of the crosslinked and coimmunoprecipitated RNA. The isolated RNA is converted into a cDNA library and is deep sequenced using next-generation sequencing technology.
The RNA-binding Proteins Database (RBPDB) is a biological database of RNA-binding protein specificities that includes experimental observations of RNA-binding sites. The experimental results included are both in vitro and in vivo from primary literature. It includes four metazoan species, which are Homo sapiens, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. RNA-binding domains included in this database are RNA recognition motif, K homology, CCCH zinc finger, and more domains. As of 2021, the latest RBPDB release includes 1,171 RNA-binding proteins.
High-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) is a variant of CLIP for genome-wide mapping protein–RNA binding sites or RNA modification sites in vivo. HITS-CLIP was originally used to generate genome-wide protein-RNA interaction maps for the neuron-specific RNA-binding protein and splicing factor NOVA1 and NOVA2; since then a number of other splicing factor maps have been generated, including those for PTB, RbFox2, SFRS1, hnRNP C, and even N6-Methyladenosine (m6A) mRNA modifications.
In molecular biology, competing endogenous RNAs regulate other RNA transcripts by competing for shared microRNAs (miRNAs). Models for ceRNA regulation describe how changes in the expression of one or multiple miRNA targets alter the number of unbound miRNAs and lead to observable changes in miRNA activity - i.e., the abundance of other miRNA targets. Models of ceRNA regulation differ greatly. Some describe the kinetics of target-miRNA-target interactions, where changes in the expression of one target species sequester one miRNA species and lead to changes in the dysregulation of the other target species. Others attempt to model more realistic cellular scenarios, where multiple RNA targets are affecting multiple miRNAs and where each target pair is co-regulated by multiple miRNA species. Some models focus on mRNA 3' UTRs as targets, and others consider long non-coding RNA targets as well. It's evident that our molecular-biochemical understanding of ceRNA regulation remains incomplete.
iCLIP is a variant of the original CLIP method used for identifying protein-RNA interactions, which uses UV light to covalently bind proteins and RNA molecules to identify RNA binding sites of proteins. This crosslinking step has generally less background than standard RNA immunoprecipitation (RIP) protocols, because the covalent bond formed by UV light allows RNA to be fragmented, followed by stringent purification, and this also enables CLIP to identify the positions of protein-RNA interactions. As with all CLIP methods, iCLIP allows for a very stringent purification of the linked protein-RNA complexes by stringent washing during immunoprecipitation followed by SDS-PAGE and transfer to nitrocellulose. The labelled protein-RNA complexes are then visualised for quality control, excised from nitrocellulose, and treated with proteinase to release the RNA, leaving only a few amino acids at the crosslink site of the RNA.
Mihaela Zavolan is a systems biologist and Professor at the Biozentrum of the University of Basel.
In epitranscriptomic sequencing, most methods focus on either (1) enrichment and purification of the modified RNA molecules before running on the RNA sequencer, or (2) improving or modifying bioinformatics analysis pipelines to call the modification peaks. Most methods have been adapted and optimized for mRNA molecules, except for modified bisulfite sequencing for profiling 5-methylcytidine which was optimized for tRNAs and rRNAs.
Time-resolved RNA sequencing methods are applications of RNA-seq that allow for observations of RNA abundances over time in a biological sample or samples. Second-Generation DNA sequencing has enabled cost effective, high throughput and unbiased analysis of the transcriptome. Normally, RNA-seq is only capable of capturing a snapshot of the transcriptome at the time of sample collection. This necessitates multiple samplings at multiple time points, which increases both monetary and time costs for experiments. Methodological and technological innovations have allowed for the analysis of the RNA transcriptome over time without requiring multiple samplings at various time points.
ChIL sequencing (ChIL-seq), also known as Chromatin Integration Labeling sequencing, is a method used to analyze protein interactions with DNA. ChIL-sequencing combines antibody-targeted controlled cleavage by Tn5 transposase with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global DNA binding sites precisely for any protein of interest. Currently, ChIP-Seq is the most common technique utilized to study protein–DNA relations; however, it suffers from a number of practical and economical limitations that ChIL-Sequencing does not. ChIL-Seq is a precise technique that reduces sample loss and could be applied to single cells.
Ribonucleoprotein Networks Analyzed by Mutational Profiling (RNP-MaP) is a strategy for probing RNA-protein networks and protein binding sites at a nucleotide resolution. Information about RNP assembly and function can facilitate a better understanding of biological mechanisms. RNP-MaP uses NHS-diazirine (SDA), a hetero-bifunctional crosslinker, to freeze RNA-bound proteins in place. Once the RNA-protein crosslinks are formed, MaP reverse transcription is then conducted to reversely transcribe the protein-bound RNAs as well as introduce mutations at the site of RNA-protein crosslinks. Sequencing results of the cDNAs reveal information about both protein-RNA interaction networks and protein binding sites.