In molecular biology, circular RNA (or circRNA) is a type of single-stranded RNA which, unlike linear RNA, forms a covalently closed continuous loop. In circular RNA, the 3' and 5' ends normally present in an RNA molecule have been joined together. This feature confers numerous properties to circular RNA, many of which have only recently been identified.
Many types of circular RNA arise from otherwise protein-coding genes. Some circular RNAs have been shown to code for proteins. [1] [2] Some types of circular RNA have also recently shown potential as gene regulators. The biological function of most circular RNAs is unclear.
Because circular RNAs do not have 5' or 3' ends, they are resistant to exonuclease-mediated degradation and are presumably more stable than most linear RNAs in cells. [3] Circular RNA has been linked to some diseases such as cancer. [4]
In contrast to genes in bacteria, eukaryotic genes are split by non-coding sequences called introns. In eukaryotes, as a gene is transcribed from DNA into a messenger RNA (mRNA) transcript, intervening introns are removed, leaving only exons in the mature mRNA, which can subsequently be translated to produce the protein product. [5] The spliceosome, [5] a protein-RNA complex located in the nucleus, catalyzes splicing in the following manner:
Alternative splicing is a phenomenon through which one RNA transcript can yield different protein products based on which segments are considered "introns" and "exons" during a splicing event. [5] Although not specific to humans, it is a partial explanation for the fact that humans and other much simpler species (such as nematodes) have similar numbers of genes (in the range of 20 - 25 thousand). [6] One of the most striking examples of alternative splicing is in the Drosophila DSCAM gene, which can give rise to approximately 30 thousand distinct alternatively spliced isoforms. [7]
Exon scrambling, also called exon shuffling, describes an event in which exons are spliced in a "non-canonical" (atypical) order. There are three ways in which exon scrambling can occur:
The notion that circularized transcripts are byproducts from imperfect splicing is supported by the low abundance and the lack of sequence conservation of most circRNAs, [9] but has been challenged. [8] [10] [3]
Repetitive Alu sequences represent approximately 10% of the human genome. [11] The presence of Alu elements in the flanking introns of protein-coding genes, adjacent to the first and last exons that form circRNAs, influences the formation of circRNAs. [12] [13] [3] [14] It is important that the flanking intronic Alu elements are complementary, as this enables RNA pairing, which in turn facilitates circRNA synthesis. [15]
RNAs can undergo base modification by RNA editing after transcription. RNA editing occurs mainly in Alu elements of protein-coding genes. [16] A-to-I RNA editing in up- and downstream intronic Alu elements flanking the back-splice site (BSS) can reduce the formation of circRNAs in the human heart. [16] In the failing human heart, a predominant reduction in A-to-I RNA editing leads to an increased formation of circRNAs, which is presumably mediated by better complementary pairing of RNA of the Alu elements flanking the back-splice site. [16]
Early discoveries of circular RNAs led to the belief that they lacked significance due to their rarity. These early discoveries included the analysis of genes like the DCC and Sry genes, and the recent discovery of the human non-coding RNA ANRIL, all of which expressed circular isoforms. CircRNA producing genes like the human ETS-1 gene, the human and rat cytochrome P450 genes, the rat androgen binding protein gene (Shbg), and the human dystrophin gene were also discovered. [17]
In 2012, in an effort to initially identify cancer-specific exon scrambling events, scrambled exons were discovered in large numbers in both normal and cancer cells. It was found that scrambled exon isoforms comprised about 10% of the total transcript isoforms in leukocytes, with 2,748 scrambled isoforms in HeLa and H9 embryonic stem cells being identified. Additionally, about 1 in 50 expressed genes produced scrambled transcript isoforms at least 10% of the time. Tests used to recognize circularity included treating samples with RNase R, an enzyme that degrades linear but not circular RNAs, and testing for the presence of poly-A tails, which are not present in circular molecules. Overall, 98% of scrambled isoforms were found to represent circRNAs, circRNAs were found to be located in the cytoplasm, and circRNAs were found to be abundant. [17] [8]
In 2013, a higher abundance of circRNAs was discovered. Human fibroblast RNA was treated with RNase R to enrich for circular RNAs, followed by the categorization of circular transcripts based on their abundance (low, medium, high). [3] Approximately 1 in 8 expressed genes were found to produce detectable levels of circRNAs, including those of low abundance, which was significantly higher than previously suspected, and was attributed to greater sequencing depth. [3] [8]
At the same time, a computational method to detect circRNAs was developed, leading to de novo detection of circRNAs in humans, mice, and C. elegans, and extensively validating them. The expression of circRNAs was often found to be tissue/developmental stage specific. Additionally, circRNAs were found to have the ability to act as antagonists of miRNAs, microRNAs which interfere with translation of mRNAs, as exemplified by the circRNA CDR1as, which has miRNA binding sites (as seen below). [18]
In 2014, human circRNAs were identified and quantified from ENCODE Ribozero RNA-seq data. Most circRNAs were found to be minor splice isoforms and to be expressed in only a few cell types, with 7,112 human circRNAs having circular fractions (the fraction of similarity an isoform has to transcripts from the same locus) of at least 10%. CircRNAs were also found to be no more conserved than their linear controls and, according to ribosome profiling, are not translated. [9] As previously noted, circRNAs have the ability to act as antagonists of miRNA, which is also known as the potential to act as microRNA sponges. Aside from CDR1as, very few circRNAs have the potential to act as microRNA sponges. As a whole, the majority of circular RNAs were found to be inconsequential side-products of imperfect splicing. [18] [9]
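The 10% circular-fraction threshold mentioned above can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the fraction is estimated as circular junction reads divided by all (circular plus linear) junction reads at a locus; the function name, locus names, and read counts are hypothetical and are not taken from the cited study.

```python
def circular_fraction(circular_reads: int, linear_reads: int) -> float:
    """Share of junction reads at a locus that support the circular isoform.

    Assumption: the fraction is estimated from read counts; the cited study
    may define and normalize it differently.
    """
    total = circular_reads + linear_reads
    if total == 0:
        raise ValueError("no junction reads at this locus")
    return circular_reads / total


# Hypothetical (circular, linear) read counts per locus.
loci = {
    "locus_A": (12, 88),
    "locus_B": (3, 297),
    "locus_C": (50, 50),
}

# Keep loci whose circular fraction reaches the 10% threshold.
passing = [
    name
    for name, (circ, lin) in loci.items()
    if circular_fraction(circ, lin) >= 0.10
]
print(passing)
```

With these made-up counts, only the loci whose circular isoform accounts for at least a tenth of the junction reads are retained.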
In the same year, CIRCexplorer, a tool used to identify thousands of circRNAs in humans without RNase R RNA-seq data, was developed. The vast majority of identified highly expressed exonic circular RNAs were found to be processed from exons located in the middle of RefSeq genes, suggesting that circular RNA formation is generally coupled to RNA splicing. It was determined that most circular RNAs contain multiple exons, most commonly two to three. Exons from circRNAs with only one circularized exon were found to be much longer than those from circRNAs with multiple circularized exons, indicating that processing may prefer a certain length to maximize exon circularization. The introns flanking circularized exons generally contain high Alu densities that can form inverted repeated Alu pairs (IRAlus). IRAlus, either convergent or divergent, are juxtaposed across flanking introns of circRNAs in a parallel way with similar distances to adjacent exons. IRAlus, and other non-repetitive but complementary sequences, were also found to promote circular RNA formation. On the other hand, exon circularization efficiency was determined to be affected by the competition of RNA pairing, such that alternative RNA pairing, and its competition, leads to alternative circularization. Finally, both exon circularization and its regulation were found to be evolutionarily dynamic. [15]
The Cruchaga lab performed the first large-scale analyses of circRNA in Alzheimer's disease (AD) and demonstrated the role of circRNAs in health and disease. A total of 148 circRNAs were found to be significantly associated in multiple datasets with Alzheimer's disease status and clinical dementia rating (CDR) at death after false discovery rate (FDR) correction. The expression of circRNAs was found to be independent of both the linear form and cell proportion. CircRNAs were also found to be co-expressed with known causal Alzheimer genes, such as APP and PSEN1, indicating that some circRNAs are also part of the causal pathway. Altogether, circRNA brain expression was found to explain more about Alzheimer's clinical manifestations than the number of APOε4 alleles, suggesting that circRNAs could be used as a potential biomarker for Alzheimer's. [19]
Circular RNAs can be separated into five classes: [20] [21]
Classes of Circular RNAs | Description |
Viroids and the hepatitis delta virus (HDV) | In viroids and HDV, single-stranded circRNAs are vital in RNA replication. Circularity allows for one initiation event to lead to multiple genomic copies in a process otherwise known as rolling circle RNA replication. [22] [23] [24] |
CircRNAs from introns | Circular molecules are produced by introns produced from spliceosomal splicing, tRNA splicing, and group I and group II (self-splicing ribozymes) introns. Group I introns form circRNAs through autocatalytic ribozymal action, and while they can be detected in vivo, their function is yet to be determined. [22] [23] [24] Group II introns also generate circRNAs in vivo. Circular introns produced by eukaryotic spliceosomal splicing are circularized intron lariats known as circular intronic RNAs (ciRNAs). Due to circularization, ciRNAs can avoid degradation and are believed to be highly overrepresented. CiRNA function is currently unknown; however, it is speculated they may play a role in enhancing the transcription of genes they are produced from, as they interact with RNA polymerase II. [21] |
CircRNAs from intermediates in RNA processing reactions | These are first spliced from precursors as linear molecules and then circularized with a ligase. They are essential in allowing for the rearrangement in RNA sequence order and vital in the biogenesis of permuted tRNA genes in certain algae and archaea. [21] |
Noncoding circRNAs in archaea | Certain archaeal species have circRNAs that are produced from excised circularized tRNA introns. Circularization of functional noncoding RNAs is thought to work as a protective mechanism against exonucleases and to promote proper folding. [21] [3] |
CircRNAs in eukaryotes produced by back-splicing | Circular RNAs produced by back-splicing (a form of exon scrambling) occur when a 5′ splice site is joined to an upstream 3′ splice site. Currently, more than 25,000 different circRNAs have been identified in humans. [21] [3] |
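The back-splicing rule in the last row of the table (a 5′ splice site joined to an upstream 3′ splice site) can be expressed as a simple coordinate check. The following is a minimal sketch, assuming a plus-strand gene and made-up exon coordinates; the `Exon` type and both helper functions are hypothetical and do not correspond to any published circRNA caller.

```python
from typing import List, NamedTuple


class Exon(NamedTuple):
    name: str
    start: int  # acceptor (3' splice site) side on the plus strand
    end: int    # donor (5' splice site) side on the plus strand


def classify_junction(donor_pos: int, acceptor_pos: int) -> str:
    """Canonical splicing joins a donor to a downstream acceptor;
    back-splicing joins it to an acceptor that lies upstream."""
    return "back-splice" if acceptor_pos < donor_pos else "canonical"


def circularized_exons(exons: List[Exon], donor_pos: int, acceptor_pos: int) -> List[str]:
    """Names of the exons enclosed by a back-splice junction."""
    return [e.name for e in exons if e.start >= acceptor_pos and e.end <= donor_pos]


# Hypothetical three-exon gene on the plus strand.
gene = [Exon("exon1", 100, 200), Exon("exon2", 300, 400), Exon("exon3", 500, 600)]

# exon1 donor -> exon2 acceptor: ordinary linear splicing.
linear = classify_junction(donor_pos=gene[0].end, acceptor_pos=gene[1].start)

# exon3 donor -> exon2 acceptor: the acceptor is upstream, so this is a back-splice.
circular = classify_junction(donor_pos=gene[2].end, acceptor_pos=gene[1].start)
circle = circularized_exons(gene, donor_pos=gene[2].end, acceptor_pos=gene[1].start)

print(linear, circular, circle)
```

Minus-strand genes would need the comparison reversed; that case is left out to keep the sketch short.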
A recent study of human circRNAs revealed that these molecules are usually composed of 1–5 exons. [18] Each of these exons can be up to three times longer than the average expressed exon, [3] suggesting that exon length may play a role in deciding which exons to circularize. 85% of circularized exons overlap with exons that code for protein, [18] although the circular RNAs themselves do not appear to be translated. During circRNA formation, exon 2 is often the upstream "acceptor" exon. [8]
Introns surrounding exons that are selected to be circularized are, on average, up to three times longer than those not flanking pre-circle exons, [8] [3] although it is not yet clear why this is the case. Compared to regions not resulting in circles, these introns are much more likely to contain complementary inverted Alu repeats, Alu being the most common transposon in the genome. [3] By the Alu repeats base pairing to one another, it has been proposed that this may enable the splice sites to find each other, thus facilitating circularization. [10] [3]
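The idea that inverted repeats in the two flanking introns can base-pair with one another can be sketched as a reverse-complement search. The example below is illustrative only: the sequences are short made-up strings rather than real Alu elements, the DNA alphabet is used for convenience, and `shared_inverted_kmers` is a hypothetical helper that looks for exact matches only (real Alu pairs tolerate mismatches).

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shared_inverted_kmers(upstream: str, downstream: str, k: int = 8) -> set:
    """k-mers in the upstream intron whose reverse complement occurs in the
    downstream intron, i.e. segments that could pair across the exon(s)."""
    up_kmers = {upstream[i:i + k] for i in range(len(upstream) - k + 1)}
    hits = set()
    for i in range(len(downstream) - k + 1):
        partner = reverse_complement(downstream[i:i + k])
        if partner in up_kmers:
            hits.add(partner)
    return hits


# Toy repeat (hypothetical, not a real Alu sequence).
repeat = "GGCTGAGGCAGG"
upstream_intron = "ATAT" + repeat + "CACA"

# Inverted orientation: the downstream intron carries the reverse complement.
downstream_inverted = "TGTG" + reverse_complement(repeat) + "ATAT"
# Same orientation: the downstream intron carries an identical copy.
downstream_direct = "TGTG" + repeat + "ATAT"

print(shared_inverted_kmers(upstream_intron, downstream_inverted, k=12))
print(shared_inverted_kmers(upstream_intron, downstream_direct, k=12))
```

Only the inverted arrangement yields a pairable segment here, which mirrors the proposal that complementary inverted repeats bring the splice sites together.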
Introns within the circRNAs are retained at a relatively high frequency (~25%), [9] thus adding extra sequence to the mature circRNAs.
In the cell, circRNAs are predominantly found in the cytoplasm, where the number of circular RNA transcripts derived from a gene can be up to ten times greater than the number of associated linear RNAs generated from that locus. It is unclear how circular RNAs exit the nucleus through a relatively small nuclear pore. Because the nuclear envelope breaks down during mitosis, one hypothesis is that the molecules exit the nucleus during this phase of the cell cycle. [3] However, certain circRNAs, such as CiRS-7/CDR1as, are expressed in neuronal tissues, [18] [25] where mitotic division is not prevalent.
CircRNAs lack a polyadenylated tail and, therefore, are predicted to be less prone to degradation by exonucleases. In 2015, Enuka et al. measured the half-lives of 60 circRNAs and their linear counterparts expressed from the same host gene and revealed that the median half-life of circRNAs of mammary cells (18.8 to 23.7 hours) is at least 2.5 times longer than the median half-life of their linear counterparts (4.0 to 7.4 hours). [26] Generally, the lifetime of RNA molecules defines their response time. [27] Accordingly, it was reported that mammary circRNAs respond slowly to stimulation by growth factors. [26]
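The "at least 2.5 times" figure follows from comparing the shortest circRNA median half-life with the longest linear one. The sketch below reproduces that arithmetic and, as an added assumption not stated in the source, uses simple first-order (exponential) decay to compare how much of each RNA would remain after a given time; the function name is hypothetical.

```python
def fraction_remaining(hours: float, half_life_hours: float) -> float:
    """Fraction of an RNA pool left after `hours`, assuming first-order decay."""
    return 0.5 ** (hours / half_life_hours)


# Median half-life ranges quoted above (hours).
circ_low, circ_high = 18.8, 23.7
lin_low, lin_high = 4.0, 7.4

# Most conservative comparison: shortest circular vs. longest linear half-life.
min_ratio = circ_low / lin_high
print(f"minimum half-life ratio: {min_ratio:.2f}")

# Under the decay assumption, compare what is left one day after transcription stops.
print(f"circRNA left after 24 h: {fraction_remaining(24, circ_low):.2f}")
print(f"linear RNA left after 24 h: {fraction_remaining(24, lin_high):.2f}")
```

A longer half-life also means slower approach to a new steady state, which is consistent with the slow response to growth factors described above.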
CircRNAs have been identified in various species across the domains of life. In 2011, Danan et al. sequenced RNA from Archaea. After digesting total RNA with RNase R, they were able to identify circular species, indicating that circRNAs are not specific to eukaryotes. [28] However, these archaeal circular species are probably not made via splicing, suggesting that other mechanisms to generate circular RNA likely exist.
CircRNAs were found to be largely conserved between human and sheep. By analyzing total RNA sequencing data from sheep's parietal lobe cortex and peripheral blood mononuclear cells it was shown that 63% of the detected circRNAs are homologous to known human circRNAs. [29]
In a closer evolutionary connection, a comparison of RNA from mouse testes vs. RNA from a human cell found 69 orthologous circRNAs. For example, both humans and mice encode the HIPK2 and HIPK3 genes, two paralogous kinases which produce a large amount of circRNA from one particular exon in both species. [3] Evolutionary conservation reinforces the likelihood of a relevant and significant role for RNA circularization.
microRNAs (miRNAs) are small (~21nt) non-coding RNAs that repress translation of messenger RNAs involved in a large, diverse set of biological processes. [30] They directly base-pair to target messenger RNAs (mRNAs), and can trigger cleavage of the mRNA depending on the degree of complementarity.
MicroRNAs are grouped in "seed families". Family members share nucleotides 2–7, known as the seed region. [31] Argonaute proteins are the "effector proteins" which help miRNAs carry out their job, while microRNA sponges are RNAs that "sponge up" miRNAs of a particular family, thereby serving as competitive inhibitors that suppress the ability of the miRNA to bind its mRNA targets, thanks to the presence of multiple binding sites that recognize a specific seed region. [31] Certain circular RNAs have many miRNA binding sites, which yielded a clue that they may function in sponging. Two recent papers confirmed this hypothesis by investigating a circular sponge called CDR1as/CiRS-7 in detail, while other groups found no direct evidence for circular RNAs acting as miRNA sponges when analyzing the potential interaction of circular RNAs with the Argonaute (AGO) protein using high-throughput sequencing of RNA isolated by cross-linking and immunoprecipitation (HITS-CLIP) data. [32]
CDR1as/CiRS-7 is encoded in the genome antisense to the human CDR1 locus (hence the name CDR1as), [18] and targets miR-7 (hence the name CiRS-7 – Circular RNA Sponge for miR-7). [25] It has over 60 miR-7 binding sites, far more than any known linear miRNA sponge. [18] [25]
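Seed-based binding sites of the kind described above can be counted with a few lines of code. This is a minimal sketch under stated assumptions: the miRNA and target sequences are invented for illustration (they are not miR-7 or CDR1as), a site is defined as a perfect match to the reverse complement of miRNA nucleotides 2–7, and the helper names are hypothetical. Real target prediction also weighs flanking context and conservation.

```python
def rna_reverse_complement(seq: str) -> str:
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return seq.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def seed(mirna: str) -> str:
    """Nucleotides 2-7 of the miRNA (1-based), i.e. the seed region."""
    return mirna[1:7]


def count_seed_sites(target: str, mirna: str) -> int:
    """Number of (possibly overlapping) perfect seed matches in the target."""
    site = rna_reverse_complement(seed(mirna))
    return sum(
        1
        for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    )


# Hypothetical sequences, chosen only to make the counts easy to follow.
toy_mirna = "UACGGAUCAGCUAGCUAGCUAA"
toy_sponge = "GGAUCCGUAA" * 3   # carries three seed-match sites
toy_mrna = "CCAUCCGUCC"         # carries one seed-match site

print(seed(toy_mirna))
print(count_seed_sites(toy_sponge, toy_mirna))
print(count_seed_sites(toy_mrna, toy_mirna))
```

An RNA with many such sites can, in principle, titrate more copies of the miRNA than a target with a single site, which is the intuition behind the sponge hypothesis.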
AGO2 is miR-7's associated Argonaute protein (see above). Though CDR1as/CiRS-7 can be cleaved by miR-671 and its associated Argonaute protein, [25] it cannot be cleaved by miR-7 and AGO2. MicroRNA cleavage activity depends on complementarity beyond the 12th nucleotide position; none of CiRS-7's binding sites meet this requirement.
An experiment with zebrafish, which do not have the CDR1 locus in their genome, provides evidence for CiRS-7's sponge activity. During development, miR-7 is strongly expressed in the zebrafish brain. To silence miR-7 expression in zebrafish, Memczak and colleagues took advantage of a tool called morpholino, which can base pair and sequester target molecules. [33] Morpholino treatment had the same severe effect on midbrain development as ectopically expressing CiRS-7 in zebrafish brains using injected plasmids. This indicates a significant interaction between CiRS-7 and miR-7 in vivo. [18]
Another notable circular miRNA sponge is SRY. SRY, which is highly expressed in murine testes, functions as a miR-138 sponge. [25] [34] In the genome, SRY is flanked by long inverted repeats (IRs) over 15.5 kilobases (kb) in length. When one or both of the IRs are deleted, circularization does not occur. It was this finding that introduced the idea of inverted repeats enabling circularization. [35]
Because circular RNA sponges are characterized by high expression levels, stability, and a large number of miRNA binding sites, they are likely to be more effective sponges than those that are linear. [10]
Though recent attention has been focused on circRNA's "sponge" functions, scientists are considering several other functional possibilities as well. For example, some areas of the mouse adult hippocampus show expression of CiRS-7 but not miR-7, suggesting that CiRS-7 may have roles that are independent of interacting with the miRNA. [18]
Potential roles include the following:
Usually, intronic lariats (see above) are debranched and rapidly degraded. However, a debranching failure can lead to the formation of circular intronic long non-coding RNAs, also known as ciRNAs. [39] CiRNA formation, rather than being a random process, seems to depend on the presence of specific elements near the 5' splice site and the branchpoint site (see above).
CiRNAs are distinct from circRNAs in that they are prominently found in the nucleus rather than the cytoplasm. In addition, these molecules contain few (if any) miRNA binding sites. Instead of acting as sponges, ciRNAs seem to function in regulating the expression of their parent genes. For example, a relatively abundant ciRNA called ci-ankrd52 positively regulates Pol II transcription. Many ciRNAs remain at their "sites of synthesis" in the nucleus. However, ciRNA may have roles other than simply regulating their parent genes, as ciRNAs do localize to additional sites in the nucleus other than their "sites of synthesis". [39]
As with most topics in molecular biology, it is important to consider how circular RNA can be used as a tool to help mankind. Given its abundance, evolutionary conservation, and potential regulatory role, it is worthwhile to look into how circular RNA can be used to study pathogenesis and devise therapeutic interventions. For example:
Dube et al. [19] demonstrated for the first time that brain circular RNAs (circRNAs) are part of the pathogenic events that lead to Alzheimer's disease, hypothesizing that specific circRNAs would be differentially expressed in AD cases compared to controls and that those effects could be detected early in the disease. They optimized and validated a novel analysis pipeline for circRNAs. They used a three-stage study design, with the Knight ADRC brain RNA-seq data as discovery (stage 1), the data from Mount Sinai as replication (stage 2), and a meta-analysis (stage 3) to identify the most significant circRNAs differentially expressed in Alzheimer's disease. Using this pipeline, they found 3,547 circRNAs that passed stringent QC in the Knight ADRC cohort, which includes RNA-seq from 13 controls and 83 Alzheimer cases, and 3,924 circRNAs that passed stringent QC in the MSBB dataset. A meta-analysis of the discovery and replication results revealed a total of 148 circRNAs that were significantly correlated with CDR after FDR correction. In addition, 33 circRNAs passed the stringent gene-based Bonferroni multiple-test correction threshold of 5 × 10⁻⁶, including circHOMER1 (P = 2.21 × 10⁻¹⁸) and circCDR1-AS (P = 2.83 × 10⁻⁸), among others. They also performed additional analyses to demonstrate that the expression of circRNAs was independent of the linear form as well as of cell proportion, which can confound brain RNA-seq analyses in Alzheimer's disease studies. They performed co-expression analyses of all the circRNAs together with the linear forms and found that circRNAs, including those that were differentially expressed in Alzheimer's disease compared to controls, co-expressed with known causal Alzheimer genes, such as APP and PSEN1, indicating that some circRNAs are also part of the causal pathway.
They also demonstrated that circRNA brain expression explained more about Alzheimer's clinical manifestations than the number of APOε4 alleles, suggesting that circRNAs could be used as a potential biomarker for Alzheimer's disease. This is an important study for the field, as it is the first time that circRNAs were quantified and validated (by real-time PCR) in human brain samples at genome-wide scale and in large and well-characterized cohorts. It also demonstrates that these RNA forms are likely to be implicated in complex traits, including Alzheimer's disease, which will help to understand the biological events that lead to disease.
Recent studies have shown that circRNA is associated with heart failure and heart disease. circFOXO3, Titin-derived circRNAs, circSLC8A1-1 and circAmotl1 play important roles in cardiac function through upregulation or inhibition relevant to heart disease. circFOXO3 binds the transcription factors E2F1 and HIF1α and the proteins ID1 and FAK, and changes in its expression have been linked to cardiomyopathy induced by doxorubicin (DOX). Titin gene-derived circRNA induces cardiotoxicity in cardiomyocytes. circSLC8A1-1 overexpression causes sponging of the cardiac hypertrophy regulator miR-133 and leads to heart failure. Apart from circRNA-mediated cardiac disease, some circRNAs play a role in cardiac damage repair. For example, circAmotl1 overexpression increases cardiomyocyte longevity through binding and translocation of AKT, which regulates cardiac repair. Circular RNA CDR1 has an important role during infection in the myocardium. Cardiac dysfunction occurs post myocardial infection due to circNfix downregulation. Since various types of circular RNA are related to heart disease, they could be used as potential biomarkers and therapeutic targets. For example, circRNA_025016 has been used as a biomarker for postoperative atrial fibrillation, which is observed in some patients after cardiac surgery. Although various studies have linked circular RNA overexpression and downregulation to heart disease, the relationship remains unclear. Further research is therefore needed to trace disease progression through the different stages of cardiac dysfunction using circular RNA as a biomarker, and to assess whether circular RNA can be used for gene delivery in cells. [42]
Various studies have demonstrated that circular RNA acts as a prognostic agent and biomarker in kidney diseases including renal cell carcinoma, acute kidney injury, diabetic nephropathy, and lupus nephritis. Renal chronicity is associated with miR-150, which is negatively regulated by circHLA-C, in patients with lupus nephritis. There is also evidence that circular RNA is involved in acute kidney injury. In these circumstances circular RNA proves to be a novel biomarker and is also used for targeted therapy of kidney disease because its pseudogene can alter DNA composition. [43]
Evidence found that circular RNA plays a role in chronic liver disease and homeostasis regulation leading to liver fibrosis and autoimmune disease by an epigenetic mechanism. [44]
Circular RNA has both positive and negative functions in cancer. For example, ciRS-7 was found to act as an oncogene in colorectal cancer tissue. Overexpression of ciRS-7 deregulates gene expression, leading to malignant phenotypic features. On the other hand, some circRNAs show positive effects, such as circ-ITCH, which acts as a sponge for the oncogenic miR-7 and miR-214 in lung cancer; overexpression of circ-ITCH inhibits cell proliferation in lung cancer. Various studies have found that F-circM-9, F-circPR, F-circEA and FcircEA-2 are involved in the development of leukemia and cancer. In osteosarcoma cells, circ-0016347 induces tumors and downregulates its target, caspase-1. Another circular RNA, hsa_circRNA_002178, leads to breast cancer when it is overexpressed and down-regulates COL1A1 protein function. In contrast, silencing of hsa_circRNA_002178 reduced IL-6 and TNF-α production, which inhibited tumor growth and inflammation. Some viruses, such as Epstein–Barr virus (EBV) and human papillomavirus (HPV), can encode circular RNAs such as circEBNA_W1_C1 (EBV) and circE7 (HPV) that play a role in oncogenesis in infected individuals. Because circRNAs are involved in cancer development and regulation, they have potential as biomarkers for cancer surveillance and identification. [45] Circular RNAs are stable and tissue-specific, can be found in blood, saliva, urine, cerebrospinal fluid and other body fluid secretions, and are abundant in exosomes, all of which makes them well suited for use as cancer biomarkers. [44] [45]
Circular RNA has a function in autoimmune disease progression acting as a miRNA sponge which regulates DNA methylation, adaptive immune activation, and costimulatory molecule secretion. [44]
Circular RNA plays a significant role in immune regulation and induction of T cell responses. circRNA100783 is involved in immunity and senescence of CD8+ T cells. circRNA-003780 and circRNA-010056 also have major roles for macrophage differentiation and polarization. [45]
Circular RNA acts as a very active immune agent when it combines with soluble protein antigens and induces adaptive immunity that does not require a specific route of administration. Plasma circular RNA and combined circRNA have higher efficiency in diagnosis than tissue-specific treatment and single circular RNA. Treatment with circular RNA activates the differentiation and maturation of dendritic cells, which then secrete a large number of different cytokines and chemokines by expressing the genes for IL-1β, IL-6 and TNF-α. After immunization with circular RNA that encodes the antigen sequence, CD8+ mediator T cell responses to the target antigen are enhanced. Circular RNA has the very advantageous properties of stability and long shelf life, so it is useful as a biomarker and as a plasmid to express genes of interest. [46]
Circular RNAs play important roles in myogenesis: circRBFOX2 and circLMO7 act as negative regulators, and circSVIL acts as a positive regulator. [47] circRBFOX2 regulates miR-206 expression and induces myoblast proliferation, with a negative effect on the myogenesis process. circLMO7 is involved in overexpression of HDAC4 and downregulates MEF2A expression by upregulating miR-378a-3p, leading to myoblast differentiation. [47] CircSVIL, a positive regulator, induces miR-203 activity, which inhibits myoblast production and differentiation. circFUT10 is involved in inhibition of myoblast proliferation but enhances differentiation through enhancement of SRF expression. [47] circSNX29 sponges miR-744, and circFGFR2 sponges miR-133a-5p and miR-29b-1-5p, which promotes myoblast differentiation. circSNX29 activates Wnt pathways by enhancing Wnt5a and CaMKIId expression, which are involved in myogenesis regulation. [47]
Viroids are mostly plant pathogens, which consist of short stretches (a few hundred nucleobases) of highly complementary, circular, single-stranded, and non-coding RNAs without a protein coat. Compared with other infectious plant pathogens, viroids are extremely small in size, ranging from 246 to 467 nucleobases; they thus consist of fewer than 10,000 atoms. In comparison, the genome of the smallest known viruses capable of causing an infection by themselves are around 2,000 nucleobases long. [48]
An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term exon refers to both the DNA sequence within a gene and to the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.
An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word intron is derived from the term intragenic region, i.e., a region inside a gene. The term intron refers to both the DNA sequence within a gene and the corresponding RNA sequence in RNA transcripts. The non-intron sequences that become joined by this RNA processing to form the mature RNA are called exons.
RNA splicing is a process in molecular biology where a newly-made precursor messenger RNA (pre-mRNA) transcript is transformed into a mature messenger RNA (mRNA). It works by removing all the introns and splicing back together exons. For nuclear-encoded genes, splicing occurs in the nucleus either during or immediately after transcription. For those eukaryotic genes that contain introns, splicing is usually needed to create an mRNA molecule that can be translated into protein. For many eukaryotic introns, splicing occurs in a series of reactions which are catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs). There exist self-splicing introns, that is, ribozymes that can catalyze their own excision from their parent RNA molecule. The process of transcription, splicing and translation is called gene expression, the central dogma of molecular biology.
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product that enables it to produce end products, proteins or non-coding RNA, and ultimately affect a phenotype. These products are often proteins, but in non-protein-coding genes such as transfer RNA (tRNA) and small nuclear RNA (snRNA), the product is a functional non-coding RNA. The process of gene expression is used by all known life (eukaryotes and prokaryotes) and is utilized by viruses to generate the macromolecular machinery for life.
A non-coding RNA (ncRNA) is a functional RNA molecule that is not translated into a protein. The DNA sequence from which a functional non-coding RNA is transcribed is often called an RNA gene. Abundant and functionally important types of non-coding RNAs include transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), as well as small RNAs such as microRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs, scaRNAs and the long ncRNAs such as Xist and HOTAIR.
Alternative splicing, or alternative RNA splicing, or differential splicing, is a splicing process during gene expression that allows a single gene to produce different splice variants. For example, some exons of a gene may be included within or excluded from the final RNA product of the gene. This means the exons are joined in different combinations, leading to different splice variants. In the case of protein-coding genes, the proteins translated from these splice variants may contain differences in their amino acid sequence and in their biological functions.
Trans-splicing is a special form of RNA processing where exons from two different primary RNA transcripts are joined end to end and ligated. It is usually found in eukaryotes and mediated by the spliceosome, although some bacteria and archaea also have "half-genes" for tRNAs.
A primary transcript is the single-stranded ribonucleic acid (RNA) product synthesized by transcription of DNA, and processed to yield various mature RNA products such as mRNAs, tRNAs, and rRNAs. The primary transcripts designated to be mRNAs are modified in preparation for translation. For example, a precursor mRNA (pre-mRNA) is a type of primary transcript that becomes a messenger RNA (mRNA) after processing.
RNA-binding proteins (RBPs) are proteins that bind double- or single-stranded RNA in cells and participate in forming ribonucleoprotein complexes. RBPs contain various structural motifs, such as the RNA recognition motif (RRM), the dsRNA binding domain, the zinc finger and others. They are found in both the cytoplasm and the nucleus. However, since most mature RNA is exported from the nucleus relatively quickly, most RBPs in the nucleus exist as complexes of protein and pre-mRNA called heterogeneous nuclear ribonucleoprotein particles (hnRNPs). RBPs have crucial roles in cellular processes such as RNA transport and localization. They play an especially important role in the post-transcriptional control of RNAs, including splicing, polyadenylation, mRNA stabilization, mRNA localization and translation. Eukaryotic cells express diverse RBPs with unique RNA-binding activity and protein–protein interaction. According to the Eukaryotic RBP Database (EuRBPDB), there are 2961 genes encoding RBPs in humans. During evolution, the diversity of RBPs greatly increased with the increase in the number of introns. This diversity enabled eukaryotic cells to utilize RNA exons in various arrangements, giving rise to a unique RNP (ribonucleoprotein) for each RNA. Although RBPs have a crucial role in the post-transcriptional regulation of gene expression, relatively few RBPs have been studied systematically. It has become clear that RNA–RBP interactions play important roles in many biological processes across organisms.
Nonsense-mediated mRNA decay (NMD) is a surveillance pathway that exists in all eukaryotes. Its main function is to reduce errors in gene expression by eliminating mRNA transcripts that contain premature stop codons. Translation of these aberrant mRNAs could, in some cases, lead to deleterious gain-of-function or dominant-negative activity of the resulting proteins.
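The idea of a premature stop codon can be made concrete with a short sketch. This is a deliberate simplification and not a model of the NMD pathway itself: it only checks whether an in-frame stop codon (UAA, UAG or UGA) appears before the position where the normal stop is expected. The function names and the example sequences are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(mrna, start=0):
    """Return the index of the first in-frame stop codon, or None if absent.

    Codons are read in steps of three beginning at `start`.
    """
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_premature_stop(mrna, expected_stop, start=0):
    """True if an in-frame stop codon occurs before index `expected_stop`."""
    stop = first_in_frame_stop(mrna, start)
    return stop is not None and stop < expected_stop


normal = "AUGGCCGAGUAA"   # AUG GCC GAG UAA: stop at index 9
mutant = "AUGUAGGAGUAA"   # AUG UAG ...: nonsense mutation in the second codon
print(has_premature_stop(normal, expected_stop=9))
print(has_premature_stop(mutant, expected_stop=9))
```

In cells, recognition of such transcripts depends on additional context that this sketch ignores entirely.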
A splice site mutation is a genetic mutation that inserts, deletes or changes a number of nucleotides in the specific site at which splicing takes place during the processing of precursor messenger RNA into mature messenger RNA. Splice site consensus sequences that drive exon recognition are located at the very termini of introns. The deletion of the splicing site results in one or more introns remaining in mature mRNA and may lead to the production of abnormal proteins. When a splice site mutation occurs, the mRNA transcript possesses information from these introns that normally should not be included. Introns are supposed to be removed, while the exons are expressed.
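As a rough illustration of how a change at an intron terminus can disrupt a splice site, the sketch below checks an intron for the GU and AG dinucleotides that mark the two ends of most introns removed by the major spliceosome. Real splice-site consensus sequences are longer and a minority of introns use other termini, so this is an assumption-laden toy check; the function name and the sequences are hypothetical.

```python
def has_canonical_splice_sites(intron):
    """True if the intron starts with GU (5' end) and ends with AG (3' end)."""
    return len(intron) >= 4 and intron.startswith("GU") and intron.endswith("AG")


wild_type = "GUAAGUCCUAG"  # made-up intron with intact termini
mutated = "AUAAGUCCUAG"    # first nucleotide changed from G to A
print(has_canonical_splice_sites(wild_type))
print(has_canonical_splice_sites(mutated))
```

A single-nucleotide change at the first position is enough to make the toy intron fail the check, mirroring how a splice site mutation can cause an intron to be retained.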
An exonic splicing silencer (ESS) is a short region of an exon that acts as a cis-regulatory element. A set of 103 hexanucleotides known as FAS-hex3 has been shown to be abundant in ESS regions. ESSs inhibit or silence splicing of the pre-mRNA and contribute to constitutive and alternative splicing. To elicit the silencing effect, ESSs recruit proteins that negatively affect the core splicing machinery.
U1 spliceosomal RNA is the small nuclear RNA (snRNA) component of U1 snRNP, an RNA-protein complex that combines with other snRNPs, unmodified pre-mRNA, and various other proteins to assemble a spliceosome, a large RNA-protein molecular complex upon which splicing of pre-mRNA occurs. Splicing, or the removal of introns, is a major aspect of post-transcriptional modification, and takes place only in the nucleus of eukaryotes.
TIA1, or TIA1 cytotoxic granule-associated RNA binding protein, is a 3'UTR mRNA binding protein that can bind the 5'TOP sequence of 5'TOP mRNAs. It is associated with programmed cell death (apoptosis) and regulates alternative splicing of the gene encoding the Fas receptor, an apoptosis-promoting protein. Under stress conditions, TIA1 localizes to cellular RNA-protein conglomerations called stress granules. It is encoded by the TIA1 gene.
RNA binding motif protein 9 (RBM9), also known as Rbfox2, is a protein which in humans is encoded by the RBM9 gene.
Fox-1 homolog A, also known as ataxin 2-binding protein 1 (A2BP1) or hexaribonucleotide-binding protein 1 (HRNBP1) or RNA binding protein, fox-1 homolog (Rbfox1), is a protein that in humans is encoded by the RBFOX1 gene.
Long non-coding RNAs (lncRNAs) are a type of RNA, generally defined as transcripts of more than 200 nucleotides that are not translated into protein. This arbitrary limit distinguishes long ncRNAs from small non-coding RNAs, such as microRNAs (miRNAs), small interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs. Because some lncRNAs have been reported to have the potential to encode small proteins or micro-peptides, the latest definition of lncRNA is a class of transcripts of over 200 nucleotides that have no or limited coding capacity. However, John S. Mattick and colleagues have suggested changing the definition of long non-coding RNAs to transcripts of more than 500 nt, which are mostly generated by Pol II. The exact definition of lncRNA is therefore still under discussion in the field. Long intervening/intergenic noncoding RNAs (lincRNAs) are lncRNA transcripts that do not overlap protein-coding genes.
Post-transcriptional regulation is the control of gene expression at the RNA level. It occurs once the RNA polymerase has attached to the gene's promoter and is synthesizing the nucleotide sequence. Therefore, as the name indicates, it occurs between the transcription phase and the translation phase of gene expression. These controls are critical for the regulation of many genes across human tissues. Post-transcriptional regulation also plays a major role in cell physiology and is implicated in pathologies such as cancer and neurodegenerative diseases.
A minigene is a minimal gene fragment that includes an exon and the control regions necessary for the gene to express itself in the same way as a wild-type gene fragment. This is a minigene in its most basic sense. More complex minigenes can be constructed containing multiple exons and one or more introns. Minigenes provide a valuable tool for researchers evaluating splicing patterns in both in vivo and in vitro experiments. Specifically, minigenes are used as splice reporter vectors and act as probes to determine which factors are important in splicing outcomes. They can be constructed to test how both cis-regulatory elements and trans-regulatory elements affect gene expression.
Exitrons are produced through alternative splicing and have characteristics of both introns and exons, but are described as retained introns. Although they are considered introns, which are typically cut out of pre-mRNA sequences, significant problems can arise when exitrons are spliced out of these strands, the most obvious being altered protein structures and functions. They were first discovered in plants, but have recently been found in metazoan species as well.