RNA splicing is a process in molecular biology where a newly-made precursor messenger RNA (pre-mRNA) transcript is transformed into a mature messenger RNA (mRNA). It works by removing all the introns (non-coding regions of RNA) and splicing back together exons (coding regions). For nuclear-encoded genes, splicing occurs in the nucleus either during or immediately after transcription. For those eukaryotic genes that contain introns, splicing is usually needed to create an mRNA molecule that can be translated into protein. For many eukaryotic introns, splicing occurs in a series of reactions which are catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs). There exist self-splicing introns, that is, ribozymes that can catalyze their own excision from their parent RNA molecule. The process of transcription, splicing and translation is called gene expression, the central dogma of molecular biology.
Several methods of RNA splicing occur in nature; the type of splicing depends on the structure of the spliced intron and the catalysts required for splicing to occur.
The word intron is derived from the terms intragenic region, [1] and intracistron, [2] that is, a segment of DNA that is located between two exons of a gene. The term intron refers to both the DNA sequence within a gene and the corresponding sequence in the unprocessed RNA transcript. As part of the RNA processing pathway, introns are removed by RNA splicing either shortly after or concurrent with transcription. [3] Introns are found in the genes of most organisms and many viruses. They can be located in a wide range of genes, including those that generate proteins, ribosomal RNA (rRNA), and transfer RNA (tRNA). [4]
Within introns, a donor site (5' end of the intron), a branch site (near the 3' end of the intron) and an acceptor site (3' end of the intron) are required for splicing. The splice donor site includes an almost invariant sequence GU at the 5' end of the intron, within a larger, less highly conserved region. The splice acceptor site at the 3' end of the intron terminates the intron with an almost invariant AG sequence. Upstream (5'-ward) from the AG there is a region high in pyrimidines (C and U), or polypyrimidine tract. Further upstream from the polypyrimidine tract is the branchpoint, which includes an adenine nucleotide involved in lariat formation. [5] [6] The consensus sequence for an intron (in IUPAC nucleic acid notation) is: G-G-[cut]-G-U-R-A-G-U (donor site) ... intron sequence ... Y-U-R-A-C (branch sequence 20-50 nucleotides upstream of acceptor site) ... Y-rich-N-C-A-G-[cut]-G (acceptor site). [7] However, it is noted that the specific sequence of intronic splicing elements and the number of nucleotides between the branchpoint and the nearest 3' acceptor site affect splice site selection. [8] [9] Also, point mutations in the underlying DNA or errors during transcription can activate a cryptic splice site in part of the transcript that usually is not spliced. This results in a mature messenger RNA with a missing section of an exon. In this way, a point mutation, which might otherwise affect only a single amino acid, can manifest as a deletion or truncation in the final protein.[ citation needed ]
Splicing is catalyzed by the spliceosome, a large RNA-protein complex composed of five small nuclear ribonucleoproteins (snRNPs). Assembly and activity of the spliceosome occurs during transcription of the pre-mRNA. The RNA components of snRNPs interact with the intron and are involved in catalysis. Two types of spliceosomes have been identified (major and minor) which contain different snRNPs.
In most cases, splicing removes introns as single units from precursor mRNA transcripts. However, in some cases, especially in mRNAs with very long introns, splicing happens in steps, with part of an intron removed and then the remaining intron is spliced out in a following step. This has been found first in the Ultrabithorax (Ubx) gene of the fruit fly, Drosophila melanogaster , and a few other Drosophila genes, but cases in humans have been reported as well. [17] [18]
Trans-splicing is a form of splicing that removes introns or outrons, and joins two exons that are not within the same RNA transcript. [19] Trans-splicing can occur between two different endogenous pre-mRNAs or between an endogenous and an exogenous (such as from viruses) or artificial RNAs. [20]
Self-splicing occurs for rare introns that form a ribozyme, performing the functions of the spliceosome by RNA alone. There are three kinds of self-splicing introns, Group I , Group II and Group III . Group I and II introns perform splicing similar to the spliceosome without requiring any protein. This similarity suggests that Group I and II introns may be evolutionarily related to the spliceosome. Self-splicing may also be very ancient, and may have existed in an RNA world present before protein.[ citation needed ]
Two transesterifications characterize the mechanism in which group I introns are spliced:[ citation needed ]
The mechanism in which group II introns are spliced (two transesterification reaction like group I introns) is as follows:
tRNA (also tRNA-like) splicing is another rare form of splicing that usually occurs in tRNA. The splicing reaction involves a different biochemistry than the spliceosomal and self-splicing pathways.
In the yeast Saccharomyces cerevisiae , a yeast tRNA splicing endonuclease heterotetramer, composed of TSEN54, TSEN2, TSEN34, and TSEN15, cleaves pre-tRNA at two sites in the acceptor loop to form a 5'-half tRNA, terminating at a 2',3'-cyclic phosphodiester group, and a 3'-half tRNA, terminating at a 5'-hydroxyl group, along with a discarded intron. [21] Yeast tRNA kinase then phosphorylates the 5'-hydroxyl group using adenosine triphosphate. Yeast tRNA cyclic phosphodiesterase cleaves the cyclic phosphodiester group to form a 2'-phosphorylated 3' end. Yeast tRNA ligase adds an adenosine monophosphate group to the 5' end of the 3'-half and joins the two halves together. [22] NAD-dependent 2'-phosphotransferase then removes the 2'-phosphate group. [23] [24]
Splicing occurs in all the kingdoms or domains of life, however, the extent and types of splicing can be very different between the major divisions. Eukaryotes splice many protein-coding messenger RNAs and some non-coding RNAs. Prokaryotes, on the other hand, splice rarely and mostly non-coding RNAs. Another important difference between these two groups of organisms is that prokaryotes completely lack the spliceosomal pathway.
Because spliceosomal introns are not conserved in all species, there is debate concerning when spliceosomal splicing evolved. Two models have been proposed: the intron late and intron early models (see intron evolution).
Eukaryotes | Prokaryotes | |
---|---|---|
Spliceosomal | + | − |
Self-splicing | + | + |
tRNA | + | + |
Spliceosomal splicing and self-splicing involve a two-step biochemical process. Both steps involve transesterification reactions that occur between RNA nucleotides. tRNA splicing, however, is an exception and does not occur by transesterification. [25]
Spliceosomal and self-splicing transesterification reactions occur via two sequential transesterification reactions. First, the 2'OH of a specific branchpoint nucleotide within the intron, defined during spliceosome assembly, performs a nucleophilic attack on the first nucleotide of the intron at the 5' splice site, forming the lariat intermediate. Second, the 3'OH of the released 5' exon then performs a nucleophilic attack at the first nucleotide following the last nucleotide of the intron at the 3' splice site, thus joining the exons and releasing the intron lariat. [26]
In many cases, the splicing process can create a range of unique proteins by varying the exon composition of the same mRNA. This phenomenon is then called alternative splicing. Alternative splicing can occur in many ways. Exons can be extended or skipped, or introns can be retained. It is estimated that 95% of transcripts from multiexon genes undergo alternative splicing, some instances of which occur in a tissue-specific manner and/or under specific cellular conditions. [27] Development of high throughput mRNA sequencing technology can help quantify the expression levels of alternatively spliced isoforms. Differential expression levels across tissues and cell lineages allowed computational approaches to be developed to predict the functions of these isoforms. [28] [29] Given this complexity, alternative splicing of pre-mRNA transcripts is regulated by a system of trans-acting proteins (activators and repressors) that bind to cis-acting sites or "elements" (enhancers and silencers) on the pre-mRNA transcript itself. These proteins and their respective binding elements promote or reduce the usage of a particular splice site. The binding specificity comes from the sequence and structure of the cis-elements, e.g. in HIV-1 there are many donor and acceptor splice sites. Among the various splice sites, ssA7, which is 3' acceptor site, folds into three stem loop structures, i.e. Intronic splicing silencer (ISS), Exonic splicing enhancer (ESE), and Exonic splicing silencer (ESSE3). Solution structure of Intronic splicing silencer and its interaction to host protein hnRNPA1 give insight into specific recognition. [30] However, adding to the complexity of alternative splicing, it is noted that the effects of regulatory factors are many times position-dependent. For example, a splicing factor that serves as a splicing activator when bound to an intronic enhancer element may serve as a repressor when bound to its splicing element in the context of an exon, and vice versa. [31] In addition to the position-dependent effects of enhancer and silencer elements, the location of the branchpoint (i.e., distance upstream of the nearest 3' acceptor site) also affects splicing. [8] The secondary structure of the pre-mRNA transcript also plays a role in regulating splicing, such as by bringing together splicing elements or by masking a sequence that would otherwise serve as a binding element for a splicing factor. [32] [33]
The location of pre-mRNA splicing is throughout the nucleus, and once mature mRNA is generated, it is transported to the cytoplasm for translation. In both plant and animal cells, nuclear speckles are regions with high concentrations of splicing factors. These speckles were once thought to be mere storage centers for splicing factors. However, it is now understood that nuclear speckles help concentrate splicing factors near genes that are physically located close to them. Genes located farther from speckles can still be transcribed and spliced, but their splicing is less efficient compared to those closer to speckles. Cells can vary their genomic positions of genes relative to nuclear speckles as a mechanism to modulate the expression of genes via splicing. [34]
The process of splicing is linked with HIV integration, as HIV-1 targets highly spliced genes. [35]
DNA damage affects splicing factors by altering their post-translational modification, localization, expression and activity. [36] Furthermore, DNA damage often disrupts splicing by interfering with its coupling to transcription. DNA damage also has an impact on the splicing and alternative splicing of genes intimately associated with DNA repair. [36] For instance, DNA damages modulate the alternative splicing of the DNA repair genes Brca1 and Ercc1 .
Splicing events can be experimentally altered [37] [38] by binding steric-blocking antisense oligos, such as Morpholinos or Peptide nucleic acids to snRNP binding sites, to the branchpoint nucleotide that closes the lariat, [39] or to splice-regulatory element binding sites. [40]
The use of antisense oligonucleotides to modulate splicing has shown great promise as a therapeutic strategy for a variety of genetic diseases caused by splicing defects. [41]
Recent studies have shown that RNA splicing can be regulated by a variety of epigenetic modifications, including DNA methylation and histone modifications. [42]
It has been suggested that one third of all disease-causing mutations impact on splicing. [31] Common errors include:
Although many splicing errors are safeguarded by a cellular quality control mechanism termed nonsense-mediated mRNA decay (NMD), [43] a number of splicing-related diseases also exist, as suggested above. [44]
Allelic differences in mRNA splicing are likely to be a common and important source of phenotypic diversity at the molecular level, in addition to their contribution to genetic disease susceptibility. Indeed, genome-wide studies in humans have identified a range of genes that are subject to allele-specific splicing.
In plants, variation for flooding stress tolerance correlated with stress-induced alternative splicing of transcripts associated with gluconeogenesis and other processes. [45]
In addition to RNA, proteins can undergo splicing. Although the biomolecular mechanisms are different, the principle is the same: parts of the protein, called inteins instead of introns, are removed. The remaining parts, called exteins instead of exons, are fused together. Protein splicing has been observed in a wide range of organisms, including bacteria, archaea, plants, yeast and humans. [46]
The existence of backsplicing was first suggested in 2012. [47] This backsplicing explains the genesis of circular RNAs resulting from the exact junction between the 3' boundary of an exon with the 5' boundary of an exon located upstream. [48] In these exonic circular RNAs, the junction is a classic 3'-5'link.
The exclusion of intronic sequences during splicing can also leave traces, in the form of circular RNAs. [49] In some cases, the intronic lariat is not destroyed and the circular part remains as a lariat-derived circRNA [50] .In these lariat-derived circular RNAs, the junction is a 2'-5'link.
An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term exon refers to both the DNA sequence within a gene and to the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.
An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word intron is derived from the term intragenic region, i.e., a region inside a gene. The term intron refers to both the DNA sequence within a gene and the corresponding RNA sequence in RNA transcripts. The non-intron sequences that become joined by this RNA processing to form the mature RNA are called exons.
Alternative splicing, or alternative RNA splicing, or differential splicing, is an alternative splicing process during gene expression that allows a single gene to produce different splice variants. For example, some exons of a gene may be included within or excluded from the final RNA product of the gene. This means the exons are joined in different combinations, leading to different splice variants. In the case of protein-coding genes, the proteins translated from these splice variants may contain differences in their amino acid sequence and in their biological functions.
A spliceosome is a large ribonucleoprotein (RNP) complex found primarily within the nucleus of eukaryotic cells. The spliceosome is assembled from small nuclear RNAs (snRNA) and numerous proteins. Small nuclear RNA (snRNA) molecules bind to specific proteins to form a small nuclear ribonucleoprotein complex, which in turn combines with other snRNPs to form a large ribonucleoprotein complex called a spliceosome. The spliceosome removes introns from a transcribed pre-mRNA, a type of primary transcript. This process is generally referred to as splicing. An analogy is a film editor, who selectively cuts out irrelevant or incorrect material from the initial film and sends the cleaned-up version to the director for the final cut.
SR proteins are a conserved family of proteins involved in RNA splicing. SR proteins are named because they contain a protein domain with long repeats of serine and arginine amino acid residues, whose standard abbreviations are "S" and "R" respectively. SR proteins are ~200-600 amino acids in length and composed of two domains, the RNA recognition motif (RRM) region and the RS domain. SR proteins are more commonly found in the nucleus than the cytoplasm, but several SR proteins are known to shuttle between the nucleus and the cytoplasm.
snRNPs, or small nuclear ribonucleoproteins, are RNA-protein complexes that combine with unmodified pre-mRNA and various other proteins to form a spliceosome, a large RNA-protein molecular complex upon which splicing of pre-mRNA occurs. The action of snRNPs is essential to the removal of introns from pre-mRNA, a critical aspect of post-transcriptional modification of RNA, occurring only in the nucleus of eukaryotic cells. Additionally, U7 snRNP is not involved in splicing at all, as U7 snRNP is responsible for processing the 3′ stem-loop of histone pre-mRNA.
Small nuclear RNA (snRNA) is a class of small RNA molecules that are found within the splicing speckles and Cajal bodies of the cell nucleus in eukaryotic cells. The length of an average snRNA is approximately 150 nucleotides. They are transcribed by either RNA polymerase II or RNA polymerase III. Their primary function is in the processing of pre-messenger RNA (hnRNA) in the nucleus. They have also been shown to aid in the regulation of transcription factors or RNA polymerase II, and maintaining the telomeres.
The minor spliceosome is a ribonucleoprotein complex that catalyses the removal (splicing) of an atypical class of spliceosomal introns (U12-type) from messenger RNAs in some clades of eukaryotes. This process is called noncanonical splicing, as opposed to U2-dependent canonical splicing. U12-type introns represent less than 1% of all introns in human cells. However they are found in genes performing essential cellular functions.
Group II introns are a large class of self-catalytic ribozymes and mobile genetic elements found within the genes of all three domains of life. Ribozyme activity can occur under high-salt conditions in vitro. However, assistance from proteins is required for in vivo splicing. In contrast to group I introns, intron excision occurs in the absence of GTP and involves the formation of a lariat, with an A-residue branchpoint strongly resembling that found in lariats formed during splicing of nuclear pre-mRNA. It is hypothesized that pre-mRNA splicing may have evolved from group II introns, due to the similar catalytic mechanism as well as the structural similarity of the Group II Domain V substructure to the U6/U2 extended snRNA. Finally, their ability to site-specifically insert into DNA sites has been exploited as a tool for biotechnology. For example, group II introns can be modified to make site-specific genome insertions and deliver cargo DNA such as reporter genes or lox sites
In molecular biology, an exonic splicing enhancer (ESE) is a DNA sequence motif consisting of 6 bases within an exon that directs, or enhances, accurate splicing of heterogeneous nuclear RNA (hnRNA) or pre-mRNA into messenger RNA (mRNA).
An exonic splicing silencer (ESS) is a short region of an exon and is a cis-regulatory element. A set of 103 hexanucleotides known as FAS-hex3 has been shown to be abundant in ESS regions. ESSs inhibit or silence splicing of the pre-mRNA and contribute to constitutive and alternate splicing. To elicit the silencing effect, ESSs recruit proteins that will negatively affect the core splicing machinery.
The U11 snRNA is an important non-coding RNA in the minor spliceosome protein complex, which activates the alternative splicing mechanism. The minor spliceosome is associated with similar protein components as the major spliceosome. It uses U11 snRNA to recognize the 5' splice site while U12 snRNA binds to the branchpoint to recognize the 3' splice site.
U1 spliceosomal RNA is the small nuclear RNA (snRNA) component of U1 snRNP, an RNA-protein complex that combines with other snRNPs, unmodified pre-mRNA, and various other proteins to assemble a spliceosome, a large RNA-protein molecular complex upon which splicing of pre-mRNA occurs. Splicing, or the removal of introns, is a major aspect of post-transcriptional modification, and takes place only in the nucleus of eukaryotes.
U2 spliceosomal snRNAs are a species of small nuclear RNA (snRNA) molecules found in the major spliceosomal (Sm) machinery of virtually all eukaryotic organisms. In vivo, U2 snRNA along with its associated polypeptides assemble to produce the U2 small nuclear ribonucleoprotein (snRNP), an essential component of the major spliceosomal complex. The major spliceosomal-splicing pathway is occasionally referred to as U2 dependent, based on a class of Sm intron—found in mRNA primary transcripts—that are recognized exclusively by the U2 snRNP during early stages of spliceosomal assembly. In addition to U2 dependent intron recognition, U2 snRNA has been theorized to serve a catalytic role in the chemistry of pre-RNA splicing as well. Similar to ribosomal RNAs (rRNAs), Sm snRNAs must mediate both RNA:RNA and RNA:protein contacts and hence have evolved specialized, highly conserved, primary and secondary structural elements to facilitate these types of interactions.
U5 snRNA is a small nuclear RNA (snRNA) that participates in RNA splicing as a component of the spliceosome. It forms the U5 snRNP by associating with several proteins including Prp8 - the largest and most conserved protein in the spliceosome, Brr2 - a helicase required for spliceosome activation, Snu114, and the 7 Sm proteins. U5 snRNA forms a coaxially-stacked series of helices that project into the active site of the spliceosome. Loop 1, which caps this series of helices, forms 4-5 base pairs with the 5'-exon during the two chemical reactions of splicing. This interaction appears to be especially important during step two of splicing, exon ligation.
U6 snRNA is the non-coding small nuclear RNA (snRNA) component of U6 snRNP, an RNA-protein complex that combines with other snRNPs, unmodified pre-mRNA, and various other proteins to assemble a spliceosome, a large RNA-protein molecular complex that catalyzes the excision of introns from pre-mRNA. Splicing, or the removal of introns, is a major aspect of post-transcriptional modification and takes place only in the nucleus of eukaryotes.
Splicing factor U2AF 65 kDa subunit is a protein that in humans is encoded by the U2AF2 gene.
SmY ribonucleic acids are a family of small nuclear RNAs found in some species of nematode worms. They are thought to be involved in mRNA trans-splicing.
Prp8 refers to both the Prp8 protein and Prp8 gene. Prp8's name originates from its involvement in pre-mRNA processing. The Prp8 protein is a large, highly conserved, and unique protein that resides in the catalytic core of the spliceosome and has been found to have a central role in molecular rearrangements that occur there. Prp8 protein is a major central component of the catalytic core in the spliceosome, and the spliceosome is responsible for splicing of precursor mRNA that contains introns and exons. Unexpressed introns are removed by the spliceosome complex in order to create a more concise mRNA transcript. Splicing is just one of many different post-transcriptional modifications that mRNA must undergo before translation. Prp8 has also been hypothesized to be a cofactor in RNA catalysis.
The split gene theory is a theory of the origin of introns, long non-coding sequences in eukaryotic genes between the exons. The theory holds that the randomness of primordial DNA sequences would only permit small (< 600bp) open reading frames (ORFs), and that important intron structures and regulatory sequences are derived from stop codons. In this introns-first framework, the spliceosomal machinery and the nucleus evolved due to the necessity to join these ORFs into larger proteins, and that intronless bacterial genes are less ancestral than the split eukaryotic genes. The theory originated with Periannan Senapathy.
{{cite journal}}
: CS1 maint: DOI inactive as of November 2024 (link)