Trans-splicing is a special form of RNA processing where exons from two different primary RNA transcripts are joined end to end and ligated. It is usually found in eukaryotes and mediated by the spliceosome, although some bacteria and archaea also have "half-genes" for tRNAs. [1]
Whereas "normal" (cis-)splicing processes a single molecule, trans-splicing generates a single RNA transcript from multiple separate pre-mRNAs. This phenomenon can be exploited for molecular therapy to address mutated gene products. [2] Genic trans-splicing allows variability in RNA diversity and increases proteome complexity. [3]
While some fusion transcripts occur via trans-splicing in normal human cells, [1] trans-splicing can also be the mechanism behind certain oncogenic fusion transcripts. [4] [5]
Spliced leader (SL) trans-splicing is used by certain microorganisms, notably protists of the class Kinetoplastea, to express genes. In these organisms, a capped spliced leader RNA is transcribed, and simultaneously, genes are transcribed in long polycistrons. [6] The capped spliced leader is trans-spliced onto each gene to generate monocistronic capped and polyadenylated transcripts. [7] These early-diverging eukaryotes use few introns, and the spliceosome they possess shows some unusual variations in its structure and assembly. [7] [8] They also possess multiple eIF4E isoforms with specialized roles in capping. [9] The spliced leader sequence is highly conserved in lower species that undergo trans-splicing, such as trypanosomes. While the spliced leader's role in the cell is not known, it is thought to be involved in translation initiation. In C. elegans, the spliced leader is joined close to the initiation codon. Some scientists also suggest the sequence is required for cell viability. In Ascaris, the spliced leader sequence is needed so that the RNA gene can be transcribed. The spliced leader sequence may be responsible for initiation, mRNA localization, and translation initiation or inhibition. [10]
Some other eukaryotes, notably dinoflagellates, sponges, nematodes, cnidarians, ctenophores, flatworms, crustaceans, chaetognaths, rotifers, and tunicates, also use SL trans-splicing to varying degrees. [1] [11] In the tunicate Ciona intestinalis, the extent of SL trans-splicing is better described by a quantitative view recognising frequently and infrequently trans-spliced genes rather than a binary and conventional categorisation of trans-spliced versus non-trans-spliced genes. [12]
The SL trans-splicing functions in the resolution of polycistronic transcripts of operons into individual 5'-capped mRNAs. This processing is achieved when the outrons are trans-spliced to unpaired, downstream acceptor sites adjacent to cistron open reading frames. [13] [14]
Trans-splicing is characterized by the joining of exons from two separately transcribed RNAs. The signal for this splicing is the outron at the 5' end of the mRNA, in the absence of a functional 5' splice site upstream. When the 5' outron is spliced, the 5' splice site of the spliced leader RNA is branched to the outron and forms an intermediate. [10] This step results in a free spliced leader exon. The exon is then spliced to the first exon on the pre-mRNA and the intermediate is released. Trans-splicing differs from cis-splicing in that there is no 5' splice site on the pre-mRNA. Instead, the 5' splice site is provided by the SL sequence. [14]
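The two steps described above can be sketched as a toy string model. This is only an illustration under simplifying assumptions: the class names SLRna and PreMrna, the function trans_splice, and all sequences are hypothetical, and the branched (Y-shaped) intermediate is represented as a plain concatenation because a string cannot encode a 2'-5' branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLRna:
    """Toy spliced leader RNA: leader exon followed by an intron-like tail."""
    exon: str    # ends at the 5' splice site supplied by the SL RNA
    intron: str  # discarded portion of the SL RNA


@dataclass(frozen=True)
class PreMrna:
    """Toy pre-mRNA: a 5' outron (no 5' splice site) followed by the exons."""
    outron: str  # ends at the 3' splice (acceptor) site
    body: str    # first exon onward


def trans_splice(sl: SLRna, pre: PreMrna) -> tuple[str, str]:
    """Return (mature_mrna, released_intermediate).

    Step 1: the SL RNA is cut at its 5' splice site; the SL intron becomes
            attached to the outron and the SL exon is freed.
    Step 2: the free SL exon is joined to the first exon of the pre-mRNA
            and the intermediate is released.
    """
    free_sl_exon = sl.exon
    # Linear stand-in for the branched intermediate (an assumption of this model).
    intermediate = sl.intron + pre.outron
    mature_mrna = free_sl_exon + pre.body
    return mature_mrna, intermediate


# Made-up sequences, chosen only to keep the example readable.
sl_rna = SLRna(exon="GGUUUAAU", intron="GUAAGU")
pre_mrna = PreMrna(outron="UUUUCAG", body="AUGGCC")
mature, released = trans_splice(sl_rna, pre_mrna)
print(mature)    # SL exon followed by the pre-mRNA body
print(released)  # SL intron plus outron
```

In this model the pre-mRNA never contributes a 5' splice site; the only donor site comes from the SL RNA, which is the distinction from cis-splicing drawn in the text.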
Transcription of the sense strand produces a pre-mRNA complementary to that strand. The antisense strand is also transcribed, producing a complementary pre-mRNA. Exons from the two transcripts are then spliced together to form a chimeric mRNA. [15]
Alternative trans-splicing includes intragenic trans-splicing and intergenic trans-splicing. Intragenic trans-splicing involves duplication of exons in the pre-mRNA. Intergenic trans-splicing is characterized by the splicing together of exons from the pre-mRNAs of two different genes, resulting in a transgenic mRNA. [16]
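The two categories can be contrasted with a small sketch that treats each pre-mRNA as an ordered list of exon labels. The function names, parameters, and exon labels below are hypothetical, and the model ignores splice-site sequence requirements entirely.

```python
def intergenic_trans_splice(exons_a, exons_b, donor_count, acceptor_start):
    """Join the first `donor_count` exons of transcript A to the exons of
    transcript B starting at index `acceptor_start` (toy model)."""
    if not 0 < donor_count <= len(exons_a):
        raise ValueError("donor_count must select at least one exon of A")
    if not 0 <= acceptor_start < len(exons_b):
        raise ValueError("acceptor_start must point at an exon of B")
    return exons_a[:donor_count] + exons_b[acceptor_start:]


def intragenic_exon_duplication(exons, index):
    """Join two copies of the same pre-mRNA so that exon `index` appears
    twice: the first copy donates exons up to and including `index`, the
    second copy contributes exons from `index` onward (toy model)."""
    if not 0 <= index < len(exons):
        raise ValueError("index must point at an exon")
    return exons[: index + 1] + exons[index:]


gene_a = ["A1", "A2", "A3"]
gene_b = ["B1", "B2", "B3"]
print(intergenic_trans_splice(gene_a, gene_b, 2, 1))  # ['A1', 'A2', 'B2', 'B3']
print(intragenic_exon_duplication(gene_a, 1))         # ['A1', 'A2', 'A2', 'A3']
```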
An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term exon refers to both the DNA sequence within a gene and to the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.
An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word intron is derived from the term intragenic region, i.e., a region inside a gene. The term intron refers to both the DNA sequence within a gene and the corresponding RNA sequence in RNA transcripts. The non-intron sequences that become joined by this RNA processing to form the mature RNA are called exons.
RNA splicing is a process in molecular biology where a newly-made precursor messenger RNA (pre-mRNA) transcript is transformed into a mature messenger RNA (mRNA). It works by removing all the introns and splicing back together exons. For nuclear-encoded genes, splicing occurs in the nucleus either during or immediately after transcription. For those eukaryotic genes that contain introns, splicing is usually needed to create an mRNA molecule that can be translated into protein. For many eukaryotic introns, splicing occurs in a series of reactions which are catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs). There exist self-splicing introns, that is, ribozymes that can catalyze their own excision from their parent RNA molecule. The process of transcription, splicing and translation is called gene expression, the central dogma of molecular biology.
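As a rough illustration of intron removal and exon joining, the sketch below cuts introns out of a string by position. The function name splice, the half-open coordinate convention, and the sequence are assumptions made for this example; real splice-site recognition depends on sequence signals and the spliceosome, which are not modeled here.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as half-open (start, end) index pairs and
    join the remaining exons in their original order."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or start >= end or end > len(pre_mrna):
            raise ValueError("introns must be non-overlapping and in range")
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Made-up transcript: exon, intron, exon.
transcript = "AUGGCC" + "GUAAGUUUCAG" + "GAAUAA"
print(splice(transcript, [(6, 17)]))  # AUGGCCGAAUAA
```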
Transcription is the process of copying a segment of DNA into RNA. The segments of DNA transcribed into RNA molecules that can encode proteins produce messenger RNA (mRNA). Other segments of DNA are transcribed into RNA molecules called non-coding RNAs (ncRNAs).
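The copying step can be sketched as a base-by-base mapping. The helper names below are hypothetical, both strands are assumed to be written 5' to 3', and only the four unambiguous DNA letters are handled.

```python
# Maps each DNA base on the template to the RNA base paired with it.
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def transcribe_coding(coding_strand: str) -> str:
    """RNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand: str) -> str:
    """RNA is the reverse complement of the template strand (both 5' to 3')."""
    return template_strand.upper()[::-1].translate(DNA_TO_RNA_COMPLEMENT)


print(transcribe_coding("ATGGCC"))    # AUGGCC
print(transcribe_template("GGCCAT"))  # AUGGCC
```

Both calls describe the same toy locus from the two strands, so they yield the same RNA.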
The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for a protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.
Alternative splicing, also called alternative RNA splicing or differential splicing, is a splicing process during gene expression that allows a single gene to code for multiple proteins. In this process, particular exons of a gene may be included within or excluded from the final, processed messenger RNA (mRNA) produced from that gene. This means the exons are joined in different combinations, leading to different (alternative) mRNA strands. Consequently, the proteins translated from alternatively spliced mRNAs usually contain differences in their amino acid sequence and, often, in their biological functions.
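The combinatorial effect of including or excluding exons can be sketched as follows. This is a minimal model assuming only exon skipping: each exon is labeled either constitutive (always included) or optional, and the function name isoforms and the exon labels are hypothetical.

```python
from itertools import product


def isoforms(exons: list[tuple[str, bool]]) -> list[list[str]]:
    """Enumerate exon combinations for a gene.

    `exons` is an ordered list of (name, is_constitutive) pairs.
    Constitutive exons are always kept; the others may be kept or skipped.
    """
    choices = [
        [(name,)] if constitutive else [(name,), ()]
        for name, constitutive in exons
    ]
    return [
        [name for part in combination for name in part]
        for combination in product(*choices)
    ]


gene = [("E1", True), ("E2", False), ("E3", True)]
for transcript in isoforms(gene):
    print("-".join(transcript))
# E1-E2-E3
# E1-E3
```

With n optional exons this model yields 2^n candidate transcripts, which gives a sense of how a single gene can give rise to many mRNAs.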
A spliceosome is a large ribonucleoprotein (RNP) complex found primarily within the nucleus of eukaryotic cells. The spliceosome is assembled from small nuclear RNAs (snRNA) and numerous proteins. Small nuclear RNA (snRNA) molecules bind to specific proteins to form a small nuclear ribonucleoprotein complex, which in turn combines with other snRNPs to form a large ribonucleoprotein complex called a spliceosome. The spliceosome removes introns from a transcribed pre-mRNA, a type of primary transcript. This process is generally referred to as splicing. An analogy is a film editor, who selectively cuts out irrelevant or incorrect material from the initial film and sends the cleaned-up version to the director for the final cut.
SR proteins are a conserved family of proteins involved in RNA splicing. SR proteins are named because they contain a protein domain with long repeats of serine and arginine amino acid residues, whose standard abbreviations are "S" and "R" respectively. SR proteins are ~200-600 amino acids in length and composed of two domains, the RNA recognition motif (RRM) region and the RS domain. SR proteins are more commonly found in the nucleus than the cytoplasm, but several SR proteins are known to shuttle between the nucleus and the cytoplasm.
A primary transcript is the single-stranded ribonucleic acid (RNA) product synthesized by transcription of DNA, and processed to yield various mature RNA products such as mRNAs, tRNAs, and rRNAs. The primary transcripts designated to be mRNAs are modified in preparation for translation. For example, a precursor mRNA (pre-mRNA) is a type of primary transcript that becomes a messenger RNA (mRNA) after processing.
The 5′ untranslated region is the region of a messenger RNA (mRNA) that is directly upstream from the initiation codon. This region is important for the regulation of translation of a transcript by differing mechanisms in viruses, prokaryotes and eukaryotes. While called untranslated, the 5′ UTR or a portion of it is sometimes translated into a protein product. This product can then regulate the translation of the main coding sequence of the mRNA. In many organisms, however, the 5′ UTR is completely untranslated, instead forming a complex secondary structure to regulate translation.
In eukaryote cells, RNA polymerase III is a protein that transcribes DNA to synthesize 5S ribosomal RNA, tRNA, and other small RNAs.
Post-transcriptional modification or co-transcriptional modification is a set of biological processes common to most eukaryotic cells by which an RNA primary transcript is chemically altered following transcription from a gene to produce a mature, functional RNA molecule that can then leave the nucleus and perform any of a variety of different functions in the cell. There are many types of post-transcriptional modifications achieved through a diverse class of molecular mechanisms.
Small nuclear RNA (snRNA) is a class of small RNA molecules that are found within the splicing speckles and Cajal bodies of the cell nucleus in eukaryotic cells. The length of an average snRNA is approximately 150 nucleotides. They are transcribed by either RNA polymerase II or RNA polymerase III. Their primary function is in the processing of pre-messenger RNA (hnRNA) in the nucleus. They have also been shown to aid in the regulation of transcription factors or RNA polymerase II, and maintaining the telomeres.
Gene structure is the organisation of specialised sequence elements within a gene. Genes contain most of the information necessary for living cells to survive and reproduce. In most organisms, genes are made of DNA, where the particular DNA sequence determines the function of the gene. A gene is transcribed (copied) from DNA into RNA, which can either be non-coding (ncRNA) with a direct function, or an intermediate messenger (mRNA) that is then translated into protein. Each of these steps is controlled by specific sequence elements, or regions, within the gene. Every gene, therefore, requires multiple sequence elements to be functional. This includes the sequence that actually encodes the functional protein or ncRNA, as well as multiple regulatory sequence regions. These regions may be as short as a few base pairs, up to many thousands of base pairs long.
Group II introns are a large class of self-catalytic ribozymes and mobile genetic elements found within the genes of all three domains of life. Ribozyme activity can occur under high-salt conditions in vitro. However, assistance from proteins is required for in vivo splicing. In contrast to group I introns, intron excision occurs in the absence of GTP and involves the formation of a lariat, with an A-residue branchpoint strongly resembling that found in lariats formed during splicing of nuclear pre-mRNA. It is hypothesized that pre-mRNA splicing may have evolved from group II introns, due to the similar catalytic mechanism as well as the structural similarity of the Group II Domain V substructure to the U6/U2 extended snRNA. Finally, their ability to site-specifically insert into DNA sites has been exploited as a tool for biotechnology. For example, group II introns can be modified to make site-specific genome insertions and deliver cargo DNA such as reporter genes or lox sites.
SmY ribonucleic acids are a family of small nuclear RNAs found in some species of nematode worms. They are thought to be involved in mRNA trans-splicing.
mRNA surveillance mechanisms are pathways utilized by organisms to ensure fidelity and quality of messenger RNA (mRNA) molecules. There are a number of surveillance mechanisms present within cells. These mechanisms function at various steps of the mRNA biogenesis pathway to detect and degrade transcripts that have not properly been processed.
Chimeric RNA, sometimes referred to as a fusion transcript, is composed of exons from two or more different genes that have the potential to encode novel proteins. These mRNAs are different from those produced by conventional splicing as they are produced by two or more gene loci.
The split gene theory is a theory of the origin of introns, long non-coding sequences in eukaryotic genes between the exons. The theory holds that the randomness of primordial DNA sequences would only permit small (< 600 bp) open reading frames (ORFs), and that important intron structures and regulatory sequences are derived from stop codons. In this introns-first framework, the spliceosomal machinery and the nucleus evolved due to the necessity to join these ORFs into larger proteins, and intronless bacterial genes are less ancestral than the split eukaryotic genes. The theory originated with Periannan Senapathy.
An outron is a nucleotide sequence at the 5' end of the primary transcript of a gene that is removed by a special form of RNA splicing during maturation of the final RNA product. Whereas intron sequences are located inside the gene, outron sequences lie outside the gene.