Exon

Introns are removed and exons joined in the process of RNA splicing. RNAs could be mRNA or non-coding RNA

An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term exon refers to both the DNA sequence within a gene and to the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.
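The joining step described above can be sketched in a few lines of Python. This is only an illustration: the toy precursor sequence, the exon coordinates, and the function name `splice` are hypothetical, and 0-based half-open intervals are an assumed convention rather than anything specified by the article.

```python
def splice(pre_rna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, half-open) in order; all other bases are treated as intron."""
    ordered = sorted(exons)
    # Exons of one transcript should not overlap each other.
    for (_, prev_end), (start, _) in zip(ordered, ordered[1:]):
        if start < prev_end:
            raise ValueError("exon intervals overlap")
    return "".join(pre_rna[start:end] for start, end in ordered)


# Hypothetical toy precursor: uppercase = exon, lowercase = intron.
pre_rna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaagucccag" + "UGA"
exons = [(0, 6), (18, 24), (35, 38)]

mature = splice(pre_rna, exons)
print(mature)
```

In this toy case the two lowercase stretches are dropped and the three uppercase stretches are concatenated in their original order.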

History

The term exon derives from the expressed region and was coined by American biochemist Walter Gilbert in 1978: "The notion of the cistron... must be replaced by that of a transcription unit containing regions which will be lost from the mature messenger which I suggest we call introns (for intragenic regions) alternating with regions which will be expressed exons." [1]

This definition was originally made for protein-coding transcripts that are spliced before being translated. The term was later extended to the sequences retained after splicing of rRNA, [2] tRNA, [3] and other ncRNA, [4] and later still to RNA molecules originating from different parts of the genome that are then ligated by trans-splicing. [5]

Contribution to genomes and size distribution

Although unicellular eukaryotes such as yeast have either no introns or very few, metazoans and especially vertebrate genomes have a large fraction of non-coding DNA. For instance, in the human genome only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. [6] This can provide a practical advantage in omics-aided health care (such as precision medicine) because it makes commercialized whole exome sequencing a smaller and less expensive challenge than commercialized whole genome sequencing. The large variation in genome size and C-value across life forms has posed an interesting challenge called the C-value enigma.
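To make the size difference concrete, the percentages quoted above can be turned into approximate base-pair counts. The sketch below assumes a haploid human genome of roughly 3.1 billion bp. That figure is an assumption brought in for illustration and does not come from the article. The variable names are likewise illustrative.

```python
# Fractions quoted in the text (they sum to 100.1% because each value is rounded).
fractions = {"exons": 0.011, "introns": 0.24, "intergenic": 0.75}

# ASSUMPTION: haploid human genome of about 3.1 billion bp (not stated in the article).
GENOME_BP = 3_100_000_000

sizes_bp = {region: frac * GENOME_BP for region, frac in fractions.items()}
exome_mb = sizes_bp["exons"] / 1e6
fold_reduction = 1 / fractions["exons"]

print(f"exonic sequence: about {exome_mb:.0f} Mb")
print(f"whole genome is about {fold_reduction:.0f}-fold larger")
```

Under these assumptions the exonic portion comes to about 34 Mb, roughly 91-fold less sequence than the whole genome.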

Across all eukaryotic genes in GenBank in 2002, there were on average 5.48 exons per protein-coding gene. The average exon encoded 30-36 amino acids. [7] While the longest exon in the human genome is 11555 bp long, several exons have been found to be only 2 bp long. [8] A single-nucleotide exon has been reported from the Arabidopsis genome. [9] In humans, most non-coding RNAs, like protein-coding mRNAs, contain multiple exons. [10]

Structure and function

Exons in a messenger RNA precursor (pre-mRNA). Exons can include both sequences that code for amino acids (red) and untranslated sequences (grey). Introns — those parts of the pre-mRNA that are not in the mRNA — (blue) are removed, and the exons are joined (spliced) to form the final functional mRNA. The 5′ and 3′ ends of the mRNA are marked to differentiate the two untranslated regions (grey).

In protein-coding genes, the exons include both the protein-coding sequence and the 5′- and 3′-untranslated regions (UTR). Often the first exon includes both the 5′-UTR and the first part of the coding sequence, but exons containing only regions of 5′-UTR or (more rarely) 3′-UTR occur in some genes, i.e. the UTRs may contain introns. [11] Some non-coding RNA transcripts also have exons and introns.
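The relationship between exons, UTRs and coding sequence can be illustrated by labelling each exon according to how much of it overlaps the coding region. The following is a minimal sketch with made-up coordinates for a hypothetical plus-strand gene. The function name `classify_exons` and the 0-based half-open interval convention are assumptions for the example.

```python
def classify_exons(exons, cds_start, cds_end):
    """Label each exon by its overlap with the coding region [cds_start, cds_end)."""
    labels = []
    for start, end in exons:
        # Number of exon bases that fall inside the coding region.
        coding = max(0, min(end, cds_end) - max(start, cds_start))
        if coding == 0:
            labels.append("UTR only")
        elif coding == end - start:
            labels.append("coding only")
        else:
            labels.append("UTR + coding")
    return labels


# Hypothetical gene: four exons, coding region from position 230 to 850.
exons = [(0, 50), (200, 320), (500, 590), (800, 1000)]
labels = classify_exons(exons, cds_start=230, cds_end=850)

for exon, label in zip(exons, labels):
    print(exon, label)
```

Here the first exon lies entirely in the 5′-UTR, so an intron interrupts the UTR, while the second and last exons each combine untranslated and coding sequence.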

Mature mRNAs originating from the same gene need not include the same exons, since different introns in the pre-mRNA can be removed by the process of alternative splicing.
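As a rough illustration of how one gene can yield several mature RNAs, the sketch below enumerates the exon combinations possible when some exons are always included and others may be skipped. It is a deliberately simplified model (cassette exons only, exon order preserved). The exon names and the function `isoforms` are hypothetical, and real genes typically produce only a subset of the combinations such a model allows.

```python
from itertools import combinations


def isoforms(exons, constitutive):
    """List exon combinations in original order, always keeping the constitutive exons."""
    optional = [e for e in exons if e not in constitutive]
    results = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            results.append([e for e in exons if e not in skipped])
    return results


# Hypothetical gene with two constitutive and two skippable exons.
exons = ["E1", "E2", "E3", "E4"]
variants = isoforms(exons, constitutive={"E1", "E4"})

for variant in variants:
    print("-".join(variant))
```

With two skippable exons the model yields four possible transcripts, from the full-length E1-E2-E3-E4 down to E1-E4.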

Exonization is the creation of a new exon, as a result of mutations in introns. [12]

Experimental approaches using exons

Exon trapping or 'gene trapping' is a molecular biology technique that exploits intron-exon splicing to find new genes. [13] The first exon of a 'trapped' gene splices into the exon that is contained in the insertional DNA. This new exon contains the ORF for a reporter gene that can now be expressed using the enhancers that control the target gene. Expression of the reporter gene indicates that a new gene has been trapped.

Splicing can be experimentally modified so that targeted exons are excluded from mature mRNA transcripts by blocking the access of splice-directing small nuclear ribonucleoprotein particles (snRNPs) to pre-mRNA using Morpholino antisense oligos. [14] This has become a standard technique in developmental biology. Morpholino oligos can also be targeted to prevent molecules that regulate splicing (e.g. splice enhancers, splice suppressors) from binding to pre-mRNA, altering patterns of splicing.

Common misuse of the term

Common incorrect uses of the term exon are that 'exons code for protein', 'exons code for amino acids' or 'exons are translated'. However, these sorts of definitions only cover protein-coding genes, and omit those exons that become part of a non-coding RNA [15] or the untranslated region of an mRNA. [16] [17] Such incorrect definitions still occur in otherwise reputable secondary sources. [18] [19]

Related Research Articles

An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word intron is derived from the term intragenic region, i.e., a region inside a gene. The term intron refers to both the DNA sequence within a gene and the corresponding RNA sequence in RNA transcripts. The non-intron sequences that become joined by this RNA processing to form the mature RNA are called exons.

Messenger RNA: RNA that is read by the ribosome to produce a protein

In molecular biology, messenger ribonucleic acid (mRNA) is a single-stranded molecule of RNA that corresponds to the genetic sequence of a gene, and is read by a ribosome in the process of synthesizing a protein.

RNA splicing: Process in molecular biology

RNA splicing is a process in molecular biology where a newly made precursor messenger RNA (pre-mRNA) transcript is transformed into a mature messenger RNA (mRNA). It works by removing all the introns and splicing the exons back together. For nuclear-encoded genes, splicing occurs in the nucleus either during or immediately after transcription. For those eukaryotic genes that contain introns, splicing is usually needed to create an mRNA molecule that can be translated into protein. For many eukaryotic introns, splicing occurs in a series of reactions which are catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs). There also exist self-splicing introns, that is, ribozymes that can catalyze their own excision from their parent RNA molecule. Transcription, splicing and translation are together called gene expression, the process described by the central dogma of molecular biology.

Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules. Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional, such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses. Regions that are completely nonfunctional are called junk DNA.

Gene expression: Conversion of a gene's sequence into a mature gene product or products

Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product, either a protein or a non-coding RNA, and ultimately affects a phenotype. These products are often proteins, but in non-protein-coding genes such as transfer RNA (tRNA) and small nuclear RNA (snRNA), the product is a functional non-coding RNA. The process of gene expression is used by all known life (eukaryotes and prokaryotes), and by viruses, to generate the macromolecular machinery for life.

The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for a protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.

Alternative splicing: Process by which a gene can code for multiple proteins

Alternative splicing, or alternative RNA splicing, or differential splicing, is a splicing process during gene expression that allows a single gene to code for multiple proteins. In this process, particular exons of a gene may be included within or excluded from the final, processed messenger RNA (mRNA) produced from that gene. This means the exons are joined in different combinations, leading to different (alternative) mRNA strands. Consequently, the proteins translated from alternatively spliced mRNAs usually contain differences in their amino acid sequence and, often, in their biological functions.

Morpholino: Chemical compound

A Morpholino, also known as a Morpholino oligomer and as a phosphorodiamidate Morpholino oligomer (PMO), is a type of oligomer molecule used in molecular biology to modify gene expression. Its molecular structure contains DNA bases attached to a backbone of methylenemorpholine rings linked through phosphorodiamidate groups. Morpholinos block access of other molecules to small specific sequences of the base-pairing surfaces of ribonucleic acid (RNA). Morpholinos are used as research tools for reverse genetics by knocking down gene function.

The 5′ untranslated region is the region of a messenger RNA (mRNA) that is directly upstream from the initiation codon. This region is important for the regulation of translation of a transcript by differing mechanisms in viruses, prokaryotes and eukaryotes. While called untranslated, the 5′ UTR or a portion of it is sometimes translated into a protein product. This product can then regulate the translation of the main coding sequence of the mRNA. In many organisms, however, the 5′ UTR is completely untranslated, instead forming a complex secondary structure to regulate translation.

Gene: Sequence of DNA or RNA that codes for an RNA or protein product

In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA, that is transcribed to produce a functional RNA. There are two types of molecular genes: protein-coding genes and non-coding genes.

Gene structure is the organisation of specialised sequence elements within a gene. Genes contain most of the information necessary for living cells to survive and reproduce. In most organisms, genes are made of DNA, where the particular DNA sequence determines the function of the gene. A gene is transcribed (copied) from DNA into RNA, which can either be non-coding (ncRNA) with a direct function, or an intermediate messenger (mRNA) that is then translated into protein. Each of these steps is controlled by specific sequence elements, or regions, within the gene. Every gene, therefore, requires multiple sequence elements to be functional. This includes the sequence that actually encodes the functional protein or ncRNA, as well as multiple regulatory sequence regions. These regions may be as short as a few base pairs, up to many thousands of base pairs long.

Eukaryotic chromosome fine structure refers to the structure of sequences for eukaryotic chromosomes. Some fine sequences are included in more than one class, so the classification listed is not intended to be completely separate.

Exon shuffling is a molecular mechanism for the formation of new genes. It is a process through which two or more exons from different genes can be brought together ectopically, or the same exon can be duplicated, to create a new exon-intron structure. There are different mechanisms through which exon shuffling occurs: transposon mediated exon shuffling, crossover during sexual recombination of parental genomes and illegitimate recombination.

Untranslated region: Non-coding regions on either end of mRNA

In molecular genetics, an untranslated region refers to either of two sections, one on each side of a coding sequence on a strand of mRNA. If it is found on the 5' side, it is called the 5' UTR, or if it is found on the 3' side, it is called the 3' UTR. mRNA is RNA that carries information from DNA to the ribosome, the site of protein synthesis (translation) within a cell. The mRNA is initially transcribed from the corresponding DNA sequence and then translated into protein. However, several regions of the mRNA are usually not translated into protein, including the 5' and 3' UTRs.

BLCAP: Protein-coding gene in the species Homo sapiens

Bladder cancer-associated protein is a protein that in humans is encoded by the BLCAP gene.

The exome is composed of all of the exons within the genome, the sequences which, when transcribed, remain within the mature RNA after introns are removed by RNA splicing. This includes untranslated regions of messenger RNA (mRNA), and coding regions. Exome sequencing has proven to be an efficient method of determining the genetic basis of more than two dozen Mendelian or single gene disorders.

A conserved non-coding sequence (CNS) is a DNA sequence of noncoding DNA that is evolutionarily conserved. These sequences are of interest for their potential to regulate gene production.

Periannan Senapathy is a molecular biologist, geneticist, author and entrepreneur. He is the founder, president and chief scientific officer at Genome International Corporation, a biotechnology, bioinformatics, and information technology firm based in Madison, Wisconsin, which develops computational genomics applications of next-generation DNA sequencing (NGS) and clinical decision support systems for analyzing patient genome data that aids in diagnosis and treatment of diseases.

Exitrons are produced through alternative splicing and have characteristics of both introns and exons, but are described as retained introns. Even though they are considered introns, which are typically cut out of pre-mRNA sequences, significant problems arise when exitrons are spliced out of these strands, the most obvious being altered protein structures and functions. They were first discovered in plants, but have recently been found in metazoan species as well.

The split gene theory is a theory of the origin of introns, long non-coding sequences in eukaryotic genes between the exons. The theory holds that the randomness of primordial DNA sequences would only permit small (< 600 bp) open reading frames (ORFs), and that important intron structures and regulatory sequences are derived from stop codons. In this introns-first framework, the spliceosomal machinery and the nucleus evolved due to the necessity to join these ORFs into larger proteins, and intronless bacterial genes are less ancestral than the split eukaryotic genes. The theory originated with Periannan Senapathy.

References

  1. Gilbert W (February 1978). "Why genes in pieces?". Nature. 271 (5645): 501. Bibcode:1978Natur.271..501G. doi:10.1038/271501a0. PMID 622185.
  2. Kister KP, Eckert WA (March 1987). "Characterization of an authentic intermediate in the self-splicing process of ribosomal precursor RNA in macronuclei of Tetrahymena thermophila". Nucleic Acids Research. 15 (5): 1905–20. doi:10.1093/nar/15.5.1905. PMC 340607. PMID 3645543.
  3. Valenzuela P, Venegas A, Weinberg F, Bishop R, Rutter WJ (January 1978). "Structure of yeast phenylalanine-tRNA genes: an intervening DNA segment within the region coding for the tRNA". Proceedings of the National Academy of Sciences of the United States of America. 75 (1): 190–4. Bibcode:1978PNAS...75..190V. doi:10.1073/pnas.75.1.190. PMC 411211. PMID 343104.
  4. Khan, MR; Wellinger, RJ; Laurent, B (August 2021). "Exploring the Alternative Splicing of Long Noncoding RNAs". Trends in Genetics. 37 (8): 695–698. doi:10.1016/j.tig.2021.03.010. PMID 33892960. S2CID 233382870.
  5. Liu AY, Van der Ploeg LH, Rijsewijk FA, Borst P (June 1983). "The transposition unit of variant surface glycoprotein gene 118 of Trypanosoma brucei. Presence of repeated elements at its border and absence of promoter-associated sequences". Journal of Molecular Biology. 167 (1): 57–75. doi:10.1016/S0022-2836(83)80034-5. PMID 6306255.
  6. Venter J.C.; et al. (2001). "The Sequence of the Human Genome". Science. 291 (5507): 1304–51. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. PMID 11181995.
  7. Sakharkar M, Passetti F, de Souza JE, Long M, de Souza SJ (2002). "ExInt: an Exon Intron Database". Nucleic Acids Research. 30 (1): 191–4. doi:10.1093/nar/30.1.191. PMC 99089. PMID 11752290.
  8. Sakharkar M.K.; Chow VT; Kangueane P. (2004). "Distributions of exons and introns in the human genome". In Silico Biol. 4 (4): 387–93. PMID 15217358.
  9. Guo Lei, Liu Chun-Ming (2015). "A single-nucleotide exon found in Arabidopsis". Scientific Reports. 5: 18087. Bibcode:2015NatSR...518087G. doi:10.1038/srep18087. PMC 4674806. PMID 26657562.
  10. Derrien, T; Johnson, R; Bussotti, G; Tanzer, A; Djebali, S; Tilgner, H; Guernec, G; Martin, D; Merkel, A; Knowles, DG; Lagarde, J; Veeravalli, L; Ruan, X; Ruan, Y; Lassmann, T; Carninci, P; Brown, JB; Lipovich, L; Gonzalez, JM; Thomas, M; Davis, CA; Shiekhattar, R; Gingeras, TR; Hubbard, TJ; Notredame, C; Harrow, J; Guigó, R (September 2012). "The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression". Genome Research. 22 (9): 1775–89. doi:10.1101/gr.132159.111. PMC 3431493. PMID 22955988.
  11. Bicknell, AA (December 2012). "Introns in UTRs: Why we should stop ignoring them". BioEssays. 34 (12): 1025–1034. doi:10.1002/bies.201200073. PMID 23108796. S2CID 5808466.
  12. Sorek R (October 2007). "The birth of new exons: mechanisms and evolutionary consequences". RNA. 13 (10): 1603–8. doi:10.1261/rna.682507. PMC 1986822. PMID 17709368.
  13. Duyk G. M; Kim S. W.; Myers R. M; Cox D. R (1990). "Exon Trapping: a Genetic Screen to Identify Candidate Transcribed Sequences in Cloned Mammalian Genomic DNA". Proceedings of the National Academy of Sciences. 87 (22): 8995–8999. Bibcode:1990PNAS...87.8995D. doi:10.1073/pnas.87.22.8995. PMC 55087. PMID 2247475.
  14. Morcos PA (June 2007). "Achieving targeted and quantifiable alteration of mRNA splicing with Morpholino oligos". Biochemical and Biophysical Research Communications. 358 (2): 521–7. doi:10.1016/j.bbrc.2007.04.172. PMID 17493584.
  15. Khan, MR; Wellinger, RJ; Laurent, B (August 2021). "Exploring the Alternative Splicing of Long Noncoding RNAs". Trends in Genetics. 37 (8): 695–698. doi:10.1016/j.tig.2021.03.010. PMID 33892960. S2CID 233382870.
  16. Lu, J; Williams, JA; Luke, J; Zhang, F; Chu, K; Kay, MA (January 2017). "A 5' Noncoding Exon Containing Engineered Intron Enhances Transgene Expression from Recombinant AAV Vectors in vivo". Human Gene Therapy. 28 (1): 125–134. doi:10.1089/hum.2016.140. PMC 5278795. PMID 27903072.
  17. Chung, BY; Simons, C; Firth, AE; Brown, CM; Hellens, RP (19 May 2006). "Effect of 5'UTR introns on gene expression in Arabidopsis thaliana". BMC Genomics. 7: 120. doi:10.1186/1471-2164-7-120. PMC 1482700. PMID 16712733.
  18. "Exon". Genome.gov. Archived from the original on 2023-03-16. Retrieved 2023-03-23.
  19. "Exon". www.nature.com. Scitable. Archived from the original on 2023-03-23. Retrieved 2023-03-23.
