Untranslated region

The flow of genetic information within a cell. DNA is initially transcribed into a messenger RNA (mRNA) molecule. The mRNA is then translated into a protein. (See Central dogma of molecular biology.)
mRNA structure, approximately to scale for a human mRNA

In molecular genetics, an untranslated region (UTR) is either of two sections, one on each side of the coding sequence on a strand of mRNA. The section on the 5' side is called the 5' UTR (or leader sequence), and the section on the 3' side is called the 3' UTR (or trailer sequence). mRNA is RNA that carries information from DNA to the ribosome, the site of protein synthesis (translation) within a cell. The mRNA is first transcribed from the corresponding DNA sequence and then translated into protein. However, several regions of the mRNA are usually not translated into protein, including the 5' and 3' UTRs.

Although they are called untranslated regions and do not form part of the protein-coding region of the gene, upstream open reading frames (uORFs) located within the 5' UTR can be translated into peptides. [1]
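As a rough illustration of what a uORF is, the sketch below lists every open reading frame that begins at an AUG inside the 5' UTR. The function name `find_uorfs`, the example sequence, and the rule that a uORF ends at the first in-frame stop codon are assumptions made for this example; real uORF annotation also considers non-AUG starts and experimental evidence of translation.

```python
from typing import List, Tuple

# Stop codons of the standard genetic code, written as RNA.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna: str, main_start: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of ORFs that begin upstream of main_start.

    main_start is the 0-based index of the A of the main AUG.
    end is exclusive and includes the stop codon.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    uorfs = []
    # Only AUGs that lie entirely within the 5' UTR are considered.
    for start in range(0, main_start - 2):
        if mrna[start:start + 3] != "AUG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for i in range(start, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                uorfs.append((start, i + 3))
                break
    return uorfs


# Hypothetical mRNA: a short uORF (AUG CCC UAA) precedes the main AUG at index 17.
example = "GGAUGCCCUAAGGCACCAUGGCUUGA"
print(find_uorfs(example, 17))
```

In this toy sequence the only upstream AUG sits at index 2, and its reading frame closes at the UAA three codons later, well before the main start codon.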

The 5' UTR is upstream of the coding sequence. It contains a sequence that the ribosome recognizes, which allows the ribosome to bind and initiate translation. The mechanism of translation initiation differs between prokaryotes and eukaryotes. The 3' UTR immediately follows the translation stop codon. It plays a critical role in translation termination as well as post-transcriptional modification. [2]
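The layout described above (5' UTR, then coding sequence, then 3' UTR) can be sketched in a few lines of Python. This is a minimal sketch under a strong simplifying assumption: translation is taken to start at the first AUG in the sequence and to end at the first in-frame stop codon, which is not how start sites are chosen in every real transcript. The function name `split_mrna` and the example sequence are hypothetical.

```python
from typing import Tuple

# Stop codons of the standard genetic code, written as RNA.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str) -> Tuple[str, str, str]:
    """Split a mature mRNA into (5' UTR, coding sequence, 3' UTR).

    Assumption: the first AUG is the start codon. The coding sequence
    returned here includes the stop codon, so the 3' UTR begins
    immediately after it.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    start = mrna.find("AUG")
    if start < 0:
        raise ValueError("no AUG start codon found")
    # Read codon by codon from the start codon to the first stop codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


five_utr, cds, three_utr = split_mrna("GGAGCCAUGGCUUAACUGAAUAAA")
print(five_utr, cds, three_utr)
```

For the toy input, everything before the first AUG is reported as the 5' UTR and everything after the UAA stop codon as the 3' UTR.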

These often long sequences were once thought to be useless or junk mRNA that had simply accumulated over evolutionary time. However, it is now known that the untranslated regions of mRNA are involved in many regulatory aspects of gene expression in eukaryotic organisms. The importance of these non-coding regions is supported by evolutionary reasoning: natural selection would otherwise have eliminated RNA that served no purpose.

It is important to distinguish the 5' and 3' UTRs from other non-protein-coding RNA. Within the coding sequence of pre-mRNA there can be sections of RNA that will not be included in the protein product. These sections are called introns. The RNA that results from RNA splicing is a sequence of exons. Introns are not considered untranslated regions because they are removed during RNA splicing: they are not part of the mature mRNA molecule that undergoes translation, and are therefore classed simply as non-protein-coding RNA.

History

The untranslated regions of mRNA became a subject of study as early as the late 1970s, after the first mRNA molecule was fully sequenced. In 1978, the 5' UTR of the human gamma-globin mRNA was fully sequenced. [3] In 1980, a study was conducted on the 3' UTR of the duplicated human alpha-globin genes. [4]

Evolution

Untranslated regions are found in both prokaryotes and eukaryotes, although their length and composition vary. In prokaryotes, the 5' UTR is typically between 3 and 10 nucleotides long. In eukaryotes, the 5' UTR can be hundreds to thousands of nucleotides long. This is consistent with the higher complexity of the genomes of eukaryotes compared to prokaryotes. The 3' UTR varies in length as well. The poly-A tail, which follows the 3' UTR, is essential for keeping the mRNA from being degraded. Although the lengths of both the 5' and 3' UTR vary, 5' UTR length has been found to be more highly conserved in evolution than 3' UTR length. [5]

Prokaryotes

The 5' UTR of prokaryotes contains the Shine-Dalgarno sequence (5'-AGGAGGU-3'). [6] This sequence is found 3-10 nucleotides upstream of the initiation codon. The initiation codon is the start site of translation into protein.
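A minimal sketch of how one might look for a Shine-Dalgarno-like site upstream of a known start codon is shown below. The function name `find_shine_dalgarno`, the 20-nucleotide search window, and the tolerance of one mismatch are assumptions chosen for illustration; they are not parameters taken from the cited study, and real sites often match the consensus only partially.

```python
from typing import Optional, Tuple

SD_CONSENSUS = "AGGAGGU"  # consensus given in the text, 5' to 3'


def find_shine_dalgarno(
    mrna: str,
    start_codon_index: int,
    max_mismatches: int = 1,
    window: int = 20,
) -> Optional[Tuple[int, int, int]]:
    """Return (position, spacing, mismatches) for the best match, or None.

    position is the 0-based index where the site begins, and spacing is the
    number of nucleotides between the end of the site and the start codon.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    k = len(SD_CONSENSUS)
    first = max(0, start_codon_index - window)
    best = None
    # Slide a window of consensus length over the region upstream of the AUG.
    for pos in range(first, start_codon_index - k + 1):
        site = mrna[pos:pos + k]
        mismatches = sum(a != b for a, b in zip(site, SD_CONSENSUS))
        if mismatches <= max_mismatches and (best is None or mismatches < best[2]):
            spacing = start_codon_index - (pos + k)
            best = (pos, spacing, mismatches)
    return best


# Hypothetical leader: the consensus starts at index 2 and the AUG at index 14.
print(find_shine_dalgarno("UUAGGAGGUAACAUAUGAAA", 14))
```

In the example the site ends five nucleotides before the AUG, which falls inside the 3-10 nucleotide range mentioned above.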

Eukaryotes

The 5' UTR of eukaryotes is more complex than that of prokaryotes. It contains a Kozak consensus sequence (ACCAUGG). [7] This sequence contains the initiation codon (AUG), which is the start site of translation into protein.
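The consensus above can be compared against the context of any candidate AUG. The sketch below uses a common simplification that is an assumption of this example rather than something stated in the text: only two positions are scored, the nucleotide three bases before the A of the AUG (a purine, A or G, in the consensus) and the nucleotide directly after the AUG (G in the consensus). The function name `kozak_context` and the labels "strong", "adequate" and "weak" are illustrative.

```python
def kozak_context(mrna: str, aug_index: int) -> str:
    """Classify the sequence context of the AUG that starts at aug_index.

    Position -3 is three nucleotides upstream of the A of the AUG;
    position +4 is the nucleotide immediately after the G of the AUG.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else ""
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else ""
    has_purine = minus3 in ("A", "G")  # tuple, so "" never matches
    has_g = plus4 == "G"
    if has_purine and has_g:
        return "strong"
    if has_purine or has_g:
        return "adequate"
    return "weak"


print(kozak_context("GCCACCAUGGCG", 6))  # matches ACCAUGG at both scored positions
print(kozak_context("GCCUCCAUGUCG", 6))  # matches at neither scored position
```

A fuller treatment would weight every position around the start codon, for example with a position weight matrix built from annotated start sites.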

The importance of these untranslated regions of mRNA is just beginning to be understood. Medical studies have found connections between mutations in untranslated regions and an increased risk of developing particular diseases, such as cancer. For example, associations between polymorphisms in the HLA-G 3' UTR and development of colorectal cancer have been discovered. [8] Single nucleotide polymorphisms in the 3' UTR of the methylenetetrahydrofolate reductase gene have also been associated with susceptibility to preterm birth. [9] Mutations in the 3' UTR of the APP gene are related to development of cerebral amyloid angiopathy. [10]

Further study

Recent studies of untranslated regions have yielded general information about the nature and function of these elements. However, much about these regions of mRNA remains unknown. Because the regulation of gene expression is critical to the proper function of cells, this area needs further investigation. It is important to consider that mutations in 3' untranslated regions have the potential to alter the expression of several genes that may appear unrelated. [11] The links between proper untranslated region function and disease states of cells are only beginning to be understood.

Related Research Articles

Exon: Gene portion that is not removed during RNA splicing and becomes part of mature mRNA

An exon is any part of a gene that will encode a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term exon refers to both the DNA sequence within a gene and to the corresponding sequence in RNA transcripts. In RNA splicing, introns are removed and exons are covalently joined to one another as part of generating the mature messenger RNA. Just as the entire set of genes for a species constitutes the genome, the entire set of exons constitutes the exome.

Messenger RNA: RNA that is read by the ribosome to produce a protein

In molecular biology, messenger RNA (mRNA) is a single-stranded molecule of RNA that corresponds to the genetic sequence of a gene, and is read by a ribosome in the process of synthesizing a protein.

Protein biosynthesis: Assembly of proteins inside biological cells

Protein biosynthesis is a core biological process, occurring inside cells, balancing the loss of cellular proteins through the production of new proteins. Proteins perform a number of critical functions as enzymes, structural proteins or hormones. Protein synthesis is a very similar process for both prokaryotes and eukaryotes but there are some distinct differences.

The coding region of a gene, also known as the CDS, is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.

Three prime untranslated region

In molecular genetics, the three prime untranslated region (3′-UTR) is the section of messenger RNA (mRNA) that immediately follows the translation termination codon. The 3′-UTR often contains regulatory regions that post-transcriptionally influence gene expression.

Polyadenylation is the addition of a poly(A) tail to an RNA transcript, typically a messenger RNA (mRNA). The poly(A) tail consists of multiple adenosine monophosphates; in other words, it is a stretch of RNA that has only adenine bases. In eukaryotes, polyadenylation is part of the process that produces mature mRNA for translation. In many bacteria, the poly(A) tail promotes degradation of the mRNA. It, therefore, forms part of the larger process of gene expression.

In molecular genetics, an open reading frame (ORF) is the part of a reading frame that has the ability to be translated. An ORF is a continuous stretch of codons that begins with a start codon and ends at a stop codon. An ATG codon within the ORF may indicate where translation starts. The transcription termination site is located after the ORF, beyond the translation stop codon. If transcription were to cease before the stop codon, an incomplete protein would be made during translation. In eukaryotic genes with multiple exons, introns are removed and exons are then joined together after transcription to yield the final mRNA for protein translation. In the context of gene finding, the start-stop definition of an ORF therefore only applies to spliced mRNAs, not genomic DNA, since introns may contain stop codons and/or cause shifts between reading frames. An alternative definition says that an ORF is a sequence that has a length divisible by three and is bounded by stop codons. This more general definition can also be useful in the context of transcriptomics and/or metagenomics, where start and/or stop codon may not be present in the obtained sequences. Such an ORF corresponds to parts of a gene rather than the complete gene.

The 5′ untranslated region is the region of an mRNA that is directly upstream from the initiation codon. This region is important for the regulation of translation of a transcript by differing mechanisms in viruses, prokaryotes and eukaryotes. While called untranslated, the 5′ UTR or a portion of it is sometimes translated into a protein product. This product can then regulate the translation of the main coding sequence of the mRNA. In many organisms, however, the 5′ UTR is completely untranslated, instead forming complex secondary structure to regulate translation.

Start codon: First codon of a messenger RNA transcript translated by a ribosome

The start codon is the first codon of a messenger RNA (mRNA) transcript translated by a ribosome. The start codon always codes for methionine in eukaryotes and Archaea and an N-formylmethionine (fMet) in bacteria, mitochondria and plastids. The most common start codon is AUG.

Gene: Sequence of DNA or RNA that codes for an RNA or protein product

In biology, a gene is a sequence of nucleotides in DNA or RNA that encodes the synthesis of a gene product, either RNA or protein.

The Kozak consensus sequence is a nucleic acid motif that functions as the protein translation initiation site in most eukaryotic mRNA transcripts. Regarded as the optimum sequence for initiating translation in eukaryotes, the sequence is an integral aspect of protein regulation and overall cellular health as well as having implications in human disease. It ensures that a protein is correctly translated from the genetic message, mediating ribosome assembly and translation initiation. A wrong start site can result in non-functional proteins. As it has become more studied, expansions of the nucleotide sequence, bases of importance, and notable exceptions have arisen. The sequence was named after the scientist who discovered it, Marilyn Kozak. Kozak discovered the sequence through a detailed analysis of DNA genomic sequences.

Gene structure is the organisation of specialised sequence elements within a gene. Genes contain the information necessary for living cells to survive and reproduce. In most organisms, genes are made of DNA, where the particular DNA sequence determines the function of the gene. A gene is transcribed (copied) from DNA into RNA, which can either be non-coding (ncRNA) with a direct function, or an intermediate messenger (mRNA) that is then translated into protein. Each of these steps is controlled by specific sequence elements, or regions, within the gene. Every gene, therefore, requires multiple sequence elements to be functional. This includes the sequence that actually encodes the functional protein or ncRNA, as well as multiple regulatory sequence regions. These regions may be as short as a few base pairs, up to many thousands of base pairs long.

Eukaryotic chromosome fine structure refers to the structure of sequences for eukaryotic chromosomes. Some fine sequences are included in more than one class, so the classification listed is not intended to be completely separate.

Directionality (molecular biology): End-to-end chemical orientation of a single strand of nucleic acid

Directionality, in molecular biology and biochemistry, is the end-to-end chemical orientation of a single strand of nucleic acid. In a single strand of DNA or RNA, the chemical convention of naming carbon atoms in the nucleotide pentose-sugar-ring means that there will be a 5′-end, which frequently contains a phosphate group attached to the 5′ carbon of the ribose ring, and a 3′-end, which typically is unmodified from the ribose -OH substituent. In a DNA double helix, the strands run in opposite directions to permit base pairing between them, which is essential for replication or transcription of the encoded information.

A ribosome binding site, or ribosomal binding site (RBS), is a sequence of nucleotides upstream of the start codon of an mRNA transcript that is responsible for the recruitment of a ribosome during the initiation of protein translation. Mostly, RBS refers to bacterial sequences, although internal ribosome entry sites (IRES) have been described in mRNAs of eukaryotic cells or viruses that infect eukaryotes. Ribosome recruitment in eukaryotes is generally mediated by the 5' cap present on eukaryotic mRNAs.

5′ flanking region

The 5′ flanking region is a region of DNA that is adjacent to the 5′ end of the gene. The 5′ flanking region contains the promoter, and may contain enhancers or other protein binding sites. It is the region of DNA that is not transcribed into RNA. Not to be confused with the 5′ untranslated region, this region is neither transcribed into RNA, nor translated into a functional protein. These regions primarily function in the regulation of gene transcription. 5′ flanking regions differ between prokaryotes and eukaryotes.

Red clover necrotic mosaic virus translation enhancer elements

Red clover necrotic mosaic virus (RCNMV) contains several structural elements present within the 3′ and 5′ untranslated regions (UTR) of the genome that enhance translation. In eukaryotes transcription is a prerequisite for translation. During transcription the pre-mRNA transcript is processed: a 5′ cap is attached to the mRNA, and this 5′ cap allows ribosome assembly onto the mRNA because it acts as a binding site for the eukaryotic initiation factor eIF4F. Once eIF4F is bound to the mRNA, this protein complex interacts with the poly(A) binding protein, which is present within the 3′ UTR, and this results in mRNA circularization. This multiprotein-mRNA complex then recruits the ribosome subunits and scans the mRNA until it reaches the start codon. Transcription of viral genomes differs from eukaryotes as viral genomes produce mRNA transcripts that lack a 5′ cap site. Despite lacking a cap site, viral genes contain a structural element within the 5′ UTR known as an internal ribosome entry site (IRES). An IRES is a structural element that recruits the 40S ribosome subunit to the mRNA in close proximity to the start codon.

Translational regulation refers to the control of the levels of protein synthesized from its mRNA. This regulation is vastly important to the cellular response to stressors, growth cues, and differentiation. In comparison to transcriptional regulation, it results in much more immediate cellular adjustment through direct regulation of protein concentration. The corresponding mechanisms are primarily targeted on the control of ribosome recruitment on the initiation codon, but can also involve modulation of peptide elongation, termination of protein synthesis, or ribosome biogenesis. While these general concepts are widely conserved, some of the finer details in this sort of regulation have been proven to differ between prokaryotic and eukaryotic organisms.

The "split gene" theory by Periannan Senapathy is a theory of the origin of introns, long non-coding sequences in eukaryotic genes that intervene between the exons. The theory holds that the randomness of primordial DNA sequences would only permit small (< 600 bp) open reading frames, and that important intron structures and regulatory sequences are derived from stop codons. In this introns-first framework, the spliceosomal machinery and the nucleus evolved out of the necessity to join these ORFs into larger proteins, and intronless bacterial genes are less ancestral than the split eukaryotic genes.

Flavivirus 5' UTR

Flavivirus 5' UTR are untranslated regions in the genome of viruses in the genus Flavivirus.

References

  1. Vilela, Cristina; McCarthy, John E. G. (2003-08-01). "Regulation of fungal gene expression via short open reading frames in the mRNA 5'untranslated region". Molecular Microbiology. 49 (4): 859–867. doi:10.1046/j.1365-2958.2003.03622.x. ISSN 0950-382X. PMID 12890013.
  2. Barrett, Lucy W; Fletcher, Sue; Wilton, Steve D (2013). Untranslated Gene Regions and Other Non-Coding Elements. Springer. ISBN 978-3-0348-0679-4.
  3. Chang, J. C.; Poon, R.; Neumann, K. H.; Kan, Y. W. (1978-10-01). "The nucleotide sequence of the 5' untranslated region of human gamma-globin mRNA". Nucleic Acids Research. 5 (10): 3515–3522. doi:10.1093/nar/5.10.3515. ISSN 0305-1048. PMC 342692. PMID 318162.
  4. Michelson, A. M.; Orkin, S. H. (1980-11-01). "The 3' untranslated regions of the duplicated human alpha-globin genes are unexpectedly divergent". Cell. 22 (2 Pt 2): 371–377. doi:10.1016/0092-8674(80)90347-5. ISSN 0092-8674. PMID 7448866. S2CID 54238986.
  5. Lin, Zhenguo; Li, Wen-Hsiung (2012-01-01). "Evolution of 5' untranslated region length and gene expression reprogramming in yeasts". Molecular Biology and Evolution. 29 (1): 81–89. doi:10.1093/molbev/msr143. ISSN 1537-1719. PMC 3245540. PMID 21965341.
  6. Jin, H; Zhao, Q; Gonzalez de Valdivia, EI; Ardell, DH; Stenström, M; Isaksson, LA (April 2006). "Influences on gene expression in vivo by a Shine-Dalgarno sequence". Molecular Microbiology. 60 (2): 480–492. doi:10.1111/j.1365-2958.2006.05110.x. PMID 16573696.
  7. Nakagawa, So; Niimura, Yoshihito; Gojobori, Takashi; Tanaka, Hiroshi; Miura, Kin-ichiro (2008-02-01). "Diversity of preferred nucleotide sequences around the translation initiation codon in eukaryote genomes". Nucleic Acids Research. 36 (3): 861–871. doi:10.1093/nar/gkm1102. ISSN 0305-1048. PMC 2241899. PMID 18086709.
  8. Garziera, M.; Catamo, E.; Crovella, S.; Montico, M.; Cecchin, E.; Lonardi, S.; Mini, E.; Nobili, S.; Romanato, L.; Toffoli, G. (2015). "Association of the HLA-G 3′UTR polymorphisms with colorectal cancer in Italy: a first insight". International Journal of Immunogenetics. 43 (1): 32–39. doi:10.1111/iji.12243. PMID 26752414.
  9. Zhu, Qin; Chen, Ying; Dai, Jianrong; Wang, Benjing; Liu, Minjuan; Wang, Yun; Tao, Jianying; Li, Hong (2015-01-01). "Methylenetetrahydrofolate reductase polymorphisms at 3'-untranslated region are associated with susceptibility to preterm birth". Translational Pediatrics. 4 (1): 57–62. doi:10.3978/j.issn.2224-4336.2015.01.02. ISSN 2224-4344. PMC 4729064. PMID 26835361.
  10. Nicolas, G.; Wallon, D.; Goupil, C.; Richard, A.-C.; Pottier, C.; Dorval, V.; Sarov-Riviere, M.; Riant, F.; Herve, D.; Amouyel, P.; Guerchet, M.; Ndamba-Bandzouzi, B.; Mbelesso, P.; Dartigues, J.-F.; Lambert, J.-C.; Preux, P.-M.; Frebourg, T.; Campion, D.; Hannequin, D.; Tournier-Lasserve, E.; Hebert, S. S.; Rovelet-Lecrux, A. (2016). "Mutation in the 3'untranslated region of APP as a genetic determinant of cerebral amyloid angiopathy". European Journal of Human Genetics. 24 (1): 92–98. doi:10.1038/ejhg.2015.61. PMC 4795229. PMID 25828868.
  11. Chatterjee, Sangeeta; Pal, Jayanta K. (2009-05-01). "Role of 5′- and 3′-untranslated regions of mRNAs in human diseases". Biology of the Cell. 101 (5): 251–262. doi:10.1042/BC20080104. ISSN 1768-322X. PMID 19275763. S2CID 22689654.