Untranslated region

The flow of genetic information within a cell. DNA is initially transcribed into a messenger RNA (mRNA) molecule. The mRNA is then translated into a protein. (See Central dogma of molecular biology.)
mRNA structure, approximately to scale for a human mRNA

In molecular genetics, an untranslated region (UTR) is either of two sections that flank the coding sequence on a strand of mRNA. The section on the 5' side is called the 5' UTR (or leader sequence), and the section on the 3' side is called the 3' UTR (or trailer sequence). mRNA is RNA that carries information from DNA to the ribosome, the site of protein synthesis (translation) within a cell. The mRNA is initially transcribed from the corresponding DNA sequence and then translated into protein. However, several regions of the mRNA are usually not translated into protein, including the 5' and 3' UTRs.
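
The layout described above (5' UTR, coding sequence, 3' UTR) can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not part of the article: translation is taken to start at the first AUG and to end at the first in-frame stop codon, the stop codon is counted as part of the coding sequence, and both the function name `split_mrna` and the toy transcript are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna):
    """Split a mature mRNA string into (5' UTR, coding sequence, 3' UTR).

    Simplifying assumption: translation starts at the first AUG and ends
    at the first in-frame stop codon. Real start-site selection is more
    involved than this.
    """
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Walk codon by codon from the start codon until a stop codon appears.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is counted as part of the CDS here
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")

# Hypothetical toy transcript, not a real gene.
toy = "GGCACC" + "AUGGCUUGGUAA" + "CUUAGCAAUAAA"
utr5, cds, utr3 = split_mrna(toy)
print(utr5, cds, utr3)
```

In this toy case the 3' UTR begins immediately after the stop codon, matching the description of the 3' UTR given later in the article.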


Although they are called untranslated regions and do not form the protein-coding region of the gene, upstream open reading frames (uORFs) located within the 5' UTR can be translated into peptides. [1]
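
As a rough illustration of what a uORF looks like at the sequence level, the following Python sketch scans a 5' UTR for AUG codons that are followed by an in-frame stop codon. It is a sketch under stated assumptions: only uORFs lying entirely within the 5' UTR are reported, only AUG is treated as a start codon, and the function name `find_uorfs` and the toy sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_uorfs(utr5):
    """Return (start, end) index pairs of uORFs contained in a 5' UTR.

    Assumption: a uORF is an AUG followed by an in-frame stop codon,
    with both codons inside the 5' UTR. `end` is exclusive and includes
    the stop codon.
    """
    uorfs = []
    for start in range(len(utr5) - 2):
        if utr5[start:start + 3] != "AUG":
            continue
        # Read codons downstream of the AUG until the first stop codon.
        for i in range(start + 3, len(utr5) - 2, 3):
            if utr5[i:i + 3] in STOP_CODONS:
                uorfs.append((start, i + 3))
                break
    return uorfs

# Hypothetical toy 5' UTR containing one short uORF (AUG CCC UAA).
toy_utr5 = "GGAUGCCCUAAGGCACC"
uorfs = find_uorfs(toy_utr5)
print([toy_utr5[s:e] for s, e in uorfs])
```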

The 5' UTR is upstream from the coding sequence. It contains a sequence recognized by the ribosome, which allows the ribosome to bind and initiate translation. The mechanism of translation initiation differs between prokaryotes and eukaryotes. The 3' UTR immediately follows the translation stop codon. The 3' UTR plays a critical role in translation termination as well as post-transcriptional modification. [2]

These often long sequences were once thought to be useless or junk mRNA that had simply accumulated over evolutionary time. However, it is now known that the untranslated regions of mRNA are involved in many regulatory aspects of gene expression in eukaryotic organisms. The importance of these non-coding regions is supported by evolutionary reasoning, as natural selection would otherwise have eliminated this unusable RNA.

It is important to distinguish the 5' and 3' UTRs from other non-protein-coding RNA. Within the coding sequence of pre-mRNA there can be sections of RNA that will not be included in the protein product. These sections are called introns, and the RNA that results from RNA splicing is a sequence of exons. Introns are not considered untranslated regions because they are removed during RNA splicing. They are absent from the mature mRNA molecule that undergoes translation and are thus considered non-protein-coding RNA.

History

The untranslated regions of mRNA became a subject of study as early as the late 1970s, after the first mRNA molecule was fully sequenced. In 1978, the 5' UTR of the human gamma-globin mRNA was fully sequenced. [3] In 1980, a study was conducted on the 3' UTR of the duplicated human alpha-globin genes. [4]

Evolution

Untranslated regions are found in both prokaryotes and eukaryotes, although their length and composition vary. In prokaryotes, the 5' UTR is typically between 3 and 10 nucleotides long. In eukaryotes, the 5' UTR can be hundreds to thousands of nucleotides long. This is consistent with the higher complexity of eukaryotic genomes compared to prokaryotic genomes. The 3' UTR varies in length as well. The poly-A tail that follows the 3' UTR is essential for keeping the mRNA from being degraded. Although the lengths of both the 5' and 3' UTR vary, the length of the 5' UTR has been found to be more highly conserved in evolution than that of the 3' UTR. [5]

Prokaryotes

The 5' UTR of prokaryotes contains the Shine–Dalgarno sequence (5'-AGGAGGU-3'). [6] This sequence is found 3–10 nucleotides upstream from the initiation codon. The initiation codon is the start site of translation into protein.
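
The spacing rule above can be sketched in Python as a search for the Shine–Dalgarno motif within a window upstream of the start codon. This is an illustration only, with several assumptions: the motif must match AGGAGGU exactly (real ribosome binding sites often match only partially), the 3–10 nucleotide distance is measured from the 3' end of the motif to the start codon, and the function name `find_shine_dalgarno` and the toy sequence are hypothetical.

```python
SD_MOTIF = "AGGAGGU"  # Shine-Dalgarno sequence as given in the text

def find_shine_dalgarno(mrna, start_codon_index, min_gap=3, max_gap=10):
    """Return the index of an exact SD motif upstream of the start codon.

    Assumption: the gap is the number of nucleotides between the 3' end
    of the motif and the first base of the start codon. Returns None if
    no exact match lies within min_gap..max_gap nucleotides.
    """
    for gap in range(min_gap, max_gap + 1):
        motif_start = start_codon_index - gap - len(SD_MOTIF)
        if motif_start < 0:
            break  # ran off the 5' end of the transcript
        if mrna[motif_start:motif_start + len(SD_MOTIF)] == SD_MOTIF:
            return motif_start
    return None

# Hypothetical toy transcript: SD motif, a 5-nucleotide spacer, then AUG.
toy = "UU" + "AGGAGGU" + "AACAU" + "AUGGCU"
sd_index = find_shine_dalgarno(toy, toy.index("AUG"))
print(sd_index)
```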

Eukaryotes

The 5' UTR of eukaryotes is more complex than that of prokaryotes. It contains a Kozak consensus sequence (ACCAUGG). [7] This sequence contains the initiation codon. The initiation codon is the start site of translation into protein.
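
To make the idea of a sequence context around the initiation codon concrete, the following Python sketch counts how many positions around a given AUG agree with the consensus string quoted above. It is a simplification: the real Kozak context is degenerate and varies between species, whereas this sketch compares against the single string ACCAUGG. The function name `kozak_match_count` and the toy sequences are hypothetical.

```python
KOZAK = "ACCAUGG"  # consensus as written in the text; AUG sits at offsets 3..5

def kozak_match_count(mrna, aug_index):
    """Count positions (out of 7) where the AUG context matches KOZAK.

    Compares the three bases before the AUG, the AUG itself, and the
    base after it against the consensus string. A true AUG therefore
    always scores at least 3.
    """
    begin = aug_index - 3
    end = aug_index + 4
    if begin < 0 or end > len(mrna):
        raise ValueError("not enough flanking sequence around the AUG")
    context = mrna[begin:end]
    return sum(a == b for a, b in zip(context, KOZAK))

# Hypothetical toy sequences: one with a full match, one with a poor context.
strong = "GGCACCAUGGCU"
weak = "UUUUUUAUGCUU"
strong_score = kozak_match_count(strong, strong.index("AUG"))
weak_score = kozak_match_count(weak, weak.index("AUG"))
print(strong_score, weak_score)
```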

The importance of these untranslated regions of mRNA is just beginning to be understood. Several medical studies have found connections between mutations in untranslated regions and an increased risk of developing particular diseases, such as cancer. For example, polymorphisms in the HLA-G 3′ UTR have been associated with the development of colorectal cancer. [8] Single-nucleotide polymorphisms in the 3' UTR of the methylenetetrahydrofolate reductase gene have also been associated with susceptibility to preterm birth. [9] Mutations in the 3' UTR of the APP gene are related to the development of cerebral amyloid angiopathy. [10]

Further study

Recent studies of untranslated regions have yielded general information about the nature and function of these elements. However, much about these regions of mRNA remains unknown. Since the regulation of gene expression is critical to the proper function of cells, this area needs further investigation. It is important to consider that mutations in 3' untranslated regions have the potential to alter the expression of several genes that may appear unrelated. [11] The links between proper untranslated region function and disease states of cells are only beginning to be understood.

References

  1. Vilela, Cristina; McCarthy, John E. G. (2003-08-01). "Regulation of fungal gene expression via short open reading frames in the mRNA 5'untranslated region". Molecular Microbiology. 49 (4): 859–867. doi:10.1046/j.1365-2958.2003.03622.x. ISSN 0950-382X. PMID 12890013.
  2. Barrett, Lucy W; Fletcher, Sue; Wilton, Steve D (2013). Untranslated Gene Regions and Other Non-Coding Elements. Springer. ISBN 978-3-0348-0679-4.
  3. Chang, J. C.; Poon, R.; Neumann, K. H.; Kan, Y. W. (1978-10-01). "The nucleotide sequence of the 5' untranslated region of human gamma-globin mRNA". Nucleic Acids Research. 5 (10): 3515–3522. doi:10.1093/nar/5.10.3515. ISSN 0305-1048. PMC 342692. PMID 318162.
  4. Michelson, A. M.; Orkin, S. H. (1980-11-01). "The 3' untranslated regions of the duplicated human alpha-globin genes are unexpectedly divergent". Cell. 22 (2 Pt 2): 371–377. doi:10.1016/0092-8674(80)90347-5. ISSN 0092-8674. PMID 7448866. S2CID 54238986.
  5. Lin, Zhenguo; Li, Wen-Hsiung (2012-01-01). "Evolution of 5' untranslated region length and gene expression reprogramming in yeasts". Molecular Biology and Evolution. 29 (1): 81–89. doi:10.1093/molbev/msr143. ISSN 1537-1719. PMC 3245540. PMID 21965341.
  6. Jin, H; Zhao, Q; Gonzalez de Valdivia, EI; Ardell, DH; Stenström, M; Isaksson, LA (April 2006). "Influences on gene expression in vivo by a Shine-Dalgarno sequence". Molecular Microbiology. 60 (2): 480–492. doi:10.1111/j.1365-2958.2006.05110.x. PMID 16573696. S2CID 5686240.
  7. Nakagawa, So; Niimura, Yoshihito; Gojobori, Takashi; Tanaka, Hiroshi; Miura, Kin-ichiro (2008-02-01). "Diversity of preferred nucleotide sequences around the translation initiation codon in eukaryote genomes". Nucleic Acids Research. 36 (3): 861–871. doi:10.1093/nar/gkm1102. ISSN 0305-1048. PMC 2241899. PMID 18086709.
  8. M. Garziera; E. Catamo; S. Crovella; M. Montico; E. Cecchin; S. Lonardi; E. Mini; S. Nobili; L. Romanato; G. Toffoli (2015). "Association of the HLA-G 3′UTR polymorphisms with colorectal cancer in Italy: a first insight". International Journal of Immunogenetics. 43 (1): 32–39. doi:10.1111/iji.12243. PMID 26752414. S2CID 205193023.
  9. Zhu, Qin; Chen, Ying; Dai, Jianrong; Wang, Benjing; Liu, Minjuan; Wang, Yun; Tao, Jianying; Li, Hong (2015-01-01). "Methylenetetrahydrofolate reductase polymorphisms at 3'-untranslated region are associated with susceptibility to preterm birth". Translational Pediatrics. 4 (1): 57–62. doi:10.3978/j.issn.2224-4336.2015.01.02. ISSN 2224-4344. PMC 4729064. PMID 26835361.
  10. G. Nicolas; D. Wallon; C. Goupil; A.-C. Richard; C. Pottier; V. Dorval; M. Sarov-Riviere; F. Riant; D. Herve; P. Amouyel; M. Guerchet; B. Ndamba-Bandzouzi; P. Mbelesso; J.-F. Dartigues; J.-C. Lambert; P.-M. Preux; T. Frebourg; D. Campion; D. Hannequin; E. Tournier-Lasserve; S. S. Hebert; A. Rovelet-Lecrux (2016). "Mutation in the 3'untranslated region of APP as a genetic determinant of cerebral amyloid angiopathy". European Journal of Human Genetics. 24 (1): 92–98. doi:10.1038/ejhg.2015.61. PMC 4795229. PMID 25828868.
  11. Chatterjee, Sangeeta; Pal, Jayanta K. (2009-05-01). "Role of 5′- and 3′-untranslated regions of mRNAs in human diseases". Biology of the Cell. 101 (5): 251–262. doi:10.1042/BC20080104. ISSN 1768-322X. PMID 19275763. S2CID 22689654.