Upstream open reading frame

An upstream open reading frame (uORF) is an open reading frame (ORF) within the 5' untranslated region (5'UTR) of an mRNA. uORFs can regulate eukaryotic gene expression. [1] [2] Translation of the uORF typically inhibits downstream expression of the primary ORF. However, in some genes such as yeast GCN4, translation of specific uORFs may increase translation of the main ORF. [3] In bacteria, uORFs are called leader peptides and were originally discovered on the basis of their impact on the regulation of genes involved in the synthesis or transport of amino acids.

Approximately 50% of human genes contain uORFs in their 5'UTR, and when present, these cause reductions in protein expression. [4] Human peptides derived from translated uORFs can be detected from cellular material with a mass spectrometer. [5]
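As a rough illustration of the definition above, the following Python sketch scans the 5'UTR of an mRNA for AUG-initiated uORFs. The function name `find_uorfs`, the `main_start` parameter, and the toy sequence are hypothetical and chosen only for this example. It assumes that only AUG start codons count and that the position of the main ORF's start codon is already known; real annotation pipelines may also consider near-cognate start codons.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna, main_start):
    """Return (start, end) pairs for AUG-initiated ORFs that begin in the 5'UTR.

    mrna       -- RNA (or DNA) sequence of the transcript, 5' to 3'
    main_start -- 0-based index of the main ORF's AUG (assumed to be known)

    `end` is exclusive and includes the stop codon. It is None when no
    in-frame stop codon is found before the end of the transcript.
    """
    mrna = mrna.upper().replace("T", "U")
    uorfs = []
    # A uORF start codon must lie entirely upstream of the main AUG.
    for start in range(max(0, main_start - 2)):
        if mrna[start:start + 3] != "AUG":
            continue
        end = None
        # Walk codon by codon in the frame set by this upstream AUG.
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        uorfs.append((start, end))
    return uorfs


# Toy transcript: uORF "AUG AAA UAA" at index 2, main ORF AUG at index 13.
print(find_uorfs("GGAUGAAAUAACCAUGGCCUAA", 13))  # [(2, 11)]
```

A uORF whose `end` lies beyond `main_start` overlaps the main ORF; the sketch reports such cases but does not treat them separately.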

Related Research Articles

Messenger RNA: RNA that is read by the ribosome to produce a protein

In molecular biology, messenger ribonucleic acid (mRNA) is a single-stranded molecule of RNA that corresponds to the genetic sequence of a gene, and is read by a ribosome in the process of synthesizing a protein.

Translation (biology): Cellular process of protein synthesis

In biology, translation is the process in living cells in which proteins are produced using RNA molecules as templates. The generated protein is a sequence of amino acids, determined by the sequence of nucleotides in the RNA. The nucleotides are read three at a time, and each such triplet (codon) results in the addition of one specific amino acid to the protein being generated. The mapping from nucleotide triplets to amino acids is called the genetic code. Translation is performed by ribosomes, large complexes of functional RNA and proteins. Together with transcription, translation is part of the overall process called gene expression.
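The triplet-by-triplet reading described above can be sketched in Python. This is a minimal illustration, not a model of the ribosome: it assumes the standard genetic code, starts at the first AUG it finds, and stops at the first in-frame stop codon. The names `CODON_TABLE` and `translate` are hypothetical, and sequences containing characters other than A, C, G, U (or T) are not handled.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    # Read complete codons only; a trailing partial codon is ignored.
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCAAAUAAGG"))  # MAK
```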

Three prime untranslated region: Sequence at the 3′ end of messenger RNA that does not code for product

In molecular genetics, the three prime untranslated region (3′-UTR) is the section of messenger RNA (mRNA) that immediately follows the translation termination codon. The 3′-UTR often contains regulatory regions that post-transcriptionally influence gene expression.

A signal peptide is a short peptide present at the N-terminus of most newly synthesized proteins that are destined for the secretory pathway. These proteins include those that reside inside certain organelles, are secreted from the cell, or are inserted into most cellular membranes. Although most type I membrane-bound proteins have signal peptides, most type II and multi-spanning membrane-bound proteins are targeted to the secretory pathway by their first transmembrane domain, which biochemically resembles a signal sequence except that it is not cleaved. Signal peptides are a kind of target peptide.

In molecular biology, open reading frames (ORFs) are defined as spans of DNA sequence between the start and stop codons. Usually, this is considered within a studied region of a prokaryotic DNA sequence, where only one of the six possible reading frames will be "open". Such an ORF may contain a start codon and by definition cannot extend beyond a stop codon. That start codon indicates where translation may start. The transcription termination site is located after the ORF, beyond the translation stop codon. If transcription were to cease before the stop codon, an incomplete protein would be made during translation.
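The idea of six possible reading frames (three per strand) can be illustrated with a short Python sketch. It assumes a simplified definition in which an ORF runs from the first ATG in a frame to the next in-frame stop codon; the function names `reverse_complement`, `orfs_in_frame`, and `six_frame_orfs` are hypothetical and used only here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame):
    """Yield ATG...stop ORFs found in one reading frame (0, 1, or 2)."""
    start = None
    for pos in range(frame, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if start is None and codon == "ATG":
            start = pos  # open the frame at the first start codon
        elif start is not None and codon in STOP_CODONS:
            yield seq[start:pos + 3]  # ORF includes the stop codon
            start = None


def six_frame_orfs(dna):
    """Return (strand, frame, orf_sequence) for all six reading frames."""
    dna = dna.upper()
    found = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for orf in orfs_in_frame(seq, frame):
                found.append((strand, frame, orf))
    return found


# Only one of the six frames is open in this toy sequence.
print(six_frame_orfs("ATGAAATGA"))  # [('+', 0, 'ATGAAATGA')]
```

The frame number for the "-" strand is counted from the 5' end of the reverse complement, which is one of several conventions in use.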

The 5′ untranslated region is the region of a messenger RNA (mRNA) that is directly upstream from the initiation codon. This region is important for the regulation of translation of a transcript by differing mechanisms in viruses, prokaryotes and eukaryotes. While called untranslated, the 5′ UTR or a portion of it is sometimes translated into a protein product. This product can then regulate the translation of the main coding sequence of the mRNA. In many organisms, however, the 5′ UTR is completely untranslated, instead forming a complex secondary structure to regulate translation.

An internal ribosome entry site, abbreviated IRES, is an RNA element that allows for translation initiation in a cap-independent manner, as part of the greater process of protein synthesis. Initiation of eukaryotic translation nearly always occurs at and is dependent on the 5' cap of mRNA molecules, where the translation initiation complex forms and ribosomes engage the mRNA. IRES elements, however, allow ribosomes to engage the mRNA and begin translation independently of the 5' cap.

Ribosome shunting is a mechanism of translation initiation in which ribosomes bypass, or "shunt over", parts of the 5' untranslated region to reach the start codon. A benefit of ribosome shunting is that it can translate backwards, allowing more information to be stored than usual in an mRNA molecule. Some viral RNAs have been shown to use ribosome shunting as a more efficient form of translation during certain stages of the viral life cycle or when translation initiation factors are scarce. Viruses known to use this mechanism include adenovirus, Sendai virus, human papillomavirus, duck hepatitis B pararetrovirus, rice tungro bacilliform virus, and cauliflower mosaic virus. In these viruses, the ribosome is directly translocated from the upstream initiation complex to the start codon (AUG) without the need to unwind RNA secondary structures.

Start codon: First codon of a messenger RNA translated by a ribosome

The start codon is the first codon of a messenger RNA (mRNA) transcript translated by a ribosome. The start codon always codes for methionine in eukaryotes and archaea and for N-formylmethionine (fMet) in bacteria, mitochondria and plastids.

Eukaryotic translation is the biological process by which messenger RNA is translated into proteins in eukaryotes. It consists of four phases: initiation, elongation, termination, and ribosome recycling.

In genetics, attenuation is a regulatory mechanism for some bacterial operons that results in premature termination of transcription. The canonical example of attenuation, used in many introductory genetics textbooks, is ribosome-mediated attenuation of the trp operon. Ribosome-mediated attenuation of the trp operon relies on the fact that, in bacteria, transcription and translation proceed simultaneously. Attenuation involves a provisional stop signal (attenuator), located in the DNA segment that corresponds to the leader sequence of mRNA. During attenuation, the ribosome becomes stalled (delayed) in the attenuator region in the mRNA leader. Depending on the metabolic conditions, the attenuator either stops transcription at that point or allows read-through to the structural gene part of the mRNA and synthesis of the appropriate protein.

The Kozak consensus sequence is a nucleic acid motif that functions as the protein translation initiation site in most eukaryotic mRNA transcripts. Regarded as the optimum sequence for initiating translation in eukaryotes, the sequence is an integral aspect of protein regulation and overall cellular health as well as having implications in human disease. It ensures that a protein is correctly translated from the genetic message, mediating ribosome assembly and translation initiation. A wrong start site can result in non-functional proteins. As it has become more studied, expansions of the nucleotide sequence, bases of importance, and notable exceptions have arisen. The sequence was named after the scientist who discovered it, Marilyn Kozak. Kozak discovered the sequence through a detailed analysis of DNA genomic sequences.
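The paragraph above does not spell out the motif itself. A commonly cited form is gccRccAUGG, where R is a purine (A or G) at position −3 and the G following the AUG is at position +4; these two positions are generally regarded as the most influential. The Python sketch below classifies the context of a given AUG on that basis. The function name `kozak_strength` and the three labels ("strong", "adequate", "weak") are assumptions for illustration, following a widely used but simplified convention, and not a definitive scoring scheme.

```python
def kozak_strength(mrna, aug_index):
    """Classify the Kozak context of the AUG at `aug_index` (0-based).

    Simplified rule (an assumption of this sketch):
      strong   -- purine at -3 and G at +4
      adequate -- exactly one of the two
      weak     -- neither
    """
    mrna = mrna.upper().replace("T", "U")
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    # Position -3 is three nucleotides upstream of the A of AUG.
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else ""
    # Position +4 is the nucleotide right after the G of AUG.
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else ""
    matches = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "adequate", "strong")[matches]


print(kozak_strength("GCCACCAUGG", 6))  # strong
print(kozak_strength("GCCUCCAUGU", 6))  # weak
```

Positions that fall outside the sequence are treated as non-matching, so an AUG at the very start of a transcript can be at most "adequate" under this rule.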

Gene structure is the organisation of specialised sequence elements within a gene. Genes contain most of the information necessary for living cells to survive and reproduce. In most organisms, genes are made of DNA, where the particular DNA sequence determines the function of the gene. A gene is transcribed (copied) from DNA into RNA, which can either be non-coding (ncRNA) with a direct function, or an intermediate messenger (mRNA) that is then translated into protein. Each of these steps is controlled by specific sequence elements, or regions, within the gene. Every gene, therefore, requires multiple sequence elements to be functional. This includes the sequence that actually encodes the functional protein or ncRNA, as well as multiple regulatory sequence regions. These regions may be as short as a few base pairs, up to many thousands of base pairs long.

Untranslated region: Non-coding regions on either end of mRNA

In molecular genetics, an untranslated region refers to either of two sections, one on each side of a coding sequence on a strand of mRNA. If it is found on the 5' side, it is called the 5' UTR, or if it is found on the 3' side, it is called the 3' UTR. mRNA is RNA that carries information from DNA to the ribosome, the site of protein synthesis (translation) within a cell. The mRNA is initially transcribed from the corresponding DNA sequence and then translated into protein. However, several regions of the mRNA are usually not translated into protein, including the 5' and 3' UTRs.

Translational regulation refers to the control of the levels of protein synthesized from its mRNA. This regulation is highly important to the cellular response to stressors, growth cues, and differentiation. In comparison to transcriptional regulation, it results in much more immediate cellular adjustment through direct regulation of protein concentration. The corresponding mechanisms primarily target the control of ribosome recruitment to the initiation codon, but can also involve modulation of peptide elongation, termination of protein synthesis, or ribosome biogenesis. While these general concepts are widely conserved, some of the finer details of this regulation have been shown to differ between prokaryotic and eukaryotic organisms.

Gcn2

GCN2 is a serine/threonine-protein kinase that senses amino acid deficiency through binding to uncharged transfer RNA (tRNA). It plays a key role in modulating amino acid metabolism as a response to nutrient deprivation.

Downstream-peptide motif

The Downstream-peptide motif refers to a conserved RNA structure identified by bioinformatics in the cyanobacterial genera Synechococcus and Prochlorococcus and one phage that infects such bacteria. It was also detected in marine samples of DNA from uncultivated bacteria, which are presumably other species of cyanobacteria.

The Consensus Coding Sequence (CCDS) Project is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies. The CCDS project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier, and ensures that they are consistently represented by the National Center for Biotechnology Information (NCBI), Ensembl, and UCSC Genome Browser. The integrity of the CCDS dataset is maintained through stringent quality assurance testing and on-going manual curation.

Micropeptide: Short length polypeptides

Micropeptides are polypeptides with a length of less than 100–150 amino acids that are encoded by short open reading frames (sORFs). In this respect, they differ from many other active small polypeptides, which are produced through the posttranslational cleavage of larger polypeptides. In terms of size, micropeptides are considerably shorter than "canonical" proteins, which have an average length of 330 and 449 amino acids in prokaryotes and eukaryotes, respectively. Micropeptides are sometimes named according to their genomic location. For example, the translated product of an upstream open reading frame (uORF) might be called a uORF-encoded peptide (uPEP). Micropeptides lack N-terminal signaling sequences, suggesting that they are likely to be localized to the cytoplasm. However, some micropeptides have been found in other cell compartments, as indicated by the existence of transmembrane micropeptides. They are found in both prokaryotes and eukaryotes. The sORFs from which micropeptides are translated can be encoded in 5' UTRs, small genes, or polycistronic mRNAs. Some micropeptide-coding genes were originally mis-annotated as long non-coding RNAs (lncRNAs).

Translation regulation by 5′ transcript leader cis-elements

Translation regulation by 5′ transcript leader cis-elements is a process in cellular translation.

References

  1. Vilela C, McCarthy JE (August 2003). "Regulation of fungal gene expression via short open reading frames in the mRNA 5'untranslated region". Molecular Microbiology. 49 (4): 859–67. doi:10.1046/j.1365-2958.2003.03622.x. PMID 12890013.
  2. Lovett PS, Rogers EJ (June 1996). "Ribosome regulation by the nascent peptide". Microbiological Reviews. 60 (2): 366–85. doi:10.1128/MMBR.60.2.366-385.1996. PMC 239448. PMID 8801438.
  3. Hinnebusch AG (August 1997). "Translational Regulation of Yeast GCN4: A window on factors that control initiator-tRNA binding to the ribosome". Journal of Biological Chemistry. 272 (35): 21661–21664. doi:10.1074/jbc.272.35.21661. ISSN 0021-9258. PMID 9268289.
  4. Calvo SE, Pagliarini DJ, Mootha VK (May 2009). "Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans" (PDF). Proceedings of the National Academy of Sciences of the United States of America. 106 (18): 7507–12. Bibcode:2009PNAS..106.7507C. doi:10.1073/pnas.0810916106. PMC 2669787. PMID 19372376. Archived from the original on Dec 3, 2023 via MIT Libraries.
  5. Slavoff SA, Mitchell AJ, Schwaid AG, Cabili MN, Ma J, Levin JZ, Karger AD, Budnik BA, Rinn JL, Saghatelian A (January 2013). "Peptidomic discovery of short open reading frame-encoded peptides in human cells". Nature Chemical Biology. 9 (1): 59–64. doi:10.1038/nchembio.1120. PMC 3625679. PMID 23160002.