Start codon

Start codon (blue circle) of the human mitochondrial DNA MT-ATP6 gene. For each nucleotide triplet (square brackets), the corresponding amino acid is given (one-letter code), either in the +1 reading frame for MT-ATP8 (in red) or in the +3 frame for MT-ATP6 (in blue). In this genomic region, the two genes overlap.

The start codon is the first codon of a messenger RNA (mRNA) transcript translated by a ribosome. The start codon always codes for methionine in eukaryotes and archaea and for N-formylmethionine (fMet) in bacteria, mitochondria and plastids.

The start codon is often preceded by a 5' untranslated region (5' UTR). In prokaryotes this includes the ribosome binding site.

Decoding

In all three domains of life, the start codon is decoded by a special "initiation" transfer RNA different from the tRNAs used for elongation. There are important structural differences between an initiating tRNA and an elongating one, with distinguishing features serving to satisfy the constraints of the translation system. In bacteria and organelles, a C1:A72 mismatch in the acceptor stem guides formylation, which is essential for directing recruitment by the 30S ribosomal subunit into the P site; so-called "3GC" base pairs allow assembly into the 70S ribosome. [1] In eukaryotes and archaea, the T stem prevents the elongation factors from binding, while eIF2 specifically recognizes the attached methionine and an A1:U72 base pair. [2]

In any case, the natural initiating tRNA only codes for methionine. [3] Knowledge of the key recognizing features has allowed researchers to construct alternative initiating tRNAs that code for different amino acids; see below.

Alternative start codons

Alternative start codons are different from the standard AUG codon and are found in both prokaryotes (bacteria and archaea) and eukaryotes. Alternate start codons are still translated as Met when they are at the start of a protein (even if the codon encodes a different amino acid otherwise). This is because a separate tRNA is used for initiation. [3]
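The rule that an alternate start codon is read as Met at the start of a protein, but as its usual amino acid elsewhere, can be sketched in a few lines of Python. This is a minimal illustration rather than a model of any real translation system: the function name `translate` is hypothetical, and the `START_CODONS` set is an assumption taken from the bacterial start codons named in this article.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code amino acids listed in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption for illustration: the bacterial start codons named in the article.
START_CODONS = {"AUG", "GUG", "UUG"}


def translate(orf: str, start_codons=START_CODONS) -> str:
    """Translate an RNA open reading frame whose first codon is the start codon."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    if not codons or codons[0] not in start_codons:
        raise ValueError("ORF does not begin with a recognised start codon")
    # The initiator tRNA delivers Met (fMet in bacteria) whatever the start codon is.
    protein = ["M"]
    for codon in codons[1:]:
        aa = CODON_TABLE[codon]
        if aa == "*":  # stop codon: end of the protein
            break
        protein.append(aa)
    return "".join(protein)


print(translate("GUGGUGUUGUAA"))  # first GUG read as Met, second GUG as Val
```

In this sketch the same codon (GUG) yields M in the first position and V in the second, which mirrors the use of a separate tRNA for initiation.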

Eukaryotes

Alternate start codons (non-AUG) are very rare in eukaryotic genomes: a wide range of mechanisms work to guarantee the relative fidelity of AUG initiation. [4] However, naturally occurring non-AUG start codons have been reported for some cellular mRNAs. [5] Seven out of the nine possible single-nucleotide substitutions at the AUG start codon of dihydrofolate reductase are functional as translation start sites in mammalian cells. [6]
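The "nine possible single-nucleotide substitutions" follow from simple counting: three positions, each with three alternative bases. A short Python sketch enumerates them; the helper name `single_nucleotide_variants` is hypothetical, and the code makes no claim about which seven of the nine variants were found to be functional.

```python
def single_nucleotide_variants(codon: str, bases: str = "ACGU") -> list[str]:
    """Return every codon that differs from `codon` at exactly one position."""
    variants = []
    for pos, original in enumerate(codon):
        for base in bases:
            if base != original:
                variants.append(codon[:pos] + base + codon[pos + 1:])
    return variants


variants = single_nucleotide_variants("AUG")
print(len(variants), variants)
```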

Bacteria

Bacteria do not generally have the wide range of translation factors monitoring start codon fidelity. GUG and UUG are the main, even "canonical", alternate start codons. [4] GUG in particular is important to controlling the replication of plasmids. [4]

E. coli uses AUG for 83% of its start codons (3542/4284), GUG for 14% (612), and UUG for about 2% (103), [7] along with one or two others (e.g., an AUU and possibly a CUG). [8] [9]
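The shares can be re-derived from the raw counts quoted above. The sketch below assumes the denominator of 4284 genes given in the text; the function name `start_codon_shares` is hypothetical. By these counts UUG comes to roughly 2.4%.

```python
def start_codon_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Return each start codon's share of `total` as a percentage (one decimal)."""
    return {codon: round(100 * n / total, 1) for codon, n in counts.items()}


# Counts as quoted in the text for E. coli K-12.
counts = {"AUG": 3542, "GUG": 612, "UUG": 103}
shares = start_codon_shares(counts, total=4284)
for codon, pct in shares.items():
    print(f"{codon}: {pct}%")
```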

Well-known coding regions that do not have AUG initiation codons are those of lacI (GUG) [10] [11] and lacA (UUG) [12] in the E. coli lac operon. Two more recent studies have independently shown that 17 or more non-AUG start codons may initiate translation in E. coli. [13] [14]

Mitochondria

Mitochondrial genomes make more extensive use of alternate start codons (in humans, AUA is used alongside AUG). [15] Many such examples, with codons, systematic range, and citations, are given in the NCBI list of translation tables. [16]

Archaea

Archaea, which are prokaryotes with a translation machinery similar to but simpler than that of eukaryotes, allow initiation at UUG and GUG. [4]

Upstream start codons

These are "alternative" start codons in the sense that they lie upstream of the regular start codons and thus could be used in their place. More than half of all human mRNAs have at least one AUG codon upstream (uAUG) of their annotated translation initiation sites (TIS) (58% in the current versions of the human RefSeq sequence). Their potential use as TISs could result in translation of so-called upstream open reading frames (uORFs). uORF translation usually results in the synthesis of short polypeptides, some of which have been shown to be functional, e.g., in ASNSD1, MIEF1, MKKS, and SLC35A4. [17] However, it is believed that most translated uORFs have only a mild inhibitory effect on downstream translation, either because most uORF starts are leaky (i.e., they often fail to initiate translation) or because ribosomes terminating after translation of short ORFs are often capable of reinitiating. [17]
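Locating uAUGs and the uORFs they could open is a simple string scan. The following Python sketch is a toy illustration under stated assumptions: the function name `find_uorfs`, the 0-based indexing, and the example sequence are all hypothetical, and the scan considers only AUG starts and the three standard stop codons.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna: str, main_start: int) -> list[tuple[int, Optional[int]]]:
    """List (start, end) for each AUG located upstream of `main_start`.

    `start` is the 0-based index of the uAUG; `end` is the index just past
    the first in-frame stop codon, or None if the transcript has no
    in-frame stop after that uAUG.
    """
    uorfs = []
    pos = mrna.find("AUG")
    while pos != -1 and pos < main_start:
        end = None
        # Walk codon by codon in the frame set by this uAUG.
        for i in range(pos + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        uorfs.append((pos, end))
        pos = mrna.find("AUG", pos + 1)
    return uorfs


# Made-up transcript: a short uORF (AUG CCC UAA) ahead of the main AUG at index 17.
mrna = "GGAUGCCCUAAGGCACCAUGGCUUAA"
print(find_uorfs(mrna, main_start=17))
```

For the made-up transcript, the scan reports a single uORF spanning indices 2 to 11, i.e. `AUGCCCUAA`.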

Standard genetic code

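Standard genetic code (NCBI table 1) [18]

1st   2nd base: U               2nd base: C               2nd base: A               2nd base: G               3rd
U     UUU Phe/F Phenylalanine   UCU Ser/S Serine          UAU Tyr/Y Tyrosine        UGU Cys/C Cysteine        U
U     UUC Phe/F Phenylalanine   UCC Ser/S Serine          UAC Tyr/Y Tyrosine        UGC Cys/C Cysteine        C
U     UUA Leu/L Leucine         UCA Ser/S Serine          UAA Stop (Ochre) [B]      UGA Stop (Opal) [B]       A
U     UUG [A] Leu/L Leucine     UCG Ser/S Serine          UAG Stop (Amber) [B]      UGG Trp/W Tryptophan      G
C     CUU Leu/L Leucine         CCU Pro/P Proline         CAU His/H Histidine       CGU Arg/R Arginine        U
C     CUC Leu/L Leucine         CCC Pro/P Proline         CAC His/H Histidine       CGC Arg/R Arginine        C
C     CUA Leu/L Leucine         CCA Pro/P Proline         CAA Gln/Q Glutamine       CGA Arg/R Arginine        A
C     CUG Leu/L Leucine         CCG Pro/P Proline         CAG Gln/Q Glutamine       CGG Arg/R Arginine        G
A     AUU Ile/I Isoleucine      ACU Thr/T Threonine       AAU Asn/N Asparagine      AGU Ser/S Serine          U
A     AUC Ile/I Isoleucine      ACC Thr/T Threonine       AAC Asn/N Asparagine      AGC Ser/S Serine          C
A     AUA Ile/I Isoleucine      ACA Thr/T Threonine       AAA Lys/K Lysine          AGA Arg/R Arginine        A
A     AUG [A] Met/M Methionine  ACG Thr/T Threonine       AAG Lys/K Lysine          AGG Arg/R Arginine        G
G     GUU Val/V Valine          GCU Ala/A Alanine         GAU Asp/D Aspartic acid   GGU Gly/G Glycine         U
G     GUC Val/V Valine          GCC Ala/A Alanine         GAC Asp/D Aspartic acid   GGC Gly/G Glycine         C
G     GUA Val/V Valine          GCA Ala/A Alanine         GAA Glu/E Glutamic acid   GGA Gly/G Glycine         A
G     GUG [A] Val/V Valine      GCG Ala/A Alanine         GAG Glu/E Glutamic acid   GGG Gly/G Glycine         G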
[A] Possible start codons in NCBI table 1. AUG is most common. [19] The two other start codons listed by table 1 (GUG and UUG) are rare in eukaryotes. [4] Prokaryotes have less stringent start codon requirements; they are described by NCBI table 11.
[B] The historical basis for designating the stop codons as amber, ochre and opal is described in an autobiography by Sydney Brenner [20] and in a historical article by Bob Edgar. [21]

Non-methionine start codons

Natural

In addition to the canonical initiation-tRNA and AUG codon pathway, mammalian cells can initiate translation with leucine using a specific leucyl-tRNA that decodes the codon CUG. This mechanism is independent of eIF2. [22] [23] It was also previously known that translation started by an internal ribosome entry site, which bypasses a number of regular eukaryotic initiation systems, can have a non-methionine start with GCU or CAA codons. [24] [25]

Engineered start codons

Engineered initiator tRNAs (tRNA-fMet with a CUA anticodon, changed from the metY tRNA-fMet with a CAU anticodon) have been used to initiate translation at the amber stop codon UAG in E. coli. Initiation with this tRNA inserts not only the traditional formylmethionine but also formylglutamine, as glutaminyl-tRNA synthetase also recognizes the new tRNA. [26] (Recall from above that the bacterial translation initiation system does not specifically check for methionine, only for the formyl modification.) [1] One study has shown that the amber initiator tRNA does not initiate translation to any measurable degree from genomically encoded UAG codons, only from plasmid-borne reporters with strong upstream Shine-Dalgarno sites. [27]

References

  1. Shetty, S; Shah, RA; Chembazhi, UV; Sah, S; Varshney, U (28 February 2017). "Two highly conserved features of bacterial initiator tRNAs license them to pass through distinct checkpoints in translation initiation". Nucleic Acids Research. 45 (4): 2040–2050. doi:10.1093/nar/gkw854. PMC 5389676. PMID 28204695.
  2. Kolitz, SE; Lorsch, JR (21 January 2010). "Eukaryotic initiator tRNA: finely tuned and ready for action". FEBS Letters. 584 (2): 396–404. doi:10.1016/j.febslet.2009.11.047. PMC 2795131. PMID 19925799.
  3. Lobanov, A. V.; Turanov, A. A.; Hatfield, D. L.; Gladyshev, V. N. (2010). "Dual functions of codons in the genetic code". Critical Reviews in Biochemistry and Molecular Biology. 45 (4): 257–65. doi:10.3109/10409231003786094. PMC 3311535. PMID 20446809.
  4. Asano, K (2014). "Why is start codon selection so precise in eukaryotes?". Translation (Austin, Tex.). 2 (1): e28387. doi:10.4161/trla.28387. PMC 4705826. PMID 26779403.
  5. Ivanov IP, Firth AE, Michel AM, Atkins JF, Baranov PV (2011). "Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences". Nucleic Acids Research. 39 (10): 4220–4234. doi:10.1093/nar/gkr007. PMC 3105428. PMID 21266472.
  6. Peabody, D. S. (1989). "Translation initiation at non-AUG triplets in mammalian cells". The Journal of Biological Chemistry. 264 (9): 5031–5. doi:10.1016/S0021-9258(18)83694-8. PMID 2538469.
  7. Blattner, F. R.; Plunkett, G.; Bloch, C. A.; Perna, N. T.; Burland, V.; Riley, M.; Collado-Vides, J.; Glasner, J. D.; Rode, C. K.; Mayhew, G. F.; Gregor, J.; Davis, N. W.; Kirkpatrick, H. A.; Goeden, M. A.; Rose, D. J.; Mau, B.; Shao, Y. (1997). "The Complete Genome Sequence of Escherichia coli K-12". Science. 277 (5331): 1453–1462. doi:10.1126/science.277.5331.1453. PMID 9278503.
  8. Sacerdot, C.; Fayat, G.; Dessen, P.; Springer, M.; Plumbridge, J. A.; Grunberg-Manago, M.; Blanquet, S. (1982). "Sequence of a 1.26-kb DNA fragment containing the structural gene for E.coli initiation factor IF3: Presence of an AUU initiator codon". The EMBO Journal. 1 (3): 311–315. doi:10.1002/j.1460-2075.1982.tb01166.x. PMC 553041. PMID 6325158.
  9. Missiakas, D.; Georgopoulos, C.; Raina, S. (1993). "The Escherichia coli heat shock gene htpY: Mutational analysis, cloning, sequencing, and transcriptional regulation". Journal of Bacteriology. 175 (9): 2613–2624. doi:10.1128/jb.175.9.2613-2624.1993. PMC 204563. PMID 8478327.
  10. E.coli lactose operon with lacI, lacZ, lacY and lacA genes. GenBank: J01636.1.
  11. Farabaugh, P. J. (1978). "Sequence of the lacI gene". Nature. 274 (5673): 765–769. Bibcode:1978Natur.274..765F. doi:10.1038/274765a0. PMID 355891. S2CID 4208767.
  12. NCBI Sequence Viewer v2.0.
  13. Hecht, Ariel; Glasgow, Jeff; Jaschke, Paul R.; Bawazer, Lukmaan A.; Munson, Matthew S.; Cochran, Jennifer R.; Endy, Drew; Salit, Marc (2017). "Measurements of translation initiation from all 64 codons in E. coli". Nucleic Acids Research. 45 (7): 3615–3626. doi:10.1093/nar/gkx070. PMC 5397182. PMID 28334756.
  14. Firnberg, Elad; Labonte, Jason; Gray, Jeffrey; Ostermeier, Marc A. (2014). "A comprehensive, high-resolution map of a gene's fitness landscape". Molecular Biology and Evolution. 31 (6): 1581–1592. doi:10.1093/molbev/msu081. PMC 4032126. PMID 24567513.
  15. Watanabe, Kimitsuna; Suzuki, Tsutomu (2001). "Genetic Code and its Variants". Encyclopedia of Life Sciences. doi:10.1038/npg.els.0000810. ISBN 978-0470015902.
  16. Elzanowski, Andrzej; Ostell, Jim. "The Genetic Codes". NCBI. Retrieved 29 March 2019.
  17. Andreev, Dmitry E.; Loughran, Gary; Fedorova, Alla D.; Mikhaylova, Maria S.; Shatsky, Ivan N.; Baranov, Pavel V. (2022-05-09). "Non-AUG translation initiation in mammals". Genome Biology. 23 (1): 111. doi:10.1186/s13059-022-02674-2. ISSN 1474-760X. PMC 9082881. PMID 35534899.
  18. Elzanowski A, Ostell J (7 January 2019). "The Genetic Codes". National Center for Biotechnology Information. Archived from the original on 5 October 2020. Retrieved 21 February 2019.
  19. Nakamoto T (March 2009). "Evolution and the universality of the mechanism of initiation of protein synthesis". Gene. 432 (1–2): 1–6. doi:10.1016/j.gene.2008.11.001. PMID 19056476.
  20. Brenner S (2001). A Life in Science. Biomed Central Limited. ISBN 0-9540278-0-9. See pages 101–104.
  21. Edgar B (2004). "The genome of bacteriophage T4: an archeological dig". Genetics. 168 (2): 575–82. PMC 1448817. PMID 15514035. See pages 580–581.
  22. Starck, S. R.; Jiang, V; Pavon-Eternod, M; Prasad, S; McCarthy, B; Pan, T; Shastri, N (2012). "Leucine-tRNA initiates at CUG start codons for protein synthesis and presentation by MHC class I". Science. 336 (6089): 1719–23. Bibcode:2012Sci...336.1719S. doi:10.1126/science.1220270. PMID 22745432. S2CID 206540614.
  23. Dever, T. E. (2012). "Molecular biology. A new start for protein synthesis". Science. 336 (6089): 1645–6. doi:10.1126/science.1224439. PMID 22745408. S2CID 44326947.
  24. "Where to Start? Alternate Protein Translation Mechanism Creates Unanticipated Antigens". PLOS Biology. 2 (11): e397. 26 October 2004. doi:10.1371/journal.pbio.0020397. PMC 524256.
  25. RajBhandary, Uttam L. (15 February 2000). "More surprises in translation: Initiation without the initiator tRNA". Proceedings of the National Academy of Sciences. 97 (4): 1325–1327. Bibcode:2000PNAS...97.1325R. doi:10.1073/pnas.040579197. PMC 34295. PMID 10677458.
  26. Varshney, U.; RajBhandary, U. L. (1990-02-01). "Initiation of protein synthesis from a termination codon". Proceedings of the National Academy of Sciences. 87 (4): 1586–1590. Bibcode:1990PNAS...87.1586V. doi:10.1073/pnas.87.4.1586. ISSN 0027-8424. PMC 53520. PMID 2406724.
  27. Vincent, Russel M.; Wright, Bradley W.; Jaschke, Paul R. (2019-03-15). "Measuring Amber Initiator tRNA Orthogonality in a Genomically Recoded Organism" (PDF). ACS Synthetic Biology. 8 (4): 675–685. doi:10.1021/acssynbio.9b00021. ISSN 2161-5063. PMID 30856316. S2CID 75136654.