Codon usage bias

Codon usage bias in Physcomitrella patens

Codon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides (a triplet) that either encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation (stop codons).

There are 64 different codons (61 codons encoding amino acids and 3 stop codons) but only 20 different translated amino acids. This surplus of codons allows many amino acids to be encoded by more than one codon. Because of this redundancy, the genetic code is said to be degenerate. The genetic codes of different organisms are often biased towards using one of the several codons that encode the same amino acid over the others; that is, one codon is found at a greater frequency than expected by chance. How such biases arise is a much-debated area of molecular evolution. Codon usage tables detailing genomic codon usage bias for organisms in GenBank and RefSeq can be found in the HIVE-Codon Usage Tables (HIVE-CUTs) project, [1] which contains two distinct databases, CoCoPUTs and TissueCoCoPUTs. Together, these two databases provide comprehensive, up-to-date codon, codon pair and dinucleotide usage statistics for all organisms with available sequence information and for 52 human tissues, respectively. [2] [3]
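As a rough illustration of what a codon usage table records, the sketch below counts the codons in a coding sequence and converts the counts into the fraction each codon contributes among its synonymous alternatives. It assumes the standard genetic code and an in-frame DNA sequence; the function names `codon_counts` and `synonymous_fractions` and the example sequence are made up for this illustration and are not taken from HIVE-CUTs or any other resource.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds):
    """Count in-frame codons; trailing bases and unknown triplets are ignored."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def synonymous_fractions(counts):
    """Fraction of each codon among all codons for the same amino acid."""
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


# Hypothetical toy sequence: Met, Leu x4 (three CTG, one CTA), stop.
example = "ATGCTGCTGCTGCTATAA"
fractions = synonymous_fractions(codon_counts(example))
print(fractions["CTG"], fractions["CTA"])
```

In the toy sequence, leucine is encoded three times by CTG and once by CTA, so the two codons receive unequal fractions; a real analysis would pool counts over many genes before comparing them with the frequencies expected by chance.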

It is generally acknowledged that codon biases reflect the contributions of three main factors: GC-biased gene conversion, which favors GC-ending codons in diploid organisms; arrival biases reflecting mutational preferences (typically favoring AT-ending codons); and natural selection for codons that are favorable with regard to translation. [4] [5] [6] Optimal codons in fast-growing microorganisms, like Escherichia coli or Saccharomyces cerevisiae (baker's yeast), reflect the composition of their respective genomic transfer RNA (tRNA) pool. [7] It is thought that optimal codons help to achieve faster translation rates and high accuracy. As a result of these factors, translational selection is expected to be stronger in highly expressed genes, as is indeed the case for the above-mentioned organisms. [8] [9] In other organisms that do not show high growth rates or that have small genomes, codon usage optimization is normally absent, and codon preferences are determined by the characteristic mutational biases seen in that particular genome. Examples of this are Homo sapiens (human) and Helicobacter pylori. [10] [11] Organisms that show an intermediate level of codon usage optimization include Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode worm), Strongylocentrotus purpuratus (sea urchin), and Arabidopsis thaliana (thale cress). [12] Several viral families (herpesvirus, lentivirus, papillomavirus, polyomavirus, adenovirus, and parvovirus) are known to encode structural proteins that display heavily skewed codon usage compared to the host cell. It has been suggested that these codon biases play a role in the temporal regulation of their late proteins. [13]

The nature of the codon usage-tRNA optimization has been fiercely debated. It is not clear whether codon usage drives tRNA evolution or vice versa. At least one mathematical model has been developed in which codon usage and tRNA expression co-evolve in feedback fashion (i.e., codons already present in high frequencies drive up the expression of their corresponding tRNAs, and tRNAs normally expressed at high levels drive up the frequency of their corresponding codons). However, this model does not yet appear to have experimental confirmation. Another problem is that the evolution of tRNA genes has been a very inactive area of research. [citation needed]

Contributing factors

Different factors have been proposed to be related to codon usage bias, including gene expression level (reflecting selection for optimizing the translation process by tRNA abundance), guanine-cytosine content (GC content, reflecting horizontal gene transfer or mutational bias), guanine-cytosine skew (GC skew, reflecting strand-specific mutational bias), amino acid conservation, protein hydropathy, transcriptional selection, RNA stability, optimal growth temperature, hypersaline adaptation, and dietary nitrogen. [14] [15] [16] [17] [18] [19]
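Two of the sequence-composition factors listed above have simple conventional definitions: GC content is the fraction of bases that are G or C, and GC skew is commonly computed as (G - C) / (G + C). The following minimal sketch implements both for a plain DNA string; the function names are illustrative, and whole-sequence values are assumed here although GC skew is often computed in sliding windows.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """(G - C) / (G + C); returns 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


print(gc_content("ATGC"))  # half of the bases are G or C
print(gc_skew("GGGC"))     # three G versus one C
```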

Evolutionary theories

Mutational bias versus selection

Although the mechanism of codon bias selection remains controversial, possible explanations for this bias fall into two general categories. The first is the selectionist theory, in which codon bias contributes to the efficiency and/or accuracy of protein expression and therefore undergoes positive selection. The selectionist model also explains why more frequent codons are recognized by more abundant tRNA molecules, as well as the correlation between preferred codons, tRNA levels, and gene copy numbers. Although amino acids have been shown to be incorporated much faster at more frequent codons than at rare codons, the speed of translation has not been shown to be directly affected, and therefore the bias towards more frequent codons may not be directly advantageous. However, the increase in translation elongation speed may still be indirectly advantageous by increasing the cellular concentration of free ribosomes and potentially the rate of initiation for messenger RNAs (mRNAs). [20]

The second explanation is mutational bias, a theory which posits that codon bias exists because of nonrandomness in the mutational patterns. In other words, some codons undergo more changes and therefore reach lower equilibrium frequencies; these are known as “rare” codons. Different organisms also exhibit different mutational biases, and there is growing evidence that the level of genome-wide GC content is the most significant parameter in explaining codon bias differences between organisms. Additional studies have demonstrated that codon biases can be statistically predicted in prokaryotes using only intergenic sequences, arguing against the idea of selective forces on coding regions and further supporting the mutation bias model. However, this model alone cannot fully explain why preferred codons are recognized by more abundant tRNAs. [20]

Mutation-selection-drift balance model

To reconcile the evidence from both mutational pressures and selection, the prevailing hypothesis for codon bias is the mutation-selection-drift balance model. This hypothesis states that selection favors major codons over minor codons, but minor codons are able to persist due to mutation pressure and genetic drift. It also suggests that selection is generally weak, but that selection intensity increases with higher expression and greater functional constraints on coding sequences. [20]

Consequences of codon composition

Effect on RNA secondary structure

Because secondary structure of the 5′ end of mRNA influences translational efficiency, synonymous changes at this region on the mRNA can result in profound effects on gene expression. Codon usage in noncoding DNA regions can therefore play a major role in RNA secondary structure and downstream protein expression, which can undergo further selective pressures. In particular, strong secondary structure at the ribosome-binding site or initiation codon can inhibit translation, and mRNA folding at the 5′ end generates a large amount of variation in protein levels. [21]

Effect on transcription or gene expression

Heterologous gene expression is used in many biotechnological applications, including protein production and metabolic engineering. Because tRNA pools vary between different organisms, the rate of transcription and translation of a particular coding sequence can be less efficient when placed in a non-native context. For an overexpressed transgene, the corresponding mRNA makes up a large percentage of total cellular RNA, and the presence of rare codons along the transcript can lead to inefficient use and depletion of ribosomes and ultimately reduce levels of heterologous protein production. In addition, the composition of the gene (e.g. the total number of rare codons and the presence of consecutive rare codons) may also affect translation accuracy. [22] [23] However, using codons that are optimized for tRNA pools in a particular host to overexpress a heterologous gene may also cause amino acid starvation and alter the equilibrium of tRNA pools. This method of adjusting codons to match host tRNA abundances, called codon optimization, has traditionally been used for expression of a heterologous gene. However, new strategies for optimization of heterologous expression consider global nucleotide content such as local mRNA folding, codon pair bias, a codon ramp, codon harmonization or codon correlations. [24] [25] Given the number of nucleotide changes introduced, artificial gene synthesis is often necessary for the creation of such an optimized gene.
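The traditional form of codon optimization described above can be sketched as a simple substitution: every codon is replaced by the synonymous codon used most often in the host. The example below is a deliberately naive version of that idea, assuming the standard genetic code; the usage counts are invented for illustration and do not describe any real organism, and the function names are hypothetical. It ignores mRNA folding, codon pair bias, codon ramps and the other considerations mentioned in the text.

```python
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def preferred_codons(host_usage):
    """Map each amino acid to its most frequent codon in a host usage table."""
    best = {}
    for codon, n in host_usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or n > host_usage[best[aa]]:
            best[aa] = codon
    return best


def optimize(cds, preferred):
    """Swap each codon for the host-preferred synonym; leave others unchanged."""
    cds = cds.upper()
    out = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)  # None for triplets with ambiguous bases
        out.append(preferred.get(aa, codon))
    return "".join(out)


# Invented counts for a hypothetical host (Leu and Lys codons only).
toy_usage = {"CTG": 50, "CTA": 5, "TTA": 10, "AAA": 30, "AAG": 10}
print(optimize("ATGTTACTAAAGTAA", preferred_codons(toy_usage)))
```

Because only synonymous codons are exchanged, the encoded amino acid sequence is unchanged; amino acids missing from the usage table keep their original codons.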

Specialized codon bias is further seen in some endogenous genes such as those involved in amino acid starvation. For example, amino acid biosynthetic enzymes preferentially use codons that are poorly adapted to normal tRNA abundances, but have codons that are adapted to tRNA pools under starvation conditions. Thus, codon usage can introduce an additional level of transcriptional regulation for appropriate gene expression under specific cellular conditions. [25]

Effect on speed of translation elongation

In highly expressed genes, translation elongation rates are generally faster along transcripts with higher codon adaptation to tRNA pools, and slower along transcripts with rare codons. This correlation between codon translation rates and cognate tRNA concentrations provides additional modulation of translation elongation rates, which can provide several advantages to the organism. Specifically, codon usage can allow for global regulation of these rates, and rare codons may contribute to the accuracy of translation at the expense of speed. [26]

Effect on protein folding

Protein folding in vivo is vectorial, such that the N-terminus of a protein exits the translating ribosome and becomes solvent-exposed before its more C-terminal regions. As a result, co-translational protein folding introduces several spatial and temporal constraints on the nascent polypeptide chain in its folding trajectory. Because mRNA translation rates are coupled to protein folding, and codon adaptation is linked to translation elongation, it has been hypothesized that manipulation at the sequence level may be an effective strategy to regulate or improve protein folding. Several studies have shown that pausing of translation as a result of local mRNA structure occurs for certain proteins, which may be necessary for proper folding. Furthermore, synonymous mutations have been shown to have significant consequences in the folding process of the nascent protein and can even change substrate specificity of enzymes. These studies suggest that codon usage influences the speed at which polypeptides emerge vectorially from the ribosome, which may further impact protein folding pathways throughout the available structural space. [26]

Methods of analysis

In the field of bioinformatics and computational biology, many statistical methods have been proposed and used to analyze codon usage bias. [27] Methods such as the 'frequency of optimal codons' (Fop), [28] the relative codon adaptation (RCA) [29] or the codon adaptation index (CAI) [30] are used to predict gene expression levels, while methods such as the 'effective number of codons' (Nc) and Shannon entropy from information theory are used to measure codon usage evenness. [31] Multivariate statistical methods, such as correspondence analysis and principal component analysis, are widely used to analyze variations in codon usage among genes. [32] Many computer programs implement the statistical analyses enumerated above, including CodonW, GCUA, and INCA. Codon optimization has applications in designing synthetic genes and DNA vaccines. Several software packages are available online for this purpose (refer to external links). [citation needed]
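To make one of these indices concrete, the sketch below computes a codon adaptation index in the usual way: each codon receives a relative adaptiveness weight (its count in a reference gene set divided by the count of the most frequent synonymous codon), and the CAI of a gene is the geometric mean of the weights of its codons. This is a simplified sketch rather than a reimplementation of CodonW or any other package: it assumes the standard genetic code, skips Met, Trp and stop codons, and substitutes a pseudocount of 0.5 for codons absent from the reference set. The reference counts are invented.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(ref_counts, pseudocount=0.5):
    """Weight per codon: reference count divided by the synonymous maximum."""
    by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":  # single-codon amino acids and stops carry no signal
            by_aa[aa].append(codon)
    weights = {}
    for codons in by_aa.values():
        # Assumption: codons unseen in the reference set get a small pseudocount.
        counts = {c: ref_counts.get(c, 0) or pseudocount for c in codons}
        top = max(counts.values())
        for c, n in counts.items():
            weights[c] = n / top
    return weights


def cai(cds, weights):
    """Geometric mean of codon weights over the scored codons of a sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    logs = [math.log(weights[cds[i:i + 3]])
            for i in range(0, usable, 3) if cds[i:i + 3] in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Invented reference counts: CTG is used four times as often as CTA.
w = relative_adaptiveness({"CTG": 8, "CTA": 2})
print(cai("CTGCTG", w), cai("CTGCTA", w))
```

A gene built only from the most frequent reference codons scores 1, and every less-favoured codon pulls the geometric mean down, which is why CAI is used as a proxy for expression level.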

References

  1. Athey, John; Alexaki, Aikaterini; Osipova, Ekaterina; Rostovtsev, Alexandre; Santana-Quintero, Luis V.; Katneni, Upendra; Simonyan, Vahan; Kimchi-Sarfaty, Chava (2017-09-02). "A new and updated resource for codon usage tables". BMC Bioinformatics. 18 (391): 391. doi: 10.1186/s12859-017-1793-7 . PMC   5581930 . PMID   28865429.
  2. Alexaki, Aikaterini; Kames, Jacob; Holcomb, David D.; Athey, John; Santana-Quintero, Luis V.; Lam, Phuc Vihn Nguyen; Hamasaki-Katagiri, Nobuko; Osipova, Ekaterina; Simonyan, Vahan; Bar, Haim; Komar, Anton A.; Kimchi-Sarfaty, Chava (June 2019). "Codon and Codon-Pair Usage Tables (CoCoPUTs): Facilitating Genetic Variation Analyses and Recombinant Gene Design". Journal of Molecular Biology. 431 (13): 2434–2441. doi: 10.1016/j.jmb.2019.04.021 . PMID   31029701. S2CID   139104807.
  3. Kames, Jacob; Alexaki, Aikaterini; Holcomb, David D.; Santana-Quintero, Luis V.; Athey, John C.; Hamasaki-Katagiri, Nobuko; Katneni, Upendra; Golikov, Anton; Ibla, Juan C.; Bar, Haim; Kimchi-Sarfaty, Chava (January 2020). "TissueCoCoPUTs: Novel Human Tissue-Specific Codon and Codon-Pair Usage Tables Based on Differential Tissue Gene Expression". Journal of Molecular Biology. 432 (11): 3369–3378. doi: 10.1016/j.jmb.2020.01.011 . PMID   31982380.
  4. P. Shah and M. A. Gilchrist (2011). "Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift". Proceedings of the National Academy of Sciences of the United States of America. 108 (25): 10231–6. doi: 10.1073/pnas.1016719108 . PMC   3121864 . PMID   21646514.
  5. L. Duret and N. Galtier (2009). "Biased gene conversion and the evolution of mammalian genomic landscapes". Annu Rev Genomics Hum Genet. 10: 285–311. doi:10.1146/annurev-genom-082908-150001.
  6. N. Galtier, C. Roux, M. Rousselle, J. Romiguier, E. Figuet, S. Glemin, N. Bierne and L. Duret (2018). "Codon Usage Bias in Animals: Disentangling the Effects of Natural Selection, Effective Population Size, and GC-Biased Gene Conversion". Mol Biol Evol. 35 (5): 1092–1103. doi: 10.1093/molbev/msy015 . hdl: 20.500.12210/34500 .
  7. Dong, Hengjiang; Nilsson, Lars; Kurland, Charles G. (1996). "Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates". Journal of Molecular Biology. 260 (5): 649–663. doi:10.1006/jmbi.1996.0428. ISSN   0022-2836. PMID   8709146.
  8. Sharp, Paul M.; Stenico, Michele; Peden, John F.; Lloyd, Andrew T. (1993). "Codon usage: mutational bias, translational selection, or both?". Biochem. Soc. Trans. 21 (4): 835–841. doi:10.1042/bst0210835. PMID   8132077. S2CID   8582630.
  9. Kanaya, Shigehiko; Yamada, Yuko; Kudo, Yoshihiro; Ikemura, Toshimichi (1999). "Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis". Gene. 238 (1): 143–155. doi:10.1016/s0378-1119(99)00225-5. ISSN   0378-1119. PMID   10570992.
  10. Atherton, John C.; Sharp, Paul M.; Lafay, Bénédicte (2000-04-01). "Absence of translationally selected synonymous codon usage bias in Helicobacter pylori". Microbiology. 146 (4): 851–860. doi: 10.1099/00221287-146-4-851 . ISSN   1350-0872. PMID   10784043.
  11. Bornelöv, Susanne; Selmi, Tommaso; Flad, Sophia; Dietmann, Sabine; Frye, Michaela (2019-06-07). "Codon usage optimization in pluripotent embryonic stem cells". Genome Biology. 20 (1): 119. doi: 10.1186/s13059-019-1726-z . ISSN   1474-760X. PMC   6555954 . PMID   31174582.
  12. Duret, Laurent (2000). "tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes". Trends in Genetics. 16 (7): 287–289. doi:10.1016/s0168-9525(00)02041-2. ISSN   0168-9525. PMID   10858656.
  13. Shin, Young C.; Bischof, Georg F.; Lauer, William A.; Desrosiers, Ronald C. (2015-09-10). "Importance of codon usage for the temporal regulation of viral gene expression". Proceedings of the National Academy of Sciences. 112 (45): 14030–14035. Bibcode:2015PNAS..11214030S. doi: 10.1073/pnas.1515387112 . PMC   4653223 . PMID   26504241.
  14. Ermolaeva MD (October 2001). "Synonymous codon usage in bacteria". Curr Issues Mol Biol. 3 (4): 91–7. PMID   11719972.
  15. Lynn DJ, Singer GA, Hickey DA (October 2002). "Synonymous codon usage is subject to selection in thermophilic bacteria". Nucleic Acids Res. 30 (19): 4272–7. doi:10.1093/nar/gkf546. PMC   140546 . PMID   12364606.
  16. Paul S, Bag SK, Das S, Harvill ET, Dutta C (2008). "Molecular signature of hypersaline adaptation: insights from genome and proteome composition of halophilic prokaryotes". Genome Biol. 9 (4): R70. doi: 10.1186/gb-2008-9-4-r70 . PMC   2643941 . PMID   18397532.
  17. Kober, K. M.; Pogson, G. H. (2013). "Genome-Wide Patterns of Codon Bias Are Shaped by Natural Selection in the Purple Sea Urchin, Strongylocentrotus purpuratus". G3. 3 (7): 1069–1083. doi:10.1534/g3.113.005769. PMC   3704236 . PMID   23637123.
  18. McInerney, James O. (1998-09-01). "Replicational and transcriptional selection on codon usage in Borrelia burgdorferi". Proceedings of the National Academy of Sciences. 95 (18): 10698–10703. Bibcode:1998PNAS...9510698M. doi: 10.1073/pnas.95.18.10698 . ISSN   0027-8424. PMC   27958 . PMID   9724767.
  19. Seward, Emily; Kelly, Steve (2016). "Dietary nitrogen alters codon bias and genome composition in parasitic microorganisms". Genome Biology. 17 (226): 3–15. doi: 10.1186/s13059-016-1087-9 . PMC   5109750 . PMID   27842572.
  20. Hershberg, R; Petrov, D. A. (2008). "Selection on codon bias". Annual Review of Genetics. 42: 287–99. doi:10.1146/annurev.genet.42.110807.091442. PMID   18983258. S2CID   7085012.
  21. Novoa, E. M.; Ribas De Pouplana, L (2012). "Speeding with control: Codon usage, tRNAs, and ribosomes". Trends in Genetics. 28 (11): 574–81. doi:10.1016/j.tig.2012.07.006. PMID   22921354.
  22. Shu, P.; Dai, H.; Gao, W.; Goldman, E. (2006). "Inhibition of translation by consecutive rare leucine codons in E. coli: absence of effect of varying mRNA stability". Gene Expr. 13 (2): 97–106. doi:10.3727/000000006783991881. PMC   6032470 . PMID   17017124.
  23. Correddu, D.; Montaño López, J. d. J.; Angermayr, S. A.; Middleditch, M. J.; Payne, L. S.; Leung, I. K. H. (2019). "Effect of Consecutive Rare Codons on the Recombinant Production of Human Proteins in Escherichia coli". IUBMB Life . 72 (2): 266–274. doi:10.1002/iub.2162. hdl: 11343/286411 . PMID   31509345. S2CID   202555575.
  24. Mignon, C.; Mariano, N.; Stadthagen, G.; Lugari, A.; Lagoutte, P.; Donnat, S.; Chenavas, S.; Perot, C.; Sodoyer, R.; Werle, B. (2018). "Codon harmonization - going beyond the speed limit for protein expression". FEBS Letters . 592 (9): 1554–1564. doi: 10.1002/1873-3468.13046 . PMID   29624661.
  25. Plotkin, J. B.; Kudla, G (2011). "Synonymous but not the same: The causes and consequences of codon bias". Nature Reviews Genetics. 12 (1): 32–42. doi:10.1038/nrg2899. PMC   3074964 . PMID   21102527.
  26. Spencer, P. S.; Barral, J. M. (2012). "Genetic Code Redundancy and Its Influence on the Encoded Polypeptides". Computational and Structural Biotechnology Journal. 1: 1–8. doi:10.5936/csbj.201204006. PMC   3962081 . PMID   24688635.
  27. Comeron JM, Aguadé M (September 1998). "An evaluation of measures of synonymous codon usage bias". J. Mol. Evol. 47 (3): 268–74. Bibcode:1998JMolE..47..268C. doi:10.1007/PL00006384. PMID   9732453. S2CID   21862217.
  28. Ikemura T (September 1981). "Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system". J. Mol. Biol. 151 (3): 389–409. doi:10.1016/0022-2836(81)90003-6. PMID   6175758.
  29. Fox JM, Erill I (June 2010). "Relative codon adaptation: a generic codon bias index for prediction of gene expression". DNA Res. 17 (3): 185–96. doi:10.1093/dnares/dsq012. PMC   2885275 . PMID   20453079.
  30. Sharp, Paul M.; Li, Wen-Hsiung (1987). "The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications". Nucleic Acids Research . 15 (3): 1281–1295. doi:10.1093/nar/15.3.1281. PMC   340524 . PMID   3547335.
  31. Peden J (2005-04-15). "Codon usage indices". Correspondence Analysis of Codon Usage. SourceForge. Retrieved 2010-10-20.
  32. Suzuki H, Brown CJ, Forney LJ, Top EM (December 2008). "Comparison of correspondence analysis methods for synonymous codon usage in bacteria". DNA Res. 15 (6): 357–65. doi:10.1093/dnares/dsn028. PMC   2608848 . PMID   18940873.