In biochemistry, non-coded or non-proteinogenic amino acids are distinct from the 22 proteinogenic amino acids (21 in eukaryotes [note 1]), which are naturally encoded in the genome of organisms for the assembly of proteins. However, over 140 non-proteinogenic amino acids occur naturally in proteins and thousands more may occur in nature or be synthesized in the laboratory. [1] Chemically synthesized amino acids can be called unnatural amino acids. Unnatural amino acids can be synthetically prepared from their native analogs via modifications such as amine alkylation, side chain substitution, structural bond extension, cyclization, and isosteric replacements within the amino acid backbone. [2] Many non-proteinogenic amino acids are important:
Technically, any organic compound with an amine (–NH2) and a carboxylic acid (–COOH) functional group is an amino acid. The proteinogenic amino acids are a small subset of this group: they possess a central carbon atom (α- or 2-) bearing an amino group, a carboxyl group, a side chain and an α-hydrogen, in the levo (L) configuration. The exceptions are glycine, which is achiral, and proline, whose amine group is a secondary amine. For traditional reasons proline is therefore frequently referred to as an imino acid, although it does not contain an imine group.
The genetic code encodes 20 standard amino acids for incorporation into proteins during translation. However, there are two additional proteinogenic amino acids: selenocysteine and pyrrolysine. These non-standard amino acids do not have a dedicated codon, but are added in place of a stop codon when a specific sequence is present: a UGA codon with a SECIS element for selenocysteine, [5] and a UAG codon with a downstream PYLIS sequence for pyrrolysine. [6] All other amino acids are termed "non-proteinogenic".
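The context-dependent reading of stop codons described above can be illustrated with a minimal Python sketch. It is a toy model, not a description of the real mechanism: the codon table holds only the few codons used in the example, and the `signals` argument is a hypothetical stand-in for the presence of a SECIS or PYLIS element, whose actual recognition depends on RNA structure and dedicated factors that are not modelled here.

```python
# Toy codon table: only the codons used in the example, not the full genetic code.
CODON_TABLE = {
    "AUG": "Met", "GGU": "Gly", "GCU": "Ala",
    "UGA": "Stop", "UAG": "Stop", "UAA": "Stop",
}

# Stop codons that can be reinterpreted when the matching element is present.
# Maps codon -> (required element, residue inserted instead of stopping).
RECODING = {"UGA": ("SECIS", "Sec"), "UAG": ("PYLIS", "Pyl")}


def translate(mrna, signals=frozenset()):
    """Translate an mRNA string codon by codon.

    `signals` is a hypothetical set of recoding elements assumed to be
    present on the transcript (an illustrative simplification).
    """
    peptide = []
    usable_length = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = mrna[i:i + 3]
        residue = CODON_TABLE[codon]
        if residue == "Stop":
            signal, recoded = RECODING.get(codon, (None, None))
            if signal in signals:
                residue = recoded  # stop codon is read as Sec or Pyl
            else:
                break  # ordinary termination
        peptide.append(residue)
    return peptide
```

Called as `translate("AUGUGAGGUUAA")`, the sketch terminates at UGA, whereas `translate("AUGUGAGGUUAA", {"SECIS"})` reads through it as selenocysteine and stops only at UAA, which has no recoding entry.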
There are various groups of amino acids: [7]
These groups overlap, but are not identical. All 22 proteinogenic amino acids are biosynthesised by organisms and some, but not all, of them also are abiotic (found in prebiotic experiments and meteorites). Some natural amino acids, such as norleucine, are misincorporated translationally into proteins due to infidelity of the protein-synthesis process. Many amino acids, such as ornithine, are metabolic intermediates produced biosynthetically, but not incorporated translationally into proteins. Post-translational modification of amino acid residues in proteins leads to the formation of many proteinaceous, but non-proteinogenic, amino acids. Other amino acids are solely found in abiotic mixes (e.g. α-methylnorvaline). Over 30 unnatural amino acids have been inserted translationally into protein in engineered systems, yet are not biosynthetic. [7]
The IUPAC numbering system differentiates the carbons in an organic molecule by sequentially assigning a number to each carbon, including the carbon of a carboxylic group. The carbons along the side chain of amino acids can also be labelled with Greek letters. In this scheme the carbon of the carboxylic group is not counted, and the α-carbon is the central chiral carbon bearing a carboxyl group, a side chain and, in α-amino acids, an amino group. [8] (Consequently, the IUPAC names of many non-proteinogenic α-amino acids start with 2-amino- and end in -ic acid.)
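Because the carboxyl carbon is numbered 1 but carries no Greek letter, the two labelling schemes differ by a fixed offset: α corresponds to position 2, β to 3, and so on. The following Python sketch shows the conversion; the function names are illustrative only, and the mapping is limited to the first six Greek letters.

```python
# Greek letters used for chain positions, starting at the carbon next to the carboxyl group.
GREEK = "αβγδεζ"


def greek_to_locant(letter):
    """Return the IUPAC locant for a Greek-lettered carbon (α -> 2, β -> 3, ...)."""
    # The carboxyl carbon is locant 1 and has no Greek letter, hence the offset of 2.
    return GREEK.index(letter) + 2


def locant_to_greek(locant):
    """Return the Greek letter for an IUPAC locant (2 -> α, 3 -> β, ...)."""
    if not 2 <= locant <= len(GREEK) + 1:
        raise ValueError("locant has no Greek-letter equivalent in this sketch")
    return GREEK[locant - 2]
```

For example, `greek_to_locant("γ")` returns 4, which matches the systematic name of γ-aminobutyric acid, 4-aminobutanoic acid.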
Most natural amino acids are α-amino acids in the L configuration, but some exceptions exist.
Some non-α-amino acids exist in organisms. In these structures, the amine group is displaced further from the carboxylic acid end of the amino acid molecule. Thus a β-amino acid has the amine group bonded to the second carbon away, and a γ-amino acid has it on the third. Examples include β-alanine, GABA, and δ-aminolevulinic acid.
The reason why α-amino acids are used in proteins has been linked to their frequency in meteorites and prebiotic experiments. [10] An initial speculation on the deleterious properties of β-amino acids in terms of secondary structure [10] turned out to be incorrect. [11]
Some amino acids have the opposite absolute chirality and are not available from the normal ribosomal translation and transcription machinery. Most bacterial cell walls are formed by peptidoglycan, a polymer composed of amino sugars crosslinked by short oligopeptides that are bridged to each other. The oligopeptide is non-ribosomally synthesised and contains several peculiarities, including D-amino acids, generally D-alanine and D-glutamate. A further peculiarity is that the former is racemised by a PLP-binding enzyme (encoded by alr or the homologue dadX), whereas the latter is racemised by a cofactor-independent enzyme (murI). Some variants exist: in Thermotoga spp. D-lysine is present, and in certain vancomycin-resistant bacteria D-serine is present (vanT gene). [12] [13]
All proteinogenic amino acids have at least one hydrogen on the α-carbon. Glycine has two hydrogens, and all others have one hydrogen and one side-chain. Replacement of the remaining hydrogen with a larger substituent, such as a methyl group, distorts the protein backbone. [10]
In some fungi α-aminoisobutyric acid is produced as a precursor to peptides, some of which exhibit antibiotic properties. [14] This compound is similar to alanine, but possesses an additional methyl group on the α-carbon instead of a hydrogen. It is therefore achiral. Another compound similar to alanine without an α-hydrogen is dehydroalanine, which possesses a methylene sidechain. It is one of several naturally occurring dehydroamino acids.
A subset of L-α-amino acids are ambiguous as to which of two ends is the α-carbon. In proteins a cysteine residue can form a disulfide bond with another cysteine residue, thus crosslinking the protein. Two crosslinked cysteines form a cystine molecule. Cysteine and methionine are generally produced by direct sulfurylation, but in some species they can be produced by transsulfuration, where the activated homoserine or serine is fused to a cysteine or homocysteine, forming cystathionine. A similar compound is lanthionine, which can be seen as two alanine molecules joined via a thioether bond and is found in various organisms. Similarly, djenkolic acid, a plant toxin from jengkol beans, is composed of two cysteines connected by a methylene group. Diaminopimelic acid is used both as a bridge in peptidoglycan and as a precursor to lysine (via its decarboxylation).
In meteorites and in prebiotic experiments (e.g. the Miller–Urey experiment) many more amino acids than the twenty standard amino acids are found, several of which are at higher concentrations than the standard ones. It has been conjectured that if amino acid-based life were to arise elsewhere in the universe, no more than 75% of the amino acids would be in common. [10] The most notable anomaly is the lack of aminobutyric acid.
Molecule | Electric discharge | Murchison meteorite |
---|---|---|
glycine | 100 | 100 |
alanine | 180 | 36 |
α-amino-n-butyric acid | 61 | 19 |
norvaline | 14 | 14 |
valine | 4.4 | |
norleucine | 1.4 | |
leucine | 2.6 | |
isoleucine | 1.1 | |
alloisoleucine | 1.2 | |
t-leucine | < 0.005 | |
α-amino-n-heptanoic acid | 0.3 | |
proline | 0.3 | 22 |
pipecolic acid | 0.01 | 11 |
α,β-diaminopropionic acid | 1.5 | |
α,γ-diaminobutyric acid | 7.6 | |
ornithine | < 0.01 | |
lysine | < 0.01 | |
aspartic acid | 7.7 | 13 |
glutamic acid | 1.7 | 20 |
serine | 1.1 | |
threonine | 0.2 | |
allothreonine | 0.2 | |
methionine | 0.1 | |
homocysteine | 0.5 | |
homoserine | 0.5 | |
β-alanine | 4.3 | 10 |
β-amino-n-butyric acid | 0.1 | 5 |
β-aminoisobutyric acid | 0.5 | 7 |
γ-aminobutyric acid | 0.5 | 7 |
α-aminoisobutyric acid | 7 | 33 |
isovaline | 1 | 11 |
sarcosine | 12.5 | 7 |
N-ethylglycine | 6.8 | 6 |
N-propylglycine | 0.5 | |
N-isopropylglycine | 0.5 | |
N-methylalanine | 3.4 | 3 |
N-ethylalanine | < 0.05 | |
N-methyl-β-alanine | 1.0 | |
N-ethyl-β-alanine | < 0.05 | |
isoserine | 1.2 | |
α-hydroxy-γ-aminobutyric acid | 17 | |
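The observation that several non-standard amino acids are more abundant than standard ones can be checked directly against the electric discharge column. The Python sketch below does this for a subset of the rows; the values are copied from the table in whatever relative units the source uses, and the split into standard and non-standard entries as well as the function name are illustrative choices.

```python
# Electric discharge values copied from the table above (subset of rows only).
STANDARD = {
    "glycine": 100, "alanine": 180, "valine": 4.4, "leucine": 2.6,
    "isoleucine": 1.1, "aspartic acid": 7.7, "glutamic acid": 1.7, "serine": 1.1,
}
NON_STANDARD = {
    "α-amino-n-butyric acid": 61, "norvaline": 14, "sarcosine": 12.5,
    "α-aminoisobutyric acid": 7, "N-ethylglycine": 6.8, "norleucine": 1.4,
}


def exceeding(reference):
    """List non-standard amino acids with a higher value than a standard one.

    `reference` is the name of a standard amino acid in STANDARD; the result
    is sorted from most to least abundant.
    """
    threshold = STANDARD[reference]
    hits = [(name, value) for name, value in NON_STANDARD.items() if value > threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in hits]
```

With these rows, `exceeding("valine")` lists five non-standard amino acids, headed by α-amino-n-butyric acid, while `exceeding("alanine")` returns an empty list.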
The genetic code has been described as a frozen accident, and the reason why there is only one standard amino acid with a straight chain, alanine, could simply be redundancy with valine, leucine and isoleucine. [10] However, straight-chained amino acids are reported to form much more stable alpha helices. [15]
Serine, homoserine, O-methylhomoserine and O-ethylhomoserine possess a hydroxymethyl, hydroxyethyl, O-methylhydroxyethyl and O-ethylhydroxyethyl side chain, respectively, whereas cysteine, homocysteine, methionine and ethionine possess the thiol equivalents. The selenol equivalents are selenocysteine, selenohomocysteine, selenomethionine and selenoethionine. Amino acids with the next chalcogen down are also found in nature: in the absence of sulfur, several species such as Aspergillus fumigatus, Aspergillus terreus, and Penicillium chrysogenum are able to produce tellurocysteine and telluromethionine and incorporate them into protein. [16]
In cells, especially autotrophs, several non-proteinogenic amino acids are found as metabolic intermediates. However, despite the catalytic flexibility of PLP-binding enzymes, many amino acids are synthesised as keto acids (such as 4-methyl-2-oxopentanoate to leucine) and aminated in the last step, thus keeping the number of non-proteinogenic amino acid intermediates fairly low.
Ornithine and citrulline occur in the urea cycle, part of amino acid catabolism (see below). [17]
In addition to primary metabolism, several non-proteinogenic amino acids are precursors or final products of secondary metabolism, where they are used to make small compounds or non-ribosomal peptides (such as some toxins).
Despite not being encoded by the genetic code as proteinogenic amino acids, some non-standard amino acids are nevertheless found in proteins. These are formed by post-translational modification of the side chains of standard amino acids present in the target protein. These modifications are often essential for the function or regulation of a protein; for example, in γ-carboxyglutamate the carboxylation of glutamate allows for better binding of calcium cations, [18] and in hydroxyproline the hydroxylation of proline is critical for maintaining connective tissues. [19] Another example is the formation of hypusine in the translation initiation factor EIF5A, through modification of a lysine residue. [20] Such modifications can also determine the localization of the protein, for example, the addition of long hydrophobic groups can cause a protein to bind to a phospholipid membrane. [21]
There is some preliminary evidence that aminomalonic acid may be present, possibly by misincorporation, in protein. [22] [23]
Several non-proteinogenic amino acids, such as thialysine, are toxic due to their ability to mimic certain properties of proteinogenic amino acids. Some non-proteinogenic amino acids are neurotoxic by mimicking amino acids used as neurotransmitters (that is, not for protein biosynthesis), including quisqualic acid, canavanine and azetidine-2-carboxylic acid. [24] Cephalosporin C has an α-aminoadipic acid (homoglutamate) backbone that is amidated with a cephalosporin moiety. [25] Penicillamine is a therapeutic amino acid, whose mode of action is unknown.
Naturally-occurring cyanotoxins can also include non-proteinogenic amino acids. Microcystin and nodularin, for example, are both derived from ADDA, a β-amino acid.
Taurine is an amino sulfonic acid, not an amino carboxylic acid. It is nevertheless occasionally treated as an amino acid, because the amounts required to suppress the auxotrophy in certain organisms (such as cats) are closer to those of "essential amino acids" (amino acid auxotrophy) than to those of vitamins (cofactor auxotrophy).
The osmolytes sarcosine and glycine betaine are derived from amino acids, but have a secondary and a quaternary amine, respectively.
Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although over 500 amino acids exist in nature, by far the most important are the 22 α-amino acids incorporated into proteins. Only these 22 appear in the genetic code of life.
Selenocysteine is the 21st proteinogenic amino acid. Selenoproteins contain selenocysteine residues. Selenocysteine is an analogue of the more common cysteine with selenium in place of the sulfur.
Proline (symbol Pro or P) is an organic acid classed as a proteinogenic amino acid (used in the biosynthesis of proteins), although it does not contain the amino group -NH2 but is rather a secondary amine. The secondary amine nitrogen is in the protonated form (NH2+) under biological conditions, while the carboxyl group is in the deprotonated −COO− form. The "side chain" from the α carbon connects to the nitrogen forming a pyrrolidine loop, classifying it as an aliphatic amino acid. It is non-essential in humans, meaning the body can synthesize it from the non-essential amino acid L-glutamate. It is encoded by all the codons starting with CC (CCU, CCC, CCA, and CCG).
Cysteine is a semiessential proteinogenic amino acid with the formula HOOC−CH(−NH2)−CH2−SH. The thiol side chain in cysteine enables the formation of disulfide bonds, and often participates in enzymatic reactions as a nucleophile. Cysteine is chiral, but both D and L-cysteine are found in nature. L‑Cysteine is a protein monomer in all biota, and D-cysteine acts as a signaling molecule in mammalian nervous systems. Cysteine is named after its discovery in urine, which comes from the urinary bladder or cyst, from Greek κύστη kýsti, "bladder".
Methionine is an essential amino acid in humans.
Pyrrolysine is an α-amino acid that is used in the biosynthesis of proteins in some methanogenic archaea and bacteria; it is not present in humans. It contains an α-amino group and a carboxylic acid group. Its pyrroline side-chain is similar to that of lysine in being basic and positively charged at neutral pH.
Alanine, or α-alanine, is an α-amino acid that is used in the biosynthesis of proteins. It contains an amine group and a carboxylic acid group, both attached to the central carbon atom which also carries a methyl group side chain. Consequently it is classified as a nonpolar, aliphatic α-amino acid. Under biological conditions, it exists in its zwitterionic form with its amine group protonated and its carboxyl group deprotonated. It is non-essential to humans as it can be synthesized metabolically and does not need to be present in the diet. It is encoded by all codons starting with GC.
In molecular biology, post-translational modification (PTM) is the covalent process of changing proteins following protein biosynthesis. PTMs may involve enzymes or occur spontaneously. Proteins are created by ribosomes, which translate mRNA into polypeptide chains, which may then change to form the mature protein product. PTMs are important components in cell signalling, as for example when prohormones are converted to hormones.
Decarboxylation is a chemical reaction that removes a carboxyl group and releases carbon dioxide (CO2). Usually, decarboxylation refers to a reaction of carboxylic acids, removing a carbon atom from a carbon chain. The reverse process, which is the first chemical step in photosynthesis, is called carboxylation, the addition of CO2 to a compound. Enzymes that catalyze decarboxylations are called decarboxylases or, the more formal term, carboxy-lyases (EC number 4.1.1).
Proteinogenic amino acids are amino acids that are incorporated biosynthetically into proteins during translation. The word "proteinogenic" means "protein creating". Throughout known life, there are 22 genetically encoded (proteinogenic) amino acids, 20 in the standard genetic code and an additional 2 that can be incorporated by special translation mechanisms.
Dehydroalanine is a dehydroamino acid. It does not exist in its free form, but it occurs naturally as a residue found in peptides of microbial origin. As an amino acid residue, it is unusual because it has an unsaturated backbone.
Pyridoxal phosphate (PLP, pyridoxal 5'-phosphate, P5P), the active form of vitamin B6, is a coenzyme in a variety of enzymatic reactions. The International Union of Biochemistry and Molecular Biology has catalogued more than 140 PLP-dependent activities, corresponding to ~4% of all classified activities. The versatility of PLP arises from its ability to covalently bind the substrate, and then to act as an electrophilic catalyst, thereby stabilizing different types of carbanionic reaction intermediates.
DD-Transpeptidase is a bacterial enzyme that catalyzes the transfer of the R-L-αα-D-alanyl moiety of R-L-αα-D-alanyl-D-alanine carbonyl donors to the γ-OH of their active-site serine and from this to a final acceptor. It is involved in bacterial cell wall biosynthesis, namely, the transpeptidation that crosslinks the peptide side chains of peptidoglycan strands.
Biosynthesis, i.e., chemical synthesis occurring in biological contexts, is a term most often referring to multi-step, enzyme-catalyzed processes where chemical substances absorbed as nutrients serve as enzyme substrates, with conversion by the living organism either into simpler or more complex products. Examples of biosynthetic pathways include those for the production of amino acids, lipid membrane components, and nucleotides, but also for the production of all classes of biological macromolecules, and of acetyl-coenzyme A, adenosine triphosphate, nicotinamide adenine dinucleotide and other key intermediate and transactional molecules needed for metabolism. Thus, in biosynthesis, any of an array of compounds, from simple to complex, are converted into other compounds, and so it includes both the catabolism and anabolism of complex molecules. Biosynthetic processes are often represented via charts of metabolic pathways. A particular biosynthetic pathway may be located within a single cellular organelle, while others involve enzymes that are located across an array of cellular organelles and structures.
A catalytic triad is a set of three coordinated amino acid residues that can be found in the active site of some enzymes. Catalytic triads are most commonly found in hydrolase and transferase enzymes. An acid-base-nucleophile triad is a common motif for generating a nucleophilic residue for covalent catalysis. The residues form a charge-relay network to polarise and activate the nucleophile, which attacks the substrate, forming a covalent intermediate which is then hydrolysed to release the product and regenerate free enzyme. The nucleophile is most commonly a serine or cysteine, but occasionally threonine or even selenocysteine. The 3D structure of the enzyme brings together the triad residues in a precise orientation, even though they may be far apart in the sequence.
Amino acid biosynthesis is the set of biochemical processes by which the amino acids are produced. The substrates for these processes are various compounds in the organism's diet or growth media. Not all organisms are able to synthesize all amino acids. For example, humans can synthesize 11 of the 20 standard amino acids. These 11 are called the non-essential amino acids.
Cystathionine beta-lyase, also commonly referred to as CBL or β-cystathionase, is an enzyme that primarily catalyzes the following α,β-elimination reaction
Azetidine-2-carboxylic acid (abbreviated Aze or Azc) is a plant non-protein amino acid homologue of proline with the molecular formula C4H7NO2. Aze is a heterocyclic, 4 membered ring with nitrogen as its heteroatom (an azetidine), and a carboxylic acid group substituted on one of the ring carbon atoms. The main difference between Aze and proline is the ring of Aze has four members and the ring of proline has five. Aze has the ability to act as an analog of proline and can be incorporated into proteins in place of proline.
Lysine carboxypeptidase is an enzyme. This enzyme catalyses the following chemical reaction:
In biochemistry, a dehydroamino acid or α,β-dehydroamino acid is an amino acid, usually with a C=C double bond in its side chain. Dehydroamino acids are not coded by DNA, but arise via post-translational modification.