Amino acid replacement is a change from one amino acid to a different amino acid in a protein due to a point mutation in the corresponding DNA sequence. It is caused by a nonsynonymous missense mutation, which changes the codon so that it codes for a different amino acid instead of the original one.
Not all amino acid replacements have the same effect on the function or structure of a protein. The magnitude of the effect varies depending on how similar or dissimilar the exchanged amino acids are, as well as on their position in the sequence or the structure. Similarity between amino acids can be calculated from substitution matrices, physicochemical distance, or simple properties such as amino acid size or charge [1] (see also amino acid chemical properties). Replacements are thus usually classified into two types, conservative and radical. [2]
Physicochemical distance is a measure of the difference between the exchanged amino acids. Its value is based on properties of the amino acids. There are 134 physicochemical properties that can be used to estimate similarity between amino acids. [3] Each physicochemical distance is based on a different combination of properties.
Two-state characters | Properties |
---|---|
1-5 | Presence respectively of: β―CH2, γ―CH2, δ―CH2 (proline scored as positive), ε―CH2 group and a ―CH3 group |
6-10 | Presence respectively of: ω―SH, ω―COOH, ω―NH2 (basic), ω―CONH2 and ―CHOH groups |
11-15 | Presence respectively of: benzene ring (including tryptophan as positive), branching in side chain by a CH group, a second CH3 group, two but not three ―H groups at the ends of the side chain (proline scored as positive) and a C―S―C group |
16-20 | Presence respectively of: guanido group, α―NH2, α―NH group in ring, δ―NH group in ring, ―N= group in ring |
21-25 | Presence respectively of: ―CH=N, indolyl group, imidazole group, C=O group in side chain, and configuration at α―C potentially changing direction of the peptide chain (only proline scores positive) |
26-30 | Presence respectively of: sulphur atom, primary aliphatic ―OH group, secondary aliphatic ―OH group, phenolic ―OH group, ability to form S―S bridges |
31-35 | Presence respectively of: imidazole ―NH group, indolyl ―NH group, ―SCH3 group, a second optical centre, the N=CR―NH group |
36-40 | Presence respectively of: isopropyl group, distinct aromatic reactivity, strong aromatic reactivity, terminal positive charge, negative charge at high pH (tyrosine scored positive) |
41 | Presence of pyrrolidine ring |
42-53 | Molecular weight (approximate) of side chain, scored in 12 additive steps (sulphur counted as the equivalent of two carbon, nitrogen or oxygen atoms) |
54-56 | Presence, respectively, of: flat 5-, 6- and 9-membered ring system |
57-64 | pK at isoelectric point, scored additively in steps of 1 pH |
65-68 | Logarithm of solubility in water of the ʟ-isomer in mg/100 ml., scored additively |
69-70 | Optical rotation in 5 ɴ-HCl, [α]D 0 to -25, and over -25, respectively |
71-72 | Optical rotation in 5 ɴ-HCl, [α]D 0 to +25, and over +25, respectively (values for glutamine and tryptophan with water as solvent, and for asparagine 3·4 ɴ-HCl) |
73-74 | Side-chain hydrogen bonding (ionic type), strong donor and strong acceptor, respectively |
75-76 | Side-chain hydrogen bonding (neutral type), strong donor and strong acceptor, respectively |
77-78 | Water structure former, respectively moderate and strong |
79 | Water structure breaker |
80-82 | Mobile electrons few, moderate and many, respectively (scored additively) |
83-85 | Heat and age stability moderate, high and very high, respectively (scored additively) |
86-89 | RF in phenol-water paper chromatography in steps of 0·2 (scored additively) |
90-93 | RF in toluene-pyridine-glycolchlorhydrin (paper chromatography of DNP-derivative) in steps of 0·2 (scored additively: for lysine the di-DNP derivative) |
94-97 | Ninhydrin colour after collidine-lutidine chromatography and heating 5 min at 100 °C, respectively purple, pink, brown and yellow |
98 | End of side-chain furcated |
99-101 | Number of substituents on the β-carbon atom, respectively 1, 2 or 3 (scored additively) |
102-111 | The mean number of lone pair electrons on the side-chain (scored additively) |
112-115 | Number of bonds in the side-chain allowing rotation (scored additively) |
116-117 | Ionic volume within rings slight, or moderate (scored additively) |
118-124 | Maximum moment of inertia for rotation at the α―β bond (scored additively in seven approximate steps) |
125-131 | Maximum moment of inertia for rotation at the β―γ bond (scored additively in seven approximate steps) |
132-134 | Maximum moment of inertia for rotation at the γ―δ bond (scored additively in three approximate steps) |
Grantham's distance depends on three properties: composition, polarity and molecular volume. [4]
The distance D for each pair of amino acids i and j is calculated as:
$$D_{ij} = \left[\alpha (c_i - c_j)^2 + \beta (p_i - p_j)^2 + \gamma (v_i - v_j)^2\right]^{1/2}$$
where c = composition, p = polarity, and v = molecular volume; α, β and γ are constants, the squares of the inverses of the mean distance for each property, respectively equal to 1.833, 0.1018 and 0.000399. According to Grantham's distance, the most similar amino acids are leucine and isoleucine and the most distant are cysteine and tryptophan.
| | Arg | Leu | Pro | Thr | Ala | Val | Gly | Ile | Phe | Tyr | Cys | His | Gln | Asn | Lys | Asp | Glu | Met | Trp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Ser | 110 | 145 | 74 | 58 | 99 | 124 | 56 | 142 | 155 | 144 | 112 | 89 | 68 | 46 | 121 | 65 | 80 | 135 | 177 |
Arg | | 102 | 103 | 71 | 112 | 96 | 125 | 97 | 97 | 77 | 180 | 29 | 43 | 86 | 26 | 96 | 54 | 91 | 101 |
Leu | | | 98 | 92 | 96 | 32 | 138 | 5 | 22 | 36 | 198 | 99 | 113 | 153 | 107 | 172 | 138 | 15 | 61 |
Pro | | | | 38 | 27 | 68 | 42 | 95 | 114 | 110 | 169 | 77 | 76 | 91 | 103 | 108 | 93 | 87 | 147 |
Thr | | | | | 58 | 69 | 59 | 89 | 103 | 92 | 149 | 47 | 42 | 65 | 78 | 85 | 65 | 81 | 128 |
Ala | | | | | | 64 | 60 | 94 | 113 | 112 | 195 | 86 | 91 | 111 | 106 | 126 | 107 | 84 | 148 |
Val | | | | | | | 109 | 29 | 50 | 55 | 192 | 84 | 96 | 133 | 97 | 152 | 121 | 21 | 88 |
Gly | | | | | | | | 135 | 153 | 147 | 159 | 98 | 87 | 80 | 127 | 94 | 98 | 127 | 184 |
Ile | | | | | | | | | 21 | 33 | 198 | 94 | 109 | 149 | 102 | 168 | 134 | 10 | 61 |
Phe | | | | | | | | | | 22 | 205 | 100 | 116 | 158 | 102 | 177 | 140 | 28 | 40 |
Tyr | | | | | | | | | | | 194 | 83 | 99 | 143 | 85 | 160 | 122 | 36 | 37 |
Cys | | | | | | | | | | | | 174 | 154 | 139 | 202 | 154 | 170 | 196 | 215 |
His | | | | | | | | | | | | | 24 | 68 | 32 | 81 | 40 | 87 | 115 |
Gln | | | | | | | | | | | | | | 46 | 53 | 61 | 29 | 101 | 130 |
Asn | | | | | | | | | | | | | | | 94 | 23 | 42 | 142 | 174 |
Lys | | | | | | | | | | | | | | | | 101 | 56 | 95 | 110 |
Asp | | | | | | | | | | | | | | | | | 45 | 160 | 181 |
Glu | | | | | | | | | | | | | | | | | | 126 | 152 |
Met | | | | | | | | | | | | | | | | | | | 67 |
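As a minimal sketch, the distance formula can be written in Python as below. It assumes the usual Euclidean form, the square root of the weighted sum of squared differences in composition, polarity and volume, with the three constants quoted above. The function name and the tuple layout are illustrative choices, and the example property values are placeholders rather than Grantham's published data. The tabulated distances may also be on a rescaled scale, so the sketch does not attempt to reproduce them.

```python
import math

# Weighting constants quoted in the text for composition, polarity and volume.
ALPHA, BETA, GAMMA = 1.833, 0.1018, 0.000399


def grantham_distance(a, b):
    """Distance between two amino acids described as (c, p, v) tuples.

    c = composition, p = polarity, v = molecular volume.
    The tuple layout is an assumption made for this sketch.
    """
    c_i, p_i, v_i = a
    c_j, p_j, v_j = b
    return math.sqrt(
        ALPHA * (c_i - c_j) ** 2
        + BETA * (p_i - p_j) ** 2
        + GAMMA * (v_i - v_j) ** 2
    )


# Placeholder property values, NOT taken from Grantham's paper.
residue_x = (0.0, 5.0, 100.0)
residue_y = (1.0, 9.0, 40.0)
print(grantham_distance(residue_x, residue_y))
```

The measure is symmetric, so swapping the two arguments gives the same value, and identical property triples give a distance of zero.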
Sneath's index takes into account 134 categories of activity and structure. [3] The dissimilarity index D is a percentage value of the sum of all properties not shared between the two exchanged amino acids. It is expressed as $D = 1 - S$, where S is the similarity.
| | Leu | Ile | Val | Gly | Ala | Pro | Gln | Asn | Met | Thr | Ser | Cys | Glu | Asp | Lys | Arg | Tyr | Phe | Trp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Ile | 5 | ||||||||||||||||||
Val | 9 | 7 | |||||||||||||||||
Gly | 24 | 25 | 19 | ||||||||||||||||
Ala | 15 | 17 | 12 | 9 | |||||||||||||||
Pro | 23 | 24 | 20 | 17 | 16 | ||||||||||||||
Gln | 22 | 24 | 25 | 32 | 26 | 33 | |||||||||||||
Asn | 20 | 23 | 23 | 26 | 25 | 31 | 10 | ||||||||||||
Met | 20 | 22 | 23 | 34 | 25 | 31 | 13 | 21 | |||||||||||
Thr | 23 | 21 | 17 | 20 | 20 | 25 | 24 | 19 | 25 | ||||||||||
Ser | 23 | 25 | 20 | 19 | 16 | 24 | 21 | 15 | 22 | 12 | |||||||||
Cys | 24 | 26 | 21 | 21 | 13 | 25 | 22 | 19 | 17 | 19 | 13 | ||||||||
Glu | 30 | 31 | 31 | 37 | 34 | 43 | 14 | 19 | 26 | 34 | 29 | 33 | |||||||
Asp | 25 | 28 | 28 | 33 | 30 | 40 | 22 | 14 | 31 | 29 | 25 | 28 | 7 | ||||||
Lys | 23 | 24 | 26 | 31 | 26 | 31 | 21 | 27 | 24 | 34 | 31 | 32 | 26 | 34 | |||||
Arg | 33 | 34 | 36 | 43 | 37 | 43 | 23 | 31 | 28 | 38 | 37 | 36 | 31 | 39 | 14 | ||||
Tyr | 30 | 34 | 36 | 36 | 34 | 37 | 29 | 28 | 32 | 32 | 29 | 34 | 34 | 34 | 34 | 36 | |||
Phe | 19 | 22 | 26 | 29 | 26 | 27 | 24 | 24 | 24 | 28 | 25 | 29 | 35 | 35 | 28 | 34 | 13 | ||
Trp | 30 | 34 | 37 | 39 | 36 | 37 | 31 | 32 | 31 | 38 | 35 | 37 | 43 | 45 | 34 | 36 | 21 | 13 | |
His | 25 | 28 | 31 | 34 | 29 | 36 | 27 | 24 | 30 | 34 | 28 | 31 | 27 | 35 | 27 | 31 | 23 | 18 | 25 |
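The relation between similarity and dissimilarity can be illustrated with a short Python sketch. It assumes that each amino acid is described by a vector of two-state characters and that S is the fraction of characters on which the two amino acids agree (simple matching). That definition of S is an assumption made here for illustration; the original paper may define it differently, for example by ignoring characters that are absent in both amino acids. The function name and the example vectors are hypothetical.

```python
def sneath_dissimilarity(chars_a, chars_b):
    """Percentage dissimilarity D = 100 * (1 - S) between two character vectors.

    S is taken as the fraction of matching two-state characters,
    which is an assumption of this sketch.
    """
    if len(chars_a) != len(chars_b):
        raise ValueError("character vectors must have the same length")
    shared = sum(1 for x, y in zip(chars_a, chars_b) if x == y)
    similarity = shared / len(chars_a)
    return 100 * (1 - similarity)


# Hypothetical four-character vectors (the real index uses 134 characters).
print(sneath_dissimilarity([1, 0, 1, 1], [1, 1, 1, 0]))  # 50.0
```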
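Epstein's coefficient of difference is based on the differences in polarity and size between the exchanged pair of amino acids. [5] This index distinguishes the direction of exchange between amino acids and is described by two equations:
when a smaller hydrophobic residue is replaced by a larger hydrophobic or polar residue:
$$\Delta_{a \to b} = \left(\delta_{\text{polarity}}^2 + \delta_{\text{size}}^2\right)^{1/2}$$
when a polar residue is exchanged or a larger residue is replaced by a smaller one:
$$\Delta_{a \to b} = \left(\delta_{\text{polarity}}^2 + [0.5\,\delta_{\text{size}}]^2\right)^{1/2}$$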
| | Phe | Met | Leu | Ile | Val | Pro | Tyr | Trp | Cys | Ala | Gly | Ser | Thr | His | Glu | Gln | Asp | Asn | Lys | Arg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Phe | | 0.05 | 0.08 | 0.08 | 0.1 | 0.1 | 0.21 | 0.25 | 0.22 | 0.43 | 0.53 | 0.81 | 0.81 | 0.8 | 1 | 1 | 1 | 1 | 1 | 1 |
Met | 0.1 | | 0.03 | 0.03 | 0.1 | 0.1 | 0.25 | 0.32 | 0.21 | 0.41 | 0.42 | 0.8 | 0.8 | 0.8 | 1 | 1 | 1 | 1 | 1 | 1 |
Leu | 0.15 | 0.05 | | 0 | 0.03 | 0.03 | 0.28 | 0.36 | 0.2 | 0.43 | 0.51 | 0.8 | 0.8 | 0.81 | 1 | 1 | 1 | 1 | 1 | 1.01 |
Ile | 0.15 | 0.05 | 0 | | 0.03 | 0.03 | 0.28 | 0.36 | 0.2 | 0.43 | 0.51 | 0.8 | 0.8 | 0.81 | 1 | 1 | 1 | 1 | 1 | 1.01 |
Val | 0.2 | 0.1 | 0.05 | 0.05 | | 0 | 0.32 | 0.4 | 0.2 | 0.4 | 0.5 | 0.8 | 0.8 | 0.81 | 1 | 1 | 1 | 1 | 1 | 1.02 |
Pro | 0.2 | 0.1 | 0.05 | 0.05 | 0 | | 0.32 | 0.4 | 0.2 | 0.4 | 0.5 | 0.8 | 0.8 | 0.81 | 1 | 1 | 1 | 1 | 1 | 1.02 |
Tyr | 0.2 | 0.22 | 0.22 | 0.22 | 0.24 | 0.24 | | 0.1 | 0.13 | 0.27 | 0.36 | 0.62 | 0.61 | 0.6 | 0.8 | 0.8 | 0.81 | 0.81 | 0.8 | 0.8 |
Trp | 0.21 | 0.24 | 0.25 | 0.25 | 0.27 | 0.27 | 0.05 | | 0.18 | 0.3 | 0.39 | 0.63 | 0.63 | 0.61 | 0.81 | 0.81 | 0.81 | 0.81 | 0.81 | 0.8 |
Cys | 0.28 | 0.22 | 0.21 | 0.21 | 0.2 | 0.2 | 0.25 | 0.35 | | 0.25 | 0.31 | 0.6 | 0.6 | 0.62 | 0.81 | 0.81 | 0.8 | 0.8 | 0.81 | 0.82 |
Ala | 0.5 | 0.45 | 0.43 | 0.43 | 0.41 | 0.41 | 0.4 | 0.49 | 0.22 | | 0.1 | 0.4 | 0.41 | 0.47 | 0.63 | 0.63 | 0.62 | 0.62 | 0.63 | 0.67 |
Gly | 0.61 | 0.56 | 0.54 | 0.54 | 0.52 | 0.52 | 0.5 | 0.58 | 0.34 | 0.1 | | 0.32 | 0.34 | 0.42 | 0.56 | 0.56 | 0.54 | 0.54 | 0.56 | 0.61 |
Ser | 0.81 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.62 | 0.63 | 0.6 | 0.4 | 0.3 | | 0.03 | 0.1 | 0.21 | 0.21 | 0.2 | 0.2 | 0.21 | 0.24 |
Thr | 0.81 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.61 | 0.63 | 0.6 | 0.4 | 0.31 | 0.03 | | 0.08 | 0.21 | 0.21 | 0.2 | 0.2 | 0.21 | 0.22 |
His | 0.8 | 0.8 | 1 | 1 | 0.8 | 0.8 | 0.6 | 0.61 | 0.61 | 0.42 | 0.34 | 0.1 | 0.08 | | 0.2 | 0.2 | 0.21 | 0.21 | 0.2 | 0.2 |
Glu | 1 | 1 | 1 | 1 | 1 | 1 | 0.8 | 0.81 | 0.8 | 0.61 | 0.52 | 0.22 | 0.21 | 0.2 | | 0 | 0.03 | 0.03 | 0 | 0.05 |
Gln | 1 | 1 | 1 | 1 | 1 | 1 | 0.8 | 0.81 | 0.8 | 0.61 | 0.52 | 0.22 | 0.21 | 0.2 | 0 | | 0.03 | 0.03 | 0 | 0.05 |
Asp | 1 | 1 | 1 | 1 | 1 | 1 | 0.81 | 0.81 | 0.8 | 0.61 | 0.51 | 0.21 | 0.2 | 0.21 | 0.03 | 0.03 | | 0 | 0.03 | 0.08 |
Asn | 1 | 1 | 1 | 1 | 1 | 1 | 0.81 | 0.81 | 0.8 | 0.61 | 0.51 | 0.21 | 0.2 | 0.21 | 0.03 | 0.03 | 0 | | 0.03 | 0.08 |
Lys | 1 | 1 | 1 | 1 | 1 | 1 | 0.8 | 0.81 | 0.8 | 0.61 | 0.52 | 0.22 | 0.21 | 0.2 | 0 | 0 | 0.03 | 0.03 | | 0.05 |
Arg | 1 | 1 | 1 | 1 | 1.01 | 1.01 | 0.8 | 0.8 | 0.81 | 0.62 | 0.53 | 0.24 | 0.22 | 0.2 | 0.05 | 0.05 | 0.08 | 0.08 | 0.05 | |
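The directional nature of Epstein's coefficient can be sketched in Python. The sketch assumes the commonly cited form of the two equations: the square root of the sum of the squared polarity difference and the squared size difference, with the size difference weighted by 0.5 when a polar residue is exchanged or a larger residue is replaced by a smaller one. The function name, the boolean flag and the input values are illustrative assumptions, not part of the original description.

```python
import math


def epstein_difference(d_polarity, d_size, full_size_weight=True):
    """Coefficient of difference for one direction of exchange.

    full_size_weight=True  : smaller hydrophobic residue replaced by a larger
                             hydrophobic or polar residue (size counted fully).
    full_size_weight=False : polar residue exchanged, or larger residue
                             replaced by a smaller one (size weighted by 0.5).
    """
    weight = 1.0 if full_size_weight else 0.5
    return math.sqrt(d_polarity ** 2 + (weight * d_size) ** 2)


# Hypothetical differences: the same pair scores differently by direction.
print(epstein_difference(0.3, 0.4, full_size_weight=True))
print(epstein_difference(0.3, 0.4, full_size_weight=False))
```

Because the size term is halved in one direction only, the coefficient for a→b generally differs from that for b→a, which is why the table is not symmetric.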
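Miyata's distance is based on two physicochemical properties: volume and polarity. [6]
The distance between amino acids $a_i$ and $a_j$ is calculated as
$$d_{ij} = \sqrt{\frac{(\Delta p_{ij})^2}{\sigma_p^2} + \frac{(\Delta v_{ij})^2}{\sigma_v^2}}$$
where $\Delta p_{ij}$ is the polarity difference between the exchanged amino acids and $\Delta v_{ij}$ is the volume difference; $\sigma_p$ and $\sigma_v$ are the standard deviations of $\Delta p_{ij}$ and $\Delta v_{ij}$.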
| | Cys | Pro | Ala | Gly | Ser | Thr | Gln | Glu | Asn | Asp | His | Lys | Arg | Val | Leu | Ile | Met | Phe | Tyr | Trp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cys | | 1.33 | 1.39 | 2.22 | 2.84 | 1.45 | 2.48 | 3.26 | 2.83 | 3.48 | 2.56 | 3.27 | 3.06 | 0.86 | 1.65 | 1.63 | 1.46 | 2.24 | 2.38 | 3.34 |
Pro | | | 0.06 | 0.97 | 0.56 | 0.87 | 1.92 | 2.48 | 1.8 | 2.4 | 2.15 | 2.94 | 2.9 | 1.79 | 2.7 | 2.62 | 2.36 | 3.17 | 3.12 | 4.17 |
Ala | | | | 0.91 | 0.51 | 0.9 | 1.92 | 2.46 | 1.78 | 2.37 | 2.17 | 2.96 | 2.92 | 1.85 | 2.76 | 2.69 | 2.42 | 3.23 | 3.18 | 4.23 |
Gly | | | | | 0.85 | 1.7 | 2.48 | 2.78 | 1.96 | 2.37 | 2.78 | 3.54 | 3.58 | 2.76 | 3.67 | 3.6 | 3.34 | 4.14 | 4.08 | 5.13 |
Ser | | | | | | 0.89 | 1.65 | 2.06 | 1.31 | 1.87 | 1.94 | 2.71 | 2.74 | 2.15 | 3.04 | 2.95 | 2.67 | 3.45 | 3.33 | 4.38 |
Thr | | | | | | | 1.12 | 1.83 | 1.4 | 2.05 | 1.32 | 2.1 | 2.03 | 1.42 | 2.25 | 2.14 | 1.86 | 2.6 | 2.45 | 3.5 |
Gln | | | | | | | | 0.84 | 0.99 | 1.47 | 0.32 | 1.06 | 1.13 | 2.13 | 2.7 | 2.57 | 2.3 | 2.81 | 2.48 | 3.42 |
Glu | | | | | | | | | 0.85 | 0.9 | 0.96 | 1.14 | 1.45 | 2.97 | 3.53 | 3.39 | 3.13 | 3.59 | 3.22 | 4.08 |
Asn | | | | | | | | | | 0.65 | 1.29 | 1.84 | 2.04 | 2.76 | 3.49 | 3.37 | 3.08 | 3.7 | 3.42 | 4.39 |
Asp | | | | | | | | | | | 1.72 | 2.05 | 2.34 | 3.4 | 4.1 | 3.98 | 3.69 | 4.27 | 3.95 | 4.88 |
His | | | | | | | | | | | | 0.79 | 0.82 | 2.11 | 2.59 | 2.45 | 2.19 | 2.63 | 2.27 | 3.16 |
Lys | | | | | | | | | | | | | 0.4 | 2.7 | 2.98 | 2.84 | 2.63 | 2.85 | 2.42 | 3.11 |
Arg | | | | | | | | | | | | | | 2.43 | 2.62 | 2.49 | 2.29 | 2.47 | 2.02 | 2.72 |
Val | | | | | | | | | | | | | | | 0.91 | 0.85 | 0.62 | 1.43 | 1.52 | 2.51 |
Leu | | | | | | | | | | | | | | | | 0.14 | 0.41 | 0.63 | 0.94 | 1.73 |
Ile | | | | | | | | | | | | | | | | | 0.29 | 0.61 | 0.86 | 1.72 |
Met | | | | | | | | | | | | | | | | | | 0.82 | 0.93 | 1.89 |
Phe | | | | | | | | | | | | | | | | | | | 0.48 | 1.11 |
Tyr | | | | | | | | | | | | | | | | | | | | 1.06 |
Trp | | | | | | | | | | | | | | | | | | | | |
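Miyata's distance can be sketched in Python as follows, assuming the standardised Euclidean form in which the polarity and volume differences are each divided by their standard deviation before being combined. The function name, parameter names and input values are hypothetical; the standard deviations would have to be taken from the original publication.

```python
import math


def miyata_distance(delta_p, delta_v, sigma_p, sigma_v):
    """Distance from polarity and volume differences, each scaled by its
    standard deviation (all arguments are assumptions of this sketch)."""
    return math.sqrt((delta_p / sigma_p) ** 2 + (delta_v / sigma_v) ** 2)


# Hypothetical differences and standard deviations, for illustration only.
print(miyata_distance(delta_p=3.0, delta_v=4.0, sigma_p=1.0, sigma_v=1.0))  # 5.0
```

Dividing by the standard deviations puts polarity and volume on a comparable scale, so neither property dominates the distance merely because of its units.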
Experimental exchangeability was devised by Yampolsky and Stoltzfus. [7] It is a measure of the mean effect of exchanging one amino acid for a different amino acid.
It is based on an analysis of experimental studies in which 9671 amino acid replacements from different proteins were compared for their effect on protein activity.
| | Cys | Ser | Thr | Pro | Ala | Gly | Asn | Asp | Glu | Gln | His | Arg | Lys | Met | Ile | Leu | Val | Phe | Tyr | Trp | Exsrc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cys | . | 258 | 121 | 201 | 334 | 288 | 109 | 109 | 270 | 383 | 258 | 306 | 252 | 169 | 109 | 347 | 89 | 349 | 349 | 139 | 280 |
Ser | 373 | . | 481 | 249 | 490 | 418 | 390 | 314 | 343 | 352 | 353 | 363 | 275 | 321 | 270 | 295 | 358 | 334 | 294 | 160 | 351 |
Thr | 325 | 408 | . | 164 | 402 | 332 | 240 | 190 | 212 | 308 | 246 | 299 | 256 | 152 | 198 | 271 | 362 | 273 | 260 | 66 | 287 |
Pro | 345 | 392 | 286 | . | 454 | 404 | 352 | 254 | 346 | 384 | 369 | 254 | 231 | 257 | 204 | 258 | 421 | 339 | 298 | 305 | 335 |
Ala | 393 | 384 | 312 | 243 | . | 387 | 430 | 193 | 275 | 320 | 301 | 295 | 225 | 549 | 245 | 313 | 319 | 305 | 286 | 165 | 312 |
Gly | 267 | 304 | 187 | 140 | 369 | . | 210 | 188 | 206 | 272 | 235 | 178 | 219 | 197 | 110 | 193 | 208 | 168 | 188 | 173 | 228 |
Asn | 234 | 355 | 329 | 275 | 400 | 391 | . | 208 | 257 | 298 | 248 | 252 | 183 | 236 | 184 | 233 | 233 | 210 | 251 | 120 | 272 |
Asp | 285 | 275 | 245 | 220 | 293 | 264 | 201 | . | 344 | 263 | 298 | 252 | 208 | 245 | 299 | 236 | 175 | 233 | 227 | 103 | 258 |
Glu | 332 | 355 | 292 | 216 | 520 | 407 | 258 | 533 | . | 341 | 380 | 279 | 323 | 219 | 450 | 321 | 351 | 342 | 348 | 145 | 363 |
Gln | 383 | 443 | 361 | 212 | 499 | 406 | 338 | 68 | 439 | . | 396 | 366 | 354 | 504 | 467 | 391 | 603 | 383 | 361 | 159 | 386 |
His | 331 | 365 | 205 | 220 | 462 | 370 | 225 | 141 | 319 | 301 | . | 275 | 332 | 315 | 205 | 364 | 255 | 328 | 260 | 72 | 303 |
Arg | 225 | 270 | 199 | 145 | 459 | 251 | 67 | 124 | 250 | 288 | 263 | . | 306 | 68 | 139 | 242 | 189 | 213 | 272 | 63 | 259 |
Lys | 331 | 376 | 476 | 252 | 600 | 492 | 457 | 465 | 272 | 441 | 362 | 440 | . | 414 | 491 | 301 | 487 | 360 | 343 | 218 | 409 |
Met | 347 | 353 | 261 | 85 | 357 | 218 | 544 | 392 | 287 | 394 | 278 | 112 | 135 | . | 612 | 513 | 354 | 330 | 308 | 633 | 307 |
Ile | 362 | 196 | 193 | 145 | 326 | 160 | 172 | 27 | 197 | 191 | 221 | 124 | 121 | 279 | . | 417 | 494 | 331 | 323 | 73 | 252 |
Leu | 366 | 212 | 165 | 146 | 343 | 201 | 162 | 112 | 199 | 250 | 288 | 185 | 171 | 367 | 301 | . | 275 | 336 | 295 | 152 | 248 |
Val | 382 | 326 | 398 | 201 | 389 | 269 | 108 | 228 | 192 | 280 | 253 | 190 | 197 | 562 | 537 | 333 | . | 207 | 209 | 286 | 277 |
Phe | 176 | 152 | 257 | 112 | 236 | 94 | 136 | 90 | 62 | 216 | 237 | 122 | 85 | 255 | 181 | 296 | 291 | . | 332 | 232 | 193 |
Tyr | 142 | 173 | . | 194 | 402 | 357 | 129 | 87 | 176 | 369 | 197 | 340 | 171 | 392 | . | 362 | . | 360 | . | 303 | 258 |
Trp | 137 | 92 | 17 | 66 | 63 | 162 | . | . | 65 | 61 | 239 | 103 | 54 | 110 | . | 177 | 110 | 364 | 281 | . | 142 |
Exdest | 315 | 311 | 293 | 192 | 411 | 321 | 258 | 225 | 262 | 305 | 290 | 255 | 225 | 314 | 293 | 307 | 305 | 294 | 279 | 172 | 291 |
Amino acids can also be classified according to how many different amino acids they can be exchanged for through a single nucleotide substitution.
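As an illustration, the set of amino acids reachable from a given amino acid by one nucleotide substitution can be enumerated from the standard genetic code. The Python sketch below builds the 64-entry codon table from a compact string, mutates each position of each codon for the chosen amino acid, and collects the resulting amino acids. Stop codons are ignored, which is a choice made for this sketch, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering ("*" = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def single_step_neighbours(aa):
    """One-letter codes of amino acids reachable from `aa` by changing
    a single nucleotide in any of its codons (stop codons excluded)."""
    found = set()
    for codon, coded in CODON_TABLE.items():
        if coded != aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                target = CODON_TABLE[mutant]
                if target not in (aa, "*"):
                    found.add(target)
    return found


print(sorted(single_step_neighbours("M")))  # ['I', 'K', 'L', 'R', 'T', 'V']
```

Counting the size of the returned set for each of the 20 amino acids gives the classification described above.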
Some amino acids are more likely to be replaced than others. One of the factors that influences this tendency is physicochemical distance. An example of a measure of an amino acid's tendency to be replaced is Graur's Stability Index. [9] The assumption behind this measure is that the amino acid replacement rate and the protein's evolution depend on the amino acid composition of the protein. The stability index S of an amino acid is calculated from the physicochemical distances between this amino acid and the alternatives into which it can mutate through a single nucleotide substitution, weighted by the probabilities of replacement by each of these amino acids. Based on Grantham's distance, the most immutable amino acid is cysteine, and the most prone to undergo exchange is methionine.
Alternative codons | Alternative amino acids | Probabilities | Grantham's distances [4] | Average distance |
---|---|---|---|---|
AUU, AUC, AUA | Isoleucine | 1/3 | 10 | 3.33 |
ACG | Threonine | 1/9 | 81 | 9.00 |
AAG | Lysine | 1/9 | 95 | 10.56 |
AGG | Arginine | 1/9 | 91 | 10.11 |
UUG, CUG | Leucine | 2/9 | 15 | 3.33 |
GUG | Valine | 1/9 | 21 | 2.33 |
Stability index [9] | | | | 38.67 |
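The arithmetic in the table can be checked with a short Python sketch. Each Grantham distance is weighted by the probability of the corresponding replacement, and the weighted terms are summed. The alternative codons listed are the single-nucleotide neighbours of AUG, so the table describes methionine. The function and variable names are illustrative and are not taken from the original paper.

```python
from fractions import Fraction

# (alternative amino acid, probability, Grantham's distance), copied from the
# table; the alternatives are the single-nucleotide neighbours of AUG (Met).
MET_ALTERNATIVES = [
    ("Ile", Fraction(1, 3), 10),
    ("Thr", Fraction(1, 9), 81),
    ("Lys", Fraction(1, 9), 95),
    ("Arg", Fraction(1, 9), 91),
    ("Leu", Fraction(2, 9), 15),
    ("Val", Fraction(1, 9), 21),
]


def stability_index(alternatives):
    """Sum of physicochemical distances weighted by replacement probability."""
    return sum(prob * dist for _, prob, dist in alternatives)


met_index = stability_index(MET_ALTERNATIVES)
print(round(float(met_index), 2))  # 38.67
```

Exact fractions are used so that the probabilities sum to exactly 1 and rounding happens only when the result is displayed.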
Evolution of proteins is slower than that of DNA, since only nonsynonymous mutations in DNA can result in amino acid replacements. Most mutations are neutral, in that they maintain protein function and structure. Therefore, the more similar two amino acids are, the more probable it is that one will replace the other. Conservative replacements are more common than radical replacements, since they result in less important phenotypic changes. [10] On the other hand, beneficial mutations that enhance protein function are most likely to be radical replacements. [11] Physicochemical distances, which are based on amino acid properties, are also negatively correlated with the probability of amino acid substitution: a smaller distance between two amino acids indicates that they are more likely to undergo replacement.