Protein pKa calculations

In computational biology, protein pKa calculations are used to estimate the pKa values of amino acids as they exist within proteins. These calculations complement the pKa values reported for amino acids in their free state, and are used frequently within the fields of molecular modeling, structural bioinformatics, and computational biology.

Amino acid pKa values

pKa values of amino acid side chains play an important role in defining the pH-dependent characteristics of a protein. The pH-dependence of the activity displayed by enzymes and the pH-dependence of protein stability, for example, are properties that are determined by the pKa values of amino acid side chains.

The pKa value of an amino acid side chain in solution is typically inferred from the pKa values of model compounds (compounds that are similar to the side chains of amino acids). See Amino acid for the pKa values of all amino acid side chains inferred in such a way. There are also numerous experimental studies that have yielded such values, for example by use of NMR spectroscopy.

The table below lists the model pKa values that are often used in a protein pKa calculation, and contains a third column based on protein studies. [1]

Amino acid    Model pKa    pKa from protein studies
Asp (D)    3.9    4.0
Glu (E)    4.3    4.4
Arg (R)    12.0    13.5
Lys (K)    10.5    10.4
His (H)    6.08    6.8
Cys (C) (–SH)    8.28    8.3
Tyr (Y)    10.1    9.6
N-term    8.0
C-term    3.6
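
As a rough illustration of how such model values are used, the sketch below applies the Henderson–Hasselbalch equation to estimate the protonated fraction of an isolated, fully solvated side chain at a given pH. The function name and the dictionary are hypothetical; the numbers are the model pKa values from the table above, and no protein environment is taken into account.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch fraction of a group that carries its titratable proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Model pKa values taken from the table above (free side chains, no protein).
MODEL_PKA = {
    "Asp": 3.9, "Glu": 4.3, "His": 6.08, "Cys": 8.28,
    "Tyr": 10.1, "Lys": 10.5, "Arg": 12.0,
}

if __name__ == "__main__":
    for name, pka in MODEL_PKA.items():
        frac = protonated_fraction(7.0, pka)
        print(f"{name}: {frac:.3f} protonated at pH 7")
```

For an acid such as Asp the protonated form is neutral, whereas for a base such as Lys it carries a positive charge, so the same fraction translates into different charges for the two kinds of group.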

The effect of the protein environment

Figure: Coupled system consisting of three acids. The black curve shows a back-titration event.

When a protein folds, the titratable amino acids in the protein are transferred from a solution-like environment to an environment determined by the 3-dimensional structure of the protein. For example, in an unfolded protein, an aspartic acid typically is in an environment which exposes the titratable side chain to water. When the protein folds, the aspartic acid could find itself buried deep in the protein interior with no exposure to solvent.

Furthermore, in the folded protein, the aspartic acid will be closer to other titratable groups in the protein and will also interact with permanent charges (e.g. ions) and dipoles in the protein. All of these effects alter the pKa value of the amino acid side chain, and pKa calculation methods generally calculate the effect of the protein environment on the model pKa value of an amino acid side chain. [2] [3] [4] [5]

Typically, the effects of the protein environment on the amino acid pKa value are divided into pH-independent effects and pH-dependent effects. The pH-independent effects (desolvation, interactions with permanent charges and dipoles) are added to the model pKa value to give the intrinsic pKa value. The pH-dependent effects cannot be added in the same straightforward way and have to be accounted for using Boltzmann summation, Tanford–Roxby iterations or other methods.
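
The two treatments of the pH-dependent effects named above can be sketched for a small toy system. The code below assumes a deliberately simplified model that is not taken from any particular program: all sites are acids (neutral when protonated, charge −1 when deprotonated), `pk_intr` holds their intrinsic pKa values, and `w[i][j]` is a hypothetical repulsion between two deprotonated sites, expressed in pK units. `boltzmann_titration` sums over all protonation microstates, while `tanford_roxby` iterates a mean-field approximation in which each site sees the average charge of the others.

```python
import itertools


def microstate_energy(x, ph, pk_intr, w):
    """Energy of protonation microstate x in units of RT*ln(10).

    x[i] is 1 if acid i is protonated (neutral) and 0 if deprotonated (charge -1).
    w[i][j] >= 0 is the repulsion between deprotonated sites i and j (pK units).
    """
    n = len(x)
    g = sum(x[i] * (ph - pk_intr[i]) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            g += w[i][j] * (1 - x[i]) * (1 - x[j])
    return g


def boltzmann_titration(ph, pk_intr, w):
    """Exact protonation probabilities from a sum over all 2**n microstates."""
    n = len(pk_intr)
    partition = 0.0
    occupancy = [0.0] * n
    for x in itertools.product((0, 1), repeat=n):
        weight = 10.0 ** (-microstate_energy(x, ph, pk_intr, w))
        partition += weight
        for i in range(n):
            occupancy[i] += x[i] * weight
    return [o / partition for o in occupancy]


def tanford_roxby(ph, pk_intr, w, tol=1e-10, max_iter=10000):
    """Mean-field protonation probabilities from a damped fixed-point iteration."""
    n = len(pk_intr)
    x = [0.5] * n
    for _ in range(max_iter):
        new_x = []
        for i in range(n):
            # Deprotonated neighbours raise the effective pKa of an acid.
            shift = sum(w[i][j] * (1.0 - x[j]) for j in range(n) if j != i)
            new_x.append(1.0 / (1.0 + 10.0 ** (ph - (pk_intr[i] + shift))))
        mixed = [0.5 * a + 0.5 * b for a, b in zip(x, new_x)]
        if max(abs(a - b) for a, b in zip(x, mixed)) < tol:
            return mixed
        x = mixed
    raise RuntimeError("Tanford-Roxby iteration did not converge")


if __name__ == "__main__":
    pk = [4.0, 4.5]
    coupling = [[0.0, 0.3], [0.3, 0.0]]
    print("exact     :", boltzmann_titration(4.2, pk, coupling))
    print("mean field:", tanford_roxby(4.2, pk, coupling))
```

The exact sum grows as 2^n with the number of sites, which is why it is only practical for small systems. The mean-field iteration scales far better, but in this sketch it is only expected to agree with the exact result when the couplings are weak.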

The interplay of the intrinsic pKa values of a system with the electrostatic interaction energies between titratable groups can produce quite spectacular effects such as non-Henderson–Hasselbalch titration curves and even back-titration effects. [6]

The figure shows a theoretical system consisting of three acidic residues. One group (blue) displays a back-titration event.

pKa calculation methods

Several software packages and web servers are available for the calculation of protein pKa values.

Using the Poisson–Boltzmann equation

Some methods are based on solutions to the Poisson–Boltzmann equation (PBE), often referred to as FDPB-based methods (FDPB stands for "finite difference Poisson–Boltzmann"). The PBE is a modification of Poisson's equation that incorporates a description of the effect of solvent ions on the electrostatic field around a molecule.
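
For orientation, one common way of writing the PBE in SI units is given below. The notation is introduced here for illustration and is not taken from the cited works: $\varepsilon(\mathbf{r})$ is the position-dependent permittivity, $\phi(\mathbf{r})$ the electrostatic potential, $\rho_{\mathrm{f}}(\mathbf{r})$ the fixed charge density of the molecule, $c_i^{\infty}$ the bulk concentration of ion species $i$ with valence $z_i$, $e$ the elementary charge, and $\lambda(\mathbf{r})$ a factor describing how accessible position $\mathbf{r}$ is to solvent ions.

```latex
\nabla \cdot \left[ \varepsilon(\mathbf{r}) \, \nabla \phi(\mathbf{r}) \right]
  = -\rho_{\mathrm{f}}(\mathbf{r})
    - \sum_i c_i^{\infty} z_i e \, \lambda(\mathbf{r})
      \exp\!\left( -\frac{z_i e \, \phi(\mathbf{r})}{k_{\mathrm{B}} T} \right)
```

Without the sum over ion species, the expression reduces to Poisson's equation for a medium with a position-dependent permittivity.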

The H++ web server, [7] the pKD web server, [8] MCCE2, Karlsberg+, PETIT and GMCT use the FDPB method to compute pKa values of amino acid side chains.

FDPB-based methods calculate the change in the pKa value of an amino acid side chain when that side chain is moved from a hypothetical fully solvated state to its position in the protein. To perform such a calculation, one needs theoretical methods that can calculate the effect of the protein interior on a pKa value, and knowledge of the pKa values of amino acid side chains in their fully solvated states. [2] [3] [4] [5]
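
The final bookkeeping step of such a calculation can be sketched as follows, assuming that the electrostatics calculation has already produced the change in deprotonation free energy on transfer from the solvated state into the protein. The function name, the argument names and the choice of kJ/mol are illustrative assumptions, not the interface of any of the programs named above.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def pka_in_protein(pka_model, ddg_deprot_kj_mol, temperature=298.15):
    """Shift a model pKa by a change in deprotonation free energy.

    ddg_deprot_kj_mol is the deprotonation free energy in the protein minus
    that of the fully solvated model compound (a hypothetical input here).
    A positive value makes deprotonation harder and therefore raises the pKa.
    """
    rt_ln10 = R_KJ_PER_MOL_K * temperature * math.log(10.0)
    return pka_model + ddg_deprot_kj_mol / rt_ln10


if __name__ == "__main__":
    # Example with an invented shift of +5 kJ/mol applied to the Asp model value.
    print(f"{pka_in_protein(3.9, 5.0):.2f}")
```

At 298.15 K, one pK unit corresponds to roughly 5.7 kJ/mol, which gives a feeling for how sensitive predicted pKa values are to errors in the computed energies.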

Empirical methods

A set of empirical rules relating the protein structure to the pKa values of ionizable residues has been developed by Li, Robertson, and Jensen. [9] These rules form the basis for the web-accessible program PROPKA for rapid predictions of pKa values. A more recent empirical pKa prediction program was released by Tan et al. as part of the DEPTH web server. [10]

Molecular dynamics (MD)-based methods

Molecular dynamics methods of calculating pKa values make it possible to include full flexibility of the titrated molecule. [11] [12] [13]

Molecular dynamics based methods are typically much more computationally expensive, and not necessarily more accurate, ways to predict pKa values than approaches based on the Poisson–Boltzmann equation. Limited conformational flexibility can also be realized within a continuum electrostatics approach, e.g., for considering multiple amino acid sidechain rotamers. In addition, current commonly used molecular force fields do not take electronic polarizability into account, which could be an important property in determining protonation energies.

Determining pKa values from titration curves or free energy calculations

From the titration curve of a protonatable group, one can read the so-called $\mathrm{p}K_{a,1/2}$, which is equal to the pH value at which the group is half-protonated (i.e. when 50% of such groups would be protonated). The $\mathrm{p}K_{a,1/2}$ is equal to the Henderson–Hasselbalch pKa ($\mathrm{p}K_a^{\mathrm{HH}}$) if the titration curve follows the Henderson–Hasselbalch equation. [14] Most pKa calculation methods silently assume that all titration curves are Henderson–Hasselbalch shaped, and pKa values in pKa calculation programs are therefore often determined in this way. In the general case of multiple interacting protonatable sites, the $\mathrm{p}K_{a,1/2}$ value is not thermodynamically meaningful. In contrast, the Henderson–Hasselbalch pKa value can be computed from the protonation free energy via

$$\mathrm{p}K_a^{\mathrm{HH}}(\mathrm{pH}) = \mathrm{pH} - \frac{\Delta G^{\mathrm{prot}}(\mathrm{pH})}{RT \ln 10}$$

and is thus in turn related to the protonation free energy of the site via

$$\Delta G^{\mathrm{prot}}(\mathrm{pH}) = RT \ln 10 \, \left( \mathrm{pH} - \mathrm{p}K_a^{\mathrm{HH}}(\mathrm{pH}) \right)$$

The protonation free energy can in principle be computed from the protonation probability of the group, $\langle x \rangle(\mathrm{pH})$, which can be read from its titration curve:

$$\Delta G^{\mathrm{prot}}(\mathrm{pH}) = -RT \ln \frac{\langle x \rangle}{1 - \langle x \rangle}$$

Titration curves can be computed within a continuum electrostatics approach with formally exact but more elaborate analytical or Monte Carlo (MC) methods, or inexact but fast approximate methods. MC methods that have been used to compute titration curves [15] are Metropolis MC [16] [17] or Wang–Landau MC. [18] Approximate methods that use a mean-field approach for computing titration curves are the Tanford–Roxby method and hybrids of this method that combine an exact statistical mechanics treatment within clusters of strongly interacting sites with a mean-field treatment of intercluster interactions. [19] [20] [21] [22] [23]
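
A Metropolis MC estimate of protonation probabilities can be sketched for a toy system. As an assumption made purely for illustration, all sites are acids that are neutral when protonated and carry a charge of −1 when deprotonated, `pk_intr` holds their intrinsic pKa values, and `w[i][j]` is a hypothetical repulsion between two deprotonated sites in pK units. The function and parameter names are invented and do not correspond to any of the cited programs.

```python
import math
import random


def metropolis_titration(ph, pk_intr, w, n_sweeps=20000, n_equil=2000, seed=0):
    """Estimate protonation probabilities at one pH by Metropolis sampling.

    The state x[i] is 1 for a protonated and 0 for a deprotonated acid.
    Energies are handled in units of RT*ln(10), i.e. in pK units.
    """
    rng = random.Random(seed)
    n = len(pk_intr)
    ln10 = math.log(10.0)
    x = [1] * n
    counts = [0] * n
    for sweep in range(n_equil + n_sweeps):
        for _ in range(n):
            i = rng.randrange(n)
            # Repulsion felt by site i if it were deprotonated.
            others = sum(w[i][j] * (1 - x[j]) for j in range(n) if j != i)
            # Energy of the protonated minus the deprotonated form of site i.
            d_protonate = (ph - pk_intr[i]) - others
            delta = d_protonate if x[i] == 0 else -d_protonate
            # Metropolis criterion: always accept downhill moves, accept
            # uphill moves with probability exp(-delta * ln 10).
            if delta <= 0.0 or rng.random() < math.exp(-ln10 * delta):
                x[i] = 1 - x[i]
        if sweep >= n_equil:
            for i in range(n):
                counts[i] += x[i]
    return [c / n_sweeps for c in counts]


if __name__ == "__main__":
    pk = [4.0, 4.5, 5.0]
    coupling = [[0.0, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.0]]
    for ph in (3.0, 4.0, 5.0, 6.0, 7.0):
        probs = metropolis_titration(ph, pk, coupling, n_sweeps=5000)
        print(ph, [round(p, 3) for p in probs])
```

Repeating the sampling over a range of pH values yields a titration curve for each site. Only single-site moves are used here; strongly coupled sites may in addition need moves that change two sites at once to sample efficiently, which this sketch omits.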

In practice, it can be difficult to obtain statistically converged and accurate protonation free energies from titration curves if the protonation probability is close to 1 or 0. In this case, one can use various free energy calculation methods to obtain the protonation free energy, [15] such as biased Metropolis MC, [24] free-energy perturbation, [25] [26] thermodynamic integration, [27] [28] [29] the non-equilibrium work method [30] or the Bennett acceptance ratio method. [31]
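
The relations between a titration curve, the protonation free energy and the two kinds of pKa value can be sketched as below. The function names are hypothetical, the protonation probability is assumed to be supplied by some earlier calculation, and the half-protonation point is located by simple linear interpolation, which is only one of several possible choices.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def protonation_free_energy(x_mean, temperature=298.15):
    """Protonation free energy -RT ln(<x> / (1 - <x>)) in kJ/mol."""
    if not 0.0 < x_mean < 1.0:
        raise ValueError("protonation probability must lie strictly between 0 and 1")
    return -R_KJ_PER_MOL_K * temperature * math.log(x_mean / (1.0 - x_mean))


def pk_hh(ph, x_mean):
    """Henderson-Hasselbalch pKa at a given pH: pH + log10(<x> / (1 - <x>))."""
    return ph + math.log10(x_mean / (1.0 - x_mean))


def pk_half(ph_values, x_values):
    """First pH at which the titration curve crosses 0.5 (linear interpolation)."""
    points = list(zip(ph_values, x_values))
    for (ph0, x0), (ph1, x1) in zip(points, points[1:]):
        if (x0 - 0.5) * (x1 - 0.5) <= 0.0 and x0 != x1:
            return ph0 + (0.5 - x0) * (ph1 - ph0) / (x1 - x0)
    raise ValueError("titration curve does not cross 0.5")


if __name__ == "__main__":
    # An ideal Henderson-Hasselbalch curve with pKa 4.3 as input data.
    phs = [i / 10 for i in range(0, 141)]
    xs = [1.0 / (1.0 + 10.0 ** (ph - 4.3)) for ph in phs]
    print("pK_half:", round(pk_half(phs, xs), 3))
    print("pK_HH at pH 7:", round(pk_hh(7.0, xs[70]), 3))
```

The logarithm makes the result very sensitive to sampling noise at the extremes: changing the protonation probability from 0.999 to 0.9995 shifts the computed Henderson–Hasselbalch pKa by about 0.3 units, whereas the same absolute change near 0.5 shifts it by less than 0.001.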

Note that the $\mathrm{p}K_a^{\mathrm{HH}}$ value does in general depend on the pH value. [32]

This dependence is small for weakly interacting groups like well solvated amino acid side chains on the protein surface, but can be large for strongly interacting groups like those buried in enzyme active sites or integral membrane proteins. [33] [34] [35]

While many protein pKa prediction methods are available, their accuracies often differ significantly because their strategies differ, sometimes subtly and sometimes drastically. [36]

References

  1. Hass and Mulder (2015) Annu. Rev. Biophys. vol 44 pp. 53–75 doi 10.1146/annurev-biophys-083012-130351.
  2. Bashford (2004) Front Biosci. vol. 9 pp. 1082–99 doi 10.2741/1187
  3. Gunner et al. (2006) Biochim. Biophys. Acta vol. 1757 (8) pp. 942–68 doi 10.1016/j.bbabio.2006.06.005
  4. Ullmann et al. (2008) Photosynth. Res. 97 vol. 112 pp. 33–55 doi 10.1007/s11120-008-9306-1
  5. Antosiewicz et al. (2011) Mol. BioSyst. vol. 7 pp. 2923–2949 doi 10.1039/C1MB05170A
  6. A. Onufriev, D.A. Case and G. M. Ullmann (2001). Biochemistry 40: 3413–3419 doi 10.1021/bi002740q
  7. "H++ (web-based computational prediction of protonation states and pK of ionizable groups in macromolecules)". newbiophysics.cs.vt.edu. Retrieved 2023-01-26.
  8. Tynan-Connolly, B. M.; Nielsen, J. E. (2006-12-22). "Redesigning protein pKa values". Protein Science. 16 (2): 239–249. doi:10.1110/ps.062538707. ISSN   0961-8368. PMC   2203286 . PMID   17189477.
  9. Li, Hui; Robertson, Andrew D.; Jensen, Jan H. (2005-10-17). "Very fast empirical prediction and rationalization of protein pKa values". Proteins: Structure, Function, and Bioinformatics. 61 (4): 704–721. doi:10.1002/prot.20660. PMID   16231289. S2CID   38196246.
  10. Tan, Kuan Pern; Nguyen, Thanh Binh; Patel, Siddharth; Varadarajan, Raghavan; Madhusudhan, M. S. (2013-07-01). "Depth: a web server to compute depth, cavity sizes, detect potential small-molecule ligand-binding cavities and predict the pKa of ionizable residues in proteins". Nucleic Acids Research. 41 (W1): W314–W321. doi:10.1093/nar/gkt503. ISSN   1362-4962. PMC   3692129 . PMID   23766289.
  11. Donnini et al. (2011) J. Chem. Theory Comp. vol 7 pp. 1962–78 doi 10.1021/ct200061r.
  12. Wallace et al. (2011) J. Chem. Theory Comp. vol 7 pp. 2617–2629 doi 10.1021/ct200146j.
  13. Goh et al. (2012) J. Chem. Theory Comp. vol 8 pp. 36–46 doi 10.1021/ct2006314.
  14. Ullmann (2003) J. Phys. Chem. B vol 107 pp. 1263–71 doi 10.1021/jp026454v.
  15. Ullmann et al. (2012) J. Comput. Chem. vol 33 pp. 887–900 doi 10.1002/jcc.22919
  16. Metropolis et al. (1953) J. Chem. Phys. vol 21 pp. 1087–1092 doi 10.1063/1.1699114
  17. Beroza et al. (1991) Proc. Natl. Acad. Sci. USA vol 88 pp. 5804–5808 doi 10.1073/pnas.88.13.5804
  18. Wang and Landau (2001) Phys. Rev. E vol 64 pp 056101 doi 10.1103/PhysRevE.64.056101
  19. Tanford and Roxby (1972) Biochemistry vol 11 pp. 2192–2198 doi 10.1021/bi00761a029
  20. Bashford and Karplus (1991) J. Phys. Chem. vol 95 pp. 9556–61 doi 10.1021/j100176a093
  21. Gilson (1993) Proteins vol 15 pp. 266–82 doi 10.1002/prot.340150305
  22. Antosiewicz et al. (1994) J. Mol. Biol. vol 238 pp. 415–36 doi 10.1006/jmbi.1994.1301
  23. Spassov and Bashford (1999) J. Comput. Chem. vol 20 pp. 1091–1111 doi 10.1002/(SICI)1096-987X(199908)20:11<1091::AID-JCC1>3.0.CO;2-3
  24. Beroza et al. (1995) Biophys. J. vol 68 pp. 2233–2250 doi 10.1016/S0006-3495(95)80406-6
  25. Zwanzig (1954) J. Chem. Phys. vol 22 pp. 1420–1426 doi 10.1063/1.1740409
  26. Ullmann et al. (2011) J. Phys. Chem. B vol 115 pp. 507–521 doi 10.1021/jp1093838
  27. Kirkwood (1935) J. Chem. Phys. vol 3 pp. 300–313 doi 10.1063/1.1749657
  28. Bruckner and Boresch (2011) J. Comput. Chem. vol 32 pp. 1303–1319 doi 10.1002/jcc.21713
  29. Bruckner and Boresch (2011) J. Comput. Chem. vol 32 pp. 1320–1333 doi 10.1002/jcc.21712
  30. Jarzynski (1997) Phys. Rev. E vol 56 pp. 5018–5035 doi 10.1103/PhysRevE.56.5018
  31. Bennett (1976) J. Comput. Phys. vol 22 pp. 245–268 doi 10.1016/0021-9991(76)90078-4
  32. Bombarda et al. (2010) J. Phys. Chem. B vol 114 pp. 1994–2003 doi 10.1021/jp908926w.
  33. Bashford and Gerwert (1992) J. Mol. Biol. vol 224 pp. 473–86 doi 10.1016/0022-2836(92)91009-E
  34. Spassov et al. (2001) J. Mol. Biol. vol 312 pp. 203–19 doi 10.1006/jmbi.2001.4902
  35. Ullmann et al. (2011) J. Phys. Chem. B vol 115 pp. 10346–59 doi 10.1021/jp204644h
  36. Wanlei Wei, Hervé Hogues, and Traian Sulea (2023) J. Chem. Inf. Model. vol 63, iss 16, pp. 5169–5181