Relative accessible surface area

Relative accessible surface area or relative solvent accessibility (RSA) of a protein residue is a measure of residue solvent exposure. It is calculated as

$$\mathrm{RSA} = \frac{\mathrm{ASA}}{\mathrm{MaxASA}}$$ [1]

where ASA is the solvent accessible surface area and MaxASA is the maximum possible solvent accessible surface area for the residue. [1] Both ASA and MaxASA are commonly measured in Å².
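As a minimal sketch of this ratio, the Python snippet below divides an observed ASA by a MaxASA value. The function name and the observed ASA of 64.5 Å² are hypothetical and chosen for illustration only; the MaxASA of 129.0 Å² is the theoretical value for alanine from Tien et al. 2013. [1]

```python
def relative_solvent_accessibility(asa: float, max_asa: float) -> float:
    """Return RSA = ASA / MaxASA; both areas must use the same unit (e.g. Å²)."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    if asa < 0:
        raise ValueError("asa must be non-negative")
    return asa / max_asa


# Hypothetical alanine residue with an observed ASA of 64.5 Å², normalized
# by the theoretical MaxASA for alanine (129.0 Å², Tien et al. 2013).
rsa_ala = relative_solvent_accessibility(64.5, 129.0)
print(f"RSA = {rsa_ala:.2f}")  # prints "RSA = 0.50"
```

Because RSA is a ratio of two areas, it is dimensionless, and a value near 0 indicates a buried residue while a value near 1 indicates a residue that is about as exposed as the chosen MaxASA scale allows.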

To measure the relative solvent accessibility of the residue side-chain only, one usually takes MaxASA values that have been obtained from Gly-X-Gly tripeptides, where X is the residue of interest. Several MaxASA scales have been published [1] [2] [3] and are commonly used (see Table).

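MaxASA values (in Å²)

| Residue | Tien et al. 2013 (theor.) [1] | Tien et al. 2013 (emp.) [1] | Miller et al. 1987 [2] | Rose et al. 1985 [3] |
|---|---|---|---|---|
| Alanine | 129.0 | 121.0 | 113.0 | 118.1 |
| Arginine | 274.0 | 265.0 | 241.0 | 256.0 |
| Asparagine | 195.0 | 187.0 | 158.0 | 165.5 |
| Aspartate | 193.0 | 187.0 | 151.0 | 158.7 |
| Cysteine | 167.0 | 148.0 | 140.0 | 146.1 |
| Glutamate | 223.0 | 214.0 | 183.0 | 186.2 |
| Glutamine | 225.0 | 214.0 | 189.0 | 193.2 |
| Glycine | 104.0 | 97.0 | 85.0 | 88.1 |
| Histidine | 224.0 | 216.0 | 194.0 | 202.5 |
| Isoleucine | 197.0 | 195.0 | 182.0 | 181.0 |
| Leucine | 201.0 | 191.0 | 180.0 | 193.1 |
| Lysine | 236.0 | 230.0 | 211.0 | 225.8 |
| Methionine | 224.0 | 203.0 | 204.0 | 203.4 |
| Phenylalanine | 240.0 | 228.0 | 218.0 | 222.8 |
| Proline | 159.0 | 154.0 | 143.0 | 146.8 |
| Serine | 155.0 | 143.0 | 122.0 | 129.8 |
| Threonine | 172.0 | 163.0 | 146.0 | 152.5 |
| Tryptophan | 285.0 | 264.0 | 259.0 | 266.3 |
| Tyrosine | 263.0 | 255.0 | 229.0 | 236.8 |
| Valine | 174.0 | 165.0 | 160.0 | 164.5 |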

In this table, the more recently published MaxASA values (from Tien et al. 2013 [1] ) are systematically larger than the older values (from Miller et al. 1987 [2] or Rose et al. 1985 [3] ). This discrepancy can be traced back to the conformation in which the Gly-X-Gly tripeptides are evaluated to calculate MaxASA. The earlier works used the extended conformation, with backbone angles of φ = −120° and ψ = 140°. [2] [3] However, Tien et al. 2013 [1] demonstrated that tripeptides in the extended conformation fall among the least-exposed conformations. The largest ASA values are consistently observed in alpha helices, with backbone angles around φ = −50° and ψ = −45°. Tien et al. 2013 recommend using their theoretical MaxASA values (2nd column in Table), as they were obtained from a systematic enumeration of all possible conformations and likely represent a true upper bound to observable ASA. [1]
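The practical effect of the choice of scale can be illustrated with a short Python sketch. The MaxASA values below are copied from the table for three residues only; the dictionary layout, the function name, and the observed ASA of 120.0 Å² for an alanine are assumptions made for illustration and do not come from any of the cited works.

```python
# MaxASA values in Å², copied from the table for three residues.
# The scale keys are hypothetical labels for the four table columns.
MAX_ASA = {
    "tien_theoretical": {"ALA": 129.0, "GLY": 104.0, "TRP": 285.0},
    "tien_empirical": {"ALA": 121.0, "GLY": 97.0, "TRP": 264.0},
    "miller_1987": {"ALA": 113.0, "GLY": 85.0, "TRP": 259.0},
    "rose_1985": {"ALA": 118.1, "GLY": 88.1, "TRP": 266.3},
}


def rsa_by_scale(residue: str, asa: float) -> dict[str, float]:
    """Return RSA = ASA / MaxASA for one residue under every scale."""
    return {scale: asa / values[residue] for scale, values in MAX_ASA.items()}


# Hypothetical observation: a highly exposed alanine with ASA = 120.0 Å².
result = rsa_by_scale("ALA", 120.0)
for scale, rsa in result.items():
    note = " (exceeds 1)" if rsa > 1.0 else ""
    print(f"{scale:17s} RSA = {rsa:.3f}{note}")
```

For this hypothetical residue, the two older scales give RSA values above 1, whereas both Tien et al. scales keep the value below 1. This is the kind of outcome that a MaxASA acting as a true upper bound is meant to avoid.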

ASA and hence RSA values are generally calculated from a protein structure, for example with the software DSSP. [4] However, there is also an extensive literature attempting to predict RSA values from sequence data, using machine-learning approaches. [5] [6]

Prediction tools

Determining RSA experimentally is expensive and time-consuming. In recent decades, several computational methods have been introduced for RSA prediction. [7] [8] [9]

References

  1. Tien, M. Z.; Meyer, A. G.; Sydykova, D. K.; Spielman, S. J.; Wilke, C. O. (2013). "Maximum allowed solvent accessibilites of residues in proteins". PLOS ONE. 8 (11): e80635. arXiv:1211.4251. Bibcode:2013PLoSO...880635T. doi:10.1371/journal.pone.0080635. PMC 3836772. PMID 24278298.
  2. Miller, S.; Janin, J.; Lesk, A. M.; Chothia, C. (1987). "Interior and surface of monomeric proteins". J. Mol. Biol. 196 (3): 641–656. doi:10.1016/0022-2836(87)90038-6. PMID 3681970.
  3. Rose, G. D.; Geselowitz, A. R.; Lesser, G. J.; Lee, R. H.; Zehfus, M. H. (1985). "Hydrophobicity of amino acid residues in globular proteins". Science. 229 (4716): 834–838. Bibcode:1985Sci...229..834R. doi:10.1126/science.4023714. PMID 4023714. S2CID 22227053.
  4. Kabsch, W.; Sander, C. (1983). "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features". Biopolymers. 22 (12): 2577–2637. doi:10.1002/bip.360221211. PMID 6667333. S2CID 29185760.
  5. Kim, Hyunsoo; Park, Haesun (2003). "Prediction of Protein Relative Solvent Accessibility with Support Vector Machines and Long-range Interaction 3D Local Descriptor" (PDF). Retrieved 10 April 2015.
  6. Rost, Burkhard; Sander, Chris (1994). "Conservation and prediction of solvent accessibility in protein families". Proteins. 20 (3): 216–226. doi:10.1002/prot.340200303. PMID 7892171. S2CID 19285647. Retrieved 10 April 2015.
  7. Kaleel, Manaz; Torrisi, Mirko; Mooney, Catherine; Pollastri, Gianluca (2019-09-01). "PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning". Amino Acids. 51 (9): 1289–1296. doi:10.1007/s00726-019-02767-6. hdl:10197/11324. ISSN 1438-2199. PMID 31388850. S2CID 199469523.
  8. Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo (2016-07-08). "RaptorX-Property: a web server for protein structure property prediction". Nucleic Acids Research. 44 (W1): W430–W435. doi:10.1093/nar/gkw306. ISSN 0305-1048. PMC 4987890. PMID 27112573.
  9. Magnan, Christophe N.; Baldi, Pierre (2014-09-15). "SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity". Bioinformatics. 30 (18): 2592–2597. doi:10.1093/bioinformatics/btu352. ISSN 1367-4803. PMC 4215083. PMID 24860169.