Half sphere exposure

Half sphere exposure (HSE) construction. This simple, two-dimensional measure of solvent exposure counts the number of neighbors in two domes (with radius R typically equal to 10 or 12 Å) around the Cα atom. It is simple and extremely fast to compute, and superior to the widely used contact number measure. The HSE value pair (up and down) of the example above is (3,5).

Half sphere exposure (HSE) is a protein solvent exposure measure first introduced by Hamelryck (2005). [1] Like all solvent exposure measures, it quantifies how buried amino acid residues are in a protein. It is found by counting the number of amino acid neighbors within two half spheres of chosen radius around the residue's Cα atom. HSE is calculated by dividing a contact number (CN) sphere into two halves with the plane that passes through the Cα atom and is perpendicular to the Cα-Cβ vector. This simple division of the CN sphere yields two strikingly different measures, HSE-up and HSE-down. HSE-up is the number of Cα atoms in the upper half sphere (the one containing the Cβ atom), and HSE-down is analogously the number of Cα atoms in the opposite half sphere.
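The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `half_sphere_exposure`, the coordinate format (tuples of x, y, z in Å), and the default radius of 12 Å are assumptions made for the example, as is the choice to count an atom lying exactly on the dividing plane as "up".

```python
import math

def half_sphere_exposure(ca_coords, cb_coords, index, radius=12.0):
    """Return (hse_up, hse_down) for the residue at `index`.

    ca_coords, cb_coords: lists of (x, y, z) tuples, one per residue.
    radius: sphere radius in angstroms (assumed default for this sketch).
    """
    ca = ca_coords[index]
    cb = cb_coords[index]
    # The C-alpha -> C-beta vector defines the "up" direction.
    up = tuple(b - a for a, b in zip(ca, cb))

    hse_up = 0
    hse_down = 0
    for j, other in enumerate(ca_coords):
        if j == index:
            continue
        # Skip C-alpha atoms outside the contact number sphere.
        if math.dist(ca, other) > radius:
            continue
        # The sign of the dot product tells which half sphere the atom is in.
        diff = tuple(o - a for a, o in zip(ca, other))
        if sum(u * d for u, d in zip(up, diff)) >= 0:
            hse_up += 1   # assumption: atoms on the plane count as "up"
        else:
            hse_down += 1
    return hse_up, hse_down
```

Calling the function once per residue gives the full HSE profile of a chain; the cost is quadratic in the number of residues in this naive form.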

If only Cα atoms are available (as is the case for many simplified representations of protein structure), a related measure called HSEα can be used. HSEα uses a pseudo-Cβ atom (pCβ) instead of the real Cβ atom. The position of pCβ is derived from the positions of the preceding (Cα−1) and following (Cα+1) Cα atoms: the Cα-pCβ vector is the sum of the vector from Cα−1 to Cα0 and the vector from Cα+1 to Cα0, where Cα0 is the Cα atom of the residue itself.
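The pseudo-Cβ construction amounts to a single vector sum, sketched below. The function name `pseudo_cb_direction` and the tuple-based coordinate format are assumptions made for illustration. Only the direction of the result matters for splitting the sphere, so the vector is left unnormalised here.

```python
def pseudo_cb_direction(ca_prev, ca_curr, ca_next):
    """Return the C-alpha -> pseudo-C-beta direction vector.

    Each argument is an (x, y, z) tuple for the C-alpha atom of the
    preceding, current, and following residue, respectively.
    """
    # Vector pointing from the preceding C-alpha to the current one.
    from_prev = tuple(c - p for p, c in zip(ca_prev, ca_curr))
    # Vector pointing from the following C-alpha to the current one.
    from_next = tuple(c - n for n, c in zip(ca_next, ca_curr))
    # Their sum points away from both neighbours, roughly where the
    # side chain would be.
    return tuple(a + b for a, b in zip(from_prev, from_next))
```

Because the construction needs both neighbours, the first and last residues of a chain have no pseudo-Cβ under this definition.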

HSE is used in predicting discontinuous B-cell epitopes. [2] Song et al. developed an online web server, HSEpred, that predicts half sphere exposure from protein primary sequences. [3] When evaluated on a non-homologous protein structure dataset, HSEpred achieves correlation coefficients of 0.72 and 0.68 between predicted and observed HSE-up and HSE-down values, respectively. Residue contact number (CN) can also be predicted by HSEpred as the sum of the predicted HSE-up and HSE-down values, which broadens the applications of this solvent exposure measure.
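The two quantities mentioned here, a contact number obtained by summing the two HSE values and a correlation coefficient between predicted and observed values, can be illustrated as follows. This sketch assumes the Pearson correlation coefficient is meant and uses hypothetical helper names; it is not taken from HSEpred.

```python
import math

def contact_number(hse_up, hse_down):
    """Contact number as the sum of the two half sphere counts."""
    return hse_up + hse_down

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, `contact_number(3, 5)` gives 8 for the HSE pair (3,5), and `pearson` would be applied to per-residue lists of predicted and observed HSE-up (or HSE-down) values.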

Heffernan et al. (2016) developed a highly accurate predictor for both HSEα and HSEβ (HSE calculated with the real Cβ atom), trained on a large dataset using multiple-step iterative deep neural-network learning. [4] The predicted HSEα shows a higher correlation with the stability change upon residue mutation than predicted HSEβ or accessible surface area (ASA). These results, together with its simple Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction.

References

  1. Hamelryck, T. (2005), "An amino acid has two sides: A new 2D measure provides a different view of solvent exposure", Proteins: Structure, Function, and Bioinformatics, 59 (1): 38–48, CiteSeerX 10.1.1.516.4528, doi:10.1002/prot.20379, PMID 15688434, S2CID 10631851.
  2. Sweredoski, Michael J.; Baldi, Pierre (2008), "PEPITO: Improved discontinuous B-cell epitope prediction using multiple distance thresholds and Half Sphere Exposure", Bioinformatics, 24 (12): 1459–1460, doi:10.1093/bioinformatics/btn199, PMID 18443018.
  3. Song, J.; Tan, H.; Takemoto, K.; Akutsu, T. (2008), "HSEpred: predict half-sphere exposure from protein sequences", Bioinformatics, 24 (13): 1489–1497, doi:10.1093/bioinformatics/btn222, PMID 18467349.
  4. Heffernan, Rhys; et al. (2016), "Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins", Bioinformatics, 32 (6): 843–849, doi:10.1093/bioinformatics/btv665, PMID 26568622, S2CID 22034498.