Residue depth


Residue depth (RD) is a solvent exposure measure that describes to what extent a residue is buried in the protein structure space. [1] [2] [3] It complements the information provided by conventional accessible surface area (ASA).
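As a rough illustration of the idea, the sketch below takes the depth of an atom to be its distance to the closest point on a precomputed surface, and the depth of a residue to be the mean over its atoms. The surface points, the coordinates and the function names are hypothetical toy inputs; published methods derive the solvent boundary far more carefully, so this is only a sketch of the concept under those assumptions.

```python
import math

def atom_depth(atom, surface_points):
    """Distance (in Å) from one atom to the nearest surface point."""
    return min(math.dist(atom, p) for p in surface_points)

def residue_depth(atoms, surface_points):
    """Mean atom depth over all atoms of a residue."""
    return sum(atom_depth(a, surface_points) for a in atoms) / len(atoms)

# Hypothetical toy data: a flat "surface" patch at z = 0 and two residues below it.
surface = [(float(x), float(y), 0.0) for x in range(-2, 3) for y in range(-2, 3)]
exposed_residue = [(0.0, 0.0, -1.0), (1.0, 0.0, -2.0)]
buried_residue = [(0.0, 0.0, -6.0), (1.0, 0.0, -8.0)]

print(residue_depth(exposed_residue, surface))
print(residue_depth(buried_residue, surface))
```

The residue further below the toy surface gets the larger value, which is the sense in which depth distinguishes buried residues that ASA alone would all report as having zero exposure.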

Predictions of whether a residue is exposed or buried are used in a wide variety of protein structure prediction engines. Such predictions can provide valuable information for protein fold recognition, functional residue prediction and protein drug design. Several biophysical properties of proteins have been shown to correlate with residue depth, including mutant protein stability, protein-protein interface hot spots, the H/D exchange rate of residues, and residue conservation.

Residue depth has also been used to predict small-molecule binding sites on proteins, with accuracy statistically on par with other conventional methods. [4] The method has the advantage of being simple and intuitive, and it has been reported to detect unconventional flat binding sites.

To date, several approaches have been proposed to predict RD values from protein sequences. Yuan and Wang proposed a computational framework that uses the sequential evolutionary information contained in PSI-BLAST profiles, together with global protein size information, to quantify the relationship between RD and protein sequence. [5] Zhang et al. proposed the RDpred method, which predicts RD values from predicted secondary structure, residue position and the PSI-BLAST profile. [6] More recently, Song et al. described another sequence-based method that also uses support vector regression to quantify the RD-sequence relationship. [7] Their web server, Prodepth, was developed to facilitate RD prediction for sequences submitted by users. In addition, the Prodepth server can predict the solvent-accessible surface area (ASA) of each residue in a submitted sequence. Based on the predicted ASA and RD values, it further outputs a two-state solvent accessibility prediction that classifies each residue as exposed or buried.
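The final two-state step can be pictured with a minimal sketch like the one below. The rule and both cutoffs are hypothetical placeholders chosen for illustration; the text above does not state how Prodepth combines the two predicted values, so this should not be read as its actual decision rule.

```python
def two_state(asa, rd, asa_cutoff=25.0, rd_cutoff=4.0):
    """Label a residue from predicted ASA (Å^2) and predicted RD (Å).

    Hypothetical rule: a residue counts as exposed only if it has enough
    accessible area AND lies close to the surface. Both cutoffs are
    illustrative assumptions, not values taken from Prodepth.
    """
    if asa >= asa_cutoff and rd <= rd_cutoff:
        return "exposed"
    return "buried"

# Hypothetical per-residue predictions: (residue, ASA, RD).
predictions = [("K12", 85.0, 3.2), ("L45", 2.0, 8.9), ("S77", 30.0, 6.5)]
for name, asa, rd in predictions:
    print(name, two_state(asa, rd))
```

Using both values lets a residue with moderate ASA but large depth (the third example) be labelled buried, which a cutoff on ASA alone would miss.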

Residue depth has been used in several applications. One of these is predicting the pKa of an ionizable group. [8] The pKa equation is a linear combination of a few features, including depth, number of hydrogen bonds, electrostatic energy and solvent-accessible surface area. Among these features, depth makes a major contribution.
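A linear combination of this kind can be sketched as follows. The weights, the intercept and the feature values are all hypothetical numbers picked for illustration, and the feature names are assumptions; the fitted coefficients of the published model are not given here.

```python
def predict_pka(features, weights, intercept):
    """Linear model: intercept plus the weighted sum of the features."""
    return intercept + sum(weights[name] * value for name, value in features.items())

def contributions(features, weights):
    """Per-feature terms of the weighted sum, to see which one dominates."""
    return {name: weights[name] * value for name, value in features.items()}

# Hypothetical weights and intercept (NOT the published coefficients).
weights = {"depth": 0.30, "n_hbonds": -0.20, "electrostatic_energy": 0.10, "sasa": -0.01}
intercept = 4.0

# Hypothetical feature values for one ionizable residue.
residue = {"depth": 6.0, "n_hbonds": 2, "electrostatic_energy": -1.5, "sasa": 20.0}

print(round(predict_pka(residue, weights, intercept), 2))
print(contributions(residue, weights))
```

Inspecting the per-feature terms is one simple way to see what "a major contribution" means in such a model: with these made-up numbers the depth term is the largest in magnitude.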

Related Research Articles

Protein secondary structure: General three-dimensional form of local segments of proteins

Protein secondary structure is the three-dimensional form of local segments of proteins. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically form spontaneously as an intermediate before the protein folds into its three-dimensional tertiary structure.

Structural alignment: Aligning molecular sequences using sequence and structural information

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

Binding site: Chemical bonding

In biochemistry and molecular biology, a binding site is a region on a macromolecule such as a protein that binds to another molecule with specificity. The binding partner of the macromolecule is often referred to as a ligand. Ligands may include other proteins, enzyme substrates, second messengers, hormones, or allosteric modulators. The binding event is often, but not always, accompanied by a conformational change that alters the protein's function. Binding to protein binding sites is most often reversible, but can also be covalent reversible or irreversible.

DNA-binding protein: Proteins that bind with DNA, such as transcription factors, polymerases, nucleases and histones

DNA-binding proteins are proteins that have DNA-binding domains and thus have a specific or general affinity for single- or double-stranded DNA. Sequence-specific DNA-binding proteins generally interact with the major groove of B-DNA, because it exposes more functional groups that identify a base pair. However, there are some known minor groove DNA-binding ligands such as netropsin, distamycin, Hoechst 33258, pentamidine, DAPI and others.

Docking (molecular)

In the field of molecular modeling, docking is a method which predicts the preferred orientation of one molecule to a second when bound to each other to form a stable complex. Knowledge of the preferred orientation in turn may be used to predict the strength of association or binding affinity between two molecules using, for example, scoring functions.

Protein contact map

A protein contact map represents the distance between all possible amino acid residue pairs of a three-dimensional protein structure using a binary two-dimensional matrix. For two residues i and j, the ij element of the matrix is 1 if the two residues are closer than a predetermined threshold, and 0 otherwise. Various contact definitions have been proposed: the distance between Cα atoms with a threshold of 6-12 Å; the distance between Cβ atoms with a threshold of 6-12 Å; and the distance between the side-chain centers of mass.
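The definition above translates almost directly into code. The sketch below assumes the Cα-based definition with an 8 Å threshold (one value inside the 6-12 Å range); the coordinates and the function name are hypothetical.

```python
import math

def contact_map(ca_coords, threshold=8.0):
    """Binary matrix: entry [i][j] is 1 if the Cα atoms of residues i and j
    are closer than `threshold` (in Å), and 0 otherwise."""
    n = len(ca_coords)
    return [
        [1 if math.dist(ca_coords[i], ca_coords[j]) < threshold else 0 for j in range(n)]
        for i in range(n)
    ]

# Hypothetical Cα trace: four residues on a straight line, 3.8 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(coords):
    print(row)
```

The matrix is symmetric, and the diagonal is 1 because every residue is at distance zero from itself; many analyses ignore the diagonal and near-diagonal entries for that reason.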

Solvent exposure occurs when a chemical, material, or person comes into contact with a solvent. Chemicals can be dissolved in solvents, materials such as polymers can be broken down chemically by solvents, and people can develop certain ailments from exposure to solvents both organic and inorganic.

Accessible surface area

The accessible surface area (ASA) or solvent-accessible surface area (SASA) is the surface area of a biomolecule that is accessible to a solvent. Measurement of ASA is usually described in units of square Ångstroms. ASA was first described by Lee & Richards in 1971 and is sometimes called the Lee-Richards molecular surface. ASA is typically calculated using the 'rolling ball' algorithm developed by Shrake & Rupley in 1973. This algorithm uses a sphere of a particular radius to 'probe' the surface of the molecule.
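The probe-sphere idea can be sketched in a few lines. This is a simplified, unoptimized illustration of the Shrake & Rupley point-counting approach: test points are placed on each atom's sphere, expanded by the probe radius, and the points not buried inside any neighbouring expanded sphere are counted. The probe radius of 1.4 Å, the number of test points, the golden-spiral point layout and the atom radii are assumptions chosen for the example.

```python
import math

def sphere_points(n):
    """Roughly evenly spread points on a unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

def shrake_rupley(atoms, probe=1.4, n_points=200):
    """atoms: list of (x, y, z, radius) in Å. Returns per-atom ASA in Å^2."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the surface traced by the probe centre
        free = 0
        for ux, uy, uz in unit:
            p = (xi + big_r * ux, yi + big_r * uy, zi + big_r * uz)
            # A test point is buried if it lies inside any other expanded sphere.
            buried = any(
                math.dist(p, (xj, yj, zj)) < rj + probe
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                free += 1
        areas.append(4.0 * math.pi * big_r * big_r * free / n_points)
    return areas

# Hypothetical input: one isolated atom, then the same atom with a close neighbour.
print(shrake_rupley([(0.0, 0.0, 0.0, 1.8)]))
print(shrake_rupley([(0.0, 0.0, 0.0, 1.8), (2.0, 0.0, 0.0, 1.8)]))
```

For an isolated atom every test point is free, so the result equals the full area of the expanded sphere; adding a neighbour buries the points facing it and lowers the value. Real implementations use neighbour lists to avoid the all-pairs loop.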

Homology modeling: Method of protein structure prediction using other known proteins

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. It has been shown that protein structures are more conserved than protein sequences amongst homologues, but sequences falling below 20% sequence identity can have very different structures.

Implicit solvation is a method to represent solvent as a continuous medium instead of individual “explicit” solvent molecules, most often used in molecular dynamics simulations and in other applications of molecular mechanics. The method is often applied to estimate free energy of solute-solvent interactions in structural and chemical processes, such as folding or conformational transitions of proteins, DNA, RNA, and polysaccharides, association of biological macromolecules with ligands, or transport of drugs across biological membranes.

Structural and physical properties of DNA provide important constraints on the binding sites formed on the surfaces of DNA-binding proteins. Characteristics of such binding sites may be used for predicting DNA-binding sites from the structural and even sequence properties of unbound proteins. This approach has been successfully implemented for predicting protein–protein interfaces, and it has also been adopted for predicting DNA-binding sites in DNA-binding proteins. The first attempt to use sequence and evolutionary features to predict DNA-binding sites in proteins was made by Ahmad et al. (2004) and Ahmad and Sarai (2005). Some methods use structural information to predict DNA-binding sites and therefore require a three-dimensional structure of the protein, while others use only sequence information and do not require a protein structure in order to make a prediction.

Hydrophobicity scales are values that define the relative hydrophobicity or hydrophilicity of amino acid residues. The more positive the value, the more hydrophobic are the amino acids located in that region of the protein. These scales are commonly used to predict the transmembrane alpha-helices of membrane proteins. When the values are measured consecutively along the amino acids of a protein, changes in value indicate the attraction of specific protein regions towards the hydrophobic region inside the lipid bilayer.
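Scanning a sequence in this way is usually done with a sliding-window average. The sketch below uses the Kyte-Doolittle hydropathy values as one example scale; the choice of scale, the window length and the example sequence are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (more positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=3, scale=KYTE_DOOLITTLE):
    """Mean scale value over each window of consecutive residues."""
    values = [scale[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical sequence: a charged stretch followed by a hydrophobic stretch.
profile = hydropathy_profile("KDEKLIVLLA", window=3)
print([round(v, 2) for v in profile])
```

The profile rises as the window moves from the charged residues into the hydrophobic stretch. Transmembrane helix prediction typically uses much longer windows (around 19 residues is a common choice) than the short one used here.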

Short linear motif

In molecular biology short linear motifs (SLiMs), linear motifs or minimotifs are short stretches of protein sequence that mediate protein–protein interaction.

Computational Resources for Drug Discovery (CRDD) is one of the important in silico modules of Open Source for Drug Discovery (OSDD). The CRDD web portal provides computer resources related to drug discovery on a single platform. It provides computational resources for researchers in computer-aided drug design, a discussion forum, and resources to maintain Wikipedia pages related to drug discovery, predict inhibitors, and predict the ADME-Tox properties of molecules. One of the major objectives of CRDD is to promote open source software in the field of chemoinformatics and pharmacoinformatics.

Protein function prediction methods are techniques that bioinformatics researchers use to assign biological or biochemical roles to proteins. These proteins are usually ones that are poorly studied or predicted based on genomic sequence data. These predictions are often driven by data-intensive computational procedures. Information may come from nucleic acid sequence homology, gene expression profiles, protein domain structures, text mining of publications, phylogenetic profiles, phenotypic profiles, and protein-protein interaction. Protein function is a broad term: the roles of proteins range from catalysis of biochemical reactions to transport to signal transduction, and a single protein may play a role in multiple processes or cellular pathways.

RaptorX is a software and web server for protein structure and function prediction that is free for non-commercial use. RaptorX is among the most popular methods for protein structure prediction. Like other remote homology recognition/protein threading techniques, RaptorX is able to regularly generate reliable protein models when the widely used PSI-BLAST cannot. However, RaptorX is also significantly different from those profile-based methods in that RaptorX excels at modeling of protein sequences without a large number of sequence homologs by exploiting structure information. RaptorX Server has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods.

I-TASSER

I-TASSER is a bioinformatics method for predicting three-dimensional structure models of protein molecules from amino acid sequences. It detects structure templates from the Protein Data Bank by a technique called fold recognition. The full-length structure models are constructed by reassembling structural fragments from threading templates using replica exchange Monte Carlo simulations. I-TASSER is one of the most successful protein structure prediction methods in the community-wide CASP experiments.

Volume, Area, Dihedral Angle Reporter (VADAR) is a freely available protein structure validation web server that was developed as a collaboration between Dr. Brian Sykes and Dr. David Wishart at the University of Alberta. VADAR consists of over 15 different algorithms and programs for assessing and validating peptide and protein structures from their PDB coordinate data. VADAR is capable of determining secondary structure, identifying and classifying six different types of beta turns, determining and calculating the strength of C=O -- N-H hydrogen bonds, calculating residue-specific accessible surface areas (ASA), calculating residue volumes, determining backbone and side chain torsion angles, assessing local structure quality, evaluating global structure quality, and identifying residue "outliers". The results have been validated through extensive comparison to published data and careful visual inspection. VADAR produces both text and graphical output with most of the quantitative data presented in easily viewed tables. In particular, VADAR's output is presented in a vertical, tabular format with most of the sequence data, residue numbering and any other calculated property or feature presented from top to bottom, rather than from left to right.

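Relative accessible surface area or relative solvent accessibility (RSA) of a protein residue is a measure of residue solvent exposure. It can be calculated by the formula RSA = ASA / MaxASA, where ASA is the solvent accessible surface area of the residue and MaxASA is the maximum possible solvent accessible surface area for that residue type.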

References

  1. Chakravarty, S.; Varadarajan, R. (1999), "Residue depth: a novel parameter for the analysis of protein structure and stability", Structure, 7 (7): 723–732, doi:10.1016/s0969-2126(99)80097-5, PMID 10425675.
  2. Pintar, A.; Carugo, O.; Pongor, S. (2003), "Atom depth as a descriptor of the protein interior", Biophys. J., 84 (4): 2553–2561, Bibcode:2003BpJ....84.2553P, doi:10.1016/S0006-3495(03)75060-7, PMC 1302821, PMID 12668463.
  3. Pintar, A.; Carugo, O.; Pongor, S. (2003), "Atom depth in protein structure and function", Trends Biochem Sci, 28 (11): 593–597, doi:10.1016/j.tibs.2003.09.004, PMID 14607089.
  4. Tan, K. P.; Varadarajan, R.; Madhusudhan, M. S. (2011), "DEPTH: a web server to compute depth and predict small-molecule binding cavities in proteins", Nucleic Acids Research, 39 (Web Server): W242–W248, doi:10.1093/nar/gkr356, PMC 3125764, PMID 21576233.
  5. Yuan, Z.; Wang, Z.X. (2008), "Quantifying the relationship of protein burying depth and sequence", Proteins, 70 (2): 509–516, doi:10.1002/prot.21545, PMID 17705271, S2CID 31392217.
  6. Zhang, H.; Zhang, T.; Chen, K.; Shen, S.; Ruan, J.; Kurgan, L. (2008), "Sequence based residue depth prediction using evolutionary information and predicted secondary structure", BMC Bioinformatics, 9: 388, doi:10.1186/1471-2105-9-388.
  7. Song, J.; Tan, H.; Mahmood, K.; Law, R.H.; Buckle, A.M.; Webb, G.I.; Akutsu, T.; Whisstock, J.C. (2009), "Prodepth: predict residue depth by support vector regression approach from protein sequences only", PLOS ONE, 4 (9): e7072, Bibcode:2009PLoSO...4.7072S, doi:10.1371/journal.pone.0007072, PMC 2742725, PMID 19759917.
  8. Tan, K.P.; Nguyen, T.B.; Patel, S.; Varadarajan, R.; Madhusudhan, M.S. (2013), "Depth: a web server to compute depth, cavity sizes, detect potential small-molecule ligand-binding cavities and predict the pKa of ionizable residues in proteins", Nucleic Acids Research, 41 (W1): W314–W321, doi:10.1093/nar/gkt503, PMC 3692129, PMID 23766289.