Resolution by Proxy

Content
Description: Calculates structure resolution using coordinate data only
Contact
Research center: University of Alberta
Laboratory: David S. Wishart
Primary citation: [1]
Access
Data format: Input: X-ray or NMR coordinates (PDB format); Output: estimated resolution in Angstroms
Website: http://www.resprox.ca; http://www.resprox.ca/download.html
Miscellaneous
Data release frequency: Every 1-2 years with periodic corrections and updates
Curation policy: Manually curated

Resolution by Proxy (ResProx) is a method for assessing the equivalent X-ray resolution of NMR-derived protein structures. ResProx calculates resolution from coordinate data rather than from electron density or other experimental inputs. This makes it possible to calculate the resolution of a structure regardless of how it was solved (X-ray, NMR, EM, modeling, ab initio prediction). ResProx was originally designed as a simple, single-number evaluation that allows straightforward comparison between the quality or resolution of X-ray structures and the quality of a given NMR structure. However, it can also be used to assess the reliability of an experimentally reported X-ray structure resolution, to evaluate protein structures solved by unconventional or hybrid means, and to identify fraudulent structures deposited in the PDB. [1]

ResProx incorporates more than 25 different structural features to determine a single resolution-like value, which is reported in Angstroms. Tests on thousands of X-ray structures show that ResProx values closely match the resolution values reported by X-ray crystallographers. [1] Resolution-by-proxy values can be calculated for newly determined protein structures using a freely accessible ResProx web server. [1] This server accepts protein coordinate data (in PDB format) and generates a resolution estimate (in Angstroms) for the input structure.

Background and Rationale

In X-ray crystallography, resolution is a measure of the resolvability or precision in the electron density map of a molecule. Resolution is usually reported in Angstroms (Å, 10^-10 meters) for X-ray crystal structures. The smaller the number, the better the degree of atomic resolution. In protein X-ray crystallography the best resolution typically attainable is about 1 Å. This level of resolution allows individual hydrogen atoms to be visualized and heavy atoms (C, O, N) to be very accurately mapped. Most protein structures solved today have a resolution of 1.5 to 2.5 Å, which means the hydrogen atoms are not visible and there is some uncertainty in the precise location of the heavy atoms. Protein structures with a resolution of >2.5 Å generally have a number of coordinate inaccuracies as well as other structural problems. When the resolution is greater than 3.5 Å, there is often considerable uncertainty in both the atom locations and even the identity of individual amino acid residues. In other words, the resolution value is inversely correlated with structure quality (i.e. higher numbers mean poorer structures).

This trend in protein structure quality for X-ray resolution closely matches the trend seen in the quality of NMR-determined protein structures. Some NMR structures have large numbers of constraints (NOEs, H-bonds, J-couplings, dipolar couplings), excellent geometry, high structure quality and very tight ensembles with excellent atomic precision (RMSDs < 1 Å). Other NMR structures have very few constraints, poor geometry or poor structure quality and very loose ensembles (RMSDs > 3 Å). However, there is no simple mapping between NMR RMSD values and X-ray resolution values. That is, an NMR ensemble with 1 Å RMSD does not correspond in quality or precision to an X-ray structure with 1 Å resolution. This is because the RMSD measure is a function of both the number of structures used in the ensemble and the selection bias of the spectroscopist who deposits the structural ensemble. Likewise, in NMR it is possible to generate high-quality, precisely determined protein structures using relatively few, well-chosen constraints. It is also possible to generate very low-quality NMR structures from large numbers of carelessly assessed, mistaken or mis-assigned constraints.
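The dependence of ensemble RMSD on which models are deposited can be illustrated with a small sketch. It is not part of ResProx: the function names and the toy coordinates are hypothetical, and the models are assumed to be already superimposed (no fitting step is performed).

```python
from itertools import combinations
from math import sqrt


def rmsd(model_a, model_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the two models are already superimposed.
    """
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("models must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(model_a, model_b)
        for p, q in zip(atom_a, atom_b)
    )
    return sqrt(squared / len(model_a))


def mean_pairwise_rmsd(models):
    """Average RMSD over all pairs of models in an ensemble."""
    pairs = list(combinations(models, 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy two-atom "structures" (made-up coordinates, in Angstroms).
tight_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
tight_b = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]
outlier = [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

full_ensemble = [tight_a, tight_b, outlier]
trimmed_ensemble = [tight_a, tight_b]  # the outlier model is simply not deposited

full_rmsd = mean_pairwise_rmsd(full_ensemble)
trimmed_rmsd = mean_pairwise_rmsd(trimmed_ensemble)
print(f"full: {full_rmsd:.3f} A, trimmed: {trimmed_rmsd:.3f} A")
```

Dropping a single divergent model lowers the reported RMSD sharply even though the underlying data are unchanged, which is why RMSD alone is a poor stand-in for resolution.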

Over the past 20 years several methods have been proposed to calculate “equivalent resolution” using only coordinate data (rather than X-ray diffraction data). Some, such as Procheck-NMR, [2] were designed specifically for evaluating NMR structures, while others, such as MolProbity [3] and RosettaHoles2, [4] were designed more for structure quality evaluation and validation of X-ray structures. However, these methods rely on a relatively small number of protein structure quality measures to predict resolution (4, 3, and 1 measures, respectively), and consequently the correlation between observed (X-ray) resolution and predicted resolution is not particularly good. By expanding the set of structure features to include the distribution of torsion angles, the presence of atom clashes, the normality of hydrogen bonding, the numbers of bond length and bond angle violations, the presence of cavities, residue-specific packing volumes, packing efficiency and threading energies, it is possible to improve this correlation quite substantially.

The ResProx Algorithm

ResProx uses a collection of 25 different protein structure features (such as torsion angle distributions, hydrogen bonding, packing volume, cavities and MolProbity measures). These features were used in a Support Vector Regression (SVR) method to maximize the correlation between the predicted resolution and the observed X-ray resolution on a set of 2400 protein structures with known X-ray resolution. The exact details of the algorithm are provided in a paper published by Wishart and colleagues. [1] After training and appropriate validation on independent test sets, this SVR model is able to estimate the resolution of solved X-ray structures with a correlation coefficient of 0.92 and a mean absolute error of 0.28 Å. This is about 15-30% better than existing methods (Figure 1).

Because the performance of the ResProx method is so high and because it only needs coordinate data to generate an estimate of the equivalent X-ray resolution, it is well suited to NMR structures. When NMR structures are analyzed by ResProx, the average NMR structure has an equivalent X-ray resolution of 2.8 Å, which is relatively poor (Figure 2). This is in agreement with qualitative observations regarding the overall quality and precision of NMR structures. As seen in Figure 2, a very small number of NMR structures exhibit a resolution equivalent to < 1.0 Å, but these are rare.
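The two agreement statistics quoted for the model, a correlation coefficient and a mean absolute error, can be computed as in the following minimal sketch. It only illustrates the metrics, not the SVR model itself; the function names and the resolution values are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical values in Angstroms: reported X-ray resolution versus a
# coordinate-based estimate for five structures (not real ResProx output).
observed = [1.2, 1.8, 2.0, 2.5, 3.1]
predicted = [1.4, 1.7, 2.3, 2.4, 2.9]

r = pearson_r(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(f"r = {r:.2f}, MAE = {mae:.2f} A")
```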

Figure 1. Performance of ResProx against training and testing data.

Figure 2. Histogram of ResProx equivalent resolution for NMR models and experimental resolution for X-ray structures. 500 NMR ensembles and 500 X-ray structures were randomly selected from the PDB. Proteins were grouped in 0.25 Å resolution bins. Resolution values on the X-axis indicate the upper limit of each resolution bin. Values for NMR structures and X-ray structures represent the number of structures in each resolution bin.
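The binning described in the Figure 2 caption (0.25 Å bins labeled by their upper limit) can be sketched as follows. The function names and input values are hypothetical, and the sketch assumes that a value lying exactly on a boundary belongs to the bin it closes, which the caption does not specify.

```python
from collections import Counter
from math import ceil

BIN_WIDTH = 0.25  # Angstroms, as stated in the caption


def bin_upper_limit(resolution, width=BIN_WIDTH):
    """Return the upper limit of the bin that contains `resolution`.

    Assumption: a value exactly on a boundary (e.g. 2.00) is placed in
    the bin whose upper limit is that boundary.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    # The inner round() guards against floating-point noise before ceil().
    steps = ceil(round(resolution / width, 9))
    return round(steps * width, 2)


def histogram(resolutions, width=BIN_WIDTH):
    """Count structures per bin, keyed by bin upper limit, in ascending order."""
    counts = Counter(bin_upper_limit(r, width) for r in resolutions)
    return dict(sorted(counts.items()))


# Made-up resolution values in Angstroms.
example = [1.8, 2.0, 2.1, 2.8]
print(histogram(example))
```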

The ResProx Server

The ResProx web server is a freely accessible server that accepts NMR protein coordinate data (in PDB format) and generates a resolution estimate (in Angstroms) for that NMR structure. A downloadable version of ResProx is also available. ResProx also provides a list of 50834 protein structures with PDB identifiers along with their observed resolution and corresponding ResProx values.
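As a rough illustration of the input involved, the sketch below reads atom coordinates and, where present, the reported resolution from PDB-format text. It relies on the fixed-column layout of ATOM/HETATM records and on the REMARK 2 resolution line of the PDB format. It is not the ResProx code and does not contact the server; `parse_pdb` and the helper `_atom_line` are hypothetical names, and the sample record is made up.

```python
def parse_pdb(lines):
    """Return (reported_resolution_or_None, [(atom_name, x, y, z), ...])."""
    resolution = None
    atoms = []
    for line in lines:
        fields = line.split()
        if fields[:3] == ["REMARK", "2", "RESOLUTION."]:
            # X-ray entries carry a number here; NMR entries usually
            # read "NOT APPLICABLE.", which is reported as None.
            try:
                resolution = float(fields[3])
            except (IndexError, ValueError):
                resolution = None
        elif line.startswith(("ATOM  ", "HETATM")):
            # Fixed columns (1-based): name 13-16, x 31-38, y 39-46, z 47-54.
            name = line[12:16].strip()
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            atoms.append((name, x, y, z))
    return resolution, atoms


def _atom_line(serial, name, res, chain, resseq, x, y, z):
    """Build a minimal ATOM record with coordinates in the standard columns."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


# Made-up sample input for illustration only.
sample = [
    "REMARK   2 RESOLUTION.    1.80 ANGSTROMS.",
    _atom_line(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614),
    _atom_line(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842),
]

reported, atoms = parse_pdb(sample)
print(reported, len(atoms))
```

A reported value obtained this way could then be compared with a coordinate-based estimate for the same entry.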

References

  1. Berjanskii M, Zhou J, Liang Y, Li G, Wishart DS (July 2012). "Resolution-by-proxy: a simple measure for assessing and comparing the overall quality of NMR protein structures". J Biomol NMR. 53 (3): 167–180. doi:10.1007/s10858-012-9637-2. PMID 22678091. S2CID 43740468.
  2. Laskowski RA, Rullmann JA, MacArthur MW, Kaptein R, Thornton JM (1996). "AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR". J Biomol NMR. 8 (4): 477–486. doi:10.1007/bf00228148. PMID 9008363. S2CID 45664105.
  3. Chen VB, Arendall WB 3rd, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC (2010). "MolProbity: all-atom structure validation for macromolecular crystallography". Acta Crystallogr D. 66 (Pt 1): 12–21. doi:10.1107/S0907444909042073. PMC 2803126. PMID 20057044.
  4. Sheffler W, Baker D (2010). "RosettaHoles2: a volumetric packing measure for protein structure refinement and validation". Protein Sci. 19 (10): 1991–1995. doi:10.1002/pro.458. PMC 2998733. PMID 20665689.