Structure validation

Structure validation concept: model of a protein (each ball is an atom), and magnified region with electron density data and 3 bright flags for problems

Macromolecular structure validation is the process of evaluating the reliability of 3-dimensional atomic models of large biological molecules such as proteins and nucleic acids. These models, which provide 3D coordinates for each atom in the molecule (see example in the image), come from structural biology experiments such as x-ray crystallography [1] or nuclear magnetic resonance (NMR). [2] Validation has three aspects: 1) checking the validity of the thousands to millions of measurements in the experiment; 2) checking how consistent the atomic model is with those experimental data; and 3) checking the consistency of the model with known physical and chemical properties.

Proteins and nucleic acids are the workhorses of biology, providing the necessary chemical reactions, structural organization, growth, mobility, reproduction, and environmental sensitivity. Essential to their biological functions are the detailed 3D structures of the molecules and the changes in those structures. To understand and control those functions, we need accurate knowledge about the models that represent those structures, including their many strong points and their occasional weaknesses.

End-users of macromolecular models include clinicians, teachers and students, as well as the structural biologists themselves, journal editors and referees, experimentalists studying the macromolecules by other techniques, and theoreticians and bioinformaticians studying more general properties of biological molecules. Their interests and requirements vary, but all benefit greatly from a global and local understanding of the reliability of the models.

Historical summary

Macromolecular crystallography was preceded by the older field of small-molecule x-ray crystallography (for structures with fewer than a few hundred atoms). Small-molecule diffraction data extend to much higher resolution than is feasible for macromolecules, and there is a very clean mathematical relationship between the data and the atomic model. The residual, or R-factor, measures the agreement between the experimental data and the values back-calculated from the atomic model. For a well-determined small-molecule structure the R-factor is nearly as small as the uncertainty in the experimental data (well under 5%). That one test by itself therefore provides most of the validation needed, but a number of additional consistency and methodology checks are done by automated software [3] as a requirement for small-molecule crystal structure papers submitted to International Union of Crystallography (IUCr) journals such as Acta Crystallographica section B or C. Atomic coordinates of these small-molecule structures are archived and accessed through the Cambridge Structural Database (CSD) [4] or the Crystallography Open Database (COD). [5]
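As an illustration of the residual described above, the conventional R-factor is the sum of absolute differences between observed and calculated structure-factor amplitudes divided by the sum of observed amplitudes. The following is a minimal sketch, assuming the calculated amplitudes have already been placed on the same scale as the observed ones; the function name and the toy amplitudes are made up for the example.

```python
def r_factor(f_obs, f_calc):
    """Conventional residual: sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_calc is already scaled to f_obs (scaling is not done here).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Toy amplitudes (hypothetical values, not real diffraction data)
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [98.0, 52.0, 24.0, 26.0]
print(round(r_factor(f_obs, f_calc), 3))  # 0.03, i.e. 3%
```

A value of 0.03 corresponds to the "well under 5%" regime mentioned for well-determined small-molecule structures.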

The first macromolecular validation software was developed around 1990, for proteins. It included Rfree cross-validation for model-to-data match, [6] bond length and angle parameters for covalent geometry, [7] and sidechain and backbone conformational criteria. [8] [9] [10] For macromolecular structures, the atomic models are deposited in the Protein Data Bank (PDB), still the single archive of this data. The PDB was established in the 1970s at Brookhaven National Laboratory, [11] moved in 2000 to the RCSB (Research Collaboratory for Structural Bioinformatics) centered at Rutgers, [12] and expanded in 2003 to become the wwPDB (worldwide Protein Data Bank), [13] with access sites added in Europe and Asia, and with NMR data handled at the BioMagResBank (BMRB) in Wisconsin.
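The idea behind Rfree cross-validation can be sketched as follows: a small random subset of reflections (the "free" set) is set aside and never used in refinement, and the residual is then reported separately for the working and free sets. The sketch below is only illustrative. The 5% free fraction, the function names, and the synthetic amplitudes are assumptions for the example, and no refinement is performed, so the two residuals come out similar here, whereas in a real, over-fitted refinement Rfree would be noticeably higher than Rwork.

```python
import random


def residual(f_obs, f_calc, indices):
    """R-type residual restricted to the given reflection indices."""
    numerator = sum(abs(abs(f_obs[i]) - abs(f_calc[i])) for i in indices)
    denominator = sum(abs(f_obs[i]) for i in indices)
    return numerator / denominator


def split_reflections(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflections as the free (test) set."""
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * free_fraction))
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


# Synthetic amplitudes (hypothetical): f_calc deviates from f_obs by up to 10%
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(200)]
f_calc = [f * rng.uniform(0.9, 1.1) for f in f_obs]

work, free = split_reflections(len(f_obs), free_fraction=0.05)
r_work = residual(f_obs, f_calc, work)
r_free = residual(f_obs, f_calc, free)
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```

The essential design point is that the free set is chosen once, before refinement, and kept fixed, so that it remains an unbiased check on the model.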

Validation rapidly became standard in the field, [14] with further developments described below.

A large boost was given to the applicability of comprehensive validation for both x-ray and NMR as of February 1, 2008, when the worldwide Protein Data Bank (wwPDB) made mandatory the deposition of experimental data along with atomic coordinates. Since 2012 strong forms of validation have been in the process of being adopted for wwPDB deposition from recommendations of the wwPDB Validation Task Force committees for x-ray crystallography, [15] for NMR, [16] for SAXS (small-angle x-ray scattering), and for cryoEM (cryo-Electron Microscopy). [17]

Stages of validation

Validation can be broken into three stages: validating the raw data collected (data validation), validating the interpretation of the data into the atomic model (model-to-data validation), and finally validating the model itself. While the first two stages are specific to the technique used, validating the arrangement of atoms in the final model is not.

Model validation

Geometry

[7] [18] [19]

Conformation (dihedrals): protein & RNA

The backbone and side-chain dihedral angles of protein and RNA have been shown to occur only in specific combinations, some of which are allowed and others forbidden. For protein backbone dihedrals (φ, ψ), this is addressed by the well-known Ramachandran plot, while for side-chain dihedrals (χ's), one should refer to the Dunbrack backbone-dependent rotamer library. [20]
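All of these conformational checks start from dihedral angles computed from four consecutive atom positions (for φ: C(i-1), N, Cα, C; for ψ: N, Cα, C, N(i+1)). The following is a minimal sketch of that calculation in plain Python; the helper names and the test coordinates are made up for the example, and real validation software would then compare the resulting angles against reference distributions, which is not shown here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a planar cis arrangement and a planar trans one
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
print(round(abs(cis), 1), round(abs(trans), 1))  # 0.0 180.0
```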

Although mRNA structures are generally short-lived and single-stranded, there is an abundance of non-coding RNAs with varied secondary and tertiary folds (tRNA, rRNA, etc.) that contain a preponderance of canonical Watson-Crick (WC) base pairs together with a significant number of non-Watson-Crick (NWC) base pairs. Such RNAs therefore also qualify for the regular structural validation that applies to nucleic acid helices. The standard practice is to analyse the intra-base-pair (translational: Shear, Stretch, Stagger; rotational: Buckle, Propeller, Opening) and inter-base-pair (translational: Shift, Slide, Rise; rotational: Tilt, Roll, Twist) geometrical parameters and to check whether each is in range or out of range with respect to its suggested values. [21] [22] These parameters describe the relative orientation of the two paired bases in the two strands (intra) and of two stacked base pairs (inter) with respect to each other and hence, together, serve to validate nucleic acid structures in general. Since RNA helices are short (on average 10-20 bp), the use of electrostatic surface potential as a validation parameter [23] has been found to be beneficial, particularly for modelling purposes.

Packing and Electrostatics: globular proteins

For globular proteins, interior atomic packing (arising from short-range, local interactions) of side-chains [24] [25] [26] [27] has been shown to be pivotal in the structural stabilization of the protein fold. The electrostatic harmony (non-local, long-range) of the overall fold [28] has also been shown to be essential for its stabilization. Packing anomalies include steric clashes, [29] short contacts, [27] holes [30] and cavities, [31] while electrostatic disharmony [28] [32] refers to unbalanced partial charges in the protein core (particularly relevant for designed protein interiors). While the clashscore of MolProbity identifies steric clashes at very high resolution, the Complementarity Plot combines packing anomalies with electrostatic imbalance of side-chains and flags either or both.
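A steric-clash check of the kind mentioned above can be sketched as a pairwise test for van der Waals overlap between non-bonded atoms. The example below is a deliberately simplified illustration, not MolProbity's actual all-atom contact analysis: the radii are Bondi-style values assumed for the sketch, the atom list and function names are hypothetical, and the caller must supply the pairs to exclude as bonded. The 0.4 Å overlap cutoff and the "clashes per 1000 atoms" normalization follow the usual definition of a clashscore.

```python
import math
from itertools import combinations

# Illustrative van der Waals radii in angstroms (assumed for this sketch)
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
CLASH_OVERLAP = 0.4  # overlap in angstroms at or above which a contact counts as a clash


def find_clashes(atoms, bonded=frozenset()):
    """Return (i, j, overlap) for non-bonded atom pairs that overlap too much.

    atoms  : list of (element, (x, y, z)) tuples
    bonded : set of frozenset({i, j}) index pairs to skip (covalent neighbours)
    """
    clashes = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        if frozenset((i, j)) in bonded:
            continue
        overlap = VDW_RADII[el_i] + VDW_RADII[el_j] - math.dist(xyz_i, xyz_j)
        if overlap >= CLASH_OVERLAP:
            clashes.append((i, j, overlap))
    return clashes


def clashscore(atoms, bonded=frozenset()):
    """Number of clashes per 1000 atoms."""
    return 1000.0 * len(find_clashes(atoms, bonded)) / len(atoms)


# Hypothetical four-atom example
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),     # covalently bonded to atom 0, excluded below
    ("O", (1.5, 2.5, 0.0)),     # too close to atom 1 for a non-bonded contact
    ("N", (10.0, 10.0, 10.0)),  # far from everything
]
bonded = {frozenset((0, 1))}
clashes = find_clashes(atoms, bonded)
print(clashes[0][:2], round(clashes[0][2], 2), clashscore(atoms, bonded))
```

The quadratic loop over all pairs is fine for a toy example; a real implementation would use a spatial grid or neighbour list and would also exclude atoms separated by two or three bonds.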

Carbohydrates

A 2D diagram of an N-glycan linked to an antibody fragment in the structure with PDB accession code '4BYH'. This diagram, which has been generated with Privateer, follows the standard symbol nomenclature and includes, in its original svg format, annotations containing validation information, including ring conformation and detected monosaccharide types.

The branched and cyclic nature of carbohydrates poses particular problems to structure validation tools. [35] At higher resolutions, it is possible to determine the sequence/structure of oligo- and poly-saccharides, both as covalent modifications and as ligands. However, at lower resolutions (typically lower than 2.0 Å), sequences/structures should either match known structures or be supported by complementary techniques such as mass spectrometry. [36] Also, monosaccharides have clear conformational preferences (saturated rings are typically found in chair conformations), [37] but errors introduced during model building and/or refinement (wrong linkage chirality or distance, or wrong choice of model; see [38] for recommendations on carbohydrate model building and refinement and [39] [40] [41] for reviews on general errors in carbohydrate structures) can bring their atomic models out of the more likely low-energy state. Around 20% of the deposited carbohydrate structures are in a higher-energy conformation not justified by the structural data (measured using the real-space correlation coefficient). [42]
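The real-space correlation coefficient mentioned above is, in essence, a Pearson correlation between the observed density and the density calculated from the model, evaluated over the map grid points around a residue or ligand. The following is a minimal sketch, assuming the two density lists have already been sampled at the same grid points; how those points are selected and how the maps are computed is not shown, and the density values are invented for the example.

```python
import math


def real_space_cc(rho_obs, rho_calc):
    """Pearson correlation between observed and model-calculated density samples."""
    if len(rho_obs) != len(rho_calc) or len(rho_obs) < 2:
        raise ValueError("need two equally long lists with at least two samples")
    n = len(rho_obs)
    mean_obs = sum(rho_obs) / n
    mean_calc = sum(rho_calc) / n
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(rho_obs, rho_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in rho_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in rho_calc)
    return cov / math.sqrt(var_obs * var_calc)


# Hypothetical density samples around one monosaccharide
rho_obs = [0.10, 0.50, 0.90, 0.40, 0.20]
rho_good = [0.15, 0.55, 0.80, 0.45, 0.20]   # model density that tracks the map
rho_poor = [0.60, 0.10, 0.20, 0.70, 0.50]   # model density that does not
print(round(real_space_cc(rho_obs, rho_good), 2),
      round(real_space_cc(rho_obs, rho_poor), 2))
```

Because the correlation is insensitive to overall scale and offset, it measures how well the shape of the model density follows the map rather than their absolute agreement.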

A number of carbohydrate validation web services are available at glycosciences.de (including nomenclature checks and linkage checks by pdb-care, [43] and cross-validation with Mass Spectrometry data through the use of GlycanBuilder), whereas the CCP4 suite currently distributes Privateer, [33] which is a tool that is integrated into the model building and refinement process itself. Privateer is able to check stereo- and regio-chemistry, ring conformation and puckering, linkage torsions, and real-space correlation against positive omit density, generating aperiodic torsion restraints on ring bonds, which can be used by any refinement software in order to maintain the monosaccharide's minimal energy conformation. [33]

Privateer also generates scalable two-dimensional SVG diagrams according to the Essentials of Glycobiology [34] standard symbol nomenclature containing all the validation information as tooltip annotations (see figure). This functionality is currently integrated into other CCP4 programs, such as the molecular graphics program CCP4mg (through the Glycoblocks 3D representation, [44] which conforms to the standard symbol nomenclature [34] ) and the suite's graphical interface, CCP4i2.

Validation for crystallography

Overall considerations

Global vs local criteria

Many evaluation criteria apply globally to an entire experimental structure, most notably the resolution, the anisotropy or incompleteness of the data, and the residual or R-factor that measures overall model-to-data match (see below). Those help a user choose the most accurate among related Protein Data Bank entries to answer their questions. Other criteria apply to individual residues or local regions in the 3D structure, such as fit to the local electron density map or steric clashes between atoms. Those are especially valuable to the structural biologist for making improvements to the model, and to the user for evaluating the reliability of that model right around the place they care about - such as a site of enzyme activity or drug binding. Both types of measures are very useful, but although global criteria are easier to state or publish, local criteria make the greatest contribution to scientific accuracy and biological relevance. As expressed in the Rupp textbook, "Only local validation, including assessment of both geometry and electron density, can give an accurate picture of the reliability of the structure model or any hypothesis based on local features of the model." [45]

What can be seen in low vs high resolution macromolecular crystal structures

Relationship to resolution and B-factor

Data validation

Structure factors

Twinning

Model-to-data validation

Residuals and Rfree

Real-space correlation

Improvement by correcting diagnosed problems

In nuclear magnetic resonance

Data Validation: Chemical Shifts, NOEs, RDCs

AVS
Assignment validation suite (AVS) checks the chemical shifts list in BioMagResBank (BMRB) format for problems. [46]
PSVS
Protein Structure Validation Server at the NESG based on information retrieval statistics [47]
PROSESS
PROSESS (Protein Structure Evaluation Suite & Server) is a new web server that offers an assessment of protein structural models by NMR chemical shifts as well as NOEs, geometrical, and knowledge-based parameters.
LACS
Linear analysis of chemical shifts is used for absolute referencing of chemical shift data.

Model-to-data validation

TALOS+. Predicts protein backbone torsion angles from chemical shift data. Frequently used to generate further restraints applied to a structure model during refinement.

Model validation: as above

NMR structural ensemble for PDB file 2K5D, with well-defined structure for the beta strands (arrows) and undefined, presumably highly mobile regions for the orange loop and the blue N-terminus

Dynamics: core vs loops, tails, and mobile domains

One of the critical needs for NMR structural ensemble validation is to distinguish well-determined regions (those that have experimental data) from regions that are highly mobile and/or have no observed data. There are several current or proposed methods for making this distinction such as Random Coil Index, but so far the NMR community has not standardized on one.

Software and websites

In cryo-EM

Cryo-EM presents special challenges to model-builders as the observed electron density is frequently insufficient to resolve individual atoms, leading to a higher likelihood of errors.

Geometry-based validation tools similar to those used in X-ray crystallography can be used to highlight implausible modeling choices and guide the modeler toward more native-like structures. The CaBLAM method, which only uses Cα atoms, [48] is suitable for low-resolution structures from cryo-EM. [49]

A way to compute the difference density map has been formulated for cryo-EM. [50] [51] Cross-validation using a "free" map, comparable to the use of a free R-factor, is also available. [52] [53] Other methods for checking model-map fit include correlation coefficients, model-map FSC, [54] confidence maps, CryoEF (orientation bias check), and TEMPy SMOC. [51]

In SAXS

SAXS (small-angle x-ray scattering) is a rapidly growing area of structure determination, both as a source of approximate 3D structure for initial or difficult cases and as a component of hybrid-method structure determination when combined with NMR, EM, crystallographic, cross-linking, or computational information. There is great interest in the development of reliable validation standards for SAXS data interpretation and for quality of the resulting models, but there are as yet no established methods in general use. Three recent steps in this direction are the creation of a Small-Angle Scattering Validation Task Force committee by the worldwide Protein DataBank and its initial report, [55] a set of suggested standards for data inclusion in publications, [56] and an initial proposal of statistically derived criteria for automated quality evaluation. [57]

For computational biology

It is difficult to do meaningful validation of an individual, purely computational, macromolecular model in the absence of experimental data for that molecule, because the model with the best geometry and conformational score may not be the one closest to the right answer. Therefore, much of the emphasis in validation of computational modeling is on assessment of the methods. To avoid bias and wishful thinking, double-blind prediction competitions have been organized. The original example, held every 2 years since 1994, is CASP (Critical Assessment of Structure Prediction), which evaluates predictions of 3D protein structure against newly solved crystallographic or NMR structures held in confidence until the end of the relevant competition. [58] The major criterion for CASP evaluation is a weighted score called GDT-TS for the match of Cα positions between the predicted and the experimental models. [59]
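The GDT-TS score is commonly described as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of Cα atoms in the prediction that lie within that cutoff of their experimental positions. The sketch below computes that average for a single, already superimposed pair of models. This is a simplification stated as an assumption: the full GDT procedure searches over many superpositions to maximize the count at each cutoff, which is not attempted here, and the coordinates are invented for the example.

```python
import math


def gdt_ts(pred_ca, ref_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (0-100) for two equally long, pre-superimposed Cα traces."""
    if len(pred_ca) != len(ref_ca) or not ref_ca:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, r) for p, r in zip(pred_ca, ref_ca)]
    fractions = [sum(d <= cutoff for d in distances) / len(distances)
                 for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example: reference Cα atoms spaced 3.8 Å apart,
# predicted atoms displaced by 0.5, 1.5, 3.0 and 10.0 Å respectively
ref_ca = [(3.8 * i, 0.0, 0.0) for i in range(4)]
offsets = [0.5, 1.5, 3.0, 10.0]
pred_ca = [(x, off, 0.0) for (x, _, _), off in zip(ref_ca, offsets)]
print(gdt_ts(pred_ca, ref_ca))  # 56.25
```

In this toy case one residue is within 1 Å, two within 2 Å, and three within both 4 Å and 8 Å, giving (25 + 50 + 75 + 75) / 4 = 56.25.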

References

  1. Rupp 2009
  2. Cavanagh 2006
  3. Spek AL (2003). "Single-crystal structure validation with the program PLATON". Journal of Applied Crystallography. 36 (1): 7–13. Bibcode:2003JApCr..36....7S. doi:10.1107/S0021889802022112.
  4. Allen FH (June 2002). "The Cambridge Structural Database: a quarter of a million crystal structures and rising". Acta Crystallographica Section B. 58 (Pt 3 Pt 1): 380–8. Bibcode:2002AcCrB..58..380A. doi:10.1107/S0108768102003890. PMID 12037359.
  5. Gražulis S, Chateigner D, Downs RT, Yokochi AF, Quirós M, Lutterotti L, et al. (August 2009). "Crystallography Open Database - an open-access collection of crystal structures". Journal of Applied Crystallography. 42 (Pt 4): 726–729. Bibcode:2009JApCr..42..726G. doi:10.1107/s0021889809016690. PMC 3253730. PMID 22477773.
  6. Brünger AT (January 1992). "Free R value: a novel statistical quantity for assessing the accuracy of crystal structures". Nature. 355 (6359): 472–5. Bibcode:1992Natur.355..472B. doi:10.1038/355472a0. PMID 18481394. S2CID 2462215.
  7. Engh RA, Huber R (1991). "Accurate bond and angle parameters for X-ray protein structure refinement". Acta Crystallographica A. 47 (4): 392–400. Bibcode:1991AcCrA..47..392E. doi:10.1107/s0108767391001071.
  8. Ponder JW, Richards FM (1987). "Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes". Journal of Molecular Biology. 193 (4): 775–791. doi:10.1016/0022-2836(87)90358-5. PMID 2441069.
  9. Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993). "PROCHECK: a program to check the stereochemical quality of protein structures". Journal of Applied Crystallography. 26 (2): 283–291. Bibcode:1993JApCr..26..283L. doi:10.1107/s0021889892009944.
  10. Hooft RW, Vriend G, Sander C, Abola EE (May 1996). "Errors in protein structures". Nature. 381 (6580): 272. Bibcode:1996Natur.381..272H. doi:10.1038/381272a0. PMID 8692262. S2CID 4368507.
  11. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Brice MD, Rodgers JR, et al. (May 1977). "The Protein Data Bank: a computer-based archival file for macromolecular structures". Journal of Molecular Biology. 112 (3): 535–42. doi:10.1016/s0022-2836(77)80200-3. PMID 875032.
  12. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. (January 2000). "The Protein Data Bank". Nucleic Acids Research. 28 (1): 235–42. doi:10.1093/nar/28.1.235. PMC 102472. PMID 10592235.
  13. Berman H, Henrick K, Nakamura H (December 2003). "Announcing the worldwide Protein Data Bank". Nature Structural Biology. 10 (12): 980. doi:10.1038/nsb1203-980. PMID 14634627. S2CID 2616817.
  14. Kleywegt GJ (2000). "Validation of protein crystal structures". Acta Crystallographica D. 56 (Pt 3): 249–65. Bibcode:2000AcCrD..56..249K. doi:10.1107/s0907444999016364. PMID 10713511.
  15. Read RJ, Adams PD, Arendall WB, Brunger AT, Emsley P, Joosten RP, et al. (October 2011). "A new generation of crystallographic validation tools for the protein data bank". Structure. 19 (10): 1395–412. doi:10.1016/j.str.2011.08.006. PMC 3195755. PMID 22000512.
  16. Montelione GT, Nilges M, Bax A, Güntert P, Herrmann T, Richardson JS, et al. (September 2013). "Recommendations of the wwPDB NMR Validation Task Force". Structure. 21 (9): 1563–70. doi:10.1016/j.str.2013.07.021. PMC 3884077. PMID 24010715.
  17. Henderson R, Sali A, Baker ML, Carragher B, Devkota B, Downing KH, et al. (February 2012). "Outcome of the first electron microscopy validation task force meeting". Structure. 20 (2): 205–14. doi:10.1016/j.str.2011.12.014. PMC 3328769. PMID 22325770.
  18. Gelbin A, Schneider B, Clowney L, Hsieh S-H, Olson WK, Berman HM (1996). "Geometric parameters in Nucleic Acids: Sugar and Phosphate Constituents". Journal of the American Chemical Society. 118 (3): 519–529. doi:10.1021/ja9528846.
  19. Schultze P, Feigon J (June 1997). "Chirality errors in nucleic acid structures". Nature. 387 (6634): 668. Bibcode:1997Natur.387..668S. doi:10.1038/42632. PMID 9192890. S2CID 4318780.
  20. "Smooth Backbone-Dependent Rotamer Library 2010". dunbrack.fccc.edu. Retrieved 7 April 2023.
  21. Dickerson, Richard E. (1989-02-01). "Definitions and Nomenclature of Nucleic Acid Structure Parameters". Journal of Biomolecular Structure and Dynamics. 6 (4): 627–634. doi:10.1080/07391102.1989.10507726. ISSN 0739-1102. PMC 400765. PMID 2619931.
  22. Olson, Wilma K; Bansal, Manju; Burley, Stephen K; Dickerson, Richard E; Gerstein, Mark; Harvey, Stephen C; Heinemann, Udo; Lu, Xiang-Jun; Neidle, Stephen; Shakked, Zippora; Sklenar, Heinz (2001-10-12). "A standard reference frame for the description of nucleic acid base-pair geometry". Journal of Molecular Biology. 313 (1): 229–237. doi:10.1006/jmbi.2001.4987. ISSN 0022-2836. PMID 11601858.
  23. Bhattacharyya, Dhananjay; Halder, Sukanya; Basu, Sankar; Mukherjee, Debasish; Kumar, Prasun; Bansal, Manju (2017-01-19). "RNAHelix: computational modeling of nucleic acid structures with Watson–Crick and non-canonical base pairs". Journal of Computer-Aided Molecular Design. 31 (2): 219–235. Bibcode:2017JCAMD..31..219B. doi:10.1007/s10822-016-0007-0. ISSN 0920-654X. PMID 28102461. S2CID 356097.
  24. Shen MY, Davis FP, Sali A (March 2005). "The optimal size of a globular protein domain: A simple sphere-packing model". Chemical Physics Letters. 405 (1–3): 224–228. Bibcode:2005CPL...405..224S. doi:10.1016/j.cplett.2005.02.029. ISSN 0009-2614.
  25. Misura KM, Morozov AV, Baker D (September 2004). "Analysis of anisotropic side-chain packing in proteins and application to high-resolution structure prediction". Journal of Molecular Biology. 342 (2): 651–64. doi:10.1016/j.jmb.2004.07.038. PMID 15327962.
  26. Basu S, Bhattacharyya D, Banerjee R (May 2011). "Mapping the distribution of packing topologies within protein interiors shows predominant preference for specific packing motifs". BMC Bioinformatics. 12 (1): 195. doi:10.1186/1471-2105-12-195. PMC 3123238. PMID 21605466.
  27. Banerjee R, Sen M, Bhattacharya D, Saha P (October 2003). "The jigsaw puzzle model: search for conformational specificity in protein interiors". Journal of Molecular Biology. 333 (1): 211–26. doi:10.1016/j.jmb.2003.08.013. PMID 14516754.
  28. Basu S, Bhattacharyya D, Banerjee R (June 2012). "Self-complementarity within proteins: bridging the gap between binding and folding". Biophysical Journal. 102 (11): 2605–14. Bibcode:2012BpJ...102.2605B. doi:10.1016/j.bpj.2012.04.029. PMC 3368132. PMID 22713576.
  29. Chen VB, Arendall WB, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, et al. (January 2010). "MolProbity: all-atom structure validation for macromolecular crystallography". Acta Crystallographica Section D. 66 (Pt 1): 12–21. Bibcode:2010AcCrD..66...12C. doi:10.1107/S0907444909042073. PMC 2803126. PMID 20057044.
  30. Sheffler W, Baker D (January 2009). "RosettaHoles: rapid assessment of protein core packing for structure prediction, refinement, design, and validation". Protein Science. 18 (1): 229–39. doi:10.1002/pro.8. PMC 2708028. PMID 19177366.
  31. Chakravarty S, Varadarajan R (July 1999). "Residue depth: a novel parameter for the analysis of protein structure and stability". Structure. 7 (7): 723–32. doi:10.1016/s0969-2126(99)80097-5. PMID 10425675.
  32. Basu S, Bhattacharyya D, Banerjee R (June 2014). "Applications of complementarity plot in error detection and structure validation of proteins". Indian Journal of Biochemistry & Biophysics. 51 (3): 188–200. PMID 25204080.
  33. Agirre J, Iglesias-Fernández J, Rovira C, Davies GJ, Wilson KS, Cowtan KD (November 2015). "Privateer: software for the conformational validation of carbohydrate structures" (PDF). Nature Structural & Molecular Biology. 22 (11): 833–4. doi:10.1038/nsmb.3115. PMID 26581513. S2CID 33800088.
  34. Varki A, Cummings RD, Aebi M, Packer NH, Seeberger PH, Esko JD, et al. (December 2015). "Symbol Nomenclature for Graphical Representations of Glycans". Glycobiology. 25 (12): 1323–4. doi:10.1093/glycob/cwv091. PMC 4643639. PMID 26543186.
  35. Agirre J, Davies GJ, Wilson KS, Cowtan KD (June 2017). "Carbohydrate structure: the rocky road to automation" (PDF). Current Opinion in Structural Biology. Carbohydrates • Sequences and topology. 44: 39–47. doi:10.1016/j.sbi.2016.11.011. PMID 27940408.
  36. Crispin M, Stuart DI, Jones EY (May 2007). "Building meaningful models of glycoproteins". Nature Structural & Molecular Biology. 14 (5): 354, discussion 354–5. doi:10.1038/nsmb0507-354a. PMID 17473875. S2CID 2020697.
  37. Davies GJ, Planas A, Rovira C (February 2012). "Conformational analyses of the reaction coordinate of glycosidases". Accounts of Chemical Research. 45 (2): 308–16. doi:10.1021/ar2001765. PMID 21923088.
  38. Agirre J (February 2017). "Strategies for carbohydrate model building, refinement and validation". Acta Crystallographica Section D. 73 (Pt 2): 171–186. Bibcode:2017AcCrD..73..171A. doi:10.1107/S2059798316016910. PMC 5297920. PMID 28177313.
  39. Lütteke T (February 2009). "Analysis and validation of carbohydrate three-dimensional structures". Acta Crystallographica Section D. 65 (Pt 2): 156–68. Bibcode:2009AcCrD..65..156L. doi:10.1107/S0907444909001905. PMC 2631634. PMID 19171971.
  40. Lütteke T, von der Lieth CW (2009-01-01). "Data mining the PDB for glyco-related data". Glycomics. Methods in Molecular Biology. Vol. 534. pp. 293–310. doi:10.1007/978-1-59745-022-5_21. ISBN 978-1-58829-774-7. PMID 19277543.
  41. Joosten RP, Lütteke T (June 2017). "Carbohydrate 3D structure validation" (PDF). Current Opinion in Structural Biology. 44: 9–17. doi:10.1016/j.sbi.2016.10.010. PMID 27816840.
  42. Agirre J, Davies G, Wilson K, Cowtan K (May 2015). "Carbohydrate anomalies in the PDB" (PDF). Nature Chemical Biology. 11 (5): 303. doi:10.1038/nchembio.1798. PMID 25885951.
  43. Lütteke T, von der Lieth CW (June 2004). "pdb-care (PDB carbohydrate residue check): a program to support annotation of complex carbohydrate structures in PDB files". BMC Bioinformatics. 5: 69. doi:10.1186/1471-2105-5-69. PMC 441419. PMID 15180909.
  44. McNicholas S, Agirre J (February 2017). "Glycoblocks: a schematic three-dimensional representation for glycans and their interactions". Acta Crystallographica Section D. 73 (Pt 2): 187–194. Bibcode:2017AcCrD..73..187M. doi:10.1107/S2059798316013553. PMC 5297921. PMID 28177314.
  45. Rupp 2009, Chapter 13, Key Concepts
  46. Moseley HN, Sahota G, Montelione GT (April 2004). "Assignment validation software suite for the evaluation and presentation of protein resonance assignment data". Journal of Biomolecular NMR. 28 (4): 341–55. doi:10.1023/B:JNMR.0000015420.44364.06. PMID 14872126. S2CID 14483199.
  47. Huang YJ, Powers R, Montelione GT (February 2005). "Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics". Journal of the American Chemical Society. 127 (6): 1665–74. doi:10.1021/ja047109h. PMID 15701001.
  48. "CaBLAM Validation in Phenix". phenix-online.org.
  49. Rohou, Alexis (February 2021). "Improving cryo-EM structure validation". Nature Methods. 18 (2): 130–131. doi:10.1038/s41592-021-01062-1. PMID 33542515. S2CID 231820981.
  50. Yamashita, Keitaro; Palmer, Colin M.; Burnley, Tom; Murshudov, Garib N. (1 October 2021). "Cryo-EM single-particle structure refinement and map calculation using Servalcat". Acta Crystallographica Section D Structural Biology. 77 (10): 1282–1291. Bibcode:2021AcCrD..77.1282Y. doi:10.1107/S2059798321009475. PMC 8489229. PMID 34605431.
  51. Winn, Martyn (20 November 2020). "Cryo-EM validation tools in CCP-EM" (PDF). www.ccpem.ac.uk/. Retrieved 22 November 2023.
  52. Falkner, B; Schröder, GF (28 May 2013). "Cross-validation in cryo-EM-based structural modeling". Proceedings of the National Academy of Sciences of the United States of America. 110 (22): 8930–5. Bibcode:2013PNAS..110.8930F. doi:10.1073/pnas.1119041110. PMC 3670386. PMID 23674685.
  53. Beckers, Maximilian; Mann, Daniel; Sachse, Carsten (March 2021). "Structural interpretation of cryo-EM image reconstructions". Progress in Biophysics and Molecular Biology. 160: 26–36. doi:10.1016/j.pbiomolbio.2020.07.004. PMID 32735944.
  54. "Cryo-EM Validation tools in Phenix". phenix-online.org.
  55. Trewhella J, Hendrickson WA, Kleywegt GJ, Sali A, Sato M, Schwede T, et al. (June 2013). "Report of the wwPDB Small-Angle Scattering Task Force: data requirements for biomolecular modeling and the PDB". Structure. 21 (6): 875–81. doi:10.1016/j.str.2013.04.020. PMID 23747111.
  56. Jacques DA, Guss JM, Svergun DI, Trewhella J (June 2012). "Publication guidelines for structural modelling of small-angle scattering data from biomolecules in solution". Acta Crystallographica Section D. 68 (Pt 6): 620–6. Bibcode:2012AcCrD..68..620J. doi:10.1107/S0907444912012073. hdl:10453/119226. PMID 22683784.
  57. Grant TD, Luft JR, Carter LG, Matsui T, Weiss TM, Martel A, Snell EH (January 2015). "The accurate assessment of small-angle X-ray scattering data". Acta Crystallographica Section D. 71 (Pt 1): 45–56. Bibcode:2015AcCrD..71...45G. doi:10.1107/S1399004714010876. PMC 4304685. PMID 25615859.
  58. Moult J, Pedersen JT, Judson R, Fidelis K (November 1995). "A large-scale experiment to assess protein structure prediction methods". Proteins. 23 (3): ii–v. doi:10.1002/prot.340230303. PMID 8710822. S2CID 11216440.
  59. Zemla A (July 2003). "LGA: A method for finding 3D similarities in protein structures". Nucleic Acids Research. 31 (13): 3370–4. doi:10.1093/nar/gkg571. PMC 168977. PMID 12824330.
  1. Kleywegt GJ, Harris MR, Zou JY, Taylor TC, Wählby A, Jones TA (December 2004). "The Uppsala Electron-Density Server". Acta Crystallographica Section D. 60 (Pt 12 Pt 1): 2240–9. Bibcode:2004AcCrD..60.2240K. doi:10.1107/s0907444904013253. PMID 15572777.
  2. Emsley P, Lohkamp B, Scott WG, Cowtan K (April 2010). "Features and development of Coot". Acta Crystallographica Section D. 66 (Pt 4): 486–501. Bibcode:2010AcCrD..66..486E. doi:10.1107/s0907444910007493. PMC 2852313. PMID 20383002.
  3. Joosten RP, Joosten K, Murshudov GN, Perrakis A (April 2012). "PDB_REDO: constructive validation, more than just looking for errors". Acta Crystallographica Section D. 68 (Pt 4): 484–96. Bibcode:2012AcCrD..68..484J. doi:10.1107/s0907444911054515. PMC 3322608. PMID 22505269.
  4. Huang YJ, Powers R, Montelione GT (February 2005). "Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics". Journal of the American Chemical Society. 127 (6): 1665–74. doi:10.1021/ja047109h. PMID 15701001.
  5. Laskowski RA, Rullmann JA, MacArthur MW, Kaptein R, Thornton JM (December 1996). "AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR". Journal of Biomolecular NMR. 8 (4): 477–86. doi:10.1007/bf00228148. PMID 9008363. S2CID 45664105.
  6. Liebschner, D; Afonine, PV; Moriarty, NW; Poon, BK; Chen, VB; Adams, PD (1 January 2021). "CERES: a cryo-EM re-refinement system for continuous improvement of deposited models". Acta Crystallographica. Section D, Structural Biology. 77 (Pt 1): 48–61. Bibcode:2021AcCrD..77...48L. doi:10.1107/S2059798320015879. PMC 7787109. PMID 33404525.

Further reading