Root mean square deviation of atomic positions

In bioinformatics, the root mean square deviation of atomic positions, or simply root mean square deviation (RMSD), is the measure of the average distance between the atoms (usually the backbone atoms) of superimposed molecules. [1] In the study of globular protein conformations, one customarily measures the similarity in three-dimensional structure by the RMSD of the atomic coordinates after optimal rigid body superposition.

When a dynamical system fluctuates about some well-defined average position, the RMSD from the average over time can be referred to as the RMSF or root mean square fluctuation. The size of this fluctuation can be measured, for example using Mössbauer spectroscopy or nuclear magnetic resonance, and can provide important physical information. The Lindemann index is a method of placing the RMSF in the context of the parameters of the system.
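As a minimal sketch of the RMSF described above, the following Python computes a per-atom fluctuation about the time-averaged position. It assumes the frames have already been superimposed on a common reference; the function name `rmsf` and the two-frame sample trajectory are illustrative assumptions, not part of any particular package.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF, where trajectory[t][i] is the (x, y, z) of atom i in frame t.

    Assumes all frames are already superimposed on a common reference.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement from that average
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Hypothetical two-frame trajectory of two atoms (coordinates in Å)
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(frames))  # [0.0, 1.0]
```

The first atom never moves, so its RMSF is zero; the second oscillates by 1 Å about its mean position.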

A widely used way to compare the structures of biomolecules or solid bodies is to translate and rotate one structure with respect to the other to minimize the RMSD. Coutsias et al. presented a simple derivation, based on quaternions, for the optimal solid body transformation (rotation-translation) that minimizes the RMSD between two sets of vectors. [2] They proved that the quaternion method is equivalent to the well-known Kabsch algorithm. [3] The solution given by Kabsch is an instance of the solution of the d-dimensional problem, introduced by Hurley and Cattell. [4] The quaternion solution for computing the optimal rotation was published in the appendix of a paper by Petitjean. [5] This quaternion solution and the calculation of the optimal isometry in the d-dimensional case were both extended to infinite sets and to the continuous case in Appendix A of another paper by Petitjean. [6]
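The superposition step can be sketched with the SVD form of the Kabsch algorithm. This is an illustrative implementation using NumPy, not the code of any cited paper; the function name `kabsch_rmsd` and the test coordinates are assumptions made for the example.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Minimum RMSD between paired N x 3 point sets after rigid superposition of P onto Q.

    Returns (rmsd, R), where R is the 3 x 3 rotation applied to the centered P.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translation by centering both sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Cross-covariance matrix and its singular value decomposition
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the centered P (row vectors) and measure the residual deviation
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P))), R

# Hypothetical check: Q is P rotated by 90 degrees about z and then shifted
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
value, R = kabsch_rmsd(P, Q)
print(np.isclose(value, 0.0, atol=1e-9), np.allclose(R, Rz))
```

Because `Q` is an exact rigid-body copy of `P`, the minimized RMSD should vanish to within floating-point error and the recovered rotation should match `Rz`.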

The equation

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_i^{2}}$$

where δi is the distance between atom i and either a reference structure or the mean position of the N equivalent atoms. This is often calculated for the backbone heavy atoms C, N, O, and Cα, or sometimes just the Cα atoms.

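Normally a rigid superposition which minimizes the RMSD is performed, and this minimum is returned. Given two sets of $n$ points $\mathbf{v}$ and $\mathbf{w}$, the RMSD is defined as follows:

$$\mathrm{RMSD}(\mathbf{v}, \mathbf{w}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\lVert \mathbf{v}_i - \mathbf{w}_i \rVert^{2}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[(v_{ix}-w_{ix})^{2} + (v_{iy}-w_{iy})^{2} + (v_{iz}-w_{iz})^{2}\right]}$$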
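For illustration, the root mean square deviation between two paired point sets can be computed directly, without any superposition, as in the sketch below. The function name `rmsd`, the argument names `v` and `w`, and the sample coordinates are assumptions made for the example.

```python
import math

def rmsd(v, w):
    """RMSD between two paired lists of (x, y, z) points, with no superposition."""
    if len(v) != len(w) or not v:
        raise ValueError("point sets must be non-empty and of equal length")
    total = 0.0
    for (vx, vy, vz), (wx, wy, wz) in zip(v, w):
        # Squared Euclidean distance between corresponding points
        total += (vx - wx) ** 2 + (vy - wy) ** 2 + (vz - wz) ** 2
    return math.sqrt(total / len(v))

# Hypothetical coordinates in Å: every point in b is shifted by 1 Å along z
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(a, b))  # 1.0
```

The result carries the same length unit as the input coordinates.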

An RMSD value is expressed in length units. The most commonly used unit in structural biology is the Ångström (Å), which is equal to 10⁻¹⁰ m.

Uses

Typically, RMSD is used as a quantitative measure of similarity between two or more protein structures. For example, the CASP protein structure prediction competition uses RMSD as one of its assessments of how well a submitted structure matches the known target structure. Thus, the lower the RMSD, the better the model is in comparison to the target structure.

Some scientists who study protein folding by computer simulation also use RMSD as a reaction coordinate to quantify where the protein lies between the folded state and the unfolded state.

RMSD is also commonly computed for small organic molecules (usually called ligands when they bind to macromolecules such as proteins) in the context of docking, [1] as well as in other methods that study the configuration of ligands bound to macromolecules. Note that for ligands, unlike proteins as described above, the structures are most commonly not superimposed before the RMSD is calculated.

RMSD is also one of several metrics that have been proposed for quantifying evolutionary similarity between proteins, as well as the quality of sequence alignments. [7] [8]

References

  1. "Molecular docking, estimating free energies of binding, and AutoDock's semi-empirical force field". Sebastian Raschka's Website. 2014-06-26. Retrieved 2016-06-07.
  2. Coutsias EA, Seok C, Dill KA (2004). "Using quaternions to calculate RMSD". J Comput Chem. 25 (15): 1849–1857. doi:10.1002/jcc.20110. PMID 15376254. S2CID 18224579.
  3. Kabsch W (1976). "A solution for the best rotation to relate two sets of vectors". Acta Crystallographica. 32 (5): 922–923. Bibcode:1976AcCrA..32..922K. doi:10.1107/S0567739476001873.
  4. Hurley JR, Cattell RB (1962). "The Procrustes Program: Producing direct rotation to test a hypothesized factor structure". Behavioral Science. 7 (2): 258–262. doi:10.1002/bs.3830070216.
  5. Petitjean M (1999). "On the Root Mean Square quantitative chirality and quantitative symmetry measures" (PDF). Journal of Mathematical Physics. 40 (9): 4587–4595. Bibcode:1999JMP....40.4587P. doi:10.1063/1.532988.
  6. Petitjean M (2002). "Chiral mixtures" (PDF). Journal of Mathematical Physics. 43 (8): 4147–4157. Bibcode:2002JMP....43.4147P. doi:10.1063/1.1484559.
  7. Jewett AI, Huang CC, Ferrin TE (2003). "MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance" (PDF). Bioinformatics. 19 (5): 625–634. doi:10.1093/bioinformatics/btg035. PMID 12651721.
  8. Armougom F, Moretti S, Keduas V, Notredame C (2006). "The iRMSD: a local measure of sequence alignment accuracy using structural information" (PDF). Bioinformatics. 22 (14): e35–39. doi:10.1093/bioinformatics/btl218. PMID 16873492.

Further reading