Complementarity plot

A pictorial description of the Complementarity Plot with its different regions
Distribution of points in the Complementarity Plot corresponding to buried amino acid side-chains from a high resolution protein crystal structure

The complementarity plot (CP) is a graphical tool for the structural validation of atomic models of both folded globular proteins and protein-protein interfaces. [1] [2] [3] It is based on a probabilistic representation of preferred amino acid side-chain orientation, analogous to the preferred backbone orientation represented by Ramachandran plots. It can potentially help elucidate both protein folding and binding. Upgraded versions of the software suite are available and maintained on GitHub, both for folded globular proteins [4] and for inter-protein complexes. [5] The software is included in the bioinformatics tool suites OmicTools [6] and Delphi tools. [7]


Background

Validation of three-dimensional protein crystal structures has traditionally relied on a multitude of parameters, including (i) the distribution of residues in the Ramachandran plot, [8] [9] (ii) deviations from ideal bond lengths and angles, [10] [11] (iii) short atomic contacts (steric clash scores), [12] (iv) the distribution of side-chain conformers (rotamers) [13] and (v) hydrogen-bonding parameters. [14] The complementarity plot, as a structural validation tool for proteins, essentially combines aspects of these traditional approaches. CP detects local errors in atomic coordinates and can also correctly identify the native three-dimensional fold of an amino acid sequence from among decoys. It is based on the combined use of shape and electrostatic complementarity [1] of completely or partially buried residues with respect to their environment, formed by the rest of the polypeptide chain, and is a sensitive indicator of the harmony or disharmony of interior residues with regard to the short- and long-range forces sustaining the native fold. The term 'Complementarity Plot' (CP) is perhaps a misnomer, as there are actually three plots, each serving a given range of solvent exposure of the plotted residues (CP1, CP2 and CP3 for burial bins 1, 2 and 3).
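Since each of the three plots covers its own range of solvent exposure, a residue has to be assigned to a burial bin before it can be plotted. The sketch below assumes burial is expressed as the ratio of a residue's solvent-accessible surface area (SASA) in the model to its SASA in an isolated reference state; `burial_ratio`, `burial_bin` and the `BIN_EDGES` values are hypothetical placeholders, not taken from the published software.

```python
from typing import Optional, Tuple

# Hypothetical upper edges of burial bins 1, 2 and 3. These are placeholders
# for illustration only, not the thresholds used by the SARAMA software.
BIN_EDGES: Tuple[float, ...] = (0.05, 0.15, 0.30)


def burial_ratio(sasa_in_model: float, sasa_reference: float) -> float:
    """Fraction of a residue's reference surface that stays solvent-exposed
    in the model (0 = fully buried, 1 = as exposed as the reference)."""
    if sasa_reference <= 0:
        raise ValueError("reference SASA must be positive")
    return sasa_in_model / sasa_reference


def burial_bin(ratio: float, edges: Tuple[float, ...] = BIN_EDGES) -> Optional[int]:
    """Return the burial bin (1, 2 or 3) that selects CP1, CP2 or CP3, or None
    when the residue is too exposed to be plotted."""
    for bin_number, upper in enumerate(edges, start=1):
        if ratio <= upper:
            return bin_number
    return None


if __name__ == "__main__":
    print(burial_bin(burial_ratio(3.0, 100.0)))   # -> 1 (buried side-chain)
    print(burial_bin(burial_ratio(45.0, 100.0)))  # -> None (exposed, not plotted)
```

The SASA values themselves would come from any standard surface-area calculation; only the ratio and the binning are shown here.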

Pictorial description

The design of the complementarity plot was largely inspired by the Ramachandran plot, although its physicochemical basis differs. The Ramachandran plot is deterministic in nature, whereas CP is probabilistic. The Ramachandran plot deals with main-chain torsion angles, and errors in these parameters are essentially local. CP, in contrast, deals with the geometric and electrostatic fit of interior side-chains with their local and non-local neighborhood. Disharmony (misfit) in these combined parameters may arise from a wide range of errors in bond angles or torsions located essentially anywhere in the folded polypeptide chain. Nevertheless, by analogy with the Ramachandran plot, the region within the first contour is termed 'probable' (analogous to 'allowed'), the region between the first and second contours 'less probable' ('partially allowed') and the region outside the second contour 'improbable' ('disallowed').
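One way to picture such probability contours is as highest-density regions of a reference distribution. The sketch below assumes a set of reference points, each pairing a residue's shape and electrostatic complementarity scores, taken from trusted structures; it bins them on a square grid and places each contour at the cell density that encloses a chosen fraction of the points. The grid approach, the function names and the enclosed fractions passed as `levels` are illustrative assumptions, not the published procedure for drawing the CP contours.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]  # (shape score, electrostatic score) of one residue
Cell = Tuple[int, int]


def to_cell(point: Point, width: float) -> Cell:
    """Map a point onto a square grid cell of the given width."""
    return (math.floor(point[0] / width), math.floor(point[1] / width))


def cell_density(reference: Iterable[Point], width: float) -> Dict[Cell, float]:
    """Fraction of the reference points that falls in each occupied cell."""
    counts = Counter(to_cell(p, width) for p in reference)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def contour_thresholds(density: Dict[Cell, float], levels: Sequence[float]) -> List[float]:
    """For each level (ascending), the lowest density among the densest cells
    that together enclose at least that fraction of the reference points."""
    ordered = sorted(density.values(), reverse=True)
    thresholds = []
    for level in levels:
        cumulative = 0.0
        for value in ordered:
            cumulative += value
            if cumulative >= level:
                thresholds.append(value)
                break
        else:  # level never reached (e.g. float rounding): keep every cell
            thresholds.append(ordered[-1])
    return thresholds


def classify(point: Point, density: Dict[Cell, float],
             thresholds: Sequence[float], width: float) -> str:
    """Label a point as inside the first contour, between the two, or outside."""
    value = density.get(to_cell(point, width), 0.0)
    inner, outer = thresholds
    if value >= inner:
        return "probable"
    if value >= outer:
        return "less probable"
    return "improbable"
```

Typical use would be `density = cell_density(reference, 0.1)`, then `thresholds = contour_thresholds(density, (0.5, 0.85))`, then `classify(point, density, thresholds, 0.1)` for each residue of the model under test. A coarse grid keeps the sketch short; a smoothed density estimate would give smoother contours.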

Applications

CP has a multitude of applications in both experimental and computational structural biology. [2] A thorough investigation of the effect of small errors in main- and side-chain bond angles and torsions on the overall fold showed that CP is effective at detecting such errors even where other existing parameters, based on the prohibition of local steric overlap and on deviations from ideality, fail. The consequences of such small angular errors are not restricted locally; they result in geometric and electrostatic misfit of interior residues throughout the fold, which the CPs can potentially detect. These errors may arise from (i) misfitting of side-chain torsions or wrong rotamer assignments (especially relevant for low-resolution structures) and (ii) incorrect tracing of the main-chain trajectory during refinement (resulting in low-intensity errors diffused over the entire polypeptide chain). CP can also detect packing anomalies and, in particular, can potentially signal unbalanced partial charges within protein interiors. It is useful in homology modeling and protein design. A version of the plot for protein-protein interfaces (CPint) [15] has also been built and made available to probe similar errors there.
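In practice, error detection with the plots comes down to locating residues that land in the improbable region and looking at how a model's residues are spread across the three regions in each burial bin. The sketch below shows one hypothetical way to tabulate per-residue results; the residue identifier format (`"A:PHE12"`), the input triples and the function name are assumptions, and the fraction of improbable residues that should make a model suspicious is left to the user rather than taken from the CP publications.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

REGIONS = ("probable", "less probable", "improbable")


def summarize(assignments: Iterable[Tuple[str, int, str]]
              ) -> Tuple[Dict[int, Dict[str, float]], List[Tuple[int, str]]]:
    """Summarize per-residue CP results.

    `assignments` holds (residue_id, burial_bin, region) triples. Returns the
    fraction of residues in each region for every burial bin, plus the sorted
    (burial_bin, residue_id) pairs in the improbable region, i.e. the residues
    worth re-examining in the model.
    """
    per_bin: Dict[int, Counter] = defaultdict(Counter)
    flagged: List[Tuple[int, str]] = []
    for residue_id, burial_bin, region in assignments:
        if region not in REGIONS:
            raise ValueError(f"unknown region label: {region!r}")
        per_bin[burial_bin][region] += 1
        if region == "improbable":
            flagged.append((burial_bin, residue_id))
    fractions = {
        b: {label: counts[label] / sum(counts.values()) for label in REGIONS}
        for b, counts in sorted(per_bin.items())
    }
    return fractions, sorted(flagged)
```

Keeping the bins separate mirrors the use of three plots (CP1, CP2, CP3) rather than pooling all buried residues into one count.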


CPdock

CPdock

In contrast to the residue-wise plots, a variant of the Complementarity Plot called CPdock [16] plots a single pair of Sc and EC values for a protein-protein interface and thereby adjudges the quality of the atomic structure of the complex, whether experimentally solved or computationally built. Sc [17] and EC [18] are the shape and electrostatic complementarities of interacting protein-protein surfaces, originally proposed by Peter Colman and co-workers in the 1990s. CPdock was primarily developed as a scoring function to serve as an initial filter in protein-protein docking, and it can be a very helpful tool in protein design, as has recently been demonstrated in COVID-19 research, both in scoring and in evaluating docked complexes to eliminate the effect of co-substrate binding on targeted inhibitor binding. [19] [20]
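As a rough illustration of the two interface descriptors, the sketch below computes Sc- and EC-style scores from precomputed inputs, following the definitions as they are usually summarised. Sc averages, over both partners, the median of a distance-weighted agreement between each surface dot's normal and the normal of its nearest dot on the other surface. EC is the negated mean of two Pearson correlations, each between the potential a partner produces at its own surface and the potential its counterpart produces there. The sketch assumes dotted surfaces with outward unit normals and surface potentials are already available (CPdock's multi-dielectric continuum electrostatics is out of scope here); the default w = 0.5 Å⁻², the function names and the input layout are assumptions, and the selection and trimming of interface dots used in the original methods is omitted.

```python
import math
import statistics
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]
Dot = Tuple[Vec, Vec]  # (position in Å, outward unit normal) of one surface dot


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _dist2(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _one_way_median(src: Sequence[Dot], dst: Sequence[Dot], w: float) -> float:
    """Median agreement of each dot on `src` with its nearest dot on `dst`."""
    scores = []
    for pos, normal in src:
        near_pos, near_normal = min(dst, key=lambda d: _dist2(pos, d[0]))
        # Outward normals of two perfectly fitting surfaces point in opposite
        # directions, so the negated dot product is +1 for a perfect fit; the
        # Gaussian factor down-weights dot pairs separated by a gap.
        agreement = -_dot(normal, near_normal)
        scores.append(agreement * math.exp(-w * _dist2(pos, near_pos)))
    return statistics.median(scores)


def shape_complementarity(a: Sequence[Dot], b: Sequence[Dot], w: float = 0.5) -> float:
    """Sc-style score: mean of the two one-way medians (w in Å^-2)."""
    return 0.5 * (_one_way_median(a, b, w) + _one_way_median(b, a, w))


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def electrostatic_complementarity(phi_a_own: Sequence[float], phi_a_from_b: Sequence[float],
                                  phi_b_own: Sequence[float], phi_b_from_a: Sequence[float]) -> float:
    """EC-style score from potentials sampled at each partner's surface dots.

    phi_a_own is the potential at A's dots due to A's charges, phi_a_from_b the
    potential at the same dots due to B's charges (likewise for B). Perfectly
    anti-correlated potentials give +1.
    """
    return -0.5 * (_pearson(phi_a_own, phi_a_from_b) + _pearson(phi_b_own, phi_b_from_a))
```

The nearest-dot search here is a brute-force scan over all dot pairs; for realistic surfaces with thousands of dots a spatial index such as a k-d tree would be the usual choice.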

Software

CP@SINP: http://www.saha.ac.in/biop/www/sarama.html
CP: https://github.com/nemo8130/SARAMA-updated
CPint: https://github.com/nemo8130/SARAMAint-updated
CPdock: https://github.com/nemo8130/CPdock

EnCPdock (web-server): https://scinetmol.in/EnCPDock/

References

  1. Basu, S.; Bhattacharyya, D.; Banerjee, R. (2012). "Self-Complementarity within Proteins: Bridging the Gap between Binding and Folding". Biophysical Journal. 102: 2605–2614. doi:10.1016/j.bpj.2012.04.029. Software: http://www.saha.ac.in/biop/www/sarama.html
  2. Basu, S.; Bhattacharyya, D.; Banerjee, R. (2014). "Applications of complementarity plot in error detection and structure validation of proteins". Indian Journal of Biochemistry and Biophysics. 51: 188–200.
  3. Basu, S.; Bhattacharyya, D.; Wallner, B. (2014). "SARAMAint: The Complementarity Plot for Protein–Protein Interface". Journal of Bioinformatics and Intelligent Control. 3: 309–314. doi:10.1166/jbic.2014.1103.
  4. Basu, S. (26 June 2017). "SARAMA-updated: Complementarity Plot: A structure validation tool for globular proteins". GitHub.
  5. Basu, S. (26 June 2017). "SARAMAint-updated: The Complementarity Plot for Protein-Protein Interface (CPint)". GitHub.
  6. "SARAMA: a bioinformatics tool for Structure validation | Protein structure analysis". OmicTools.
  7. Professor Emil Alexov Group. "Computational Biophysics & Bioinformatics". compbio.clemson.edu.
  8. Ramachandran, G. N.; Ramakrishnan, C.; Sasisekharan, V. (1963). "Stereochemistry of polypeptide chain configurations". Journal of Molecular Biology. 7: 95–99.
  9. Kleywegt, G. J.; Jones, T. A. (1996). "Phi/Psi-chology: Ramachandran revisited". Structure. 4: 1395–1400.
  10. Touw, W. G.; Vriend, G. (2010). "On the complexity of Engh and Huber refinement restraints: the angle tau as example". Acta Crystallographica Section D. 66: 1341–1350.
  11. Laskowski, R. A.; MacArthur, M. W.; Moss, D. S.; Thornton, J. M. (1993). "PROCHECK: a program to check the stereochemical quality of protein structures". Journal of Applied Crystallography. 26: 283–291.
  12. Davis, I. W.; Leaver-Fay, A.; Chen, V. B.; Block, J. N.; Kapral, G. J.; Wang, X.; Murray, L. W.; Arendall, W. B., III; Snoeyink, J.; Richardson, J. S.; Richardson, D. C. (2007). "MolProbity: all-atom contacts and structure validation for proteins and nucleic acids". Nucleic Acids Research. 35: W375–W383.
  13. Shapovalov, M. V.; Dunbrack, R. L., Jr. (2011). "A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions". Structure. 19: 844–858.
  14. Hooft, R. W. W.; Sander, C.; Vriend, G. (1996). "Positioning hydrogen atoms by optimizing hydrogen-bond networks in protein structures". Proteins. 26: 363–376.
  15. Basu, S.; Bhattacharyya, D.; Wallner, B. (1 December 2014). "SARAMAint: The Complementarity Plot for Protein–Protein Interface". Journal of Bioinformatics and Intelligent Control. 3 (4): 309–314. doi:10.1166/jbic.2014.1103.
  16. Basu, S. (7 December 2017). "CPdock: the complementarity plot for docking of proteins: implementing multi-dielectric continuum electrostatics". Journal of Molecular Modeling. 24 (1): 8. doi:10.1007/s00894-017-3546-y. ISSN 0948-5023. PMID 29218430. S2CID 23174519.
  17. Lawrence, M. C.; Colman, P. M. (20 December 1993). "Shape complementarity at protein/protein interfaces". Journal of Molecular Biology. 234 (4): 946–950. doi:10.1006/jmbi.1993.1648. ISSN 0022-2836. PMID 8263940.
  18. McCoy, A. J.; Chandana Epa, V.; Colman, P. M. (2 May 1997). "Electrostatic complementarity at protein/protein interfaces". Journal of Molecular Biology. 268 (2): 570–584. doi:10.1006/jmbi.1997.0987. ISSN 0022-2836. PMID 9159491.
  19. Basu, S.; Chakravarty, D.; Bhattacharyya, D.; Saha, P.; Patra, H. K. (31 May 2021). "Plausible blockers of Spike RBD in SARS-CoV2-molecular design and underlying interaction dynamics from high-level structural descriptors". Journal of Molecular Modeling. 27 (6): 191. doi:10.1007/s00894-021-04779-0. ISSN 0948-5023. PMC 8165686. PMID 34057647.
  20. Basu, S.; Assaf, S. S.; Teheux, F.; Rooman, M.; Pucci, F. (2021). "BRANEart: Identify Stability Strength and Weakness Regions in Membrane Proteins". Frontiers in Bioinformatics. 1. doi:10.3389/fbinf.2021.742843. ISSN 2673-7647.