Protein contact map

Protein Contact Map of protein VPA0982 from Vibrio parahaemolyticus

A protein contact map records which pairs of amino acid residues in a three-dimensional protein structure are close to each other, using a binary two-dimensional matrix. For two residues i and j, the ij element of the matrix is 1 if the two residues are closer than a predetermined threshold, and 0 otherwise. Various contact definitions have been proposed: the distance between Cα atoms, with a threshold of 6–12 Å; the distance between Cβ atoms, with a threshold of 6–12 Å (Cα is used for glycine); and the distance between the side-chain centers of mass.
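A minimal sketch of the Cβ-based definition, written in pure Python. The residue representation (a dict with a `name` plus `CA` and optional `CB` coordinates in Å) and the 8 Å default threshold are assumptions made for illustration; any threshold in the 6–12 Å range quoted above could be used instead.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def representative_atom(residue: Dict) -> Coord:
    """Cβ coordinate of a residue, falling back to Cα for glycine (which has no Cβ)."""
    if residue["name"] == "GLY" or residue.get("CB") is None:
        return residue["CA"]
    return residue["CB"]


def contact_map(residues: List[Dict], threshold: float = 8.0) -> List[List[int]]:
    """Binary N x N matrix: entry [i][j] is 1 if residues i and j are closer than `threshold` (Å)."""
    coords = [representative_atom(r) for r in residues]
    return [[1 if math.dist(a, b) < threshold else 0 for b in coords] for a in coords]


# Toy coordinates in Å, invented for illustration (not taken from a real structure).
residues = [
    {"name": "ALA", "CA": (0.0, 0.0, 0.0), "CB": (1.0, 0.0, 0.0)},
    {"name": "GLY", "CA": (5.0, 0.0, 0.0)},
    {"name": "LEU", "CA": (20.0, 0.0, 0.0), "CB": (21.0, 0.0, 0.0)},
]
print(contact_map(residues))  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```

In this sketch every residue is trivially in contact with itself, so the diagonal is all ones; many analyses additionally ignore pairs that are only a few positions apart in sequence.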

Overview

Contact maps provide a more reduced representation of a protein structure than its full 3D atomic coordinates. One advantage is that contact maps are invariant to rotations and translations; they are also more easily predicted by machine learning methods. It has also been shown that, under certain circumstances (e.g., when few of the predicted contacts are erroneous), it is possible to reconstruct the 3D coordinates of a protein from its contact map. [1] [2]
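Invariance follows from the fact that a contact map depends only on interatomic distances, which rigid-body motions preserve. The sketch below checks this on toy coordinates (one representative atom per residue, invented for illustration) by rotating them about the z axis and translating them; the 8 Å threshold is an assumed value.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: List[Coord], threshold: float = 8.0) -> List[List[int]]:
    """Binary contact map from one representative atom per residue."""
    return [[int(math.dist(a, b) < threshold) for b in coords] for a in coords]


def rotate_z_and_translate(coords: List[Coord], angle: float, shift: Coord) -> List[Coord]:
    """Rigid-body motion: rotate about the z axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
moved = rotate_z_and_translate(coords, angle=1.2, shift=(10.0, -4.0, 2.5))
print(contact_map(coords) == contact_map(moved))  # True
```

With floating-point coordinates, a pair whose distance lies almost exactly at the threshold could in principle flip after the transformation because of rounding; none of the toy distances here is close to 8 Å.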

Contact maps are also used for protein superimposition and to describe similarity between protein structures. [3] They are either predicted from protein sequence or calculated from a given structure.
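As an illustration of comparing structures through their contact maps, the sketch below scores two maps of equal size, assuming the residue-to-residue correspondence is already known, by the Jaccard index of their contact sets. Skipping pairs fewer than three positions apart in sequence and returning 1.0 when neither map has any such contacts are assumptions of this sketch; this is not the scoring scheme of DALI or any other specific structural aligner.

```python
from typing import List, Set, Tuple


def contact_set(cmap: List[List[int]], min_separation: int = 3) -> Set[Tuple[int, int]]:
    """Contacts (i, j) with i < j, ignoring pairs closer than `min_separation` in sequence."""
    n = len(cmap)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if cmap[i][j]
    }


def contact_overlap(map_a: List[List[int]], map_b: List[List[int]], min_separation: int = 3) -> float:
    """Jaccard index of the two contact sets (1.0 = identical, 0.0 = no shared contacts)."""
    a = contact_set(map_a, min_separation)
    b = contact_set(map_b, min_separation)
    union = a | b
    if not union:
        return 1.0  # neither map has any non-local contacts
    return len(a & b) / len(union)


# Two toy 5-residue maps sharing one of three distinct non-local contacts.
a = [[0] * 5 for _ in range(5)]
b = [[0] * 5 for _ in range(5)]
for i, j in [(0, 3), (1, 4)]:
    a[i][j] = a[j][i] = 1
for i, j in [(0, 3), (0, 4)]:
    b[i][j] = b[j][i] = 1
print(contact_overlap(a, b))  # 0.3333333333333333
```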

Contact map prediction

With large numbers of genomic sequences now available, it has become feasible to analyze such sequences for coevolving residues. The approach is effective because, if positions i and j of a protein are functionally coupled (e.g. by taking part in an enzymatic domain, by being adjacent in the folded protein, or even by being adjacent in an oligomer of that protein), a mutation at position i is more likely to be associated with a mutation at position j than with a back-mutation at i. [4]

Several statistical methods exist to extract such coupled residue pairs from a multiple sequence alignment (MSA): observed versus expected frequencies of residue pairs (OMES); [5] McLachlan-based substitution correlation (McBASC); [6] statistical coupling analysis; mutual information (MI)-based methods; [7] and, more recently, direct coupling analysis (DCA). [8] [9]
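The idea behind MI-based methods can be sketched in a few lines: two alignment columns score highly when knowing the residue in one column tells you a lot about the residue in the other. The bare-bones version below uses raw frequencies, MI = Σ f_ij(a,b) log[f_ij(a,b) / (f_i(a) f_j(b))], and omits the sequence weighting, pseudocounts and phylogenetic corrections that published methods typically apply; the toy alignment is invented for illustration.

```python
import math
from collections import Counter
from typing import List


def column_mi(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j.

    Frequencies are raw counts divided by the number of sequences.
    """
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


# Toy alignment: columns 0 and 3 co-vary perfectly (K pairs with E, E with K);
# column 1 is invariant, so it carries no information about any other column.
msa = ["KAVE", "KALE", "EAVK", "EAIK"]
print(round(column_mi(msa, 0, 3), 3))  # 0.693 (= ln 2)
print(column_mi(msa, 0, 1))            # 0.0
```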

Machine learning algorithms have been able to enhance MSA-based methods, especially for proteins with few known homologs (i.e., shallow MSAs). [10]

Predicted contact maps have also been used to study membrane proteins, specifically to predict helix-helix interactions. [11]

HB Plot

Knowledge of the relationship between a protein's structure and its dynamic behavior is essential for understanding protein function. The description of a protein's three-dimensional structure as a network of hydrogen bonding interactions, the hydrogen bonding plot (HB plot), [12] was introduced as a tool for exploring protein structure and function. By analyzing the network of tertiary interactions, one can investigate how information might spread within a protein.
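One simple way to explore how information could spread through such a network is to treat residues as nodes and hydrogen bonds as edges, then count how many hydrogen-bond steps separate residues. The sketch below does this with a breadth-first search; the residue numbers are hypothetical, and this graph treatment is an illustration rather than the analysis performed in the HB plot paper.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def build_network(bonds: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Undirected graph: residues are nodes, hydrogen bonds are edges."""
    graph: Dict[int, List[int]] = {}
    for a, b in bonds:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def hops_from(graph: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Breadth-first search: number of hydrogen-bond steps from `start` to each reachable residue."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical tertiary hydrogen bonds between residue numbers.
network = build_network([(5, 40), (40, 72), (72, 110), (15, 30)])
print(hops_from(network, 5))  # {5: 0, 40: 1, 72: 2, 110: 3}
```

Residues 15 and 30 form a separate component, so a perturbation at residue 5 cannot reach them through hydrogen bonds alone in this toy network.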

The HB plot offers a simple way of analyzing protein secondary and tertiary structure. Hydrogen bonds that stabilize secondary structural elements (secondary hydrogen bonds) and those formed between distant amino acid residues (tertiary hydrogen bonds) are easily distinguished in the HB plot, so the amino acid residues involved in stabilizing protein structure and function can be identified.

Features

The plot distinguishes between main chain-main chain, main chain-side chain and side chain-side chain hydrogen bonding interactions. Bifurcated hydrogen bonds, multiple hydrogen bonds between amino acid residues, and intra- and interchain hydrogen bonds are also indicated on the plots. Three classes of hydrogen bonds are distinguished by color-coding: short (donor-acceptor distance smaller than 2.5 Å), intermediate (between 2.5 Å and 3.2 Å) and long (greater than 3.2 Å).
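The two classifications described above can be sketched as small functions. The `HBond` record and its field names are hypothetical, and the treatment of distances exactly at 2.5 Å and 3.2 Å (counted here as intermediate) is an assumption, since only the ranges are given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HBond:
    donor_residue: int
    acceptor_residue: int
    donor_is_backbone: bool     # True if the donor atom is a main-chain atom (e.g. N)
    acceptor_is_backbone: bool  # True if the acceptor atom is a main-chain atom (e.g. O)
    distance: float             # donor-acceptor distance in Å


def length_class(hb: HBond) -> str:
    """Short (< 2.5 Å), intermediate (2.5 to 3.2 Å) or long (> 3.2 Å)."""
    if hb.distance < 2.5:
        return "short"
    if hb.distance <= 3.2:
        return "intermediate"
    return "long"


def chain_class(hb: HBond) -> str:
    """Main chain-main chain, main chain-side chain, or side chain-side chain."""
    n_backbone = int(hb.donor_is_backbone) + int(hb.acceptor_is_backbone)
    return {2: "main-main", 1: "main-side", 0: "side-side"}[n_backbone]


hb = HBond(donor_residue=12, acceptor_residue=8,
           donor_is_backbone=True, acceptor_is_backbone=True, distance=2.9)
print(length_class(hb), chain_class(hb))  # intermediate main-main
```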

Secondary structure elements in HB plot

Secondary structure elements in the HB plot (the labels of the parallel and antiparallel sheets are swapped in this figure)

In the HB plot, the characteristic patterns of secondary structure elements can be recognized easily, as follows:

  1. Helices appear as strips directly adjacent to the diagonal.
  2. Antiparallel beta sheets appear as cross-diagonal lines.
  3. Parallel beta sheets appear as lines parallel to the diagonal.
  4. Loops appear as breaks in the diagonal between the cross-diagonal beta-sheet motifs.
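These patterns follow from residue indices. In an α-helix, residue i bonds to residue i+4, so the offset j − i is small and constant, giving a strip next to the diagonal. In parallel strands, pairing i with j is followed by i+2 with j+2, so j − i stays constant (a line parallel to the diagonal). In antiparallel strands, i with j is followed by i+2 with j−2, so i + j stays constant (a cross-diagonal line). The hypothetical classifier below applies these rules to a run of contacts; the helix offset cutoff of 4 and the minimum of two contacts are illustrative assumptions, and real helices that mix i+3 and i+4 bonds would be reported as undetermined.

```python
from typing import Iterable, Tuple


def classify_pattern(contacts: Iterable[Tuple[int, int]], max_helix_offset: int = 4) -> str:
    """Classify a run of residue-pair contacts (i, j) by their arrangement in the plot.

    constant, small j - i -> strip next to the diagonal (helix)
    constant, large j - i -> line parallel to the diagonal (parallel sheet)
    constant i + j        -> cross-diagonal line (antiparallel sheet)
    """
    pairs = sorted({(min(i, j), max(i, j)) for i, j in contacts})
    if len(pairs) < 2:
        return "undetermined"
    offsets = {j - i for i, j in pairs}
    sums = {i + j for i, j in pairs}
    if len(offsets) == 1:
        return "helix" if offsets.pop() <= max_helix_offset else "parallel sheet"
    if len(sums) == 1:
        return "antiparallel sheet"
    return "undetermined"


print(classify_pattern([(10, 14), (11, 15), (12, 16)]))  # helix
print(classify_pattern([(20, 60), (22, 62), (24, 64)]))  # parallel sheet
print(classify_pattern([(30, 50), (32, 48), (34, 46)]))  # antiparallel sheet
```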

Examples of usage

Cytochrome P450s

The cytochrome P450s (P450s) are xenobiotic-metabolizing, membrane-bound, heme-containing enzymes that use molecular oxygen and electrons from NADPH cytochrome P450 reductase to oxidize their substrates. CYP2B4, a member of the cytochrome P450 family, is the only protein within this family whose X-ray structures in both the open and closed forms have been published. Comparing the open and closed structures of CYP2B4 reveals a large-scale conformational rearrangement between the two states. The greatest changes occur around residues 215–225, which are wide open in the ligand-free state and shut after ligand binding, and in the region around loop C near the heme.

HB Plot and structure of Cytochrome P450 2B4 in closed form

Examining the HB plots of the closed and open states of CYP2B4 revealed that the rearrangement of tertiary hydrogen bonds is in excellent agreement with current knowledge of the cytochrome P450 catalytic cycle.

The first step in the P450 catalytic cycle is substrate binding. As the HB plot reveals, preliminary binding of a ligand near the entrance breaks hydrogen bonds S212-E474 and S207-H172 of the open form of CYP2B4, and hydrogen bonds E218-A102 and Q215-L51 are formed that fix the entrance in the closed form.
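Comparing the HB plots of two conformational states amounts to a set difference over residue pairs. The sketch below uses only the bonds named for the substrate-binding step as input; the `bond` helper and the "S212-E474" string format are illustrative assumptions. A real comparison would start from the full sets of bonds detected in both structures, where bonds present in both states simply cancel out.

```python
from typing import FrozenSet, Set, Tuple

Bond = FrozenSet[str]


def bond(label: str) -> Bond:
    """Turn 'S212-E474' into an order-independent pair {'S212', 'E474'}."""
    a, b = label.split("-")
    return frozenset((a, b))


def compare_states(before: Set[Bond], after: Set[Bond]) -> Tuple[Set[Bond], Set[Bond]]:
    """Return (bonds broken, bonds formed) on going from `before` to `after`."""
    return before - after, after - before


# Tertiary hydrogen bonds named in the text for the substrate-binding step of CYP2B4.
open_form = {bond("S212-E474"), bond("S207-H172")}
closed_form = {bond("E218-A102"), bond("Q215-L51")}

broken, formed = compare_states(open_form, closed_form)
print(sorted("-".join(sorted(b)) for b in broken))  # ['E474-S212', 'H172-S207']
print(sorted("-".join(sorted(b)) for b in formed))  # ['A102-E218', 'L51-Q215']
```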

The second step is the transfer of the first electron from NADPH via an electron transfer chain. For electron transfer, a conformational change occurs that triggers the interaction of the P450 with NADPH cytochrome P450 reductase. As seen in the HB plot, hydrogen bonds S128-N287, S128-T291 and L124-N287 break and hydrogen bonds S96-R434, A116-R434, R125-I435 and D82-R400 form at the NADPH cytochrome P450 reductase binding site, transforming CYP2B4 into a conformational state in which NADPH cytochrome P450 reductase can bind.

In the third step, oxygen enters CYP2B4 in the closed state, in which newly formed hydrogen bonds S176-T300, H172-S304 and N167-R308 open a tunnel that is exactly the size and shape of an oxygen molecule.

Lipocalin family

Beta-lactoglobulin in open (white) and ligand-bound (red) form

The lipocalin family is a large and diverse family of proteins that function as transporters of small hydrophobic molecules. Beta-lactoglobulin is a typical member of the lipocalin family and was found to have a role in the transport of hydrophobic ligands such as retinol or fatty acids. [13] Its crystal structure has been determined [e.g. Qin, 1998] with different ligands as well as in ligand-free form. The crystal structures determined so far reveal that the typical lipocalin contains an eight-stranded antiparallel β-barrel arranged to form a conical central cavity in which the hydrophobic ligand is bound. The structure of beta-lactoglobulin shows that the entrance to the central cavity of the barrel is surrounded by five loops, centered around residues 26, 35, 63, 87 and 111, which undergo a conformational change during ligand binding and close the cavity.

The overall shape of beta-lactoglobulin is characteristic of the lipocalin family.[citation needed] In the absence of alpha helices, the main diagonal almost disappears, and the cross-diagonals representing the beta sheets dominate the plot. A relatively low number of tertiary hydrogen bonds can be found in the plot, concentrated in three high-density regions: one connected to the loop around residue 63, a second connected to the loop around residue 87, and a third connected to the regions around residues 26 and 35. The fifth loop, around residue 111, is represented by only one tertiary hydrogen bond in the HB plot.

In the three-dimensional structure, tertiary hydrogen bonds are formed (1) near the entrance, where they are directly involved in the conformational rearrangement during ligand binding, and (2) at the bottom of the "barrel". The HB plots of the open and closed forms of beta-lactoglobulin are very similar, and all unique motifs can be recognized in both forms. Differences between the HB plots of the open and ligand-bound forms show a few important individual changes in the tertiary hydrogen bonding pattern. In particular, the formation of hydrogen bonds Y20-E157 and S21-H161 in the closed form might be crucial for the conformational rearrangement. These hydrogen bonds lie at the bottom of the cavity, which suggests that the closure of the entrance of a lipocalin starts when a ligand reaches the bottom of the cavity and breaks hydrogen bonds R123-Y99, R123-T18 and V41-Q120. Lipocalins are known to have very low sequence similarity but high structural similarity.[citation needed] The only conserved regions are precisely those around residues 20 and 160, whose role is unknown.

HB Plots of beta-lactoglobulin in open (2BLG) and ligand-bound (2AKQ) form

References

  1. Pietal MJ, Bujnicki JM, Kozlowski LP (2015). "GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function". Bioinformatics. 31 (21): 3499–3505. doi:10.1093/bioinformatics/btv390. PMID 26130575.
  2. Vassura M, Margara L, Di Lena P, Medri F, Fariselli P, Casadio R (2008). "Reconstruction of 3D Structures From Protein Contact Maps". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 5 (3): 357–367. doi:10.1109/TCBB.2008.27. PMID 18670040. S2CID 6080543.
  3. Holm L, Sander C (1996). "Mapping the protein universe". Science. 273 (5275): 595–603. Bibcode:1996Sci...273..595H. doi:10.1126/science.273.5275.595. PMID 8662544. S2CID 7509134.
  4. Fitch WM, Markowitz E (1970). "An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution". Biochem. Genet. 4 (5): 579–593. doi:10.1007/bf00486096. PMID 5489762. S2CID 26638948.
  5. Kass I, Horovitz A (2002). "Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations". Proteins. 48 (4): 611–617. doi:10.1002/prot.10180. PMID 12211028. S2CID 40289209.
  6. Göbel U, et al. (1994). "Correlated mutations and residue contacts in proteins". Proteins. 18 (4): 309–317. doi:10.1002/prot.340180402. PMID 8208723. S2CID 14978727.
  7. Wollenberg KR, Atchley WR (2000). "Separation of phylogenetic and functional associations in biological sequences by using the parametric bootstrap". Proc. Natl. Acad. Sci. USA. 97 (7): 3288–3291. Bibcode:2000PNAS...97.3288W. doi:10.1073/pnas.97.7.3288. PMC 16231. PMID 10725404.
  8. Weigt M, White RA, Szurmant H, Hoch JA, Hwa T (2009). "Identification of direct residue contacts in protein–protein interaction by message passing". Proc. Natl. Acad. Sci. USA. 106 (1): 67–72. arXiv:0901.1248. Bibcode:2009PNAS..106...67W. doi:10.1073/pnas.0805923106. PMC 2629192. PMID 19116270.
  9. Morcos F, et al. (2011). "Direct-coupling analysis of residue coevolution captures native contacts across many protein families". Proc. Natl. Acad. Sci. USA. 108 (49): E1293–E1301. doi:10.1073/pnas.1111471108. PMC 3241805. PMID 22106262.
  10. Hanson J, Paliwal KK, Litfin T, Yang Y, Zhou Y (2018). "Accurate Prediction of Protein Contact Maps by Coupling Residual Two-Dimensional Bidirectional Long Short-Term Memory with Convolutional Neural Networks". Bioinformatics. 34 (23): 4039–4045. doi:10.1093/bioinformatics/bty481. PMID 29931279. S2CID 49335891.
  11. Lo A, Chiu YY, Rødland EA, Lyu PC, Sung TY, Hsu WL (2009). "Predicting helix-helix interactions from residue contacts in membrane proteins". Bioinformatics. 25 (8): 996–1003. doi:10.1093/bioinformatics/btp114. PMC 2666818. PMID 19244388.
  12. Bikadi Z, Demko L, Hazai E (2007). "Functional and structural characterization of a protein based on analysis of its hydrogen bonding network by hydrogen bonding plot". Arch. Biochem. Biophys. 461 (2): 225–234. doi:10.1016/j.abb.2007.02.020. PMID 17391641.
  13. Pérez MD, Calvo M (1995). "Interaction of beta-lactoglobulin with retinol and fatty acids and its role as a possible biological function for this protein: A review". Journal of Dairy Science. 78 (5): 978–988. doi:10.3168/jds.S0022-0302(95)76713-3. PMID 7622732.