Hydrophobic-polar protein folding model

The hydrophobic-polar protein folding model is a highly simplified model for examining protein folds in space. First proposed by Ken Dill in 1985, it is the best-known type of lattice protein. It stems from the observation that hydrophobic interactions between amino acid residues are the driving force for proteins folding into their native state. [1] All amino acid types are classified as either hydrophobic (H) or polar (P), and the folding of a protein sequence is defined as a self-avoiding walk in a 2D or 3D lattice. The HP model imitates the hydrophobic effect by assigning a negative (favorable) weight to each contact between two H residues that occupy adjacent lattice sites but are not covalently bonded to each other, that is, not consecutive in the chain. Conformations of minimum energy are assumed to represent the protein's native state.
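
The energy function described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original model papers: it assumes a 2D square lattice, the usual unit weight of -1 per H-H contact, and a fold encoded as a string of absolute moves (`R`, `L`, `U`, `D`, one letter per bond). The names `fold_to_coords` and `hp_energy` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, int]

# Assumed encoding: one absolute unit step per bond on the 2D square lattice.
MOVES: Dict[str, Coord] = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}


def fold_to_coords(moves: str) -> List[Coord]:
    """Convert an absolute move string into lattice coordinates, starting at the origin."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hp_energy(sequence: str, coords: Sequence[Coord]) -> int:
    """Return -1 times the number of H-H contacts between residues that are
    lattice neighbors but not consecutive in the chain."""
    if len(sequence) != len(coords):
        raise ValueError("need exactly one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not a self-avoiding walk")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Checking only the +x and +y neighbors counts each contact exactly once.
        for nx, ny in ((x + 1, y), (x, y + 1)):
            j = position.get((nx, ny))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


print(hp_energy("HPPH", fold_to_coords("RUL")))  # -1: residues 0 and 3 touch
print(hp_energy("HPPH", fold_to_coords("RRR")))  # 0: fully extended chain
```

Bonded neighbors (|i - j| = 1) are skipped because the model only rewards non-covalent H-H contacts.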

The HP model can be expressed in both two and three dimensions, generally on square (2D) and cubic (3D) lattices, although triangular lattices have been used as well. It has also been studied on general regular lattices. [2]
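
Because only the neighbor relation changes between lattices, the same energy function can be written once and parameterized by a set of neighbor vectors. The sketch below is illustrative only: the lattice names, the `HALF_NEIGHBORS` table, and the use of axial coordinates for the triangular lattice are assumptions made for this example, not a representation taken from the cited work.

```python
from typing import Dict, List, Sequence, Tuple

Site = Tuple[int, ...]

# Half of each lattice's neighbor vectors: for any two neighboring sites a, b,
# exactly one of (b - a) or (a - b) appears here, so each contact is counted once.
HALF_NEIGHBORS: Dict[str, List[Site]] = {
    "square": [(1, 0), (0, 1)],                  # 4 neighbors per site
    "cubic": [(1, 0, 0), (0, 1, 0), (0, 0, 1)],  # 6 neighbors per site
    "triangular": [(1, 0), (0, 1), (1, -1)],     # 6 neighbors, axial coordinates
}


def are_neighbors(a: Site, b: Site, lattice: str) -> bool:
    """True if sites a and b are adjacent on the given lattice."""
    diff = tuple(q - p for p, q in zip(a, b))
    half = HALF_NEIGHBORS[lattice]
    return diff in half or tuple(-d for d in diff) in half


def hp_energy(sequence: str, coords: Sequence[Site], lattice: str = "square") -> int:
    """HP energy (-1 per non-consecutive H-H contact) on the chosen lattice."""
    half = HALF_NEIGHBORS[lattice]
    dim = len(half[0])
    if len(sequence) != len(coords) or any(len(c) != dim for c in coords):
        raise ValueError("need one coordinate of the right dimension per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    if not all(are_neighbors(a, b, lattice) for a, b in zip(coords, coords[1:])):
        raise ValueError("chain is broken: consecutive residues are not lattice neighbors")
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, site in enumerate(coords):
        if sequence[i] != "H":
            continue
        for v in half:
            j = index.get(tuple(s + d for s, d in zip(site, v)))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy
```

One visible difference: on the triangular lattice three residues can form a triangle, so `hp_energy("HHH", [(0, 0), (1, 0), (0, 1)], "triangular")` returns -1, while the same coordinates raise `ValueError` on the square lattice, where (1, 0) and (0, 1) are not neighbors. On the bipartite square and cubic lattices, contacts between residues i and i + 2 are impossible.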

Randomized search algorithms are often used to tackle the HP folding problem. These include stochastic methods such as the Monte Carlo method, evolutionary methods such as genetic algorithms, and ant colony optimization. While no method is guaranteed to find the minimum energy state for long protein sequences, the most advanced methods today are able to come close. [3] [4] For some model variants and lattices, it is possible to compute optimal structures (those with the maximal number of H-H contacts) using constraint programming techniques, [5] [6] as implemented, for example, in the CPSP-tools webserver. [7]
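
A stochastic search of the kind listed above can be sketched as simulated annealing with the Metropolis acceptance rule. This is a deliberately naive illustration, not any of the cited algorithms: it assumes the 2D square lattice, encodes a fold as absolute moves, and uses a crude move set (change one bond direction, which rigidly shifts the rest of the chain). The function names, cooling schedule, and default parameters are all hypothetical choices for the example.

```python
import math
import random
from typing import List, Optional, Tuple

Coord = Tuple[int, int]
MOVES = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}


def walk(moves: str) -> Optional[List[Coord]]:
    """Coordinates for an absolute move string, or None if the walk collides."""
    coords = [(0, 0)]
    occupied = {(0, 0)}
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in occupied:
            return None
        occupied.add(nxt)
        coords.append(nxt)
    return coords


def energy(sequence: str, coords: List[Coord]) -> int:
    """-1 per H-H lattice contact between non-consecutive residues."""
    index = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] == "H":
            for nb in ((x + 1, y), (x, y + 1)):
                j = index.get(nb)
                if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                    e -= 1
    return e


def anneal(sequence: str, steps: int = 20000, t_start: float = 2.0,
           t_end: float = 0.05, seed: int = 0) -> Tuple[str, int]:
    """Metropolis Monte Carlo with a geometric cooling schedule; returns the best fold seen."""
    if len(sequence) < 2:
        return "", 0
    rng = random.Random(seed)
    moves = "R" * (len(sequence) - 1)  # a straight chain is always self-avoiding
    e = energy(sequence, walk(moves))
    best_moves, best_e = moves, e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        k = rng.randrange(len(moves))
        candidate = moves[:k] + rng.choice("RLUD") + moves[k + 1:]
        coords = walk(candidate)
        if coords is None:
            continue  # reject moves that break self-avoidance
        e_new = energy(sequence, coords)
        # Metropolis rule: always accept downhill, accept uphill with prob exp(-dE/T).
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            moves, e = candidate, e_new
            if e < best_e:
                best_moves, best_e = moves, e
    return best_moves, best_e
```

For example, `anneal("HPHPPHHPHPPHPHHPPHPH", seed=1)` returns a move string and its energy. Practical HP solvers use richer local move sets and more careful schedules; this sketch only shows the accept/reject structure shared by Monte Carlo approaches.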

Even though the HP model abstracts away many of the details of protein folding, finding an optimal fold is still NP-hard on both the 2D square and the 3D cubic lattice. [8]
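
The hardness result means that no polynomial-time exact algorithm is expected (unless P = NP). The naive exact method, enumerating every self-avoiding walk, makes the cost concrete: on the square lattice the number of walks grows roughly as 2.64^n. The sketch below is a hypothetical brute-force solver for short sequences on the 2D square lattice; fixing the first step to +x removes rotated duplicates, which cannot change the optimal energy.

```python
from typing import Iterator, List, Optional, Tuple

Coord = Tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def self_avoiding_walks(n: int) -> Iterator[List[Coord]]:
    """Yield every n-site self-avoiding walk on the square lattice that starts
    at the origin with a +x step (this drops the 4 rotated copies of each fold)."""
    if n < 1:
        return
    if n == 1:
        yield [(0, 0)]
        return
    path = [(0, 0), (1, 0)]
    occupied = set(path)

    def extend() -> Iterator[List[Coord]]:
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend()
                path.pop()  # backtrack
                occupied.remove(nxt)

    yield from extend()


def energy(sequence: str, coords: List[Coord]) -> int:
    """-1 per H-H lattice contact between non-consecutive residues."""
    index = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] == "H":
            for nb in ((x + 1, y), (x, y + 1)):
                j = index.get(nb)
                if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                    e -= 1
    return e


def optimal_fold(sequence: str) -> Tuple[int, Optional[List[Coord]], int]:
    """Exhaustively search all folds; return (best energy, best fold, folds tried)."""
    best_e, best_fold, tried = 0, None, 0
    for fold in self_avoiding_walks(len(sequence)):
        tried += 1
        e = energy(sequence, fold)
        if best_fold is None or e < best_e:
            best_e, best_fold = e, fold
    return best_e, best_fold, tried


for seq in ("HPPH", "HHHHHH"):
    e, fold, tried = optimal_fold(seq)
    print(seq, e, tried)  # HPPH -1 9, then HHHHHH -2 71
```

This is only practical for very short chains; the exact methods cited above avoid full enumeration by exploiting the structure of optimal H cores.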

In 2007, a Monte Carlo method named FRESS was developed that appears to perform well on HP models. [9]

References

  1. Dill K.A. (1985). "Theory for the folding and stability of globular proteins". Biochemistry. 24 (6): 1501–1509. doi:10.1021/bi00327a032. PMID 3986190.
  2. Bechini A. (2013). "On the characterization and software implementation of general protein lattice models". PLOS ONE. 8 (3): e59504. Bibcode:2013PLoSO...859504B. doi:10.1371/journal.pone.0059504. PMC 3612044. PMID 23555684.
  3. Bui T.N.; Sundarraj G. (2005). "An efficient genetic algorithm for predicting protein tertiary structures in the 2D HP model". Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation. pp. 385–392. doi:10.1145/1068009.1068072. ISBN 978-1595930101. S2CID 13485429.
  4. Shmygelska A.; Hoos H.H. (2003). "An Improved Ant Colony Optimisation Algorithm for the 2D HP Protein Folding Problem". Advances in Artificial Intelligence. Lecture Notes in Computer Science. Vol. 2671. pp. 400–417. CiteSeerX 10.1.1.13.7617. doi:10.1007/3-540-44886-1_30. ISBN 978-3-540-40300-5.
  5. Yue K.; Fiebig K.M.; Thomas P.D.; Chan H.S.; Shakhnovich E.I.; Dill K.A. (1995). "A test of lattice protein folding algorithms". Proc Natl Acad Sci U S A. 92 (1): 325–329. Bibcode:1995PNAS...92..325Y. doi:10.1073/pnas.92.1.325. PMC 42871. PMID 7816842.
  6. Mann M.; Backofen R. (2014). "Exact methods for lattice protein models". Bio-Algorithms and Med-Systems. 10 (4): 213–225. doi:10.1515/bams-2014-0014. S2CID 1238394.
  7. Mann M.; Will S.; Backofen R. (2008). "CPSP-tools - exact and complete algorithms for high-throughput 3D lattice protein studies". BMC Bioinformatics. 9: 230. doi:10.1186/1471-2105-9-230. PMC 2396640. PMID 18462492.
  8. Crescenzi P.; Goldman D.; Papadimitriou C.; Piccolboni A.; Yannakakis M. (1998). "On the complexity of protein folding". Macromolecules. 5 (1): 27–40. CiteSeerX 10.1.1.122.1898. doi:10.1145/279069.279089. PMID 9773342. S2CID 7783811.
  9. Zhang J.; Kou S.C.; Liu J.S. (2007). "Polymer structure optimization and simulation via a fragment re-growth Monte Carlo". J. Chem. Phys. 126 (22): 225101. doi:10.1063/1.2736681. PMID 17581081. S2CID 457506.