Lattice protein

Lattice proteins are highly simplified models of protein-like heteropolymer chains in a lattice conformational space, which are used to investigate protein folding. [1] The simplification is twofold: each whole residue (amino acid) is modeled as a single "bead" or "point" drawn from a finite set of types (usually only two), and each residue is restricted to the vertices of a (usually cubic) lattice. [1] To guarantee the connectivity of the protein chain, adjacent residues on the backbone must be placed on adjacent vertices of the lattice. [2] Steric constraints are expressed by requiring that no more than one residue be placed on the same lattice vertex. [2]
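
The two constraints above (chain connectivity and steric exclusion) together say that a conformation is a self-avoiding walk on the lattice. A minimal sketch of such a check for the cubic lattice follows; the function name `is_valid_conformation` and the representation of a conformation as a list of integer coordinate triples are assumptions made for illustration, not taken from the cited sources.

```python
from typing import Sequence, Tuple

Coord = Tuple[int, int, int]


def is_valid_conformation(coords: Sequence[Coord]) -> bool:
    """Return True if coords is a self-avoiding walk on the cubic lattice."""
    # Steric constraint: no two residues may share a lattice vertex.
    if len(set(coords)) != len(coords):
        return False
    # Connectivity: consecutive residues must sit on adjacent vertices,
    # i.e. at Manhattan distance exactly 1 on the cubic lattice.
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            return False
    return True


if __name__ == "__main__":
    print(is_valid_conformation([(0, 0, 0), (1, 0, 0), (1, 1, 0)]))  # valid walk
    print(is_valid_conformation([(0, 0, 0), (1, 0, 0), (0, 0, 0)]))  # vertex reused
```

Other lattices would need a different adjacency test, but the same two checks apply.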

Because proteins are such large molecules, there are severe computational limits on the simulated timescales of their behaviour when modeled in all-atom detail. The millisecond regime for all-atom simulations was not reached until 2010, [3] and it is still not possible to fold all real proteins on a computer. Simplification significantly reduces the computational effort in handling the model, although even in this simplified scenario the protein folding problem is NP-complete. [4]

Overview

Different versions of lattice proteins may adopt different types of lattice (typically square and triangular ones), in two or three dimensions, but it has been shown that generic lattices can be used and handled via a uniform approach. [2]

Lattice proteins are made to resemble real proteins by introducing an energy function, a set of conditions which specify the interaction energy between beads occupying adjacent lattice sites. [5] The energy function mimics the interactions between amino acids in real proteins, which include steric, hydrophobic and hydrogen bonding effects. [2] The beads are divided into types, and the energy function specifies the interactions depending on the bead type, just as different types of amino acids interact differently. [5] One of the most popular lattice models, the hydrophobic-polar model (HP model), [6] features just two bead types, hydrophobic (H) and polar (P), and mimics the hydrophobic effect by specifying a favorable interaction between H beads. [5]
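
One common way to write such a contact energy function is sketched below. The symbols are assumptions introduced here for illustration: N is the chain length, σ_i the bead type of residue i, r_i its lattice position, and ε a table of pairwise contact energies. The value -1 for an H-H contact is the conventional choice in the HP model rather than something stated in the text above.

```latex
\[
E(\sigma, \mathbf{r}) \;=\; \sum_{i=1}^{N-2} \; \sum_{j=i+2}^{N}
  \varepsilon(\sigma_i, \sigma_j)\, \Delta(\mathbf{r}_i, \mathbf{r}_j),
\qquad
\Delta(\mathbf{r}_i, \mathbf{r}_j) =
\begin{cases}
  1 & \text{if } \mathbf{r}_i \text{ and } \mathbf{r}_j \text{ are adjacent lattice vertices,} \\
  0 & \text{otherwise.}
\end{cases}
\]
\[
\text{HP model:} \quad
\varepsilon(\mathrm{H},\mathrm{H}) = -1, \qquad
\varepsilon(\mathrm{H},\mathrm{P}) = \varepsilon(\mathrm{P},\mathrm{H}) = \varepsilon(\mathrm{P},\mathrm{P}) = 0 .
\]
```

The inner sum starts at j = i + 2 so that residues which are neighbours along the chain, and therefore always adjacent on the lattice, do not contribute.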

For any sequence in any particular structure, an energy can be rapidly calculated from the energy function. For the simple HP model, this amounts to counting the contacts between H residues that are adjacent in the structure but not in the chain. [7] Most researchers consider a lattice protein sequence protein-like only if it possesses a single structure whose energy is lower than that of every other structure, although some studies instead consider ensembles of possible folded states. [8] This unique lowest-energy structure is the energetic ground state, or native state. The relative positions of the beads in the native state constitute the lattice protein's tertiary structure. [citation needed] Lattice proteins do not have genuine secondary structure; however, some researchers have claimed that results from them can be extrapolated onto real protein structures, which do include secondary structure, by appealing to the same law by which the phase diagrams of different substances can be scaled onto one another (the theorem of corresponding states). [9]
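
The contact count described above can be sketched in a few lines for the 2D square lattice. The function name `hp_energy`, the string encoding of the sequence ("H" and "P" characters) and the energy of -1 per H-H contact are illustrative assumptions; the conformation is assumed to be a valid self-avoiding walk already.

```python
from typing import Sequence, Tuple


def hp_energy(sequence: str, coords: Sequence[Tuple[int, int]]) -> int:
    """HP-model energy on the 2D square lattice: -1 per H-H contact
    between residues adjacent on the lattice but not along the chain."""
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]    # chain folded into a 2x2 square
    straight = [(0, 0), (1, 0), (2, 0), (3, 0)]  # fully extended chain
    print(hp_energy("HPPH", square))    # -1: residues 0 and 3 touch
    print(hp_energy("HPPH", straight))  # 0: no non-bonded contacts
```

Building the position lookup once keeps the calculation linear in the chain length, which is what makes exhaustive or stochastic searches over many conformations affordable.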

By varying the energy function and the bead sequence of the chain (the primary structure), effects on the native state structure and the kinetics of folding can be explored, and this may provide insights into the folding of real proteins. [10] For example, folding processes in lattice proteins have been studied for their resemblance to the two-phase folding kinetics of real proteins: lattice proteins were shown to collapse quickly into a compact state and then rearrange slowly into the native state. [11] Lattice models have also been used in attempts to resolve Levinthal's paradox. For example, Fiebig and Dill examined a search method that constrains the formation of residue contacts in a lattice protein, to provide insight into how a protein finds its native structure without an exhaustive global search. [12] Lattice protein models have also been used to investigate the energy landscapes of proteins, i.e. the variation of their internal free energy as a function of conformation. [citation needed]

Lattices

A lattice is a set of regularly arranged points that are connected by "edges". [2] These points are called vertices, and each is connected to a certain number of other vertices in the lattice by edges. The number of vertices to which each individual vertex is connected is called the coordination number of the lattice, and it can be scaled up or down by changing the shape or dimension (2-dimensional to 3-dimensional, for example) of the lattice. [2] This number is important in shaping the characteristics of the lattice protein because it controls the number of other residues allowed to be adjacent to a given residue. [2] It has been shown that for most proteins the coordination number of the lattice used should fall between 3 and 20, although most commonly used lattices have coordination numbers at the lower end of this range. [2]
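
One way to make the coordination number concrete is to describe a lattice by the set of displacement vectors from a vertex to its neighbours; the coordination number is then the size of that set. The sketch below uses this representation for four common lattices. The constant names, the helper `coordination_number` and the use of axial (skewed) coordinates for the triangular lattice are assumptions made for illustration.

```python
from itertools import product

# Neighbour displacement vectors for some common lattices.
SQUARE_2D = [(1, 0), (-1, 0), (0, 1), (0, -1)]
# Triangular lattice written in axial (skewed) integer coordinates.
TRIANGULAR_2D = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
# Cubic lattice: unit steps along one axis.
CUBIC_3D = [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 1]
# Face-centred cubic lattice: steps along the face diagonals.
FCC_3D = [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2]


def coordination_number(neighbours) -> int:
    """Number of distinct neighbours of each vertex."""
    vectors = set(neighbours)
    # Sanity check: adjacency is symmetric, so v and -v appear together.
    assert all(tuple(-c for c in v) in vectors for v in vectors)
    return len(vectors)


if __name__ == "__main__":
    for name, lattice in [("square", SQUARE_2D), ("triangular", TRIANGULAR_2D),
                          ("cubic", CUBIC_3D), ("fcc", FCC_3D)]:
        print(name, coordination_number(lattice))
```

Under these definitions the square, triangular, cubic and face-centred cubic lattices have coordination numbers 4, 6, 6 and 12 respectively.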

Lattice shape is an important factor in the accuracy of lattice protein models. Changing the lattice shape can dramatically alter the shape of the energetically favorable conformations. [2] It can also add unrealistic constraints to the protein structure, as in the parity problem: in square and cubic lattices, residues of the same parity (both odd-numbered or both even-numbered) cannot make hydrophobic contact. [5] It has also been reported that triangular lattices yield more accurate structures than other lattice shapes when compared to crystallographic data. [2] To combat the parity problem, several researchers have suggested using triangular lattices when possible, as well as a square lattice with diagonals for theoretical applications where the square lattice may be more appropriate. [5] Hexagonal lattices were introduced to alleviate sharp turns of adjacent residues in triangular lattices. [13] Hexagonal lattices with diagonals have also been suggested as a way to combat the parity problem. [2]
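
The parity problem follows from colouring the square lattice like a checkerboard: every chain step moves to the opposite colour, and lattice neighbours always have opposite colours, so two residues can only touch if their positions in the chain differ by an odd number. The brute-force sketch below checks this for short chains; the function names `self_avoiding_walks` and `contact_separations` are hypothetical and introduced only for this illustration.

```python
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def self_avoiding_walks(n):
    """Yield every n-vertex self-avoiding walk on the square lattice
    that starts at the origin."""
    def extend(walk, occupied):
        if len(walk) == n:
            yield tuple(walk)
            return
        x, y = walk[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            walk.append(nxt)
            occupied.add(nxt)
            yield from extend(walk, occupied)
            walk.pop()
            occupied.remove(nxt)

    yield from extend([(0, 0)], {(0, 0)})


def contact_separations(n):
    """Set of chain separations |i - j| over all non-bonded contacts
    found in any n-residue conformation."""
    separations = set()
    for walk in self_avoiding_walks(n):
        index_at = {c: i for i, c in enumerate(walk)}
        for i, (x, y) in enumerate(walk):
            for neighbour in ((x + 1, y), (x, y + 1)):
                j = index_at.get(neighbour)
                if j is not None and abs(i - j) > 1:
                    separations.add(abs(i - j))
    return separations


if __name__ == "__main__":
    print(sorted(contact_separations(8)))  # only odd separations appear
```

For 8-residue chains the only separations that occur are 3, 5 and 7, so residues i and j with i - j even (same parity) never come into contact on this lattice.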

Hydrophobic-polar model

A schematic of a thermodynamically stable conformation of a generic polypeptide. Note the high number of hydrophobic contacts. Amino acid residues are represented as dots along the white line. Hydrophobic residues are in green while polar residues are in blue. See also this example in LabbyFold.
A schematic of a thermodynamically unstable conformation of a generic polypeptide. Note the lower number of hydrophobic contacts than above. Hydrophobic residues are in green and polar residues are in blue. See also this example in LabbyFold.

The hydrophobic-polar protein model is the original lattice protein model. It was first proposed by Dill in 1985 as a way to overcome the significant cost and difficulty of predicting protein structure, using only the hydrophobicity of the amino acids in the protein to predict its structure. [5] It is considered to be the paradigmatic lattice protein model. [2] The method was able to quickly give an estimate of protein structure by representing proteins as "short chains on a 2D square lattice" and has since become known as the hydrophobic-polar model. It breaks the protein folding problem into three separate problems: modeling the protein conformation, defining the energetic properties of the amino acids as they interact with one another to find said conformation, and developing an efficient algorithm for the prediction of these conformations. This is done by classifying the amino acids in the protein as either hydrophobic or polar and assuming that the protein folds in an aqueous environment. The lattice statistical model seeks to recreate protein folding by minimizing the free energy of the contacts between hydrophobic amino acids. Hydrophobic amino acid residues are predicted to group around each other, while hydrophilic residues interact with the surrounding water. [5]

Different lattice types and algorithms have been used to study protein folding with the HP model. Efforts have been made to obtain better approximation ratios using approximation algorithms on two- and three-dimensional square and triangular lattices. As an alternative to approximation algorithms, genetic algorithms have also been applied to square, triangular, and face-centered-cubic lattices. [14]

Problems and alternative models

The simplicity of the hydrophobic-polar model gives rise to several problems that alternative lattice protein models attempt to correct. [5] Chief among these is degeneracy: a modeled protein may have more than one minimum-energy conformation, which leaves it uncertain which conformation is the native one. Attempts to address this include the HPNX model, which classifies amino acids as hydrophobic (H), positive (P), negative (N), or neutral (X) according to the charge of the amino acid, [15] adding parameters to reduce the number of low-energy conformations and allowing for more realistic protein simulations. [5] Another is the Crippen model, which uses protein characteristics taken from crystal structures to inform the choice of native conformation. [16]
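
Degeneracy can be seen directly by enumerating every conformation of a short chain and collecting those that share the lowest energy. The sketch below does this for the HP model on the 2D square lattice. All names (`hp_energy`, `conformations`, `ground_states`) are hypothetical, the energy of -1 per H-H contact is the conventional choice, and rotated or mirrored copies of a conformation are counted only once. Exhaustive enumeration is only feasible for short chains.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def hp_energy(sequence, walk):
    """-1 for each H-H contact between residues not adjacent in the chain."""
    index_at = {c: i for i, c in enumerate(walk)}
    energy = 0
    for i, (x, y) in enumerate(walk):
        if sequence[i] != "H":
            continue
        for neighbour in ((x + 1, y), (x, y + 1)):  # count each pair once
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def conformations(n):
    """Yield n-vertex self-avoiding walks (n >= 2), one per symmetry class.

    Rotations are removed by fixing the first step along +x, and mirror
    images by requiring the first step off the x-axis to point toward +y.
    """
    def extend(walk, occupied, turned):
        if len(walk) == n:
            yield tuple(walk)
            return
        x, y = walk[-1]
        for dx, dy in MOVES:
            if not turned and dy == -1:
                continue  # mirror image of a walk that turns toward +y
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            walk.append(nxt)
            occupied.add(nxt)
            yield from extend(walk, occupied, turned or dy != 0)
            walk.pop()
            occupied.remove(nxt)

    yield from extend([(0, 0), (1, 0)], {(0, 0), (1, 0)}, False)


def ground_states(sequence):
    """Return (minimum energy, list of conformations attaining it)."""
    best, states = None, []
    for walk in conformations(len(sequence)):
        energy = hp_energy(sequence, walk)
        if best is None or energy < best:
            best, states = energy, [walk]
        elif energy == best:
            states.append(walk)
    return best, states


if __name__ == "__main__":
    for seq in ("HPPH", "HPHP"):
        energy, states = ground_states(seq)
        print(seq, energy, len(states))
```

For the 4-residue sequence HPPH the ground state is unique (energy -1), whereas for HPHP no H-H contact is possible and all 5 symmetry-distinct conformations tie at energy 0, a small instance of the degeneracy described above.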

Another issue with lattice models is that they generally do not take into account the space taken up by amino acid side chains, instead considering only the α-carbon. [2] The side chain model addresses this by placing a side-chain bead on a vertex adjacent to the α-carbon. [17]

References

  1. Lau KF, Dill KA (1989). "A lattice statistical mechanics model of the conformational and sequence spaces of proteins". Macromolecules. 22 (10): 3986–97. Bibcode:1989MaMol..22.3986L. doi:10.1021/ma00200a030.
  2. Bechini A (2013). "On the characterization and software implementation of general protein lattice models". PLOS ONE. 8 (3): e59504. Bibcode:2013PLoSO...859504B. doi:10.1371/journal.pone.0059504. PMC 3612044. PMID 23555684.
  3. Voelz VA, Bowman GR, Beauchamp K, Pande VS (February 2010). "Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1-39)". Journal of the American Chemical Society. 132 (5): 1526–8. doi:10.1021/ja9090353. PMC 2835335. PMID 20070076.
  4. Berger B, Leighton T (1998). "Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete". Journal of Computational Biology. 5 (1): 27–40. doi:10.1089/cmb.1998.5.27. PMID 9541869.
  5. Dubey SP, Kini NG, Balaji S, Kumar MS (2018). "A Review of Protein Structure Prediction Using Lattice Model". Critical Reviews in Biomedical Engineering. 46 (2): 147–162. doi:10.1615/critrevbiomedeng.2018026093. PMID 30055531.
  6. Dill KA (March 1985). "Theory for the folding and stability of globular proteins". Biochemistry. 24 (6): 1501–9. doi:10.1021/bi00327a032. PMID 3986190.
  7. Su SC, Lin CJ, Ting CK (December 2010). "An efficient hybrid of hill-climbing and genetic algorithm for 2D triangular protein structure prediction". 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW). IEEE. pp. 51–56. doi:10.1109/BIBMW.2010.5703772. ISBN 978-1-4244-8303-7. S2CID 44932436.
  8. Bertram J, Masel J (April 2020). "Evolution Rapidly Optimizes Stability and Aggregation in Lattice Proteins Despite Pervasive Landscape Valleys and Mazes". Genetics. 214 (4): 1047–1057. doi:10.1534/genetics.120.302815. PMC 7153934. PMID 32107278.
  9. Onuchic JN, Wolynes PG, Luthey-Schulten Z, Socci ND (April 1995). "Toward an outline of the topography of a realistic protein-folding funnel". Proceedings of the National Academy of Sciences of the United States of America. 92 (8): 3626–30. Bibcode:1995PNAS...92.3626O. doi:10.1073/pnas.92.8.3626. PMC 42220. PMID 7724609.
  10. Moreno-Hernández S, Levitt M (June 2012). "Comparative modeling and protein-like features of hydrophobic-polar models on a two-dimensional lattice". Proteins. 80 (6): 1683–93. doi:10.1002/prot.24067. PMC 3348970. PMID 22411636.
  11. Socci ND, Onuchic JN (1994-07-15). "Folding kinetics of proteinlike heteropolymers". The Journal of Chemical Physics. 101 (2): 1519–1528. arXiv:cond-mat/9404001. Bibcode:1994JChPh.101.1519S. doi:10.1063/1.467775. ISSN 0021-9606. S2CID 10672674.
  12. Fiebig KM, Dill KA (1993-02-15). "Protein core assembly processes". The Journal of Chemical Physics. 98 (4): 3475–3487. Bibcode:1993JChPh..98.3475F. doi:10.1063/1.464068.
  13. Jiang M, Zhu B (February 2005). "Protein folding on the hexagonal lattice in the HP model". Journal of Bioinformatics and Computational Biology. 3 (1): 19–34. doi:10.1142/S0219720005000850. PMID 15751110.
  14. Shaw D, Shohidull Islam AS, Sohel Rahman M, Hasan M (2014-01-24). "Protein folding in HP model on hexagonal lattices with diagonals". BMC Bioinformatics. 15 Suppl 2 (2): S7. doi:10.1186/1471-2105-15-S2-S7. PMC 4016602. PMID 24564789.
  15. Backofen R, Will S, Bornberg-Bauer E (March 1999). "Application of constraint programming techniques for structure prediction of lattice proteins with extended alphabets". Bioinformatics. 15 (3): 234–42. doi:10.1093/bioinformatics/15.3.234. PMID 10222411.
  16. Crippen GM (April 1991). "Prediction of protein folding from amino acid sequence over discrete conformation spaces". Biochemistry. 30 (17): 4232–7. doi:10.1021/bi00231a018. PMID 2021616.
  17. Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS (April 1995). "Principles of protein folding--a perspective from simple exact models". Protein Science. 4 (4): 561–602. doi:10.1002/pro.5560040401. PMC 2143098. PMID 7613459.