This article needs additional citations for verification .(January 2024) |
In protein folding, a native contact is a contact between the side chains of two amino acids that are not neighboring in the amino acid sequence (i.e., they are more than four residues apart in the primary sequence in order to remove trivial i to i+4 contacts along alpha helices) but are spatially close in the protein's native state tertiary structure. [1] [2] The fraction of native contacts reproduced in a particular structure is often used as a reaction coordinate for measuring the deviation from the native state of structures produced during molecular dynamics simulations [3] or in benchmarks of protein structure prediction methods. [4]
The contact order is a measure of the locality of a protein's native contacts; [5] that is, the sequence distance between amino acids that form contacts. Proteins with low contact order are thought to fold faster [5] [6] and some may be candidates for downhill folding.
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, responding to stimuli, providing structure to cells and organisms, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in protein folding into a specific 3D structure that determines its activity.
Protein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains. The two most common secondary structural elements are alpha helices and beta sheets, though beta turns and omega loops occur as well. Secondary structure elements typically spontaneously form as an intermediate before the protein folds into its three dimensional tertiary structure.
Protein tertiary structure is the three-dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains may interact and bond in a number of ways. The interactions and bonds of side chains within a particular protein determine its tertiary structure. The protein tertiary structure is defined by its atomic coordinates. These coordinates may refer either to a protein domain or to the entire tertiary structure. A number of tertiary structures may fold into a quaternary structure.
Protein folding is the physical process in which a polypeptide is synthesized by a ribosome from an unstable, random coil into a linear chain of amino acids, resulting in protein's three-dimensional structure. This is typically a 'folded' conformation, by which the protein becomes biologically functional.
Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; and it is important in medicine and biotechnology.
Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.
Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, which indicates a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions, such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, cryo-electron microscopy (cryo-EM) and dual polarisation interferometry, to determine the structure of proteins.
Protein design is the rational design of new protein molecules to design novel activity, behavior, or purpose, and to advance basic understanding of protein function. Proteins can be designed from scratch or by making calculated variants of a known protein structure and its sequence. Rational protein design approaches make protein-sequence predictions that will fold to specific structures. These predicted sequences can then be validated experimentally through methods such as peptide synthesis, site-directed mutagenesis, or artificial gene synthesis.
Lattice proteins are highly simplified models of protein-like heteropolymer chains on lattice conformational space which are used to investigate protein folding. Simplification in lattice proteins is twofold: each whole residue is modeled as a single "bead" or "point" of a finite set of types, and each residue is restricted to be placed on vertices of a lattice. To guarantee the connectivity of the protein chain, adjacent residues on the backbone must be placed on adjacent vertices of the lattice. Steric constraints are expressed by imposing that no more than one residue can be placed on the same lattice vertex.
In molecular biology, an intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure, typically in the absence of its macromolecular interaction partners, such as other proteins or RNA. IDPs range from fully unstructured to partially structured and include random coil, molten globule-like aggregates, or flexible linkers in large multi-domain proteins. They are sometimes considered as a separate class of proteins along with globular, fibrous and membrane proteins.
The folding funnel hypothesis is a specific version of the energy landscape theory of protein folding, which assumes that a protein's native state corresponds to its free energy minimum under the solution conditions usually encountered in cells. Although energy landscapes may be "rough", with many non-native local minima in which partially folded proteins can become trapped, the folding funnel hypothesis assumes that the native state is a deep free energy minimum with steep walls, corresponding to a single well-defined tertiary structure. The term was introduced by Ken A. Dill in a 1987 article discussing the stabilities of globular proteins.
Hydrophobic collapse is a proposed process for the production of the 3-D conformation adopted by polypeptides and other molecules in polar solvents. The theory states that the nascent polypeptide forms initial secondary structure creating localized regions of predominantly hydrophobic residues. The polypeptide interacts with water, thus placing thermodynamic pressures on these regions which then aggregate or "collapse" into a tertiary conformation with a hydrophobic core. Incidentally, polar residues interact favourably with water, thus the solvent-facing surface of the peptide is usually composed of predominantly hydrophilic regions.
Downhill folding is a process in which a protein folds without encountering any significant macroscopic free energy barrier. It is a key prediction of the folding funnel hypothesis of the energy landscape theory of proteins.
The contact order of a protein is a measure of the locality of the inter-amino acid contacts in the protein's native state tertiary structure. It is calculated as the average sequence distance between residues that form native contacts in the folded protein divided by the total length of the protein. Higher contact orders indicate longer folding times, and low contact order has been suggested as a predictor of potential downhill folding, or protein folding that occurs without a free energy barrier. This effect is thought to be due to the lower loss of conformational entropy associated with the formation of local as opposed to nonlocal contacts.
In protein structure prediction, statistical potentials or knowledge-based potentials are scoring functions derived from an analysis of known protein structures in the Protein Data Bank (PDB).
In computational biology, de novo protein structure prediction refers to an algorithmic process by which protein tertiary structure is predicted from its amino acid primary sequence. The problem itself has occupied leading scientists for decades while still remaining unsolved. According to Science, the problem remains one of the top 125 outstanding issues in modern science. At present, some of the most successful methods have a reasonable probability of predicting the folds of small, single-domain proteins within 1.5 angstroms over the entire structure.
Hydrophobicity scales are values that define the relative hydrophobicity or hydrophilicity of amino acid residues. The more positive the value, the more hydrophobic are the amino acids located in that region of the protein. These scales are commonly used to predict the transmembrane alpha-helices of membrane proteins. When consecutively measuring amino acids of a protein, changes in value indicate attraction of specific protein regions towards the hydrophobic region inside lipid bilayer.
I-TASSER is a bioinformatics method for predicting three-dimensional structure model of protein molecules from amino acid sequences. It detects structure templates from the Protein Data Bank by a technique called fold recognition. The full-length structure models are constructed by reassembling structural fragments from threading templates using replica exchange Monte Carlo simulations. I-TASSER is one of the most successful protein structure prediction methods in the community-wide CASP experiments.
Molecular Operating Environment (MOE) is a drug discovery software platform that integrates visualization, modeling and simulations, as well as methodology development, in one package. MOE scientific applications are used by biologists, medicinal chemists and computational chemists in pharmaceutical, biotechnology and academic research. MOE runs on Windows, Linux, Unix, and macOS. Main application areas in MOE include structure-based design, fragment-based design, ligand-based design, pharmacophore discovery, medicinal chemistry applications, biologics applications, structural biology and bioinformatics, protein and antibody modeling, molecular modeling and simulations, virtual screening, cheminformatics & QSAR. The Scientific Vector Language (SVL) is the built-in command, scripting and application development language of MOE.
IntFOLD is fully automated, integrated pipeline for prediction of 3D structure and function from amino acid sequences. The pipeline is wrapped up and deployed as a Web Server. The core of the server method is quality assessment using built-in accuracy self-estimates (ASE) which improves performance prediction of 3D model using ModFOLD.