Adaptive sampling

Adaptive sampling is a technique used in computational molecular biology to efficiently simulate protein folding.

Molecular biology Branch of biology dealing with the molecular basis of biological activity

Molecular biology is a branch of biology that concerns the molecular basis of biological activity between biomolecules in the various systems of a cell, including the interactions between DNA, RNA, proteins and their biosynthesis, as well as the regulation of these interactions.

Protein folding the process of assisting in the covalent and noncovalent assembly of single chain polypeptides or multisubunit complexes into the correct tertiary structure

Protein folding is the physical process by which a protein chain acquires its native three-dimensional structure, a conformation that is usually biologically functional, in an expeditious and reproducible manner. Each protein exists as an unfolded polypeptide or random coil when translated from a sequence of mRNA to a linear chain of amino acids. This polypeptide lacks any stable (long-lasting) three-dimensional structure. As the polypeptide chain is being synthesized by a ribosome, the linear chain begins to fold into its three-dimensional structure, so folding starts even during translation. Amino acids interact with each other to produce a well-defined three-dimensional structure, the folded protein, known as the native state. The resulting three-dimensional structure is determined by the amino acid sequence or primary structure.

Background

Proteins spend a large portion – nearly 96% in some cases [1] – of their folding time "waiting" in various thermodynamic free energy minima. Consequently, a straightforward simulation of this process would spend a great deal of computation on these minima, with the transitions between the states – the aspects of protein folding of greater scientific interest – taking place only rarely. [2] Adaptive sampling exploits this property by concentrating simulation on the protein's phase space in between these states. Using adaptive sampling, molecular simulations that previously would have taken decades can be performed in a matter of weeks. [3]

Thermodynamic free energy

The thermodynamic free energy is a concept useful in the thermodynamics of chemical or thermal processes in engineering and science. The change in the free energy is the maximum amount of work that a thermodynamic system can perform in a process at constant temperature, and its sign indicates whether a process is thermodynamically favorable or forbidden. Since free energy usually contains potential energy, it is not absolute but depends on the choice of a zero point. Therefore, only relative free energy values, or changes in free energy, are physically meaningful.

Phase space Mathematical construction for dynamical systems

In dynamical system theory, a phase space is a space in which all possible states of a system are represented, with each possible state corresponding to one unique point in the phase space. For mechanical systems, the phase space usually consists of all possible values of position and momentum variables. The concept of phase space was developed in the late 19th century by Ludwig Boltzmann, Henri Poincaré, and Josiah Willard Gibbs.

Theory

If a protein folds through the metastable states A -> B -> C, researchers can calculate the transition time from A to C by simulating the A -> B transition and the B -> C transition separately. Decomposing the problem in this manner is efficient because each step can be simulated in parallel. [3] The protein may also fold through alternative routes, which may overlap in part with the A -> B -> C pathway.
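The decomposition can be illustrated with a toy model. The sketch below is not taken from the cited work: it assumes each metastable state is left with a fixed per-step probability (`P_AB` and `P_BC` are hypothetical values), that the steps are irreversible, and that only the single A -> B -> C route exists. Under those assumptions the A -> C time is simply the sum of the two independently estimated step times.

```python
import random

def mean_escape_steps(p_escape, n_runs, rng):
    """Average number of steps that independent short runs need to leave a state.

    Each step leaves the state with probability p_escape, a toy stand-in for
    a real molecular dynamics trajectory.
    """
    total = 0
    for _ in range(n_runs):
        steps = 1
        while rng.random() >= p_escape:
            steps += 1
        total += steps
    return total / n_runs

# Hypothetical per-step escape probabilities, chosen only for illustration.
P_AB = 0.02
P_BC = 0.05

rng = random.Random(0)
# The two estimates do not depend on each other, so in practice they
# could be computed on separate machines at the same time.
t_ab = mean_escape_steps(P_AB, 2000, rng)
t_bc = mean_escape_steps(P_BC, 2000, rng)
t_ac = t_ab + t_bc  # valid only under the irreversibility assumption above
print(f"A->B: {t_ab:.1f}  B->C: {t_bc:.1f}  A->C: {t_ac:.1f} steps")
```

In this toy model the exact answer is 1/0.02 + 1/0.05 = 70 steps, so the printed estimate should land close to that value. With reversible steps or competing pathways, a simple sum no longer applies and a fuller kinetic model is needed.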

Applications

Adaptive sampling is used by the Folding@home distributed computing project in combination with Markov state models. [2] [3]
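As a rough illustration of how adaptive sampling and a Markov state model can fit together, the following sketch runs rounds of short trajectories on a toy one-dimensional system and restarts each round from the least-visited states found so far. This "least counts" seeding rule, the biased random walk, and all names and parameters are assumptions made for the example; it is not Folding@home's actual implementation.

```python
import random
from collections import Counter, defaultdict

N_STATES = 20  # hypothetical discretised conformational states 0..19

def short_trajectory(start, n_steps, rng):
    """Biased random walk: moves toward state 0 are favoured, so high states are rare."""
    state, path = start, [start]
    for _ in range(n_steps):
        step = -1 if rng.random() < 0.7 else 1
        state = min(max(state + step, 0), N_STATES - 1)
        path.append(state)
    return path

def adaptive_sampling(n_rounds, n_traj, n_steps, rng):
    """Run rounds of short trajectories, reseeding from the least-visited states."""
    visits = Counter()
    transitions = defaultdict(Counter)
    seeds = [0] * n_traj  # every trajectory in the first round starts in state 0
    for _ in range(n_rounds):
        for start in seeds:
            path = short_trajectory(start, n_steps, rng)
            visits.update(path)
            for a, b in zip(path, path[1:]):
                transitions[a][b] += 1
        # Pick the least-visited discovered states (ties go to higher states).
        least = sorted(visits, key=lambda s: (visits[s], -s))[:n_traj]
        seeds = [least[i % len(least)] for i in range(n_traj)]
    return visits, transitions

def transition_matrix(transitions):
    """Row-normalise transition counts into estimated transition probabilities."""
    matrix = {}
    for a, row in transitions.items():
        total = sum(row.values())
        matrix[a] = {b: count / total for b, count in row.items()}
    return matrix

visits, transitions = adaptive_sampling(n_rounds=20, n_traj=10, n_steps=10,
                                        rng=random.Random(2))
msm = transition_matrix(transitions)
print("states discovered:", sorted(visits))
```

The row-normalised count matrix is the simplest possible estimate of a Markov state model's transition probabilities. Practical estimators typically also choose a lag time and may enforce detailed balance, which this sketch leaves out.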

Folding@home is a distributed computing project for disease research that simulates protein folding, computational drug design, and other types of molecular dynamics. The project uses the idle processing resources of thousands of personal computers owned by volunteers who have installed the software on their systems. Its main purpose is to determine the mechanisms of protein folding, which is the process by which proteins reach their final three-dimensional structure, and to examine the causes of protein misfolding. This is of significant academic interest with major implications for medical research into Alzheimer's disease, Huntington's disease, and many forms of cancer, among other diseases. To a lesser extent, Folding@home also tries to predict a protein's final structure and determine how other molecules may interact with it, which has applications in drug design. Folding@home is developed and operated by the Pande Laboratory at Stanford University, under the direction of Prof. Vijay Pande, and is shared by various scientific institutions and research laboratories across the world.

Disadvantages

While adaptive sampling is useful for short simulations, longer trajectories may be more helpful for certain types of biochemical problems. [4] [5]

See also

Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, ecological, behavioral, and social systems. The field is broadly defined and includes foundations in biology, applied mathematics, statistics, biochemistry, chemistry, biophysics, molecular biology, genetics, genomics, computer science and evolution.

Related Research Articles

Molecular dynamics Computer simulations to discover and understand chemical properties

Molecular dynamics (MD) is a computer simulation method for studying the physical movements of atoms and molecules. The atoms and molecules are allowed to interact for a fixed period of time, giving a view of the dynamic evolution of the system. In the most common version, the trajectories of atoms and molecules are determined by numerically solving Newton's equations of motion for a system of interacting particles, where forces between the particles and their potential energies are often calculated using interatomic potentials or molecular mechanics force fields. The method was originally developed within the field of theoretical physics in the late 1950s but is applied today mostly in chemical physics, materials science and the modelling of biomolecules.
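To make "numerically solving Newton's equations of motion" concrete, here is a minimal sketch using the velocity Verlet scheme, a common integrator in MD codes. The system is an assumption chosen for simplicity: a single particle on a harmonic spring in one dimension, with hypothetical constants `K` and `MASS`, rather than a real force field.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion for one particle in one dimension."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v

# Hypothetical toy system: unit mass on a unit-stiffness spring.
K = 1.0
MASS = 1.0

def spring_force(x):
    return -K * x

def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K * x * x

x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, spring_force, MASS, dt=0.01, n_steps=1000)
e0, e1 = total_energy(x0, v0), total_energy(x1, v1)
print(f"x(10) = {x1:.4f}, exact cos(10) = {math.cos(10.0):.4f}")
print(f"energy drift = {e1 - e0:.2e}")
```

For this system the exact trajectory is x(t) = cos(t), which gives a direct check on the integrator. The near-constant total energy reflects the good long-time stability that makes schemes of this kind popular for MD.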

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Various algorithms exist for constructing the Markov chain including the Metropolis–Hastings algorithm.
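The idea can be sketched with the random-walk Metropolis algorithm, the special case of Metropolis–Hastings with a symmetric proposal. The target (a standard normal distribution), the uniform proposal, and the step size are assumptions made for the example.

```python
import math
import random

def metropolis(log_density, x0, n_samples, step, rng):
    """Random-walk Metropolis sampler with a symmetric uniform proposal."""
    samples = []
    x, logp = x0, log_density(x0)
    for _ in range(n_samples):
        proposal = x + rng.uniform(-step, step)
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, logp_new - logp))
        if rng.random() < accept_prob:
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected move repeats the current state
    return samples

def log_standard_normal(x):
    """Log-density of N(0, 1), up to an additive constant."""
    return -0.5 * x * x

rng = random.Random(1)
samples = metropolis(log_standard_normal, 0.0, 50000, 2.0, rng)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

Only the ratio of densities enters the acceptance rule, so the normalising constant of the target is never needed. The sample mean and variance should approach 0 and 1 as the chain gets longer.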

GROningen MAchine for Chemical Simulations (GROMACS) is a molecular dynamics package mainly designed for simulations of proteins, lipids, and nucleic acids. It was originally developed in the Biophysical Chemistry department of University of Groningen, and is now maintained by contributors in universities and research centers worldwide. GROMACS is one of the fastest and most popular software packages available, and can run on central processing units (CPUs) and graphics processing units (GPUs). It is free, open-source software released under the GNU General Public License (GPL), and starting with version 4.6, the GNU Lesser General Public License (LGPL).

Protein structure three-dimensional arrangement of atoms in an amino acid-chain molecule

Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, the monomers of the polymer. A single amino acid monomer may also be called a residue indicating a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, and dual polarisation interferometry to determine the structure of proteins.

Vijay S. Pande American scientist

Vijay Satyanand Pande is a Trinidadian-American venture capitalist and an adjunct professor of bioengineering at Stanford University. Pande is the former director of the biophysics program and is best known for orchestrating the distributed computing disease research project known as Folding@home. His research focuses on distributed computing and computer modelling of microbiology, and on improving computer simulations of drug binding, protein design, and synthetic bio-mimetic polymers. Pande became the ninth general partner at venture capital firm Andreessen Horowitz in November 2015.

In silico Latin phrase

In silico is an expression meaning "performed on computer or via computer simulation" in reference to biological experiments. The phrase was coined in 1989 as an allusion to the Latin phrases in vivo, in vitro, and in situ, which are commonly used in biology and refer to experiments done in living organisms, outside living organisms, and where they are found in nature, respectively.

Protein design is the rational design of new protein molecules to design novel activity, behavior, or purpose, and to advance basic understanding of protein function. Proteins can be designed from scratch or by making calculated variants of a known protein structure and its sequence. Rational protein design approaches make protein-sequence predictions that will fold to specific structures. These predicted sequences can then be validated experimentally through methods such as peptide synthesis, site-directed mutagenesis, or artificial gene synthesis.

Intrinsically disordered proteins

An intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure. IDPs cover a spectrum of states from fully unstructured to partially structured and include random coils, (pre-)molten globules, and large multi-domain proteins connected by flexible linkers. They constitute one of the main types of protein.

Michael Levitt biophysicist and Professor of Structural biology

Michael Levitt is an American-British-Israeli biophysicist and a professor of structural biology at Stanford University, a position he has held since 1987. Levitt received the 2013 Nobel Prize in Chemistry, together with Martin Karplus and Arieh Warshel, for "the development of multiscale models for complex chemical systems".

D. E. Shaw Research (DESRES) is a privately held biochemistry research company based in New York City. Under the scientific direction of David E. Shaw, the group's chief scientist, D. E. Shaw Research develops technologies for molecular dynamics simulations and applies such simulations to basic scientific research in structural biology and biochemistry, and to the process of computer-aided drug design.

Anton (computer) supercomputer designed and built by D. E. Shaw Research

Anton is a massively parallel supercomputer designed and built by D. E. Shaw Research in New York, first running in 2008. It is a special-purpose system for molecular dynamics (MD) simulations of proteins and other biological macromolecules. An Anton machine consists of a substantial number of application-specific integrated circuits (ASICs), interconnected by a specialized high-speed, three-dimensional torus network.

Desmond is a software package developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional computer clusters. The code uses novel parallel algorithms and numerical methods to achieve high performance on platforms containing multiple processors, but may also be executed on a single computer.

Stochastic roadmap simulation is inspired by probabilistic roadmap methods (PRM) developed for robot motion planning.

Multi-state modeling of biomolecules refers to a series of techniques used to represent and compute the behaviour of biological molecules or complexes that can adopt a large number of possible functional states.

Jane Clarke (scientist) Professor of Biophysics at the University of Cambridge

Jane Clarke is a British biochemist and academic. Since October 2017, she has served as President of Wolfson College, Cambridge. She is also Professor of Molecular Biophysics and a Wellcome Trust Senior Research Fellow in the Department of Chemistry at the University of Cambridge. She was previously a Fellow of Trinity Hall, Cambridge.

Coarse-grained modeling aims at simulating the behaviour of complex systems using a coarse-grained (simplified) representation of them. Coarse-grained models are widely used for molecular modeling of biomolecules at various granularity levels. A wide range of coarse-grained models have been proposed. They are usually dedicated to computational modeling of specific molecules: proteins, nucleic acids, lipid membranes, carbohydrates or water. In these models, molecules are represented not by individual atoms, but by "pseudo-atoms" approximating groups of atoms, such as a whole amino acid residue. By decreasing the degrees of freedom, much longer simulation times can be studied at the expense of molecular detail. Coarse-grained models have found practical applications in molecular dynamics simulations.

References

  1. Robert B Best (2012). "Atomistic molecular simulations of protein folding". Current Opinion in Structural Biology (review). 22 (1): 52–61. doi:10.1016/j.sbi.2011.12.001. PMID 22257762.
  2. TJ Lane; Gregory Bowman; Robert McGibbon; Christian Schwantes; Vijay Pande; Bruce Borden (September 10, 2012). "Folding@home Simulation FAQ". Folding@home. Stanford University. Archived from the original on 2012-09-21. Retrieved September 10, 2012.
  3. G. Bowman; V. Voelz; V. S. Pande (2011). "Taming the complexity of protein folding". Current Opinion in Structural Biology. 21 (1): 4–11. doi:10.1016/j.sbi.2010.10.006. PMC 3042729. PMID 21081274.
  4. David E. Shaw; Martin M. Deneroff; Ron O. Dror; Jeffrey S. Kuskin; Richard H. Larson; John K. Salmon; Cliff Young; Brannon Batson; Kevin J. Bowers; Jack C. Chao; Michael P. Eastwood; Joseph Gagliardo; J. P. Grossman; C. Richard Ho; Douglas J. Ierardi, Ist (2008). "Anton, A Special-Purpose Machine for Molecular Dynamics Simulation". Communications of the ACM. 51 (7): 91–97. doi:10.1145/1364782.1364802.
  5. Ron O. Dror; Robert M. Dirks; J.P. Grossman; Huafeng Xu; David E. Shaw (2012). "Biomolecular Simulation: A Computational Microscope for Molecular Biology". Annual Review of Biophysics. 41: 429–52. doi:10.1146/annurev-biophys-042910-155245. PMID 22577825.