The self-consistent mean field (SCMF) method is an adaptation of mean field theory used in protein structure prediction to determine the optimal amino acid side chain packing given a fixed protein backbone. It is faster but less accurate than dead-end elimination (DEE) and is generally used in situations where the protein of interest is too large for the problem to be tractable by DEE.
Like dead-end elimination, the SCMF method explores conformational space by discretizing the dihedral angles of each side chain into a set of rotamers for each position in the protein sequence. The method iteratively develops a probabilistic description of the relative population of each possible rotamer at each position, and the probability of a given structure is defined as a function of the probabilities of its individual rotamer components.
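Under the independence assumption implicit in the mean-field treatment, one common way to express this relationship (the notation here anticipates the symbols defined below and is illustrative rather than a fixed convention) is as a product over positions:

$$P(\text{structure}) \;=\; \prod_{i=1}^{N} P(i, r_i),$$

where $r_i$ is the rotamer selected at position $i$ and $P(i, r_i)$ is its probability.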
The basic requirements for an effective SCMF implementation are:
1. A well-defined, finite set of discrete rotamers at each position of the fixed backbone;
2. Precomputed energies for each individual rotamer (its interaction with the backbone) and for each pair of rotamers at different positions;
3. An initial probability distribution over the rotamers at each position;
4. A rule for updating the rotamer energies and probabilities as a function of the mean-field energy.
The process is generally initialized with a uniform probability distribution over the rotamers; that is, if there are $p$ rotamers at a given position in the protein, then the probability of any individual rotamer at that position is $1/p$. The conversion between energies and probabilities is generally accomplished via the Boltzmann distribution, which introduces a temperature factor (thus making the method amenable to simulated annealing). Lower temperatures increase the likelihood of converging to a single solution, rather than to a small subpopulation of solutions.
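A minimal sketch of the initialization step, assuming the rotamer probabilities are stored as a NumPy array of shape (N, p) with N positions and p rotamers per position (the array layout and function name are illustrative, not part of any standard SCMF implementation):

```python
import numpy as np

def initialize_probabilities(n_positions, n_rotamers):
    """Uniform starting distribution: with p rotamers available at a
    position, every rotamer begins with probability 1/p."""
    return np.full((n_positions, n_rotamers), 1.0 / n_rotamers)

P = initialize_probabilities(n_positions=120, n_rotamers=8)
assert np.allclose(P.sum(axis=1), 1.0)  # each position is a valid distribution
```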
The energy of an individual rotamer is dependent on the "mean-field" energy of the other positions; that is, at every other position, each rotamer's energy contribution is proportional to its probability. For a protein of length $N$ with $p$ rotamers per residue, the energy at the current iteration is described by the following expression:

$$E_{\mathrm{mf}}^{(n)}(i,r) \;=\; E\big[(i,r)\big] \;+\; \sum_{\substack{j=1 \\ j \neq i}}^{N} \sum_{s=1}^{p} P^{(n-1)}(j,s)\, E\big[(i,r),(j,s)\big]$$

Note that for clarity, the mean-field energy at iteration $n$ is denoted by $E_{\mathrm{mf}}^{(n)}$, whereas the precomputed energies (rotamer–backbone and rotamer–rotamer) are denoted by $E$, and the probability of a given rotamer is denoted by $P$.
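The expression above can be evaluated directly once the energies are tabulated. The sketch below assumes two hypothetical precomputed arrays: E_self of shape (N, p) for rotamer–backbone energies and E_pair of shape (N, p, N, p) for rotamer–rotamer energies, with the diagonal blocks E_pair[i, :, i, :] set to zero so a position does not interact with itself:

```python
import numpy as np

def mean_field_energies(E_self, E_pair, P):
    """Mean-field energy of rotamer r at position i: its rotamer-backbone
    energy plus the pairwise energies with rotamers at all other positions,
    each weighted by that rotamer's current probability P(j, s)."""
    # sum over j and s of P(j, s) * E[(i, r), (j, s)];
    # E_pair[i, :, i, :] is assumed to be zero, so j == i contributes nothing
    return E_self + np.einsum('irjs,js->ir', E_pair, P)
```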
These mean-field energies are used to update the probabilities through the Boltzmann law:

$$P^{(n)}(i,r) \;=\; \frac{\exp\!\left(-E_{\mathrm{mf}}^{(n)}(i,r)/kT\right)}{\sum_{s=1}^{p} \exp\!\left(-E_{\mathrm{mf}}^{(n)}(i,s)/kT\right)}$$

where $k$ is the Boltzmann constant and $T$ is the temperature factor.
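Continuing the same sketch, the Boltzmann update normalizes the exponentiated, negated mean-field energies over the rotamers at each position; the parameter kT below stands for the product of the Boltzmann constant and the temperature factor:

```python
import numpy as np

def update_probabilities(E_mf, kT):
    """Boltzmann update: P(i, r) is proportional to exp(-E_mf(i, r) / kT),
    normalized over the rotamers at each position."""
    # subtracting the per-position minimum leaves the ratios unchanged
    # but avoids overflow in the exponential
    weights = np.exp(-(E_mf - E_mf.min(axis=1, keepdims=True)) / kT)
    return weights / weights.sum(axis=1, keepdims=True)
```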
Although computing the system energy is not required in carrying out the SCMF method, it is useful to know the overall energies of the converged results. The system energy $E_{\mathrm{total}}$ consists of two sums:

$$E_{\mathrm{total}} \;=\; E_{\mathrm{rb}} + E_{\mathrm{rr}}$$

where the addends are defined as:

$$E_{\mathrm{rb}} \;=\; \sum_{i=1}^{N} \sum_{r=1}^{p} P(i,r)\, E\big[(i,r)\big]$$

$$E_{\mathrm{rr}} \;=\; \frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \sum_{r=1}^{p} \sum_{s=1}^{p} P(i,r)\, P(j,s)\, E\big[(i,r),(j,s)\big]$$

Here $P$ denotes the converged probabilities, $E_{\mathrm{rb}}$ is the rotamer–backbone contribution, $E_{\mathrm{rr}}$ is the rotamer–rotamer contribution, and the factor of $\tfrac{1}{2}$ corrects for counting each pair of positions twice.
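Under the same array conventions as above (hypothetical E_self and E_pair tables), the two sums can be evaluated as:

```python
import numpy as np

def system_energy(E_self, E_pair, P):
    """Mean-field estimate of the total energy: a rotamer-backbone term plus
    a rotamer-rotamer term, each precomputed energy weighted by the rotamer
    probabilities; the factor 0.5 corrects for counting every pair twice."""
    e_backbone = np.einsum('ir,ir->', P, E_self)
    e_pairwise = 0.5 * np.einsum('ir,irjs,js->', P, E_pair, P)
    return e_backbone + e_pairwise
```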
Perfect convergence for the SCMF method would result in a probability of 1 for exactly one rotamer at each position in the protein, and a probability of zero for all other rotamers at each position. Convergence to a unique solution requires probabilities close to 1 for exactly one rotamer at each position. In practice, especially when higher temperatures are used, the algorithm instead identifies a small number of high-probability rotamers at each position, allowing the resulting conformations' relative energies to then be enumerated (based on the precomputed energies, not on those derived from the mean-field approximation). One way to improve convergence is to run again at a lower temperature using the probabilities calculated from a previous higher-temperature run.
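Putting the pieces together, one plausible iteration loop (a sketch under the assumptions above, not a reference implementation; the stopping tolerance, iteration cap, and temperature schedule are illustrative choices) alternates between the mean-field energy computation and the Boltzmann update, lowering kT between rounds and reusing the probabilities from the previous, warmer round:

```python
import numpy as np

def scmf(E_self, E_pair, kT_schedule=(5.0, 2.0, 1.0, 0.5), max_iter=100, tol=1e-6):
    """Self-consistent mean-field iteration with a simple annealing schedule."""
    n_positions, n_rotamers = E_self.shape
    P = np.full((n_positions, n_rotamers), 1.0 / n_rotamers)  # uniform start
    for kT in kT_schedule:                 # each round restarts at a lower temperature
        for _ in range(max_iter):
            E_mf = E_self + np.einsum('irjs,js->ir', E_pair, P)
            weights = np.exp(-(E_mf - E_mf.min(axis=1, keepdims=True)) / kT)
            P_new = weights / weights.sum(axis=1, keepdims=True)
            converged = np.abs(P_new - P).max() < tol
            P = P_new
            if converged:
                break
    return P  # the high-probability rotamers at each position are P.argmax(axis=1)
```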
Unlike dead-end elimination, SCMF is not guaranteed to converge on the optimal solution. However, it is deterministic (that is, it will converge to the same solution every time given the same initial conditions), unlike alternatives that rely on Monte Carlo sampling. Compared to DEE, which is guaranteed to find the optimal solution, SCMF is faster but less accurate overall; it is significantly better at identifying correct side-chain conformations in the protein's core than at identifying correct surface conformations. Geometric packing constraints are less restrictive on the surface and thus provide fewer boundaries to the conformational search.