NK model

Last updated

The NK model is a mathematical model described by its primary inventor Stuart Kauffman as a "tunably rugged" fitness landscape. "Tunable ruggedness" captures the intuition that both the overall size of the landscape and the number of its local "hills and valleys" can be adjusted via changes to its two parameters, and , with being the length of a string of evolution and determining the level of landscape ruggedness.

Contents

The NK model has found application in a wide variety of fields, including the theoretical study of evolutionary biology, immunology, optimisation, technological evolution, team science [1] , and complex systems. The model was also adopted in organizational theory, where it is used to describe the way an agent may search a landscape by manipulating various characteristics of itself. For example, an agent can be an organization, the hills and valleys represent profit (or changes thereof), and movement on the landscape necessitates organizational decisions (such as adding product lines or altering the organizational structure), which tend to interact with each other and affect profit in a complex fashion. [2]

An early version of the model, which considered only the smoothest () and most rugged () landscapes, was presented in Kauffman and Levin (1987). [3] The model as it is currently known first appeared in Kauffman and Weinberger (1989). [4]

One of the reasons why the model has attracted wide attention in optimisation is that it is a particularly simple instance of a so-called NP-complete problem [5] which means it is difficult to find global optima. Recently, it was shown that the NK model for K > 1 is also PLS-complete [6] which means than, in general, it is difficult to find even local fitness optima. This has consequences for the study of open-ended evolution.

Prototypical example: plasmid fitness

A plasmid is a small circle of DNA inside certain cells that can replicate independently of their host cells. Suppose we wish to study the fitness of plasmids.

For simplicity, we model a plasmid as a ring of N possible genes, always in the same order, and each can have two possible states (active or inactive, type X or type Y, etc...). Then the plasmid is modelled by a binary string with length N, and so the fitness function is .

The simplest model would have the genes not interacting with each other, and so we obtain

where each denotes the contribution to fitness of gene at location . To model epistasis, we introduce another factor K, the number of other genes that a gene interacts with. It is reasonable to assume that on a plasmid, two genes interact if they are adjacent, thus giving

For example, when K = 1, and N = 5,

The NK model generalizes this by allowing arbitrary finite K, N, as well as allowing arbitrary definition of adjacency of genes (the genes do not necessarily lie on a circle or a line segment).

Mathematical definition

The NK model defines a combinatorial phase space, consisting of every string (chosen from a given alphabet) of length . For each string in this search space, a scalar value (called the fitness ) is defined. If a distance metric is defined between strings, the resulting structure is a landscape.

Fitness values are defined according to the specific incarnation of the model, but the key feature of the NK model is that the fitness of a given string is the sum of contributions from each locus in the string:

and the contribution from each locus in general depends on its state and the state of other loci,:

where is the index of the th neighbor of locus .

Hence, the fitness function is a mapping between strings of length K + 1 and scalars, which Weinberger's later work calls "fitness contributions". Such fitness contributions are often chosen randomly from some specified probability distribution.

Visualization of two dimensions of a NK fitness landscape. The arrows represent various mutational paths that the population could follow while evolving on the fitness landscape Visualization of two dimensions of a NK fitness landscape.png
Visualization of two dimensions of a NK fitness landscape. The arrows represent various mutational paths that the population could follow while evolving on the fitness landscape

Example: the spin glass models

The 1D Ising model of spin glass is usually written as

where is the Hamiltonian, which can be thought as energy. We can reformulate it as a special case of the NK model with K=1:

by defining

In general, the m-dimensional Ising model on a square grid is an NK model with .

Since K roughly measures "ruggedness" of the fitness landscape (see below), we see that as the dimension of Ising model increases, its ruggedness also increases.

When , this is the Edwards–Anderson model, which is exactly solvable.

The Sherrington–Kirkpatrick model generalizes the Ising model by allowing all possible pairs of spins to interact (instead of a grid graph, use the complete graph), thus it is also an NK model with .

Allowing all possible subsequences of spins to interact, instead of merely pairs, we obtain the infinite-range model, which is also an NK model with .

Tunable topology

Illustration of tunable topology in the NK model. Nodes are individual binary strings, edges connect strings with a Hamming distance of exactly one. (left) N = 5, K = 0. (centre) N = 5, K = 1. (right) N = 5, K = 2. The colour of a node denotes its fitness, with redder values having higher fitness. The embedding of the hypercube is chosen so that the fitness maximum is at the centre. Notice that the K = 0 landscape appears smoother than the higher-K cases. Nk model hypercube.PNG
Illustration of tunable topology in the NK model. Nodes are individual binary strings, edges connect strings with a Hamming distance of exactly one. (left) N = 5, K = 0. (centre) N = 5, K = 1. (right) N = 5, K = 2. The colour of a node denotes its fitness, with redder values having higher fitness. The embedding of the hypercube is chosen so that the fitness maximum is at the centre. Notice that the K = 0 landscape appears smoother than the higher-K cases.

The value of K controls the degree of epistasis in the NK model, or how much other loci affect the fitness contribution of a given locus. With K = 0, the fitness of a given string is a simple sum of individual contributions of loci: for nontrivial fitness functions, a global optimum is present and easy to locate (the genome of all 0s if f(0) > f(1), or all 1s if f(1) > f(0)). For nonzero K, the fitness of a string is a sum of fitnesses of substrings, which may interact to frustrate the system (consider how to achieve optimal fitness in the example above). Increasing K thus increases the ruggedness of the fitness landscape.

Variations with neutral spaces

The bare NK model does not support the phenomenon of neutral space -- that is, sets of genomes connected by single mutations that have the same fitness value. Two adaptations have been proposed to include this biologically important structure. The NKP model introduces a parameter : a proportion of the fitness contributions is set to zero, so that the contributions of several genetic motifs are degenerate [ citation needed ]. The NKQ model introduces a parameter and enforces a discretisation on the possible fitness contribution values so that each contribution takes one of possible values, again introducing degeneracy in the contributions from some genetic motifs [ citation needed ]. The bare NK model corresponds to the and cases under these parameterisations.

Known results

In 1991, Weinberger published a detailed analysis [7] of the case in which and the fitness contributions are chosen randomly. His analytical estimate of the number of local optima was later shown to be flawed [ citation needed ]. However, numerical experiments included in Weinberger's analysis support his analytical result that the expected fitness of a string is normally distributed with a mean of approximately

and a variance of approximately

.

Applications

The NK model has found use in many fields, including in the study of spin glasses, collective problem solving, [8] epistasis and pleiotropy in evolutionary biology, and combinatorial optimisation.

Related Research Articles

<span class="mw-page-title-main">Stuart Kauffman</span> American medical doctor & academic

Stuart Alan Kauffman is an American medical doctor, theoretical biologist, and complex systems researcher who studies the origin of life on Earth. He was a professor at the University of Chicago, University of Pennsylvania, and University of Calgary. He is currently emeritus professor of biochemistry at the University of Pennsylvania and affiliate faculty at the Institute for Systems Biology. He has a number of awards including a MacArthur Fellowship and a Wiener Medal.

<span class="mw-page-title-main">Gamma distribution</span> Probability distribution

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. There are two equivalent parameterizations in common use:

  1. With a shape parameter and a scale parameter .
  2. With a shape parameter and an inverse scale parameter , called a rate parameter.

The Ising model, named after the physicists Ernst Ising and Wilhelm Lenz, is a mathematical model of ferromagnetism in statistical mechanics. The model consists of discrete variables that represent magnetic dipole moments of atomic "spins" that can be in one of two states. The spins are arranged in a graph, usually a lattice, allowing each spin to interact with its neighbors. Neighboring spins that agree have a lower energy than those that disagree; the system tends to the lowest energy but heat disturbs this tendency, thus creating the possibility of different structural phases. The model allows the identification of phase transitions as a simplified model of reality. The two-dimensional square-lattice Ising model is one of the simplest statistical models to show a phase transition.

In evolutionary biology, fitness landscapes or adaptive landscapes are used to visualize the relationship between genotypes and reproductive success. It is assumed that every genotype has a well-defined replication rate. This fitness is the "height" of the landscape. Genotypes which are similar are said to be "close" to each other, while those that are very different are "far" from each other. The set of all possible genotypes, their degree of similarity, and their related fitness values is then called a fitness landscape. The idea of a fitness landscape is a metaphor to help explain flawed forms in evolution by natural selection, including exploits and glitches in animals like their reactions to supernormal stimuli.

A conformal field theory (CFT) is a quantum field theory that is invariant under conformal transformations. In two dimensions, there is an infinite-dimensional algebra of local conformal transformations, and conformal field theories can sometimes be exactly solved or classified.

In physics and probability theory, Mean-field theory (MFT) or Self-consistent field theory studies the behavior of high-dimensional random (stochastic) models by studying a simpler model that approximates the original by averaging over degrees of freedom. Such models consider many individual components that interact with each other.

<span class="mw-page-title-main">Haldane's dilemma</span> Limit on the speed of beneficial evolution

Haldane's dilemma, also known as the waiting time problem, is a limit on the speed of beneficial evolution, calculated by J. B. S. Haldane in 1957. Before the invention of DNA sequencing technologies, it was not known how much polymorphism DNA harbored, although alloenzymes were beginning to make it clear that substantial polymorphism existed. This was puzzling because the amount of polymorphism known to exist seemed to exceed the theoretical limits that Haldane calculated, that is, the limits imposed if polymorphisms present in the population generally influence an organism's fitness. Motoo Kimura's landmark paper on neutral theory in 1968 built on Haldane's work to suggest that most molecular evolution is neutral, resolving the dilemma. Although neutral evolution remains the consensus theory among modern biologists, and thus Kimura's resolution of Haldane's dilemma is widely regarded as correct, some biologists argue that adaptive evolution explains a large fraction of substitutions in protein coding sequence, and they propose alternative solutions to Haldane's dilemma.

The classical XY model is a lattice model of statistical mechanics. In general, the XY model can be seen as a specialization of Stanley's n-vector model for n = 2.

In the theory of evolution and natural selection, the Price equation describes how a trait or allele changes in frequency over time. The equation uses a covariance between a trait and fitness, to give a mathematical description of evolution and natural selection. It provides a way to understand the effects that gene transmission and natural selection have on the frequency of alleles within each new generation of a population. The Price equation was derived by George R. Price, working in London to re-derive W.D. Hamilton's work on kin selection. Examples of the Price equation have been constructed for various evolutionary cases. The Price equation also has applications in economics.

In statistical mechanics, the Potts model, a generalization of the Ising model, is a model of interacting spins on a crystalline lattice. By studying the Potts model, one may gain insight into the behaviour of ferromagnets and certain other phenomena of solid-state physics. The strength of the Potts model is not so much that it models these physical systems well; it is rather that the one-dimensional case is exactly solvable, and that it has a rich mathematical formulation that has been studied extensively.

A Hopfield network is a form of recurrent artificial neural network and a type of spin glass system popularised by John Hopfield in 1982 as described by Shun'ichi Amari in 1972 and by Little in 1974 based on Ernst Ising's work with Wilhelm Lenz on the Ising model. Hopfield networks serve as content-addressable ("associative") memory systems with binary threshold nodes, or with continuous variables. Hopfield networks also provide a model for understanding human memory.

<span class="mw-page-title-main">Markov random field</span> Set of random variables

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties. The concept originates from the Sherrington–Kirkpatrick model.

Genetic load is the difference between the fitness of an average genotype in a population and the fitness of some reference genotype, which may be either the best present in a population, or may be the theoretically optimal genotype. The average individual taken from a population with a low genetic load will generally, when grown in the same conditions, have more surviving offspring than the average individual from a population with a high genetic load. Genetic load can also be seen as reduced fitness at the population level compared to what the population would have if all individuals had the reference high-fitness genotype. High genetic load may put a population in danger of extinction.

In mathematics, the Gibbs measure, named after Josiah Willard Gibbs, is a probability measure frequently seen in many problems of probability theory and statistical mechanics. It is a generalization of the canonical ensemble to infinite systems. The canonical ensemble gives the probability of the system X being in state x as

In statistical mechanics, the two-dimensional square lattice Ising model is a simple lattice model of interacting magnetic spins. The model is notable for having nontrivial interactions, yet having an analytical solution. The model was solved by Lars Onsager for the special case that the external magnetic field H = 0. An analytical solution for the general case for has yet to be found.

<span class="mw-page-title-main">Boolean network</span> Discrete set of boolean variables

A Boolean network consists of a discrete set of boolean variables each of which has a Boolean function assigned to it which takes inputs from a subset of those variables and output that determines the state of the variable it is assigned to. This set of functions in effect determines a topology (connectivity) on the set of variables, which then become nodes in a network. Usually, the dynamics of the system is taken as a discrete time series where the state of the entire network at time t+1 is determined by evaluating each variable's function on the state of the network at time t. This may be done synchronously or asynchronously.

Wagner's gene network model is a computational model of artificial gene networks, which explicitly modeled the developmental and evolutionary process of genetic regulatory networks. A population with multiple organisms can be created and evolved from generation to generation. It was first developed by Andreas Wagner in 1996 and has been investigated by other groups to study the evolution of gene networks, gene expression, robustness, plasticity and epistasis.

In statistical mechanics, the Griffiths inequality, sometimes also called Griffiths–Kelly–Sherman inequality or GKS inequality, named after Robert B. Griffiths, is a correlation inequality for ferromagnetic spin systems. Informally, it says that in ferromagnetic spin systems, if the 'a-priori distribution' of the spin is invariant under spin flipping, the correlation of any monomial of the spins is non-negative; and the two point correlation of two monomial of the spins is non-negative.

Construction of an irreducible Markov chain in the Ising model is a mathematical method to prove results.

In statistical mechanics, Lee–Yang theory, sometimes also known as Yang–Lee theory, is a scientific theory which seeks to describe phase transitions in large physical systems in the thermodynamic limit based on the properties of small, finite-size systems. The theory revolves around the complex zeros of partition functions of finite-size systems and how these may reveal the existence of phase transitions in the thermodynamic limit.

References

  1. Boroomand, Amin; Smaldino, Paul E. (2023). "Superiority bias and communication noise can enhance collective problem solving". Journal of Artificial Societies and Social Simulation. 26 (3). doi:10.18564/jasss.5154.
  2. Levinthal, D. A. (1997). "Adaptation on Rugged Landscapes". Management Science. 43 (7): 934–950. doi:10.1287/mnsc.43.7.934.
  3. Kauffman, S.; Levin, S. (1987). "Towards a general theory of adaptive walks on rugged landscapes". Journal of Theoretical Biology. 128 (1): 11–45. Bibcode:1987JThBi.128...11K. doi:10.1016/s0022-5193(87)80029-2. PMID   3431131.
  4. Kauffman, S.; Weinberger, E. (1989). "The NK Model of rugged fitness landscapes and its application to the maturation of the immune response". Journal of Theoretical Biology. 141 (2): 211–245. Bibcode:1989JThBi.141..211K. doi:10.1016/s0022-5193(89)80019-0. PMID   2632988.
  5. Weinberger, E. (1996), "NP-completeness of Kauffman's N-k model, a Tuneably Rugged Fitness Landscape", Santa Fe Institute Working Paper, 96-02-003.
  6. Kaznatcheev, Artem (2019). "Computational Complexity as an Ultimate Constraint on Evolution". Genetics. 212 (1): 245–265. doi:10.1534/genetics.119.302000. PMC   6499524 . PMID   30833289.
  7. Weinberger, Edward (November 15, 1991). "Local properties of Kauffman's N-k model: A tunably rugged energy landscape". Physical Review A. 10. 44 (10): 6399–6413. Bibcode:1991PhRvA..44.6399W. doi:10.1103/physreva.44.6399. PMID   9905770.
  8. Boroomand, A. and Smaldino, P.E., 2021. Hard Work, Risk-Taking, and Diversity in a Model of Collective Problem Solving. Journal of Artificial Societies and Social Simulation, 24(4).