Evolvability (computer science)

The term evolvability is used for a framework of computational learning introduced by Leslie Valiant in his paper of the same name and described below. The aim of this theory is to model biological evolution and to categorize which types of mechanisms are evolvable. In this framework, evolution is a restricted form of PAC learning and of learning from statistical queries.

General framework

Let $F_n$ and $R_n$ be collections of functions on $n$ variables. Given an ideal function $f \in F_n$, the goal is to find by local search a representation $r \in R_n$ that closely approximates $f$. This closeness is measured by the performance $\operatorname{Perf}(f, r)$ of $r$ with respect to $f$.

As is the case in the biological world, there is a difference between genotype and phenotype. In general, there can be multiple representations (genotypes) that correspond to the same function (phenotype). That is, for some $r_1, r_2 \in R_n$ with $r_1 \neq r_2$, still $r_1(x) = r_2(x)$ for all $x$. However, this need not be the case. The goal, then, is to find a representation that closely matches the phenotype of the ideal function, and the spirit of the local search is to allow only small changes in the genotype. Let the neighborhood $N(r)$ of a representation $r$ be the set of possible mutations of $r$.

For simplicity, consider Boolean functions on $X = \{-1, 1\}^n$, and let $D_n$ be a probability distribution on $X$. The performance is defined in terms of this distribution. Specifically,

$$\operatorname{Perf}_{D_n}(f, r) = \sum_{x \in X} f(x)\, r(x)\, D_n(x).$$

Note that $\operatorname{Perf}_{D_n}(f, r) = \Pr_{D_n}(f(x) = r(x)) - \Pr_{D_n}(f(x) \neq r(x))$. In general, for non-Boolean functions, the performance will not correspond directly to the probability that the functions agree, although it will have some relationship.

Throughout an organism's life, it will only experience a limited number of environments, so its performance cannot be determined exactly. The empirical performance is defined by

$$\operatorname{Perf}_{D_n}(f, r, S) = \frac{1}{|S|} \sum_{x \in S} f(x)\, r(x),$$

where $S$ is a multiset of $|S|$ independent selections from $X$ according to $D_n$. If $|S|$ is large enough, $\operatorname{Perf}_{D_n}(f, r, S)$ will evidently be close to the actual performance $\operatorname{Perf}_{D_n}(f, r)$.
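
For concreteness, here is a minimal Python sketch of these two quantities for $\pm 1$-valued Boolean functions. The encoding of points as tuples, the dictionary representation of $D_n$, and the names `perf` and `empirical_perf` are illustrative choices, not part of the formal model.

```python
import itertools
import random

def perf(f, r, D):
    # Exact performance of representation r against ideal function f under
    # distribution D (a dict mapping each point x to its probability).
    # For +/-1-valued functions this equals Pr[f = r] - Pr[f != r].
    return sum(f(x) * r(x) * p for x, p in D.items())

def empirical_perf(f, r, sample):
    # Empirical performance over a multiset of points drawn according to D.
    return sum(f(x) * r(x) for x in sample) / len(sample)

# Example: the uniform distribution on X = {-1, 1}^n for n = 3.
n = 3
X = list(itertools.product([-1, 1], repeat=n))
D = {x: 1 / len(X) for x in X}
sample = [random.choice(X) for _ in range(1000)]  # uniform draws from X
```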

Given an ideal function $f \in F_n$, an initial representation $r \in R_n$, a sample size $s$, and a tolerance $t$, the mutator $\operatorname{Mu}(f, r, s, t)$ is a random variable defined as follows. Each $r' \in N(r)$ is classified as beneficial, neutral, or deleterious, depending on its empirical performance on a multiset $S$ of $s$ samples. Specifically,

$r'$ is beneficial if $\operatorname{Perf}_{D_n}(f, r', S) \geq \operatorname{Perf}_{D_n}(f, r, S) + t$;

$r'$ is neutral if $|\operatorname{Perf}_{D_n}(f, r', S) - \operatorname{Perf}_{D_n}(f, r, S)| < t$;

$r'$ is deleterious if $\operatorname{Perf}_{D_n}(f, r', S) \leq \operatorname{Perf}_{D_n}(f, r, S) - t$.

If there are any beneficial mutations, then $\operatorname{Mu}(f, r, s, t)$ is equal to one of these at random. If there are no beneficial mutations, then $\operatorname{Mu}(f, r, s, t)$ is equal to a random neutral mutation. In light of the similarity to biology, $r$ itself is required to be available as a mutation, so there will always be at least one neutral mutation.
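
Continuing the sketch above, the mutator can be rendered as follows. To reflect the genotype/phenotype distinction, representations are arbitrary objects and a `phenotype` function (an assumption of this sketch) maps each one to the Boolean function it computes; drawing a fresh sample for each neighbor is one reasonable reading of the definition, not the only one.

```python
def mutate(f, r, neighborhood, phenotype, draw_sample, t):
    # One step of the mutator Mu(f, r, s, t): classify every neighbor by its
    # empirical performance relative to r, then return a random beneficial
    # mutation if one exists, otherwise a random neutral one.
    base = empirical_perf(f, phenotype(r), draw_sample())
    beneficial, neutral = [], []
    for r2 in neighborhood(r):  # N(r) is assumed to contain r itself,
        delta = empirical_perf(f, phenotype(r2), draw_sample()) - base
        if delta >= t:
            beneficial.append(r2)
        elif delta > -t:
            neutral.append(r2)  # so `neutral` is never empty
        # otherwise r2 is deleterious and is discarded
    return random.choice(beneficial) if beneficial else random.choice(neutral)
```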

The intention of this definition is that at each stage of evolution, all possible mutations of the current genome are tested in the environment. Out of those that thrive, or at least survive, one is chosen to be the candidate for the next stage. Given $r_0 \in R_n$, we define the sequence $r_0, r_1, r_2, \ldots$ by $r_{i+1} = \operatorname{Mu}(f, r_i, s, t)$. Thus $r_g$ is a random variable representing what $r_0$ has evolved to after $g$ generations.
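
In the same illustrative setup, the evolution sequence is a plain iteration of the mutator:

```python
def evolve(f, r0, neighborhood, phenotype, draw_sample, t, g):
    # Iterate r_{i+1} = Mu(f, r_i, s, t) for g generations and return r_g.
    r = r0
    for _ in range(g):
        r = mutate(f, r, neighborhood, phenotype, draw_sample, t)
    return r
```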

Let $F$ be a class of functions, $R$ a class of representations, and $D$ a class of distributions on $X$. We say that $F$ is evolvable by $R$ over $D$ if there exist polynomials $p(\cdot, \cdot)$, $s(\cdot, \cdot)$, $t(\cdot, \cdot)$, and $g(\cdot, \cdot)$ such that for all $n$ and all $\epsilon > 0$, for all ideal functions $f \in F_n$ and representations $r_0 \in R_n$, with probability at least $1 - \epsilon$,

$$\operatorname{Perf}_{D_n}\bigl(f, r_{g(n, 1/\epsilon)}\bigr) \geq 1 - \epsilon,$$

where the sizes of the neighborhoods $N(r)$ for $r \in R_n$ are at most $p(n, 1/\epsilon)$, the sample size is $s(n, 1/\epsilon)$, the tolerance is $t(1/n, \epsilon)$, and the generation size is $g(n, 1/\epsilon)$.

$F$ is evolvable over $D$ if it is evolvable by some $R$ over $D$.

$F$ is evolvable if it is evolvable over all distributions $D$.

Results

The class of conjunctions and the class of disjunctions are evolvable over the uniform distribution, using representations by short conjunctions and short disjunctions, respectively.
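
As an illustration only (not Valiant's proof construction), the sketches above can be assembled into a toy run that evolves a monotone conjunction over the uniform distribution. The genotype encoding, the neighborhood, and the parameter values below are ad hoc choices.

```python
n = 4
X = list(itertools.product([-1, 1], repeat=n))

def phenotype(S):
    # Genotype: a frozenset of variable indices. Phenotype: the monotone
    # conjunction of those variables, as a +/-1-valued function.
    return lambda x: 1 if all(x[i] == 1 for i in S) else -1

def neighborhood(S):
    # S itself, plus every single addition or removal of one variable.
    return [S] + [S ^ frozenset({i}) for i in range(n)]

def draw_sample(s=3000):
    return [random.choice(X) for _ in range(s)]  # uniform distribution

ideal = phenotype(frozenset({0, 1}))   # ideal function: x0 AND x1
r0 = frozenset()                       # start from the empty conjunction
r_g = evolve(ideal, r0, neighborhood, phenotype, draw_sample, t=0.05, g=20)
print(r_g, perf(ideal, phenotype(r_g), {x: 1 / len(X) for x in X}))
```

On typical runs this reaches the genotype {0, 1} with performance 1, though sampling noise can occasionally misclassify a mutation; the formal definition accordingly demands success only with probability $1 - \epsilon$.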

The class of parity functions (functions that evaluate to the parity of the number of true literals in a given subset of the literals) is not evolvable, even over the uniform distribution.

Evolvability implies PAC learnability: any class that is evolvable in this model is also learnable in the statistical query model, and hence in the PAC model.


References

  1. Valiant, L. G. (2006). "Evolvability". Electronic Colloquium on Computational Complexity (ECCC), TR06-120.