Holland's schema theorem

Holland's schema theorem, also called the fundamental theorem of genetic algorithms,[1] is an inequality that results from coarse-graining an equation for evolutionary dynamics. The Schema Theorem says that short, low-order schemata with above-average fitness increase exponentially in frequency in successive generations. The theorem was proposed by John Holland in the 1970s. It was initially widely taken to be the foundation for explanations of the power of genetic algorithms. However, this interpretation of its implications has been criticized in several publications, reviewed in [2], where the Schema Theorem is shown to be a special case of the Price equation with the schema indicator function as the macroscopic measurement.

A schema is a template that identifies a subset of strings with similarities at certain string positions. Schemata are a special case of cylinder sets, and hence form a basis for a product topology on strings; in other words, schemata can be used to generate a topology on a space of strings.

Description

Consider binary strings of length 6. The schema 1*10*1 describes the set of all strings of length 6 with 1's at positions 1, 3 and 6 and a 0 at position 4. The * is a wildcard symbol, which means that positions 2 and 5 can have a value of either 1 or 0. The order of a schema is defined as the number of fixed positions in the template, while the defining length $\delta(H)$ is the distance between the first and last specific positions. The order of 1*10*1 is 4 and its defining length is 5. The fitness of a schema is the average fitness of all strings matching the schema. The fitness of a string is a measure of the value of the encoded problem solution, as computed by a problem-specific evaluation function. Using the established methods and genetic operators of genetic algorithms, the schema theorem states that short, low-order schemata with above-average fitness increase exponentially in successive generations. Expressed as an equation:

$$\operatorname{E}\bigl(m(H,t+1)\bigr) \geq \frac{m(H,t)\,f(H)}{a_t}\,[1-p]$$

Here $m(H,t)$ is the number of strings belonging to schema $H$ at generation $t$, $f(H)$ is the observed average fitness of schema $H$ and $a_t$ is the observed average fitness at generation $t$. The probability of disruption $p$ is the probability that crossover or mutation will destroy the schema $H$. Under the assumption that $p_m \ll 1$, it can be expressed as:

$$p = \frac{\delta(H)}{l-1}\,p_c + o(H)\,p_m$$

where $o(H)$ is the order of the schema, $l$ is the length of the code, $p_m$ is the probability of mutation and $p_c$ is the probability of crossover. So a schema with a shorter defining length $\delta(H)$ is less likely to be disrupted.
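
To make these definitions concrete, here is a minimal Python sketch (an illustration, not part of the original article): helper functions that test schema membership and compute the order and defining length defined above.

```python
def matches(schema: str, string: str) -> bool:
    """True if `string` belongs to the set described by `schema`.
    '*' is a wildcard; every fixed position must agree."""
    return all(s == '*' or s == c for s, c in zip(schema, string))

def order(schema: str) -> int:
    """Order o(H): the number of fixed (non-wildcard) positions."""
    return sum(c != '*' for c in schema)

def defining_length(schema: str) -> int:
    """Defining length delta(H): distance between first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

schema = "1*10*1"
print(matches(schema, "101001"))   # True: positions 1, 3, 4 and 6 agree
print(order(schema))               # 4
print(defining_length(schema))     # 5
```
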
An often misunderstood point is why the Schema Theorem is an inequality rather than an equality. The answer is in fact simple: the Theorem neglects the small, yet non-zero, probability that a string belonging to the schema $H$ will be created "from scratch" by mutation of a single string (or recombination of two strings) that did not belong to $H$ in the previous generation. Moreover, the expression for $p$ is clearly pessimistic: depending on the mating partner, recombination may not disrupt the schema even when a crossover point is selected between the first and the last fixed position of $H$.
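
Continuing the sketch above (and reusing its `order` and `defining_length` helpers), the pessimistic bound can be evaluated numerically; all parameter values below are made up for illustration. Comparing a long and a short schema shows why shorter defining lengths fare better.

```python
def disruption_probability(schema: str, p_c: float, p_m: float) -> float:
    """Pessimistic estimate p = delta(H)/(l-1) * p_c + o(H) * p_m, valid when p_m << 1."""
    l = len(schema)
    return defining_length(schema) / (l - 1) * p_c + order(schema) * p_m

def schema_theorem_bound(m: int, f_H: float, f_avg: float,
                         schema: str, p_c: float, p_m: float) -> float:
    """Lower bound on E[m(H, t+1)], with m = m(H, t), f_H = f(H), f_avg = a_t."""
    return m * (f_H / f_avg) * (1.0 - disruption_probability(schema, p_c, p_m))

# Hypothetical numbers: 20 members, schema fitness 1.2x the population average.
print(schema_theorem_bound(20, 1.2, 1.0, "1*10*1", p_c=0.7, p_m=0.01))  # 6.24: long schema, often disrupted
print(schema_theorem_bound(20, 1.2, 1.0, "11****", p_c=0.7, p_m=0.01))  # 20.16: short schema, tends to grow
```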

Limitation

[Figure: plot of a multimodal function in two variables.]

The schema theorem holds under the assumption of a genetic algorithm that maintains an infinitely large population, but does not always carry over to (finite) practice: due to sampling error in the initial population, genetic algorithms may converge on schemata that have no selective advantage. This happens in particular in multimodal optimization, where a function can have multiple peaks: the population may drift to prefer one of the peaks, ignoring the others.[3]
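
The sampling-error effect is easy to reproduce in a toy experiment (not from the article; all parameters are arbitrary): in a finite population with no fitness differences at all, resampling alone drives the population to fixation on one variant.

```python
import random

def neutral_drift(pop_size: int = 20, max_gens: int = 1000, seed: int = 0):
    """Finite population with NO fitness differences: each generation is a
    uniform resample of the previous one (fitness-blind selection).
    Sampling error alone fixes one allele, i.e. the population 'converges'
    on a schema with no selective advantage."""
    rng = random.Random(seed)
    pop = [rng.randint(0, 1) for _ in range(pop_size)]
    for gen in range(1, max_gens + 1):
        pop = [rng.choice(pop) for _ in range(pop_size)]
        if len(set(pop)) == 1:   # fixation: diversity is gone for good
            return gen, pop[0]
    return max_gens, None

print(neutral_drift())  # e.g. (generation_of_fixation, surviving_allele)
```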

The reason that the Schema Theorem cannot explain the power of genetic algorithms is that it holds for all problem instances and cannot distinguish between problems on which genetic algorithms perform poorly and problems on which they perform well.

References

  1. Bridges, Clayton L.; Goldberg, David E. (1987). "An analysis of reproduction and crossover in a binary-coded genetic algorithm". Proceedings of the Second International Conference on Genetic Algorithms and Their Applications. ISBN 9781134989737.
  2. Altenberg, L. (1995). "The Schema Theorem and Price's Theorem". Foundations of Genetic Algorithms. 3: 23–49.
  3. Goldberg, David E.; Richardson, Jon (1987). "Genetic algorithms with sharing for multimodal function optimization". Proceedings of the Second International Conference on Genetic Algorithms and Their Applications. ISBN 9781134989737.