Part of a series on the |
Evolutionary algorithm |
---|
Genetic algorithm (GA) |
Genetic programming (GP) |
Differential evolution |
Evolution strategy |
Evolutionary programming |
Related topics |
Selection is a genetic operator in an evolutionary algorithm (EA). An EA is a metaheuristic inspired by biological evolution and aims to solve challenging problems at least approximately. Selection has a dual purpose: on the one hand, it can choose individual genomes from a population for subsequent breeding (e.g., using the crossover operator). In addition, selection mechanisms are also used to choose candidate solutions (individuals) for the next generation. The biological model is natural selection.
Retaining the best individual(s) of one generation unchanged in the next generation is called elitism or elitist selection. It is a successful (slight) variant of the general process of constructing a new population.
The basis for selection is the quality of an individual, which is determined by the fitness function. In memetic algorithms, an extension of EA, selection also takes place in the selection of those offspring that are to be improved with the help of a meme (e.g. a heuristic).
A selection procedure for breeding used early on [1] may be implemented as follows:
For many problems the above algorithm might be computationally demanding. A simpler and faster alternative uses the so-called stochastic acceptance.
If this procedure is repeated until there are enough selected individuals, this selection method is called fitness proportionate selection or roulette-wheel selection. If instead of a single pointer spun multiple times, there are multiple, equally spaced pointers on a wheel that is spun once, it is called stochastic universal sampling. Repeatedly selecting the best individual of a randomly chosen subset is tournament selection. Taking the best half, third or another proportion of the individuals is truncation selection.
There are other selection algorithms that do not consider all individuals for selection, but only those with a fitness value that is higher than a given (arbitrary) constant. Other algorithms select from a restricted pool where only a certain percentage of the individuals are allowed, based on fitness value.
The listed methods differ mainly in the selection pressure, [2] [3] which can be set by a strategy parameter in the rank selection described below. The higher the selection pressure, the faster a population converges against a certain solution and the search space may not be explored sufficiently. This premature convergence [4] can be counteracted by structuring the population appropriately. [5] [6] There is a close correlation between the population model used and a suitable selection pressure. [5] If the pressure is too low, it must be expected that the population will not converge even after a long computing time. For more selection methods and further detail see. [7] [8]
In the roulette wheel selection, the probability of choosing an individual for breeding of the next generation is proportional to its fitness, the better the fitness is, the higher chance for that individual to be chosen. Choosing individuals can be depicted as spinning a roulette that has as many pockets as there are individuals in the current generation, with sizes depending on their probability. Probability of choosing individual is equal to , where is the fitness of and is the size of current generation (note that in this method one individual can be drawn multiple times).
Stochastic universal sampling is a development of roulette wheel selection with minimal spread and no bias.
In rank selection, the probability for selection does not depend directly on the fitness, but on the fitness rank of an individual within the population. [9] The exact fitness values themselves do not have to be available, but only a sorting of the individuals according to quality.
In addition to the adjustable selection pressure, an advantage of rank-based selection can be seen in the fact that it also gives worse individuals a chance to reproduce and thus to improve. [10] This can be particularly helpful in applications with restrictions, since it facilitates the overcoming of a restriction in several intermediate steps, i.e. via a sequence of several individuals rated poorly due to restriction violations.
Linear ranking, which goes back to Baker, [11] [12] is often used. [5] [10] [13] It allows the selection pressure to be set by the parameter , which can take values between 1.0 (no selection pressure) and 2.0 (high selection pressure). The probability for rank positions is obtained as follows:
Another definition for the probability for rank positions is: [9]
Exponential rank selection is defined as follows: [9]
In every generation few chromosomes are selected (good - with high fitness) for creating a new offspring. Then some (bad - with low fitness) chromosomes are removed and the new offspring is placed in their place. The rest of population survives to new generation.
Tournament selection is a method of choosing the individual from the set of individuals. The winner of each tournament is selected to perform crossover.
For truncation selection, individuals are sorted according to their fitness and a portion (10% to 50%) of the top individuals is selected for next generation. [9]
Often to get better results, strategies with partial reproduction are used. One of them is elitism, in which a small portion of the best individuals from the last generation is carried over (without any changes) to the next one.
In Boltzmann selection, a continuously varying temperature controls the rate of selection according to a preset schedule. The temperature starts out high, which means that the selection pressure is low. The temperature is gradually lowered, which gradually increases the selection pressure, thereby allowing the GA to narrow in more closely to the best part of the search space while maintaining the appropriate degree of diversity. [14]
In statistics, the kth order statistic of a statistical sample is equal to its kth-smallest value. Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference.
Fitness is a quantitative representation of individual reproductive success. It is also equal to the average contribution to the gene pool of the next generation, made by the same individuals of the specified genotype or phenotype. Fitness can be defined either with respect to a genotype or to a phenotype in a given environment or time. The fitness of a genotype is manifested through its phenotype, which is also affected by the developmental environment. The fitness of a given phenotype can also be different in different selective environments.
Evolutionary algorithms (EA) reproduce essential elements of the biological evolution in a computer algorithm in order to solve “difficult” problems, at least approximately, for which no exact or satisfactory solution methods are known. They belong to the class of metaheuristics and are a subset of evolutionary computation, which itself is part of the field of computational intelligence. The mechanisms of biological evolution that an EA mainly imitates are reproduction, mutation, recombination and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions (see also loss function). Evolution of the population then takes place after the repeated application of the above operators.
Fitness proportionate selection, also known as roulette wheel selection or spinning wheel selection, is a selection technique used in evolutionary algorithms for selecting potentially useful solutions for recombination.
In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for sampling from a specified multivariate probability distribution when direct sampling from the joint distribution is difficult, but sampling from the conditional distribution is more practical. This sequence can be used to approximate the joint distribution ; to approximate the marginal distribution of one of the variables, or some subset of the variables ; or to compute an integral. Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.
Crossover in evolutionary algorithms and evolutionary computation, also called recombination, is a genetic operator used to combine the genetic information of two parents to generate new offspring. It is one way to stochastically generate new solutions from an existing population, and is analogous to the crossover that happens during sexual reproduction in biology. New solutions can also be generated by cloning an existing solution, which is analogous to asexual reproduction. Newly generated solutions may be mutated before being added to the population. The aim of recombination is to transfer good characteristics from two different parents to one child.
Mutation is a genetic operator used to maintain genetic diversity of the chromosomes of a population of an evolutionary algorithm (EA), including genetic algorithms in particular. It is analogous to biological mutation.
Evolution strategy (ES) from computer science is a subclass of evolutionary algorithms, which serves as an optimization technique. It uses the major genetic operators mutation, recombination and selection of parents.
In the theory of evolution and natural selection, the Price equation describes how a trait or allele changes in frequency over time. The equation uses a covariance between a trait and fitness, to give a mathematical description of evolution and natural selection. It provides a way to understand the effects that gene transmission and natural selection have on the frequency of alleles within each new generation of a population. The Price equation was derived by George R. Price, working in London to re-derive W.D. Hamilton's work on kin selection. Examples of the Price equation have been constructed for various evolutionary cases. The Price equation also has applications in economics.
Mating pool is a concept used in evolutionary algorithms and means a population of parents for the next population.
The Wilcoxon signed-rank test is a non-parametric rank test for statistical hypothesis testing used either to test the location of a population based on a sample of data, or to compare the locations of two populations using two matched samples. The one-sample version serves a purpose similar to that of the one-sample Student's t-test. For two matched samples, it is a paired difference test like the paired Student's t-test. The Wilcoxon test is a good alternative to the t-test when the normal distribution of the differences between paired individuals cannot be assumed. Instead, it assumes a weaker hypothesis that the distribution of this difference is symmetric around a central value and it aims to test whether this center value differs significantly from zero. The Wilcoxon test is a more powerful alternative to the sign test because it considers the magnitude of the differences, but it requires this moderately strong assumption of symmetry.
Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding an uninformative prior over admissible solutions and ending with the model that generates only the global optima.
In computer science and operations research, a memetic algorithm (MA) is an extension of an evolutionary algorithm (EA) that aims to accelerate the evolutionary search for the optimum. An EA is a metaheuristic that reproduces the basic principles of biological evolution as a computer algorithm in order to solve challenging optimization or planning tasks, at least approximately. An MA uses one or more suitable heuristics or local search techniques to improve the quality of solutions generated by the EA and to speed up the search. The effects on the reliability of finding the global optimum depend on both the use case and the design of the MA.
Holland's schema theorem, also called the fundamental theorem of genetic algorithms, is an inequality that results from coarse-graining an equation for evolutionary dynamics. The Schema Theorem says that short, low-order schemata with above-average fitness increase exponentially in frequency in successive generations. The theorem was proposed by John Holland in the 1970s. It was initially widely taken to be the foundation for explanations of the power of genetic algorithms. However, this interpretation of its implications has been criticized in several publications reviewed in, where the Schema Theorem is shown to be a special case of the Price equation with the schema indicator function as the macroscopic measurement.
A stochastic simulation is a simulation of a system that has variables that can change stochastically (randomly) with individual probabilities.
Covariance matrix adaptation evolution strategy (CMA-ES) is a particular kind of strategy for numerical optimization. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation and selection: in each generation (iteration) new individuals are generated by variation of the current parental individuals, usually in a stochastic way. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value . Like this, individuals with better and better -values are generated over the generation sequence.
A Moran process or Moran model is a simple stochastic process used in biology to describe finite populations. The process is named after Patrick Moran, who first proposed the model in 1958. It can be used to model variety-increasing processes such as mutation as well as variety-reducing effects such as genetic drift and natural selection. The process can describe the probabilistic dynamics in a finite population of constant size N in which two alleles A and B are competing for dominance. The two alleles are considered to be true replicators.
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known constant mean rate and independently of the time since the last event. It can also be used for the number of events in other types of intervals than time, and in dimension greater than 1.
Reward-based selection is a technique used in evolutionary algorithms for selecting potentially useful solutions for recombination. The probability of being selected for an individual is proportional to the cumulative reward obtained by the individual. The cumulative reward can be computed as a sum of the individual reward and the reward inherited from parents.
Biogeography-based optimization (BBO) is an evolutionary algorithm (EA) that optimizes a function by stochastically and iteratively improving candidate solutions with regard to a given measure of quality, or fitness function. BBO belongs to the class of metaheuristics since it includes many variations, and since it does not make any assumptions about the problem and can therefore be applied to a wide class of problems.