Evolution strategy

In computer science, an evolution strategy (ES) is a subclass of evolutionary algorithms that serves as an optimization technique. [1] It uses the major genetic operators mutation, recombination and selection of parents. [2]

History

The 'evolution strategy' optimization technique was created in the early 1960s and developed further in the 1970s and later by Ingo Rechenberg, Hans-Paul Schwefel and their co-workers. [1]

Timeline of ES - selected algorithms [1]

Year | Description | Reference
1973 | ES introduced with mutation and selection | [3]
1994 | Derandomized self-adaptation ES - derandomized scheme of mutative step size control | [4]
1994 | CSA-ES - step size adaptation using information from earlier generations | [5]
2001 | CMA-ES - covariance matrix adaptation | [6]
2006 | Weighted multi-recombination ES - use of weighted recombination | [7]
2007 | Meta-ES - incremental aggregation of partial semantic structures | [8]
2008 | Natural ES - use of the natural gradient | [9]
2010 | Exponential natural ES - a simpler version of natural ES | [10]
2014 | Limited memory CMA-ES - time–memory complexity reduction by covariance matrix decomposition | [11]
2016 | Fitness inheritance CMA-ES - reduction of the fitness evaluation cost using fitness inheritance | [12]
2017 | RS-CMSA-ES - use of repelling subpopulations | [13]
2017 | MA-ES - covariance matrix update and covariance matrix square root are not used | [14]
2018 | Weighted ES - weighted recombination on general convex quadratic functions | [15]

Methods

Evolution strategies use natural problem-dependent representations, so problem space and search space are identical. In common with evolutionary algorithms, the operators are applied in a loop. An iteration of the loop is called a generation. The sequence of generations is continued until a termination criterion is met.
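
The following Python sketch illustrates this generational loop. The function names and operator signatures are illustrative assumptions, not taken from the cited literature; it only shows how variation, evaluation and selection could be composed until a termination criterion is met.

```python
# Minimal sketch of the generational loop, assuming the genetic operators are
# supplied as callables; an illustration, not a reference implementation.
def evolution_strategy(init_population, vary, evaluate, select, terminated):
    population = init_population()
    fitness = [evaluate(ind) for ind in population]
    generation = 0
    while not terminated(generation, fitness):
        offspring = vary(population)                  # recombination + mutation
        offspring_fitness = [evaluate(ind) for ind in offspring]
        population, fitness = select(
            population, fitness, offspring, offspring_fitness)
        generation += 1                               # one generation per loop pass
    return population, fitness
```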

The special feature of the ES is the self-adaptation of mutation step sizes and the coevolution associated with it. The ES is briefly presented here in its standard form, [16] [17] [18] noting that many variants exist. [19] [20] [21] [22] The real-valued chromosome contains, in addition to the n decision variables, n' mutation step sizes σ_j, where 1 ≤ n' ≤ n. Often one mutation step size is used for all decision variables, or each decision variable has its own step size. Mate selection to produce λ offspring is random, i.e. independent of fitness. First, new mutation step sizes σ'_j are generated per mating by intermediate recombination of the parental σ_j with subsequent mutation as follows:

σ'_j = σ_j · exp(N(0, τ') + N_j(0, τ))

where N(0, τ) denotes a normally distributed random variable with mean 0 and standard deviation τ. N(0, τ') applies to all σ'_j of an offspring, while N_j(0, τ) is newly determined for each σ'_j. Next, discrete recombination of the decision variables is followed by a mutation using the new mutation step sizes as standard deviations of the normal distribution. The new decision variables x'_j are calculated as follows:

x'_j = x_j + N_j(0, σ'_j)

This results in an evolutionary search on two levels: first at the level of the problem itself and second at the level of the mutation step sizes. In this way, it can be ensured that the ES searches for its target in ever finer steps. However, there is also the risk that the ES can cross larger invalid regions of the search space only with difficulty.
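
The sketch below illustrates this two-level mutation for a single offspring, assuming Python with NumPy. The parameter heuristics τ ∝ 1/√(2√n) and τ' ∝ 1/√(2n) are common choices rather than prescriptions from the text above, and recombination is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_adaptive_mutation(x, sigma, tau, tau_prime):
    """Return (x', sigma') for one offspring; recombination is omitted here."""
    # One global log-normal term applies to all step sizes of this offspring ...
    global_term = rng.normal(0.0, tau_prime)
    # ... while an individual term is drawn anew for every step size.
    local_terms = rng.normal(0.0, tau, size=len(sigma))
    sigma_new = sigma * np.exp(global_term + local_terms)
    # The decision variables are then mutated with the *new* step sizes.
    x_new = x + rng.normal(0.0, sigma_new)
    return x_new, sigma_new

n = 10
tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))      # common heuristic (assumption)
tau_prime = 1.0 / np.sqrt(2.0 * n)         # common heuristic (assumption)
x_child, sigma_child = self_adaptive_mutation(
    rng.normal(size=n), np.full(n, 0.5), tau, tau_prime)
```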

Variants

The ES knows two variants of best selection for the generation of the next parent population (μ - number of parents, λ - number of offspring): [2] In the (μ, λ)-ES (comma selection), the μ best of the λ offspring become the parents of the next generation, and the previous parents are always discarded. In the (μ + λ)-ES (plus selection), the μ best individuals are selected from the union of parents and offspring, so well-adapted individuals can survive for more than one generation. A minimal code sketch of both schemes follows the next paragraph.

Bäck and Schwefel recommend that the value of λ should be seven times the population size μ, [17] whereby μ must not be chosen too small because of the strong selection pressure. Suitable values for μ are application-dependent and must be determined experimentally.
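
A minimal sketch of the two selection schemes, assuming individuals are stored as (fitness, genome) pairs and lower fitness is better; the function names are illustrative, not taken from the cited sources.

```python
def comma_selection(parents, offspring, mu):
    # (μ, λ)-selection: only the λ offspring compete; the parents are discarded.
    return sorted(offspring, key=lambda ind: ind[0])[:mu]

def plus_selection(parents, offspring, mu):
    # (μ + λ)-selection: parents and offspring compete together, so the best
    # solution found so far can never be lost (elitism).
    return sorted(parents + offspring, key=lambda ind: ind[0])[:mu]
```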

Individual step sizes for each coordinate, or correlations between coordinates, which are essentially defined by an underlying covariance matrix, are controlled in practice either by self-adaptation or by covariance matrix adaptation (CMA-ES). [21] When the mutation step is drawn from a multivariate normal distribution using an evolving covariance matrix, it has been hypothesized that this adapted matrix approximates the inverse Hessian of the search landscape. This hypothesis has been proven for a static model relying on a quadratic approximation. [23]
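
To illustrate the role of the covariance matrix, the sketch below (assuming NumPy) merely samples correlated mutations from a multivariate normal distribution; the adaptation of C and σ themselves, i.e. the actual CMA-ES update rules, is deliberately not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_candidates(mean, sigma, C, lam):
    """Draw lam candidate solutions from N(mean, sigma**2 * C)."""
    return rng.multivariate_normal(mean, (sigma ** 2) * C, size=lam)

n = 5
C = np.eye(n)           # would be adapted over the generations in CMA-ES
candidates = sample_candidates(np.zeros(n), sigma=0.3, C=C, lam=10)
```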

The selection of the next generation in evolution strategies is deterministic and based only on the fitness rankings, not on the actual fitness values. The resulting algorithm is therefore invariant with respect to monotonic transformations of the objective function. The simplest and oldest [1] evolution strategy operates on a population of size two: the current point (parent) and the result of its mutation. Only if the mutant's fitness is at least as good as that of the parent does it become the parent of the next generation; otherwise the mutant is disregarded. This is a (1 + 1)-ES. More generally, λ mutants can be generated and compete with the parent, called a (1 + λ)-ES. In a (1, λ)-ES the best mutant becomes the parent of the next generation while the current parent is always disregarded. For some of these variants, proofs of linear convergence (in a stochastic sense) have been derived on unimodal objective functions. [24] [25]
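
As a hedged illustration, the following sketch implements a (1 + 1)-ES with a fixed mutation strength on a simple test function; the historical algorithm additionally adapts σ (e.g. via the 1/5 success rule), which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)

def sphere(x):
    return float(np.sum(x ** 2))

def one_plus_one_es(f, x0, sigma=0.1, generations=1000):
    parent = np.asarray(x0, dtype=float)
    f_parent = f(parent)
    for _ in range(generations):
        mutant = parent + rng.normal(0.0, sigma, size=parent.shape)
        f_mutant = f(mutant)
        if f_mutant <= f_parent:   # mutant replaces parent only if at least as good
            parent, f_parent = mutant, f_mutant
    return parent, f_parent

best, best_fitness = one_plus_one_es(sphere, x0=np.ones(10))
```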

See also

References

  1. Slowik, Adam; Kwasnicka, Halina (1 August 2020). "Evolutionary algorithms and their applications to engineering problems". Neural Computing and Applications. 32 (16): 12363–12379. doi:10.1007/s00521-020-04832-8. ISSN 1433-3058.
  2. Alrashdi, Zaid; Sayyafzadeh, Mohammad (1 June 2019). "(μ+λ) Evolution strategy algorithm in well placement, trajectory, control and joint optimisation". Journal of Petroleum Science and Engineering. 177: 1042–1058. doi:10.1016/j.petrol.2019.02.047. ISSN 0920-4105.
  3. Vent, W. (January 1975). "Rechenberg, Ingo, Evolutionsstrategie — Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. 170 S. mit 36 Abb. Frommann-Holzboog-Verlag. Stuttgart 1973. Broschiert". Feddes Repertorium. 86 (5): 337. doi:10.1002/fedr.19750860506.
  4. Ostermeier, Andreas; Gawelczyk, Andreas; Hansen, Nikolaus (December 1994). "A Derandomized Approach to Self-Adaptation of Evolution Strategies". Evolutionary Computation. 2 (4): 369–380. doi:10.1162/evco.1994.2.4.369.
  5. Ostermeier, Andreas; Gawelczyk, Andreas; Hansen, Nikolaus (1994). "Step-size adaptation based on non-local use of selection information". Parallel Problem Solving from Nature — PPSN III. Lecture Notes in Computer Science. Vol. 866. Springer. pp. 189–198. doi:10.1007/3-540-58484-6_263. ISBN   978-3-540-58484-1.
  6. Hansen, Nikolaus; Ostermeier, Andreas (June 2001). "Completely Derandomized Self-Adaptation in Evolution Strategies". Evolutionary Computation. 9 (2): 159–195. doi:10.1162/106365601750190398. PMID   11382355.
  7. Arnold, Dirk V. (28 August 2006). "Weighted multirecombination evolution strategies". Theoretical Computer Science. 361 (1): 18–37. doi:10.1016/j.tcs.2006.04.003. ISSN   0304-3975.
  8. Jung, Jason J.; Jo, Geun-Sik; Yeo, Seong-Won (2007). "Meta-evolution Strategy to Focused Crawling on Semantic Web". Artificial Neural Networks – ICANN 2007. Lecture Notes in Computer Science. Vol. 4669. Springer. pp. 399–407. doi:10.1007/978-3-540-74695-9_41. ISBN   978-3-540-74693-5.
  9. Wierstra, Daan; Schaul, Tom; Glasmachers, Tobias; Sun, Yi; Peters, Jan; Schmidhuber, Jürgen (1 January 2014). "Natural evolution strategies". J. Mach. Learn. Res. 15 (1): 949–980. ISSN   1532-4435.
  10. Glasmachers, Tobias; Schaul, Tom; Yi, Sun; Wierstra, Daan; Schmidhuber, Jürgen (7 July 2010). "Exponential natural evolution strategies". Proceedings of the 12th annual conference on Genetic and evolutionary computation (PDF). Association for Computing Machinery. pp. 393–400. doi:10.1145/1830483.1830557. ISBN   978-1-4503-0072-8.
  11. Loshchilov, Ilya (12 July 2014). "A computationally efficient limited memory CMA-ES for large scale optimization". Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation. Association for Computing Machinery. pp. 397–404. arXiv: 1404.5520 . doi:10.1145/2576768.2598294. ISBN   978-1-4503-2662-9.
  12. Liaw, Rung-Tzuo; Ting, Chuan-Kang (July 2016). "Enhancing covariance matrix adaptation evolution strategy through fitness inheritance". 2016 IEEE Congress on Evolutionary Computation (CEC): 1956–1963. doi:10.1109/CEC.2016.7744027. ISBN   978-1-5090-0623-6.
  13. Ahrari, Ali; Deb, Kalyanmoy; Preuss, Mike (September 2017). "Multimodal Optimization by Covariance Matrix Self-Adaptation Evolution Strategy with Repelling Subpopulations". Evolutionary Computation. 25 (3): 439–471. doi:10.1162/evco_a_00182. hdl:1887/76707. PMID   27070282.
  14. Beyer, Hans-Georg; Sendhoff, Bernhard (October 2017). "Simplify Your Covariance Matrix Adaptation Evolution Strategy". IEEE Transactions on Evolutionary Computation. 21 (5): 746–759. doi:10.1109/TEVC.2017.2680320.
  15. Akimoto, Youhei; Auger, Anne; Hansen, Nikolaus (6 September 2020). "Quality gain analysis of the weighted recombination evolution strategy on general convex quadratic functions". Theoretical Computer Science. 832: 42–67. doi:10.1016/j.tcs.2018.05.015. ISSN   0304-3975.
  16. Schwefel, Hans-Paul (1995). Evolution and Optimum Seeking. Sixth-generation computer technology series. New York: Wiley. ISBN   978-0-471-57148-3.
  17. Bäck, Thomas; Schwefel, Hans-Paul (1993). "An Overview of Evolutionary Algorithms for Parameter Optimization". Evolutionary Computation. 1 (1): 1–23. doi:10.1162/evco.1993.1.1.1. ISSN 1063-6560.
  18. Schwefel, Hans-Paul; Rudolph, Günter; Bäck, Thomas (1995), Morán, Frederico; Moreno, Alvaro; Merelo, J.J.; Chacón, Pablo (eds.), "Contemporary Evolution Strategies", Proceedings of the Third European Conference on Advances in Artificial Life, Berlin, Heidelberg: Springer, pp. 893–907, doi:10.1007/3-540-59496-5_351, ISBN   978-3-540-59496-3
  19. Bäck, Thomas; Hoffmeister, Frank; Schwefel, Hans-Paul (1991), Belew, Richard K.; Booker, Lashon B. (eds.), "A Survey of Evolution Strategies", Proceedings of the Fourth International Conference on Genetic Algorithms (ICGA), San Mateo, Calif: Morgan Kaufmann, ISBN   978-1-55860-208-3
  20. Coelho, V. N.; Coelho, I. M.; Souza, M. J. F.; Oliveira, T. A.; Cota, L. P.; Haddad, M. N.; Mladenovic, N.; Silva, R. C. P.; Guimarães, F. G. (2016). "Hybrid Self-Adaptive Evolution Strategies Guided by Neighborhood Structures for Combinatorial Optimization Problems". Evolutionary Computation. 24 (4): 637–666. doi:10.1162/EVCO_a_00187. ISSN   1063-6560.
  21. Hansen, Nikolaus; Ostermeier, Andreas (2001). "Completely Derandomized Self-Adaptation in Evolution Strategies". Evolutionary Computation. 9 (2): 159–195. doi:10.1162/106365601750190398. ISSN 1063-6560.
  22. Hansen, Nikolaus; Kern, Stefan (2004), Yao, Xin; Burke, Edmund K.; Lozano, José A.; Smith, Jim (eds.), "Evaluating the CMA Evolution Strategy on Multimodal Test Functions", Parallel Problem Solving from Nature - PPSN VIII, vol. 3242, Berlin, Heidelberg: Springer, pp. 282–291, doi:10.1007/978-3-540-30217-9_29, ISBN   978-3-540-23092-2
  23. Shir, Ofer M.; Yehudayoff, Amir (January 2020). "On the covariance-Hessian relation in evolution strategies". Theoretical Computer Science. 801: 157–174. doi:10.1016/j.tcs.2019.09.002.
  24. Auger, Anne (April 2005). "Convergence results for the ( 1 , λ ) -SA-ES using the theory of ϕ -irreducible Markov chains". Theoretical Computer Science. 334 (1–3): 35–69. doi:10.1016/j.tcs.2004.11.017.
  25. Jägersküpper, Jens (August 2006). "How the (1+1) ES using isotropic mutations minimizes positive definite quadratic forms". Theoretical Computer Science. 361 (1): 38–56. doi:10.1016/j.tcs.2006.04.004.

Bibliography

Research centers