Part of a series on the |
Evolutionary algorithm |
---|
Genetic algorithm (GA) |
Genetic programming (GP) |
Differential evolution |
Evolution strategy |
Evolutionary programming |
Related topics |
This article includes a list of references, related reading, or external links, but its sources remain unclear because it lacks inline citations .(March 2015) |
Natural evolution strategies (NES) are a family of numerical optimization algorithms for black box problems. Similar in spirit to evolution strategies, they iteratively update the (continuous) parameters of a search distribution by following the natural gradient towards higher expected fitness.
The general procedure is as follows: the parameterized search distribution is used to produce a batch of search points, and the fitness function is evaluated at each such point. The distribution’s parameters (which include strategy parameters) allow the algorithm to adaptively capture the (local) structure of the fitness function. For example, in the case of a Gaussian distribution, this comprises the mean and the covariance matrix. From the samples, NES estimates a search gradient on the parameters towards higher expected fitness. NES then performs a gradient ascent step along the natural gradient , a second order method which, unlike the plain gradient, renormalizes the update with respect to uncertainty. This step is crucial, since it prevents oscillations, premature convergence, and undesired effects stemming from a given parameterization. The entire process reiterates until a stopping criterion is met.
All members of the NES family operate based on the same principles. They differ in the type of probability distribution and the gradient approximation method used. Different search spaces require different search distributions; for example, in low dimensionality it can be highly beneficial to model the full covariance matrix. In high dimensions, on the other hand, a more scalable alternative is to limit the covariance to the diagonal only. In addition, highly multi-modal search spaces may benefit from more heavy-tailed distributions (such as Cauchy, as opposed to the Gaussian). A last distinction arises between distributions where we can analytically compute the natural gradient, and more general distributions where we need to estimate it from samples.
Let denote the parameters of the search distribution and the fitness function evaluated at . NES then pursues the objective of maximizing the expected fitness under the search distribution
through gradient ascent. The gradient can be rewritten as
that is, the expected value of times the log-derivatives at . In practice, it is possible to use the Monte Carlo approximation based on a finite number of samples
Finally, the parameters of the search distribution can be updated iteratively
Instead of using the plain stochastic gradient for updates, NES follows the natural gradient, which has been shown to possess numerous advantages over the plain (vanilla) gradient, e.g.:
The NES update is therefore
where is the Fisher information matrix. The Fisher matrix can sometimes be computed exactly, otherwise it is estimated from samples, reusing the log-derivatives .
NES utilizes rank-based fitness shaping in order to render the algorithm more robust, and invariant under monotonically increasing transformations of the fitness function. For this purpose, the fitness of the population is transformed into a set of utility values . Let denote the ith best individual. Replacing fitness with utility, the gradient estimate becomes
The choice of utility function is a free parameter of the algorithm.
input: 1 repeat 2 for do// λ is the population size 3 draw sample 4 evaluate fitness 5 calculate log-derivatives 6 end 7 assign the utilities // based on rank 8 estimate the gradient 9 estimate // or compute it exactly 10 update parameters // η is the learning rate 11 until stopping criterion is met
In mathematics and physics, Laplace's equation is a second-order partial differential equation named after Pierre-Simon Laplace, who first studied its properties. This is often written as or where is the Laplace operator, is the divergence operator, is the gradient operator, and is a twice-differentiable real-valued function. The Laplace operator therefore maps a scalar function to another scalar function.
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
In physics, the screened Poisson equation is a Poisson equation, which arises in the Klein–Gordon equation, electric field screening in plasmas, and nonlocal granular fluidity in granular flow.
In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It models a broad range of random variables, largely in the nature of a time to failure or time between events. Examples are maximum one-day rainfalls and the time a user spends on a web page.
In probability theory and statistics, the gamma distribution is a versatile two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. There are two equivalent parameterizations in common use:
Stellar dynamics is the branch of astrophysics which describes in a statistical way the collective motions of stars subject to their mutual gravity. The essential difference from celestial mechanics is that the number of body
In Bayesian statistics, the Jeffreys prior is a non-informative prior distribution for a parameter space. Named after Sir Harold Jeffreys, its density function is proportional to the square root of the determinant of the Fisher information matrix:
In statistics, a parametric model or parametric family or finite-dimensional model is a particular class of statistical models. Specifically, a parametric model is a family of probability distributions that has a finite number of parameters.
In electromagnetics, directivity is a parameter of an antenna or optical system which measures the degree to which the radiation emitted is concentrated in a single direction. It is the ratio of the radiation intensity in a given direction from the antenna to the radiation intensity averaged over all directions. Therefore, the directivity of a hypothetical isotropic radiator, a source of electromagnetic waves which radiates the same power in all directions, is 1, or 0 dBi.
Cylindrical multipole moments are the coefficients in a series expansion of a potential that varies logarithmically with the distance to a source, i.e., as . Such potentials arise in the electric potential of long line charges, and the analogous sources for the magnetic potential and gravitational potential.
In probability theory, the inverse Gaussian distribution is a two-parameter family of continuous probability distributions with support on (0,∞).
Covariance matrix adaptation evolution strategy (CMA-ES) is a particular kind of strategy for numerical optimization. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation and selection: in each generation (iteration) new individuals are generated by variation of the current parental individuals, usually in a stochastic way. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value . Like this, individuals with better and better -values are generated over the generation sequence.
In probability and statistics, a natural exponential family (NEF) is a class of probability distributions that is a special case of an exponential family (EF).
A ratio distribution is a probability distribution constructed as the distribution of the ratio of random variables having two other known distributions. Given two random variables X and Y, the distribution of the random variable Z that is formed as the ratio Z = X/Y is a ratio distribution.
In fluid dynamics, the Oseen equations describe the flow of a viscous and incompressible fluid at small Reynolds numbers, as formulated by Carl Wilhelm Oseen in 1910. Oseen flow is an improved description of these flows, as compared to Stokes flow, with the (partial) inclusion of convective acceleration.
In quantum computing, the quantum phase estimation algorithm is a quantum algorithm to estimate the phase corresponding to an eigenvalue of a given unitary operator. Because the eigenvalues of a unitary operator always have unit modulus, they are characterized by their phase, and therefore the algorithm can be equivalently described as retrieving either the phase or the eigenvalue itself. The algorithm was initially introduced by Alexei Kitaev in 1995.
In probability theory and directional statistics, a wrapped exponential distribution is a wrapped probability distribution that results from the "wrapping" of the exponential distribution around the unit circle.
In optics, the Fraunhofer diffraction equation is used to model the diffraction of waves when the diffraction pattern is viewed at a long distance from the diffracting object, and also when it is viewed at the focal plane of an imaging lens.
In probability theory and directional statistics, a wrapped asymmetric Laplace distribution is a wrapped probability distribution that results from the "wrapping" of the asymmetric Laplace distribution around the unit circle. For the symmetric case (asymmetry parameter κ = 1), the distribution becomes a wrapped Laplace distribution. The distribution of the ratio of two circular variates (Z) from two different wrapped exponential distributions will have a wrapped asymmetric Laplace distribution. These distributions find application in stochastic modelling of financial data.
In plasma physics and magnetic confinement fusion, neoclassical transport or neoclassical diffusion is a theoretical description of collisional transport in toroidal plasmas, usually found in tokamaks or stellarators. It is a modification of classical diffusion adding in effects of non-uniform magnetic fields due to the toroidal geometry, which give rise to new diffusion effects.