Reward-based selection

Last updated September 29, 2023

Reward-based selection is a technique used in evolutionary algorithms for selecting potentially useful solutions for recombination. The probability of being selected for an individual is proportional to the cumulative reward obtained by the individual. The cumulative reward can be computed as a sum of the individual reward and the reward inherited from parents.

Description

Reward-based selection can be used within Multi-armed bandit framework for Multi-objective optimization to obtain a better approximation of the Pareto front. ^[1]

The newborn $a'^{(g+1)}$ and its parents receive a reward $r^{(g)}$ , if $a'^{(g+1)}$ was selected for new population $Q^{(g+1)}$ , otherwise the reward is zero. Several reward definitions are possible:

1. $r^{(g)}=1$ , if the newborn individual $a'^{(g+1)}$ was selected for new population $Q^{(g+1)}$ .
2. $r^{(g)}=1-{\frac {rank(a'^{(g+1)})}{\mu }}{\mbox{ if }}a'^{(g+1)}\in Q^{(g+1)}$ , where $rank(a'^{(g+1)})$ is the rank of newly inserted individual in the population of $\mu$ individuals. Rank can be computed using a well-known non-dominated sorting procedure.^[2]
3. $r^{(g)}=\sum _{a\in Q^{(g+1)}}\Delta {H}(a,Q^{(g+1)})-\sum _{a\in Q^{(g)}}\Delta {H}(a,Q^{(g)})$ , where $\Delta {H}(a,Q^{(g)})$ is the hypervolume indicator contribution of the individual $a$ to the population $Q^{(g)}$ . The reward $r^{(g)}>0$ if the newly inserted individual improves the quality of the population, which is measured as its hypervolume contribution in the objective space.
4. A relaxation of the above reward, involving a rank-based penalization for points for $k$ -th dominated Pareto front: $r^{(g)}={\frac {1}{2^{k-1}}}\left(\sum _{ndom_{k}(Q^{(g+1)})}\Delta {H}(a,ndom_{k}(Q^{(g+1)}))-\sum _{ndom_{k}(Q^{(g)})}\Delta {H}(a,ndom_{k}(Q^{(g)}))\right)$

Reward-based selection can quickly identify the most fruitful directions of search by maximizing the cumulative reward of individuals.

Related Research Articles

Algorithms for calculating variance play a major role in computational statistics. A key difficulty in the design of good algorithms for this problem is that formulas for the variance may involve sums of squares, which can lead to numerical instability as well as to arithmetic overflow when dealing with large values.

In mathematics, a recurrence relation is an equation according to which the $th term of a sequence of numbers is equal to some combination of the previous terms. Often, only previous terms of the sequence appear in the equation, for a parameter that is independent of; this number is called the order of the relation. If the values of the first numbers in the sequence have been given, the rest of the sequence can be calculated by repeatedly applying the equation.$

In bioinformatics, neighbor joining is a bottom-up (agglomerative) clustering method for the creation of phylogenetic trees, created by Naruya Saitou and Masatoshi Nei in 1987. Usually based on DNA or protein sequence data, the algorithm requires knowledge of the distance between each pair of taxa to create the phylogenetic tree.

A fitness function is a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims. Fitness functions are used in evolutionary algorithms (EA), such as genetic programming and genetic algorithms to guide simulations towards optimal design solutions.

In probability theory, a Chernoff bound is an exponentially decreasing upper bound on the tail of a random variable based on its moment generating function. The minimum of all such exponential bounds forms the Chernoff or Chernoff-Cramér bound, which may decay faster than exponential. It is especially useful for sums of independent random variables, such as sums of Bernoulli random variables.

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Early versions of MTL were called "hints".

In control theory, the discrete Lyapunov equation is of the form

In mathematics, differential algebra is, broadly speaking, the area of mathematics consisting in the study of differential equations and differential operators as algebraic objects in view of deriving properties of differential equations and operators without computing the solutions, similarly as polynomial algebras are used for the study of algebraic varieties, which are solution sets of systems of polynomial equations. Weyl algebras and Lie algebras may be considered as belonging to differential algebra.

In multi-objective optimization, the Pareto front is the set of all Pareto efficient solutions. The concept is widely used in engineering. It allows the designer to restrict attention to the set of efficient choices, and to make tradeoffs within this set, rather than considering the full range of every parameter.

Covariance matrix adaptation evolution strategy (CMA-ES) is a particular kind of strategy for numerical optimization. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation and selection: in each generation (iteration) new individuals are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value $. Like this, over the generation sequence, individuals with better and better -values are generated.$

Stochastic approximation methods are a family of iterative methods typically used for root-finding problems or for optimization problems. The recursive update rules of stochastic approximation methods can be used, among other things, for solving linear systems when the collected data is corrupted by noise, or for approximating extreme values of functions which cannot be computed directly, but only estimated via noisy observations.

Multi-objective optimization or Pareto optimization is an area of multiple-criteria decision making that is concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously. Multi-objective is a type of vector optimization that has been applied in many fields of science, including engineering, economics and logistics where optimal decisions need to be taken in the presence of trade-offs between two or more conflicting objectives. Minimizing cost while maximizing comfort while buying a car, and maximizing performance whilst minimizing fuel consumption and emission of pollutants of a vehicle are examples of multi-objective optimization problems involving two and three objectives, respectively. In practical problems, there can be more than three objectives.

Equilibrium constants are determined in order to quantify chemical equilibria. When an equilibrium constant $K$ is expressed as a concentration quotient,

Non-linear least squares is the form of least squares analysis used to fit a set of m observations with a model that is non-linear in n unknown parameters (m ≥ n). It is used in some forms of nonlinear regression. The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations. There are many similarities to linear least squares, but also some significant differences. In economic theory, the non-linear least squares method is applied in (i) the probit regression, (ii) threshold regression, (iii) smooth regression, (iv) logistic link regression, (v) Box–Cox transformed regressors ( $).$

Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance.

A Moran process or Moran model is a simple stochastic process used in biology to describe finite populations. The process is named after Patrick Moran, who first proposed the model in 1958. It can be used to model variety-increasing processes such as mutation as well as variety-reducing effects such as genetic drift and natural selection. The process can describe the probabilistic dynamics in a finite population of constant size N in which two alleles A and B are competing for dominance. The two alleles are considered to be true replicators.

Inverse modeling is a mathematical technique where the objective is to determine the physical properties of the subsurface of an earth region that has produced a given seismogram. Cooke and Schneider (1983) defined it as calculation of the earth's structure and physical parameters from some set of observed seismic data. The underlying assumption in this method is that the collected seismic data are from an earth structure that matches the cross-section computed from the inversion algorithm. Some common earth properties that are inverted for include acoustic velocity, formation and fluid densities, acoustic impedance, Poisson's ratio, formation compressibility, shear rigidity, porosity, and fluid saturation.

In computer science, optimal computing budget allocation (OCBA) is an approach to maximize the overall simulation efficiency for finding an optimal decision. It was introduced in the mid-1990s by Dr. Chun-Hung Chen.

Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm. Reasons to use multiple kernel learning include a) the ability to select for an optimal kernel and parameters from a larger set of kernels, reducing bias due to kernel selection while allowing for more automated machine learning methods, and b) combining data from different sources that have different notions of similarity and thus require different kernels. Instead of creating a new kernel, multiple kernel algorithms can be used to combine kernels already established for each individual data source.

Fish School Search (FSS), proposed by Bastos Filho and Lima Neto in 2008 is, in its basic version, an unimodal optimization algorithm inspired on the collective behavior of fish schools. The mechanisms of feeding and coordinated movement were used as inspiration to create the search operators. The core idea is to make the fishes “swim” toward the positive gradient in order to “eat” and “gain weight”. Collectively, the heavier fishes have more influence on the search process as a whole, what makes the barycenter of the fish school moves toward better places in the search space over the iterations.

References

↑ Loshchilov, I.; M. Schoenauer; M. Sebag (2011). "Not all parents are equal for MO-CMA-ES" (PDF). Evolutionary Multi-Criterion Optimization 2011 (EMO 2011). Springer Verlag, LNCS 6576. pp. 31–45. Archived from the original (PDF) on 2012-06-04.
↑ Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. (2002). "A fast and elitist multi-objective genetic algorithm: NSGA-II". IEEE Transactions on Evolutionary Computation. 6 (2): 182–197. CiteSeerX 10.1.1.17.7771 . doi:10.1109/4235.996017.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Loshchilov, I.; M. Schoenauer; M. Sebag (2011). "Not all parents are equal for MO-CMA-ES" (PDF). Evolutionary Multi-Criterion Optimization 2011 (EMO 2011). Springer Verlag, LNCS 6576. pp. 31–45. Archived from the original (PDF) on 2012-06-04.

[2] Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. (2002). "A fast and elitist multi-objective genetic algorithm: NSGA-II". IEEE Transactions on Evolutionary Computation. 6 (2): 182–197. CiteSeerX 10.1.1.17.7771 . doi:10.1109/4235.996017.

[1]

[2]

Reward-based selection

Contents

Description

See also

Related Research Articles

References