Estimation of distribution algorithm

Last updated
Estimation of distribution algorithm. For each iteration i, a random draw is performed for a population P in a distribution PDu. The distribution parameters PDe are then estimated using the selected points PS. The illustrated example optimizes a continuous objective function f(X) with a unique optimum O. The sampling (following a normal distribution N) concentrates around the optimum as one goes along unwinding algorithm. Eda mono-variant gauss iterations.svg
Estimation of distribution algorithm. For each iteration i, a random draw is performed for a population P in a distribution PDu. The distribution parameters PDe are then estimated using the selected points PS. The illustrated example optimizes a continuous objective function f(X) with a unique optimum O. The sampling (following a normal distribution N) concentrates around the optimum as one goes along unwinding algorithm.

Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs), [1] are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding an uninformative prior over admissible solutions and ending with the model that generates only the global optima. [2] [3] [4]

Contents

EDAs belong to the class of evolutionary algorithms. The main difference between EDAs and most conventional evolutionary algorithms is that evolutionary algorithms generate new candidate solutions using an implicit distribution defined by one or more variation operators, whereas EDAs use an explicit probability distribution encoded by a Bayesian network, a multivariate normal distribution, or another model class. Similarly as other evolutionary algorithms, EDAs can be used to solve optimization problems defined over a number of representations from vectors to LISP style S expressions, and the quality of candidate solutions is often evaluated using one or more objective functions.

The general procedure of an EDA is outlined in the following:

t := 0 initialize model M(0) to represent uniform distribution over admissible solutions while (termination criteria not met) doP := generate N>0 candidate solutions by sampling M(t)     F := evaluate all candidate solutions in P     M(t + 1) := adjust_model(P, F, M(t))     t := t + 1

Using explicit probabilistic models in optimization allowed EDAs to feasibly solve optimization problems that were notoriously difficult for most conventional evolutionary algorithms and traditional optimization techniques, such as problems with high levels of epistasis [ citation needed ]. Nonetheless, the advantage of EDAs is also that these algorithms provide an optimization practitioner with a series of probabilistic models that reveal a lot of information about the problem being solved. This information can in turn be used to design problem-specific neighborhood operators for local search, to bias future runs of EDAs on a similar problem, or to create an efficient computational model of the problem.

For example, if the population is represented by bit strings of length 4, the EDA can represent the population of promising solution using a single vector of four probabilities (p1, p2, p3, p4) where each component of p defines the probability of that position being a 1. Using this probability vector it is possible to create an arbitrary number of candidate solutions.

Estimation of distribution algorithms (EDAs)

This section describes the models built by some well known EDAs of different levels of complexity. It is always assumed a population at the generation , a selection operator , a model-building operator and a sampling operator .

Univariate factorizations

The most simple EDAs assume that decision variables are independent, i.e. . Therefore, univariate EDAs rely only on univariate statistics and multivariate distributions must be factorized as the product of univariate probability distributions,

Such factorizations are used in many different EDAs, next we describe some of them.

Univariate marginal distribution algorithm (UMDA)

The UMDA [5] is a simple EDA that uses an operator to estimate marginal probabilities from a selected population . By assuming contain elements, produces probabilities:

Every UMDA step can be described as follows

Population-based incremental learning (PBIL)

The PBIL, [6] represents the population implicitly by its model, from which it samples new solutions and updates the model. At each generation, individuals are sampled and are selected. Such individuals are then used to update the model as follows

where is a parameter defining the learning rate, a small value determines that the previous model should be only slightly modified by the new solutions sampled. PBIL can be described as

Compact genetic algorithm (cGA)

The CGA, [7] also relies on the implicit populations defined by univariate distributions. At each generation , two individuals are sampled, . The population is then sorted in decreasing order of fitness, , with being the best and being the worst solution. The CGA estimates univariate probabilities as follows

where, is a constant defining the learning rate, usually set to . The CGA can be defined as

Bivariate factorizations

Although univariate models can be computed efficiently, in many cases they are not representative enough to provide better performance than GAs. In order to overcome such a drawback, the use of bivariate factorizations was proposed in the EDA community, in which dependencies between pairs of variables could be modeled. A bivariate factorization can be defined as follows, where contains a possible variable dependent to , i.e. .

Bivariate and multivariate distributions are usually represented as probabilistic graphical models (graphs), in which edges denote statistical dependencies (or conditional probabilities) and vertices denote variables. To learn the structure of a PGM from data linkage-learning is employed.

Mutual information maximizing input clustering (MIMIC)

The MIMIC [8] factorizes the joint probability distribution in a chain-like model representing successive dependencies between variables. It finds a permutation of the decision variables, , such that minimizes the Kullback-Leibler divergence in relation to the true probability distribution, i.e. . MIMIC models a distribution

New solutions are sampled from the leftmost to the rightmost variable, the first is generated independently and the others according to conditional probabilities. Since the estimated distribution must be recomputed each generation, MIMIC uses concrete populations in the following way

Bivariate marginal distribution algorithm (BMDA)

The BMDA [9] factorizes the joint probability distribution in bivariate distributions. First, a randomly chosen variable is added as a node in a graph, the most dependent variable to one of those in the graph is chosen among those not yet in the graph, this procedure is repeated until no remaining variable depends on any variable in the graph (verified according to a threshold value).

The resulting model is a forest with multiple trees rooted at nodes . Considering the non-root variables, BMDA estimates a factorized distribution in which the root variables can be sampled independently, whereas all the others must be conditioned to the parent variable .

Each step of BMDA is defined as follows

Multivariate factorizations

The next stage of EDAs development was the use of multivariate factorizations. In this case, the joint probability distribution is usually factorized in a number of components of limited size .

The learning of PGMs encoding multivariate distributions is a computationally expensive task, therefore, it is usual for EDAs to estimate multivariate statistics from bivariate statistics. Such relaxation allows PGM to be built in polynomial time in ; however, it also limits the generality of such EDAs.

Extended compact genetic algorithm (eCGA)

The ECGA [10] was one of the first EDA to employ multivariate factorizations, in which high-order dependencies among decision variables can be modeled. Its approach factorizes the joint probability distribution in the product of multivariate marginal distributions. Assume is a set of subsets, in which every is a linkage set, containing variables. The factorized joint probability distribution is represented as follows

The ECGA popularized the term "linkage-learning" as denoting procedures that identify linkage sets. Its linkage-learning procedure relies on two measures: (1) the Model Complexity (MC) and (2) the Compressed Population Complexity (CPC). The MC quantifies the model representation size in terms of number of bits required to store all the marginal probabilities

The CPC, on the other hand, quantifies the data compression in terms of entropy of the marginal distribution over all partitions, where is the selected population size, is the number of decision variables in the linkage set and is the joint entropy of the variables in

The linkage-learning in ECGA works as follows: (1) Insert each variable in a cluster, (2) compute CCC = MC + CPC of the current linkage sets, (3) verify the increase on CCC provided by joining pairs of clusters, (4) effectively joins those clusters with highest CCC improvement. This procedure is repeated until no CCC improvements are possible and produces a linkage model . The ECGA works with concrete populations, therefore, using the factorized distribution modeled by ECGA, it can be described as

Bayesian optimization algorithm (BOA)

The BOA [11] [12] [13] uses Bayesian networks to model and sample promising solutions. Bayesian networks are directed acyclic graphs, with nodes representing variables and edges representing conditional probabilities between pair of variables. The value of a variable can be conditioned on a maximum of other variables, defined in . BOA builds a PGM encoding a factorized joint distribution, in which the parameters of the network, i.e. the conditional probabilities, are estimated from the selected population using the maximum likelihood estimator.

The Bayesian network structure, on the other hand, must be built iteratively (linkage-learning). It starts with a network without edges and, at each step, adds the edge which better improves some scoring metric (e.g. Bayesian information criterion (BIC) or Bayesian-Dirichlet metric with likelihood equivalence (BDe)). [14] The scoring metric evaluates the network structure according to its accuracy in modeling the selected population. From the built network, BOA samples new promising solutions as follows: (1) it computes the ancestral ordering for each variable, each node being preceded by its parents; (2) each variable is sampled conditionally to its parents. Given such scenario, every BOA step can be defined as

Linkage-tree Genetic Algorithm (LTGA)

The LTGA [15] differs from most EDA in the sense it does not explicitly model a probability distribution but only a linkage model, called linkage-tree. A linkage is a set of linkage sets with no probability distribution associated, therefore, there is no way to sample new solutions directly from . The linkage model is a linkage-tree produced stored as a Family of sets (FOS).

The linkage-tree learning procedure is a hierarchical clustering algorithm, which work as follows. At each step the two closest clusters and are merged, this procedure repeats until only one cluster remains, each subtree is stored as a subset .

The LTGA uses to guide an "optimal mixing" procedure which resembles a recombination operator but only accepts improving moves. We denote it as , where the notation indicates the transfer of the genetic material indexed by from to .

Algorithm Gene-pool optimal mixing    Input: A family of subsets  and a population     Output: A population .    for eachindofor eachindo            choose a random  := := ifthenreturn
  • "" denotes assignment. For instance, "largestitem" means that the value of largest changes to the value of item.
  • "return" terminates the algorithm and outputs the following value.

The LTGA does not implement typical selection operators, instead, selection is performed during recombination. Similar ideas have been usually applied into local-search heuristics and, in this sense, the LTGA can be seen as an hybrid method. In summary, one step of the LTGA is defined as

Other

Related Research Articles

<span class="mw-page-title-main">Convolution</span> Integral expressing the amount of overlap of one function as it is shifted over another

In mathematics, convolution is a mathematical operation on two functions that produces a third function. The term convolution refers to both the result function and to the process of computing it. It is defined as the integral of the product of the two functions after one is reflected about the y-axis and shifted. The integral is evaluated for all values of shift, producing the convolution function. The choice of which function is reflected and shifted before the integral does not change the integral result. Graphically, it expresses how the 'shape' of one function is modified by the other.

<span class="mw-page-title-main">Normal distribution</span> Probability distribution

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is

<span class="mw-page-title-main">Student's t-distribution</span> Probability distribution

In probability and statistics, Student's t distribution is a continuous probability distribution that generalizes the standard normal distribution. Like the latter, it is symmetric around zero and bell-shaped.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

Grammar theory to model symbol strings originated from work in computational linguistics aiming to understand the structure of natural languages. Probabilistic context free grammars (PCFGs) have been applied in probabilistic modeling of RNA structures almost 40 years after they were introduced in computational linguistics.

<span class="mw-page-title-main">Markov property</span> Memoryless property of a stochastic process

In probability theory and statistics, the term Markov property refers to the memoryless property of a stochastic process, which means that its future evolution is independent of its history. It is named after the Russian mathematician Andrey Markov. The term strong Markov property is similar to the Markov property, except that the meaning of "present" is defined in terms of a random variable known as a stopping time.

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.

<span class="mw-page-title-main">Expectation–maximization algorithm</span> Iterative method for finding maximum likelihood estimates in statistical models

In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step. It can be used, for example, to estimate a mixture of gaussians, or to solve the multiple linear regression problem.

<span class="mw-page-title-main">Ant colony optimization algorithms</span> Optimization algorithm

In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. Artificial ants stand for multi-agent methods inspired by the behavior of real ants. The pheromone-based communication of biological ants is often the predominant paradigm used. Combinations of artificial ants and local search algorithms have become a method of choice for numerous optimization tasks involving some sort of graph, e.g., vehicle routing and internet routing.

In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information. Mixture models are used for clustering, under the name model-based clustering, and also for density estimation.

Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:

  1. To provide an analytical approximation to the posterior probability of the unobserved variables, in order to do statistical inference over these variables.
  2. To derive a lower bound for the marginal likelihood of the observed data. This is typically used for performing model selection, the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data.

A stochastic simulation is a simulation of a system that has variables that can change stochastically (randomly) with individual probabilities.

In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's τ coefficient, is a statistic used to measure the ordinal association between two measured quantities. A τ test is a non-parametric hypothesis test for statistical dependence based on the τ coefficient. It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. It is named after Maurice Kendall, who developed it in 1938, though Gustav Fechner had proposed a similar measure in the context of time series in 1897.

In actuarial science and applied probability, ruin theory uses mathematical models to describe an insurer's vulnerability to insolvency/ruin. In such models key quantities of interest are the probability of ruin, distribution of surplus immediately prior to ruin and deficit at time of ruin.

Financial correlations measure the relationship between the changes of two or more financial variables over time. For example, the prices of equity stocks and fixed interest bonds often move in opposite directions: when investors sell stocks, they often use the proceeds to buy bonds and vice versa. In this case, stock and bond prices are negatively correlated.

Variable elimination (VE) is a simple and general exact inference algorithm in probabilistic graphical models, such as Bayesian networks and Markov random fields. It can be used for inference of maximum a posteriori (MAP) state or estimation of conditional or marginal distributions over a subset of variables. The algorithm has exponential time complexity, but could be efficient in practice for low-treewidth graphs, if the proper elimination order is used.

Stochastic chains with memory of variable length are a family of stochastic chains of finite order in a finite alphabet, such as, for every time pass, only one finite suffix of the past, called context, is necessary to predict the next symbol. These models were introduced in the information theory literature by Jorma Rissanen in 1983, as a universal tool to data compression, but recently have been used to model data in different areas such as biology, linguistics and music.

<span class="mw-page-title-main">Stochastic gradient Langevin dynamics</span> Optimization and sampling technique

Stochastic gradient Langevin dynamics (SGLD) is an optimization and sampling technique composed of characteristics from Stochastic gradient descent, a Robbins–Monro optimization algorithm, and Langevin dynamics, a mathematical extension of molecular dynamics models. Like stochastic gradient descent, SGLD is an iterative optimization algorithm which uses minibatching to create a stochastic gradient estimator, as used in SGD to optimize a differentiable objective function. Unlike traditional SGD, SGLD can be used for Bayesian learning as a sampling method. SGLD may be viewed as Langevin dynamics applied to posterior distributions, but the key difference is that the likelihood gradient terms are minibatched, like in SGD. SGLD, like Langevin dynamics, produces samples from a posterior distribution of parameters based on available data. First described by Welling and Teh in 2011, the method has applications in many contexts which require optimization, and is most notably applied in machine learning problems.

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization and simulation and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.

References

  1. Pelikan, Martin (2005-02-21), "Probabilistic Model-Building Genetic Algorithms", Hierarchical Bayesian Optimization Algorithm, Studies in Fuzziness and Soft Computing, vol. 170, Springer Berlin Heidelberg, pp. 13–30, doi:10.1007/978-3-540-32373-0_2, ISBN   9783540237747
  2. Pedro Larrañaga; Jose A. Lozano (2002). Estimation of Distribution Algorithms a New Tool for Evolutionary Computation. Boston, MA: Springer US. ISBN   978-1-4615-1539-5.
  3. Jose A. Lozano; Larrañaga, P.; Inza, I.; Bengoetxea, E. (2006). Towards a new evolutionary computation advances in the estimation of distribution algorithms. Berlin: Springer. ISBN   978-3-540-32494-2.
  4. Pelikan, Martin; Sastry, Kumara; Cantú-Paz, Erick (2006). Scalable optimization via probabilistic modeling : from algorithms to applications; with 26 tables. Berlin: Springer. ISBN   978-3540349532.
  5. Mühlenbein, Heinz (1 September 1997). "The Equation for Response to Selection and Its Use for Prediction". Evol. Computation. 5 (3): 303–346. doi:10.1162/evco.1997.5.3.303. ISSN   1063-6560. PMID   10021762. S2CID   2593514.
  6. Baluja, Shummet (1 January 1994). "Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning". Carnegie Mellon University.{{cite journal}}: Cite journal requires |journal= (help)
  7. Harik, G.R.; Lobo, F.G.; Goldberg, D.E. (1999). "The compact genetic algorithm". IEEE Transactions on Evolutionary Computation. 3 (4): 287–297. doi:10.1109/4235.797971.
  8. Bonet, Jeremy S. De; Isbell, Charles L.; Viola, Paul (1 January 1996). "MIMIC: Finding Optima by Estimating Probability Densities". Advances in Neural Information Processing Systems: 424. CiteSeerX   10.1.1.47.6497 .
  9. Pelikan, Martin; Muehlenbein, Heinz (1 January 1999). "The Bivariate Marginal Distribution Algorithm". Advances in Soft Computing. pp. 521–535. CiteSeerX   10.1.1.55.1151 . doi:10.1007/978-1-4471-0819-1_39. ISBN   978-1-85233-062-0.
  10. Harik, Georges Raif (1997). Learning Gene Linkage to Efficiently Solve Problems of Bounded Difficulty Using Genetic Algorithms (phd). University of Michigan.
  11. Pelikan, Martin; Goldberg, David E.; Cantu-Paz, Erick (1 January 1999). "BOA: The Bayesian Optimization Algorithm". Morgan Kaufmann: 525–532. CiteSeerX   10.1.1.46.8131 .{{cite journal}}: Cite journal requires |journal= (help)
  12. Pelikan, Martin (2005). Hierarchical Bayesian optimization algorithm : toward a new generation of evolutionary algorithms (1st ed.). Berlin [u.a.]: Springer. ISBN   978-3-540-23774-7.
  13. Wolpert, David H.; Rajnarayan, Dev (1 January 2013). "Using Machine Learning to Improve Stochastic Optimization". Proceedings of the 17th AAAI Conference on Late-Breaking Developments in the Field of Artificial Intelligence. Aaaiws'13-17: 146–148.
  14. Larrañaga, Pedro; Karshenas, Hossein; Bielza, Concha; Santana, Roberto (21 August 2012). "A review on probabilistic graphical models in evolutionary computation". Journal of Heuristics. 18 (5): 795–819. doi:10.1007/s10732-012-9208-4. S2CID   9734434.
  15. Thierens, Dirk (11 September 2010). "The Linkage Tree Genetic Algorithm". Parallel Problem Solving from Nature, PPSN XI. pp. 264–273. doi:10.1007/978-3-642-15844-5_27. ISBN   978-3-642-15843-8.
  16. WOLPERT, DAVID H.; STRAUSS, CHARLIE E. M.; RAJNARAYAN, DEV (December 2006). "Advances in Distributed Optimization Using Probability Collectives". Advances in Complex Systems. 09 (4): 383–436. CiteSeerX   10.1.1.154.6395 . doi:10.1142/S0219525906000884.
  17. Pelikan, Martin; Goldberg, David E.; Lobo, Fernando G. (2002). "A Survey of Optimization by Building and Using Probabilistic Models". Computational Optimization and Applications. 21 (1): 5–20. doi:10.1023/A:1013500812258.
  18. Rudlof, Stephan; Köppen, Mario (1997). "Stochastic Hill Climbing with Learning by Vectors of Normal Distributions": 60–70. CiteSeerX   10.1.1.19.3536 .{{cite journal}}: Cite journal requires |journal= (help)
  19. Rudlof, Stephan; Köppen, Mario (1997). "Stochastic Hill Climbing with Learning by Vectors of Normal Distributions": 60––70. CiteSeerX   10.1.1.19.3536 .{{cite journal}}: Cite journal requires |journal= (help)
  20. Corno, Fulvio; Reorda, Matteo Sonza; Squillero, Giovanni (1998-02-27). The selfish gene algorithm: a new evolutionary optimization strategy. ACM. pp. 349–355. doi:10.1145/330560.330838. ISBN   978-0897919692. S2CID   9125252.
  21. Mininno, Ernesto; Neri, Ferrante; Cupertino, Francesco; Naso, David (2011). "Compact Differential Evolution". IEEE Transactions on Evolutionary Computation. 15 (1): 32–54. doi:10.1109/tevc.2010.2058120. ISSN   1089-778X. S2CID   20582233.
  22. Iacca, Giovanni; Caraffini, Fabio; Neri, Ferrante (2012). "Compact Differential Evolution Light: High Performance Despite Limited Memory Requirement and Modest Computational Overhead". Journal of Computer Science and Technology. 27 (5): 1056–1076. doi:10.1007/s11390-012-1284-2. ISSN   1000-9000. S2CID   3184035.
  23. Iacca, Giovanni; Neri, Ferrante; Mininno, Ernesto (2011), "Opposition-Based Learning in Compact Differential Evolution", Applications of Evolutionary Computation, Springer Berlin Heidelberg, pp. 264–273, doi:10.1007/978-3-642-20525-5_27, ISBN   9783642205248
  24. Mallipeddi, Rammohan; Iacca, Giovanni; Suganthan, Ponnuthurai Nagaratnam; Neri, Ferrante; Mininno, Ernesto (2011). "Ensemble strategies in Compact Differential Evolution". 2011 IEEE Congress of Evolutionary Computation (CEC). IEEE. pp. 1972–1977. doi:10.1109/cec.2011.5949857. ISBN   9781424478347. S2CID   11781300.
  25. Neri, Ferrante; Iacca, Giovanni; Mininno, Ernesto (2011). "Disturbed Exploitation compact Differential Evolution for limited memory optimization problems". Information Sciences. 181 (12): 2469–2487. doi:10.1016/j.ins.2011.02.004. ISSN   0020-0255.
  26. Iacca, Giovanni; Mallipeddi, Rammohan; Mininno, Ernesto; Neri, Ferrante; Suganthan, Pannuthurai Nagaratnam (2011). "Global supervision for compact Differential Evolution". 2011 IEEE Symposium on Differential Evolution (SDE). IEEE. pp. 1–8. doi:10.1109/sde.2011.5952051. ISBN   9781612840710. S2CID   8874851.
  27. Iacca, Giovanni; Mallipeddi, Rammohan; Mininno, Ernesto; Neri, Ferrante; Suganthan, Pannuthurai Nagaratnam (2011). "Super-fit and population size reduction in compact Differential Evolution". 2011 IEEE Workshop on Memetic Computing (MC). IEEE. pp. 1–8. doi:10.1109/mc.2011.5953633. ISBN   9781612840659. S2CID   5692951.
  28. Neri, Ferrante; Mininno, Ernesto; Iacca, Giovanni (2013). "Compact Particle Swarm Optimization". Information Sciences. 239: 96–121. doi:10.1016/j.ins.2013.03.026. ISSN   0020-0255.
  29. Iacca, Giovanni; Neri, Ferrante; Mininno, Ernesto (2012), "Compact Bacterial Foraging Optimization", Swarm and Evolutionary Computation, Springer Berlin Heidelberg, pp. 84–92, doi:10.1007/978-3-642-29353-5_10, hdl:11572/196442, ISBN   9783642293528
  30. Salustowicz, null; Schmidhuber, null (1997). "Probabilistic incremental program evolution". Evolutionary Computation. 5 (2): 123–141. doi:10.1162/evco.1997.5.2.123. ISSN   1530-9304. PMID   10021756. S2CID   10759266.
  31. Tamayo-Vera, Dania; Bolufe-Rohler, Antonio; Chen, Stephen (2016). "Estimation multivariate normal algorithm with thresheld convergence". 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE. pp. 3425–3432. doi:10.1109/cec.2016.7744223. ISBN   9781509006236. S2CID   33114730.
  32. Yu, Tian-Li; Goldberg, David E.; Yassine, Ali; Chen, Ying-Ping (2003), "Genetic Algorithm Design Inspired by Organizational Theory: Pilot Study of a Dependency Structure Matrix Driven Genetic Algorithm", Genetic and Evolutionary Computation — GECCO 2003, Springer Berlin Heidelberg, pp. 1620–1621, doi:10.1007/3-540-45110-2_54, ISBN   9783540406037
  33. Hsu, Shih-Huan; Yu, Tian-Li (2015-07-11). Optimization by Pairwise Linkage Detection, Incremental Linkage Set, and Restricted / Back Mixing: DSMGA-II. ACM. pp. 519–526. arXiv: 1807.11669 . doi:10.1145/2739480.2754737. ISBN   9781450334723. S2CID   17031156.