Mutation is a genetic operator used to maintain genetic diversity of the chromosomes of a population of an evolutionary algorithm (EA), including genetic algorithms in particular. It is analogous to biological mutation.
The classic example of a mutation operator of a binary coded genetic algorithm (GA) involves a probability that an arbitrary bit in a genetic sequence will be flipped from its original state. A common method of implementing the mutation operator involves generating a random variable for each bit in a sequence. This random variable tells whether or not a particular bit will be flipped. This mutation procedure, based on the biological point mutation, is called single point mutation. Other types of mutation operators are commonly used for representations other than binary, such as floating-point encodings or representations for combinatorial problems.
The purpose of mutation in EAs is to introduce diversity into the sampled population. Mutation operators are used in an attempt to avoid local minima by preventing the chromosomes in the population from becoming too similar to each other, which would slow down or even stop convergence to the global optimum. This reasoning also leads most EAs not to take only the fittest of the population when generating the next generation, but rather to select a random (or semi-random) set with a weighting toward those that are fitter. [1]
The following requirements apply to all mutation operators used in an EA: [2] [3]

1. Every point in the search space must be reachable by one or more mutations.
2. There must be no preference for parts or directions in the search space (no drift).
3. Small mutations should be more probable than large ones.
For different genome types, different mutation types are suitable. Some examples are Gaussian, uniform, zigzag, scramble, insertion, inversion and swap mutation. [4] [5] [6] An overview and more operators than those presented below can be found in the introductory book by Eiben and Smith [7] or in [3] [8].
The mutation of bit strings is carried out by flipping bits at random positions.
Example (the fifth bit is flipped):

1 0 1 0 0 1 0   →   1 0 1 0 1 1 0
The probability of a mutation of a bit is 1/l, where l is the length of the binary vector. [9] Thus, on average one bit is flipped per individual selected for mutation.
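As an illustration, here is a minimal Python sketch of such a bit-flip mutation; the function name and the default per-bit probability of 1/l are illustrative choices, not taken from a specific library:

    import random

    def bit_flip_mutation(bits, p=None):
        # Flip each bit independently with probability p;
        # the default p = 1/l yields on average one flipped bit per individual.
        if p is None:
            p = 1.0 / len(bits)
        return [b ^ 1 if random.random() < p else b for b in bits]

    # Example with the bit string from above:
    # bit_flip_mutation([1, 0, 1, 0, 0, 1, 0]) might return [1, 0, 1, 0, 1, 1, 0]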
Many EAs, such as the evolution strategy [10] [11] or the real-coded genetic algorithms, [12] [13] [8] work with real numbers instead of bit strings. This is due to the good results that have been obtained with this type of coding. [8] [14]
The value of a real-valued gene can either be changed or completely redetermined. A mutation that implements the latter should only ever be used in conjunction with value-changing mutations, and then only with comparatively low probability, as it can lead to large changes.
In practical applications, the value range of the decision variables of the optimisation problem to be solved is usually limited. Accordingly, the value of each associated gene is restricted to an interval [x_min, x_max]. Mutations may or may not take these restrictions into account. In the latter case, suitable post-processing is required, as described below.
A real number x can be mutated by adding a random value drawn from a normal distribution N(0, σ) to the old value of the gene, resulting in the mutated value x′:

x′ = x + N(0, σ)
In the case of genes with a restricted range of values [x_min, x_max], it is a good idea to choose the step size σ of the mutation so that it reasonably fits the range of the gene to be changed, e.g.:

σ = (x_max − x_min) / 6

so that the width of the range corresponds to about ±3σ, i.e. to almost the entire probability mass of the normal distribution.
The step size can also be adapted to the smaller permissible change range, depending on the current value. In any case, however, the new value of the gene may fall outside its permissible range of values. Such a case must be treated as a lethal mutation, since the obvious repair of using the violated limit as the new value of the gene would lead to a drift: the limit value would then be selected with the entire probability mass of the values lying beyond the boundary of the value range.
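A minimal Python sketch of such a Gaussian mutation with a restricted value range, under the assumptions above; the function name, the default step size of one sixth of the range and the convention of returning None for a lethal mutation are illustrative choices:

    import random

    def gaussian_mutation(x, x_min, x_max, sigma=None):
        # Add N(0, sigma) noise to the gene value x.
        if sigma is None:
            sigma = (x_max - x_min) / 6.0   # step size roughly matched to the range
        x_new = x + random.gauss(0.0, sigma)
        # Out-of-range offspring are treated as lethal instead of being clipped
        # to the violated limit, which would cause a drift towards the boundary.
        if x_new < x_min or x_new > x_max:
            return None
        return x_new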
The evolution strategy works with real numbers and mutation based on normal distribution. The step sizes are part of the chromosome and are subject to evolution together with the actual decision variables. [15] [16]
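The article gives no formula for this self-adaptation; a common textbook scheme, shown here only as a sketch, mutates a single step size log-normally before using it to mutate the decision variables. The learning rate τ = 1/√n is a conventional choice, not taken from this text:

    import math
    import random

    def es_self_adaptive_mutation(x, sigma):
        # x: list of real-valued decision variables, sigma: current step size.
        n = len(x)
        tau = 1.0 / math.sqrt(n)                      # conventional learning rate
        sigma_new = sigma * math.exp(tau * random.gauss(0.0, 1.0))
        x_new = [xi + sigma_new * random.gauss(0.0, 1.0) for xi in x]
        return x_new, sigma_new                       # step size evolves with the genes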
One possible form of changing the value of a gene while taking its value range into account is the mutation relative parameter change of the evolutionary algorithm GLEAM (General Learning Evolutionary Algorithm and Method), [17] in which, as with the mutation presented earlier, small changes are more likely than large ones.
First, a uniformly distributed decision is made as to whether the current value x should be increased or decreased, and the corresponding total change interval is determined. Without loss of generality, an increase is assumed for the explanation, so the total change interval is [x, x_max]. It is divided into k sub-areas of equal width w = (x_max − x) / k, from which k sub-change intervals of different size are formed: the i-th sub-change interval is [x, x + i · w] for i = 1, …, k.
Subsequently, one of the k sub-change intervals is selected with uniform probability, and a random number, likewise uniformly distributed, is drawn from it as the new value of the gene. The summed probabilities of the sub-change intervals yield a probability distribution over the k sub-areas which is not a normal distribution as before, but which likewise clearly favours small changes over larger ones.
For larger values of k, such as 10, this mutation is less well suited to tasks whose optimum lies on one of the boundaries of the value range. This can be remedied by significantly reducing k when a gene value approaches one of its limits very closely.
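A minimal Python sketch of this relative parameter change, under the above reading of the procedure; the function name, the default k = 10 and the handling of a value already sitting at a boundary are illustrative assumptions:

    import random

    def relative_parameter_change(x, x_min, x_max, k=10):
        # Decide uniformly whether to increase or decrease the value.
        if random.random() < 0.5:
            total, sign = x_max - x, 1.0     # total change interval [x, x_max]
        else:
            total, sign = x - x_min, -1.0    # total change interval [x_min, x]
        if total == 0.0:
            return x                         # already at the boundary in that direction
        w = total / k                        # width of one sub-area
        i = random.randint(1, k)             # pick one of the k nested sub-change intervals
        delta = random.uniform(0.0, i * w)   # draw uniformly from the chosen interval
        return x + sign * delta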
For both mutation operators for real-valued numbers, the probability of an increase or a decrease is independent of the current value and is 50% in each case. In addition, small changes are considerably more likely than large ones. For mixed-integer optimization problems, rounding is usually used.
Mutations of permutations are specially designed for genomes that are themselves permutations of a set. These are often used to solve combinatorial tasks. [8] [18] [19] In the two mutations presented, parts of the genome are rotated or inverted.
The rotation procedure [19] is as follows (a code sketch is given after the list):

1. Let a permutation P be given.
2. Select a partial list, i.e. a start index i_s and an end index i_e in P, both natural numbers between 0 and len(P) − 1, and choose the number of positions d by which the partial list is to be rotated, with 0 < d < length of the partial list. The start index may also lie after the end index; the partial list then simply continues from the beginning of P (periodic boundary condition). This is necessary so that the permutation probability in the genome is the same everywhere and is not greater in the middle than at the edges.
3. Copy P to P′ and rotate the partial list by d positions to the right.
4. P′ is then the mutated genome.
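A minimal Python sketch of this rotation mutation, under the above description; the uniform choice of i_s, i_e and d and the function name are assumptions made for illustration:

    import random

    def rotation_mutation(p):
        n = len(p)
        i_s = random.randrange(n)                 # start index
        i_e = random.randrange(n)                 # end index (may lie before i_s)
        length = (i_e - i_s) % n + 1              # length of the cyclic partial list
        if length < 2:
            return list(p)                        # a single element cannot be rotated
        d = random.randrange(1, length)           # shift positions, 0 < d < length
        idx = [(i_s + j) % n for j in range(length)]
        sub = [p[j] for j in idx]
        rotated = sub[-d:] + sub[:-d]             # rotate the partial list right by d
        child = list(p)
        for j, v in zip(idx, rotated):
            child[j] = v
        return child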
The inversion procedure [18] is as follows (a code sketch is given after the list):

1. Let a permutation P be given.
2. Select a partial list, i.e. a start index i_s and an end index i_e in P, both natural numbers between 0 and len(P) − 1, where i_s ≠ i_e. This condition ensures that the mutation always produces a genotypically altered chromosome. The start index may also lie after the end index; the partial list then simply continues from the beginning of P (periodic boundary condition). This is necessary so that the permutation probability in the genome is the same everywhere and is not greater in the middle than at the edges.
3. Copy P to P′ and invert the partial list.
4. P′ is then the mutated genome.
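A minimal Python sketch of this inversion mutation along the same lines; again, the index handling and naming are illustrative assumptions:

    import random

    def inversion_mutation(p):
        n = len(p)
        if n < 2:
            return list(p)                        # nothing to invert
        i_s = random.randrange(n)                 # start index
        i_e = random.randrange(n)                 # end index
        while i_e == i_s:                         # i_s != i_e guarantees an altered genotype
            i_e = random.randrange(n)
        length = (i_e - i_s) % n + 1
        idx = [(i_s + j) % n for j in range(length)]
        sub = [p[j] for j in idx]
        child = list(p)
        for j, v in zip(idx, reversed(sub)):      # write the partial list back reversed
            child[j] = v
        return child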
The requirement raised at the beginning, according to which small changes should be more probable than large ones, is only inadequately fulfilled by the two permutation mutations presented, since the lengths of the partial lists and the numbers of shift positions are chosen uniformly at random. The longer the partial list and the larger the shift, however, the greater the change in the gene order.
This can be remedied by the following modifications. The end index i_e of the partial list is determined from a distance d to the start index i_s:

i_e = (i_s + d) mod len(P)

where d is determined randomly according to one of the two procedures for the mutation of real numbers presented earlier and then rounded, so that small distances are more probable than large ones.
For the rotation, the number of shift positions is determined in the same way as the distance d, but the value 0 is forbidden.
For the inversion, note that i_s ≠ i_e must hold, so the value d = 0 must likewise be excluded.