Crossover (genetic algorithm)

Last updated

In genetic algorithms and evolutionary computation, crossover, also called recombination, is a genetic operator used to combine the genetic information of two parents to generate new offspring. It is one way to stochastically generate new solutions from an existing population, and is analogous to the crossover that happens during sexual reproduction in biology. Solutions can also be generated by cloning an existing solution, which is analogous to asexual reproduction. Newly generated solutions may be mutated before being added to the population.

Contents

Different algorithms in evolutionary computation may use different data structures to store genetic information, and each genetic representation can be recombined with different crossover operators. Typical data structures that can be recombined with crossover are bit arrays, vectors of real numbers, or trees.

The list of operators presented below is by no means complete and serves mainly as an exemplary illustration of this dyadic genetic operator type. More operators and more details can be found in the literature. [1] [2] [3] [4] [5]

Crossover for binary arrays

Traditional genetic algorithms store genetic information in a chromosome represented by a bit array. Crossover methods for bit arrays are popular and an illustrative example of genetic recombination.

One-point crossover

A point on both parents' chromosomes is picked randomly, and designated a 'crossover point'. Bits to the right of that point are swapped between the two parent chromosomes. This results in two offspring, each carrying some genetic information from both parents.

OnePointCrossover.svg

Two-point and k-point crossover

In two-point crossover, two crossover points are picked randomly from the parent chromosomes. The bits in between the two points are swapped between the parent organisms.

TwoPointCrossover.svg

Two-point crossover is equivalent to performing two single-point crossovers with different crossover points. This strategy can be generalized to k-point crossover for any positive integer k, picking k crossover points.

Uniform crossover

In uniform crossover, typically, each bit is chosen from either parent with equal probability. [6] Other mixing ratios are sometimes used, resulting in offspring which inherit more genetic information from one parent than the other. In a uniform crossover, we don’t divide the chromosome into segments, rather we treat each gene separately. In this, we essentially flip a coin for each chromosome to decide whether or not it will be included in the off-spring.

Crossover for integer or real-valued genomes

Example of a discrete recombination in the three-dimensional case. The two possible offspring lie on the corners of the cuboid marked in blue. Discrete Recombination-3Dim.png
Example of a discrete recombination in the three-dimensional case. The two possible offspring lie on the corners of the cuboid marked in blue.

For the crossover operators presented above and for most other crossover operators for bit strings, it holds that they can also be applied accordingly to integer or real-valued genomes whose genes each consist of an integer or real-valued number. Instead of individual bits, integer or real-valued numbers are then simply copied into the child genome. The offspring lie on the remaining corners of the hyperbody spanned by the two parents and , as exemplified in the accompanying image for the three-dimensional case.

Discrete recombination

If the rules of the uniform crossover for bit strings are applied during the generation of the offspring, this is also called discrete recombination. [7]

Intermediate recombination

In the two-dimensional case, the two offspring of discrete recombination lie on the corners marked in blue, while the entire gray area is in question for the offspring of intermediate recombination. Intermediate Recombination.png
In the two-dimensional case, the two offspring of discrete recombination lie on the corners marked in blue, while the entire gray area is in question for the offspring of intermediate recombination.

In this recombination operator, the allele values of the child genome are generated by mixing the alleles of the two parent genomes and : [7]

randomly equally distributed per gene

The choice of the interval causes that besides the interior of the hyperbody spanned by the allele values of the parent genes additionally a certain environment for the range of values of the offspring is in question. A value of is recommended for to counteract the tendency to reduce the allele values that otherwise exists at . [8]

The adjacent figure shows for the two-dimensional case the range of possible new alleles of the two exemplary parents and in intermediate recombination. The offspring of discrete recombination and are also plotted. Intermediate recombination satisfies the arithmetic calculation of the allele values of the child genome required by virtual alphabet theory. [9] [10] Discrete and intermediate recombination are used as a standard in the evolution strategy. [11]

Crossover for permutations

For combinatorial tasks, permutations are usually used that are specifically designed for genomes that are themselves permutations of a set. The underlying set is usually a subset of or . If 1- or n-point or uniform crossover for integer genomes is used for such genomes, a child genome may contain some values twice and others may be missing. This can be remedied by genetic repair, e.g. by replacing the redundant genes in positional fidelity for missing ones from the other child genome.

In order to avoid the generation of invalid offspring, special crossover operators for permutations have been developed [12] which fulfill the basic requirements of such operators for permutations, namely that all elements of the initial permutation are also present in the new one and only the order is changed. It can be distinguished between combinatorial tasks, where all sequences are admissible, and those where there are constraints in the form of inadmissible partial sequences. A well-known representative of the first task type is the traveling salesman problem (TSP), where the goal is to visit a set of cities exactly once on the shortest tour. An example of the constrained task type is the scheduling of multiple workflows. Workflows involve sequence constraints on some of the individual work steps. For example, a thread cannot be cut until the corresponding hole has been drilled in a workpiece. Such problems are also called order-based permutations.

In the following, two crossover operators are presented as examples, the partially mapped crossover (PMX) motivated by the TSP and the order crossover (OX1) designed for order-based permutations. A second offspring can be produced in each case by exchanging the parent chromosomes.

Partially mapped crossover (PMX)

The PMX operator was designed as a recombination operator for TSP like Problems. [13] [14] The explanation of the procedure is illustrated by an example:

Procedure Example Example Chromosome

  Let be given two permutations of the same set.  and

Randomly select two crossover points forming a gene segment in . Here from gene position 4 to 6. 

The selected section is copied to the child chromosome in the same position. The open positions are indicated by question marks. 

Look for genes that have not been copied in the corresponding segment of starting at the first crossover point. For each gene found (called ), look up in the offspring which element (called ) was copied in its place from . Copy to the position held by in if it is not occupied. Otherwise, continue with the next step. Gene is the first uncopied gene in the corresponding segment of : . Gene was copied from in its place in . 
 The position of in is the furthest right position and will be placed there in . 

If the place taken by in is already occupied by an element in the descendant, is put in the place taken by in . The next gene in is and this has already been copied into the child chromosome. Thus, the next gene to be handled is . Its position in the offspring would be the position of in . However, this place is already occupied by gene . So is copied to the position of in . 

After processing the genes from the selected segment in , the remaining positions in the offspring are filled with the genes from that have not yet been copied, in the order of their appearance. This results in the finished child genome. The genes copied from are and . 

Order crossover (OX1)

The order crossover goes back to Davis [1] in its original form and is presented here in a slightly generalized version with more than two crossover points. It transfers information about the relative order from the second parent to the offspring. First, the number and position of the crossover points are determined randomly. The resulting gene sequences are then processed as described below:

Procedure Example Example Chromosome

  Let be given two permutations of the same set.  and

Randomly select gene segments in . Here two segments from gene position 1 to 2 and from 6 to 8. 

As a child permutation, a permutation is generated that contains the selected gene segments of in the same position. The open positions are indicated by question marks. 

The remaining missing genes are now also transferred, but in the order in which they appear in . The missing genes of in the example are: 


This results in the completed child genome. The transferred genes are underlined: 

Among other things, order crossover is well suited for scheduling multiple workflows, when used in conjunction with 1- and n-point crossover. [15]

Further crossover operators for permutations

Over time, a large number of crossover operators for permutations have been proposed, so the following list is only a small selection. For more information, the reader is referred to the literature. [1] [5] [14] [12]

  1. cycle crossover (CX) [16] [14]
  2. order-based crossover (OX2) [5] [17]
  3. position-based crossover (POS) [5] [17]
  4. edge recombination [18] [14]
  5. voting recombination (VR) [12]
  6. alternating-positions crossover (AP) [12]
  7. maximal preservative crossover (MPX) [5] [19]
  8. merge crossover (MX) [5] [20]
  9. sequential constructive crossover operator (SCX) [21]

The usual approach to solving TSP-like problems by genetic or, more generally, evolutionary algorithms, presented earlier, is either to repair illegal descendants or to adjust the operators appropriately so that illegal offspring do not arise in the first place. Alternatively, Riazi suggests the use of a double chromosome representation, which avoids illegal offspring. [22]

See also

Bibliography

Related Research Articles

<span class="mw-page-title-main">Genetic algorithm</span> Competitive algorithm for searching a problem space

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover and selection. Some examples of GA applications include optimizing decision trees for better performance, solving sudoku puzzles, hyperparameter optimization, causal inference, etc.

<span class="mw-page-title-main">Chromosomal crossover</span> Cellular process

Chromosomal crossover, or crossing over, is the exchange of genetic material during sexual reproduction between two homologous chromosomes' non-sister chromatids that results in recombinant chromosomes. It is one of the final phases of genetic recombination, which occurs in the pachytene stage of prophase I of meiosis during a process called synapsis. Synapsis begins before the synaptonemal complex develops and is not completed until near the end of prophase I. Crossover usually occurs when matching regions on matching chromosomes break and then reconnect to the other chromosome.

<span class="mw-page-title-main">Genetic recombination</span> Production of offspring with combinations of traits that differ from those found in either parent

Genetic recombination is the exchange of genetic material between different organisms which leads to production of offspring with combinations of traits that differ from those found in either parent. In eukaryotes, genetic recombination during meiosis can lead to a novel set of genetic information that can be further passed on from parents to offspring. Most recombination occurs naturally and can be classified into two types: (1) interchromosomal recombination, occurring through independent assortment of alleles whose loci are on different but homologous chromosomes ; & (2) intrachromosomal recombination, occurring through crossing over.

In computational intelligence (CI), an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators.

In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution, and the subfield of artificial intelligence and soft computing studying these algorithms. In technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic or stochastic optimization character.

Genetic linkage is the tendency of DNA sequences that are close together on a chromosome to be inherited together during the meiosis phase of sexual reproduction. Two genetic markers that are physically near to each other are unlikely to be separated onto different chromatids during chromosomal crossover, and are therefore said to be more linked than markers that are far apart. In other words, the nearer two genes are on a chromosome, the lower the chance of recombination between them, and the more likely they are to be inherited together. Markers on different chromosomes are perfectly unlinked, although the penetrance of potentially deleterious alleles may be influenced by the presence of other alleles, and these other alleles may be located on other chromosomes than that on which a particular potentially deleterious allele is located.

<span class="mw-page-title-main">Muller's ratchet</span> Accumulation of harmful mutations

In evolutionary genetics, Muller's ratchet is a process which, in the absence of recombination, results in an accumulation of irreversible deleterious mutations. This happens because in the absence of recombination, and assuming reverse mutations are rare, offspring bear at least as much mutational load as their parents. Muller proposed this mechanism as one reason why sexual reproduction may be favored over asexual reproduction, as sexual organisms benefit from recombination and consequent elimination of deleterious mutations. The negative effect of accumulating irreversible deleterious mutations may not be prevalent in organisms which, while they reproduce asexually, also undergo other forms of recombination. This effect has also been observed in those regions of the genomes of sexual organisms that do not undergo recombination.

<span class="mw-page-title-main">Genetic variation</span> Difference in DNA among individuals or populations

Genetic variation is the difference in DNA among individuals or the differences between populations among the same species. The multiple sources of genetic variation include mutation and genetic recombination. Mutations are the ultimate sources of genetic variation, but other mechanisms, such as genetic drift, contribute to it, as well.

A genetic operator is an operator used in genetic algorithms to guide the algorithm towards a solution to a given problem. There are three main types of operators, which must work in conjunction with one another in order for the algorithm to be successful. Genetic operators are used to create and maintain genetic diversity, combine existing solutions into new solutions (crossover) and select between solutions (selection). In his book discussing the use of genetic programming for the optimization of complex problems, computer scientist John Koza has also identified an 'inversion' or 'permutation' operator; however, the effectiveness of this operator has never been conclusively demonstrated and this operator is rarely discussed.

A fitness function is a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims. Fitness functions are used in evolutionary algorithms (EA), such as genetic programming and genetic algorithms to guide simulations towards optimal design solutions.

Mutation is a genetic operator used to maintain genetic diversity of the chromosomes of a population of a genetic or, more generally, an evolutionary algorithm (EA). It is analogous to biological mutation.

In genetic algorithms (GA), or more general, evolutionary algorithms (EA), a chromosome is a set of parameters which define a proposed solution of the problem that the evolutionary algorithm is trying to solve. The set of all solutions, also called individuals according to the biological model, is known as the population. The genome of an individual consists of one, more rarely of several, chromosomes and corresponds to the genetic representation of the task to be solved. A chromosome is composed of a set of genes, where a gene consists of one or more semantically connected parameters, which are often also called decision variables. They determine one or more phenotypic characteristics of the individual or at least have an influence on them. In the basic form of genetic algorithms, the chromosome is represented as a binary string, while in later variants and in EAs in general, a wide variety of other data structures are used.

<span class="mw-page-title-main">Evolution of sexual reproduction</span> How sexually reproducing multicellular organisms could have evolved from a common ancestor species

Sexual reproduction is an adaptive feature which is common to almost all multicellular organisms and various unicellular organisms. Currently, the adaptive advantage of sexual reproduction is widely regarded as a major unsolved problem in biology. As discussed below, one prominent theory is that sex evolved as an efficient mechanism for producing variation, and this had the advantage of enabling organisms to adapt to changing environments. Another prominent theory, also discussed below, is that a primary advantage of outcrossing sex is the masking of the expression of deleterious mutations. Additional theories concerning the adaptive advantage of sex are also discussed below. Sex does, however, come with a cost. In reproducing asexually, no time nor energy needs to be expended in choosing a mate and, if the environment has not changed, then there may be little reason for variation, as the organism may already be well-adapted. However, very few environments have not changed over the millions of years that reproduction has existed. Hence it is easy to imagine that being able to adapt to changing environment imparts a benefit. Sex also halves the amount of offspring a given population is able to produce. Sex, however, has evolved as the most prolific means of species branching into the tree of life. Diversification into the phylogenetic tree happens much more rapidly via sexual reproduction than it does by way of asexual reproduction.

In population genetics, linkage disequilibrium (LD) is the non-random association of alleles at different loci in a given population. Loci are said to be in linkage disequilibrium when the frequency of association of their different alleles is higher or lower than expected if the loci were independent and associated randomly.

In computer science, an evolution strategy (ES) is an optimization technique based on ideas of evolution. It belongs to the general class of evolutionary computation or artificial evolution methodologies.

In computer programming, gene expression programming (GEP) is an evolutionary algorithm that creates computer programs or models. These computer programs are complex tree structures that learn and adapt by changing their sizes, shapes, and composition, much like a living organism. And like living organisms, the computer programs of GEP are also encoded in simple linear chromosomes of fixed length. Thus, GEP is a genotype–phenotype system, benefiting from a simple genome to keep and transmit the genetic information and a complex phenotype to explore the environment and adapt to it.

Selection is the stage of a genetic algorithm or more general evolutionary algorithm in which individual genomes are chosen from a population for later breeding. Selection mechanisms are also used to choose candidate solutions (individuals) for the next generation. Retaining the best individuals in a generation unchanged in the next generation, is called elitism or elitist selection. It is a successful (slight) variant of the general process of constructing a new population.

In evolutionary algorithms (EA), the term of premature convergence means that a population for an optimization problem converged too early, resulting in being suboptimal. In this context, the parental solutions, through the aid of genetic operators, are not able to generate offspring that are superior to, or outperform, their parents. Premature convergence is a common problem found in evolutionary algorithms in general and genetic algorithms in particular, as it leads to a loss, or convergence of, a large number of alleles, subsequently making it very difficult to search for a specific gene in which the alleles were present. An allele is considered lost if, in a population, a gene is present, where all individuals are sharing the same value for that particular gene. An allele is, as defined by De Jong, considered to be a converged allele, when 95% of a population share the same value for a certain gene.

In computer programming, genetic representation is a way of presenting solutions/individuals in evolutionary computation methods. The term encompasses both the concrete data structures and data types used to realize the genetic material of the candidate solutions in the form of a genome, and the relationships between search space and problem space. In the simplest case, the search space corresponds to the problem space. The choice of problem representation is tied to the choice of genetic operators, both of which have a decisive effect on the efficiency of the optimization. Genetic representation can encode appearance, behavior, physical qualities of individuals. Difference in genetic representations is one of the major criteria drawing a line between known classes of evolutionary computation.

A memetic algorithm (MA) in computer science and operations research, is an extension of the traditional genetic algorithm (GA) or more general evolutionary algorithm (EA). It may provide a sufficiently good solution to an optimization problem. It uses a suitable heuristic or local search technique to improve the quality of solutions generated by the EA and to reduce the likelihood of premature convergence.

References

  1. 1 2 3 Davis, Lawrence (1991). Handbook of genetic algorithms. New York: Van Nostrand Reinhold. ISBN   0-442-00173-8. OCLC   23081440.
  2. Eiben, A.E.; Smith, J.E. (2015). "Representation, Mutation, and Recombination". Introduction to Evolutionary Computing. Natural Computing Series (2nd ed.). Berlin, Heidelberg: Springer. pp. 49–78. doi:10.1007/978-3-662-44874-8. ISBN   978-3-662-44873-1. S2CID   20912932.
  3. Yu, Xinjie; Gen, Mitsuo (2010). "Encoding and Operators". Introduction to evolutionary algorithms. Decision Engineering. London: Springer. pp. 40–63. doi:10.1007/978-1-84996-129-5. ISBN   978-1-84996-129-5. OCLC   654380156.
  4. Yu, Xinjie; Gen, Mitsuo (2010). "Variation Operators for Permutation Code". Introduction to Evolutionary Algorithms. Decision Engineering. London: Springer. pp. 285–299. doi:10.1007/978-1-84996-129-5. ISBN   978-1-84996-128-8.
  5. 1 2 3 4 5 6 Booker, Lashon B.; Fogel, David B.; Whitley, Darrell; Angeline, Peter J.; Eiben, A.E. (2000). "Recombination". In Bäck, Thomas; Fogel, David B.; Michalewicz, Zbigniew (eds.). Evolutionary computation. Vol. 1, Basic algorithms and operators. Bristol: Institute of Physics Pub. pp. 256–307. ISBN   0-585-30560-9. OCLC   45730387.
  6. Syswerda, Gilbert (1989), Schaffer, J.D. (ed.), "Uniform crossover in genetic algorithms", Proceedings of the 3rd International Conference on Genetic Algorithms (ICGA), San Francisco: Morgan Kaufmann, pp. 2–9, ISBN   1558600663
  7. 1 2 Eiben, A.E.; Smith, J.E. (2015). "Recombination Operators for Real-Valued Representation". Introduction to Evolutionary Computing. Natural Computing Series (2nd ed.). Berlin, Heidelberg: Springer. pp. 65–67. doi:10.1007/978-3-662-44874-8. ISBN   978-3-662-44873-1. S2CID   20912932.
  8. Mühlenbein, Heinz; Schlierkamp-Voosen, Dirk (1993). "Predictive Models for the Breeder Genetic Algorithm I. Continuous Parameter Optimization". Evolutionary Computation. 1 (1): 25–49. doi:10.1162/evco.1993.1.1.25. ISSN   1063-6560. S2CID   16085506.
  9. Goldberg, David E. (1991). "Real-coded Genetic Algorithms, Virtual Alphabets, and Blocking". Complex Syst. 5 (2): 139–167.
  10. Stender, J.; Hillebrand, E.; Kingdon, J. (1994). Genetic algorithms in optimisation, simulation, and modelling. Amsterdam: IOS Press. ISBN   90-5199-180-0. OCLC   47216370.
  11. Schwefel, Hans-Paul (1995). Evolution and optimum seeking. New York: Wiley. ISBN   0-471-57148-2. OCLC   30701094.
  12. 1 2 3 4 Larrañaga, P.; Kuijpers, C.M.H.; Murga, R.H.; Inza, I.; Dizdarevic, S. (1999). "Genetic Algorithms for the Travelling Salesman Problem: A Review of Representations and Operators". Artificial Intelligence Review. 13 (2): 129–170. doi:10.1023/A:1006529012972. S2CID   10284682.
  13. Goldberg, David E.; Lingle, R. (1985), Grefenstette, John J. (ed.), "Alleles, loci, and the traveling salesman problem", Proceedings of the First International Conference on Genetic Algorithms and Their Applications (ICGA), Hillsdale, N.J.: Lawrence Erlbaum Associates, pp. 154–159, ISBN   0-8058-0426-9, OCLC   19702892
  14. 1 2 3 4 Eiben, A.E.; Smith, J.E. (2015). "Recombination for Permutation Representation". Introduction to Evolutionary Computing. Natural Computing Series (2nd ed.). Berlin, Heidelberg: Springer. pp. 70–74. doi:10.1007/978-3-662-44874-8. ISBN   978-3-662-44873-1. S2CID   20912932.
  15. Jakob, Wilfried; Quinte, Alexander; Stucky, Karl-Uwe; Süß, Wolfgang (2008), Rudolph, Günter; Jansen, Thomas; Beume, Nicola; Lucas, Simon (eds.), "Fast Multi-objective Scheduling of Jobs to Constrained Resources Using a Hybrid Evolutionary Algorithm", Parallel Problem Solving from Nature – PPSN X, Berlin, Heidelberg: Springer, vol. LNCS 5199, pp. 1031–1040, doi:10.1007/978-3-540-87700-4_102, ISBN   978-3-540-87699-1 , retrieved 2023-01-14
  16. Oliver, I.M.; Smith, D.J.; Holland, J. (1987), Grefenstette, John J. (ed.), "A study of permutation crossover operators on the travelling salesman problem", Proceedings of the Second International Conference on Genetic Algorithms and Their Applications (ICGA), Hillsdale, N.J.: Lawrence Erlbaum Associates, pp. 224–230, ISBN   978-0-8058-0158-3
  17. 1 2 Syswerda, Gilbert (1991). "Schedule Optimization Using Genetic Algorithms". In Davis, Lawrence (ed.). Handbook of genetic algorithms. New York: Van Nostrand Reinhold. pp. 332–349. ISBN   0-442-00173-8. OCLC   23081440.
  18. Whitley, Darrell; Starkweather, Timothy; Fuquay, D'Ann (1989), Schaffer, J.D. (ed.), "Scheduling Problems and Traveling Salesmen: The Genetic Edge Recombination Operator", Proceedings of the 3rd International Conference on Genetic Algorithms (ICGA), San Francisco: Morgan Kaufmann, pp. 133–140, ISBN   1558600663
  19. Dzubera, John; Whitley, Darrell (1994), Davidor, Yuval; Schwefel, Hans-Paul; Männer, Reinhard (eds.), "Advanced correlation analysis of operators for the traveling salesman problem", Parallel Problem Solving from Nature — PPSN III, Berlin, Heidelberg: Springer, vol. 866, pp. 68–77, doi:10.1007/3-540-58484-6_251, ISBN   978-3-540-58484-1 , retrieved 2023-01-15
  20. Blanton, Joe L.; Wainwright, Roger L. (1993), Forrest, Stephanie (ed.), "Multiple Vehicle Routing with Time and Capacity Constraints Using Genetic Algorithms", Proceedings of the 5th International Conference on Genetic Algorithms (ICGA), San Francisco: Morgan Kaufmann, pp. 452–459, ISBN   978-1-55860-299-1
  21. Ahmed, Zakir Hussain (2000). Sequential Constructive Sampling and Related approaches to Combinatorial Optimization (PhD). Tezpur University, India.
  22. Riazi, Amin (14 October 2019). "Genetic algorithm and a double-chromosome implementation to the traveling salesman problem". SN Applied Sciences. 1 (11). doi: 10.1007/s42452-019-1469-1 .