Chromosome (genetic algorithm)

Last updated

In genetic algorithms (GA), or more general, evolutionary algorithms (EA), a chromosome (also sometimes called a genotype ) is a set of parameters which define a proposed solution of the problem that the evolutionary algorithm is trying to solve. The set of all solutions, also called individuals according to the biological model, is known as the population . [1] [2] The genome of an individual consists of one, more rarely of several, [3] [4] chromosomes and corresponds to the genetic representation of the task to be solved. A chromosome is composed of a set of genes, where a gene consists of one or more semantically connected parameters, which are often also called decision variables. They determine one or more phenotypic characteristics of the individual or at least have an influence on them. [2] In the basic form of genetic algorithms, the chromosome is represented as a binary string, [5] while in later variants [6] [7] and in EAs in general, a wide variety of other data structures are used. [8] [9] [10]

Contents

Chromosome design

When creating the genetic representation of a task, it is determined which decision variables and other degrees of freedom of the task should be improved by the EA and possible additional heuristics and how the genotype-phenotype mapping should look like. The design of a chromosome translates these considerations into concrete data structures for which an EA then has to be selected, configured, extended, or, in the worst case, created. Finding a suitable representation of the problem domain for a chromosome is an important consideration, as a good representation will make the search easier by limiting the search space; similarly, a poorer representation will allow a larger search space. [11] In this context, suitable mutation and crossover operators [2] must also be found or newly defined to fit the chosen chromosome design. An important requirement for these operators is that they not only allow all points in the search space to be reached in principle, but also make this as easy as possible. [12] [13]

The following requirements must be met by a well-suited chromosome:

While the first requirement is indispensable, depending on the application and the EA used, one usually only has to be satisfied with fulfilling the remaining requirements as far as possible. It should be noted, however, that the evolutionary search is supported and possibly considerably accelerated by a fulfillment as complete as possible.

Examples of chromosomes

Chromosomes for binary codings

In their classical form, GAs use bit strings and map the decision variables to be optimized onto them. An example for one boolean and three integer decision variables with the value ranges , and may illustrate this:

Example representation of four decision variables in a bitstring
decision variable:
bits:01011011101111000
position:1716151413121110987654321

Note that the negative number here is given in two's complement. This straight forward representation uses five bits to represent the three values of , although two bits would suffice. This is a significant redundancy. An improved alternative, where 28 is to be added for the genotype-phenotype mapping, could look like this:

Example of an improved representation of the four decision variables
decision variable:
bits:01011001111000
position:1413121110987654321

with .

Chromosomes with real-valued or integer genes

For the processing of tasks with real-valued or mixed-integer decision variables, EAs such as the evolution strategy [15] or the real-coded GAs [16] [17] [18] are suited. In the case of mixed-integer values, rounding is often used, but this represents some violation of the redundancy requirement. If the necessary precisions of the real values can be reasonably narrowed down, this violation can be remedied by using integer-coded GAs. [19] [20] For this purpose, the valid digits of real values are mapped to integers by multiplication with a suitable factor. For example, 12.380 becomes the integer 12380 by multiplying by 1000. This must of course be taken into account in genotype-phenotype mapping for evaluation and result presentation. A common form is a chromosome consisting of a list or an array of integer or real values.

Chromosomes for permutations

Combinatorial problems are mainly concerned with finding an optimal sequence of a set of elementary items. As an example, consider the problem of the traveling salesman who wants to visit a given number of cities exactly once on the shortest possible tour. The simplest and most obvious mapping onto a chromosome is to number the cities consecutively, to interpret a resulting sequence as permutation and to store it directly in a chromosome, where one gene corresponds to the ordinal number of a city. [21] Then, however, the variation operators may only change the gene order and not remove or duplicate any genes. [22] The chromosome thus contains the path of a possible tour to the cities. As an example the sequence of nine cities may serve, to which the following chromosome corresponds:

357142968

In addition to this encoding frequently called path representation, there are several other ways of representing a permutation, for example the ordinal representation or the matrix representation. [22] [23]

Chromosomes for co-evolution

When a genetic representation contains, in addition to the decision variables, additional information that influences evolution and/or the mapping of the genotype to the phenotype and is itself subject to evolution, this is referred to as co-evolution. A typical example is the evolution strategy (ES), which includes one or more mutation step sizes as strategy parameters in each chromosome. [15] Another example is an additional gene to control a selection heuristic for resource allocation in a scheduling tasks. [24]

This approach is based on the assumption that good solutions are based on an appropriate selection of strategy parameters or on control gene(s) that influences genotype-phenotype mapping. The success of the ES gives evidence to this assumption.

Chromosomes for complex representations

The chromosomes presented above are well suited for processing tasks of continuous, mixed-integer, pure-integer or combinatorial optimization. For a combination of these optimization areas, on the other hand, it becomes increasingly difficult to map them to simple strings of values, depending on the task. The following extension of the gene concept is proposed by the EA GLEAM (General Learning Evolutionary Algorithm and Method) for this purpose: [25] A gene is considered to be the description of an element or elementary trait of the phenotype, which may have multiple parameters. For this purpose, gene types are defined that contain as many parameters of the appropriate data type as are required to describe the particular element of the phenotype. A chromosome now consists of genes as data objects of the gene types, whereby, depending on the application, each gene type occurs exactly once as a gene or can be contained in the chromosome any number of times. The latter leads to chromosomes of dynamic length, as they are required for some problems. [26] [27] The gene type definitions also contain information on the permissible value ranges of the gene parameters, which are observed during chromosome generation and by corresponding mutations, so they cannot lead to lethal mutations. For tasks with a combinatorial part, there are suitable genetic operators that can move or reposition genes as a whole, i.e. with their parameters.

Three exemplary genes matching the adjacent gene type definitions in a chromosome organized as a list Genmodell Chromosombeispiel.png
Three exemplary genes matching the adjacent gene type definitions in a chromosome organized as a list
Three exemplary genes matching the adjacent gene type definitions in a chromosome organized as a list Gene model gene types.png
Three exemplary genes matching the adjacent gene type definitions in a chromosome organized as a list

A scheduling task is used as an illustration, in which workflows are to be scheduled that require different numbers of heterogeneous resources. A workflow specifies which work steps can be processed in parallel and which have to be executed one after the other. In this context, heterogeneous resources mean different processing times at different costs in addition to different processing capabilities. [24] Each scheduling operation therefore requires one or more parameters that determine the resource selection, where the value ranges of the parameters depend on the number of alternative resources available for each work step. A suitable chromosome provides one gene type per work step and in this case one corresponding gene, which has one parameter for each required resource. The order of genes determines the order of scheduling operations and, therefore, the precedence in case of allocation conflicts. The exemplary gene type definition of work step 15 with two resources, for which there are four and seven alternatives respectively, would then look as shown in the left image. Since the parameters represent indices in lists of available resources for the respective work step, their value range starts at 0. The right image shows an example of three genes of a chromosome belonging to the gene types in list representation.

Syntax tree of a formula example Genetic Program Tree.png
Syntax tree of a formula example

Chromosomes for tree representations

Tree representations in a chromosome are used by genetic programming, an EA type for generating computer programs or circuits. [10] The trees correspond to the syntax trees generated by a compiler as internal representation when translating a computer program. The adjacent figure shows the syntax tree of a mathematical expression as an example. Mutation operators can rearrange, change or delete subtrees depending on the represented syntax structure. Recombination is performed by exchanging suitable subtrees. [28]

Bibliography

Related Research Articles

<span class="mw-page-title-main">Genetic algorithm</span> Competitive algorithm for searching a problem space

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover and selection. Some examples of GA applications include optimizing decision trees for better performance, solving sudoku puzzles, hyperparameter optimization, causal inference, etc.

In computational intelligence (CI), an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators.

In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution, and the subfield of artificial intelligence and soft computing studying these algorithms. In technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic or stochastic optimization character.

<span class="mw-page-title-main">Particle swarm optimization</span> Iterative simulation method

In computational science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formula over the particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.

A fitness function is a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims. Fitness functions are used in evolutionary algorithms (EA), such as genetic programming and genetic algorithms to guide simulations towards optimal design solutions.

Neuroevolution, or neuro-evolution, is a form of artificial intelligence that uses evolutionary algorithms to generate artificial neural networks (ANN), parameters, and rules. It is most commonly applied in artificial life, general game playing and evolutionary robotics. The main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network's performance at a task. For example, the outcome of a game can be easily measured without providing labeled examples of desired strategies. Neuroevolution is commonly used as part of the reinforcement learning paradigm, and it can be contrasted with conventional deep learning techniques that use gradient descent on a neural network with a fixed topology.

In genetic algorithms and evolutionary computation, crossover, also called recombination, is a genetic operator used to combine the genetic information of two parents to generate new offspring. It is one way to stochastically generate new solutions from an existing population, and is analogous to the crossover that happens during sexual reproduction in biology. Solutions can also be generated by cloning an existing solution, which is analogous to asexual reproduction. Newly generated solutions may be mutated before being added to the population.

Mutation is a genetic operator used to maintain genetic diversity of the chromosomes of a population of a genetic or, more generally, an evolutionary algorithm (EA). It is analogous to biological mutation.

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, tune, or select a heuristic that may provide a sufficiently good solution to an optimization problem or a machine learning problem, especially with incomplete or imperfect information or limited computation capacity. Metaheuristics sample a subset of solutions which is otherwise too large to be completely enumerated or otherwise explored. Metaheuristics may make relatively few assumptions about the optimization problem being solved and so may be usable for a variety of problems.

In computer science, an evolution strategy (ES) is an optimization technique based on ideas of evolution. It belongs to the general class of evolutionary computation or artificial evolution methodologies.

In computer programming, gene expression programming (GEP) is an evolutionary algorithm that creates computer programs or models. These computer programs are complex tree structures that learn and adapt by changing their sizes, shapes, and composition, much like a living organism. And like living organisms, the computer programs of GEP are also encoded in simple linear chromosomes of fixed length. Thus, GEP is a genotype–phenotype system, benefiting from a simple genome to keep and transmit the genetic information and a complex phenotype to explore the environment and adapt to it.

Selection is the stage of a genetic algorithm or more general evolutionary algorithm in which individual genomes are chosen from a population for later breeding. Selection mechanisms are also used to choose candidate solutions (individuals) for the next generation. Retaining the best individuals in a generation unchanged in the next generation, is called elitism or elitist selection. It is a successful (slight) variant of the general process of constructing a new population.

In evolutionary algorithms (EA), the term of premature convergence means that a population for an optimization problem converged too early, resulting in being suboptimal. In this context, the parental solutions, through the aid of genetic operators, are not able to generate offspring that are superior to, or outperform, their parents. Premature convergence is a common problem found in evolutionary algorithms in general and genetic algorithms in particular, as it leads to a loss, or convergence of, a large number of alleles, subsequently making it very difficult to search for a specific gene in which the alleles were present. An allele is considered lost if, in a population, a gene is present, where all individuals are sharing the same value for that particular gene. An allele is, as defined by De Jong, considered to be a converged allele, when 95% of a population share the same value for a certain gene.

The expression computational intelligence (CI) usually refers to the ability of a computer to learn a specific task from data or experimental observation. Even though it is commonly considered a synonym of soft computing, there is still no commonly accepted definition of computational intelligence.

<span class="mw-page-title-main">Estimation of distribution algorithm</span>

Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding an uninformative prior over admissible solutions and ending with the model that generates only the global optima.

In computer programming, genetic representation is a way of presenting solutions/individuals in evolutionary computation methods. The term encompasses both the concrete data structures and data types used to realize the genetic material of the candidate solutions in the form of a genome, and the relationships between search space and problem space. In the simplest case, the search space corresponds to the problem space. The choice of problem representation is tied to the choice of genetic operators, both of which have a decisive effect on the efficiency of the optimization. Genetic representation can encode appearance, behavior, physical qualities of individuals. Difference in genetic representations is one of the major criteria drawing a line between known classes of evolutionary computation.

A memetic algorithm (MA) in computer science and operations research, is an extension of the traditional genetic algorithm (GA) or more general evolutionary algorithm (EA). It may provide a sufficiently good solution to an optimization problem. It uses a suitable heuristic or local search technique to improve the quality of solutions generated by the EA and to reduce the likelihood of premature convergence.

Grammatical evolution (GE) is an evolutionary computation and, more specifically, a genetic programming (GP) technique (or approach) pioneered by Conor Ryan, JJ Collins and Michael O'Neill in 1998 at the BDS Group in the University of Limerick.

<span class="mw-page-title-main">Meta-optimization</span>

In numerical optimization, meta-optimization is the use of one optimization method to tune another optimization method. Meta-optimization is reported to have been used as early as in the late 1970s by Mercer and Sampson for finding optimal parameter settings of a genetic algorithm.

The population model of an evolutionary algorithm (EA) describes the structural properties of its population to which its members are subject. A population is the set of all proposed solutions of an EA considered in one iteration, which are also called individuals according to the biological role model. The individuals of a population can generate further individuals as offspring with the help of the genetic operators of the procedure.

References

  1. "Introduction to genetic algorithms: IV. Genetic Algorithm" . Retrieved 12 August 2015.
  2. 1 2 3 Eiben, A.E.; Smith, J.E. (2015). "Components of Evolutionary Algorithms". Introduction to Evolutionary Computing. Natural Computing Series. Berlin, Heidelberg: Springer. pp. 28–34. doi:10.1007/978-3-662-44874-8. ISBN   978-3-662-44873-1. S2CID   20912932.
  3. Baine, Nicholas (2008), "A simple multi-chromosome genetic algorithm optimization of a Proportional-plus-Derivative Fuzzy Logic Controller", NAFIPS 2008 - 2008 Annual Meeting of the North American Fuzzy Information Processing Society, IEEE, pp. 1–5, doi:10.1109/NAFIPS.2008.4531273, ISBN   978-1-4244-2351-4, S2CID   46591432
  4. Peng, Jin; Chu, Zhang Shu (2010), "A Hybrid Multi-chromosome Genetic Algorithm for the Cutting Stock Problem", 3rd International Conference on Information Management, Innovation Management and Industrial Engineering, IEEE, pp. 508–511, doi:10.1109/ICIII.2010.128, ISBN   978-1-4244-8829-2, S2CID   15608610
  5. Holland, John H. (1992). Adaptation in natural and artificial systems. Cambridge, Mass.: MIT Press. ISBN   0-585-03844-9. OCLC   42854623.
  6. Janikow, C.Z.; Michalewicz, Z. (1991), Belew, Richard K.; Booker, Lashon B. (eds.), "An Experimental Comparison of Binary and Floating Point Representations in Genetic Algorithms" (PDF), Proceedings of the Fourth International Conference on Genetic Algorithms, San Francisco, CA: Morgan Kaufmann Publishers, pp. 31–36, ISBN   1-55860-208-9
  7. Whitley, Darrell (June 1994). "A genetic algorithm tutorial". Statistics and Computing. 4 (2). CiteSeerX   10.1.1.184.3999 . doi:10.1007/BF00175354. S2CID   3447126.
  8. Whitley, Darrell (2001). "An overview of evolutionary algorithms: practical issues and common pitfalls". Information and Software Technology. 43 (14): 817–831. doi:10.1016/S0950-5849(01)00188-4. S2CID   18637958.
  9. Bäck, Thomas; Hoffmeister, Frank; Schwefel, Hans-Paul (1991), Belew, Richard K.; Booker, Lashon B. (eds.), "A Survey of Evolution Strategies", Proceedings of the Fourth International Conference on Genetic Algorithms, San Francisco, CA: Morgan Kaufmann Publishers, pp. 2–9, ISBN   1-55860-208-9
  10. 1 2 Koza, John R. (1992). Genetic programming : on the programming of computers by means of natural selection. Cambridge, Mass.: MIT Press. ISBN   0-262-11170-5. OCLC   26263956.
  11. "Genetic algorithms". Archived from the original on 22 October 2019. Retrieved 12 August 2015.
  12. Rothlauf, Franz (2002). Representations for Genetic and Evolutionary Algorithms. Studies in Fuzziness and Soft Computing. Vol. 104. Heidelberg: Physica-Verlag HD. p. 31. doi:10.1007/978-3-642-88094-0. ISBN   978-3-642-88096-4.
  13. Eiben, A.E.; Smith, J.E. (2015). "Representation and the Roles of Variation Operators". Introduction to Evolutionary Computing. Natural Computing Series. Berlin, Heidelberg: Springer. pp. 49–51. doi:10.1007/978-3-662-44874-8. ISBN   978-3-662-44873-1. S2CID   20912932.
  14. Galván-López, Edgar; McDermott, James; O'Neill, Michael; Brabazon, Anthony (2010-07-07). "Towards an understanding of locality in genetic programming". Proceedings of the 12th annual conference on Genetic and evolutionary computation (PDF). Portland Oregon USA: ACM. pp. 901–908. doi:10.1145/1830483.1830646. ISBN   978-1-4503-0072-8. S2CID   15348983.
  15. 1 2 Schwefel, Hans-Paul (1995). Evolution and optimum seeking. New York: John Wiley & Sons. ISBN   0-471-57148-2. OCLC   30701094.
  16. Eshelman, Larry J.; Schaffer, J. David (1993), "Real-Coded Genetic Algorithms and Interval-Schemata", Foundations of Genetic Algorithms, Elsevier, vol. 2, pp. 187–202, doi:10.1016/b978-0-08-094832-4.50018-0, ISBN   978-0-08-094832-4 , retrieved 2023-01-26
  17. Michalewicz, Zbigniew (1996). Genetic Algorithms + Data Structures = Evolution Programs. Third, revised and extended edition. Berlin, Heidelberg: Springer. ISBN   978-3-662-03315-9. OCLC   851375253.
  18. Deep, Kusum; Singh, Krishna Pratap; Kansal, M.L.; Mohan, C. (June 2009). "A real coded genetic algorithm for solving integer and mixed integer optimization problems". Applied Mathematics and Computation. 212 (2): 505–518. doi:10.1016/j.amc.2009.02.044.
  19. Wang, Fuchang; Cao, Huirong; Qian, Xiaoshi (2011), Liu, Baoxiang; Chai, Chunlai (eds.), "Decimal-Integer-Coded Genetic Algorithm for Trimmed Estimator of the Multiple Linear Errors in Variables Model", Information Computing and Applications, LNCS 7030, Berlin, Heidelberg: Springer, pp. 359–366, doi:10.1007/978-3-642-25255-6_46, ISBN   978-3-642-25254-9 , retrieved 2023-01-23
  20. Cheng, Xueli; An, Linchao; Zhang, Zhenhua (2019). "Integer Encoding Genetic Algorithm for Optimizing Redundancy Allocation of Series-parallel Systems". Journal of Engineering Science and Technology Review. 12 (1): 126–136. doi: 10.25103/JESTR.121.15 . S2CID   149497992.
  21. Eiben, A.E.; Smith, J.E. (2015). "Permutation Representation". Introduction to Evolutionary Computing. Natural Computing Series. Berlin, Heidelberg: Springer. pp. 67–74. doi:10.1007/978-3-662-44874-8. ISBN   978-3-662-44873-1. S2CID   20912932.
  22. 1 2 Larrañaga, P.; Kuijpers, C.M.H.; Murga, R.H.; Inza, I.; Dizdarevic, S. (1999). "Genetic Algorithms for the Travelling Salesman Problem: A Review of Representations and Operators". Artificial Intelligence Review. 13 (2): 129–170. doi:10.1023/A:1006529012972. S2CID   10284682.
  23. Whitley, Darrell (2000). "Permutations". In Fogel, David B.; Bäck, Thomas; Michalewicz, Zbigniew (eds.). Evolutionary computation. Vol. 1, Basic algorithms and operators. Bristol: Institute of Physics Pub. pp. 139–150. ISBN   0-585-30560-9. OCLC   45730387.
  24. 1 2 Jakob, Wilfried; Strack, Sylvia; Quinte, Alexander; Bengel, Günther; Stucky, Karl-Uwe; Süß, Wolfgang (2013-04-22). "Fast Rescheduling of Multiple Workflows to Constrained Heterogeneous Resources Using Multi-Criteria Memetic Computing". p.253-255. Algorithms. 6 (2): 245–277. doi: 10.3390/a6020245 . ISSN   1999-4893.
  25. Blume, Christian; Jakob, Wilfried (2002), "GLEAM - An Evolutionary Algorithm for Planning and Control Based on Evolution Strategy", Conf. Proc. of Genetic and Evolutionary Computation Conference (GECCO 2002), vol. Late Breaking Papers, pp. 31–38, retrieved 2023-01-01
  26. Pawar, Sunil Nilkanth; Bichkar, Rajankumar Sadashivrao (June 2015). "Genetic algorithm with variable length chromosomes for network intrusion detection". International Journal of Automation and Computing. 12 (3): 337–342. doi: 10.1007/s11633-014-0870-x . ISSN   1476-8186. S2CID   255346767.
  27. Blume, Christian (2000), Cagnoni, Stefano (ed.), "Optimized Collision Free Robot Move Statement Generation by the Evolutionary Software GLEAM", Real-World Applications of Evolutionary Computing, Lecture Notes in Computer Science, Berlin, Heidelberg: Springer Berlin Heidelberg, vol. 1803, pp. 330–341, doi:10.1007/3-540-45561-2_32, ISBN   978-3-540-67353-8 , retrieved 2023-06-25
  28. Eiben, A.E.; Smith, J.E. (2015). "Tree Representation". Introduction to Evolutionary Computing. Natural Computing Series. Berlin, Heidelberg: Springer. pp. 75–78. doi:10.1007/978-3-662-44874-8. ISBN   978-3-662-44873-1. S2CID   20912932.