Learning classifier system

2D visualization of LCS rules learning to approximate a 3D function. Each blue ellipse represents an individual rule covering part of the solution space. (Adapted from images taken from XCSF with permission from Martin Butz)

Learning classifier systems, or LCS, are a paradigm of rule-based machine learning methods that combine a discovery component (typically a genetic algorithm) with a learning component (performing either supervised learning, reinforcement learning, or unsupervised learning). [2] Learning classifier systems seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions (e.g. behavior modeling, [3] classification, [4] [5] data mining, [5] [6] [7] regression, [8] function approximation, [9] or game strategy). This approach allows complex solution spaces to be broken up into smaller, simpler parts.


The founding concepts behind learning classifier systems came from attempts to model complex adaptive systems, using rule-based agents to form an artificial cognitive system (i.e. artificial intelligence).

Methodology

The architecture and components of a given learning classifier system can be quite variable. It is useful to think of an LCS as a machine consisting of several interacting components. Components may be added or removed, or existing components modified/exchanged to suit the demands of a given problem domain (like algorithmic building blocks) or to make the algorithm flexible enough to function in many different problem domains. As a result, the LCS paradigm can be flexibly applied to many problem domains that call for machine learning. The major divisions among LCS implementations are as follows: (1) Michigan-style architecture vs. Pittsburgh-style architecture, [10] (2) reinforcement learning vs. supervised learning, (3) incremental learning vs. batch learning, (4) online learning vs. offline learning, (5) strength-based fitness vs. accuracy-based fitness, and (6) complete action mapping vs. best action mapping. These divisions are not necessarily mutually exclusive. For example, XCS, [11] the best known and best studied LCS algorithm, is Michigan-style, was designed for reinforcement learning but can also perform supervised learning, applies incremental learning that can be either online or offline, applies accuracy-based fitness, and seeks to generate a complete action mapping.

Elements of a generic LCS algorithm

A step-wise schematic illustrating a generic Michigan-style learning classifier system learning cycle performing supervised learning

Keeping in mind that LCS is a paradigm for genetic-based machine learning rather than a specific method, the following outlines key elements of a generic, modern (i.e. post-XCS) LCS algorithm. For simplicity let us focus on Michigan-style architecture with supervised learning. See the schematic above, which lays out the sequential steps involved in this type of generic LCS.

Environment

The environment is the source of data upon which an LCS learns. It can be an offline, finite training dataset (characteristic of a data mining, classification, or regression problem), or an online sequential stream of live training instances. Each training instance is assumed to include some number of features (also referred to as attributes, or independent variables), and a single endpoint of interest (also referred to as the class, action, phenotype, prediction, or dependent variable). Part of LCS learning can involve feature selection, so not all of the features in the training data need to be informative. The set of feature values of an instance is commonly referred to as the state. For simplicity, let's assume an example problem domain with Boolean/binary features and a Boolean/binary class. For Michigan-style systems, one instance from the environment is trained on each learning cycle (i.e. incremental learning). Pittsburgh-style systems perform batch learning, where rule sets are evaluated in each iteration over much or all of the training data.
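As a minimal illustration, an offline environment can be sketched as a finite list of (state, class) pairs served one instance per learning cycle; the data values and the get_instance helper below are purely illustrative:

```python
# A toy offline "environment": each training instance pairs a binary
# feature state with a binary class endpoint. Values are illustrative.
training_data = [
    ("001001", "0"),
    ("110010", "1"),
    ("011001", "0"),
]

def get_instance(iteration):
    """Serve one instance per learning cycle (incremental learning),
    wrapping around the finite dataset for offline training."""
    return training_data[iteration % len(training_data)]
```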

Rule/classifier/population

A rule is a context-dependent relationship between state values and some prediction. Rules typically take the form of an {IF:THEN} expression (e.g. {IF 'condition' THEN 'action'}, or as a more specific example, {IF 'red' AND 'octagon' THEN 'stop-sign'}). A critical concept in LCS and rule-based machine learning alike is that an individual rule is not in itself a model, since the rule is only applicable when its condition is satisfied. Think of a rule as a "local model" of the solution space.

Rules can be represented in many different ways to handle different data types (e.g. binary, discrete-valued, ordinal, continuous-valued). Given binary data, LCS traditionally applies a ternary rule representation (i.e. rules can include either a 0, 1, or '#' for each feature in the data). The 'don't care' symbol (i.e. '#') serves as a wild card within a rule's condition, allowing rules, and the system as a whole, to generalize relationships between features and the target endpoint to be predicted. Consider the following rule (#1###0 ~ 1) (i.e. condition ~ action). This rule can be interpreted as: IF the second feature = 1 AND the sixth feature = 0 THEN the class prediction = 1. We would say that the second and sixth features are specified in this rule, while the others are generalized. This rule, and the corresponding prediction, are only applicable to an instance when the condition of the rule is satisfied by the instance. This is more commonly referred to as matching. In Michigan-style LCS, each rule has its own fitness, as well as a number of other rule parameters associated with it that can describe the number of copies of that rule that exist (i.e. the numerosity), the age of the rule, its accuracy or the accuracy of its reward predictions, and other descriptive or experiential statistics. A rule along with its parameters is often referred to as a classifier. In Michigan-style systems, classifiers are contained within a population [P] that has a user-defined maximum number of classifiers. Unlike most stochastic search algorithms (e.g. evolutionary algorithms), LCS populations start out empty (i.e. there is no need to randomly initialize a rule population). Classifiers are instead introduced to the population by a covering mechanism.
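The sketch below shows one plausible encoding of such a classifier; the particular parameter set is a simplifying assumption (real systems such as XCS track further statistics, e.g. action set size estimates):

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    """A ternary rule plus simplified bookkeeping parameters."""
    condition: str          # '0', '1', or '#' per feature, e.g. "#1###0"
    action: str             # predicted class, e.g. "1"
    fitness: float = 1.0    # commonly derived from rule accuracy
    numerosity: int = 1     # number of copies of this rule in [P]
    match_count: int = 0    # times the rule has entered a match set
    correct_count: int = 0  # times the rule has entered a correct set

# The rule (#1###0 ~ 1): IF feature 2 = 1 AND feature 6 = 0 THEN class = 1.
rule = Classifier(condition="#1###0", action="1")
```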

In any LCS, the trained model is a set of rules/classifiers, rather than any single rule/classifier. In Michigan-style LCS, the entire trained (and optionally, compacted) classifier population forms the prediction model.

Matching

One of the most critical and often time-consuming elements of an LCS is the matching process. The first step in an LCS learning cycle takes a single training instance from the environment and passes it to [P], where matching takes place. In step two, every rule in [P] is compared to the training instance to see which rules match (i.e. are contextually relevant to the current instance). In step three, any matching rules are moved to a match set [M]. A rule matches a training instance if all feature values specified in the rule condition are equivalent to the corresponding feature values in the training instance. For example, assuming the training instance is (001001 ~ 0), these rules would match: (###0## ~ 0), (00###1 ~ 0), (#01001 ~ 1), but these rules would not: (1##### ~ 0), (000##1 ~ 0), (#0#1#0 ~ 1). Notice that in matching, the endpoint/action specified by the rule is not taken into consideration. As a result, the match set may contain classifiers that propose conflicting actions. In the fourth step, since we are performing supervised learning, [M] is divided into a correct set [C] and an incorrect set [I]. A matching rule goes into the correct set if it proposes the correct action (based on the known action of the training instance); otherwise it goes into [I]. In reinforcement learning LCS, an action set [A] would be formed here instead, since the correct action is not known.
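A matching check over ternary conditions reduces to a positionwise comparison, as in this minimal sketch, which reproduces the example above:

```python
def matches(condition, state):
    """A rule matches when every specified (non-'#') position of its
    condition equals the corresponding feature value; note the rule's
    action plays no part in matching."""
    return all(c == "#" or c == s for c, s in zip(condition, state))

state = "001001"  # the training instance from the text (class 0)
conditions = ["###0##", "00###1", "#01001", "1#####", "000##1", "#0#1#0"]
match_set = [c for c in conditions if matches(c, state)]
assert match_set == ["###0##", "00###1", "#01001"]
```

Splitting [M] into [C] and [I] is then a simple comparison of each matching rule's action against the instance's known class.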

Covering

At this point in the learning cycle, if no classifiers made it into either [M] or [C] (as would be the case when the population starts off empty), the covering mechanism is applied (fifth step). Covering is a form of online smart population initialization. Covering randomly generates a rule that matches the current training instance (and, in the case of supervised learning, that rule is also generated with the correct action). Assuming the training instance is (001001 ~ 0), covering might generate any of the following rules: (#0#0## ~ 0), (001001 ~ 0), (#010## ~ 0). Covering not only ensures that, in each learning cycle, there is at least one correct, matching rule in [C], but also that any rule initialized into the population will match at least one training instance. This prevents LCS from exploring the search space of rules that do not match any training instances.
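A covering operator can be sketched as follows; the wildcard probability p_wildcard is an assumed hyperparameter controlling how general the new rule is:

```python
import random

def cover(state, correct_action, p_wildcard=0.5):
    """Generate a rule guaranteed to match `state`: each feature is
    generalized to '#' with probability p_wildcard, otherwise fixed to
    the instance's value. Under supervised learning the rule is paired
    with the known correct action."""
    condition = "".join("#" if random.random() < p_wildcard else s
                        for s in state)
    return (condition, correct_action)

# cover("001001", "0") might return ("#0#0##", "0") or ("001001", "0").
```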

Parameter updates/credit assignment/learning

In the sixth step, the rule parameters of every rule in [M] are updated to reflect the new experience gained from the current training instance. Depending on the LCS algorithm, a number of updates can take place at this step. For supervised learning, we can simply update the accuracy/error of a rule. Rule accuracy/error is different from model accuracy/error, since it is not calculated over the entire training data, but only over the instances that the rule matched. Rule accuracy is calculated by dividing the number of times the rule was in a correct set [C] by the number of times it was in a match set [M]. Rule accuracy can be thought of as a 'local accuracy'. Rule fitness is also updated here, and is commonly calculated as a function of rule accuracy. The concept of fitness is taken directly from classic genetic algorithms. Be aware that there are many variations on how LCS updates parameters in order to perform credit assignment and learning.
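A minimal update sketch, assuming fitness is computed as a power of accuracy (the exponent nu is an assumed hyperparameter, in the spirit of UCS-like systems):

```python
from types import SimpleNamespace

def update_rule(rule, was_correct, nu=5.0):
    """Update a rule that appeared in [M]: accuracy is the 'local' ratio
    of correct-set to match-set appearances, and fitness is taken as
    accuracy ** nu (one common choice among many variants)."""
    rule.match_count += 1
    if was_correct:  # the rule also appeared in [C]
        rule.correct_count += 1
    rule.accuracy = rule.correct_count / rule.match_count
    rule.fitness = rule.accuracy ** nu

rule = SimpleNamespace(match_count=0, correct_count=0,
                       accuracy=0.0, fitness=0.0)
update_rule(rule, was_correct=True)  # accuracy = 1.0, fitness = 1.0
```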

Subsumption

In the seventh step, a subsumption mechanism is typically applied. Subsumption is an explicit generalization mechanism that merges classifiers that cover redundant parts of the problem space. The subsuming classifier effectively absorbs the subsumed classifier (and has its numerosity increased). This can only happen when the subsuming classifier is more general, just as accurate, and covers all of the problem space of the classifier it subsumes.
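In code, the generality test and the absorption step might look like the following sketch (assuming classifier objects with the fields used earlier; the accuracy threshold is an assumed parameter standing in for "just as accurate"):

```python
def more_general(a, b):
    """Condition a is strictly more general than condition b: it has
    more '#' wildcards and matches every state that b matches."""
    return (a.count("#") > b.count("#")
            and all(ca in ("#", cb) for ca, cb in zip(a, b)))

def try_subsume(general, specific, acc_threshold=0.99):
    """The more general rule absorbs the other's numerosity if it
    proposes the same action and is sufficiently accurate."""
    if (general.action == specific.action
            and general.accuracy >= acc_threshold
            and more_general(general.condition, specific.condition)):
        general.numerosity += specific.numerosity
        return True  # caller then removes `specific` from [P]
    return False
```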

Rule discovery/genetic algorithm

In the eighth step, LCS adopts a highly elitist genetic algorithm (GA) that selects two parent classifiers based on fitness (survival of the fittest). Parents are selected from [C], typically using tournament selection. Some systems have applied roulette wheel selection or deterministic selection, and some select parent rules from [P] (panmictic selection) or from [M] instead. Crossover and mutation operators are now applied to generate two new offspring rules. At this point, both the parent and offspring rules are returned to [P]. The LCS genetic algorithm is highly elitist since, in each learning iteration, the vast majority of the population is preserved. Rule discovery may alternatively be performed by some other method, such as an estimation of distribution algorithm, but a GA is by far the most common approach. Evolutionary algorithms like the GA employ a stochastic search, which makes LCS a stochastic algorithm. LCS seeks to cleverly explore the search space, but does not perform an exhaustive search of rule combinations, and is not guaranteed to converge on an optimal solution.
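The selection and variation operators admit many concrete forms; one plausible sketch follows (tournament size k and mutation rate mu are assumed hyperparameters, and mutation here is restricted so offspring still match the current instance, in the spirit of XCS-style niche mutation):

```python
import random

def tournament(correct_set, k=3):
    """Pick one parent from [C] via tournament selection."""
    pool = random.sample(correct_set, min(k, len(correct_set)))
    return max(pool, key=lambda r: r.fitness)

def uniform_crossover(cond1, cond2):
    """Swap each position between two parent conditions with
    probability 0.5, yielding two offspring conditions."""
    a, b = list(cond1), list(cond2)
    for i in range(len(a)):
        if random.random() < 0.5:
            a[i], b[i] = b[i], a[i]
    return "".join(a), "".join(b)

def mutate(condition, state, mu=0.04):
    """Per-position mutation: toggle between '#' and the current
    instance's feature value with probability mu."""
    return "".join(
        (s if c == "#" else "#") if random.random() < mu else c
        for c, s in zip(condition, state))
```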

Deletion

The last step in a generic LCS learning cycle is to maintain the maximum population size. The deletion mechanism will select classifiers for deletion (commonly using roulette wheel selection). The probability of a classifier being selected for deletion is inversely proportional to its fitness. When a classifier is selected for deletion, its numerosity parameter is reduced by one. When the numerosity of a classifier is reduced to zero, it is removed entirely from the population.
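A roulette-wheel deletion step following the description above might be sketched as:

```python
import random

def delete_one(population):
    """Select a classifier with probability inversely related to its
    fitness (weighted here by numerosity, an assumption; the small
    epsilon guards against division by zero) and decrement its
    numerosity."""
    weights = [r.numerosity / (r.fitness + 1e-9) for r in population]
    victim = random.choices(population, weights=weights, k=1)[0]
    victim.numerosity -= 1
    if victim.numerosity == 0:
        population.remove(victim)  # last copy gone: remove entirely
```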

Training

LCS will cycle through these steps repeatedly for some user-defined number of training iterations, or until some user-defined termination criteria have been met. For online learning, LCS will obtain a completely new training instance each iteration from the environment. For offline learning, LCS will iterate through a finite training dataset. Once it reaches the last instance in the dataset, it will go back to the first instance and cycle through the dataset again.

Rule compaction

Once training is complete, the rule population will inevitably contain some poor, redundant, and inexperienced rules. It is common to apply a rule compaction or condensation heuristic as a post-processing step. The resulting compacted rule population is ready to be applied as a prediction model (e.g. to make predictions on testing instances) and/or to be interpreted for knowledge discovery.

Prediction

Whether or not rule compaction has been applied, the output of an LCS algorithm is a population of classifiers that can be applied to making predictions on previously unseen instances. The prediction mechanism is not part of the supervised LCS learning cycle itself; however, it would play an important role in a reinforcement learning LCS learning cycle. For now, we consider how the prediction mechanism can be applied to make predictions on test data. When making predictions, the LCS learning components are deactivated so that the population does not continue to learn from incoming testing data. A test instance is passed to [P], where a match set [M] is formed as usual. At this point, the match set is instead passed to a prediction array. Rules in the match set can predict different actions, therefore a voting scheme is applied. In a simple voting scheme, the action with the strongest supporting 'votes' from matching rules wins and becomes the selected prediction. Not all rules get an equal vote; rather, the strength of the vote for a single rule is commonly proportional to its numerosity and fitness. This voting scheme, and the nature of how LCSs store knowledge, suggests that LCS algorithms are implicitly ensemble learners.
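A fitness- and numerosity-weighted vote can be sketched as below; the exact weighting varies between systems:

```python
from collections import defaultdict

def predict(match_set):
    """Sum each action's votes over the match set and return the action
    with the largest total (None if nothing matched)."""
    votes = defaultdict(float)
    for rule in match_set:
        votes[rule.action] += rule.fitness * rule.numerosity
    return max(votes, key=votes.get) if votes else None
```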

Interpretation

Individual LCS rules are typically human-readable IF:THEN expressions. Rules that constitute the LCS prediction model can be ranked by different rule parameters and manually inspected. Global strategies to guide knowledge discovery, using statistical and graphical approaches, have also been proposed. [12] [13] Compared with other advanced machine learning approaches, such as artificial neural networks, random forests, or genetic programming, learning classifier systems are particularly well suited to problems that require interpretable solutions.

History

Early years

John Henry Holland was best known for his work popularizing genetic algorithms (GA), through his ground-breaking book "Adaptation in Natural and Artificial Systems" [14] in 1975 and his formalization of Holland's schema theorem. In 1976, Holland conceptualized an extension of the GA concept to what he called a "cognitive system", [15] and provided the first detailed description of what would become known as the first learning classifier system in the paper "Cognitive Systems based on Adaptive Algorithms". [16] This first system, named Cognitive System One (CS-1) was conceived as a modeling tool, designed to model a real system (i.e. environment) with unknown underlying dynamics using a population of human readable rules. The goal was for a set of rules to perform online machine learning to adapt to the environment based on infrequent payoff/reward (i.e. reinforcement learning) and apply these rules to generate a behavior that matched the real system. This early, ambitious implementation was later regarded as overly complex, yielding inconsistent results. [2] [17]

Beginning in 1980, Kenneth de Jong and his student Stephen Smith took a different approach to rule-based machine learning with (LS-1), where learning was viewed as an offline optimization process rather than an online adaptation process. [18] [19] [20] This new approach was more similar to a standard genetic algorithm but evolved independent sets of rules. Since that time LCS methods inspired by the online learning framework introduced by Holland at the University of Michigan have been referred to as Michigan-style LCS, and those inspired by Smith and De Jong at the University of Pittsburgh have been referred to as Pittsburgh-style LCS. [2] [17] In 1986, Holland developed what would be considered the standard Michigan-style LCS for the next decade. [21]

Other important concepts that emerged in the early days of LCS research included (1) the formalization of a bucket brigade algorithm (BBA) for credit assignment/learning, [22] (2) selection of parent rules from a common 'environmental niche' (i.e. the match set [M]) rather than from the whole population [P], [23] (3) covering, first introduced as a create operator, [24] (4) the formalization of an action set [A], [24] (5) a simplified algorithm architecture, [24] (6) strength-based fitness, [21] (7) consideration of single-step, or supervised learning problems [25] and the introduction of the correct set [C], [26] (8) accuracy-based fitness, [27] (9) the combination of fuzzy logic with LCS [28] (which later spawned a lineage of fuzzy LCS algorithms), (10) encouraging long action chains and default hierarchies for improving performance on multi-step problems, [29] [30] [31] (11) examining latent learning (which later inspired a new branch of anticipatory classifier systems (ACS) [32]), and (12) the introduction of the first Q-learning-like credit assignment technique. [33] While not all of these concepts are applied in modern LCS algorithms, each was a landmark in the development of the LCS paradigm.

The revolution

Interest in learning classifier systems was reinvigorated in the mid-1990s, largely due to two events: the development of the Q-Learning algorithm [34] for reinforcement learning, and the introduction of significantly simplified Michigan-style LCS architectures by Stewart Wilson. [11] [35] Wilson's Zeroth-level Classifier System (ZCS) [35] focused on increasing algorithmic understandability based on Holland's standard LCS implementation. [21] This was done, in part, by removing rule-bidding and the internal message list, essential to the original BBA credit assignment, and replacing them with a hybrid BBA/Q-Learning strategy. ZCS demonstrated that a much simpler LCS architecture could perform as well as the original, more complex implementations. However, ZCS still suffered from performance drawbacks, including the proliferation of over-general classifiers.

In 1995, Wilson published his landmark paper, "Classifier fitness based on accuracy", in which he introduced the classifier system XCS. [11] XCS took the simplified architecture of ZCS and added an accuracy-based fitness, a niche GA (acting in the action set [A]), an explicit generalization mechanism called subsumption, and an adaptation of the Q-Learning credit assignment. XCS was popularized by its ability to reach optimal performance while evolving accurate and maximally general classifiers, as well as its impressive problem flexibility (able to perform both reinforcement learning and supervised learning). XCS later became the best known and most studied LCS algorithm and defined a new family of accuracy-based LCS. ZCS, alternatively, became synonymous with strength-based LCS. XCS is also important because it successfully bridged the gap between LCS and the field of reinforcement learning. Following the success of XCS, LCS were later described as reinforcement learning systems endowed with a generalization capability. [36] Reinforcement learning typically seeks to learn a value function that maps out a complete representation of the state/action space. Similarly, the design of XCS drives it to form an all-inclusive and accurate representation of the problem space (i.e. a complete map), rather than focusing on high-payoff niches in the environment (as was the case with strength-based LCS). Conceptually, complete maps capture not only what you should do, or what is correct, but also what you should not do, or what is incorrect. By contrast, most strength-based LCSs, or exclusively supervised learning LCSs, seek a rule set of efficient generalizations in the form of a best action map (or a partial map). Comparisons between strength- vs. accuracy-based fitness and complete vs. best action maps have since been examined in greater detail. [37] [38]

In the wake of XCS

XCS inspired the development of a whole new generation of LCS algorithms and applications. In 1995, Congdon was the first to apply LCS to real-world epidemiological investigations of disease, [39] followed closely by Holmes, who developed BOOLE++, [40] EpiCS, [41] and later EpiXCS [42] for epidemiological classification. These early works inspired later interest in applying LCS algorithms to complex and large-scale data mining tasks, epitomized by bioinformatics applications. In 1998, Stolzmann introduced anticipatory classifier systems (ACS), which included rules in the form of 'condition-action-effect' rather than the classic 'condition-action' representation. [32] ACS was designed to predict the perceptual consequences of an action in all possible situations in an environment. In other words, the system evolves a model that specifies not only what to do in a given situation, but also provides information about what will happen after a specific action is executed. This family of LCS algorithms is best suited to multi-step problems, planning, speeding up learning, or disambiguating perceptual aliasing (i.e. where the same observation is obtained in distinct states but requires different actions). Butz later pursued this anticipatory family of LCS, developing a number of improvements to the original method. [43] In 2002, Wilson introduced XCSF, adding a computed action in order to perform function approximation. [44] In 2003, Bernado-Mansilla introduced a sUpervised Classifier System (UCS), which specialized the XCS algorithm to the task of supervised learning, single-step problems, and forming a best action set. UCS removed the reinforcement learning strategy, as well as the explore/exploit learning phases characteristic of many reinforcement learners, in favor of a simple, accuracy-based rule fitness. Bull introduced a simple accuracy-based LCS (YCS) [45] and a simple strength-based LCS, the Minimal Classifier System (MCS), [46] in order to develop a better theoretical understanding of the LCS framework. Bacardit introduced GAssist [47] and BioHEL, [48] Pittsburgh-style LCSs designed for data mining and scalability to large datasets in bioinformatics applications. In 2008, Drugowitsch published the book "Design and Analysis of Learning Classifier Systems", including some theoretical examination of LCS algorithms. [49] Butz introduced the first online learning visualization of rules within a GUI for XCSF [1] (see the image at the top of this page). Urbanowicz extended the UCS framework and introduced ExSTraCS, explicitly designed for supervised learning in noisy problem domains (e.g. epidemiology and bioinformatics). [50] ExSTraCS integrated (1) expert knowledge to drive covering and the genetic algorithm towards important features in the data, [51] (2) a form of long-term memory referred to as attribute tracking, [52] allowing for more efficient learning and the characterization of heterogeneous data patterns, and (3) a flexible rule representation similar to Bacardit's mixed discrete-continuous attribute list representation. [53] Both Bacardit and Urbanowicz explored statistical and visualization strategies to interpret LCS rules and perform knowledge discovery for data mining. [12] [13] Browne and Iqbal explored the concept of reusing building blocks in the form of code fragments and were the first to solve the 135-bit multiplexer benchmark problem, by first learning useful building blocks from simpler multiplexer problems. [54] ExSTraCS 2.0 was later introduced to improve Michigan-style LCS scalability, successfully solving the 135-bit multiplexer benchmark problem directly for the first time. [5] The n-bit multiplexer problem is highly epistatic and heterogeneous, making it a very challenging machine learning task.

Variants

Michigan-Style Learning Classifier System

Michigan-style LCSs are characterized by a population of rules where the genetic algorithm operates at the level of individual rules and the solution is represented by the entire rule population. Michigan-style systems also learn incrementally, which allows them to perform both reinforcement learning and supervised learning, as well as both online and offline learning. Michigan-style systems have the advantage of being applicable to a greater number of problem domains, and the unique benefits of incremental learning.

Pittsburgh-Style Learning Classifier System

Pittsburgh-style LCSs are characterized by a population of variable-length rule sets where each rule set is a potential solution. The genetic algorithm typically operates at the level of an entire rule set. Pittsburgh-style systems can also uniquely evolve ordered rule lists, as well as employ a default rule. These systems have the natural advantage of identifying smaller rule sets, making them more interpretable with regard to manual rule inspection.
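As a rough contrast with the Michigan-style encoding sketched earlier, a Pittsburgh-style GA individual can be pictured as an ordered, variable-length rule list (an assumed toy encoding, with first-match-wins semantics):

```python
# One Pittsburgh-style individual: an ordered rule list evaluated as a
# whole against the training data. The fully general last rule acts as
# a default when nothing more specific matches.
individual = [
    ("#1###0", "1"),
    ("00####", "0"),
    ("######", "1"),  # default rule: matches every instance
]

def classify(individual, state):
    """Return the action of the first rule whose condition matches."""
    for condition, action in individual:
        if all(c == "#" or c == s for c, s in zip(condition, state)):
            return action
    return None
```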

Hybrid systems

Systems that seek to combine key strengths of both architectures have also been proposed.


Terminology

The name "Learning Classifier System (LCS)" is a bit misleading, since there are many machine learning algorithms that 'learn to classify' (e.g. decision trees, artificial neural networks) but are not LCSs. The term 'rule-based machine learning (RBML)' is useful, as it more clearly captures the essential 'rule-based' component of these systems, but it also generalizes to methods that are not considered to be LCSs (e.g. association rule learning, or artificial immune systems). More general terms such as 'genetics-based machine learning' and even 'genetic algorithm' [39] have also been applied to refer to what would be more characteristically defined as a learning classifier system. Due to their similarity to genetic algorithms, Pittsburgh-style learning classifier systems are sometimes generically referred to as 'genetic algorithms'. Beyond this, some LCS algorithms, or closely related methods, have been referred to as 'cognitive systems', [16] 'adaptive agents', 'production systems', or generically as 'classifier systems'. [55] [56] This variation in terminology contributes to some confusion in the field.

Up until the 2000s, nearly all learning classifier system methods were developed with reinforcement learning problems in mind. As a result, the term 'learning classifier system' was commonly defined as the combination of 'trial-and-error' reinforcement learning with the global search of a genetic algorithm. Interest in supervised learning applications, and even unsupervised learning, has since broadened the use and definition of this term.


References

  1. Stalph, Patrick O.; Butz, Martin V. (2010-02-01). "JavaXCSF: The XCSF Learning Classifier System in Java". SIGEVOlution. 4 (3): 16–19. doi:10.1145/1731888.1731890. ISSN 1931-8499. S2CID 16861908.
  2. Urbanowicz, Ryan J.; Moore, Jason H. (2009-09-22). "Learning Classifier Systems: A Complete Introduction, Review, and Roadmap". Journal of Artificial Evolution and Applications. 2009: 1–25. doi:10.1155/2009/736398. ISSN 1687-6229.
  3. Dorigo, Marco (1995). "Alecsys and the AutonoMouse: Learning to control a real robot by distributed classifier systems". Machine Learning. 19 (3): 209–240. doi:10.1007/BF00996270. ISSN 0885-6125.
  4. Bernadó-Mansilla, Ester; Garrell-Guiu, Josep M. (2003-09-01). "Accuracy-Based Learning Classifier Systems: Models, Analysis and Applications to Classification Tasks". Evolutionary Computation. 11 (3): 209–238. doi:10.1162/106365603322365289. ISSN 1063-6560. PMID 14558911. S2CID 9086149.
  5. Urbanowicz, Ryan J.; Moore, Jason H. (2015-04-03). "ExSTraCS 2.0: description and evaluation of a scalable learning classifier system". Evolutionary Intelligence. 8 (2–3): 89–116. doi:10.1007/s12065-015-0128-8. ISSN 1864-5909. PMC 4583133. PMID 26417393.
  6. Bernadó, Ester; Llorà, Xavier; Garrell, Josep M. (2001-07-07). Lanzi, Pier Luca; Stolzmann, Wolfgang; Wilson, Stewart W. (eds.). Advances in Learning Classifier Systems. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 115–132. doi:10.1007/3-540-48104-4_8. ISBN 9783540437932.
  7. Bacardit, Jaume; Butz, Martin V. (2007-01-01). "Data Mining in Learning Classifier Systems: Comparing XCS with GAssist". In Kovacs, Tim; Llorà, Xavier; Takadama, Keiki; Lanzi, Pier Luca; Stolzmann, Wolfgang; Wilson, Stewart W. (eds.). Learning Classifier Systems. Lecture Notes in Computer Science. Vol. 4399. Springer Berlin Heidelberg. pp. 282–290. CiteSeerX 10.1.1.553.4679. doi:10.1007/978-3-540-71231-2_19. ISBN 9783540712305.
  8. Urbanowicz, Ryan; Ramanand, Niranjan; Moore, Jason (2015-01-01). "Continuous Endpoint Data Mining with ExSTraCS". Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation. GECCO Companion '15. New York, NY, USA: ACM. pp. 1029–1036. doi:10.1145/2739482.2768453. ISBN 9781450334884. S2CID 11908241.
  9. Butz, M. V.; Lanzi, P. L.; Wilson, S. W. (2008-06-01). "Function Approximation With XCS: Hyperellipsoidal Conditions, Recursive Least Squares, and Compaction". IEEE Transactions on Evolutionary Computation. 12 (3): 355–376. doi:10.1109/TEVC.2007.903551. ISSN 1089-778X. S2CID 8861046.
  10. Urbanowicz, Ryan J.; Browne, Will. Introducing Rule-Based Machine Learning: A Practical Guide. See pp. 72–73 for Michigan-style architecture vs. Pittsburgh-style architecture.
  11. Wilson, Stewart W. (1995-06-01). "Classifier Fitness Based on Accuracy". Evolutionary Computation. 3 (2): 149–175. CiteSeerX 10.1.1.363.2210. doi:10.1162/evco.1995.3.2.149. ISSN 1063-6560. S2CID 18341635.
  12. Urbanowicz, R. J.; Granizo-Mackenzie, A.; Moore, J. H. (2012-11-01). "An analysis pipeline with statistical and visualization-guided knowledge discovery for Michigan-style learning classifier systems". IEEE Computational Intelligence Magazine. 7 (4): 35–45. doi:10.1109/MCI.2012.2215124. ISSN 1556-603X. PMC 4244006. PMID 25431544.
  13. Bacardit, Jaume; Llorà, Xavier (2013). "Large-scale data mining using genetics-based machine learning". Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 3 (1): 37–61. doi:10.1002/widm.1078. S2CID 43062613.
  14. Holland, John (1975). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. University of Michigan Press. ISBN 9780262581110.
  15. Holland, J. H. (1976). "Adaptation". In Rosen, R.; Snell, F. (eds.). Progress in Theoretical Biology. Vol. 4. New York: Academic Press. pp. 263–293.
  16. Holland, J. H.; Reitman, J. S. (1978). "Cognitive systems based on adaptive algorithms". Reprinted in: Fogel, D. B. (ed.) (1998). Evolutionary Computation: The Fossil Record. New York: IEEE Press. ISBN 0-7803-3481-7.
  17. Lanzi, Pier Luca (2008-02-08). "Learning classifier systems: then and now". Evolutionary Intelligence. 1 (1): 63–82. doi:10.1007/s12065-007-0003-3. ISSN 1864-5909. S2CID 27153843.
  18. Smith, S. (1980). A Learning System Based on Genetic Adaptive Algorithms (Ph.D. thesis). Department of Computer Science, University of Pittsburgh.
  19. Smith, S. (1983). "Flexible learning of problem solving heuristics through adaptive search". Eighth International Joint Conference on Artificial Intelligence. Los Altos: Morgan Kaufmann. pp. 421–425.
  20. De Jong, K. A. (1988). "Learning with genetic algorithms: an overview". Machine Learning. 3: 121–138.
  21. Holland, John H. (1986). "Escaping brittleness: the possibilities of general purpose learning algorithms applied to parallel rule-based systems". Machine Learning: 593–623.
  22. Holland, John H. (1985). Properties of the Bucket Brigade. pp. 1–7. ISBN 978-0805804263.
  23. Booker, L. (1982). Intelligent Behavior as an Adaptation to the Task Environment (Thesis). University of Michigan.
  24. Wilson, S. W. (1985). "Knowledge growth in an artificial animal". Proceedings of the First International Conference on Genetic Algorithms and their Applications.
  25. Wilson, Stewart W. (1987). "Classifier systems and the animat problem". Machine Learning. 2 (3): 199–228. doi:10.1007/BF00058679. ISSN 0885-6125.
  26. Bonelli, Pierre; Parodi, Alexandre; Sen, Sandip; Wilson, Stewart (1990). NEWBOOLE: A Fast GBML System. pp. 153–159. ISBN 978-1558601413.
  27. Frey, Peter W.; Slate, David J. (1991). "Letter recognition using Holland-style adaptive classifiers". Machine Learning. 6 (2): 161–182. doi:10.1007/BF00114162. ISSN 0885-6125.
  28. Valenzuela-Rendón, Manuel (1991). "The Fuzzy Classifier System: A Classifier System for Continuously Varying Variables". ICGA. pp. 346–353.
  29. Riolo, Rick L. (1988). Empirical Studies of Default Hierarchies and Sequences of Rules in Learning Classifier Systems (Thesis). Ann Arbor, MI: University of Michigan.
  30. Riolo, R. L. (1987). "Bucket brigade performance. I. Long sequences of classifiers". Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms. Cambridge, MA: Massachusetts Institute of Technology.
  31. Riolo, R. L. (1987). "Bucket brigade performance. II. Default hierarchies". Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms. Cambridge, MA: Massachusetts Institute of Technology.
  32. Stolzmann, W. (1998). "Anticipatory classifier systems". Proceedings of the 3rd Annual Genetic Programming Conference. pp. 658–664.
  33. Riolo, Rick L. (1990). Lookahead Planning and Latent Learning in a Classifier System. pp. 316–326. ISBN 978-0262631389.
  34. Watkins, Christopher John Cornish Hellaby (1989). Learning from Delayed Rewards (Ph.D. thesis). University of Cambridge.
  35. Wilson, Stewart W. (1994-03-01). "ZCS: A Zeroth Level Classifier System". Evolutionary Computation. 2 (1): 1–18. CiteSeerX 10.1.1.363.798. doi:10.1162/evco.1994.2.1.1. ISSN 1063-6560. S2CID 17680778.
  36. Lanzi, P. L. (2002). "Learning classifier systems from a reinforcement learning perspective". Soft Computing. 6 (3–4): 162–170. doi:10.1007/s005000100113. ISSN 1432-7643. S2CID 39103390.
  37. Kovacs, Timothy Michael Douglas (2002). A Comparison of Strength and Accuracy-based Fitness in Learning and Classifier Systems.
  38. Kovacs, Tim (2001). "Two views of classifier systems". International Workshop on Learning Classifier Systems. Springer Berlin Heidelberg. pp. 74–87.
  39. Congdon, Clare Bates (1995). A Comparison of Genetic Algorithms and Other Machine Learning Systems on a Complex Classification Task from Common Disease Research (Ph.D. thesis). University of Michigan.
  40. Holmes, John H. (1996). "A Genetics-Based Machine Learning Approach to Knowledge Discovery in Clinical Data". Proceedings of the AMIA Annual Fall Symposium: 883. ISSN 1091-8280. PMC 2233061.
  41. Holmes, John H. (1997). "Discovering Risk of Disease with a Learning Classifier System". ICGA. pp. 426–433.
  42. Holmes, John H.; Sager, Jennifer A. (2005). "Rule discovery in epidemiologic surveillance data using EpiXCS: an evolutionary computation approach". Conference on Artificial Intelligence in Medicine in Europe. Springer Berlin Heidelberg. pp. 444–452.
  43. Butz, Martin V. (2001). "Biasing exploration in an anticipatory learning classifier system". International Workshop on Learning Classifier Systems. Springer Berlin Heidelberg. pp. 3–22.
  44. Wilson, Stewart W. (2002). "Classifiers that approximate functions". Natural Computing. 1 (2–3): 211–234. doi:10.1023/A:1016535925043. ISSN 1567-7818. S2CID 23032802.
  45. Bull, Larry (2003). "A simple accuracy-based learning classifier system". Learning Classifier Systems Group Technical Report UWELCSG03-005. Bristol, UK: University of the West of England.
  46. Bull, Larry (2004). "A simple payoff-based learning classifier system". International Conference on Parallel Problem Solving from Nature. Springer Berlin Heidelberg. pp. 1032–1041.
  47. Peñarroya, Jaume Bacardit (2004). Pittsburgh Genetic-Based Machine Learning in the Data Mining Era: Representations, Generalization, and Run-Time (Ph.D. thesis). Universitat Ramon Llull.
  48. Bacardit, Jaume; Burke, Edmund K.; Krasnogor, Natalio (2008-12-12). "Improving the scalability of rule-based evolutionary learning". Memetic Computing. 1 (1): 55–67. doi:10.1007/s12293-008-0005-4. ISSN 1865-9284. S2CID 775199.
  49. Drugowitsch, Jan (2008). Design and Analysis of Learning Classifier Systems. Studies in Computational Intelligence. Vol. 139. Springer. doi:10.1007/978-3-540-79866-8. ISBN 978-3-540-79865-1.
  50. Urbanowicz, Ryan J.; Bertasius, Gediminas; Moore, Jason H. (2014). "An extended Michigan-style learning classifier system for flexible supervised learning, classification, and data mining". International Conference on Parallel Problem Solving from Nature. Springer International Publishing. pp. 211–221.
  51. Urbanowicz, Ryan J.; Granizo-Mackenzie, Delaney; Moore, Jason H. (2012). "Using expert knowledge to guide covering and mutation in a Michigan style learning classifier system to detect epistasis and heterogeneity". International Conference on Parallel Problem Solving from Nature. Springer Berlin Heidelberg. pp. 266–275.
  52. Urbanowicz, Ryan; Granizo-Mackenzie, Ambrose; Moore, Jason (2012). "Instance-linked attribute tracking and feedback for Michigan-style supervised learning classifier systems". Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation. GECCO '12. New York, NY, USA: ACM. pp. 927–934. doi:10.1145/2330163.2330291. ISBN 9781450311779. S2CID 142534.
  53. Bacardit, Jaume; Krasnogor, Natalio (2009). "A mixed discrete-continuous attribute list representation for large scale classification domains". Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation. GECCO '09. New York, NY, USA: ACM. pp. 1155–1162. CiteSeerX 10.1.1.158.7314. doi:10.1145/1569901.1570057. ISBN 9781605583259. S2CID 10906515.
  54. Iqbal, Muhammad; Browne, Will N.; Zhang, Mengjie (2014-08-01). "Reusing Building Blocks of Extracted Knowledge to Solve Complex, Large-Scale Boolean Problems". IEEE Transactions on Evolutionary Computation. 18 (4): 465–480. doi:10.1109/tevc.2013.2281537. S2CID 525358.
  55. Booker, L. B.; Goldberg, D. E.; Holland, J. H. (1989-09-01). "Classifier systems and genetic algorithms". Artificial Intelligence. 40 (1): 235–282. doi:10.1016/0004-3702(89)90050-7. hdl:2027.42/27777.
  56. Wilson, Stewart W.; Goldberg, David E. (1989). "A critical review of classifier systems". Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann Publishers Inc. pp. 244–255.
