CN2 algorithm

Last updated

The CN2 induction algorithm is a learning algorithm for rule induction. [1] It is designed to work even when the training data is imperfect. It is based on ideas from the AQ algorithm and the ID3 algorithm. As a consequence it creates a rule set like that created by AQ but is able to handle noisy data like ID3.

Algorithmic learning theory is a mathematical framework for analyzing machine learning problems and algorithms. Synonyms include formal learning theory and algorithmic inductive inference. Algorithmic learning theory is different from statistical learning theory in that it does not make use of statistical assumptions and analysis. Both algorithmic and statistical learning theory are concerned with machine learning and can thus be viewed as branches of computational learning theory.

Rule induction is an area of machine learning in which formal rules are extracted from a set of observations. The rules extracted may represent a full scientific model of the data, or merely represent local patterns in the data.

ID3 algorithm decision tree algorithm

In decision tree learning, ID3 is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains.

Contents

Description of algorithm

The algorithm must be given a set of examples, TrainingSet, which have already been classified in order to generate a list of classification rules. A set of conditions, SimpleConditionSet, which can be applied, alone or in combination, to any set of examples is predefined to be used for the classification.

routine CN2(TrainingSet)    let the ClassificationRuleList be empty    repeat       let the BestConditionExpression be Find_BestConditionExpression(TrainingSet)       if the BestConditionExpression is not nil          then             let the TrainingSubset be the examples covered by the BestConditionExpression             remove from the TrainingSet the examples in the TrainingSubset             let the MostCommonClass be the most common class of examples in the TrainingSubset             append to the ClassificationRuleList the rule                'if ' the BestConditionExpression ' then the class is ' the MostCommonClass    until the TrainingSet is empty or the BestConditionExpression is nil return the ClassificationRuleList
routine Find_BestConditionExpression(TrainingSet)    let the ConditionalExpressionSet be empty    let the BestConditionExpression be nil    repeat       let the TrialConditionalExpressionSet be the set of conditional expressions,          {x and y where x belongs to the ConditionalExpressionSet and y belongs to the SimpleConditionSet}.       remove all formulae in the TrialConditionalExpressionSet that are either in the ConditionalExpressionSet (i.e.,           the unspecialized ones) or null (e.g., big = y and big = n)       for every expression, F, in the TrialConditionalExpressionSet          if             F is statistically significant                and F is better than the BestConditionExpression                by user-defined criteria when tested on the TrainingSet             then                replace the current value of the BestConditionExpression by F       while the number of expressions in the TrialConditionalExpressionSet > user-defined maximum          remove the worst expression from the TrialConditionalExpressionSet       let the ConditionalExpressionSet be the TrialConditionalExpressionSet    until the ConditionalExpressionSet is empty return the BestConditionExpression

Related Research Articles

Supervised learning machine learning task of learning a function that maps an input to an output based on example input-output pairs

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.

Power set (of any set S) set of all subsets of S, including the empty set and S itself

In mathematics, the power set of any set S is the set of all subsets of S, including the empty set and S itself, variously denoted as P(S), 𝒫(S), ℘(S), P(S), ℙ(S), or, identifying the powerset of S with the set of all functions from S to a given set of two elements, 2S. In axiomatic set theory, the existence of the power set of any set is postulated by the axiom of power set.

Queue (abstract data type) abstract data type

In computer science, a queue is a collection in which the entities in the collection are kept in order and the principal operations on the collection are the addition of entities to the rear terminal position, known as enqueue, and removal of entities from the front terminal position, known as dequeue. This makes the queue a First-In-First-Out (FIFO) data structure. In a FIFO data structure, the first element added to the queue will be the first one to be removed. This is equivalent to the requirement that once a new element is added, all elements that were added before have to be removed before the new element can be removed. Often a peek or front operation is also entered, returning the value of the front element without dequeuing it. A queue is an example of a linear data structure, or more abstractly a sequential collection.

Regular expression Sequence of characters that forms a search pattern

A regular expression, regex or regexp is a sequence of characters that define a search pattern. Usually this pattern is used by string searching algorithms for "find" or "find and replace" operations on strings, or for input validation. It is a technique that developed in theoretical computer science and formal language theory.

Ultrafilter in set theory

In the mathematical field of set theory, an ultrafilter on a given partially ordered set (poset) P is a maximal filter on P, that is, a filter on P that cannot be enlarged. Filters and ultrafilters are special subsets of P. If P happens to be a Boolean algebra, each ultrafilter is also a prime filter, and vice versa.

In logic and computer science, unification is an algorithmic process of solving equations between symbolic expressions.

In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class it belongs to. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the characteristics. An object's characteristics are also known as feature values and are typically presented to the machine in a vector called a feature vector. Such classifiers work well for practical problems such as document classification, and more generally for problems with many variables (features), reaching accuracy levels comparable to non-linear classifiers while taking less time to train and use.

In mathematics, a binary relation, R, is called well-founded on a class X if every non-empty subset SX has a minimal element with respect to R, that is an element m not related by sRm for any sS. In other words, a relation is well founded if

Decision tree learning

In computer science, Decision tree learning uses a decision tree to go from observations about an item to conclusions about the item's target value. It is one of the predictive modeling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values are called regression trees.

Association rule learning method for discovering interesting relations between variables in databases

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness. This rule-based approach also generates new rules as it analyzes more data. The ultimate goal, assuming a large enough dataset, is to help a machine mimic the human brain’s feature extraction and abstract association capabilities from new uncategorized data.

In combinatorics, a greedoid is a type of set system. It arises from the notion of the matroid, which was originally introduced by Whitney in 1935 to study planar graphs and was later used by Edmonds to characterize a class of optimization problems that can be solved by greedy algorithms. Around 1980, Korte and Lovász introduced the greedoid to further generalize this characterization of greedy algorithms; hence the name greedoid. Besides mathematical optimization, greedoids have also been connected to graph theory, language theory, poset theory, and other areas of mathematics.

In computer programming, ?: is a ternary operator that is part of the syntax for basic conditional expressions in several programming languages. It is commonly referred to as the conditional operator, inline if (iif), or ternary if. An expression a? b : c evaluates to b if the value of a is true, and otherwise to c.

In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features for use in model construction. Feature selection techniques are used for four reasons:

In statistical classification, including machine learning, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different approaches, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished, following Jebara (2004):

<i>k</i>-nearest neighbors algorithm algorithm

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

DFA minimization

In automata theory, DFA minimization is the task of transforming a given deterministic finite automaton (DFA) into an equivalent DFA that has a minimum number of states. Here, two DFAs are called equivalent if they recognize the same regular language. Several different algorithms accomplishing this task are known and described in standard textbooks on automata theory.

Structured prediction

Structured prediction or structured (output) learning is an umbrella term for supervised machine learning techniques that involves predicting structured objects, rather than scalar discrete or real values.

Probabilistic classification

In machine learning, a probabilistic classifier is a classifier that is able to predict, given an observation of an input, a probability distribution over a set of classes, rather than only outputting the most likely class that the observation should belong to. Probabilistic classifiers provide classification that can be useful in its own right or when combining classifiers into ensembles.

The rules extraction system (RULES) family is a family of inductive learning that includes several covering algorithms. This family is used to build a predictive model based on given observation. It works based on the concept of separate-and-conquer to directly induce rules from a given training set and build its knowledge repository.

References

  1. Clark, P. and Niblett, T (1989) The CN2 induction algorithm. Machine Learning 3(4):261-283.