Decision stump

[Figure: Decision stump.svg] An example of a decision stump that discriminates between two of the three classes of the Iris flower data set, Iris versicolor and Iris virginica. The petal width is in centimetres. This particular stump achieves 94% accuracy on the Iris dataset for these two classes.

A decision stump is a machine learning model consisting of a one-level decision tree.[1] That is, it is a decision tree with one internal node (the root) immediately connected to the terminal nodes (its leaves). A decision stump makes a prediction based on the value of just a single input feature. They are sometimes also called 1-rules.[2]
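As an illustration, here is a minimal sketch of such a model in Python; the class name, threshold, and labels are choices made for this article, not taken from any particular library:

```python
# A minimal decision stump: one internal node testing a single feature,
# and two leaves. All names and the 1.75 cm threshold are illustrative.

class DecisionStump:
    def __init__(self, feature_index, threshold, label_below, label_above):
        self.feature_index = feature_index
        self.threshold = threshold
        self.label_below = label_below
        self.label_above = label_above

    def predict(self, x):
        # The prediction depends on the value of just one input feature.
        if x[self.feature_index] <= self.threshold:
            return self.label_below
        return self.label_above

# Mirroring the figure above: classify two Iris species by petal width (cm).
stump = DecisionStump(feature_index=0, threshold=1.75,
                      label_below="Iris versicolor",
                      label_above="Iris virginica")
print(stump.predict([1.4]))  # Iris versicolor
print(stump.predict([2.1]))  # Iris virginica
```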

Depending on the type of the input feature, several variations are possible. For nominal features, one may build a stump that contains a leaf for each possible feature value[3][4] or a stump with two leaves, one of which corresponds to some chosen category and the other to all the remaining categories.[5] For binary features these two schemes are identical. A missing value may be treated as yet another category.[5]
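A rough sketch of the two schemes in Python; the function names, feature values, and labels are illustrative assumptions, not drawn from a specific implementation:

```python
# Two ways to build a stump on a nominal feature.

def multiway_stump(leaf_label_by_value, missing_label):
    """One leaf per possible feature value (the OneR-style scheme);
    a missing or unseen value is routed to its own extra leaf."""
    def predict(value):
        return leaf_label_by_value.get(value, missing_label)
    return predict

def two_leaf_stump(chosen_value, label_if_match, label_otherwise):
    """Two leaves: one for a chosen category, one for all other categories."""
    def predict(value):
        return label_if_match if value == chosen_value else label_otherwise
    return predict

# Example with a hypothetical 'colour' feature:
predict_a = multiway_stump({"red": "A", "green": "B"}, missing_label="C")
predict_b = two_leaf_stump("red", label_if_match="A", label_otherwise="B")
print(predict_a("green"), predict_b("blue"))  # B B
```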

For continuous features, a threshold value is usually selected, and the stump contains two leaves: one for values below the threshold and one for values above it. More rarely, multiple thresholds are chosen, in which case the stump contains three or more leaves.
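One simple way to select the threshold is to try each midpoint between consecutive sorted feature values and keep the split that makes the fewest misclassifications. A minimal sketch, assuming binary 0/1 labels (the function name is hypothetical):

```python
# Exhaustive threshold search for a single continuous feature.

def best_threshold(values, labels):
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    ys = [labels[i] for i in order]
    best_errors, best_t = float("inf"), None
    for i in range(1, len(xs)):
        t = (xs[i - 1] + xs[i]) / 2.0  # candidate threshold (midpoint)
        left, right = ys[:i], ys[i:]
        # Each leaf predicts its majority label, so the errors on a side
        # are however many points carry the minority label there.
        errors = (min(left.count(0), left.count(1))
                  + min(right.count(0), right.count(1)))
        if errors < best_errors:
            best_errors, best_t = errors, t
    return best_t

print(best_threshold([1.4, 1.3, 2.1, 1.9], [0, 0, 1, 1]))  # 1.65
```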

Decision stumps are often[6] used as components (called "weak learners" or "base learners") in machine learning ensemble techniques such as bagging and boosting. For example, the Viola–Jones face detection algorithm employs AdaBoost with decision stumps as weak learners.[7]
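A boosted ensemble of stumps can be sketched with scikit-learn, a library chosen here purely for illustration rather than one referenced by this article; a decision tree limited to depth 1 is exactly a decision stump:

```python
# Decision stumps as the weak learners inside AdaBoost.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
stump = DecisionTreeClassifier(max_depth=1)  # one internal node: a stump
# Note: the keyword is 'base_estimator' in scikit-learn versions before 1.2.
model = AdaBoostClassifier(estimator=stump, n_estimators=50)
model.fit(X, y)
print(model.score(X, y))  # training accuracy of the boosted stumps
```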

The term "decision stump" was coined in a 1992 ICML paper by Wayne Iba and Pat Langley.[1][8]

Related Research Articles

Boosting (machine learning)

In machine learning, boosting is an ensemble meta-algorithm primarily for reducing bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. Boosting is based on the question posed by Kearns and Valiant: "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a classifier that is only slightly correlated with the true classification. In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.

Decision tree

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

Machine learning

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop a conventional algorithm for effectively performing the task.

Decision tree learning

Decision tree learning is one of the predictive modeling approaches used in statistics, data mining and machine learning. It uses a decision tree to go from observations about an item to conclusions about the item's target value. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values are called regression trees.

Statistical classification

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations whose category membership is known. Examples are assigning a given email to the "spam" or "non-spam" class, and assigning a diagnosis to a given patient based on observed characteristics of the patient. Classification is an example of pattern recognition.

AdaBoost

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing, the final model can be proven to converge to a strong learner.
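For reference, the weighted combination described above can be written out; this is the standard AdaBoost formulation, not anything specific to this article:

```latex
% Final boosted classifier: a weighted vote of T weak learners h_t
% (here, decision stumps), each with a weight \alpha_t derived from its
% weighted training error \varepsilon_t.
\[
  H(x) = \operatorname{sign}\!\Bigl(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Bigr),
  \qquad
  \alpha_t = \tfrac{1}{2}\,\ln\frac{1-\varepsilon_t}{\varepsilon_t}
\]
```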

C4.5 algorithm

C4.5 is an algorithm, developed by Ross Quinlan, used to generate a decision tree. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier. In 2011, authors of the Weka machine learning software described the C4.5 algorithm as "a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date".

Weka (machine learning)

Waikato Environment for Knowledge Analysis (Weka) is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License, and the companion software to the book Data Mining: Practical Machine Learning Tools and Techniques.

COBWEB (clustering)

COBWEB is an incremental system for hierarchical conceptual clustering. COBWEB was invented by Professor Douglas H. Fisher, currently at Vanderbilt University.

Conceptual clustering

Conceptual clustering is a machine learning paradigm for unsupervised classification developed mainly during the 1980s. It is distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures; see Categorization for more information on hierarchy. Conceptual clustering is closely related to formal concept analysis, decision tree learning, and mixture model learning.

Alternating decision tree

An alternating decision tree (ADTree) is a machine learning method for classification. It generalizes decision trees and has connections to boosting.

Cascading classifiers

Cascading is a particular case of ensemble learning based on the concatenation of several classifiers, using all information collected from the output of a given classifier as additional information for the next classifier in the cascade. Unlike voting or stacking ensembles, which are multiexpert systems, cascading is a multistage one.

Ensemble learning

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

Multiclass classification

In machine learning, multiclass or multinomial classification is the problem of classifying instances into one of three or more classes.

Random subspace method

In machine learning, the random subspace method, also called attribute bagging or feature bagging, is an ensemble learning method that attempts to reduce the correlation between estimators in an ensemble by training them on random samples of features instead of the entire feature set.
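A minimal sketch of the idea; the function name and sizes are illustrative, not drawn from a particular library:

```python
# Random subspace method (feature bagging): each estimator in the
# ensemble is trained on a random subset of the feature indices.
import random

def draw_subspaces(n_features, n_estimators, subspace_size, seed=0):
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_estimators)]

# e.g. 5 estimators, each restricted to 2 of 4 features:
print(draw_subspaces(n_features=4, n_estimators=5, subspace_size=2))
```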

Contrast set learning

Contrast set learning is a form of association rule learning that seeks to identify meaningful differences between separate groups by reverse-engineering the key predictors that identify each particular group. For example, given a set of attributes for a pool of students, a contrast set learner would identify the contrasting features between students seeking bachelor's degrees and those working toward PhD degrees.

Massive Online Analysis

Massive Online Analysis (MOA) is a free open-source software project specific to data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand.

Probabilistic classification

In machine learning, a probabilistic classifier is a classifier that is able to predict, given an observation of an input, a probability distribution over a set of classes, rather than only outputting the most likely class that the observation should belong to. Probabilistic classifiers provide classification that can be useful in its own right or when combining classifiers into ensembles.

Outline of machine learning

The following outline is provided as an overview of and topical guide to machine learning. Machine learning is a subfield of soft computing within computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.

Automated machine learning

Automated machine learning (AutoML) is the process of automating the application of machine learning to real-world problems. AutoML covers the complete pipeline from the raw dataset to the deployable machine learning model. AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. The high degree of automation in AutoML allows non-experts to make use of machine learning models and techniques without requiring them to become experts in the field first.

References

  1. Iba, Wayne; Langley, Pat (1992). "Induction of One-Level Decision Trees". ML92: Proceedings of the Ninth International Conference on Machine Learning, Aberdeen, Scotland, 1–3 July 1992. San Francisco, CA: Morgan Kaufmann. pp. 233–240.
  2. Holte, Robert C. (1993). "Very Simple Classification Rules Perform Well on Most Commonly Used Datasets". Machine Learning. 11: 63–91. CiteSeerX 10.1.1.67.2711.
  3. Loper, Edward L.; Bird, Steven; Klein, Ewan (2009). Natural Language Processing with Python. Sebastopol, CA: O'Reilly. ISBN 978-0-596-51649-9. Archived from the original on 2010-06-18. Retrieved 2010-06-10.
  4. This classifier is implemented in Weka under the name OneR (for "1-rule").
  5. This is what has been implemented in Weka's DecisionStump classifier.
  6. Reyzin, Lev; Schapire, Robert E. (2006). "How Boosting the Margin Can Also Boost Classifier Complexity". ICML'06: Proceedings of the 23rd International Conference on Machine Learning. pp. 753–760.
  7. Viola, Paul; Jones, Michael J. (2004). "Robust Real-Time Face Detection". International Journal of Computer Vision. 57 (2): 137–154.
  8. Oliver, Jonathan J.; Hand, David (1994). "Averaging Over Decision Stumps". Machine Learning: ECML-94, European Conference on Machine Learning, Catania, Italy, April 6–8, 1994, Proceedings. Lecture Notes in Computer Science 784. Springer. pp. 231–241. ISBN 3-540-57868-4. doi:10.1007/3-540-57868-4_61.
     Quote: "These simple rules are in effect severely pruned decision trees and have been termed decision stumps [cites Iba and Langley]".