Decision stump

An example of a decision stump that discriminates between two of the three classes of the Iris flower data set: Iris versicolor and Iris virginica. The petal width is in centimetres. This particular stump achieves 94% accuracy on the Iris dataset for these two classes.

A decision stump is a machine learning model consisting of a one-level decision tree.[1] That is, it is a decision tree with one internal node (the root) that is immediately connected to the terminal nodes (its leaves). A decision stump makes a prediction based on the value of just a single input feature. Decision stumps are sometimes also called 1-rules.[2]
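
For concreteness, a stump like the one in the figure reduces to a single comparison on one feature. A minimal Python sketch (the 1.75 cm threshold is an illustrative value, not read off the figure):

```python
def stump_predict(petal_width_cm: float) -> str:
    """A one-level decision tree: one comparison on a single input feature.

    The 1.75 cm threshold is illustrative, chosen to roughly separate
    Iris versicolor from Iris virginica by petal width.
    """
    return "Iris virginica" if petal_width_cm > 1.75 else "Iris versicolor"
```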

Depending on the type of the input feature, several variations are possible. For nominal features, one may build a stump which contains a leaf for each possible feature value[3][4] or a stump with two leaves, one of which corresponds to some chosen category and the other to all the other categories.[5] For binary features these two schemes are identical. A missing value may be treated as yet another category.[5]
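
A sketch of the first scheme (one leaf per category, each predicting the majority label of its category, with a missing value treated as its own category); the function name is illustrative, not taken from any library:

```python
from collections import Counter, defaultdict

def fit_nominal_stump(values, labels):
    """One leaf per observed category of a single nominal feature.

    A missing value (None here) simply becomes one more category.
    Each leaf predicts the majority label of its category; categories
    unseen at training time fall back to the overall majority label.
    """
    by_category = defaultdict(list)
    for v, y in zip(values, labels):
        by_category[v].append(y)          # None is just another key
    leaves = {v: Counter(ys).most_common(1)[0][0]
              for v, ys in by_category.items()}
    default = Counter(labels).most_common(1)[0][0]
    return lambda v: leaves.get(v, default)
```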

For continuous features, usually some threshold value is selected, and the stump contains two leaves, one for values below the threshold and one for values above it. More rarely, multiple thresholds are chosen, and the stump then contains three or more leaves.
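
One common way to choose the threshold is to try the midpoints between consecutive distinct sorted feature values and keep the split with the lowest training error. A minimal sketch for the two-leaf binary case (an illustrative procedure, not any particular library's implementation; it assumes at least two distinct feature values):

```python
def fit_threshold_stump(x, y):
    """Fit a two-leaf stump on one continuous feature, binary labels 0/1.

    Candidate thresholds are midpoints between consecutive distinct
    sorted values of x; the split with the fewest training errors wins.
    """
    xs = sorted(set(x))
    best = None  # (errors, threshold, label_below, label_above)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        for below, above in ((0, 1), (1, 0)):
            errors = sum((yi != below) if xi <= t else (yi != above)
                         for xi, yi in zip(x, y))
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    _, t, below, above = best
    return lambda v: below if v <= t else above
```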

Decision stumps are often[6] used as components (called "weak learners" or "base learners") in machine learning ensemble techniques such as bagging and boosting. For example, the Viola–Jones face detection algorithm employs AdaBoost with decision stumps as weak learners.[7]
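
In scikit-learn, for instance, a depth-1 DecisionTreeClassifier is exactly a decision stump, so boosting stumps is a short sketch (the estimator keyword was named base_estimator in scikit-learn versions before 1.2):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
stump = DecisionTreeClassifier(max_depth=1)   # a one-level decision tree
model = AdaBoostClassifier(estimator=stump, n_estimators=50).fit(X, y)
print(model.score(X, y))                      # training accuracy
```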

The term "decision stump" was coined in a 1992 ICML paper by Wayne Iba and Pat Langley.[1][8]


Related Research Articles

In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones.


A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements.

Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations.

Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.
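
As a brief scikit-learn sketch of bagging with stump base learners (the estimator keyword was named base_estimator before scikit-learn 1.2; using a stump here is illustrative, not a requirement of bagging):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Each of the 50 stumps is trained on a bootstrap resample of the data,
# and their predictions are aggregated by voting.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                        n_estimators=50).fit(X, y)
```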

Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean or average prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set.

AdaBoost, short for Adaptive Boosting, is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms is combined into a weighted sum that represents the final output of the boosted classifier. Usually, AdaBoost is presented for binary classification, although it can be generalized to multiple classes or bounded intervals on the real line.

C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier. In 2011, authors of the Weka machine learning software described the C4.5 algorithm as "a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date".

John Ross Quinlan is a computer science researcher in data mining and decision theory. He has contributed extensively to the development of decision tree algorithms, including inventing the canonical C4.5 and ID3 algorithms. He also contributed to early ILP literature with First Order Inductive Learner (FOIL). He is currently running the company RuleQuest Research which he founded in 1997.


Machine Learning is a peer-reviewed scientific journal, published since 1986.

Conceptual clustering is a machine learning paradigm for unsupervised classification that has been defined by Ryszard S. Michalski in 1980 and developed mainly during the 1980s. It is distinguished from ordinary data clustering by generating a concept description for each generated class. Most conceptual clustering methods are capable of generating hierarchical category structures; see Categorization for more information on hierarchy. Conceptual clustering is closely related to formal concept analysis, decision tree learning, and mixture model learning.

An alternating decision tree (ADTree) is a machine learning method for classification. It generalizes decision trees and has connections to boosting.

Co-training is a machine learning algorithm used when there are only small amounts of labeled data and large amounts of unlabeled data. One of its uses is in text mining for search engines. It was introduced by Avrim Blum and Tom Mitchell in 1998.

Cascading is a particular case of ensemble learning based on the concatenation of several classifiers, using all information collected from the output from a given classifier as additional information for the next classifier in the cascade. Unlike voting or stacking ensembles, which are multiexpert systems, cascading is a multistage one.

An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree using a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated, when the original data set is too large to process, or when the characteristics of the data change over time.

In machine learning and statistical classification, multiclass classification or multinomial classification is the problem of classifying instances into one of three or more classes.

Gradient boosting is a machine learning technique based on boosting in a functional space, where the target is pseudo-residuals rather than the typical residuals used in traditional boosting. It gives a prediction model in the form of an ensemble of weak prediction models, i.e., models that make very few assumptions about the data, which are typically simple decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random forest. A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes the other methods by allowing optimization of an arbitrary differentiable loss function.
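
A minimal scikit-learn sketch (setting max_depth=1 makes each weak learner a stump; class and parameter names are that library's):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
# Each boosting stage fits depth-1 regression trees (stumps) to the
# pseudo-residuals of a differentiable loss (log-loss here).
gb = GradientBoostingClassifier(max_depth=1, n_estimators=100).fit(X, y)
```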

Contrast set learning is a form of association rule learning that seeks to identify meaningful differences between separate groups by reverse-engineering the key predictors that identify for each particular group. For example, given a set of attributes for a pool of students, a contrast set learner would identify the contrasting features between students seeking bachelor's degrees and those working toward PhD degrees.

Massive Online Analysis (MOA) is a free open-source software project for data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand.

In machine learning, a probabilistic classifier is a classifier that is able to predict, given an observation of an input, a probability distribution over a set of classes, rather than only outputting the most likely class that the observation should belong to. Probabilistic classifiers provide classification that can be useful in its own right or when combining classifiers into ensembles.

The outline of machine learning provides an overview of and topical guide to the field.

References

  1. Iba, Wayne; Langley, Pat (1992). "Induction of One-Level Decision Trees" (PDF). ML92: Proceedings of the Ninth International Conference on Machine Learning, Aberdeen, Scotland, 1–3 July 1992. Morgan Kaufmann. pp. 233–240. doi:10.1016/B978-1-55860-247-2.50035-8. ISBN 978-1-55860-247-2.
  2. Holte, Robert C. (1993). "Very simple classification rules perform well on most commonly used datasets" (PDF). Machine Learning. 11 (1): 63–90. doi:10.1023/A:1022631118932. S2CID 6596.
  3. Loper, Edward L.; Bird, Steven; Klein, Ewan (2009). Natural Language Processing with Python. Sebastopol, CA: O'Reilly. ISBN 978-0-596-51649-9. Archived from the original on 2010-06-18. Retrieved 2010-06-10.
  4. This classifier is implemented in Weka under the name OneR (for "1-rule").
  5. This is what has been implemented in Weka's DecisionStump classifier.
  6. Reyzin, Lev; Schapire, Robert E. (2006). "How Boosting the Margin Can Also Boost Classifier Complexity" (PDF). ICML '06: Proceedings of the 23rd International Conference on Machine Learning. pp. 753–760. doi:10.1145/1143844.1143939. ISBN 978-1-59593-383-6. S2CID 2483269.
  7. Viola, Paul; Jones, Michael J. (2004). "Robust Real-Time Face Detection" (PDF). International Journal of Computer Vision. 57 (2): 137–154. doi:10.1023/B:VISI.0000013087.49260.fb. S2CID 2796017.
  8. Oliver, Jonathan J.; Hand, David (1994). "Averaging Over Decision Stumps". Machine Learning: ECML-94, European Conference on Machine Learning, Catania, Italy, April 6–8, 1994, Proceedings. Lecture Notes in Computer Science. Vol. 784. Springer. pp. 231–241. doi:10.1007/3-540-57868-4_61. ISBN 3-540-57868-4. Quote: "These simple rules are in effect severely pruned decision trees and have been termed decision stumps" (Iba & Langley 1992).