Incremental decision tree

An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree using a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated (e.g., the data was not stored), where the original dataset is too large to process, or where the characteristics of the data change over time.
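
A minimal sketch of this update-without-reprocessing contract, using a deliberately degenerate "tree" (a hypothetical depth-0 majority-class stump; the class and method names are illustrative, not from any particular library):

```python
from collections import Counter

class MajorityStump:
    """A degenerate depth-0 'tree' that predicts the majority class.

    It stands in for a real incremental tree learner to show the key
    contract: each labelled instance updates sufficient statistics in
    constant time, and past instances are never stored or re-processed.
    """

    def __init__(self):
        self.class_counts = Counter()

    def update(self, x, y):
        # Fold in one new instance; no access to earlier data is needed.
        self.class_counts[y] += 1

    def predict(self, x):
        if not self.class_counts:
            return None
        return self.class_counts.most_common(1)[0][0]

model = MajorityStump()
stream = [({"size": 1}, "spam"), ({"size": 9}, "ham"), ({"size": 2}, "spam")]
for x, y in stream:
    model.update(x, y)             # one pass, instance by instance
print(model.predict({"size": 5}))  # -> 'spam'
```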

Methods

Here is a short list of incremental decision tree methods, organized by their (usually non-incremental) parent algorithms.

CART family

CART [1] (1984) is a nonincremental decision tree inducer for both classification and regression problems, developed in the mathematics and statistics communities. CART traces its roots to AID (1963). [2] Crawford (1989) [3] described extensions to the CART algorithm that incorporate data incrementally.

ID3/C4.5 family

ID3 (1986) [4] and C4.5 (1993) [5] were developed by Quinlan and have roots in Hunt's Concept Learning System (CLS, 1966). [6] The ID3 family of tree inducers was developed in the engineering and computer science communities.

Incremental variants in this family include ID4 (1986), [7] ID5 (1988), [8] ID5R (1989), [9] ID6MDL (2007), [10] and ITI (1997). [11] Note that ID6NB (2009) [12] is not incremental.

Other Incremental Learning Systems

Several incremental concept learning systems did not build decision trees, but predated and influenced the development of the earliest incremental decision tree learners, notably ID4. [7] Prominent among these was Schlimmer and Granger's STAGGER (1986), [13] which learned disjunctive concepts incrementally. STAGGER was developed to examine concepts that changed over time (concept drift). Before STAGGER, Michalski and Larson (1978) [14] investigated an incremental variant of AQ (Michalski, 1973), [15] a supervised system for learning concepts in disjunctive normal form (DNF). Experience with these and other early systems, including incremental tree-structured unsupervised learning, contributed to a conceptual framework for evaluating incremental decision tree learners specifically, and incremental concept learning generally, along four dimensions that reflect the inherent tradeoffs between learning cost and quality: [7] (1) the cost of updating the knowledge base, (2) the number of observations required to converge on a knowledge base with given characteristics, (3) the total effort (a function of the first two dimensions) that a system exerts, and (4) the quality (often consistency) of the final knowledge base. Some of the historical context in which incremental decision tree learners emerged is given in Fisher and Schlimmer (1988), [16] which also expands on the four-dimension framework used to evaluate and design incremental learning systems.

VFDT Algorithm

The Very Fast Decision Tree learner (VFDT, 2000) [17] reduces training time for large incremental data sets by subsampling the incoming data stream: the Hoeffding bound is used to decide, with high confidence, how few instances suffice before committing to each split. CVFDT (2001) [18] extends VFDT to time-changing (drifting) data streams, and VFDTc (2006) [19] adds, among other things, support for numeric attributes.
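
At the heart of VFDT's subsampling is the Hoeffding bound. A minimal sketch of that split decision follows (function names and the gain bookkeeping are illustrative assumptions; the published algorithm also includes a tie-breaking threshold, omitted here):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # With probability at least 1 - delta, the true mean of a random
    # variable with range `value_range` lies within epsilon of the mean
    # observed over n independent samples.
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best: float, gain_second_best: float,
                 value_range: float, delta: float, n: int) -> bool:
    # Split only once the best attribute's estimated gain beats the
    # runner-up by more than epsilon; otherwise keep accumulating
    # instances at the leaf.
    epsilon = hoeffding_bound(value_range, delta, n)
    return gain_best - gain_second_best > epsilon

# Example: after 1,000 instances at a leaf, with estimated gains 0.30 vs
# 0.25 and a binary class (information gain range = 1.0), delta = 1e-7:
print(should_split(0.30, 0.25, 1.0, 1e-7, 1000))  # False: gap < epsilon
```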

EFDT Algorithm

The Extremely Fast Decision Tree learner [20] is statistically more powerful than VFDT, allowing it to learn more detailed trees from less data. It differs from VFDT in the method for deciding when to insert a new branch into the tree. VFDT waits until it is confident that the best available branch is better than any alternative. In contrast, EFDT splits as soon as it is confident that the best available branch is better than the current alternative. Initially, the current alternative is no branch. This allows EFDT to insert branches much more rapidly than VFDT. During incremental learning this means that EFDT can deploy useful trees much sooner than VFDT.

However, the new branch selection method greatly increases the likelihood of selecting a suboptimal branch. In consequence, EFDT keeps monitoring the performance of all branches and will replace a branch as soon as it is confident there is a better alternative.
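
The difference between the two commitment rules is small enough to state in code. A sketch reusing the hypothetical hoeffding_bound helper from the VFDT example above (again, names and signatures are illustrative, and details such as tie-breaking and re-evaluation scheduling are omitted):

```python
def vfdt_split_test(gain_best, gain_second_best, value_range, delta, n):
    # VFDT: commit to a branch only when the best candidate split
    # confidently beats the best *alternative* split.
    return gain_best - gain_second_best > hoeffding_bound(value_range, delta, n)

def efdt_split_test(gain_best, gain_no_split, value_range, delta, n):
    # EFDT: commit as soon as the best candidate confidently beats the
    # *current* choice; at a fresh leaf the current choice is "no split".
    return gain_best - gain_no_split > hoeffding_bound(value_range, delta, n)

def efdt_revisit_test(gain_best_alternative, gain_current, value_range, delta, n):
    # EFDT keeps monitoring internal nodes: an installed split is replaced
    # once some alternative is confidently better than it.
    return gain_best_alternative - gain_current > hoeffding_bound(value_range, delta, n)
```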

OLIN and IFN

OLIN (2002) [21] performs online classification of nonstationary data streams. It builds on the info-fuzzy network (IFN) methodology for knowledge discovery and data mining, [23] and later info-fuzzy algorithms for mining dynamic data streams are described by Cohen et al. (2008). [22]

GAENARI

GAENARI is a C++ implementation of an incremental decision tree algorithm.


References

  1. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth International. ISBN 978-1-351-46048-4.
  2. Morgan, J.N.; Sonquist, J.A. (1963). "Problems in the analysis of survey data, and a proposal". Journal of the American Statistical Association. 58 (302): 415–434. doi:10.1080/01621459.1963.10500855.
  3. Crawford, S.L. (1989). "Extensions to the CART algorithm". International Journal of Man-Machine Studies. 31 (2): 197–217. doi:10.1016/0020-7373(89)90027-8.
  4. Quinlan, J.R. (1986). "Induction of decision trees". Machine Learning. 1 (1): 81–106. doi:10.1007/BF00116251. S2CID 13252401.
  5. Quinlan, J.R. (2014) [1993]. C4.5: Programs for Machine Learning. Elsevier. ISBN 978-1-55860-238-0.
  6. Hunt, E.B.; Marin, J.; Stone, P.J. (1966). Experiments in Induction. Academic Press. ISBN 978-0-12-362350-8.
  7. Schlimmer, J.C.; Fisher, D. (1986). "A case study of incremental concept induction". AAAI'86: Proceedings of the Fifth National Conference on Artificial Intelligence. Morgan Kaufmann. pp. 496–501.
  8. Utgoff, P.E. (1988). "ID5: An incremental ID3". Machine Learning Proceedings 1988: Proceedings of the Fifth International Conference on Machine Learning. Morgan Kaufmann. pp. 107–120. doi:10.1016/B978-0-934613-64-4.50017-7. ISBN 978-0-934613-64-4.
  9. Utgoff, P.E. (1989). "Incremental induction of decision trees". Machine Learning. 4 (2): 161–186. doi:10.1023/A:1022699900025. S2CID 5293072.
  10. Kroon, M.; Korzec, S.; Adriani, P. (2007). ID6MDL: Post-Pruning Incremental Decision Trees.
  11. Utgoff, P.E.; Berkman, N.C.; Clouse, J.A. (1997). "Decision tree induction based on efficient tree restructuring". Machine Learning. 29: 5–44. doi:10.1023/A:1007413323501. S2CID 2743403.
  12. Appavu, S.; Rajaram, R. (2009). "Knowledge-based system for text classification using ID6NB algorithm". Knowledge-Based Systems. 22 (1): 1–7. doi:10.1016/j.knosys.2008.04.006.
  13. Schlimmer, J.C.; Granger, R.H. Jr. (1986). "Incremental learning from noisy data". Machine Learning. 1 (3): 317–354. doi:10.1007/BF00116895. S2CID 33776987.
  14. Michalski, R.S.; Larson, J.B. (1978). Selection of Most Representative Training Examples and Incremental Generation of VL Hypotheses: The Underlying Methodology and the Description of the Programs ESEL and AQ11 (Technical report). University of Illinois, Department of Computer Science. UIUCDCS-R-78-867. hdl:1920/1544/78-03.
  15. Michalski, R.S. (1973). "Discovering classification rules using variable-valued logic system VL1". IJCAI'73: Proceedings of the Third International Joint Conference on Artificial Intelligence. Stanford, CA: Morgan Kaufmann. pp. 162–172. hdl:1920/1515/73-01.
  16. Fisher, D.; Schlimmer, J. (1988). Models of Incremental Concept Learning: A Coupled Research Proposal (Technical report). Vanderbilt University. CS-88-05.
  17. Domingos, P.; Hulten, G. (2000). "Mining high-speed data streams". KDD '00: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press. pp. 71–80. doi:10.1145/347090.347107. ISBN 1-58113-233-6. S2CID 8810610.
  18. Hulten, G.; Spencer, L.; Domingos, P. (2001). "Mining time-changing data streams". KDD '01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press. pp. 97–106. doi:10.1145/502512.502529. ISBN 978-1-58113-391-2. S2CID 6416602.
  19. Gama, J.; Fernandes, R.; Rocha, R. (2006). "Decision trees for mining data streams". Intelligent Data Analysis. 10: 23–45. doi:10.3233/IDA-2006-10103.
  20. Manapragada, C.; Webb, G.I.; Salehi, M. (2018). "Extremely Fast Decision Tree". KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM Press. pp. 1953–1962. arXiv:1802.08780. doi:10.1145/3219819.3220005. ISBN 978-1-4503-5552-0. S2CID 3513409.
  21. Last, M. (2002). "Online classification of nonstationary data streams". Intelligent Data Analysis. 6 (2): 129–147. doi:10.3233/IDA-2002-6203.
  22. Cohen, L.; Avrahami, G.; Last, M.; Kandel, A. (2008). "Info-fuzzy algorithms for mining dynamic data streams". Applied Soft Computing. 8 (4): 1283–1294. doi:10.1016/j.asoc.2007.11.003.
  23. Maimon, O.; Last, M. (2000). Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology. Kluwer. doi:10.1007/978-1-4757-3296-2. ISBN 978-1-4757-3296-2. S2CID 41520652.