Incremental decision tree

An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree using a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated (e.g., the data was not stored), where the original dataset is too large to process, or where the characteristics of the data change over time.
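
A minimal sketch of this update-without-reprocessing contract, using a deliberately degenerate "tree" (a hypothetical depth-0 majority-class stump; the class and method names are illustrative, not from any particular library):

```python
from collections import Counter

class MajorityStump:
    """A degenerate depth-0 'tree' that predicts the majority class.

    It stands in for a real incremental tree learner to show the key
    contract: each labelled instance updates sufficient statistics in
    constant time, and past instances are never stored or re-processed.
    """

    def __init__(self):
        self.class_counts = Counter()

    def update(self, x, y):
        # Fold in one new instance; no access to earlier data is needed.
        self.class_counts[y] += 1

    def predict(self, x):
        if not self.class_counts:
            return None
        return self.class_counts.most_common(1)[0][0]

model = MajorityStump()
stream = [({"size": 1}, "spam"), ({"size": 9}, "ham"), ({"size": 2}, "spam")]
for x, y in stream:
    model.update(x, y)             # one pass, instance by instance
print(model.predict({"size": 5}))  # -> 'spam'
```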

Methods

Here is a short list of incremental decision tree methods, organized by their (usually non-incremental) parent algorithms.

CART family

CART [1] (1984) is a nonincremental decision tree inducer for both classification and regression problems, developed in the mathematics and statistics communities. CART traces its roots to AID (1963). [2] Crawford (1989) [3] described extensions to the CART algorithm that incorporate data incrementally.

ID3/C4.5 family

ID3 (1986) [4] and C4.5 (1993) [5] were developed by Quinlan and have roots in Hunt's Concept Learning System (CLS, 1966). [6] The ID3 family of tree inducers was developed in the engineering and computer science communities.

Incremental variants in this family include ID4 (1986), [7] ID5 (1988), [8] ID5R (1989), [9] ID6MDL (2007), [10] and ITI (1997). [11] Note that ID6NB (2009) [12] is not incremental.

Other Incremental Learning Systems

Several incremental concept learning systems did not build decision trees, but predated and influenced the development of the earliest incremental decision tree learners, notably ID4. [7] Prominent among these was Schlimmer and Granger's STAGGER (1986), [13] which learned disjunctive concepts incrementally. STAGGER was developed to examine concepts that changed over time (concept drift). Before STAGGER, Michalski and Larson (1978) [14] investigated an incremental variant of AQ (Michalski, 1973), [15] a supervised system for learning concepts in disjunctive normal form (DNF). Experience with these and other early systems, including incremental tree-structured unsupervised learning, contributed to a conceptual framework for evaluating incremental decision tree learners specifically, and incremental concept learning generally, along four dimensions that reflect the inherent tradeoffs between learning cost and quality: [7] (1) the cost of updating the knowledge base, (2) the number of observations required to converge on a knowledge base with given characteristics, (3) the total effort (a function of the first two dimensions) that a system exerts, and (4) the quality (often consistency) of the final knowledge base. Some of the historical context in which incremental decision tree learners emerged is given in Fisher and Schlimmer (1988), [16] which also expands on the four-dimension framework used to evaluate and design incremental learning systems.

VFDT Algorithm

The Very Fast Decision Tree learner (VFDT, 2000) [17] reduces training time for large incremental data sets by subsampling the incoming data stream: the Hoeffding bound is used to decide, with high confidence, how few instances suffice before committing to each split. CVFDT (2001) [18] extends VFDT to time-changing (drifting) data streams, and VFDTc (2006) [19] adds, among other things, support for numeric attributes.
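
At the heart of VFDT's subsampling is the Hoeffding bound. A minimal sketch of that split decision follows (function names and the gain bookkeeping are illustrative assumptions; the published algorithm also includes a tie-breaking threshold, omitted here):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # With probability at least 1 - delta, the true mean of a random
    # variable with range `value_range` lies within epsilon of the mean
    # observed over n independent samples.
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best: float, gain_second_best: float,
                 value_range: float, delta: float, n: int) -> bool:
    # Split only once the best attribute's estimated gain beats the
    # runner-up by more than epsilon; otherwise keep accumulating
    # instances at the leaf.
    epsilon = hoeffding_bound(value_range, delta, n)
    return gain_best - gain_second_best > epsilon

# Example: after 1,000 instances at a leaf, with estimated gains 0.30 vs
# 0.25 and a binary class (information gain range = 1.0), delta = 1e-7:
print(should_split(0.30, 0.25, 1.0, 1e-7, 1000))  # False: gap < epsilon
```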

EFDT Algorithm

The Extremely Fast Decision Tree learner [20] is statistically more powerful than VFDT, allowing it to learn more detailed trees from less data. It differs from VFDT in the method for deciding when to insert a new branch into the tree. VFDT waits until it is confident that the best available branch is better than any alternative. In contrast, EFDT splits as soon as it is confident that the best available branch is better than the current alternative. Initially, the current alternative is no branch. This allows EFDT to insert branches much more rapidly than VFDT. During incremental learning this means that EFDT can deploy useful trees much sooner than VFDT.

However, the new branch selection method greatly increases the likelihood of selecting a suboptimal branch. In consequence, EFDT keeps monitoring the performance of all branches and will replace a branch as soon as it is confident there is a better alternative.
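
The difference between the two commitment rules is small enough to state in code. A sketch reusing the hypothetical hoeffding_bound helper from the VFDT example above (again, names and signatures are illustrative, and details such as tie-breaking and re-evaluation scheduling are omitted):

```python
def vfdt_split_test(gain_best, gain_second_best, value_range, delta, n):
    # VFDT: commit to a branch only when the best candidate split
    # confidently beats the best *alternative* split.
    return gain_best - gain_second_best > hoeffding_bound(value_range, delta, n)

def efdt_split_test(gain_best, gain_no_split, value_range, delta, n):
    # EFDT: commit as soon as the best candidate confidently beats the
    # *current* choice; at a fresh leaf the current choice is "no split".
    return gain_best - gain_no_split > hoeffding_bound(value_range, delta, n)

def efdt_revisit_test(gain_best_alternative, gain_current, value_range, delta, n):
    # EFDT keeps monitoring internal nodes: an installed split is replaced
    # once some alternative is confidently better than it.
    return gain_best_alternative - gain_current > hoeffding_bound(value_range, delta, n)
```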

OLIN and IFN

OLIN (2002) [21] performs online classification of nonstationary data streams. It builds on the info-fuzzy network (IFN) methodology for knowledge discovery and data mining, [23] and later info-fuzzy algorithms for mining dynamic data streams are described by Cohen et al. (2008). [22]

GAENARI

GAENARI is a C++ implementation of an incremental decision tree algorithm.


References

  1. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth International. ISBN 978-1-351-46048-4.
  2. Morgan, J.N.; Sonquist, J.A. (1963). "Problems in the analysis of survey data, and a proposal". Journal of the American Statistical Association. 58 (302): 415–434. doi:10.1080/01621459.1963.10500855.
  3. Crawford, S.L. (1989). "Extensions to the CART algorithm". International Journal of Man-Machine Studies. 31 (2): 197–217. doi:10.1016/0020-7373(89)90027-8.
  4. Quinlan, J.R. (1986). "Induction of decision trees". Machine Learning. 1 (1): 81–106. doi:10.1007/BF00116251. S2CID 13252401.
  5. Quinlan, J.R. (2014) [1993]. C4.5: Programs for Machine Learning. Elsevier. ISBN 978-1-55860-238-0.
  6. Hunt, E.B.; Marin, J.; Stone, P.J. (1966). Experiments in Induction. Academic Press. ISBN 978-0-12-362350-8.
  7. Schlimmer, J.C.; Fisher, D. (1986). "A case study of incremental concept induction". AAAI'86: Proceedings of the Fifth National Conference on Artificial Intelligence. Morgan Kaufmann. pp. 496–501.
  8. Utgoff, P.E. (1988). "ID5: An incremental ID3". Machine Learning Proceedings 1988: Proceedings of the Fifth International Conference on Machine Learning. Morgan Kaufmann. pp. 107–120. doi:10.1016/B978-0-934613-64-4.50017-7. ISBN 978-0-934613-64-4.
  9. Utgoff, P.E. (1989). "Incremental induction of decision trees". Machine Learning. 4 (2): 161–186. doi:10.1023/A:1022699900025. S2CID 5293072.
  10. Kroon, M.; Korzec, S.; Adriani, P. (2007). ID6MDL: Post-Pruning Incremental Decision Trees.
  11. Utgoff, P.E.; Berkman, N.C.; Clouse, J.A. (1997). "Decision tree induction based on efficient tree restructuring". Machine Learning. 29: 5–44. doi:10.1023/A:1007413323501. S2CID 2743403.
  12. Appavu, S.; Rajaram, R. (2009). "Knowledge-based system for text classification using ID6NB algorithm". Knowledge-Based Systems. 22 (1): 1–7. doi:10.1016/j.knosys.2008.04.006.
  13. Schlimmer, J.C.; Granger, R.H. Jr. (1986). "Incremental learning from noisy data". Machine Learning. 1 (3): 317–354. doi:10.1007/BF00116895. S2CID 33776987.
  14. Michalski, R.S.; Larson, J.B. (1978). Selection of Most Representative Training Examples and Incremental Generation of VL Hypotheses: The Underlying Methodology and the Description of the Programs ESEL and AQ11 (Technical report). University of Illinois, Department of Computer Science. UIUCDCS-R-78-867. hdl:1920/1544/78-03.
  15. Michalski, R.S. (1973). "Discovering classification rules using variable-valued logic system VL1". IJCAI'73: Proceedings of the Third International Joint Conference on Artificial Intelligence. Stanford, CA: Morgan Kaufmann. pp. 162–172. hdl:1920/1515/73-01.
  16. Fisher, D.; Schlimmer, J. (1988). Models of Incremental Concept Learning: A Coupled Research Proposal (Technical report). Vanderbilt University. CS-88-05.
  17. Domingos, P.; Hulten, G. (2000). "Mining high-speed data streams". KDD '00: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press. pp. 71–80. doi:10.1145/347090.347107. ISBN 1-58113-233-6. S2CID 8810610.
  18. Hulten, G.; Spencer, L.; Domingos, P. (2001). "Mining time-changing data streams". KDD '01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press. pp. 97–106. doi:10.1145/502512.502529. ISBN 978-1-58113-391-2. S2CID 6416602.
  19. Gama, J.; Fernandes, R.; Rocha, R. (2006). "Decision trees for mining data streams". Intelligent Data Analysis. 10: 23–45. doi:10.3233/IDA-2006-10103.
  20. Manapragada, C.; Webb, G.I.; Salehi, M. (2018). "Extremely Fast Decision Tree". KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM Press. pp. 1953–1962. arXiv:1802.08780. doi:10.1145/3219819.3220005. ISBN 978-1-4503-5552-0. S2CID 3513409.
  21. Last, M. (2002). "Online classification of nonstationary data streams". Intelligent Data Analysis. 6 (2): 129–147. doi:10.3233/IDA-2002-6203.
  22. Cohen, L.; Avrahami, G.; Last, M.; Kandel, A. (2008). "Info-fuzzy algorithms for mining dynamic data streams". Applied Soft Computing. 8 (4): 1283–1294. doi:10.1016/j.asoc.2007.11.003.
  23. Maimon, O.; Last, M. (2000). Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology. Kluwer. doi:10.1007/978-1-4757-3296-2. ISBN 978-1-4757-3296-2. S2CID 41520652.