Associative classifier

An associative classifier (AC) is a kind of supervised learning model that uses association rules to assign a target value (class label) to new records. The term associative classification was coined by Bing Liu et al., [1] who defined a model made of rules "whose right-hand side are restricted to the classification class attribute".

Model

The model generated by an AC and used to label new records consists of association rules where the consequent corresponds to the class label. As such, these rules can also be read as a list of "if-then" clauses: if the record matches the criteria expressed on the left-hand side of a rule (the antecedent), it is labeled according to the class on the right-hand side (the consequent).

Most ACs read the list of rules in order, and apply the first matching rule to label the new record. [2]
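
As a rough illustration, not tied to any particular AC implementation, the following Python sketch shows a rule list with set-valued antecedents and the first-match labeling step described above; the rules, attribute values, and default class are all hypothetical.

```python
# Minimal sketch of an associative classifier's rule list and the
# first-match labeling step. The rules, attribute values, and default
# class below are hypothetical, not taken from any cited implementation.

# Each rule is (antecedent, class): the antecedent is a set of
# attribute=value conditions that must all hold for the rule to match.
rules = [
    ({"outlook=sunny", "humidity=high"}, "no"),
    ({"outlook=overcast"}, "yes"),
    ({"windy=false"}, "yes"),
]

def classify(record, rules, default="yes"):
    """Return the class of the first rule whose antecedent is contained
    in the record's item set; fall back to a default class otherwise."""
    for antecedent, label in rules:
        if antecedent <= record:  # every condition of the rule holds
            return label
    return default

record = {"outlook=sunny", "humidity=high", "windy=false"}
print(classify(record, rules))  # -> "no": the first matching rule wins
```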

Metrics

The rules of an AC inherit some of the metrics of association rules, such as support and confidence. [3] These metrics can be used to order or filter the rules in the model [4] and to evaluate their quality.
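
For concreteness, the sketch below computes the support and confidence of a hypothetical class association rule X → c over a tiny toy dataset; the data and the choice of ranking criterion are illustrative assumptions, not taken from any specific system.

```python
# Toy computation of support and confidence for a class association rule
# X -> c. The dataset and the ranking criterion are illustrative only.

dataset = [
    ({"outlook=sunny", "humidity=high"}, "no"),
    ({"outlook=sunny", "humidity=normal"}, "yes"),
    ({"outlook=overcast", "humidity=high"}, "yes"),
    ({"outlook=rainy", "humidity=high"}, "no"),
]

def support_confidence(antecedent, label, dataset):
    covered = [c for items, c in dataset if antecedent <= items]
    correct = [c for c in covered if c == label]
    support = len(correct) / len(dataset)                         # P(X and c)
    confidence = len(correct) / len(covered) if covered else 0.0  # P(c | X)
    return support, confidence

print(support_confidence({"humidity=high"}, "no", dataset))  # (0.5, 0.666...)

# Rules can then be ranked, for example by confidence first and support
# second, before applying the first-match step shown earlier:
# rules.sort(key=lambda r: (r.confidence, r.support), reverse=True)
```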

Implementations

The first proposal of a classification model made of association rules was FBM. [5] The approach was popularized by CBA, [1] although other authors had also previously proposed mining association rules for classification. [6] Other authors have since proposed multiple changes to the initial model, such as the addition of a redundant-rule pruning phase [7] (a simplified form is sketched below) or the exploitation of Emerging Patterns. [8]
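
As a simplified illustration of the idea behind redundant-rule pruning, and not the exact procedure of any cited algorithm, the sketch below drops a rule whenever a more general rule with the same class and at least equal confidence exists; the rule representation and example values are assumptions.

```python
# Simplified illustration of redundant-rule pruning (not the exact
# procedure of any cited algorithm): a rule is dropped when a more
# general rule with the same class has at least the same confidence.

def prune_redundant(rules):
    """rules: list of (antecedent_set, label, confidence) tuples."""
    kept = []
    for ant, label, conf in rules:
        redundant = any(
            other_ant < ant and other_label == label and other_conf >= conf
            for other_ant, other_label, other_conf in rules
        )
        if not redundant:
            kept.append((ant, label, conf))
    return kept

rules = [
    ({"a"}, "yes", 0.9),
    ({"a", "b"}, "yes", 0.8),  # pruned: {"a"} -> yes is more general and stronger
    ({"a", "b"}, "no", 0.7),   # kept: predicts a different class
]
print(prune_redundant(rules))
```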

Notable implementations include:

  CBA (Classification Based on Associations) [1] [4]
  CMAR (Classification based on Multiple Association Rules) [7] [9]
  CPAR (Classification based on Predictive Association Rules) [10] [11]
  L3, a lazy associative classifier [12] [13]

References

  1. Liu, Bing; Hsu, Wynne; Ma, Yiming (1998). "Integrating Classification and Association Rule Mining". Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98): 80–86. CiteSeerX 10.1.1.48.8380.
  2. Thabtah, Fadi (2007). "A review of associative classification mining" (PDF). The Knowledge Engineering Review. 22 (1): 37–65. doi:10.1017/s0269888907001026. ISSN   0269-8889. S2CID   15986963.
  3. Liao, T. Warren; Triantaphyllou, Evangelos (2008). Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications. Series on Computers and Operations Research. World Scientific. doi:10.1142/6689. ISBN 9789812779854. S2CID 34599426.
  4. "CBA homepage" . Retrieved 2018-10-04.
  5. Webb, Geoffrey (1989). "A Machine Learning Approach to Student Modelling" (PDF). Proceedings of the Third Australian Joint Conference on Artificial Intelligence (AI 89): 195–205.
  6. Ali, Kamal; Manganaris, Stefanos; Srikant, Ramakrishnan (1997-08-14). "Partial classification using association rules". KDD'97. AAAI Press: 115–118.
  7. Wenmin Li; Jiawei Han; Jian Pei (2001). CMAR: accurate and efficient classification based on multiple class-association rules. Proceedings 2001 IEEE International Conference on Data Mining. IEEE Comput. Soc. pp. 369–376. CiteSeerX 10.1.1.13.219. doi:10.1109/icdm.2001.989541. ISBN 978-0769511191. S2CID 2243455.
  8. Dong, Guozhu; Zhang, Xiuzhen; Wong, Limsoon; Li, Jinyan (1999), "CAEP: Classification by Aggregating Emerging Patterns", Discovery Science, Springer Berlin Heidelberg, pp. 30–42, CiteSeerX 10.1.1.37.3226, doi:10.1007/3-540-46846-3_4, ISBN 9783540667131
  9. "CMAR Implementation". cgi.csc.liv.ac.uk. Retrieved 2018-10-04.
  10. Yin, Xiaoxin; Han, Jiawei (2003), "CPAR: Classification based on Predictive Association Rules", Proceedings of the 2003 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, pp. 331–335, CiteSeerX   10.1.1.12.7268 , doi:10.1137/1.9781611972733.40, ISBN   9780898715453
  11. "THE LUCS-KDD IMPLEMENTATIONS OF THE FOIL, PRM AND CPAR ALGORITHMS". cgi.csc.liv.ac.uk. Retrieved 2018-10-04.
  12. Baralis, E.; Chiusano, S.; Garza, P. (2008). "A Lazy Approach to Associative Classification". IEEE Transactions on Knowledge and Data Engineering. 20 (2): 156–171. doi:10.1109/tkde.2007.190677. ISSN   1041-4347. S2CID   14829459.
  13. "L3 implementation". dbdmg.polito.it. Retrieved 2018-10-08.
  14. Chen, Guoqing; Liu, Hongyan; Yu, Lan; Wei, Qiang; Zhang, Xing (2006). "A new approach to classification based on association rule mining". Decision Support Systems. 42 (2): 674–689. doi:10.1016/j.dss.2005.03.005. ISSN   0167-9236.
  15. Wang, Ke; Zhou, Senqiang; He, Yu (2000). Growing decision trees on support-less association rules. Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '00. New York, New York, USA: ACM Press. CiteSeerX   10.1.1.36.9265 . doi:10.1145/347090.347147. ISBN   978-1581132335. S2CID   8296096.