Massive Online Analysis

MOA
Developer(s) University of Waikato
Stable release 20.12.0 [1] / 16 December 2020
Operating system Cross-platform
Type Machine Learning
License GNU General Public License
Website moa.cms.waikato.ac.nz

Massive Online Analysis (MOA) is a free, open-source software project designed specifically for data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand. [2]

Description

MOA is an open-source software framework for building and running machine learning and data mining experiments on evolving data streams. It includes a set of learners and stream generators that can be used from the graphical user interface (GUI), the command line, and the Java API. MOA contains several collections of machine learning algorithms, including classification [3][4][5], regression [6][7], clustering [8], outlier detection [9][10], frequent pattern mining [11][12], and change detection [13].

These algorithms are designed for large-scale machine learning: they handle concept drift and process big data streams in real time.
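As a rough illustration of what an experiment on an evolving stream involves, the following self-contained Java sketch runs a test-then-train (prequential) loop over a synthetic stream with one abrupt concept drift. It deliberately does not use the MOA API: the class `PrequentialSketch`, the learner `WindowedMajority`, and the generator `labelAt` are hypothetical names invented for this example, and the learner is a toy that predicts the majority label in a sliding window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PrequentialSketch {

    /** Hypothetical toy learner: predicts the majority label among the last `capacity` labels. */
    static final class WindowedMajority {
        private final Deque<Boolean> window = new ArrayDeque<>();
        private final int capacity;
        private int positives = 0;

        WindowedMajority(int capacity) {
            this.capacity = capacity;
        }

        /** Predicts true only if strictly more than half of the window is true. */
        boolean predict() {
            return 2 * positives > window.size();
        }

        /** Adds one label and forgets the oldest one once the window is full. */
        void train(boolean label) {
            window.addLast(label);
            if (label) {
                positives++;
            }
            if (window.size() > capacity && window.removeFirst()) {
                positives--;
            }
        }
    }

    /** Hypothetical stream generator: the label is true before driftAt and false afterwards. */
    public static boolean labelAt(int t, int driftAt) {
        return t < driftAt;
    }

    /** Test-then-train loop: each instance is used for testing first, then for training. */
    public static double prequentialAccuracy(int n, int driftAt, int windowSize) {
        WindowedMajority learner = new WindowedMajority(windowSize);
        int correct = 0;
        for (int t = 0; t < n; t++) {
            boolean y = labelAt(t, driftAt);
            if (learner.predict() == y) {
                correct++;
            }
            learner.train(y);
        }
        return (double) correct / n;
    }

    public static void main(String[] args) {
        // A short window forgets the old concept quickly; a long one adapts slowly.
        System.out.println(prequentialAccuracy(1000, 500, 20));
        System.out.println(prequentialAccuracy(1000, 500, 400));
    }
}
```

With 1,000 instances and a drift at instance 500, the 20-instance window makes 11 mistakes (accuracy 0.989), while the 400-instance window makes 201 (accuracy 0.799), because it takes longer to forget the old concept. Real stream learners use far more refined adaptation mechanisms; the sketch only shows the shape of the evaluation loop.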

MOA supports bi-directional interaction with Weka. MOA is free software released under the GNU GPL.

See also

Related Research Articles

Data mining is a process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information from a data set and transform the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. Boosting is based on the question posed by Kearns and Valiant: "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a classifier that is only slightly correlated with the true classification. In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.

Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.

Formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the objects sharing some set of properties; and each sub-concept in the hierarchy represents a subset of the objects in the concepts above it. The term was introduced by Rudolf Wille in 1981, and builds on the mathematical theory of lattices and ordered sets that was developed by Garrett Birkhoff and others in the 1930s.

Learning classifier system

Learning classifier systems, or LCS, are a paradigm of rule-based machine learning methods that combine a discovery component with a learning component. Learning classifier systems seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions. This approach allows complex solution spaces to be broken up into smaller, simpler parts.

Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that, in many applications of data stream mining, can be read only once or a small number of times using limited computing and storage capabilities.
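The read-once, limited-memory constraint can be made concrete with a small Java sketch. It uses Welford's online method to maintain the mean and sample variance of a stream in constant memory, touching each value exactly once. The class name `RunningStats` and its methods are hypothetical and chosen only for this illustration; they are not taken from any particular stream mining library.

```java
public class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    /** Consumes one stream value; nothing about earlier values is stored besides three numbers. */
    public void update(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    public long count() {
        return count;
    }

    public double mean() {
        return mean;
    }

    /** Sample variance; defined as 0 until at least two values have been seen. */
    public double variance() {
        return count > 1 ? m2 / (count - 1) : 0.0;
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.update(x);
        }
        System.out.println(stats.mean() + " " + stats.variance());
    }
}
```

The state is three numbers regardless of how many instances have passed, which is the property stream algorithms generally aim for.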

In predictive analytics and machine learning, concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes.

Weka (machine learning)

Waikato Environment for Knowledge Analysis (Weka), developed at the University of Waikato, New Zealand, is free software licensed under the GNU General Public License, and the companion software to the book "Data Mining: Practical Machine Learning Tools and Techniques".

In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes; in the multi-label problem there is no constraint on how many of the classes the instance can be assigned to.

In data analysis, anomaly detection is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data. Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.

An alternating decision tree (ADTree) is a machine learning method for classification. It generalizes decision trees and has connections to boosting.

In machine learning, one-class classification (OCC), also known as unary classification or class-modelling, tries to identify objects of a specific class amongst all objects, by primarily learning from a training set containing only the objects of that class, although there exist variants of one-class classifiers where counter-examples are used to further refine the classification boundary. This is different from and more difficult than the traditional classification problem, which tries to distinguish between two or more classes with the training set containing objects from all the classes. Examples include the monitoring of helicopter gearboxes, motor failure prediction, or the operational status of a nuclear plant as 'normal': In this scenario, there are few, if any, examples of catastrophic system states; only the statistics of normal operation are known.

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

ELKI Data mining framework

ELKI is a data mining software framework developed for use in research and teaching. It was originally developed at the database systems research unit of Professor Hans-Peter Kriegel at the Ludwig Maximilian University of Munich, Germany, and is now continued at the Technical University of Dortmund, Germany. It aims to support the development and evaluation of advanced data mining algorithms and their interaction with database index structures.

Adversarial machine learning is a machine learning technique that attempts to fool models by supplying deceptive input. The most common reason is to cause a malfunction in a machine learning model.

The following outline is provided as an overview of and topical guide to machine learning. Machine learning is a subfield of soft computing within computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.

Structured k-Nearest Neighbours is a machine learning algorithm that generalizes the k-Nearest Neighbors (kNN) classifier. Whereas the kNN classifier supports binary classification, multiclass classification and regression, the Structured kNN (SkNN) allows training of a classifier for general structured output labels.

scikit-multiflow Machine learning library for data streams in Python

scikit-multiflow is a free and open-source machine learning library for multi-output/multi-label and stream data, written in Python.

References

  1. "Release 20.12.0". 16 December 2020. Retrieved 13 January 2021.
  2. Bifet, Albert; Holmes, Geoff; Kirkby, Richard; Pfahringer, Bernhard (2010). "MOA: Massive Online Analysis". Journal of Machine Learning Research. 11: 1601–1604.
  3. Losing, Viktor; Hammer, Barbara; Wersing, Heiko (2017). "Tackling heterogeneous concept drift with the Self-Adjusting Memory (SAM)". Knowledge and Information Systems. 54: 171–201. doi:10.1007/s10115-017-1137-y. ISSN 0219-1377. S2CID 29600755.
  4. Read, Jesse; Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard (2012). "Scalable and efficient multi-label classification for evolving data streams". Machine Learning. 88 (1–2): 243–272. doi:10.1007/s10994-012-5279-6. ISSN 0885-6125. S2CID 14676146.
  5. Žliobaitė, Indrė; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoffrey (2014). "Active Learning With Drifting Streaming Data". IEEE Transactions on Neural Networks and Learning Systems. 25 (1): 27–39. doi:10.1109/TNNLS.2012.2236570. ISSN 2162-237X. PMID 24806642. S2CID 14687075.
  6. Ikonomovska, Elena; Gama, João; Džeroski, Sašo (2010). "Learning model trees from evolving data streams" (PDF). Data Mining and Knowledge Discovery. 23 (1): 128–168. doi:10.1007/s10618-010-0201-y. ISSN 1384-5810. S2CID 7114108.
  7. Almeida, Ezilda; Ferreira, Carlos; Gama, João (2013). "Adaptive Model Rules from Data Streams". Advanced Information Systems Engineering. Lecture Notes in Computer Science. 8188. pp. 480–492. CiteSeerX 10.1.1.638.5472. doi:10.1007/978-3-642-40988-2_31. ISBN 978-3-642-38708-1. ISSN 0302-9743.
  8. Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas; Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard (2010). "Clustering Performance on Evolving Data Streams: Assessing Algorithms and Evaluation Measures within MOA". 2010 IEEE International Conference on Data Mining Workshops. pp. 1400–1403. doi:10.1109/ICDMW.2010.17. ISBN 978-1-4244-9244-2. S2CID 2064336.
  9. Georgiadis, Dimitrios; Kontaki, Maria; Gounaris, Anastasios; Papadopoulos, Apostolos N.; Tsichlas, Kostas; Manolopoulos, Yannis (2013). "Continuous outlier detection in data streams". Proceedings of the 2013 international conference on Management of data - SIGMOD '13. p. 1061. doi:10.1145/2463676.2463691. ISBN 9781450320375. S2CID 1886134.
  10. Assent, Ira; Kranen, Philipp; Baldauf, Corinna; Seidl, Thomas (2012). "AnyOut: Anytime Outlier Detection on Streaming Data". Database Systems for Advanced Applications. Lecture Notes in Computer Science. 7238. pp. 228–242. doi:10.1007/978-3-642-29038-1_18. ISBN 978-3-642-29037-4. ISSN 0302-9743.
  11. Quadrana, Massimo; Bifet, Albert; Gavaldà, Ricard (2013). "An Efficient Closed Frequent Itemset Miner for the MOA Stream Mining System". Frontiers in Artificial Intelligence and Applications. 256 (Artificial Intelligence Research and Development): 203. doi:10.3233/978-1-61499-320-9-203.
  12. Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard; Gavaldà, Ricard (2011). "Mining frequent closed graphs on evolving data streams". Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11. p. 591. CiteSeerX 10.1.1.297.1721. doi:10.1145/2020408.2020501. ISBN 9781450308137. S2CID 8588858.
  13. Bifet, Albert; Read, Jesse; Pfahringer, Bernhard; Holmes, Geoff; Žliobaitė, Indrė (2013). "CD-MOA: Change Detection Framework for Massive Online Analysis". Advances in Intelligent Data Analysis XII. Lecture Notes in Computer Science. 8207. pp. 92–103. doi:10.1007/978-3-642-41398-8_9. ISBN 978-3-642-41397-1. ISSN 0302-9743.