Massive Online Analysis

MOA
Developer(s) University of Waikato
Stable release
24.07.0 [1] / 18 July 2024
Repository
Operating system Cross-platform
Type Machine Learning
License GNU General Public License
Website moa.cms.waikato.ac.nz

Massive Online Analysis (MOA) is a free, open-source software project designed specifically for data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand. [2]

Description

MOA is an open-source software framework for building and running machine learning and data mining experiments on evolving data streams. It includes a set of learners and stream generators that can be used from the graphical user interface (GUI), the command line, and the Java API.
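To make the idea of pairing a stream generator with a learner concrete, the following is a minimal sketch in plain Java. It does not use the MOA API: the class names `StreamExperimentSketch`, `ThresholdStream`, and `CentroidLearner` are hypothetical, and the test-then-train loop is an assumed evaluation scheme chosen for illustration, in which each instance is first used to test the model and only then to train it.

```java
import java.util.Random;

/** Hypothetical sketch, NOT the MOA API: a stream generator, an incremental learner, and an evaluation loop. */
final class StreamExperimentSketch {

    /** One labelled instance: a single numeric feature and a binary class label. */
    static final class Instance {
        final double x;
        final int label;

        Instance(double x, int label) {
            this.x = x;
            this.label = label;
        }
    }

    /** Unbounded synthetic stream: x is uniform in [0, 1) and the class is 1 when x > 0.5. */
    static final class ThresholdStream {
        private final Random random;

        ThresholdStream(long seed) {
            this.random = new Random(seed);
        }

        Instance next() {
            double x = random.nextDouble();
            return new Instance(x, x > 0.5 ? 1 : 0);
        }
    }

    /** Incremental nearest-centroid classifier: keeps one running sum and count per class. */
    static final class CentroidLearner {
        private final double[] sum = new double[2];
        private final long[] count = new long[2];

        int predict(double x) {
            if (count[1] == 0) return 0; // class 1 not seen yet (or nothing seen at all)
            if (count[0] == 0) return 1; // only class 1 seen so far
            double d0 = Math.abs(x - sum[0] / count[0]);
            double d1 = Math.abs(x - sum[1] / count[1]);
            return d1 < d0 ? 1 : 0;
        }

        void train(Instance inst) {
            sum[inst.label] += inst.x;
            count[inst.label]++;
        }
    }

    /** Test-then-train loop: each instance is seen once, tested first, then learned from. */
    static double run(int numInstances, long seed) {
        ThresholdStream stream = new ThresholdStream(seed);
        CentroidLearner learner = new CentroidLearner();
        int correct = 0;
        for (int i = 0; i < numInstances; i++) {
            Instance inst = stream.next();
            if (learner.predict(inst.x) == inst.label) {
                correct++;
            }
            learner.train(inst);
        }
        return (double) correct / numInstances;
    }

    public static void main(String[] args) {
        System.out.printf("accuracy over 10000 instances: %.3f%n", run(10_000, 1L));
    }
}
```

The learner stores only two sums and two counts, so memory use stays constant however long the stream runs, which is the property that makes this style of learning suitable for data streams.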

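MOA contains several collections of machine learning algorithms, covering tasks such as classification, regression, clustering, outlier detection, frequent pattern mining, and change detection.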

These algorithms are designed for large-scale machine learning: they handle concept drift and process big data streams in real time.
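The effect of concept drift on a learner can be illustrated with a small self-contained Java sketch. It is not taken from MOA: the class `DriftAdaptationSketch`, the synthetic stream, and the forgetting rate of 0.05 are all assumptions made for illustration. The stream flips its labelling rule abruptly at a chosen position, and the same nearest-centroid learner is run twice, once averaging all past data equally and once with exponential forgetting so that recent instances dominate.

```java
import java.util.Random;

/** Hypothetical sketch, NOT the MOA API: abrupt concept drift and a learner that forgets old data. */
final class DriftAdaptationSketch {

    /** Nearest-centroid learner whose centroids are running means, optionally with forgetting. */
    static final class FadingCentroidLearner {
        private final double[] centroid = new double[2];
        private final long[] count = new long[2];
        private final double fixedRate; // step size in (0, 1], or 0 to weight all past data equally

        FadingCentroidLearner(double fixedRate) {
            this.fixedRate = fixedRate;
        }

        int predict(double x) {
            if (count[1] == 0) return 0; // class 1 not seen yet
            if (count[0] == 0) return 1; // only class 1 seen so far
            return Math.abs(x - centroid[1]) < Math.abs(x - centroid[0]) ? 1 : 0;
        }

        void train(double x, int label) {
            count[label]++;
            // A rate of 1/count gives the plain running mean; a fixed rate discounts old data.
            // The first instance of a class always uses rate 1, so the centroid starts at x.
            double rate = (fixedRate > 0 && count[label] > 1) ? fixedRate : 1.0 / count[label];
            centroid[label] += rate * (x - centroid[label]);
        }
    }

    /**
     * Runs a test-then-train loop on a stream whose concept changes abruptly at driftAt:
     * before the drift the class is 1 when x > 0.5, afterwards when x <= 0.5.
     */
    static double run(double forgettingRate, int numInstances, int driftAt, long seed) {
        Random random = new Random(seed);
        FadingCentroidLearner learner = new FadingCentroidLearner(forgettingRate);
        int correct = 0;
        for (int i = 0; i < numInstances; i++) {
            double x = random.nextDouble();
            boolean high = x > 0.5;
            int label = ((i < driftAt) == high) ? 1 : 0;
            if (learner.predict(x) == label) {
                correct++;
            }
            learner.train(x, label);
        }
        return (double) correct / numInstances;
    }

    public static void main(String[] args) {
        System.out.printf("no forgetting:   %.3f%n", run(0.0, 2000, 1000, 7L));
        System.out.printf("with forgetting: %.3f%n", run(0.05, 2000, 1000, 7L));
    }
}
```

Without forgetting, the centroids learned before the drift outweigh the new data for a long time, so the learner keeps predicting the old concept. With forgetting, the centroids swap places after a few dozen instances and accuracy recovers. Exponential forgetting is only one simple adaptation strategy; explicit change detectors are another.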

MOA supports bi-directional interaction with Weka. MOA is free software released under the GNU GPL.

References

  1. "Release 24.07.0". 18 July 2024. Retrieved 23 July 2024.
  2. Bifet, Albert; Holmes, Geoff; Kirkby, Richard; Pfahringer, Bernhard (2010). "MOA: Massive online analysis". The Journal of Machine Learning Research. 11: 1601–1604.
  3. Losing, Viktor; Hammer, Barbara; Wersing, Heiko (2017). "Tackling heterogeneous concept drift with the Self-Adjusting Memory (SAM)". Knowledge and Information Systems. 54: 171–201. doi:10.1007/s10115-017-1137-y. ISSN 0885-6125. S2CID 29600755.
  4. Read, Jesse; Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard (2012). "Scalable and efficient multi-label classification for evolving data streams". Machine Learning. 88 (1–2): 243–272. doi:10.1007/s10994-012-5279-6. ISSN 0885-6125. S2CID 14676146.
  5. Zliobaite, Indre; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoffrey (2014). "Active Learning With Drifting Streaming Data". IEEE Transactions on Neural Networks and Learning Systems. 25 (1): 27–39. doi:10.1109/TNNLS.2012.2236570. ISSN 2162-237X. PMID 24806642. S2CID 14687075.
  6. Ikonomovska, Elena; Gama, João; Džeroski, Sašo (2010). "Learning model trees from evolving data streams" (PDF). Data Mining and Knowledge Discovery. 23 (1): 128–168. doi:10.1007/s10618-010-0201-y. ISSN 1384-5810. S2CID 7114108.
  7. Almeida, Ezilda; Ferreira, Carlos; Gama, João (2013). "Adaptive Model Rules from Data Streams". Advanced Information Systems Engineering. Lecture Notes in Computer Science. Vol. 8188. pp. 480–492. CiteSeerX 10.1.1.638.5472. doi:10.1007/978-3-642-40988-2_31. ISBN 978-3-642-38708-1. ISSN 0302-9743.
  8. Kranen, Philipp; Kremer, Hardy; Jansen, Timm; Seidl, Thomas; Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard (2010). "Clustering Performance on Evolving Data Streams: Assessing Algorithms and Evaluation Measures within MOA". 2010 IEEE International Conference on Data Mining Workshops. pp. 1400–1403. doi:10.1109/ICDMW.2010.17. ISBN 978-1-4244-9244-2. S2CID 2064336.
  9. Georgiadis, Dimitrios; Kontaki, Maria; Gounaris, Anastasios; Papadopoulos, Apostolos N.; Tsichlas, Kostas; Manolopoulos, Yannis (2013). "Continuous outlier detection in data streams". Proceedings of the 2013 international conference on Management of data - SIGMOD '13. p. 1061. doi:10.1145/2463676.2463691. ISBN 9781450320375. S2CID 1886134.
  10. Assent, Ira; Kranen, Philipp; Baldauf, Corinna; Seidl, Thomas (2012). "AnyOut: Anytime Outlier Detection on Streaming Data". Database Systems for Advanced Applications. Lecture Notes in Computer Science. Vol. 7238. pp. 228–242. doi:10.1007/978-3-642-29038-1_18. ISBN 978-3-642-29037-4. ISSN 0302-9743.
  11. Quadrana, Massimo; Bifet, Albert; Gavaldà, Ricard (2013). "An Efficient Closed Frequent Itemset Miner for the MOA Stream Mining System". Frontiers in Artificial Intelligence and Applications. 256 (Artificial Intelligence Research and Development): 203. doi:10.3233/978-1-61499-320-9-203.
  12. Bifet, Albert; Holmes, Geoff; Pfahringer, Bernhard; Gavaldà, Ricard (2011). "Mining frequent closed graphs on evolving data streams". Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11. p. 591. CiteSeerX 10.1.1.297.1721. doi:10.1145/2020408.2020501. ISBN 9781450308137. S2CID 8588858.
  13. Bifet, Albert; Read, Jesse; Pfahringer, Bernhard; Holmes, Geoff; Žliobaitė, Indrė (2013). "CD-MOA: Change Detection Framework for Massive Online Analysis". Advances in Intelligent Data Analysis XII. Lecture Notes in Computer Science. Vol. 8207. pp. 92–103. doi:10.1007/978-3-642-41398-8_9. ISBN 978-3-642-41397-1. ISSN 0302-9743.