Developer(s) | University of Waikato
---|---
Stable release | 24.07.0 / 18 July 2024 [1]
Operating system | Cross-platform
Type | Machine learning
License | GNU General Public License
Website | moa.cms.waikato.ac.nz
Massive Online Analysis (MOA) is a free open-source software project specifically for data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand.[2]
MOA is an open-source software framework for building and running machine learning and data mining experiments on evolving data streams. It includes a set of learners and stream generators that can be used from the graphical user interface (GUI), the command line, and the Java API.
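A minimal sketch of an experiment through the Java API, modeled on the prequential-evaluation example in the MOA manual. Class and method names are taken from recent MOA releases and may differ between versions (older releases return an Instance from nextInstance() directly, rather than wrapped in an example object):

```java
import com.yahoo.labs.samoa.instances.Instance;
import moa.classifiers.trees.HoeffdingTree;
import moa.streams.generators.RandomRBFGenerator;

public class PrequentialExperiment {
    public static void main(String[] args) {
        // Synthetic stream generator shipped with MOA
        RandomRBFGenerator stream = new RandomRBFGenerator();
        stream.prepareForUse();

        // Incremental decision-tree learner
        HoeffdingTree learner = new HoeffdingTree();
        learner.setModelContext(stream.getHeader());
        learner.prepareForUse();

        int numInstances = 100_000;
        int correct = 0;
        for (int i = 0; i < numInstances; i++) {
            Instance inst = stream.nextInstance().getData();
            // Prequential evaluation: test on the instance first, then train on it
            if (learner.correctlyClassifies(inst)) {
                correct++;
            }
            learner.trainOnInstance(inst);
        }
        System.out.printf("Prequential accuracy: %.2f%%%n",
                100.0 * correct / numInstances);
    }
}
```

The same experiment can be run from the GUI or the command line via MOA's EvaluatePrequential task with an equivalent learner and stream configuration.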
MOA contains several collections of machine learning algorithms, covering classification, regression, clustering, outlier detection, concept drift detection, and recommender systems. These algorithms are designed for large-scale machine learning on big data streams in real time, in the presence of concept drift.
MOA supports bi-directional interaction with Weka, and is free software released under the GNU GPL.
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data, and thus perform tasks without explicit instructions. Advances in the field of deep learning have allowed neural networks to surpass many previous approaches in performance.
In information science, formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the objects sharing some set of properties; and each sub-concept in the hierarchy represents a subset of the objects in the concepts above it. The term was introduced by Rudolf Wille in 1981, and builds on the mathematical theory of lattices and ordered sets that was developed by Garrett Birkhoff and others in the 1930s.
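In Wille's formulation, a formal context is a triple (G, M, I) consisting of a set of objects G, a set of attributes M, and an incidence relation I ⊆ G × M; the concepts described above arise from the two standard derivation operators:

```latex
\[
  A' = \{\, m \in M \mid (g,m) \in I \ \text{for all}\ g \in A \,\}, \qquad A \subseteq G,
\]
\[
  B' = \{\, g \in G \mid (g,m) \in I \ \text{for all}\ m \in B \,\}, \qquad B \subseteq M.
\]
```

A formal concept is a pair (A, B) with A′ = B and B′ = A; ordering concepts by inclusion of their object sets yields the concept lattice of the context.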
Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that, in many applications of data stream mining, can be read only once or a small number of times using limited computing and storage capabilities.
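The read-once, bounded-memory constraint is what separates stream algorithms from batch ones. Reservoir sampling is a classic illustration of working within it (a standalone sketch, not part of any particular stream-mining library): it maintains a uniform random sample of k items from a stream of unknown length in a single pass.

```java
import java.util.Random;

// Reservoir sampling (Algorithm R): keep a uniform random sample of k
// items from a stream of unknown length, reading each item exactly once
// and using O(k) memory.
public class Reservoir {
    private final double[] sample;
    private final Random rng = new Random();
    private long seen = 0;

    public Reservoir(int k) {
        this.sample = new double[k];
    }

    public void offer(double item) {
        seen++;
        if (seen <= sample.length) {
            sample[(int) (seen - 1)] = item;           // fill the reservoir first
        } else {
            long j = (long) (rng.nextDouble() * seen); // uniform in [0, seen)
            if (j < sample.length) {
                sample[(int) j] = item;                // replace with probability k/seen
            }
        }
    }

    public double[] snapshot() {
        return sample.clone();
    }
}
```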
In predictive analytics, data science, machine learning and related fields, concept drift or drift is an evolution of data that invalidates the data model. It happens when the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes. Drift detection and drift adaptation are of paramount importance in the fields that involve dynamically changing data and data models.
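A toy sketch of the detection idea, using a hand-picked threshold rather than the statistically grounded bounds of real detectors such as ADWIN or DDM: monitor a performance signal (for instance the learner's 0/1 error) and signal drift when the mean over a recent window diverges from the mean over the preceding window.

```java
import java.util.ArrayDeque;

// Toy drift monitor: signals drift when the mean of the most recent window
// of values diverges from the mean of the window preceding it by more than
// a fixed threshold. Production detectors replace the hand-picked threshold
// with statistical guarantees.
public class TwoWindowDriftMonitor {
    private final ArrayDeque<Double> reference = new ArrayDeque<>();
    private final ArrayDeque<Double> recent = new ArrayDeque<>();
    private final int windowSize;
    private final double threshold;

    public TwoWindowDriftMonitor(int windowSize, double threshold) {
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    /** Feed one observation (e.g. a 0/1 prediction error); returns true on drift. */
    public boolean add(double value) {
        recent.addLast(value);
        if (recent.size() > windowSize) {
            // Oldest "recent" value ages into the reference window
            reference.addLast(recent.removeFirst());
            if (reference.size() > windowSize) {
                reference.removeFirst();
            }
        }
        if (reference.size() < windowSize) {
            return false; // not enough history yet
        }
        return Math.abs(mean(recent) - mean(reference)) > threshold;
    }

    private static double mean(ArrayDeque<Double> window) {
        return window.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}
```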
In machine learning, multi-label classification or multi-output classification is a variant of the classification problem where multiple nonexclusive labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of several classes. In the multi-label problem the labels are nonexclusive and there is no constraint on how many of the classes the instance can be assigned to.
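One standard reduction, binary relevance, illustrates the nonexclusive-labels point: train one independent binary classifier per label, so each instance can end up with any subset of the labels. The BinaryClassifier interface below is hypothetical, standing in for any incremental base learner:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical minimal interface for an incremental binary classifier.
interface BinaryClassifier {
    void train(double[] features, boolean label);
    boolean predict(double[] features);
}

// Binary relevance: reduce multi-label classification to L independent
// binary problems, one per label. Each base classifier decides whether
// its own label applies, so any subset of labels can be assigned.
public class BinaryRelevance {
    private final List<BinaryClassifier> perLabel = new ArrayList<>();

    public BinaryRelevance(int numLabels, Supplier<BinaryClassifier> factory) {
        for (int i = 0; i < numLabels; i++) {
            perLabel.add(factory.get());
        }
    }

    public void train(double[] features, boolean[] labels) {
        for (int i = 0; i < perLabel.size(); i++) {
            perLabel.get(i).train(features, labels[i]);
        }
    }

    public boolean[] predict(double[] features) {
        boolean[] out = new boolean[perLabel.size()];
        for (int i = 0; i < perLabel.size(); i++) {
            out[i] = perLabel.get(i).predict(features);
        }
        return out;
    }
}
```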
In data analysis, anomaly detection is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behavior. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data.
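A minimal statistical sketch of this idea (not a production detector): score each observation by its distance from the running mean in standard deviations, using Welford's one-pass update so the detector also works on streams.

```java
// Streaming z-score anomaly detector: flags an observation as anomalous
// when it lies more than zThreshold standard deviations from the mean of
// everything seen so far. Welford's online algorithm keeps the update
// numerically stable in O(1) memory and a single pass.
public class ZScoreDetector {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations
    private final double zThreshold;

    public ZScoreDetector(double zThreshold) {
        this.zThreshold = zThreshold;
    }

    public boolean isAnomaly(double x) {
        boolean anomaly = false;
        if (n > 1) {
            double std = Math.sqrt(m2 / (n - 1));
            anomaly = std > 0 && Math.abs(x - mean) / std > zThreshold;
        }
        // Welford's update: incorporate x after scoring it
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        return anomaly;
    }
}
```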
In machine learning, one-class classification (OCC), also known as unary classification or class-modelling, tries to identify objects of a specific class amongst all objects, by primarily learning from a training set containing only the objects of that class, although there exist variants of one-class classifiers where counter-examples are used to further refine the classification boundary. This is different from and more difficult than the traditional classification problem, which tries to distinguish between two or more classes with the training set containing objects from all the classes. Examples include the monitoring of helicopter gearboxes, motor failure prediction, or the operational status of a nuclear plant as 'normal': In this scenario, there are few, if any, examples of catastrophic system states; only the statistics of normal operation are known.
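A deliberately simple sketch of the setting (a centroid-distance model, not any specific published OCC method): the model is fit only on examples of the target class, and new points are rejected when they fall too far from the class centroid.

```java
// One-class classification sketch: model the target class by the centroid
// of its training examples, and accept a new point only if its Euclidean
// distance to the centroid is within a threshold calibrated on the
// training data. Trained exclusively on examples of the target class.
public class CentroidOneClass {
    private double[] centroid;
    private double radius;

    public void fit(double[][] positives, double slack) {
        int d = positives[0].length;
        centroid = new double[d];
        for (double[] x : positives) {
            for (int j = 0; j < d; j++) centroid[j] += x[j] / positives.length;
        }
        // Radius = maximum training distance, inflated by a slack factor
        double maxDist = 0.0;
        for (double[] x : positives) {
            maxDist = Math.max(maxDist, distance(x, centroid));
        }
        radius = maxDist * slack;
    }

    public boolean accepts(double[] x) {
        return distance(x, centroid) <= radius;
    }

    private static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(s);
    }
}
```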
Within statistics, oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set. These terms are used in statistical sampling, survey design methodology, and machine learning.
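A sketch of the simplest variant, random oversampling, which duplicates randomly chosen minority-class examples until the classes balance; methods such as SMOTE synthesize new minority examples instead, and random undersampling discards majority examples:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Random oversampling: duplicate randomly chosen minority-class examples
// until both classes are the same size. Assumes the majority list is at
// least as large as the minority list.
public class RandomOversampler {
    public static List<double[]> balance(List<double[]> minority,
                                         List<double[]> majority,
                                         long seed) {
        Random rng = new Random(seed);
        List<double[]> balanced = new ArrayList<>(majority);
        balanced.addAll(minority);
        int deficit = majority.size() - minority.size();
        for (int i = 0; i < deficit; i++) {
            // Sample minority examples with replacement
            balanced.add(minority.get(rng.nextInt(minority.size())));
        }
        return balanced;
    }
}
```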
An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree using a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated, the original data set is too large to process or the characteristics of the data change over time.
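Hoeffding trees (VFDT), the best-known incremental decision-tree family and the basis of MOA's tree learners, make this possible by splitting a leaf only when the observed merit gap between the two best split attributes exceeds the Hoeffding bound

```latex
\[
  \epsilon = \sqrt{\frac{R^{2}\,\ln(1/\delta)}{2n}},
\]
```

where R is the range of the merit measure (e.g. information gain), n is the number of examples seen at the leaf, and δ is the acceptable probability of selecting the wrong attribute; with probability at least 1 − δ, the incremental split agrees with the one a batch learner would choose on the same data.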
The European Joint Conferences on Theory and Practice of Software (ETAPS) is a confederation of (currently) four computer science conferences taking place annually at one conference site, usually at the end of March or in April. Three of the four conferences are top ranked in software engineering and one (ESOP) is top ranked in programming languages.
ELKI is a data mining software framework developed for use in research and teaching. It was originally created by the database systems research unit at the Ludwig Maximilian University of Munich, Germany, led by Professor Hans-Peter Kriegel. The project has continued at the Technical University of Dortmund, Germany. It aims at allowing the development and evaluation of advanced data mining algorithms and their interaction with database index structures.
Astroinformatics is an interdisciplinary field of study involving the combination of astronomy, data science, machine learning, informatics, and information/communications technologies. The field is closely related to astrostatistics.
An infobox is a digital or physical table used to collect and present a subset of information about its subject, such as a document. It is a structured document containing a set of attribute–value pairs, and in Wikipedia represents a summary of information about the subject of an article. In this way, they are comparable to data tables in some aspects. When presented within the larger document it summarizes, an infobox is often presented in a sidebar format.
Structured k-nearest neighbours (SkNN) is a machine learning algorithm that generalizes k-nearest neighbors (k-NN). k-NN supports binary classification, multiclass classification, and regression, whereas SkNN allows training of a classifier for general structured output.
Arthur Zimek is a professor in data mining, data science and machine learning at the University of Southern Denmark in Odense, Denmark.
Runtime predictive analysis is a runtime verification technique in computer science for detecting property violations in program executions inferred from an observed execution. An important class of predictive analysis methods has been developed for detecting concurrency errors in concurrent programs, where a runtime monitor is used to predict errors which did not happen in the observed run, but can happen in an alternative execution of the same program. The predictive capability comes from the fact that the analysis is performed on an abstract model extracted online from the observed execution, which admits a class of executions beyond the observed one.
scikit-multiflow is a free and open-source machine learning library for multi-output/multi-label and stream data, written in Python.
A copy detection pattern (CDP) or graphical code is a small random or pseudo-random digital image which is printed on documents, labels or products for counterfeit detection. Authentication is made by scanning the printed CDP using an image scanner or mobile phone camera. It is possible to store additional product-specific data into the CDP that will be decoded during the scanning process. A CDP can also be inserted into a 2D barcode to facilitate smartphone authentication and to connect with traceability data.
The International Symposium on Experimental Algorithms (SEA), previously known as Workshop on Experimental Algorithms (WEA), is a computer science conference in the area of algorithm engineering.
Yun Sing Koh is a New Zealand computer science academic, and is a full professor at the University of Auckland, specialising in machine learning and artificial intelligence. She is a co-director of the Centre of Machine Learning for Social Good, and the Advanced Machine Learning and Data Analytics Research (MARS) Lab at Auckland.