Instance-based learning

Last updated May 25, 2021

In machine learning, instance-based learning (sometimes called memory-based learning^[1]) is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory. Because computation is postponed until a new instance is observed, these algorithms are sometimes referred to as "lazy."^[2]

It is called instance-based because it constructs hypotheses directly from the training instances themselves.^[3] This means that the hypothesis complexity can grow with the data:^[3] in the worst case, a hypothesis is a list of n training items and the computational complexity of classifying a single new instance is O(n). One advantage that instance-based learning has over other methods of machine learning is its ability to adapt its model to previously unseen data. Instance-based learners may simply store a new instance or throw an old instance away.

Examples of instance-based learning algorithms are the k-nearest neighbors algorithm, kernel machines and RBF networks.^[2]^{:ch. 8} These store (a subset of) their training set; when predicting a value/class for a new instance, they compute distances or similarities between this instance and the training instances to make a decision.

To battle the memory complexity of storing all training instances, as well as the risk of overfitting to noise in the training set, instance reduction algorithms have been proposed.^[4]

Related Research Articles

In computer science, the computational complexity or simply complexity of an algorithm is the amount of resources required to run it. Particular focus is given to time and memory requirements. The complexity of a problem is the complexity of the best algorithms that allow solving the problem.

Complexity characterises the behaviour of a system or model whose components interact in multiple ways and follow local rules, meaning there is no reasonable higher instruction to define the various possible interactions.

Computational complexity theory focuses on classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved by a computer. A computation problem is solvable by mechanical application of mathematical steps, such as an algorithm.

Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

Computational geometry is a branch of computer science devoted to the study of algorithms which can be stated in terms of geometry. Some purely geometrical problems arise out of the study of computational geometric algorithms, and such problems are also considered to be part of computational geometry. While modern computational geometry is a recent development, it is one of the oldest fields of computing with a history stretching back to antiquity.

Machine learning (ML) is the study of computer algorithms that improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.

While logical inference and mathematical proof had existed previously, in 1931 Kurt Gödel proved with his incompleteness theorem that there are fundamental limitations on what statements could be proved or disproved.

Minimum description length (MDL) refers to various formalizations of Occam's razor based on formal languages used to parsimoniously describe data. In its most basic form, MDL is a model selection principle: the shortest description of the data as the best model. These descriptions are intended to provide data-driven scientific models. This idea can be extended to other forms of inductive inference and learning, beyond model selection - it can also be used for estimation and sequential prediction, for example, without explicitly identifying a single model for the data. MDL is an important concept in information theory and computational learning theory. Importantly, the definite noun phrase "the minimum description length principle" is in use for at least three quite different instantiations of this idea, that, while interrelated, vary in what is meant by 'description'. It is mostly used for Jorma Rissanen's theory of learning, in which models are statistical hypotheses and descriptions are defined as universal codes, a central concept of information theory. Secondly, it is still often used to refer to Rissanen's 1978 very first pragmatic attempt to automatically derive short descriptions, related to the Bayesian Information Criterion (BIC). Thirdly, it is often used within Algorithmic Information Theory, where the description length of a data sequence is the length of the smallest program that outputs that data set. In this context, it is also known as 'idealized' MDL principle and it is closely related to Solomonoff's theory of inductive inference, which is that the best model of a data set is embodied by its shortest self-extracting archive.

Solomonoff's theory of inductive inference is a mathematical theory of induction introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory.

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to differ by at most a constant factor.

In computational complexity theory, a complexity class is a set of computational problems of related resource-based complexity. The two most commonly analyzed resources are time and memory.

Learning classifier systems, or LCS, are a paradigm of rule-based machine learning methods that combine a discovery component with a learning component. Learning classifier systems seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions. This approach allows complex solution spaces to be broken up into smaller, simpler parts.

Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics.

A neural network is a network or circuit of neurons, or in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus a neural network is either a biological neural network, made up of real biological neurons, or an artificial neural network, for solving artificial intelligence (AI) problems. The connections of the biological neuron are modeled as weights. A positive weight reflects an excitatory connection, while negative values mean inhibitory connections. All inputs are modified by a weight and summed. This activity is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or it could be −1 and 1.

Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

The cerebellar model arithmetic computer (CMAC) is a type of neural network based on a model of the mammalian cerebellum. It is also known as the cerebellar model articulation controller. It is a type of associative memory.

In computational complexity theory and analysis of algorithms, a number of metrics are defined describing the resources, such as time or space, that a machine needs to solve a particular problem. Interpreting these metrics meaningfully requires context, and this context is frequently implicit and depends on the field and the problem under consideration. This article describes a number of important pieces of context and how they affect metrics.

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

Quantum machine learning is the integration of quantum algorithms within machine learning programs. The most common use of the term refers to machine learning algorithms for the analysis of classical data executed on a quantum computer, i.e. quantum-enhanced machine learning. While machine learning algorithms are used to compute immense quantities of data, quantum machine learning utilizes qubits and quantum operations or specialized quantum systems to improve computational speed and data storage done by algorithms in a program. This includes hybrid methods that involve both classical and quantum processing, where computationally difficult subroutines are outsourced to a quantum device. These routines can be more complex in nature and executed faster on a quantum computer. Furthermore, quantum algorithms can be used to analyze quantum states instead of classical data. Beyond quantum computing, the term "quantum machine learning" is also associated with classical machine learning methods applied to data generated from quantum experiments, such as learning the phase transitions of a quantum system or creating new quantum experiments. Quantum machine learning also extends to a branch of research that explores methodological and structural similarities between certain physical systems and learning systems, in particular neural networks. For example, some mathematical and numerical techniques from quantum physics are applicable to classical deep learning and vice versa. Furthermore, researchers investigate more abstract notions of learning theory with respect to quantum information, sometimes referred to as "quantum learning theory".

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence, its sub-disciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

References

↑ Walter Daelemans; Antal van den Bosch (2005). Memory-Based Language Processing. Cambridge University Press.CS1 maint: discouraged parameter (link)
1 2 Tom Mitchell (1997). Machine Learning. McGraw-Hill.
1 2 Stuart Russell and Peter Norvig (2003). Artificial Intelligence: A Modern Approach , second edition, p. 733. Prentice Hall. ISBN 0-13-080302-2
↑ D. Randall Wilson; Tony R. Martinez (2000). "Reduction techniques for instance-based learning algorithms". Machine Learning .

This artificial intelligence-related article is a stub. You can help Wikipedia by expanding it.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Walter Daelemans; Antal van den Bosch (2005). Memory-Based Language Processing. Cambridge University Press.CS1 maint: discouraged parameter (link)

[mitchell-2] 1 2 Tom Mitchell (1997). Machine Learning. McGraw-Hill.

[aima733-3] 1 2 Stuart Russell and Peter Norvig (2003). Artificial Intelligence: A Modern Approach , second edition, p. 733. Prentice Hall. ISBN 0-13-080302-2

[4] D. Randall Wilson; Tony R. Martinez (2000). "Reduction techniques for instance-based learning algorithms". Machine Learning .

[1]

[2]

[3]

[4]

Instance-based learning

See also

Related Research Articles

References