Computational learning theory

In computer science, computational learning theory (or just learning theory) is a subfield of artificial intelligence devoted to studying the design and analysis of machine learning algorithms. [1]

Overview

Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples, including samples that have not been seen previously by the algorithm. The goal of the supervised learning algorithm is to optimize some measure of performance such as minimizing the number of mistakes made on new samples.
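As a minimal sketch of this setup, the following Python example induces a classifier from labeled mushroom descriptions. The feature names, the training data, and the choice of a simple elimination rule for conjunctions are all assumptions made for illustration; they are not taken from any particular dataset or from the theory itself.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, bool]

# Hypothetical mushroom features, invented for this illustration.
FEATURES = ["smooth_cap", "pleasant_odor", "grows_on_wood"]

# Previously labeled samples: (description, is_edible). Made-up data.
TRAINING: List[Tuple[Sample, bool]] = [
    ({"smooth_cap": True, "pleasant_odor": True, "grows_on_wood": False}, True),
    ({"smooth_cap": True, "pleasant_odor": True, "grows_on_wood": True}, True),
    ({"smooth_cap": True, "pleasant_odor": False, "grows_on_wood": True}, False),
]


def learn_conjunction(labeled: List[Tuple[Sample, bool]],
                      feature_names: List[str]) -> List[str]:
    """Induce a rule: keep only the features true in every edible sample."""
    kept = set(feature_names)
    for sample, is_edible in labeled:
        if is_edible:
            kept = {f for f in kept if sample[f]}
    return sorted(kept)


def classify(rule: List[str], sample: Sample) -> bool:
    """Label a sample edible only if every feature in the rule holds."""
    return all(sample[f] for f in rule)


rule = learn_conjunction(TRAINING, FEATURES)

# Performance measure: number of mistakes on the labeled samples.
training_mistakes = sum(classify(rule, s) != y for s, y in TRAINING)

# The classifier also assigns a label to a sample it has never seen.
unseen = {"smooth_cap": False, "pleasant_odor": True, "grows_on_wood": False}
print(rule, training_mistakes, classify(rule, unseen))
```

The induced rule here is a conjunction of features, which is only one of many possible hypothesis classes. The count of mistakes on the training samples stands in for the performance measure; the quantity of real interest is the number of mistakes on new samples.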

In addition to performance bounds, computational learning theory studies the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results: positive results, which show that a certain class of functions is learnable in polynomial time, and negative results, which show that certain classes cannot be learned in polynomial time.

Negative results often rely on commonly believed but as yet unproven assumptions, such as P ≠ NP in computational complexity and the existence of one-way functions in cryptography.

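There are several different approaches to computational learning theory based on making different assumptions about the inference principles used to generalise from limited data. These include different definitions of probability (see frequency probability, Bayesian probability) and different assumptions on the generation of samples. The different approaches include probably approximately correct (PAC) learning, proposed by Leslie Valiant; [2] VC theory, proposed by Vladimir Vapnik and Alexey Chervonenkis; [3] inductive inference, as developed by Ray Solomonoff; [4] [5] and algorithmic learning theory, from the work of E. Mark Gold. [6]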

While its primary goal is to understand learning abstractly, computational learning theory has led to the development of practical algorithms. For example, PAC theory inspired boosting, VC theory led to support vector machines, and Bayesian inference led to belief networks.

Related Research Articles

Manuel Blum is a Venezuelan-born American computer scientist who received the Turing Award in 1995 "In recognition of his contributions to the foundations of computational complexity theory and its application to cryptography and program checking".

Inferences are steps in reasoning, moving from premises to logical consequences; etymologically, the word infer means to "carry forward". Inference is traditionally divided into deduction and induction, a distinction that in Europe dates at least to Aristotle. Deduction is inference deriving logical conclusions from premises known or assumed to be true, with the laws of valid inference being studied in logic. Induction is inference from particular evidence to a universal conclusion. A third type of inference is sometimes distinguished, notably by Charles Sanders Peirce, who distinguished abduction from induction.

Minimum Description Length (MDL) is a model selection principle where the shortest description of the data is the best model. MDL methods learn through a data compression perspective and are sometimes described as mathematical applications of Occam's razor. The MDL principle can be extended to other forms of inductive inference and learning, for example to estimation and sequential prediction, without explicitly identifying a single model of the data.

In computational learning theory, probably approximately correct (PAC) learning is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant.
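To make "probably approximately correct" concrete, the sketch below computes a standard textbook sample-size bound for one simple setting: a finite hypothesis class that contains the target, and a learner that outputs any hypothesis consistent with the training samples. In that setting, m ≥ (ln|H| + ln(1/δ)) / ε samples suffice for the output to have error at most ε with probability at least 1 − δ. The function name and the example numbers are illustrative assumptions, and the bound does not cover infinite classes or noisy data.

```python
import math


def pac_sample_bound(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Samples sufficient for a consistent learner over a finite class.

    epsilon: tolerated error ("approximately correct").
    delta: tolerated failure probability ("probably").
    """
    if num_hypotheses < 1:
        raise ValueError("the hypothesis class must be non-empty")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / epsilon)


# Example: 2**10 hypotheses (e.g. monotone conjunctions over 10 Boolean
# features), error at most 10%, failure probability at most 5%.
print(pac_sample_bound(2 ** 10, epsilon=0.1, delta=0.05))  # prints 100
```

The bound grows only logarithmically in the size of the hypothesis class and in 1/δ, but linearly in 1/ε, which is why demanding higher accuracy costs more samples than demanding higher confidence.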

Algorithmic learning theory is a mathematical framework for analyzing machine learning problems and algorithms. Synonyms include formal learning theory and algorithmic inductive inference. Algorithmic learning theory is different from statistical learning theory in that it does not make use of statistical assumptions and analysis. Both algorithmic and statistical learning theory are concerned with machine learning and can thus be viewed as branches of computational learning theory.

Ray Solomonoff was an American mathematician who invented algorithmic probability and his General Theory of Inductive Inference, and was a founder of algorithmic information theory. He was an originator of the branch of artificial intelligence based on machine learning, prediction and probability. He circulated the first report on non-semantic machine learning in 1956.

In algorithmic information theory, algorithmic probability, also known as Solomonoff probability, is a mathematical method of assigning a prior probability to a given observation. It was invented by Ray Solomonoff in the 1960s. It is used in inductive inference theory and analyses of algorithms. In his general theory of inductive inference, Solomonoff uses the method together with Bayes' rule to obtain probabilities of prediction for an algorithm's future outputs.

Solomonoff's theory of inductive inference is a mathematical theory of induction introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes' rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory.
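The following toy sketch illustrates only the Bayesian update described above. It replaces the universal prior over all computable theories, which cannot be computed, with a prior of 2^(-length) over three hand-picked deterministic theories. The theories, their description lengths in bits, and the observed sequence are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from typing import Callable, Dict, Tuple


def all_zeros(n: int) -> str:
    return "0" * n


def alternating(n: int) -> str:
    return ("01" * n)[:n]


def period_three(n: int) -> str:
    return ("001" * n)[:n]


# Hypothetical theories: name -> (assumed description length in bits, generator).
THEORIES: Dict[str, Tuple[int, Callable[[int], str]]] = {
    "all_zeros": (2, all_zeros),
    "alternating": (3, alternating),
    "period_three": (5, period_three),
}


def posterior(theories: Dict[str, Tuple[int, Callable[[int], str]]],
              observed: str) -> Dict[str, Fraction]:
    """Bayes' rule with prior 2**-length and a 0/1 likelihood.

    A deterministic theory has likelihood 1 if it reproduces the observed
    data exactly and 0 otherwise.
    """
    weights = {}
    for name, (length, generate) in theories.items():
        prior = Fraction(1, 2 ** length)
        consistent = generate(len(observed)) == observed
        weights[name] = prior if consistent else Fraction(0)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no theory is consistent with the observations")
    return {name: weight / total for name, weight in weights.items()}


post = posterior(THEORIES, "00")
# Probability that the next symbol is "1": sum over theories predicting it.
p_next_one = sum(p for name, p in post.items()
                 if THEORIES[name][1](3)[-1] == "1")
print(post["all_zeros"], post["period_three"], p_next_one)  # 8/9 1/9 1/9
```

Shorter theories receive more prior weight, so after observing "00" the shorter consistent theory dominates the posterior. The longer consistent theory still keeps a positive probability, in keeping with the requirement that the prior be positive for every theory under consideration.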

In formal language theory, in particular in algorithmic learning theory, a class C of languages has finite thickness if every string is contained in at most finitely many languages in C. This condition was introduced by Dana Angluin as a sufficient condition for C being identifiable in the limit.

Algorithmic information theory (AIT) is a branch of theoretical computer science that concerns itself with the relationship between computation and information of computably generated objects, such as strings or any other data structure. In other words, it is shown within algorithmic information theory that computational incompressibility "mimics" the relations or inequalities found in information theory. According to Gregory Chaitin, it is "the result of putting Shannon's information theory and Turing's computability theory into a cocktail shaker and shaking vigorously."

Formal epistemology uses formal methods from decision theory, logic, probability theory and computability theory to model and reason about issues of epistemological interest. Work in this area spans several academic fields, including philosophy, computer science, economics, and statistics. The focus of formal epistemology has tended to differ somewhat from that of traditional epistemology, with topics like uncertainty, induction, and belief revision garnering more attention than the analysis of knowledge, skepticism, and issues with justification.

Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

Concept learning, also known as category learning, concept attainment, and concept formation, is defined by Bruner, Goodnow, & Austin (1967) as "the search for and listing of attributes that can be used to distinguish exemplars from non exemplars of various categories". More simply put, concepts are the mental categories that help us classify objects, events, or ideas, building on the understanding that each object, event, or idea has a set of common relevant features. Thus, concept learning is a strategy which requires a learner to compare and contrast groups or categories that contain concept-relevant features with groups or categories that do not contain concept-relevant features.

Information and Computation is a closed-access computer science journal published by Elsevier. The journal was founded in 1957 under its former name Information and Control and given its current title in 1987. As of July 2022, the current editor-in-chief is David Peleg. The journal publishes 12 issues a year.

Dana Angluin is a professor emeritus of computer science at Yale University. She is known for foundational work in computational learning theory and distributed computing.

Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically. It represents an attempt to unify probabilistic modeling and traditional general purpose programming in order to make the former easier and more widely applicable. It can be used to create systems that help make decisions in the face of uncertainty.

In computational learning theory, Occam learning is a model of algorithmic learning where the objective of the learner is to output a succinct representation of received training data. This is closely related to probably approximately correct (PAC) learning, where the learner is evaluated on its predictive power on a test set.

In PAC learning, error tolerance refers to the ability of an algorithm to learn when the examples received have been corrupted in some way. In fact, this is a very common and important issue since in many applications it is not possible to access noise-free data. Noise can interfere with the learning process at different levels: the algorithm may receive data that have been occasionally mislabeled, or the inputs may have some false information, or the classification of the examples may have been maliciously adulterated.

References

  1. "ACL - Association for Computational Learning".
  2. Valiant, Leslie (1984). "A Theory of the Learnable" (PDF). Communications of the ACM. 27 (11): 1134–1142. doi:10.1145/1968.1972. S2CID 12837541. Archived from the original (PDF) on 2019-05-17. Retrieved 2022-11-24.
  3. Vapnik, V.; Chervonenkis, A. (1971). "On the uniform convergence of relative frequencies of events to their probabilities" (PDF). Theory of Probability and Its Applications. 16 (2): 264–280. doi:10.1137/1116025.
  4. Solomonoff, Ray (March 1964). "A Formal Theory of Inductive Inference Part 1". Information and Control. 7 (1): 1–22. doi:10.1016/S0019-9958(64)90223-2.
  5. Solomonoff, Ray (1964). "A Formal Theory of Inductive Inference Part 2". Information and Control. 7 (2): 224–254. doi:10.1016/S0019-9958(64)90131-7.
  6. Gold, E. Mark (1967). "Language identification in the limit" (PDF). Information and Control. 10 (5): 447–474. doi:10.1016/S0019-9958(67)91165-5.
