Algorithmic learning theory

Algorithmic learning theory is a mathematical framework for analyzing machine learning problems and algorithms. Synonyms include formal learning theory and algorithmic inductive inference.[citation needed] Algorithmic learning theory is different from statistical learning theory in that it does not make use of statistical assumptions and analysis. Both algorithmic and statistical learning theory are concerned with machine learning and can thus be viewed as branches of computational learning theory.[citation needed]

Distinguishing characteristics

Unlike statistical learning theory and most statistical theory in general, algorithmic learning theory does not assume that data are random samples, that is, that data points are independent of each other. This makes the theory suitable for domains where observations are (relatively) noise-free but not random, such as language learning[1] and automated scientific discovery.[2][3]

The fundamental concept of algorithmic learning theory is learning in the limit: as the number of data points increases, a learning algorithm should converge to a correct hypothesis on every possible data sequence consistent with the problem space. This is a non-probabilistic version of statistical consistency, which also requires convergence to a correct model in the limit, but allows a learner to fail on data sequences with probability measure 0.[citation needed]
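Stated a little more formally (a standard formalization offered here only as a sketch, not a quotation from the works cited in this article): a learner L identifies a target in the limit if, on every admissible data sequence e_1, e_2, ..., its conjectures eventually stop changing and are correct,

    % sketch of the convergence requirement; L(e_1, ..., e_n) denotes the
    % hypothesis output after seeing the first n data points
    \exists N \;\; \forall n \ge N : \quad
      L(e_1, \dots, e_n) = L(e_1, \dots, e_N)
      \quad \text{and} \quad L(e_1, \dots, e_N) \text{ is correct for the target.}

Statistical consistency demands the same convergence only with probability 1 under an assumed sampling distribution, which is why a consistent statistical learner may still fail on a measure-zero set of data sequences.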

Algorithmic learning theory investigates the learning power of Turing machines. Other frameworks consider much more restricted classes of learning algorithms than arbitrary Turing machines, for example learners that must compute their hypotheses quickly, say in polynomial time. An example of such a framework is probably approximately correct (PAC) learning.[citation needed]

Learning in the limit

The concept was introduced in E. Mark Gold's seminal paper "Language identification in the limit".[4] The objective of language identification is for a machine running one program to be capable of developing another program by which any given sentence can be tested to determine whether it is "grammatical" or "ungrammatical". The language being learned need not be English or any other natural language; in fact, the definition of "grammatical" can be anything at all that is known to the tester.

In Gold's learning model, the tester gives the learner an example sentence at each step, and the learner responds with a hypothesis, which is a suggested program to determine grammatical correctness. It is required of the tester that every possible sentence (grammatical or not) appears in the list eventually, but no particular order is required. It is required of the learner that at each step the hypothesis must be correct for all the sentences so far.[citation needed]

A particular learner is said to be able to "learn a language in the limit" if there is a certain number of steps beyond which its hypothesis no longer changes.[citation needed] At this point it has indeed learned the language, because every possible sentence appears somewhere in the sequence of inputs (past or future), and the hypothesis is correct for all inputs (past or future), so the hypothesis is correct for every sentence. The learner is not required to be able to tell when it has reached a correct hypothesis; all that is required is that its final hypothesis be correct.
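The interaction just described can be sketched in a few lines of Python. All of the names below (run_protocol, learner, presentation) are illustrative assumptions introduced for this sketch, not terminology from Gold's paper; the learner is supplied as a function that maps the data seen so far to a hypothesis (a predicate on sentences).

    # Sketch of the identification game described above (illustrative only).
    def run_protocol(presentation, learner, steps):
        """presentation: iterable of (sentence, is_grammatical) pairs in which
        every possible sentence eventually appears, in any order.
        learner: maps the list of pairs seen so far to a hypothesis function.
        Returns the hypothesis produced at each step."""
        data_so_far, hypotheses = [], []
        for t, example in enumerate(presentation):
            if t >= steps:
                break
            data_so_far.append(example)
            h = learner(list(data_so_far))
            # Per the model, the hypothesis must fit every sentence seen so far.
            assert all(h(s) == label for s, label in data_so_far)
            hypotheses.append(h)
        return hypotheses

    # A trivial "memorizing" learner: consistent at every step, but it keeps
    # changing as new grammatical sentences arrive, so in general it does not
    # learn a language in the limit.
    def memorizing_learner(data_so_far):
        seen = dict(data_so_far)
        return lambda sentence: seen.get(sentence, False)

    print(len(run_protocol([("ab", True), ("a", False), ("abc", False)],
                           memorizing_learner, steps=3)))  # -> 3

Success in the limit means that, on every admissible presentation, the hypothesis sequence eventually stops changing and the final hypothesis is correct for all sentences; a finite simulation such as the one above can only observe stabilization over the steps it happens to run.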

Gold showed that any language which is defined by a Turing machine program can be learned in the limit by another Turing-complete machine using enumeration.[clarification needed] This is done by the learner testing all possible Turing machine programs in turn until one is found that is consistent with the data so far; this program becomes the hypothesis for the current step. Eventually, the correct program will be reached, after which the hypothesis will never change again (but note that the learner does not know that it will not need to change).
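A hedged sketch of the enumeration idea follows. A faithful version would enumerate all Turing machine programs; to keep the example runnable, the code below enumerates a small hand-written list of candidate deciders, which is an illustrative simplification rather than Gold's actual construction.

    # Identification by enumeration (sketch): always output the first candidate
    # in a fixed ordering that is consistent with all data seen so far.
    def enumeration_learner(candidates):
        """candidates: list of functions sentence -> bool, in a fixed order."""
        def learner(data_so_far):
            for h in candidates:
                if all(h(s) == label for s, label in data_so_far):
                    return h
            raise ValueError("no candidate is consistent with the data")
        return learner

    # Toy target concept: a sentence is "grammatical" iff its length is even.
    candidates = [
        lambda s: s.startswith("a"),   # ruled out once it misclassifies a sentence
        lambda s: len(s) % 2 == 0,     # the target concept
        lambda s: True,                # never reached once the target is found
    ]
    learner = enumeration_learner(candidates)
    data = [("ab", True), ("a", False), ("abc", False), ("abcd", True)]
    for n in range(1, len(data) + 1):
        print(n, candidates.index(learner(data[:n])))  # hypothesis index per step

On this toy sequence the learner outputs candidate 0 after the first example, switches to the target concept (candidate 1) after the second, and never changes again, mirroring the convergence argument in the text; with a genuine enumeration of all programs the same argument applies, but the learner can never announce that convergence has happened.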

Gold also showed that if the learner is given only positive examples (that is, only grammatical sentences appear in the input, not ungrammatical sentences), then the language can only be guaranteed to be learned in the limit if there are only a finite number of possible sentences in the language (this is possible if, for example, sentences are known to be of limited length).[clarification needed]

Language identification in the limit is a highly abstract model. It does not allow for limits of runtime or computer memory which can occur in practice, and the enumeration method may fail if there are errors in the input. However, the framework is very powerful, because if these strict conditions are maintained, it allows the learning of any language that is defined by a computable program.[citation needed] This is because a Turing machine program can be written to mimic any program in any conventional programming language. See the Church–Turing thesis.

Other identification criteria

Learning theorists have investigated other learning criteria,[5] such as efficiency (minimizing the number of data points required before convergence to a correct hypothesis) and mind changes (minimizing the number of hypothesis changes that occur before convergence).[6]

Mind change bounds are closely related to mistake bounds that are studied in statistical learning theory.[7] Kevin Kelly has suggested that minimizing mind changes is closely related to choosing maximally simple hypotheses in the sense of Occam's Razor.[8]
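To make the notion concrete, a mind change is simply a step at which the learner's output hypothesis differs from the one it output at the previous step. A minimal counting sketch, under the same illustrative conventions as the code above (hypotheses are compared by object identity, which suits the enumeration learner because it always returns an element of a fixed candidate list):

    # Count mind changes over a finite run of a learner (illustrative sketch).
    def count_mind_changes(hypotheses):
        """hypotheses: sequence of hypotheses produced, one per step."""
        return sum(1 for prev, cur in zip(hypotheses, hypotheses[1:])
                   if cur is not prev)

A mind change bound for a learning problem then asks that this count stay below a fixed bound on every admissible data sequence, which is the sense in which it parallels a mistake bound.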

Annual conference

Since 1990, there has been an International Conference on Algorithmic Learning Theory (ALT), called a workshop in its first years (1990–1997).[9] Between 1992 and 2016, proceedings were published in the LNCS series.[10] Starting in 2017, they are published by the Proceedings of Machine Learning Research. The 34th conference will be held in Singapore in February 2023.[11] The topics of the conference cover all of theoretical machine learning, including statistical and computational learning theory, online learning, active learning, reinforcement learning, and deep learning.

References

  1. Jain, S. et al. (1999), Systems That Learn, 2nd ed., Cambridge, MA: MIT Press.
  2. Langley, P.; Simon, H.; Bradshaw, G. & Zytkow, J. (1987), Scientific Discovery: Computational Explorations of the Creative Processes, MIT Press, Cambridge.
  3. Schulte, O. (2009), Simultaneous Discovery of Conservation Laws and Hidden Particles With Smith Matrix Decomposition, in Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), pp. 1481–1487.
  4. E. Mark Gold (May 1967). "Language Identification in the Limit". Information and Control. 10 (5): 447–474. doi:10.1016/S0019-9958(67)91165-5.
  5. Jain, S. et al. (1999), Systems That Learn, 2nd ed., Cambridge, MA: MIT Press.
  6. Luo, W. & Schulte, O. (2005), Mind Change Efficient Learning, in Peter Auer & Ron Meir, ed., Proceedings of the Conference on Learning Theory (COLT), pp. 398–412.
  7. Jain, S. and Sharma, A. (1999), On a generalized notion of mistake bounds, Proceedings of the Conference on Learning Theory (COLT), pp. 249–256.
  8. Kevin T. Kelly (2007), Ockham's Razor, Empirical Complexity, and Truth-finding Efficiency, Theoretical Computer Science, 383: 270–289.
  9. Archives of ALT-Workshops and Conferences at Hokkaido University
  10. ALT proceedings page at Springer
  11. ALT'23 home page