Language identification in the limit

Last updated

Language identification in the limit is a formal model for inductive inference of formal languages, mainly by computers (see machine learning and induction of regular languages). It was introduced by E. Mark Gold in a technical report [1] and a journal article [2] with the same title.

Contents

In this model, a teacher provides to a learner some presentation (i.e. a sequence of strings) of some formal language. The learning is seen as an infinite process. Each time the learner reads an element of the presentation, it should provide a representation (e.g. a formal grammar) for the language.

Gold defines that a learner can identify in the limit a class of languages if, given any presentation of any language in the class, the learner will produce only a finite number of wrong representations, and then stick with the correct representation. However, the learner need not be able to announce its correctness; and the teacher might present a counterexample to any representation arbitrarily long after.

Gold defined two types of presentations:

Learnability

This model is an early attempt to formally capture the notion of learnability. Gold's journal article [3] introduces for contrast the stronger models

A weaker formal model of learnability is the Probably approximately correct learning (PAC) model, introduced by Leslie Valiant in 1984.

Examples

4. Complete presentation
by request
TeacherLearner's
GuessQuery
0.abab
1.yesababbaba
2.yesa*(ba)*b*aa
3.no(ab)*(ba)*(ab)*(ba)*bababa
4.yes(ab+ba)*babb
5.no(ab+ba)*baaa
......
3. Complete presentation
by telling
TeacherLearner
1.abababab
2.babaa*(ba)*b*
3.aa(ab)*(ba)*(ab)*(ba)*
4.bababa(ab+ba)*
5.babb(ab+ba)*
6.baaa(ab+ba)*
7.ε(ab+ba)*
......
2. Union-guessing
 
TeacherLearner
1.abababab
2.baabab+ba
3.babaabab+ba+baba
4.baabab+ba+baba
5.babaabab+ba+baba
6.abababab+ba+baba
7.εabab+ba+baba
......
1. Text presentation
 
TeacherLearner
1.abababab
2.babaabab+baba
3.baabab(b+ε)(ab)*
4.baabab(b+ε)(ab)*+baabab
5.abbaabba(ab)*(ba)*(ab)*(ba)*
6.baabbaab(ab+ba)*
7.bababa(ab+ba)*
......

It is instructive to look at concrete examples (in the tables) of learning sessions the definition of identification in the limit speaks about.

  1. A fictitious session to learn a regular language L over the alphabet {a,b} from text presentation:
    In each step, the teacher gives a string belonging to L, and the learner answers a guess for L, encoded as a regular expression. [note 1] In step 3, the learner's guess is not consistent with the strings seen so far; in step 4, the teacher gives a string repeatedly. After step 6, the learner sticks to the regular expression (ab+ba)*. If this happens to be a description of the language L the teacher has in mind, it is said that the learner has learned that language.
    If a computer program for the learner's role would exist that was able to successfully learn each regular language, that class of languages would be identifiable in the limit. Gold has shown that this is not the case. [4]
  2. A particular learning algorithm always guessingL to be just the union of all strings seen so far:
    If L is a finite language, the learner will eventually guess it correctly, however, without being able to tell when. Although the guess didn't change during step 3 to 6, the learner couldn't be sure to be correct.
    Gold has shown that the class of finite languages is identifiable in the limit, [5] however, this class is neither finitely nor fixed-time identifiable.
  3. Learning from complete presentation by telling:
    In each step, the teacher gives a string and tells whether it belongs to L (green) or not (red, struck-out). Each possible string is eventually classified in this way by the teacher.
  4. Learning from complete presentation by request:
    The learner gives a query string, the teacher tells whether it belongs to L (yes) or not (no); the learner then gives a guess for L, followed by the next query string. In this example, the learner happens to query in each step just the same string as given by the teacher in example 3.
    In general, Gold has shown that each language class identifiable in the request-presentation setting is also identifiable in the telling-presentation setting, [6] since the learner, instead of querying a string, just needs to wait until it is eventually given by the teacher.

Gold's theorem

More formally, [7]

Notes:

Gold's theorem (1967) (Theorem I.8 of (Gold, 1967))  If a language family contains , such that

and , then it is not learnable.

Proof

Suppose is a learner that can learn , then we show it cannot learn , by constructing an environment for that "tricks" .

First, construct environments for languages .

Next, construct environment for inductively as follows:

By construction, the resulting environment contains the entirety of , thus it contains , so it is an environment for . Since the learner always switches to for some finite , it never converges to .

Gold's theorem is easily bypassed if negative examples are allowed. In particular, the language family can be learned by a learner that always guesses until it receives the first negative example , where , at which point it always guesses .

Learnability characterization

Dana Angluin gave the characterizations of learnability from text (positive information) in a 1980 paper. [8] If a learner is required to be effective, then an indexed class of recursive languages is learnable in the limit if there is an effective procedure that uniformly enumerates tell-tales for each language in the class (Condition 1). [9] It is not hard to see that if an ideal learner (i.e., an arbitrary function) is allowed, then an indexed class of languages is learnable in the limit if each language in the class has a tell-tale (Condition 2). [10]

Language classes learnable in the limit

Dividing lines between identifiable and nonidentifiable language classes [11]
Learnability modelClass of languages
Anomalous text presentation [note 2]
Recursively enumerable
Recursive
Complete presentation
Primitive recursive [note 3]
Context-sensitive
Context-free
Regular
Superfinite [note 4]
Normal text presentation [note 5]
Finite
Singleton [note 6]

The table shows which language classes are identifiable in the limit in which learning model. On the right-hand side, each language class is a superclass of all lower classes. Each learning model (i.e. type of presentation) can identify in the limit all classes below it. In particular, the class of finite languages is identifiable in the limit by text presentation (cf. Example 2 above), while the class of regular languages is not.

Pattern Languages , introduced by Dana Angluin in another 1980 paper, [12] are also identifiable by normal text presentation; they are omitted in the table, since they are above the singleton and below the primitive recursive language class, but incomparable to the classes in between. [note 7] [ clarification needed ]

Sufficient conditions for learnability

Condition 1 in Angluin's paper [9] is not always easy to verify. Therefore, people come up with various sufficient conditions for the learnability of a language class. See also Induction of regular languages for learnable subclasses of regular languages.

Finite thickness

A class of languages has finite thickness if every non-empty set of strings is contained in at most finitely many languages of the class. This is exactly Condition 3 in Angluin's paper. [13] Angluin showed that if a class of recursive languages has finite thickness, then it is learnable in the limit. [14]

A class with finite thickness certainly satisfies MEF-condition and MFF-condition; in other words, finite thickness implies M-finite thickness. [15]

Finite elasticity

A class of languages is said to have finite elasticity if for every infinite sequence of strings and every infinite sequence of languages in the class , there exists a finite number n such that implies is inconsistent with . [16]

It is shown that a class of recursively enumerable languages is learnable in the limit if it has finite elasticity.

Mind change bound

A bound over the number of hypothesis changes that occur before convergence.

Other concepts

Infinite cross property

A language L has infinite cross property within a class of languages if there is an infinite sequence of distinct languages in and a sequence of finite subset such that:

Note that L is not necessarily a member of the class of language.

It is not hard to see that if there is a language with infinite cross property within a class of languages, then that class of languages has infinite elasticity.

Relations between concepts

Open questions

Notes

  1. "A+B" contains all strings that are in A or in B; "AB" contains all concatenations of a string in A with a string in B; "A*" contains all repetitions (zero or more times) of strings in A; "ε" denotes the empty string; "a" and "b" denote themselves. For example, the expression "(ab+ba)*" in step 7 denotes the infinite set { ε, ab, ba, abab, abba, baab, baba, ababab, ababba, ... }.
  2. i.e. text presentation, where the string given by the teacher is a primitive recursive function of the current step number, and the learner encodes a language guess as a program that enumerates the language
  3. i.e. the class of languages that are decidable by primitive recursive functions
  4. i.e. containing all finite languages and at least one infinite one
  5. i.e. text presentation, except for the anomalous text presentation setting
  6. i.e. the class of languages consisting of a single string (they are mentioned here only as a common lower bound to finite languages and pattern languages)
  7. incomparable to regular and to context-free language class: Theorem 3.10, p.53

Related Research Articles

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

In mathematics, a series is, roughly speaking, a description of the operation of adding infinitely many quantities, one after the other, to a given starting quantity. The study of series is a major part of calculus and its generalization, mathematical analysis. Series are used in most areas of mathematics, even for studying finite structures through generating functions. In addition to their ubiquity in mathematics, infinite series are also widely used in other quantitative disciplines such as physics, computer science, statistics and finance.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

In computability theory, Rice's theorem states that all non-trivial semantic properties of programs are undecidable. A semantic property is one about the program's behavior, unlike a syntactic property. A property is non-trivial if it is neither true for every partial computable function, nor false for every partial computable function.

<span class="mw-page-title-main">Sequence</span> Finite or infinite ordered list of elements

In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members. The number of elements is called the length of the sequence. Unlike a set, the same elements can appear multiple times at different positions in a sequence, and unlike a set, the order does matter. Formally, a sequence can be defined as a function from natural numbers to the elements at each position. The notion of a sequence can be generalized to an indexed family, defined as a function from an arbitrary index set.

<span class="mw-page-title-main">Bernoulli process</span> Random process of binary (boolean) random variables

In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. The component Bernoulli variablesXi are identically distributed and independent. Prosaically, a Bernoulli process is a repeated coin flipping, possibly with an unfair coin. Every variable Xi in the sequence is associated with a Bernoulli trial or experiment. They all have the same Bernoulli distribution. Much of what can be said about the Bernoulli process can also be generalized to more than two outcomes ; this generalization is known as the Bernoulli scheme.

In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to be normal in base b if, for every positive integer n, all possible strings n digits long have density bn.

<span class="mw-page-title-main">Gabriel's horn</span> Geometric figure which has infinite surface area but finite volume

A Gabriel's horn is a type of geometric figure that has infinite surface area but finite volume. The name refers to the Christian tradition where the archangel Gabriel blows the horn to announce Judgment Day. The properties of this figure were first studied by Italian physicist and mathematician Evangelista Torricelli in the 17th century.

In the theory of formal languages, the Myhill–Nerode theorem provides a necessary and sufficient condition for a language to be regular. The theorem is named for John Myhill and Anil Nerode, who proved it at the University of Chicago in 1958.

Solomonoff's theory of inductive inference is a mathematical proof that if a universe is generated by an algorithm, then observations of that universe, encoded as a dataset, are best predicted by the smallest executable archive of that dataset. This formalization of Occam's razor for induction was introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory.

Computability is the ability to solve a problem in an effective manner. It is a key topic of the field of computability theory within mathematical logic and the theory of computation within computer science. The computability of a problem is closely linked to the existence of an algorithm to solve the problem.

In formal language theory, in particular in algorithmic learning theory, a class C of languages has finite thickness if every string is contained in at most finitely many languages in C. This condition was introduced by Dana Angluin as a sufficient condition for C being identifiable in the limit.

Finite model theory is a subarea of model theory. Model theory is the branch of logic which deals with the relation between a formal language (syntax) and its interpretations (semantics). Finite model theory is a restriction of model theory to interpretations on finite structures, which have a finite universe.

In theoretical computer science and mathematical logic a string rewriting system (SRS), historically called a semi-Thue system, is a rewriting system over strings from a alphabet. Given a binary relation between fixed strings over the alphabet, called rewrite rules, denoted by , an SRS extends the rewriting relation to all strings in which the left- and right-hand side of the rules appear as substrings, that is , where , , , and are strings.

In formal language theory, an alphabet is a non-empty set of symbols/glyphs, typically thought of as representing letters, characters, or digits but among other possibilities the "symbols" could also be a set of phonemes. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and depending on its purpose maybe be finite, countable, or even uncountable.

<span class="mw-page-title-main">Grammar induction</span>

Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

In theoretical computer science, a pattern language is a formal language that can be defined as the set of all particular instances of a string of constants and variables. Pattern Languages were introduced by Dana Angluin in the context of machine learning.

In computational learning theory, induction of regular languages refers to the task of learning a formal description of a regular language from a given set of example strings. Although E. Mark Gold has shown that not every regular language can be learned this way, approaches have been investigated for a variety of subclasses. They are sketched in this article. For learning of more general grammars, see Grammar induction.

In theoretical computer science, in particular in formal language theory, the Brzozowski derivative of a set of strings and a string is the set of all strings obtainable from a string in by cutting off the prefix , as illustrated in the figure. Formally: .

<span class="mw-page-title-main">Occam learning</span> Model of algorithmic learning

In computational learning theory, Occam learning is a model of algorithmic learning where the objective of the learner is to output a succinct representation of received training data. This is closely related to probably approximately correct (PAC) learning, where the learner is evaluated on its predictive power of a test set.

References

  1. Gold, E. Mark (1964). Language identification in the limit (RAND Research Memorandum RM-4136-PR). RAND Corporation.
  2. Gold, E. Mark (May 1967). "Language identification in the limit" (PDF). Information and Control. 10 (5): 447–474. doi: 10.1016/S0019-9958(67)91165-5 .
  3. p.457
  4. Theorem I.8,I.9, p.470-471
  5. Theorem I.6, p.469
  6. Theorem I.3, p.467
  7. Johnson, Kent (October 2004). "Gold's Theorem and Cognitive Science". Philosophy of Science. 71 (4): 571–592. doi:10.1086/423752. ISSN   0031-8248. S2CID   5589573.
  8. Dana Angluin (1980). "Inductive Inference of Formal Languages from Positive Data" (PDF). Information and Control. 45 (2): 117–135. doi: 10.1016/S0019-9958(80)90285-5 .
  9. 1 2 p.121 top
  10. p.123 top
  11. Table 1, p.452, in (Gold 1967)
  12. Dana Angluin (1980). "Finding Patterns Common to a Set of Strings". Journal of Computer and System Sciences. 21: 46–62. doi: 10.1016/0022-0000(80)90041-0 .
  13. p.123 mid
  14. p.123 bot, Corollary 2
  15. 1 2 Andris Ambainis; Sanjay Jain; Arun Sharma (1997). "Ordinal mind change complexity of language identification" (PDF). Computational Learning Theory. LNCS. Vol. 1208. Springer. pp. 301–315.; here: Proof of Corollary 29
  16. 1 2 Motoki, Shinohara, and Wright (1991) "The correct definition of finite elasiticity: corrigendum to identification of unions", Proc. 4th Workshop on Computational Learning Theory, 375-375
  17. Wright, Keith (1989) "Identification of Unions of Languages Drawn from an Identifiable Class". Proc. 2nd Workwhop on Computational Learning Theory, 328-333; with correction in: [16]