Indexed language

Last updated

Indexed languages are a class of formal languages discovered by Alfred Aho; [1] they are described by indexed grammars and can be recognized by nested stack automata. [2]

Contents

Indexed languages are a proper subset of context-sensitive languages. [1] They qualify as an abstract family of languages (furthermore a full AFL) and hence satisfy many closure properties. However, they are not closed under intersection or complement. [1]

The class of indexed languages has practical importance in natural language processing as a computationally affordable[ citation needed ] generalization of context-free languages, since indexed grammars can describe many of the nonlocal constraints occurring in natural languages.

Gerald Gazdar (1988) [3] and Vijay-Shanker (1987) [4] introduced a mildly context-sensitive language class now known as linear indexed grammars (LIG). [5] Linear indexed grammars have additional restrictions relative to IG. LIGs are weakly equivalent (generate the same language class) as tree adjoining grammars. [6]

Examples

The following languages are indexed, but are not context-free:

[3]
[2]

These two languages are also indexed, but are not even mildly context sensitive under Gazdar's characterization:

[2]
[3]

On the other hand, the following language is not indexed: [7]

Properties

Hopcroft and Ullman tend to consider indexed languages as a "natural" class, since they are generated by several formalisms, such as: [9]

Hayashi [14] generalized the pumping lemma to indexed grammars. Conversely, Gilman [7] gives a "shrinking lemma" for indexed languages.

See also

Related Research Articles

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.

In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar. Context-sensitive is one of the four types of grammars in the Chomsky hierarchy.

<span class="mw-page-title-main">Context-free grammar</span> Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).

In formal language theory, a context-free grammar, G, is said to be in Chomsky normal form if all of its production rules are of the form:

<span class="mw-page-title-main">Pushdown automaton</span> Type of automaton

In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

<span class="mw-page-title-main">Alfred Aho</span> Canadian computer scientist

Alfred Vaino Aho is a Canadian computer scientist best known for his work on programming languages, compilers, and related algorithms, and his textbooks on the art and science of computer programming.

Sheila Adele Greibach is an American researcher in formal languages in computing, automata, compiler theory and computer science. She is an Emeritus Professor of Computer Science at the University of California, Los Angeles, and notable work include working with Seymour Ginsburg and Michael A. Harrison in context-sensitive parsing using the stack automaton model.

In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree. Every non-empty context-free language admits an ambiguous grammar by introducing e.g. a duplicate rule. A language that only admits ambiguous grammars is called an inherently ambiguous language. Deterministic context-free grammars are always unambiguous, and are an important subclass of unambiguous grammars; there are non-deterministic unambiguous grammars, however.

In computer science, a linear grammar is a context-free grammar that has at most one nonterminal in the right-hand side of each of its productions.

In the theory of formal languages, Ogden's lemma is a generalization of the pumping lemma for context-free languages.

Seymour Ginsburg was an American pioneer of automata theory, formal language theory, and database theory, in particular; and computer science, in general. His work was influential in distinguishing theoretical Computer Science from the disciplines of Mathematics and Electrical Engineering.

In formal language theory, deterministic context-free languages (DCFL) are a proper subset of context-free languages. They are the context-free languages that can be accepted by a deterministic pushdown automaton. DCFLs are always unambiguous, meaning that they admit an unambiguous grammar. There are non-deterministic unambiguous CFLs, so DCFLs form a proper subset of unambiguous CFLs.

Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of flags, or index symbols. The language produced by an indexed grammar is called an indexed language.

An embedded pushdown automaton or EPDA is a computational model for parsing languages generated by tree-adjoining grammars (TAGs). It is similar to the context-free grammar-parsing pushdown automaton, but instead of using a plain stack to store symbols, it has a stack of iterated stacks that store symbols, giving TAGs a generative capacity between context-free and context-sensitive grammars, or a subset of mildly context-sensitive grammars. Embedded pushdown automata should not be confused with nested stack automata which have more computational power.

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.

<span class="mw-page-title-main">LL grammar</span> Type of a context-free grammar

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence. A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

In linguistics, cross-serial dependencies occur when the lines representing the dependency relations between two series of words cross over each other. They are of particular interest to linguists who wish to determine the syntactic structure of natural language; languages containing an arbitrary number of them are non-context-free. By this fact, Dutch and Swiss-German have been proven to be non-context-free.

In computational linguistics, the term mildly context-sensitive grammar formalisms refers to several grammar formalisms that have been developed in an effort to provide adequate descriptions of the syntactic structure of natural language.

References

  1. 1 2 3 4 Aho, Alfred (1968). "Indexed grammars—an extension of context-free grammars". Journal of the ACM . 15 (4): 647–671. doi: 10.1145/321479.321488 . S2CID   9539666.
  2. 1 2 3 Partee, Barbara; ter Meulen, Alice; Wall, Robert E. (1990). Mathematical Methods in Linguistics. Kluwer Academic Publishers. pp. 536–542. ISBN   978-90-277-2245-4.
  3. 1 2 3 Gazdar, Gerald (1988). "Applicability of Indexed Grammars to Natural Languages". In Reyle, U.; Rohrer, C. (eds.). Natural Language Parsing and Linguistic Theories. Studies in Linguistics and Philosophy. Vol. 35. Springer Netherlands. pp. 69–94. doi:10.1007/978-94-009-1337-0_3. ISBN   978-94-009-1337-0.
  4. Vijayashanker, K. (1987). A study of tree adjoining grammars (Thesis). ProQuest   303610666.
  5. Kallmeyer, Laura (2010). Parsing Beyond Context-Free Grammars. Springer. p. 31. ISBN   978-3-642-14846-0.
  6. Kallmeyer, Laura (16 August 2010). Parsing Beyond Context-Free Grammars. Springer. p. 32. ISBN   978-3-642-14846-0.
  7. 1 2 Gilman, Robert H. (1996). "A Shrinking Lemma for Indexed Languages". Theoretical Computer Science. 163 (1–2): 277–281. arXiv: math/9509205 . doi:10.1016/0304-3975(96)00244-7. S2CID   14479068.
  8. Hopcroft, John; Ullman, Jeffrey (1979). Introduction to automata theory, languages, and computation. Addison-Wesley. p. 390. ISBN   978-0-201-02988-8.
  9. Introduction to automata theory, languages, and computation, [8] Bibliographic notes, p.394-395
  10. Aho, Alfred V. (July 1969). "Nested Stack Automata". Journal of the ACM. 16 (3): 383–406. doi: 10.1145/321526.321529 . S2CID   685569.
  11. Fischer, Michael J. (October 1968). "Grammars with macro-like productions". 9th Annual Symposium on Switching and Automata Theory (Swat 1968). 9th Annual Symposium on Switching and Automata Theory (swat 1968). pp. 131–142. doi:10.1109/SWAT.1968.12.
  12. Greibach, Sheila A. (March 1970). "Full AFLs and nested iterated substitution". Information and Control. 16 (1): 7–35. doi:10.1016/s0019-9958(70)80039-0.
  13. Maibaum, T.S.E. (June 1974). "A generalized approach to formal languages". Journal of Computer and System Sciences. 8 (3): 409–439. doi: 10.1016/s0022-0000(74)80031-0 .
  14. Hayashi, Takeshi (1973). "On derivation trees of indexed grammars: an extension of the {$uvwxy$}-theorem". Publications of the Research Institute for Mathematical Sciences. 9 (1): 61–92. doi: 10.2977/prims/1195192738 .