Context-sensitive language

In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar (and equivalently by a noncontracting grammar). Context-sensitive grammars are one of the four types of grammars in the Chomsky hierarchy.

Computational properties

Computationally, a context-sensitive language is equivalent to a linear bounded nondeterministic Turing machine, also called a linear bounded automaton. That is a non-deterministic Turing machine with a tape of only kn cells, where n is the size of the input and k is a constant associated with the machine. This means that every formal language that can be decided by such a machine is a context-sensitive language, and every context-sensitive language can be decided by such a machine.

This set of languages is also known as NLINSPACE or NSPACE(O(n)), because its members can be accepted using linear space on a non-deterministic Turing machine. [1] The class LINSPACE (or DSPACE(O(n))) is defined in the same way, except using a deterministic Turing machine. Clearly LINSPACE is a subset of NLINSPACE, but it is not known whether LINSPACE = NLINSPACE. [2]
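
The decidability claim above can be made concrete by counting configurations: a linear bounded automaton has only finitely many distinct configurations on any fixed input, so acceptance can be settled by an exhaustive search of that space. A minimal Python sketch of the bound (the machine parameters and the function name are illustrative, not taken from any particular construction):

    def lba_configuration_bound(num_states: int, tape_alphabet_size: int,
                                input_length: int, k: int = 1) -> int:
        """Upper bound on the number of distinct configurations of a linear
        bounded automaton: a state, a head position among the k*n cells, and
        one of |Gamma|**(k*n) possible tape contents.  Because this number is
        finite, acceptance can be decided by exhaustive search."""
        cells = k * input_length
        return num_states * cells * tape_alphabet_size ** cells

    # e.g. a hypothetical machine with 10 states and a 4-symbol tape alphabet,
    # run on an input of length 8:
    print(lba_configuration_bound(10, 4, 8))   # 5242880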

Examples

One of the simplest context-sensitive but not context-free languages is L = { a^n b^n c^n : n ≥ 1 }: the language of all strings consisting of n occurrences of the symbol "a", then n "b"s, then n "c"s (abc, aabbcc, aaabbbccc, etc.). A superset of this language, called the Bach language, [3] is defined as the set of all strings where "a", "b" and "c" (or any other set of three symbols) occur equally often (aabccb, baabcaccb, etc.) and is also context-sensitive. [4] [5]
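
Membership in the Bach language only involves comparing three symbol counts; a minimal sketch (assuming the alphabet {a, b, c} as above; the function name is illustrative):

    def in_bach_language(w: str) -> bool:
        """True iff 'a', 'b' and 'c' occur equally often in w and no other
        symbols appear, e.g. 'aabccb' and 'baabcaccb'."""
        return set(w) <= {"a", "b", "c"} and w.count("a") == w.count("b") == w.count("c")

    print(in_bach_language("aabccb"), in_bach_language("aabbc"))   # True False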

L can be shown to be a context-sensitive language by constructing a linear bounded automaton which accepts it. The language can easily be shown to be neither regular nor context-free by applying the respective pumping lemmas for those language classes to L.
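
The linear-space idea behind such an automaton can be sketched in a few lines of Python: the decider below works on a copy of the input cells, crossing off one "a", one "b" and one "c" per sweep, and uses nothing beyond those cells and a constant amount of bookkeeping. It is an illustration of the space bound, not a formal linear bounded automaton, and the names are illustrative:

    def in_L(s: str) -> bool:
        """Decide membership in { a^n b^n c^n : n >= 1 } using only a copy of
        the input cells plus constant bookkeeping, in the spirit of a linear
        bounded automaton."""
        if not s or set(s) - {"a", "b", "c"}:
            return False
        if list(s) != sorted(s):       # since 'a' < 'b' < 'c', sorted means a...ab...bc...c
            return False
        if "a" not in s or "b" not in s or "c" not in s:
            return False
        tape = list(s)
        while "a" in tape:
            for symbol in ("a", "b", "c"):
                try:
                    tape[tape.index(symbol)] = "X"   # cross off one of each per sweep
                except ValueError:
                    return False                     # ran out of one symbol: counts differ
        return "b" not in tape and "c" not in tape

    print(in_L("aabbcc"), in_L("aabbc"), in_L("abcabc"))   # True False False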

Similarly:

L_Cross = { a^m b^n c^m d^n : m ≥ 1, n ≥ 1 } is another context-sensitive language; the corresponding context-sensitive grammar can be obtained by starting with two context-free grammars that generate sentential forms of the shapes a^m C^m and b^n d^n and then supplementing them with a permutation production like Cb → bC, a new start symbol and standard syntactic sugar.
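
One concrete noncontracting grammar realizing this sketch (the nonterminal names and the Python harness below are illustrative) can be checked by a bounded brute-force search over sentential forms:

    from collections import deque

    # A noncontracting grammar sketch for { a^m b^n c^m d^n : m, n >= 1 }.
    # Uppercase symbols are nonterminals, lowercase symbols are terminals.
    RULES = [
        ("S", "XY"),
        ("X", "aXC"), ("X", "aC"),   # X derives a^m C^m
        ("Y", "bYd"), ("Y", "bd"),   # Y derives b^n d^n
        ("Cb", "bC"),                # C migrates to the right, past the b's
        ("Cd", "cd"), ("Cc", "cc"),  # C becomes c once it reaches the c/d block
    ]

    def derives(target: str, start: str = "S") -> bool:
        """Breadth-first search over sentential forms.  Every rule is
        noncontracting, so forms longer than the target can be pruned and
        the search terminates."""
        seen, queue = {start}, deque([start])
        while queue:
            form = queue.popleft()
            if form == target:
                return True
            for lhs, rhs in RULES:
                i = form.find(lhs)
                while i != -1:
                    successor = form[:i] + rhs + form[i + len(lhs):]
                    if len(successor) <= len(target) and successor not in seen:
                        seen.add(successor)
                        queue.append(successor)
                    i = form.find(lhs, i + 1)
        return False

    print(derives("abcd"), derives("aabccd"), derives("abbcdd"))   # True True True
    print(derives("aabbccdd"), derives("abcdabcd"))                # True False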

L_MUL3 = { a^m b^n c^(mn) : m ≥ 1, n ≥ 1 } is another context-sensitive language (the "3" in the name of this language is intended to mean a ternary alphabet); that is, the "product" operation defines a context-sensitive language (while the "sum" defines only a context-free language, as the grammar S → aSc | R, R → bRc | bc shows). Because the product is commutative, the most intuitive grammar for L_MUL3 is ambiguous. This problem can be avoided by considering a somewhat more restrictive definition of the language, e.g. L_ORDMUL3 = { a^m b^n c^(mn) : 1 < m < n }. This can be specialized to L_MUL1 = { a^(mn) : m, n > 1 } and, from this, to { a^(m^2) : m > 1 }, { a^(m^3) : m > 1 }, etc.
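
The "product" condition itself is easy to test directly, which makes the definition of L_MUL3 concrete (a minimal sketch; the function name is illustrative):

    import re

    def in_mul3(w: str) -> bool:
        """True iff w = a^m b^n c^(m*n) with m, n >= 1,
        e.g. 'aabbbcccccc' (m = 2, n = 3) but not 'aabbcc'."""
        match = re.fullmatch(r"(a+)(b+)(c+)", w)
        if not match:
            return False
        m, n, k = (len(g) for g in match.groups())
        return k == m * n

    print(in_mul3("aabbbcccccc"), in_mul3("aabbcc"))   # True False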

{ a^(m^k) : m > 1, k > 1 } is a context-sensitive language. The corresponding context-sensitive grammar can be obtained as a generalization of the context-sensitive grammars for { a^(m^2) : m > 1 }, { a^(m^3) : m > 1 }, etc.

is a context-sensitive language. [6]

L_Primes2 = { w : w is the binary representation of a prime number } is a context-sensitive language (the "2" in the name of this language is intended to mean a binary alphabet). This was proved by Hartmanis using pumping lemmas for regular and context-free languages over a binary alphabet and, after that, sketching a linear bounded multitape automaton accepting L_Primes2. [7]

L_Primes1 = { a^p : p is a prime number } is a context-sensitive language (the "1" in the name of this language is intended to mean a unary alphabet). This was credited by A. Salomaa to Matti Soittola by means of a linear bounded automaton over a unary alphabet [8] (pages 213-214, exercise 6.8) and also to Martti Penttonen by means of a context-sensitive grammar, also over a unary alphabet (see: Formal Languages by A. Salomaa, page 14, Example 2.5).
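
Membership in either prime language can be tested by trial division, which keeps only a constant number of values no larger than the number encoded by the input; that linear-space flavour is what the cited constructions exploit, although the sketch below is only an illustration, not Hartmanis's or Soittola's construction (function names are illustrative):

    def is_prime(k: int) -> bool:
        """Trial division; only a constant number of values no larger than k
        are kept at any time."""
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    def in_primes2(w: str) -> bool:
        """Binary alphabet: w is the binary representation (no leading zero) of a prime."""
        return bool(w) and set(w) <= {"0", "1"} and not w.startswith("0") and is_prime(int(w, 2))

    def in_primes1(w: str) -> bool:
        """Unary alphabet: w = a^p for a prime p."""
        return bool(w) and set(w) == {"a"} and is_prime(len(w))

    print(in_primes2("101"), in_primes1("aaaaa"), in_primes1("aaaa"))   # True True False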

An example of a recursive language that is not context-sensitive is any recursive language whose decision problem is EXPSPACE-hard, for example the set of pairs of equivalent regular expressions with exponentiation.

Properties of context-sensitive languages

Every context-sensitive language is recursive, but there are recursive languages that are not context-sensitive, as the example above shows. The context-sensitive languages are closed under union, intersection, concatenation and Kleene plus. [9] They are also closed under complement; this 1988 result is known as the Immerman–Szelepcsényi theorem. [10]

References

  1. Rothe, Jörg (2005), Complexity theory and cryptology, Texts in Theoretical Computer Science. An EATCS Series, Berlin: Springer-Verlag, p. 77, ISBN 978-3-540-22147-0, MR 2164257.
  2. Odifreddi, P. G. (1999), Classical recursion theory. Vol. II, Studies in Logic and the Foundations of Mathematics, vol. 143, Amsterdam: North-Holland Publishing Co., p. 236, ISBN 978-0-444-50205-6, MR 1718169.
  3. Pullum, Geoffrey K. (1983). Context-freeness and the computer processing of human languages. Proc. 21st Annual Meeting of the ACL.
  4. Bach, E. (1981). "Discontinuous constituents in generalized categorial grammars". Archived 2014-01-21 at the Wayback Machine. NELS, vol. 11, pp. 112.
  5. Joshi, A.; Vijay-Shanker, K.; Weir, D. (1991). "The convergence of mildly context-sensitive grammar formalisms". In: Sells, P.; Shieber, S. M.; Wasow, T. (eds.). Foundational Issues in Natural Language Processing. Cambridge, MA: Bradford.
  6. Example 9.5 (p. 224) of Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley.
  7. Hartmanis, J.; Shank, H. (July 1968). "On the Recognition of Primes by Automata" (PDF). Journal of the ACM. 15 (3): 382–389. doi:10.1145/321466.321470. hdl:1813/5864. S2CID 17998039.
  8. Salomaa, Arto (1969), Theory of Automata, Pergamon, 276 pages, ISBN 978-0-08-013376-8, doi:10.1016/C2013-0-02221-9.
  9. Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 9780201029888; Exercise 9.10, p. 230. In the 2000 edition, the chapter on context-sensitive languages has been omitted.
  10. Immerman, Neil (1988). "Nondeterministic space is closed under complementation" (PDF). SIAM J. Comput. 17 (5): 935–938. CiteSeerX 10.1.1.54.5941. doi:10.1137/0217058. Archived (PDF) from the original on 2004-06-25.