Deterministic context-free language

Last updated

In formal language theory, deterministic context-free languages (DCFL) are a proper subset of context-free languages. They are the context-free languages that can be accepted by a deterministic pushdown automaton. DCFLs are always unambiguous, meaning that they admit an unambiguous grammar. There are non-deterministic unambiguous CFLs, so DCFLs form a proper subset of unambiguous CFLs.

Contents

DCFLs are of great practical interest, as they can be parsed in linear time, and various restricted forms of DCFGs admit simple practical parsers. They are thus widely used throughout computer science.

Description

The notion of the DCFL is closely related to the deterministic pushdown automaton (DPDA). It is where the language power of pushdown automata is reduced to if we make them deterministic; the pushdown automata become unable to choose between different state-transition alternatives and as a consequence cannot recognize all context-free languages. [1] Unambiguous grammars do not always generate a DCFL. For example, the language of even-length palindromes on the alphabet of 0 and 1 has the unambiguous context-free grammar S → 0S0 | 1S1 | ε. An arbitrary string of this language cannot be parsed without reading all its letters first, which means that a pushdown automaton has to try alternative state transitions to accommodate for the different possible lengths of a semi-parsed string. [2]

Properties

Deterministic context-free languages can be recognized by a deterministic Turing machine in polynomial time and O(log2n) space; as a corollary, DCFL is a subset of the complexity class SC . [3]

The set of deterministic context-free languages is closed under the following operations: [4]

The set of deterministic context-free language is not closed under the following operations: [4]

Importance

The languages of this class have great practical importance in computer science as they can be parsed much more efficiently than nondeterministic context-free languages. The complexity of the program and execution time of a deterministic pushdown automaton is vastly less than that of a nondeterministic one. In the naive implementation, the latter must make copies of the stack every time a nondeterministic step occurs. The best known algorithm to test membership in any context-free language is Valiant's algorithm, taking O(n2.378) time, where n is the length of the string. On the other hand, deterministic context-free languages can be accepted in O(n) time by an LR(k) parser. [5] This is very important for computer language translation because many computer languages belong to this class of languages.

See also

Related Research Articles

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by CSG but not by context-free grammars. Context-sensitive grammars are less general than unrestricted grammars. Thus, CSG are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.

In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar. Context-sensitive is one of the four types of grammars in the Chomsky hierarchy.

<span class="mw-page-title-main">Context-free grammar</span> Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form

In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).

<span class="mw-page-title-main">Finite-state machine</span> Mathematical model of computation

A finite-state machine (FSM) or finite-state automaton, finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number of states at any given time. The FSM can change from one state to another in response to some inputs; the change from one state to another is called a transition. An FSM is defined by a list of its states, its initial state, and the inputs that trigger each transition. Finite-state machines are of two types—deterministic finite-state machines and non-deterministic finite-state machines. A deterministic finite-state machine can be constructed equivalent to any non-deterministic one.

<span class="mw-page-title-main">Pushdown automaton</span> Type of automaton

In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

In theoretical computer science and formal language theory, a regular grammar is a grammar that is right-regular or left-regular. While their exact definition varies from textbook to textbook, they all require that

Sheila Adele Greibach is a researcher in formal languages in computing, automata, compiler theory and computer science. She is an Emeritus Professor of Computer Science at the University of California, Los Angeles, and notable work include working with Seymour Ginsburg and Michael A. Harrison in context-sensitive parsing using the stack automaton model.

In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree, while an unambiguous grammar is a context-free grammar for which every valid string has a unique leftmost derivation or parse tree. Many languages admit both ambiguous and unambiguous grammars, while some languages admit only ambiguous grammars. Any non-empty language admits an ambiguous grammar by taking an unambiguous grammar and introducing a duplicate rule or synonym. A language that only admits ambiguous grammars is called an inherently ambiguous language, and there are inherently ambiguous context-free languages. Deterministic context-free grammars are always unambiguous, and are an important subclass of unambiguous grammars; there are non-deterministic unambiguous grammars, however.

In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if

In the theory of computation and automata theory, the powerset construction or subset construction is a standard method for converting a nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA) which recognizes the same formal language. It is important in theory because it establishes that NFAs, despite their additional flexibility, are unable to recognize any language that cannot be recognized by some DFA. It is also important in practice for converting easier-to-construct NFAs into more efficiently executable DFAs. However, if the NFA has n states, the resulting DFA may have up to 2n states, an exponentially larger number, which sometimes makes the construction impractical for large NFAs.

In computer science, a linear grammar is a context-free grammar that has at most one nonterminal in the right-hand side of each of its productions.

In automata theory, a deterministic pushdown automaton is a variation of the pushdown automaton. The class of deterministic pushdown automata accepts the deterministic context-free languages, a proper subset of context-free languages.

In computer science, in particular in automata theory, a two-way finite automaton is a finite automaton that is allowed to re-read its input.

In formal language theory, an alphabet is a non-empty set of symbols/glyphs, typically thought of as representing letters, characters, or digits but among other possibilities the "symbols" could also be a set of phonemes. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and depending on its purpose maybe be finite, countable, or even uncountable.

In formal grammar theory, the deterministic context-free grammars (DCFGs) are a proper subset of the context-free grammars. They are the subset of context-free grammars that can be derived from deterministic pushdown automata, and they generate the deterministic context-free languages. DCFGs are always unambiguous, and are an important subclass of unambiguous CFGs; there are non-deterministic unambiguous CFGs, however.

<span class="mw-page-title-main">Counter automaton</span>

In computer science, more particular in the theory of formal languages, a counter automaton, or counter machine, is a pushdown automaton with only two symbols, and the initial symbol in , the finite set of stack symbols.

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.

<span class="mw-page-title-main">LL grammar</span> Type of a context-free grammar

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence. A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

References

  1. Hopcroft, John; Jeffrey Ullman (1979). Introduction to automata theory, languages, and computation . Addison-Wesley. p. 233.
  2. Hopcroft, John; Rajeev Motwani; Jeffrey Ullman (2001). Introduction to automata theory, languages, and computation 2nd edition . Addison-Wesley. pp. 249–253.
  3. Cook, Stephen A. (April 30 – May 2, 1979). "Deterministic CFL's are accepted simultaneously in polynomial time and log squared space". Proceedings of the eleventh annual ACM Symposium on Theory of Computing - STOC '79. Atlanta. pp. 338–345. doi:10.1145/800135.804426.
  4. 1 2 Hoogeboom, Hendrik; Engelfriet, Joost (2004). Formal Languages and Applications. Springer-Verlag Berlin Heidelberg. p. 128. ISBN   978-3-642-53554-3.
  5. Knuth, D. E. (July 1965). "On the translation of languages from left to right". Information and Control . 8 (6): 607–639. doi: 10.1016/S0019-9958(65)90426-2 .