Deterministic context-free grammar

Last updated

In formal grammar theory, the deterministic context-free grammars (DCFGs) are a proper subset of the context-free grammars. They are the subset of context-free grammars that can be derived from deterministic pushdown automata, and they generate the deterministic context-free languages. DCFGs are always unambiguous, and are an important subclass of unambiguous CFGs; there are non-deterministic unambiguous CFGs, however.

Contents

DCFGs are of great practical interest, as they can be parsed in linear time and in fact a parser can be automatically generated from the grammar by a parser generator. They are thus widely used throughout computer science. Various restricted forms of DCFGs can be parsed by simpler, less resource-intensive parsers, and thus are often used. These grammar classes are referred to by the type of parser that parses them, and important examples are LALR, SLR, and LL.

History

In the 1960s, theoretical research in computer science on regular expressions and finite automata led to the discovery that context-free grammars are equivalent to nondeterministic pushdown automata. [1] [2] [3] These grammars were thought to capture the syntax of computer programming languages. The first high-level computer programming languages were under development at the time (see History of programming languages) and writing compilers was difficult. But using context-free grammars to help automate the parsing part of the compiler simplified the task. Deterministic context-free grammars were particularly useful because they could be parsed sequentially by a deterministic pushdown automaton, which was a requirement due to computer memory constraints. [4] In 1965, Donald Knuth invented the LR(k) parser and proved that there exists an LR(k) grammar for every deterministic context-free language. [5] This parser still required a lot of memory. In 1969 Frank DeRemer invented the LALR and Simple LR parsers, both based on the LR parser and having greatly reduced memory requirements at the cost of less language recognition power. The LALR parser was the stronger alternative. [6] These two parsers have since been widely used in compilers of many computer languages. Recent research has identified methods by which canonical LR parsers may be implemented with dramatically reduced table requirements over Knuth's table-building algorithm. [7]

See also

Related Research Articles

Context-free grammar Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form

In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).

In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language.

In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parsers, and GLR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

In computer science, a Simple LR or SLR parser is a type of LR parser with small parse tables and a relatively simple parser generator algorithm. As with other types of LR(1) parser, an SLR parser is quite efficient at finding the single correct bottom-up parse in a single left-to-right scan over the input stream, without guesswork or backtracking. The parser is mechanically generated from a formal grammar for the language.

In computer science, a canonical LR parser or LR(1) parser is an LR(k) parser for k=1, i.e. with a single lookahead terminal. The special attribute of this parser is that any LR(k) grammar with k>1 can be transformed into an LR(1) grammar. However, back-substitutions are required to reduce k and as back-substitutions increase, the grammar can quickly become large, repetitive and hard to understand. LR(k) can handle all deterministic context-free languages. In the past this LR(k) parser has been avoided because of its huge memory requirements in favor of less powerful alternatives such as the LALR and the LL(1) parser. Recently, however, a "minimal LR(1) parser" whose space requirements are close to LALR parsers, is being offered by several parser generators.

Richard E. Stearns American computer scientist

Richard Edwin Stearns is a prominent computer scientist who, with Juris Hartmanis, received the 1993 ACM Turing Award "in recognition of their seminal paper which established the foundations for the field of computational complexity theory". In 1994 he was inducted as a Fellow of the Association for Computing Machinery.

In computer science, an ambiguous grammar is a context-free grammar for which there exists a string that can have more than one leftmost derivation or parse tree, while an unambiguous grammar is a context-free grammar for which every valid string has a unique leftmost derivation or parse tree. Many languages admit both ambiguous and unambiguous grammars, while some languages admit only ambiguous grammars. Any non-empty language admits an ambiguous grammar by taking an unambiguous grammar and introducing a duplicate rule or synonym. A language that only admits ambiguous grammars is called an inherently ambiguous language, and there are inherently ambiguous context-free languages. Deterministic context-free grammars are always unambiguous, and are an important subclass of unambiguous grammars; there are non-deterministic unambiguous grammars, however.

An operator precedence grammar is a kind of grammar for formal languages.

A GLR parser is an extension of an LR parser algorithm to handle non-deterministic and ambiguous grammars. The theoretical foundation was provided in a 1974 paper by Bernard Lang. It describes a systematic way to produce such algorithms, and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques. The first actual implementation of GLR was described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser". Tomita presented five stages in his original work, though in practice it is the second stage that is recognized as the GLR parser.

This is a list of notable lexer generators and parser generators for various language classes.

In automata theory, a deterministic pushdown automaton is a variation of the pushdown automaton. The class of deterministic pushdown automata accepts the deterministic context-free languages, a proper subset of context-free languages.

In formal language theory, deterministic context-free languages (DCFL) are a proper subset of context-free languages. They are the context-free languages that can be accepted by a deterministic pushdown automaton. DCFLs are always unambiguous, meaning that they admit an unambiguous grammar. There are non-deterministic unambiguous CFLs, so DCFLs form a proper subset of unambiguous CFLs.

In computer science, SYNTAX is a system used to generate lexical and syntactic analyzers (parsers) for all kinds of context-free grammars (CFGs) as well as some classes of contextual grammars. It has been developed at INRIA (France) for several decades, mostly by Pierre Boullier, but has become free software since 2007 only. SYNTAX is distributed under the CeCILL license.

History of compiler construction

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

A lookahead LR parser (LALR) generator is a software tool that reads a BNF grammar and creates an LALR parser which is capable of parsing files written in the computer language defined by the BNF grammar. LALR parsers are desirable because they are very fast and small in comparison to other types of parsers.

<i>Information and Computation</i> Academic journal

Information and Computation is a closed-access computer science journal published by Elsevier. The journal was founded in 1957 under its former name Information and Control and given its current title in 1987. As of July 2022, the current editor-in-chief is David Peleg. The journal publishes 12 issues a year.

LL grammar Type of a context-free grammar

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence. A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing and its variations, are shift-reduce methods. The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions.

JFLAP

JFLAP is interactive educational software written in Java for experimenting with topics in the computer science area of formal languages and automata theory, primarily intended for use at the undergraduate level or as an advanced topic for high school. JFLAP allows one to create and simulate structures, such as programming a finite state machine, and experiment with proofs, such as converting a nondeterministic finite automaton (NFA) to a deterministic finite automaton (DFA).

References

  1. Chomsky, Noam (1962). "Context Free Grammars and Pushdown Storage". Quarterly Progress Report, MIT Research Laboratory in Electronics. 65: 187–194.
  2. Evey, R.J. (1963). "Application of Pushdown-Store Machines". 1963 AFIPS Proceedings of the Fall Joint Computer Conference: 215–227.
  3. Schützenberger, Marcel Paul (1963). "On Context-Free Languages and Push-Down Automata". Information and Control. 6 (3): 246–264. doi: 10.1016/s0019-9958(63)90306-1 .
  4. A. Salomaa; D. Wood; S. Yu, eds. (2001). A Half-Century of Automata Theory . World Scientific Publishing. pp. 38–39.
  5. Knuth, D. E. (July 1965). "On the translation of languages from left to right". Information and Control. 8 (6): 607–639. doi: 10.1016/S0019-9958(65)90426-2 .
  6. Franklin L. DeRemer (1969). "Practical Translators for LR(k) languages" (PDF). MIT, PhD Dissertation. Archived from the original (PDF) on 2012-04-05.
  7. X. Chen, Measuring and extending LR(1) parsing, University of Hawaii PhD dissertation, 2009.