Simple LR parser

Last updated

In computer science, a Simple LR or SLR parser is a type of LR parser with small parse tables and a relatively simple parser generator algorithm. As with other types of LR(1) parser, an SLR parser is quite efficient at finding the single correct bottom-up parse in a single left-to-right scan over the input stream, without guesswork or backtracking. The parser is mechanically generated from a formal grammar for the language.

Contents

SLR and the more-general methods LALR parser and Canonical LR parser have identical methods and similar tables at parse time; they differ only in the mathematical grammar analysis algorithms used by the parser generator tool. SLR and LALR generators create tables of identical size and identical parser states. SLR generators accept fewer grammars than do LALR generators like yacc and Bison. Many computer languages don't readily fit the restrictions of SLR, as is. Bending the language's natural grammar into SLR grammar form requires more compromises and grammar hackery. So LALR generators have become much more widely used than SLR generators, despite being somewhat more complicated tools. SLR methods remain a useful learning step in college classes on compiler theory.

SLR and LALR were both developed by Frank DeRemer as the first practical uses of Donald Knuth's LR parser theory.[ citation needed ] The tables created for real grammars by full LR methods were impractically large, larger than most computer memories of that decade, with 100 times or more parser states than the SLR and LALR methods.[ citation needed ].

Lookahead sets

To understand the differences between SLR and LALR, it is important to understand their many similarities and how they both make shift-reduce decisions. (See the article LR parser now for that background, up through the section on reductions' lookahead sets.)

The one difference between SLR and LALR is how their generators calculate the lookahead sets of input symbols that should appear next, whenever some completed production rule is found and reduced.

SLR generators calculate that lookahead by an easy approximation method based directly on the grammar, ignoring the details of individual parser states and transitions. This ignores the particular context of the current parser state. If some nonterminal symbol S is used in several places in the grammar, SLR treats those places in the same single way rather than handling them individually. The SLR generator works out Follow(S), the set of all terminal symbols which can immediately follow some occurrence of S. In the parse table, each reduction to S uses Follow(S) as its LR(1) lookahead set. Such follow sets are also used by generators for LL top-down parsers. A grammar that has no shift/reduce or reduce/reduce conflicts when using follow sets is called an SLR grammar.

LALR generators calculate lookahead sets by a more precise method based on exploring the graph of parser states and their transitions. This method considers the particular context of the current parser state. It customizes the handling of each grammar occurrence of some nonterminal S. See article LALR parser for further details of this calculation. The lookahead sets calculated by LALR generators are a subset of (and hence better than) the approximate sets calculated by SLR generators. If a grammar has table conflicts when using SLR follow sets, but is conflict-free when using LALR follow sets, it is called a LALR grammar.

Example

A grammar that can be parsed by an SLR parser but not by an LR(0) parser is the following:

(0) S → E
(1) E → 1 E
(2) E → 1

Constructing the action and goto table as is done for LR(0) parsers would give the following item sets and tables:

Item set 0
S → • E
+ E → • 1 E
+ E → • 1
Item set 1
E → 1 • E
E → 1 •
+ E → • 1 E
+ E → • 1
Item set 2
S → E •
Item set 3
E → 1 E •

The action and goto tables:

actiongoto
state1$E
0s12
1s1/r2r23
2acc
3r1r1

As can be observed there is a shift-reduce conflict for state 1 and terminal '1'. This occurs because, when the action table for an LR(0) parser is created, reduce actions are inserted on a per-row basis. However, by using a follow set, reduce actions can be added with finer granularity. The follow set for this grammar:

symbolSE1
following$$1,$

A reduce only needs to be added to a particular action column if that action is in the follow set associated with that reduce. This algorithm describes whether a reduce action must be added to an action column:

function mustBeAdded(reduceAction, action) {   ruleNumber = reduceAction.value;   ruleSymbol = rules[ruleNumber].leftHandSide;   return (action in followSet(ruleSymbol)) }

for example, mustBeAdded(r2, "1") is false, because the left hand side of rule 2 is "E", and 1 is not in E's follow set. Contrariwise, mustBeAdded(r2, "$") is true, because "$" is in E's follow set.

By using mustBeAdded on each reduce action in the action table, the result is a conflict-free action table:

actiongoto
state1$E
0s12
1s1r23
2acc
3r1

See also

Related Research Articles

In formal language theory, a context-free grammar (CFG) is a certain type of formal grammar: a set of production rules that describe all possible strings in a given formal language. Production rules are simple replacements. For example, the rule

In computer science, the Earley parser is an algorithm for parsing strings that belong to a given context-free language, though it may suffer problems with certain nullable grammars. The algorithm, named after its inventor, Jay Earley, is a chart parser that uses dynamic programming; it is mainly used for parsing in computational linguistics. It was first introduced in his dissertation in 1968.

In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language.

In computer science, LR parsers are a type of bottom-up parser that analyses deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parsers, GLR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

In computer science, an LL parser is a top-down parser for a subset of context-free languages. It parses the input from Left to right, performing Leftmost derivation of the sentence.

In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures where each such procedure implements one of the nonterminals of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

In computer science, a canonical LR parser or LR(1) parser is an LR(k) parser for k=1, i.e. with a single lookahead terminal. The special attribute of this parser is that any LR(k) grammar with k>1 can be transformed into an LR(1) grammar. However, back-substitutions are required to reduce k and as back-substitutions increase, the grammar can quickly become large, repetitive and hard to understand. LR(k) can handle all deterministic context-free languages.. In the past this LR(k) parser has been avoided because of its huge memory requirements in favor of less powerful alternatives such as the LALR and the LL(1) parser. Recently, however, a "minimal LR(1) parser" whose space requirements are close to LALR parsers, is being offered by several parser generators.

Parsing, syntax analysis, or syntactic analysis is the process of analysing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

In computer science, parsing reveals the grammatical structure of linear input text, as a first step in working out its meaning. Bottom-up parsing recognizes the text's lowest-level small details first, before its mid-level structures, and leaving the highest-level overall structure to last.

In computer science, a parsing expression grammar, or PEG, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.

XPL is a programming language based on PL/I, a portable one-pass compiler written in its own language, and a parser generator tool for easily implementing similar compilers for other languages. XPL was designed in 1967 as a way to teach compiler design principles and as starting point for students to build compilers for their own languages.

In computer science, an operator precedence parser is a bottom-up parser that interprets an operator-precedence grammar. For example, most calculators use operator precedence parsers to convert from the human-readable infix notation relying on order of operations to a format that is optimized for evaluation such as Reverse Polish notation (RPN).

In computer science, scannerless parsing performs tokenization and parsing in a single step, rather than breaking it up into a pipeline of a lexer followed by a parser, executing concurrently. A language grammar is scannerless if it uses a single formalism to express both the lexical and phrase level structure of the language.

In formal grammar theory, the deterministic context-free grammars (DCFGs) are a proper subset of the context-free grammars. They are the subset of context-free grammars that can be derived from deterministic pushdown automata, and they generate the deterministic context-free languages. DCFGs are always unambiguous, and are an important subclass of unambiguous CFGs; there are non-deterministic unambiguous CFGs, however.

In computer science, recursive ascent parsing is a technique for implementing an LALR parser which uses mutually-recursive functions rather than tables. Thus, the parser is directly encoded in the host language similar to recursive descent. Direct encoding usually yields a parser which is faster than its table-driven equivalent for the same reason that compilation is faster than interpretation. It is also (nominally) possible to hand edit a recursive ascent parser, whereas a tabular implementation is nigh unreadable to the average human.

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

In general science, SLR grammars are the class of formal grammars accepted by a Simple LR parser. SLR grammars are a superset of all LR(0) grammars and a subset of all LALR(1) and LR(1) grammars.

A lookahead left-to-right (LALR) parser generator is a software tool that reads a BNF grammar and creates an LALR parser which is capable of parsing files written in the computer language defined by the BNF grammar. LALR parsers are desirable because they are very fast and small in comparison to other types of parsers.

LL grammar

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence. A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing and its variations, are shift-reduce methods. The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions.