Simple LR parser

Last updated November 08, 2024

In computer science, a Simple LR or SLR parser is a type of LR parser with small parse tables and a relatively simple parser generator algorithm. As with other types of LR(1) parser, an SLR parser is quite efficient at finding the single correct bottom-up parse in a single left-to-right scan over the input stream, without guesswork or backtracking. The parser is mechanically generated from a formal grammar for the language.

SLR and the more general methods LALR parser and Canonical LR parser have identical methods and similar tables at parse time; they differ only in the mathematical grammar analysis algorithms used by the parser generator tool. SLR and LALR generators create tables of identical size and identical parser states. SLR generators accept fewer grammars than LALR generators like yacc and Bison.^{[ citation needed ]} Many computer languages don't readily fit the restrictions of SLR, as is. Bending the language's natural grammar into SLR grammar form requires more compromises and grammar hackery. So LALR generators have become much more widely used than SLR generators, despite being somewhat more complicated tools. SLR methods remain a useful learning step in college classes on compiler theory.^{[ citation needed ]}

SLR and LALR were both developed by Frank DeRemer as the first practical uses of Donald Knuth's LR parser theory.^[1]^[2] The tables created for real grammars by full LR methods were impractically large, larger than most computer memories of that decade, with 100 times or more parser states than the SLR and LALR methods.^[3]

Lookahead sets

To understand the differences between SLR and LALR, it is important to understand their many similarities and how they both make shift-reduce decisions. (See the article LR parser now for that background, up through the section on reductions' lookahead sets.)

The one difference between SLR and LALR is how their generators calculate the lookahead sets of input symbols that should appear next, whenever some completed production rule is found and reduced.

SLR generators calculate that lookahead by an easy approximation method based directly on the grammar, ignoring the details of individual parser states and transitions. This ignores the particular context of the current parser state. If some nonterminal symbol S is used in several places in the grammar, SLR treats those places in the same single way rather than handling them individually. The SLR generator works out Follow(S), the set of all terminal symbols which can immediately follow some occurrence of S. In the parse table, each reduction to S uses Follow(S) as its LR(1) lookahead set. Such follow sets are also used by generators for LL top-down parsers. A grammar that has no shift/reduce or reduce/reduce conflicts when using follow sets is called an SLR grammar. ^{[ citation needed ]}

LALR generators calculate lookahead sets by a more precise method based on exploring the graph of parser states and their transitions. This method considers the particular context of the current parser state. It customizes the handling of each grammar occurrence of some nonterminal S. See article LALR parser for further details of this calculation. The lookahead sets calculated by LALR generators are a subset of (and hence better than) the approximate sets calculated by SLR generators. If a grammar has table conflicts when using SLR follow sets, but is conflict-free when using LALR follow sets, it is called a LALR grammar. ^{[ citation needed ]}

Example

A grammar that can be parsed by an SLR parser but not by an LR(0) parser is the following:

(0) S → E

(1) E → 1 E

(2) E → 1

Constructing the action and goto table as is done for LR(0) parsers would give the following item sets and tables:

Item set 0: S → • E; + E → • 1 E; + E → • 1

Item set 1: E → 1 • E; E → 1 •; + E → • 1 E; + E → • 1

Item set 2: S → E •

Item set 3: E → 1 E •

The action and goto tables:

	action		goto
state	1	$	E
0	s1		2
1	s1/r2	r2	3
2		acc
3	r1	r1

As can be observed there is a shift-reduce conflict for state 1 and terminal '1'. This occurs because, when the action table for an LR(0) parser is created, reduce actions are inserted on a per-row basis. However, by using a follow set, reduce actions can be added with finer granularity. The follow set for this grammar:

symbol	S	E	1
following	$	$	1,$

A reduce only needs to be added to a particular action column if that action is in the follow set associated with that reduce. This algorithm describes whether a reduce action must be added to an action column:

function mustBeAdded(reduceAction, action) {     ruleNumber = reduceAction.value;     ruleSymbol = rules[ruleNumber].leftHandSide;     return (action in followSet(ruleSymbol)) }

for example, mustBeAdded(r2, "1") is false, because the left hand side of rule 2 is "E", and 1 is not in E's follow set. Contrariwise, mustBeAdded(r2, "$") is true, because "$" is in E's follow set.

By using mustBeAdded on each reduce action in the action table, the result is a conflict-free action table:

	action		goto
state	1	$	E
0	s1		2
1	s1	r2	3
2		acc
3		r1

Related Research Articles

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

In computer science, the Earley parser is an algorithm for parsing strings that belong to a given context-free language, though it may suffer problems with certain nullable grammars. The algorithm, named after its inventor, Jay Earley, is a chart parser that uses dynamic programming; it is mainly used for parsing in computational linguistics. It was first introduced in his dissertation in 1968.

In computer science, an LALR parser is part of the compiling process where human readable text is converted into a structured representation to be read by computers. An LALR parser is a software tool to process (parse) text into a very specific internal representation that other programs, such as compilers, can work with. This process happens according to a set of production rules specified by a formal grammar for a computer language.

In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, canonical LR(1) parsers, minimal LR(1) parsers, and generalized LR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures where each such procedure implements one of the nonterminals of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

A canonical LR parser is a type of bottom-up parsing algorithm used in computer science to analyze and process programming languages. It is based on the LR parsing technique, which stands for "left-to-right, rightmost derivation in reverse."

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

The dangling else is a problem in programming of parser generators in which an optional else clause in an if–then(–else) statement results in nested conditionals being ambiguous. Formally, the reference context-free grammar of the language is ambiguous, meaning there is more than one correct parse tree.

In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.

XPL, for expert's programming language is a programming language based on PL/I, a portable one-pass compiler written in its own language, and a parser generator tool for easily implementing similar compilers for other languages. XPL was designed in 1967 as a way to teach compiler design principles and as starting point for students to build compilers for their own languages.

In computer science, an operator-precedence parser is a bottom-up parser that interprets an operator-precedence grammar. For example, most calculators use operator-precedence parsers to convert from the human-readable infix notation relying on order of operations to a format that is optimized for evaluation such as Reverse Polish notation (RPN).

In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language and a suffix. For instance, $can be recognized as a sum because it can be broken into, also a sum, and, a suitable suffix.$

This is a list of notable lexer generators and parser generators for various language classes.

In formal grammar theory, the deterministic context-free grammars (DCFGs) are a proper subset of the context-free grammars. They are the subset of context-free grammars that can be derived from deterministic pushdown automata, and they generate the deterministic context-free languages. DCFGs are always unambiguous, and are an important subclass of unambiguous CFGs; there are non-deterministic unambiguous CFGs, however.

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

SLR grammars are the class of formal grammars accepted by a Simple LR parser. SLR grammars are a superset of all LR(0) grammars and a subset of all LALR(1) and LR(1) grammars.

A lookahead LR parser (LALR) generator is a software tool that reads a context-free grammar (CFG) and creates an LALR parser which is capable of parsing files written in the context-free language defined by the CFG. LALR parsers are desirable because they are very fast and small in comparison to other types of parsers.

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence. A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing and its variations, are shift-reduce methods. The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions.

References

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] ttps://wiki.eecs.yorku.ca/course_archive/2013-14/W/6339/_media/lrk.pdf

[2] ttps://www.seas.upenn.edu/~cis5110/notes/cis511-sl9.pdf

[3] ttps://books.google.com/books?id=nEA9AAAAIAAJ&pg=PA87

[1]

[2]

[3]

v t e Parsing algorithms
Top-down	Earley LL Recursive descent Tail recursive
Bottom-up	Precedence Simple Operator Shunting-yard LR Simple Look-ahead Canonical Generalized CYK Recursive ascent Shift-reduce
Mixed, other	Combinator Chart Left corner Statistical
Related topics	PEG Definite clause grammar Deterministic parsing Dynamic programming Memoization Parser generator LALR Parse tree AST Scannerless parsing History of compiler construction Comparison of parser generators Operator-precedence grammar

Simple LR parser

Contents

Lookahead sets

Example

See also

Related Research Articles

References