Simple precedence parser

Last updated

In computer science, a simple precedence parser is a type of bottom-up parser for context-free grammars that can be used only by simple precedence grammars.

Contents

The implementation of the parser is quite similar to the generic bottom-up parser. A stack is used to store a viable prefix of a sentential form from a rightmost derivation. The symbols ⋖, ≐ and ⋗ are used to identify the pivot, and to know when to Shift or when to Reduce.

Implementation

SearchProductionToReduce (Stack)

Example

Given following language, which can parse arithmetic expressions with the multiplication and addition operations:

E  --> E + T' | T' T' --> T T  --> T * F  | F F  --> ( E' ) | num E' --> E 

num is a terminal, and the lexer parse any integer as num; E represents an arithmetic expression, T is a term and F is a factor.

and the Parsing table:

EE'TT'F+*()num$
E
E'
T
T'
F
+
*
(
)
num
$
STACK                   PRECEDENCE    INPUT            ACTION  $                            ⋖        2 * ( 1 + 3 )$   SHIFT $ ⋖ 2                        ⋗        * ( 1 + 3 )$     REDUCE (F -> num) $ ⋖ F                        ⋗        * ( 1 + 3 )$     REDUCE (T -> F) $ ⋖ T                        ≐        * ( 1 + 3 )$     SHIFT $ ⋖ T ≐ *                    ⋖        ( 1 + 3 )$       SHIFT $ ⋖ T ≐ * ⋖ (                ⋖        1 + 3 )$         SHIFT $ ⋖ T ≐ * ⋖ ( ⋖ 1            ⋗        + 3 )$           REDUCE 4× (F -> num) (T -> F) (T' -> T) (E ->T ')  $ ⋖ T ≐ * ⋖ ( ⋖ E            ≐        + 3 )$           SHIFT $ ⋖ T ≐ * ⋖ ( ⋖ E ≐ +        ⋖        3 )$             SHIFT $ ⋖ T ≐ * ⋖ ( ⋖ E ≐ + < 3    ⋗        )$               REDUCE 3× (F -> num) (T -> F) (T' -> T)  $ ⋖ T ≐ * ⋖ ( ⋖ E ≐ + ≐ T    ⋗        )$               REDUCE 2× (E -> E + T) (E' -> E) $ ⋖ T ≐ * ⋖ ( ≐ E'           ≐        )$               SHIFT $ ⋖ T ≐ * ⋖ ( ≐ E' ≐ )       ⋗        $                REDUCE (F -> ( E' )) $ ⋖ T ≐ * ≐ F                ⋗        $                REDUCE (T -> T * F) $ ⋖ T                        ⋗        $                REDUCE 2× (T' -> T) (E -> T') $ ⋖ E                                 $                ACCEPT 

Related Research Articles

In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language.

In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parsers, and GLR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in the BNF notation, warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.

In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures where each such procedure implements one of the nonterminals of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

In computer science, a Simple LR or SLR parser is a type of LR parser with small parse tables and a relatively simple parser generator algorithm. As with other types of LR(1) parser, an SLR parser is quite efficient at finding the single correct bottom-up parse in a single left-to-right scan over the input stream, without guesswork or backtracking. The parser is mechanically generated from a formal grammar for the language.

In computer science, a canonical LR parser or LR(1) parser is an LR(k) parser for k=1, i.e. with a single lookahead terminal. The special attribute of this parser is that any LR(k) grammar with k>1 can be transformed into an LR(1) grammar. However, back-substitutions are required to reduce k and as back-substitutions increase, the grammar can quickly become large, repetitive and hard to understand. LR(k) can handle all deterministic context-free languages. In the past this LR(k) parser has been avoided because of its huge memory requirements in favor of less powerful alternatives such as the LALR and the LL(1) parser. Recently, however, a "minimal LR(1) parser" whose space requirements are close to LALR parsers, is being offered by several parser generators.

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters into a sequence of lexical tokens. A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, although scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyze the syntax of programming languages, web pages, and so forth.

In computer programming, the interpreter pattern is a design pattern that specifies how to evaluate sentences in a language. The basic idea is to have a class for each symbol in a specialized computer language. The syntax tree of a sentence in the language is an instance of the composite pattern and is used to evaluate (interpret) the sentence for a client. See also Composite pattern.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

Top-down parsing in computer science is a parsing strategy where one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top-down parsing strategy.

This is a list of operators in the C and C++ programming languages. All the operators listed exist in C++; the column "Included in C", states whether an operator is also present in C. Note that C does not support operator overloading.

In computer science, an operator precedence parser is a bottom-up parser that interprets an operator-precedence grammar. For example, most calculators use operator precedence parsers to convert from the human-readable infix notation relying on order of operations to a format that is optimized for evaluation such as Reverse Polish notation (RPN).

In computer science, a Wirth–Weber relationship between a pair of symbols is necessary to determine if a formal grammar is a simple precedence grammar. In such a case, the simple precedence parser can be used. The relationship is named after computer scientists Niklaus Wirth and Helmut Weber.

An operator precedence grammar is a kind of grammar for formal languages.

In computer science, the shunting yard algorithm is a method for parsing arithmetical or logical expressions, or a combination of both, specified in infix notation. It can produce either a postfix notation string, also known as Reverse Polish notation (RPN), or an abstract syntax tree (AST). The algorithm was invented by Edsger Dijkstra and named the "shunting yard" algorithm because its operation resembles that of a railroad shunting yard. Dijkstra first described the shunting yard algorithm in the Mathematisch Centrum report MR 34/61.

A GLR parser is an extension of an LR parser algorithm to handle non-deterministic and ambiguous grammars. The theoretical foundation was provided in a 1974 paper by Bernard Lang. It describes a systematic way to produce such algorithms, and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques. The first actual implementation of GLR was described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser". Tomita presented five stages in his original work, though in practice it is the second stage that is recognized as the GLR parser.

In computer science, recursive ascent parsing is a technique for implementing an LALR parser which uses mutually-recursive functions rather than tables. Thus, the parser is directly encoded in the host language similar to recursive descent. Direct encoding usually yields a parser which is faster than its table-driven equivalent for the same reason that compilation is faster than interpretation. It is also (nominally) possible to hand edit a recursive ascent parser, whereas a tabular implementation is nigh unreadable to the average human.

SLR grammars are the class of formal grammars accepted by a Simple LR parser. SLR grammars are a superset of all LR(0) grammars and a subset of all LALR(1) and LR(1) grammars.

A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing and its variations, are shift-reduce methods. The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions.

References