LL parser

In computer science, an LL parser (Left-to-right, leftmost derivation) is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a sentence. A grammar is called an LL(k) grammar if an LL(k) parser can be constructed from it. A formal language is called an LL(k) language if it has an LL(k) grammar. The set of LL(k) languages is properly contained in that of LL(k+1) languages, for each k ≥ 0. [1] A corollary of this is that not all context-free languages can be recognized by an LL(k) parser.

An LL parser is called LL-regular (LLR) if it parses an LL-regular language. [2] [3] [4] The class of LLR grammars contains every LL(k) grammar for every k. For every LLR grammar there exists an LLR parser that parses the grammar in linear time.

Two nomenclative outliers are the LL(*) and LL(finite) parser types. A parser is called LL(*) or LL(finite) if it uses the LL(*) or LL(finite) parsing strategy, respectively. [5] [6] LL(*) and LL(finite) parsers are functionally closer to PEG parsers. An LL(finite) parser can parse an arbitrary LL(k) grammar optimally in the amount of lookahead and lookahead comparisons. The class of grammars parsable by the LL(*) strategy encompasses some context-sensitive languages, due to the use of syntactic and semantic predicates, and has not been fully characterized. It has been suggested that LL(*) parsers are better thought of as TDPL parsers. [7] Contrary to popular misconception, LL(*) parsers are not LLR in general, and are guaranteed by construction to perform worse on average (super-linear versus linear time) and far worse in the worst case (exponential versus linear time).

LL grammars, particularly LL(1) grammars, are of great practical interest, as parsers for these grammars are easy to construct, and many computer languages are designed to be LL(1) for this reason. [8] LL parsers may be table-based, i.e. similar to LR parsers, but LL grammars can also be parsed by recursive descent parsers. According to Waite and Goos (1984), [9] LL(k) grammars were introduced by Stearns and Lewis (1969). [10]

Overview

For a given context-free grammar, the parser attempts to find the leftmost derivation. Given the example grammar G:

  1. S → E
  2. E → ( E + E )
  3. E → i

the leftmost derivation for the input w = ( ( i + i ) + i ) is:

S → E → ( E + E ) → ( ( E + E ) + E ) → ( ( i + E ) + E ) → ( ( i + i ) + E ) → ( ( i + i ) + i )

Generally, there are multiple possibilities when selecting a rule to expand the leftmost non-terminal. In step 2 of the previous example, the parser must choose whether to apply rule 2 or rule 3:

S → E → ( E + E )   (rule 2)
S → E → i           (rule 3)

To be efficient, the parser must be able to make this choice deterministically when possible, without backtracking. For some grammars, it can do this by peeking at the unread input (without consuming it). In our example, if the parser knows that the next unread symbol is '(', the only correct rule that can be used is 2.

Generally, an LL(k) parser can look ahead at k symbols. However, given a grammar, the problem of determining whether there exists an LL(k) parser for some k that recognizes it is undecidable. For each k, there is a language that cannot be recognized by an LL(k) parser, but can be by an LL(k+1) parser.

We can use the above analysis to give the following formal definition:

Let G be a context-free grammar and k ≥ 1. We say that G is LL(k), if and only if for any two leftmost derivations:

S → … → wAα → … → wβα → … → wu
S → … → wAα → … → wγα → … → wv

the following condition holds: if the prefix of the string u of length k equals the prefix of the string v of length k, then β = γ.

In this definition, S is the start symbol and A any non-terminal. The already derived input w, and the yet unread u and v, are strings of terminals. The Greek letters α, β and γ represent any string of both terminals and non-terminals (possibly empty). The prefix length k corresponds to the lookahead buffer size, and the definition says that this buffer is enough to distinguish between any two derivations of different words.

Parser

The parser is a deterministic pushdown automaton with the ability to peek at the next k input symbols without reading them. This peek capability can be emulated by storing the lookahead buffer contents in the finite state space, since both the buffer and the input alphabet are finite in size. As a result, this does not make the automaton more powerful, but it is a convenient abstraction.

The stack alphabet is Γ = N ∪ Σ, where:

    • N is the set of non-terminal symbols;
    • Σ is the set of terminal (input) symbols, with a special end-of-input (EOI) symbol $.

The parser stack initially contains the starting symbol above the EOI: [ S, $ ]. During operation, the parser repeatedly replaces the symbol X on top of the stack:

    • with some α, if X ∈ N and there is a rule X → α;
    • with ε (i.e. X is popped off the stack), if X ∈ Σ. In this case, an input symbol x is read, and if x ≠ X, the input is rejected.

If the last symbol to be removed from the stack is the EOI, the parsing is successful; the automaton accepts via an empty stack.

The states and the transition function are not explicitly given; they are specified (generated) using a more convenient parse table instead. The table provides the following mapping: from the pair (current top-of-stack symbol X, current lookahead buffer contents) to the grammar rule to apply to X.

If the parser cannot perform a valid transition, the input is rejected (empty cells). To make the table more compact, only the non-terminal rows are commonly displayed, since the action is the same for terminals.

Concrete example

Set up

To explain an LL(1) parser's workings we will consider the following small LL(1) grammar:

  1. S → F
  2. S → ( S + F )
  3. F → a

and parse the following input:

( a + a )

An LL(1) parsing table for a grammar has a row for each of the non-terminals and a column for each terminal (including the special terminal, represented here as $, that is used to indicate the end of the input stream).

Each cell of the table may point to at most one rule of the grammar (identified by its number). For example, in the parsing table for the above grammar, the cell for the non-terminal 'S' and terminal '(' points to the rule number 2:

      (   )   a   +   $
S     2       1
F             3

The algorithm to construct a parsing table is described in a later section, but first let's see how the parser uses the parsing table to process its input.

Parsing procedure

In each step, the parser reads the next-available symbol from the input stream, and the top-most symbol from the stack. If the input symbol and the stack-top symbol match, the parser discards them both, leaving only the unmatched symbols in the input stream and on the stack.

Thus, in its first step, the parser reads the input symbol '(' and the stack-top symbol 'S'. The parsing table instruction comes from the column headed by the input symbol '(' and the row headed by the stack-top symbol 'S'; this cell contains '2', which instructs the parser to apply rule (2). The parser has to rewrite 'S' to '( S + F )' on the stack by removing 'S' from stack and pushing ')', 'F', '+', 'S', '(' onto the stack, and this writes the rule number 2 to the output. The stack then becomes:

[ (, S, +, F, ), $ ]

In the second step, the parser removes the '(' from its input stream and from its stack, since they now match. The stack now becomes:

[ S, +, F, ), $ ]

Now the parser has an 'a' on its input stream and an 'S' as its stack top. The parsing table instructs it to apply rule (1) from the grammar and write the rule number 1 to the output stream. The stack becomes:

[ F, +, F, ), $ ]

The parser now has an 'a' on its input stream and an 'F' as its stack top. The parsing table instructs it to apply rule (3) from the grammar and write the rule number 3 to the output stream. The stack becomes:

[ a, +, F, ), $ ]

The parser now has an 'a' on the input stream and an 'a' at its stack top. Because they are the same, it removes the 'a' from the input stream and pops it from the top of the stack. The parser then has a '+' on the input stream and a '+' at the top of the stack, so, as with 'a', the symbol is popped from the stack and removed from the input stream. This results in:

[ F, ), $ ]

In the next three steps the parser will replace 'F' on the stack by 'a', write the rule number 3 to the output stream and remove the 'a' and ')' from both the stack and the input stream. The parser thus ends with '$' on both its stack and its input stream.

In this case the parser will report that it has accepted the input string and write the following list of rule numbers to the output stream:

[ 2, 1, 3, 3 ]

This is indeed a list of rules for a leftmost derivation of the input string, which is:

S → ( S + F ) → ( F + F ) → ( a + F ) → ( a + a )
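The output rule list can be replayed to rebuild this derivation. A minimal sketch, assuming the rule numbering above; the helper name `replay` is hypothetical:

```python
# Replay a list of rule numbers as a leftmost derivation of the example
# grammar: 1. S -> F, 2. S -> ( S + F ), 3. F -> a
RULES = {1: ("S", ["F"]),
         2: ("S", ["(", "S", "+", "F", ")"]),
         3: ("F", ["a"])}

def replay(output):
    sentential = ["S"]
    steps = [" ".join(sentential)]
    for n in output:
        lhs, rhs = RULES[n]
        # find the leftmost nonterminal and expand it with the given rule
        i = next(j for j, s in enumerate(sentential) if s in ("S", "F"))
        assert sentential[i] == lhs, "rule does not apply to leftmost nonterminal"
        sentential[i:i + 1] = rhs
        steps.append(" ".join(sentential))
    return steps

print(" -> ".join(replay([2, 1, 3, 3])))
# S -> ( S + F ) -> ( F + F ) -> ( a + F ) -> ( a + a )
```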

Parser implementation in C++

Below follows a C++ implementation of a table-based LL parser for the example language:

#include <iostream>
#include <map>
#include <stack>

enum Symbols {
    // the symbols:
    // Terminal symbols:
    TS_L_PARENS,  // (
    TS_R_PARENS,  // )
    TS_A,         // a
    TS_PLUS,      // +
    TS_EOS,       // $, in this case corresponds to '\0'
    TS_INVALID,   // invalid token

    // Non-terminal symbols:
    NTS_S,        // S
    NTS_F         // F
};

/* Converts a valid token to the corresponding terminal symbol */
Symbols lexer(char c)
{
    switch (c)
    {
    case '(':  return TS_L_PARENS;
    case ')':  return TS_R_PARENS;
    case 'a':  return TS_A;
    case '+':  return TS_PLUS;
    case '\0': return TS_EOS;  // end of stack: the $ terminal symbol
    default:   return TS_INVALID;
    }
}

int main(int argc, char **argv)
{
    using namespace std;

    if (argc < 2)
    {
        cout << "usage:\n\tll '(a+a)'" << endl;
        return 0;
    }

    // LL parser table, maps <non-terminal, terminal> pair to action
    map<Symbols, map<Symbols, int>> table;
    stack<Symbols> ss;  // symbol stack
    char *p;            // input buffer

    // initialize the symbols stack
    ss.push(TS_EOS);    // terminal, $
    ss.push(NTS_S);     // non-terminal, S

    // initialize the symbol stream cursor
    p = &argv[1][0];

    // set up the parsing table
    table[NTS_S][TS_L_PARENS] = 2;
    table[NTS_S][TS_A] = 1;
    table[NTS_F][TS_A] = 3;

    while (ss.size() > 0)
    {
        if (lexer(*p) == ss.top())
        {
            cout << "Matched symbols: " << lexer(*p) << endl;
            p++;
            ss.pop();
        }
        else
        {
            cout << "Rule " << table[ss.top()][lexer(*p)] << endl;
            switch (table[ss.top()][lexer(*p)])
            {
            case 1:  // 1. S → F
                ss.pop();
                ss.push(NTS_F);  // F
                break;

            case 2:  // 2. S → ( S + F )
                ss.pop();
                ss.push(TS_R_PARENS);  // )
                ss.push(NTS_F);        // F
                ss.push(TS_PLUS);      // +
                ss.push(NTS_S);        // S
                ss.push(TS_L_PARENS);  // (
                break;

            case 3:  // 3. F → a
                ss.pop();
                ss.push(TS_A);  // a
                break;

            default:
                cout << "parsing table defaulted" << endl;
                return 0;
            }
        }
    }

    cout << "finished parsing" << endl;
    return 0;
}

Parser implementation in Python

# All constants are indexed from 0
TERM = 0
RULE = 1

# Terminals
T_LPAR = 0
T_RPAR = 1
T_A = 2
T_PLUS = 3
T_END = 4
T_INVALID = 5

# Non-Terminals
N_S = 0
N_F = 1

# Parse table
table = [[1, -1, 0, -1, -1, -1],
         [-1, -1, 2, -1, -1, -1]]

RULES = [[(RULE, N_F)],
         [(TERM, T_LPAR), (RULE, N_S), (TERM, T_PLUS), (RULE, N_F), (TERM, T_RPAR)],
         [(TERM, T_A)]]

stack = [(TERM, T_END), (RULE, N_S)]

def lexical_analysis(inputstring: str) -> list:
    print("Lexical analysis")
    tokens = []
    for c in inputstring:
        if c == "+":
            tokens.append(T_PLUS)
        elif c == "(":
            tokens.append(T_LPAR)
        elif c == ")":
            tokens.append(T_RPAR)
        elif c == "a":
            tokens.append(T_A)
        else:
            tokens.append(T_INVALID)
    tokens.append(T_END)
    print(tokens)
    return tokens

def syntactic_analysis(tokens: list) -> None:
    print("Syntactic analysis")
    position = 0
    while len(stack) > 0:
        (stype, svalue) = stack.pop()
        token = tokens[position]
        if stype == TERM:
            if svalue == token:
                position += 1
                print("pop", svalue)
                if token == T_END:
                    print("input accepted")
            else:
                print("bad term on input:", token)
                break
        elif stype == RULE:
            print("svalue", svalue, "token", token)
            rule = table[svalue][token]
            print("rule", rule)
            for r in reversed(RULES[rule]):
                stack.append(r)
        print("stack", stack)

inputstring = "(a+a)"
syntactic_analysis(lexical_analysis(inputstring))

Remarks

As can be seen from the example, the parser performs three types of steps depending on whether the top of the stack is a nonterminal, a terminal or the special symbol $:

    • If the top is a nonterminal, the parser looks up in the parsing table, on the basis of this nonterminal and the symbol on the input stream, which rule of the grammar to use to replace the nonterminal on the stack; the rule number is written to the output stream. If the parsing table indicates that there is no such rule, the parser reports an error and stops.
    • If the top is a terminal, the parser compares it to the symbol on the input stream; if they are equal, both are removed. If they are not equal, the parser reports an error and stops.
    • If the top is $ and the input stream also contains a $, the parser reports that it has successfully parsed the input; otherwise it reports an error. In both cases the parser stops.

These steps are repeated until the parser stops, and then it will have either completely parsed the input and written a leftmost derivation to the output stream or it will have reported an error.

Constructing an LL(1) parsing table

In order to fill the parsing table, we have to establish what grammar rule the parser should choose if it sees a nonterminal A on the top of its stack and a symbol a on its input stream. It is easy to see that such a rule should be of the form A → w and that the language corresponding to w should have at least one string starting with a. For this purpose we define the First-set of w, written here as Fi(w), as the set of terminals that can be found at the start of some string in w, plus ε if the empty string also belongs to w. Given a grammar with the rules A1 → w1, …, An → wn, we can compute the Fi(wi) and Fi(Ai) for every rule as follows:

  1. initialize every Fi(Ai) with the empty set
  2. add Fi(wi) to Fi(Ai) for every rule Ai → wi, where Fi is defined as follows:
    • Fi(aw') = { a } for every terminal a
    • Fi(Aw') = Fi(A) for every nonterminal A with ε not in Fi(A)
    • Fi(Aw') = (Fi(A) \ { ε }) ∪ Fi(w') for every nonterminal A with ε in Fi(A)
    • Fi(ε) = { ε }
  3. add Fi(wi) to Fi(Ai) for every rule Ai → wi
  4. do steps 2 and 3 until all Fi sets stay the same.
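The steps above amount to a fixed-point iteration. A minimal sketch for the concrete grammar of the earlier example (S → F, S → ( S + F ), F → a); the function names and the 'eps' marker for ε are assumptions of this sketch:

```python
# Fixed-point computation of the Fi sets, following steps 1-4 above.
GRAMMAR = [("S", ["F"]),
           ("S", ["(", "S", "+", "F", ")"]),
           ("F", ["a"])]
NONTERMINALS = {"S", "F"}
EPS = "eps"  # stands for the empty string

def first_of_string(w, fi):
    # the case analysis from step 2, for a string w of grammar symbols
    if not w:
        return {EPS}                      # Fi(eps) = { eps }
    head, tail = w[0], w[1:]
    if head not in NONTERMINALS:
        return {head}                     # Fi(aw') = { a }
    if EPS not in fi[head]:
        return set(fi[head])              # Fi(Aw') = Fi(A)
    return (fi[head] - {EPS}) | first_of_string(tail, fi)

def compute_first(grammar):
    fi = {lhs: set() for lhs, _ in grammar}   # step 1
    changed = True
    while changed:                            # steps 2-4: iterate to a fixed point
        changed = False
        for lhs, rhs in grammar:
            new = first_of_string(rhs, fi)
            if not new <= fi[lhs]:
                fi[lhs] |= new
                changed = True
    return fi

fi = compute_first(GRAMMAR)
for a in sorted(fi):
    print(a, sorted(fi[a]))
# F ['a']
# S ['(', 'a']
```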

The result is the least fixed point solution to the following system:

    Fi(A) ⊇ Fi(w) for every rule A → w
    Fi(aw) ⊇ { a } for every terminal a and string w
    Fi(Aw) ⊇ Fi(A) · Fi(w) for every non-terminal A and string w
    Fi(ε) ⊇ { ε }

where, for sets of words U and V, the truncated product is defined by U · V = { (uv):1 | u ∈ U, v ∈ V }, and w:1 denotes the initial length-1 prefix of words w of length 2 or more, or w itself, if w has length 0 or 1.

Unfortunately, the First-sets are not sufficient to compute the parsing table. This is because a right-hand side w of a rule might ultimately be rewritten to the empty string. So the parser should also use the rule A → w if ε is in Fi(w) and it sees on the input stream a symbol that could follow A. Therefore, we also need the Follow-set of A, written as Fo(A) here, which is defined as the set of terminals a such that there is a string of symbols αAaβ that can be derived from the start symbol. We use $ as a special terminal indicating end of input stream, and S as start symbol.

Computing the Follow-sets for the nonterminals in a grammar can be done as follows:

  1. initialize Fo(S) with { $ } and every other Fo(Ai) with the empty set
  2. if there is a rule of the form Aj → wAiw', then
    • if the terminal a is in Fi(w'), then add a to Fo(Ai)
    • if ε is in Fi(w'), then add Fo(Aj) to Fo(Ai)
    • if w' has length 0, then add Fo(Aj) to Fo(Ai)
  3. repeat step 2 until all Fo sets stay the same.
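The Follow-set computation can be sketched the same way, here for the example grammar with its Fi sets precomputed (the names are illustrative; 'eps' stands for ε and '$' for end of input):

```python
# Fixed-point computation of the Fo sets, following steps 1-3 above.
GRAMMAR = [("S", ["F"]),
           ("S", ["(", "S", "+", "F", ")"]),
           ("F", ["a"])]
NONTERMINALS = {"S", "F"}
FI = {"S": {"(", "a"}, "F": {"a"}}  # First sets; no rule derives eps here
EPS = "eps"

def fi_string(w):
    # Fi of a string of symbols; the eps-free case suffices for this grammar
    if not w:
        return {EPS}
    return {w[0]} if w[0] not in NONTERMINALS else set(FI[w[0]])

def compute_follow(grammar, start="S"):
    fo = {a: ({"$"} if a == start else set()) for a in NONTERMINALS}  # step 1
    changed = True
    while changed:                        # step 3: iterate until stable
        changed = False
        for lhs, rhs in grammar:          # step 2: scan every rule Aj -> w Ai w'
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                tail = fi_string(rhs[i + 1:])
                new = tail - {EPS}
                if EPS in tail:           # w' empty (or nullable): add Fo(Aj)
                    new |= fo[lhs]
                if not new <= fo[sym]:
                    fo[sym] |= new
                    changed = True
    return fo

fo = compute_follow(GRAMMAR)
for a in sorted(fo):
    print(a, sorted(fo[a]))
# F ['$', ')', '+']
# S ['$', '+']
```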

This provides the least fixed point solution to the following system:

    Fo(S) ⊇ { $ }
    Fo(A) ⊇ Fi(w') · Fo(B) for every rule B → wAw'

Now we can define exactly which rules will appear where in the parsing table. If T[A, a] denotes the entry in the table for nonterminal A and terminal a, then

T[A, a] contains the rule A → w if and only if
    a is in Fi(w), or
    ε is in Fi(w) and a is in Fo(A).

Equivalently: T[A, a] contains the rule A → w for each a ∈ Fi(w) · Fo(A).

If the table contains at most one rule in every one of its cells, then the parser will always know which rule it has to use and can therefore parse strings without backtracking. It is in precisely this case that the grammar is called an LL(1) grammar.
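The table-filling rule above can be sketched for the example grammar, with its Fi and Fo sets hardcoded (names and the 'eps' marker are assumptions of this sketch); a cell collecting two rule numbers would signal an LL(1) conflict:

```python
# Build the LL(1) table: T[A, a] contains rule A -> w iff a is in Fi(w),
# or eps is in Fi(w) and a is in Fo(A).
RULES = [("S", ("F",)),
         ("S", ("(", "S", "+", "F", ")")),
         ("F", ("a",))]
FI_RHS = {("F",): {"a"},
          ("(", "S", "+", "F", ")"): {"("},
          ("a",): {"a"}}
FO = {"S": {"$", "+"}, "F": {"$", ")", "+"}}
EPS = "eps"

def build_table(rules):
    table = {}
    for number, (lhs, rhs) in enumerate(rules, start=1):
        select = FI_RHS[rhs] - {EPS}
        if EPS in FI_RHS[rhs]:       # eps-rule: also select on Follow symbols
            select |= FO[lhs]
        for a in sorted(select):
            table.setdefault((lhs, a), []).append(number)
    return table

table = build_table(RULES)
print(table)  # {('S', 'a'): [1], ('S', '('): [2], ('F', 'a'): [3]}
assert all(len(cell) == 1 for cell in table.values())  # LL(1): no conflicts
```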

Constructing an LL(k) parsing table

The construction for LL(1) parsers can be adapted to LL(k) for k > 1 with the following modifications:

    • the truncated product is defined with respect to length-k prefixes: U · V = { (uv):k | u ∈ U, v ∈ V };
    • Fo(S) is initialized with the single word consisting of k end-markers $;

where an input is suffixed by k end-markers $, to fully account for the k lookahead context. This approach eliminates special cases for ε, and can be applied equally well in the LL(1) case.
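The length-k truncated product can be written directly; a small sketch, treating words as plain strings of terminal characters:

```python
# U . V = { (uv):k | u in U, v in V }: concatenate every pair of words
# and keep only the initial length-k prefix of each result.
def truncated_product(U, V, k):
    return {(u + v)[:k] for u in U for v in V}

# With k = 2, only the first two symbols of each concatenation survive:
print(sorted(truncated_product({"ab", "c"}, {"de"}, 2)))  # ['ab', 'cd']
```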

Until the mid-1990s, it was widely believed that LL(k) parsing (for k > 1) was impractical, [11] :263–265 since the parser table would have exponential size in k in the worst case. This perception changed gradually after the release of the Purdue Compiler Construction Tool Set around 1992, when it was demonstrated that many programming languages can be parsed efficiently by an LL(k) parser without triggering the worst-case behavior of the parser. Moreover, in certain cases LL parsing is feasible even with unlimited lookahead. By contrast, traditional parser generators like yacc use LALR(1) parser tables to construct a restricted LR parser with a fixed one-token lookahead.

Conflicts

As described in the introduction, LL(1) parsers recognize languages that have LL(1) grammars, which are a special case of context-free grammars; LL(1) parsers cannot recognize all context-free languages. The LL(1) languages are a proper subset of the LR(1) languages, which in turn are a proper subset of all context-free languages. In order for a context-free grammar to be an LL(1) grammar, certain conflicts must not arise, which we describe in this section.

Terminology

Let A be a non-terminal. FIRST(A) is (defined to be) the set of terminals that can appear in the first position of any string derived from A. FOLLOW(A) is the union over: [12]

  1. FIRST(B) where B is any non-terminal that immediately follows A in the right-hand side of a production rule.
  2. FOLLOW(B) where B is any head of a rule of the form B → wA.

LL(1) conflicts

There are two main types of LL(1) conflicts:

FIRST/FIRST conflict

The FIRST sets of two different grammar rules for the same non-terminal intersect. An example of an LL(1) FIRST/FIRST conflict:

S -> E | E 'a'
E -> 'b' | ε

FIRST(E) = {b, ε} and FIRST(E 'a') = {b, a}, so when the table is drawn, there is a conflict under terminal b in the row of production rules for S.

Special case: left recursion

Left recursion will cause a FIRST/FIRST conflict with all alternatives.

E -> E '+' term | alt1 | alt2

FIRST/FOLLOW conflict

The FIRST and FOLLOW set of a grammar rule overlap. With an empty string (ε) in the FIRST set, it is unknown which alternative to select. An example of an LL(1) conflict:

S -> A 'a' 'b'
A -> 'a' | ε

The FIRST set of A is {a, ε}, and the FOLLOW set is {a}.

Solutions to LL(1) conflicts

Left factoring

A common left-factor is "factored out".

A -> X | X Y Z

becomes

A -> X B
B -> Y Z | ε

Left factoring can be applied when two alternatives start with the same symbol, as in a FIRST/FIRST conflict.
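The transformation can be sketched mechanically; a hypothetical helper (the function name, the fresh nonterminal naming scheme, and the 'eps' marker for ε are assumptions of this sketch):

```python
# Left-factor two alternatives that share a common prefix, as in
# A -> X | X Y Z  becoming  A -> X A', A' -> eps | Y Z.
def left_factor(name, alt1, alt2):
    # find the longest common prefix of the two alternatives
    i = 0
    while i < min(len(alt1), len(alt2)) and alt1[i] == alt2[i]:
        i += 1
    prefix, tail1, tail2 = alt1[:i], alt1[i:], alt2[i:]
    fresh = name + "'"  # fresh nonterminal holding the factored tails
    return {name: [prefix + [fresh]],
            fresh: [tail1 or ["eps"], tail2 or ["eps"]]}

print(left_factor("A", ["X"], ["X", "Y", "Z"]))
# {'A': [['X', "A'"]], "A'": [['eps'], ['Y', 'Z']]}
```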

Another example (more complex) using above FIRST/FIRST conflict example:

S -> E | E 'a'
E -> 'b' | ε

becomes (merging into a single non-terminal)

S -> 'b' | ε | 'b' 'a' | 'a'

then through left-factoring, becomes

S -> 'b' E | E
E -> 'a' | ε

Substitution

Substituting a rule into another rule to remove indirect or FIRST/FOLLOW conflicts. Note that this may cause a FIRST/FIRST conflict.

Left recursion removal

For a general method, see removing left recursion. [13] A simple example of left recursion removal: the following production rule has left recursion on E

E -> E '+' T
E -> T

This rule is nothing but a list of Ts separated by '+'. In regular expression form: T ('+' T)*. So the rule could be rewritten as

E -> T Z
Z -> '+' T Z
Z -> ε

Now there is no left recursion and no conflicts on either of the rules.
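The rewrite used above (immediate left recursion only) can be sketched as a small grammar transformation; the function name, the naming of the helper nonterminal, and the 'eps' marker for ε are assumptions of this sketch:

```python
# Remove immediate left recursion: E -> E '+' T | T becomes
# E -> T E', E' -> '+' T E' | eps (E' plays the role of Z in the text).
def remove_left_recursion(nt, alternatives):
    # split alternatives into left-recursive tails and the rest
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}
    z = nt + "'"  # fresh helper nonterminal
    return {nt: [alt + [z] for alt in others],
            z: [alt + [z] for alt in recursive] + [["eps"]]}

print(remove_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], ['eps']]}
```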

However, not all context-free grammars have an equivalent LL(k)-grammar, e.g.:

S -> A | B
A -> 'a' A 'b' | ε
B -> 'a' B 'b' 'b' | ε

It can be shown that there does not exist any LL(k)-grammar accepting the language generated by this grammar.

Notes

  1. Rosenkrantz, D. J.; Stearns, R. E. (1970). "Properties of Deterministic Top Down Grammars". Information and Control. 17 (3): 226–256. doi: 10.1016/s0019-9958(70)90446-8 .
  2. Jarzabek, Stanislav; Krawczyk, Tomasz (1974). "LL-Regular Grammars". Instytutu Maszyn Matematycznych: 107–119.
  3. Jarzabek, Stanislav; Krawczyk, Tomasz (Nov 1975). "LL-Regular Grammars". Information Processing Letters . 4 (2): 31–37. doi:10.1016/0020-0190(75)90009-5.
  4. David A. Poplawski (Aug 1977). Properties of LL-Regular Languages (Technical Report). Purdue University, Department of Computer Science.
  5. Parr, Terence; Fisher, Kathleen (2011). "LL(*): The Foundation of the ANTLR Parser Generator". ACM SIGPLAN Notices. 46 (6): 425–436. doi:10.1145/1993316.1993548.
  6. Belcak, Peter (2020). "The LL(finite) parsing strategy for optimal LL(k) parsing". arXiv: 2010.07874 [cs.PL].
  7. Ford, Bryan (2004). "Parsing Expression Grammars: A Recognition-Based Syntactic Foundation". ACM SIGPLAN Notices. doi:10.1145/982962.964011.
  8. Pat Terry (2005). Compiling with C# and Java. Pearson Education. pp. 159–164. ISBN   9780321263605.
  9. William M. Waite and Gerhard Goos (1984). Compiler Construction. Texts and Monographs in Computer Science. Heidelberg: Springer. ISBN   978-3-540-90821-0. Here: Sect. 5.3.2, p. 121-127; in particular, p. 123.
  10. Richard E. Stearns and P.M. Lewis (1969). "Property Grammars and Table Machines". Information and Control . 14 (6): 524–549. doi: 10.1016/S0019-9958(69)90312-X .
  11. Fritzson, Peter A. (23 March 1994). Compiler Construction: 5th International Conference, CC '94, Edinburgh, U.K., April 7 - 9, 1994. Proceedings. Springer Science & Business Media. ISBN   978-3-540-57877-2.
  12. "LL Grammars" (PDF). Archived (PDF) from the original on 2010-06-18. Retrieved 2010-05-11.
  13. Modern Compiler Design, Grune, Bal, Jacobs and Langendoen
