Packrat parser

Packrat parser
Class	Parsing grammars that are PEG
Data structure	String
Worst-case performance	or without special handling of iterative combinator
Best-case performance	;
Average performance
Worst-case space complexity

Last updated May 21, 2024

The Packrat parser is a type of parser that shares similarities with the recursive descent parser in its construction. However, it differs because it takes parsing expression grammars (PEGs) as input rather than LL grammars.^[1]

In 1970, Alexander Birman laid the groundwork for packrat parsing by introducing the "TMG recognition scheme" (TS), and "generalized TS" (gTS). TS was based upon Robert M. McClure's TMG compiler-compiler, and gTS was based upon Dewey Val Schorre's META compiler-compiler. Birman's work was later refined by Aho and Ullman; and renamed as Top-Down Parsing Language (TDPL), and Generalized TDPL (GTDPL), respectively. These algorithms were the first of their kind to employ deterministic top-down parsing with backtracking.^[2]^[3]

Bryan Ford developed PEGs as an expansion of GTDPL and TS. Unlike CFGs, PEGs are unambiguous and can match well with machine-oriented languages. PEGs, similar to GTDPL and TS, can also express all LL(k) and LR(k). Bryan also introduced Packrat as a parser that uses memoization techniques on top of a simple PEG parser. This was done because PEGs have an unlimited lookahead capability resulting in a parser with exponential time performance in the worst case.^[2]^[3]

Packrat keeps track of the intermediate results for all mutually recursive parsing functions. Each parsing function is only called once at a specific input position. In some instances of packrat implementation, if there is insufficient memory, certain parsing functions may need to be called multiple times at the same input position, causing the parser to take longer than linear time.^[4]

Syntax

The packrat parser takes in input the same syntax as PEGs: a simple PEG is composed of terminal and nonterminal symbols, possibly interleaved with operators that compose one or several derivation rules.^[2]

Symbols

Nonterminal symbols are indicated with capital letters (e.g., $\{S,E,F,D\}$ )
Terminal symbols are indicated with lowercase (e.g., $\{a,b,z,e,g\}$ )
Expressions are indicated with lower-case Greek letter (e.g., $\{\alpha ,\beta ,\gamma ,\omega ,\tau \}$ $Packrat parser$ )
- Expressions can be a mix of terminal symbols, nonterminal symbols and operators

Operators

Syntax Rules
Operator	Semantics
Sequence $\alpha \beta$	Success: If $\alpha$ and $\beta$ are recognized Failure: If $\alpha$ or $\beta$ are not recognized Consumed: $\alpha$ and $\beta$ in case of success
Ordered choice $\alpha /\beta /\gamma$	Success: If any of $\{\alpha ,\beta ,\gamma \}$ is recognized starting from the left Failure: All of $\{\alpha ,\beta ,\gamma \}$ do not match Consumed: The atomic expression that has generated a success so if multiple succeed the first one is always returned
And predicate $\&\alpha$	Success: If $\alpha$ is recognized Failure: If $\alpha$ is not recognized Consumed: No input is consumed
Not predicate ${\displaystyle$	Success: If $\alpha$ is not recognized Failure: If $\alpha$ is recognized Consumed: No input is consumed
One or more $\alpha +$	Success: Try to recognize $\alpha$ one or multiple time Failure: If $\alpha$ is not recognized Consumed: The maximum number that $\alpha$ is recognized
Zero or more $\alpha *$	Success: Try to recognize $\alpha$ zero or multiple time Failure: Cannot fail Consumed: The maximum number that $\alpha$ is recognized
Zero or one ${\displaystyle \alpha$	Success: Try to recognize $\alpha$ zero or once Failure: Cannot fail Consumed: $\alpha$ if it is recognized
Terminal range [ $a-b$ ]	Success: Recognize any terminal $c$ that are inside the range $[a-b]$ . In the case of $[{\textbf {'}}h{\textbf {'}}-{\textbf {'}}z{\textbf {'}}]$ , $c$ can be any letter from h to z Failure: If no terminal inside of $[a-b]$ can be recognized Consumed: $c$ if it is recognized
Any character $.$	Success: Recognize any character in the input Failure: If no character in the input Consumed: any character in the input

Rules

A derivation rule is composed by a nonterminal symbol and an expression $S\rightarrow \alpha$ .

A special expression $\alpha _{s}$ is the starting point of the grammar.^[2] In case no $\alpha _{s}$ is specified, the first expression of the first rule is used.

An input string is considered accepted by the parser if the $\alpha _{s}$ is recognized. As a side-effect, a string $x$ can be recognized by the parser even if it was not fully consumed.^[2]

An extreme case of this rule is that the grammar $S\rightarrow x*$ matches any string.

This can be avoided by rewriting the grammar as $S\rightarrow x*!.$

Example

${\begin{cases}S\rightarrow A/B/D\\A\rightarrow {\textbf {'a'}}\ S\ {\textbf {'a'}}\\B\rightarrow {\textbf {'b'}}\ S\ {\textbf {'b'}}\\D\rightarrow ({\textbf {'0'}}-{\textbf {'9'}})?\end{cases}}$

This grammar recognizes a palindrome over the alphabet $\{a,b\}$ , with an optional digit in the middle.

Example strings accepted by the grammar include: ${\textbf {'aa'}}$ and ${\textbf {'aba3aba'}}$ .

Left recursion

Left recursion happens when a grammar production refers to itself as its left-most element, either directly or indirectly. Since Packrat is a recursive descent parser, it cannot handle left recursion directly.^[5] During the early stages of development, it was found that a production that is left-recursive can be transformed into a right-recursive production.^[6] This modification significantly simplifies the task of a Packrat parser. Nonetheless, if there is an indirect left recursion involved, the process of rewriting can be quite complex and challenging. If the time complexity requirements are loosened from linear to superlinear, it is possible to modify the memoization table of a Packrat parser to permit left recursion, without altering the input grammar.^[5]

Iterative combinator

The iterative combinator $\alpha +$ , $\alpha *$ , needs special attention when used in a Packrat parser. As a matter of fact, the use of iterative combinators introduces a secret recursion that does not record intermediate results in the outcome matrix. This can lead to the parser operating with a superlinear behaviour. This problem can be resolved apply the following transformation:^[1]


Original	Translated
$S\rightarrow \alpha +$	$S\rightarrow \alpha S/\alpha$
$S\rightarrow \alpha *$	$S\rightarrow \alpha S/\epsilon$

With this transformation, the intermediate results can be properly memoized.

Memoization technique

Memoization is an optimization technique in computing that aims to speed up programs by storing the results of expensive function calls. This technique essentially works by caching the results so that when the same inputs occur again, the cached result is simply returned, thus avoiding the time-consuming process of re-computing.^[7] When using packrat parsing and memoization, it's noteworthy that the parsing function for each nonterminal is solely based on the input string. It does not depend on any information gathered during the parsing process. Essentially, memoization table entries do not affect or rely on the parser's specific state at any given time.^[8]

Packrat parsing stores results in a matrix or similar data structure that allows for quick look-ups and insertions. When a production is encountered, the matrix is checked to see if it has already occurred. If it has, the result is retrieved from the matrix. If not, the production is evaluated, the result is inserted into the matrix, and then returned.^[9] When evaluating the entire $m*n$ matrix in a tabular approach, it would require $\Theta (mn)$ space.^[9] Here, $m$ represents the number of nonterminals, and $n$ represents the input string size.

In a naïve implementation, the entire table can be derived from the input string starting from the end of the string.

The Packrat parser can be improved to update only the necessary cells in the matrix through a depth-first visit of each subexpression tree. Consequently, using a matrix with dimensions of $m*n$ is often wasteful, as most entries would remain empty.^[5] These cells are linked to the input string, not to the nonterminals of the grammar. This means that increasing the input string size would always increase memory consumption, while the number of parsing rules changes only the worst space complexity.^[1]

Cut operator

Another operator called cut has been introduced to Packrat to reduce its average space complexity even further. This operator utilizes the formal structures of many programming languages to eliminate impossible derivations. For instance, control statements parsing in a standard programming language is mutually exclusive from the first recognized token, e.g., $\{if,do,while,switch\}$ .^[10]

Operator	Semantics
Cut ${\begin{array}{l}\alpha \uparrow \beta /\gamma \\(\alpha \uparrow \beta )*\end{array}}$	if $\alpha$ is recognized but $\beta$ is not, skip the evaluation of the alternative. In the first case don't evaluate $\gamma$ if $\alpha$ was recognized The second rule is can be rewritten as $N\rightarrow \alpha \uparrow \beta N/\epsilon$ and the same rules can be applied.

When a Packrat parser uses cut operators, it effectively clears its backtracking stack. This is because a cut operator reduces the number of possible alternatives in an ordered choice. By adding cut operators in the right places in a grammar's definition, the resulting Packrat parser only needs a nearly constant amount of space for memoization.^[10]

The algorithm

Sketch of an implementation of a Packrat algorithm in a Lua-like pseudocode.^[5]

INPUT(n)-- return the character at position nRULE(R:Rule,P:Position)entry=GET_MEMO(R,P)-- return the number of elements previously matched in rule R at position Pifentry==nilthenreturnEVAL(R,P);endreturnentry;EVAL(R:Rule,P:Position)start=P;forchoiceinR.choices-- Return a list of choiceacc=0;forsymbolinchoicethen-- Return each element of a rule, terminal and nonterminalifsymbol.is_terminalthenifINPUT(start+acc)==symbol.terminalthenacc=acc+1;--Found correct terminal skip pass itelsebreak;endelseres=RULE(symbol.nonterminal,start+acc);-- try to recognize a nonterminal in position start+accSET_MEMO(symbol.nonterminal,start+acc,res);-- we memoize also the failure with special value failifres==failthenbreak;endacc=acc+res;endifsymbol==choice.last-- check if we have matched the last symbol in a choice if so returnreturnacc;endendreturnfail;--if no choice match return fail

Example

Given the following context, a free grammar that recognizes simple arithmetic expressions composed of single digits interleaved by sum, multiplication, and parenthesis.

${\begin{cases}S\rightarrow A\\A\rightarrow M\ {\textbf {'+'}}\ A\ /\ M\\M\rightarrow P\ {\textbf {'*'}}\ M\ /\ P\\P\rightarrow {\textbf {'('}}\ A\ {\textbf {')'}}\ /\ D\\D\rightarrow ({\textbf {'0'}}-{\textbf {'9'}})\end{cases}}$

Denoted with $\dashv$ the line terminator we can apply the packrat algorithm

Derivation of

2*(3+4)\dashv

Syntax tree

Action

Packrat Table

Derivation Rules	Input shifted
${\begin{array}{l}S\rightarrow A\\A\rightarrow M\ {\textbf {'+'}}\ A\\M\rightarrow P\ {\textbf {'*'}}\ M\\P\rightarrow {\textbf {'('}}\ A\ {\textbf {')'}}\end{array}}$	ɛ
Notes	Input left
Input doesn't match the first element in the derivation. Backtrack to the first grammar rule with unexplored alternative ${\textstyle P\rightarrow {\textbf {'('}}\ A\ {\textbf {')'}}\ /\ {\underline {D}}}$	$2*(3+4)\dashv$


	Index
	1	2	3	4	5	6	7
S
A
M
P
D
	2	*	(	3	+	4	)

No update because no terminal was recognized

Derivation Rules	Input shifted
$P\rightarrow D$ $D\rightarrow 2$	$2$
Notes	Input left
Shift input by one after deriving terminal $2$	$*(3+4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A
M
P	1
D	1
	2	*	(	3	+	4	)

Update:

D(1) = 1;

P(1) = 1;

Derivation Rules	Input shifted
$M\rightarrow P\ {\textbf {'*'}}\ M$ $P\rightarrow {\textbf {'('}}\ A\ {\textbf {')'}}$	$2*($
Notes	Input left
Shift input by two terminal $\{{\textbf {*}},{\textbf {(}}\}$	$3+4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A
M
P	1
D	1
	2	*	(	3	+	4	)

No update because no nonterminal was fully recognized

Derivation Rules	Input shifted
$A\rightarrow M\ {\textbf {'+'}}\ A$ $M\rightarrow P\ {\textbf {'*'}}\ M$ $P\rightarrow {\textbf {'('}}\ A\ {\textbf {')'}}$	$2*($
Notes	Input left
Input doesn't match the first element in the derivation. Backtrack to the first grammar rule with unexplored alternative ${\textstyle P\rightarrow {\textbf {'('}}\ A\ {\textbf {')'}}\ /\ {\underline {D}}}$	$3+4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A
M
P	1
D	1
	2	*	(	3	+	4	)

No update because no terminal was recognized

Derivation Rules	Input shifted
$P\rightarrow D$ $D\rightarrow 3$	$2*($
Notes	Input left
Shift input by one after deriving terminal $3$ but the new input will not match inside $M\rightarrow P\ {\textbf {''}}\ M$ so an unroll is necessary to $M\rightarrow P\ {\textbf {''}}\ M\ /\ {\underline {P}}$	$3+4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A
M
P	1			1
D	1			1
	2	*	(	3	+	4	)

Update:

D(4) = 1;

P(4) = 1;

Derivation Rules	Input shifted
$M\rightarrow P$	$2*(3+$
Notes	Input left
Roll Back to $M\rightarrow P\ {\textbf {'*'}}\ M\ /\ {\underline {P}}$ And we don't expand it has we have an hit in the memoization table P(4) ≠ 0 so shift the input by P(4). Shift also the $+$ from $A\rightarrow M\ {\textbf {'+'}}\ A$	$4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A
M				1
P	1			1
D	1			1
	2	*	(	3	+	4	)

Hit on P(4)

Update M(4) = 1 as M was recognized

Derivation Rules	Input shifted
$A\rightarrow M\ {\textbf {'+'}}\ A$ $M\rightarrow P\ {\textbf {'*'}}\ M$ $P\rightarrow {\textbf {'('}}\ A\ {\textbf {')'}}$	$2*(3+$
Notes	Input left
Input doesn't match the first element in the derivation. Backtrack to the first grammar rule with unexplored alternative ${\textstyle P\rightarrow {\textbf {'('}}\ A\ {\textbf {')'}}\ /\ {\underline {D}}}$	$4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A
M				1
P	1			1
D	1			1
	2	*	(	3	+	4	)

No update because no terminal was recognized

Derivation Rules	Input shifted
$P\rightarrow D$ $D\rightarrow 4$	$2*(3+$
Notes	Input left
Shift input by one after deriving terminal $4$ but the new input will not match inside $M\rightarrow P\ {\textbf {'*'}}\ M$ so an unroll is necessary	$4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A
M				1
P	1			1		1
D	1			1		1
	2	*	(	3	+	4	)

Update:

D(6) = 1;

P(6) = 1;

Derivation Rules	Input shifted
$M\rightarrow P$	$2*(3+$
Notes	Input left
Roll Back to $M\rightarrow P\ {\textbf {'*'}}\ M\ /\ {\underline {P}}$ And we don't expand it has we have an hit in the memoization table P(6) ≠ 0 so shift the input by P(6). but the new input will not match $+$ inside $A\rightarrow M\ {\textbf {'+'}}\ A$ so an unroll is necessary	$4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A
M				1		1
P	1			1		1
D	1			1		1
	2	*	(	3	+	4	)

Hit on P(6)

Update M(6) = 1 as M was recognized

Derivation Rules	Input shifted
$A\rightarrow M$	$2*(3+4)$
Notes	Input left
Roll Back to $A\rightarrow M\ {\textbf {'+'}}\ A\ /\ {\underline {M}}$ And we don't expand it has we have an hit in the memoization table M(6) ≠ 0 so shift the input by M(6). Also shift $)$ from $P\rightarrow {\textbf {'('}}\ A\ {\textbf {')'}}$	$\dashv$

	Index
	1	2	3	4	5	6	7
S
A				3
M				1		1
P	1		5	1		1
D	1			1		1
	2	*	(	3	+	4	)

Hit on M(6)

Update A(4) = 3 as A was recognized

Update P(3)=5 as P was recognized

Derivation Rules	Input shifted
	$2*$
Notes	Input left
Roll Back to $M\rightarrow P\ {\textbf {''}}\ M\ /\ {\underline {P}}$ as terminal $\neq \dashv$	$(3+4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A				3
M				1		1
P	1		5	1		1
D	1			1		1
	2	*	(	3	+	4	)

No update because no terminal was recognized

Derivation Rules	Input shifted
$M\rightarrow P$	$2*(3+4)$
Notes	Input left
we don't expand it as we have a hit in the memoization table P(3) ≠ 0, so shift the input by P(3).	$\dashv$

	Index
	1	2	3	4	5	6	7
S
A				3
M	7			1		1
P	1		5	1		1
D	1			1		1
	2	*	(	3	+	4	)

Hit on P(3)

Update M(1)=7 as M was recognized

Derivation Rules	Input shifted

Notes	Input left
Roll Back to $A\rightarrow M\ {\textbf {'+'}}\ A\ /\ {\underline {M}}$ as terminal $+\neq \dashv$	$2*(3+4)\dashv$

	Index
	1	2	3	4	5	6	7
S
A				3
M	7			1		1
P	1		5	1		1
D	1			1		1
	2	*	(	3	+	4	)

No update because no terminal was recognized

Derivation Rules	Input shifted
$A\rightarrow M$	$2*(3+4)\dashv$
Notes	Input left
We don't expand it as we have a hit in the memoization table M(1) ≠ 0, so shift the input by M(1). S was totally reduced, so the input string is recognized.

	Index
	1	2	3	4	5	6	7
S	7
A	7			3
M	7			1		1
P	1		5	1		1
D	1			1		1
	2	*	(	3	+	4	)

Hit on M(1)

Update A(1)=7 as A was recognized

Update S(1)=7 as S was recognized

Implementation

Name	Parsing algorithm	Output languages	Grammar, code	Development platform	License
AustenX	Packrat (modified)	Java	Separate	All	Free, BSD
Aurochs	Packrat	C, OCaml, Java	Mixed	All	Free, GNU GPL
Canopy	Packrat	Java, JavaScript, Python, Ruby	Separate	All	Free, GNU GPL
CL-peg	Packrat	Common Lisp	Mixed	All	Free, MIT
Drat!	Packrat	D	Mixed	All	Free, GNU GPL
Frisby	Packrat	Haskell	Mixed	All	Free, BSD
grammar::peg	Packrat	Tcl	Mixed	All	Free, BSD
IronMeta	Packrat	C#	Mixed	Windows	Free, BSD
lars::Parser	Packrat (supporting left-recursion and grammar ambiguity)	C++	Identical	All	Free, BSD
Narwhal	Packrat	C	Mixed	POSIX, Windows	Free, BSD
neotoma	Packrat	Erlang	Separate	All	Free, MIT
OMeta	Packrat (modified, partial memoization)	JavaScript, Squeak, Python	Mixed	All	Free, MIT
PackCC	Packrat (modified, left-recursion support)	C	Mixed	All	Free, MIT
Packrat	Packrat	Scheme	Mixed	All	Free, MIT
Pappy	Packrat	Haskell	Mixed	All	Free, BSD
Parsnip	Packrat	C++	Mixed	Windows	Free, GNU GPL
PEG.js	Packrat (partial memoization)	JavaScript	Mixed	All	Free, MIT
Peggy^[11]	Packrat (partial memoization)	JavaScript	Mixed	All	Free, MIT
Pegasus	Recursive descent, Packrat (selectively)	C#	Mixed	Windows	Free, MIT
PetitParser	Packrat	Smalltalk, Java, Dart	Mixed	All	Free, MIT
PyPy rlib	Packrat	Python	Mixed	All	Free, MIT
Rats!	Packrat	Java	Mixed	Java virtual machine	Free, GNU LGPL
go-packrat	Packrat	Go	Identical	All	GPLv3

Related Research Articles

<span class="mw-page-title-main">Chomsky hierarchy</span> Hierarchy of classes of formal grammars

The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a language's vocabulary that are valid according to the language's syntax. The linguist Noam Chomsky theorized that four different classes of formal grammars existed that could generate increasingly complex languages. Each class can also completely generate the language of all inferior classes.

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

In computer science, the Earley parser is an algorithm for parsing strings that belong to a given context-free language, though it may suffer problems with certain nullable grammars. The algorithm, named after its inventor, Jay Earley, is a chart parser that uses dynamic programming; it is mainly used for parsing in computational linguistics. It was first introduced in his dissertation in 1968.

In computer science, the Cocke–Younger–Kasami algorithm is a parsing algorithm for context-free grammars published by Itiroo Sakai in 1961. The algorithm is named after some of its rediscoverers: John Cocke, Daniel Younger, Tadao Kasami, and Jacob T. Schwartz. It employs bottom-up parsing and dynamic programming.

In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures where each such procedure implements one of the nonterminals of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

Top-down parsing in computer science is a parsing strategy where one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top-down parsing strategy.

An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are the result of attribute evaluation rules associated with productions of the grammar. Attributes allow the transfer of information from anywhere in the abstract syntax tree to anywhere else, in a controlled and formal way.

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts, such as in simple mutually recursive descent parsing. It is a type of caching, distinct from other forms of caching such as buffering and page replacement. In the context of some logic programming languages, memoization is also known as tabling.

Top-Down Parsing Language (TDPL) is a type of analytic formal grammar developed by Alexander Birman in the early 1970s in order to study formally the behavior of a common class of practical top-down parsers that support a limited form of backtracking. Birman originally named his formalism the TMG Schema (TS), after TMG, an early parser generator, but it was later given the name TDPL by Aho and Ullman in their classic anthology The Theory of Parsing, Translation and Compiling.

In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.

In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language and a suffix. For instance, $can be recognized as a sum because it can be broken into, also a sum, and, a suitable suffix.$

Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction can be used, in particular, to specify intersection of languages. A further extension of conjunctive grammars known as Boolean grammars additionally allows explicit negation.

In formal languages, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. Terminal symbols are the elementary symbols of the language defined as part of a formal grammar. Nonterminal symbols are replaced by groups of terminal symbols according to the production rules.

In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of indices representing locations in the string where parsing stopped successfully. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.

A formal grammar describes which strings from an alphabet of a formal language are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence. A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

A Hindley–Milner (HM) type system is a classical type system for the lambda calculus with parametric polymorphism. It is also known as Damas–Milner or Damas–Hindley–Milner. It was first described by J. Roger Hindley and later rediscovered by Robin Milner. Luis Damas contributed a close formal analysis and proof of the method in his PhD thesis.

PackCC is a parser generator for C. Its main features are as follows:

References

1 2 3 Ford, Bryan (2006). "Packrat Parsing: Simple, Powerful, Lazy, Linear Time". arXiv: cs/0603077 .
1 2 3 4 5 Ford, Bryan (2004-01-01). "Parsing expression grammars". Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '04. New York, NY, USA: Association for Computing Machinery. pp. 111–122. doi:10.1145/964001.964011. ISBN 978-1-58113-729-3. S2CID 7762102.
1 2 Flodin, Daniel. "A Comparison Between Packrat Parsing and Conventional Shift-Reduce Parsing on Real-World Grammars and Inputs" (PDF).
↑ Mizushima, Kota; Maeda, Atusi; Yamaguchi, Yoshinori (2010-05-06). "Packrat parsers can handle practical grammars in mostly constant space". Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering. ACM. pp. 29–36. doi:10.1145/1806672.1806679. ISBN 978-1-4503-0082-7. S2CID 14498865.
1 2 3 4 Warth, Alessandro; Douglass, James R.; Millstein, Todd (2008-01-07). "Packrat parsers can support left recursion". Proceedings of the 2008 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation. PEPM '08. New York, NY, USA: Association for Computing Machinery. pp. 103–110. doi:10.1145/1328408.1328424. ISBN 978-1-59593-977-7. S2CID 2168153.
↑ Aho, Alfred V.; Lam, Monica S.; Sethi, Ravi; Ullman, Jeffrey D., eds. (2007). Compilers: principles, techniques, & tools (2nd ed.). Boston Munich: Pearson Addison-Wesley. ISBN 978-0-321-48681-3.
↑ Norvig, Peter (1991-03-01). "Techniques for automatic memoization with applications to context-free parsing". Computational Linguistics. 17 (1): 91–98. ISSN 0891-2017.
↑ Dubroy, Patrick; Warth, Alessandro (2017-10-23). "Incremental packrat parsing". Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering. SLE 2017. New York, NY, USA: Association for Computing Machinery. pp. 14–25. doi:10.1145/3136014.3136022. ISBN 978-1-4503-5525-4. S2CID 13047585.
1 2 Science, International Journal of Scientific Research in; Ijsrset, Engineering and Technology. "A Survey of Packrat Parser". A Survey of Packrat Parser.
1 2 Mizushima, Kota; Maeda, Atusi; Yamaguchi, Yoshinori (2010-05-06). "Packrat parsers can handle practical grammars in mostly constant space". Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering. PASTE '10. New York, NY, USA: Association for Computing Machinery. pp. 29–36. doi:10.1145/1806672.1806679. ISBN 978-1-4503-0082-7. S2CID 14498865.
↑ Maintained fork of PEG.js

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[:3-1] 1 2 3 Ford, Bryan (2006). "Packrat Parsing: Simple, Powerful, Lazy, Linear Time". arXiv: cs/0603077 .

[:1-2] 1 2 3 4 5 Ford, Bryan (2004-01-01). "Parsing expression grammars". Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '04. New York, NY, USA: Association for Computing Machinery. pp. 111–122. doi:10.1145/964001.964011. ISBN 978-1-58113-729-3. S2CID 7762102.

[:0-3] 1 2 Flodin, Daniel. "A Comparison Between Packrat Parsing and Conventional Shift-Reduce Parsing on Real-World Grammars and Inputs" (PDF).

[4] Mizushima, Kota; Maeda, Atusi; Yamaguchi, Yoshinori (2010-05-06). "Packrat parsers can handle practical grammars in mostly constant space". Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering. ACM. pp. 29–36. doi:10.1145/1806672.1806679. ISBN 978-1-4503-0082-7. S2CID 14498865.

[:2-5] 1 2 3 4 Warth, Alessandro; Douglass, James R.; Millstein, Todd (2008-01-07). "Packrat parsers can support left recursion". Proceedings of the 2008 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation. PEPM '08. New York, NY, USA: Association for Computing Machinery. pp. 103–110. doi:10.1145/1328408.1328424. ISBN 978-1-59593-977-7. S2CID 2168153.

[6] Aho, Alfred V.; Lam, Monica S.; Sethi, Ravi; Ullman, Jeffrey D., eds. (2007). Compilers: principles, techniques, & tools (2nd ed.). Boston Munich: Pearson Addison-Wesley. ISBN 978-0-321-48681-3.

[7] Norvig, Peter (1991-03-01). "Techniques for automatic memoization with applications to context-free parsing". Computational Linguistics. 17 (1): 91–98. ISSN 0891-2017.

[8] Dubroy, Patrick; Warth, Alessandro (2017-10-23). "Incremental packrat parsing". Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering. SLE 2017. New York, NY, USA: Association for Computing Machinery. pp. 14–25. doi:10.1145/3136014.3136022. ISBN 978-1-4503-5525-4. S2CID 13047585.

[:4-9] 1 2 Science, International Journal of Scientific Research in; Ijsrset, Engineering and Technology. "A Survey of Packrat Parser". A Survey of Packrat Parser.

[:5-10] 1 2 Mizushima, Kota; Maeda, Atusi; Yamaguchi, Yoshinori (2010-05-06). "Packrat parsers can handle practical grammars in mostly constant space". Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering. PASTE '10. New York, NY, USA: Association for Computing Machinery. pp. 29–36. doi:10.1145/1806672.1806679. ISBN 978-1-4503-0082-7. S2CID 14498865.

[11] Maintained fork of PEG.js

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

v t e Parsing algorithms
Top-down	Earley LL Recursive descent Tail recursive
Bottom-up	Precedence Simple Operator Shunting-yard LR Simple Look-ahead Canonical Generalized CYK Recursive ascent Shift-reduce
Mixed, other	Combinator Chart Left corner Statistical
Related topics	PEG Definite clause grammar Deterministic parsing Dynamic programming Memoization Parser generator LALR Parse tree AST Scannerless parsing History of compiler construction Comparison of parser generators Operator-precedence grammar

Packrat parser

Contents

Syntax

Symbols

Operators

Rules

Example

Left recursion

Iterative combinator

Memoization technique

Cut operator

The algorithm

Example

Implementation

See also

Related Research Articles

References

External links