Lemon (parser generator)

Last updated
Lemon
Developer(s) D. Richard Hipp
Written in C
Operating system Cross-platform
Type Parser generator
License Public domain
Website www.hwaci.com/sw/lemon/ OOjs UI icon edit-ltr-progressive.svg

Lemon is a parser generator, maintained as part of the SQLite project, that generates a look-ahead LR parser (LALR parser) in the programming language C from an input context-free grammar. The generator is quite simple, implemented in one C source file with another file used as a template for output. Lexical analysis is performed externally.

Contents

Lemon is similar to the programs Bison and Yacc, but is incompatible with both. The grammar input format is different, to help prevent common coding errors. Other distinctive features include a reentrant, thread-safe output parser, and the concept of non-terminal destructors that try to make it easier to avoid memory leaks.

SQLite uses Lemon with a hand-coded tokenizer to parse SQL strings.

Lemon, together with re2c and a re2c wrapper named Perplex, are used [1] [2] [3] in BRL-CAD as platform-agnostic and easily compilable alternatives to Flex and Bison. This combination is also used with STEPcode. [4]

OpenFOAM expression evaluation [5] uses a combination of ragel and a version of lemon that has been minimally modified [6] to ease C++ integration without affecting C integration. [7] The parser grammars are augmented with m4 macros.

Notes

  1. Brlcad; Carlmoore; Starseeker (2017-11-30). "BRL-CAD: The Lemon Parser Generator". SourceForge. Slashdot Media. Retrieved 2019-09-21.
  2. Bumbulis, Peter (2011-08-23). "Read Me". SourceForge. Slashdot Media. Retrieved 2019-09-21.
  3. Boerger, Marcus (2014-06-24). "Read Me". SourceForge. Slashdot Media. Retrieved 2019-09-21.
  4. "Read Me". STEPcode. GitHub. 2015. Archived from the original on 2018-04-10. Retrieved 2019-09-21.{{cite web}}: CS1 maint: unfit URL (link)
  5. "New expressions syntax". OpenFOAM. OpenCFD. 2019-12-23. Retrieved 2020-01-13.
  6. "wmake sources". OpenFOAM. OpenCFD. 2019-09-27. Retrieved 2020-01-13.
  7. "README". OpenFOAM. OpenCFD. 2019-09-27. Retrieved 2020-01-13.

Related Research Articles

In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, canonical LR(1) parsers, minimal LR(1) parsers, and generalized LR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

Yacc is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser based on a formal grammar, written in a notation similar to Backus–Naur form (BNF). Yacc is supplied as a standard utility on BSD and AT&T Unix. GNU-based Linux distributions include Bison, a forward-compatible Yacc replacement.

GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in Bison syntax, warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

Lexical tokenization is conversion of a text into meaningful lexical tokens belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives, punctuations etc. In case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is not the same process as the probabilistic tokenization, used for a large language model's data preprocessing, that encodes text into numerical tokens, using byte pair encoding.

Lex is a computer program that generates lexical analyzers.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

Flex is a free and open-source software alternative to lex. It is a computer program that generates lexical analyzers . It is frequently used as the lex implementation together with Berkeley Yacc parser generator on BSD-derived operating systems, or together with GNU bison in *BSD ports and in Linux distributions. Unlike Bison, flex is not part of the GNU Project and is not released under the GNU General Public License, although a manual for Flex was produced and published by the Free Software Foundation.

JavaCC is an open-source parser generator and lexical analyzer generator written in the Java programming language.

In computer-based language recognition, ANTLR, or ANother Tool for Language Recognition, is a parser generator that uses a LL(*) algorithm for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.

XPL, for expert's programming language is a programming language based on PL/I, a portable one-pass compiler written in its own language, and a parser generator tool for easily implementing similar compilers for other languages. XPL was designed in 1967 as a way to teach compiler design principles and as starting point for students to build compilers for their own languages.

Berkeley Yacc (byacc) is a Unix parser generator designed to be compatible with Yacc. It was originally written by Robert Corbett and released in 1989. Due to its liberal license and because it was faster than the AT&T Yacc, it quickly became the most popular version of Yacc. It has the advantages of being written in ANSI C89 and being public domain software.

A GLR parser is an extension of an LR parser algorithm to handle non-deterministic and ambiguous grammars. The theoretical foundation was provided in a 1974 paper by Bernard Lang. It describes a systematic way to produce such algorithms, and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques. The first actual implementation of GLR was described in a 1984 paper by Masaru Tomita, it has also been referred to as a "parallel parser". Tomita presented five stages in his original work, though in practice it is the second stage that is recognized as the GLR parser.

This is a list of notable lexer generators and parser generators for various language classes.

JetPAG is an open-source LL(k) parser and lexical analyzer generator, licensed under the GNU General Public License. It is a personal work of Tareq H. Sharafy, and is currently at final beta stages of development.

The table below provides an overview of notable computer-aided design (CAD) software. It does not judge power, ease of use, or other user-experience aspects. The table does not include software that is still in development. For all-purpose 3D programs, see Comparison of 3D computer graphics software. CAD refers to a specific type of drawing and modelling software application that is used for creating designs and technical drawings. These can be 3D drawings or 2D drawings.

re2c is a free and open-source lexer generator for C, C++, Go, and Rust. It compiles declarative regular expression specifications to deterministic finite automata. Originally written by Peter Bumbulis and described in his paper, re2c was put in public domain and has been since maintained by volunteers. It is the lexer generator adopted by projects such as PHP, SpamAssassin, Ninja build system and others. Together with the Lemon parser generator, re2c is used in BRL-CAD. This combination is also used with STEPcode, an implementation of ISO 10303 standard.

<span class="mw-page-title-main">FEATool Multiphysics</span>

FEATool Multiphysics is a physics, finite element analysis (FEA), and partial differential equation (PDE) simulation toolbox. FEATool Multiphysics features the ability to model fully coupled heat transfer, fluid dynamics, chemical engineering, structural mechanics, fluid-structure interaction (FSI), electromagnetics, as well as user-defined and custom PDE problems in 1D, 2D (axisymmetry), or 3D, all within a graphical user interface (GUI) or optionally as script files. FEATool has been employed and used in academic research, teaching, and industrial engineering simulation contexts.

References