ANTLR

Original author(s) Terence Parr and others
Initial release April 10, 1992
Stable release 4.11.1 / 4 September 2022
Written in Java
Platform Cross-platform
License BSD License
Website www.antlr.org

In computer-based language recognition, ANTLR (pronounced antler), or ANother Tool for Language Recognition, is a parser generator that uses an LL(*) algorithm for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco. [citation needed]

PCCTS 1.00 was announced on April 10, 1992. [1] [2]

Usage

ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer of that language. While Version 3 supported generating code in the programming languages Ada95, ActionScript, C, C#, Java, JavaScript, Objective-C, Perl, Python, Ruby, and Standard ML, [3] Version 4 at present targets C#, C++, Dart, [4] [5] Java, JavaScript, Go, PHP, Python (2 and 3), and Swift.

A language is specified using a context-free grammar written in Extended Backus–Naur Form (EBNF). [citation needed] [6] ANTLR can generate lexers, parsers, tree parsers, and combined lexer-parsers. Parsers can automatically generate parse trees or abstract syntax trees, which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers.

By default, ANTLR reads a grammar and generates a recognizer for the language defined by that grammar (i.e., a program that reads an input stream and reports an error if the input does not conform to the syntax specified by the grammar). If there are no syntax errors, the default action is to exit without printing any message. To do something useful with the language, actions can be attached to grammar elements. These actions are written in the programming language in which the recognizer is generated, and ANTLR embeds them in the recognizer's source code at the appropriate points. In the case of a compiler, actions can be used to build and check symbol tables and to emit instructions in a target language. [citation needed] [6]
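The difference between pure recognition and recognition with actions can be sketched in a few lines of Python. This is a hand-written illustration, not code produced by ANTLR: the toy grammar `program: ('var' ID ';')*`, the token names, and the functions `tokenize`, `recognize`, and `declare` are all hypothetical. The `on_decl` callback stands in for an action attached to a grammar element.

```python
import re

# Hypothetical token set for the toy grammar:  program : ('var' ID ';')* ;
TOKEN_RE = re.compile(r"\s*(?:(var)\b|([A-Za-z_]\w*)|(;))")


def tokenize(text):
    """Split the input into (kind, value) pairs."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group(1):
            tokens.append(("VAR", m.group(1)))
        elif m.group(2):
            tokens.append(("ID", m.group(2)))
        else:
            tokens.append(("SEMI", m.group(3)))
        pos = m.end()
    return tokens


def recognize(text, on_decl=None):
    """Accept the input silently or raise SyntaxError.

    `on_decl` plays the role of an embedded action: it runs each time
    a complete declaration has been matched.
    """
    tokens = tokenize(text)
    i = 0

    def expect(kind):
        nonlocal i
        if i >= len(tokens) or tokens[i][0] != kind:
            found = tokens[i][0] if i < len(tokens) else "EOF"
            raise SyntaxError(f"expected {kind}, found {found}")
        value = tokens[i][1]
        i += 1
        return value

    while i < len(tokens):
        expect("VAR")
        name = expect("ID")
        expect("SEMI")
        if on_decl is not None:
            on_decl(name)  # the "action" for this grammar element


# An action that builds and checks a symbol table.
symbols = set()


def declare(name):
    if name in symbols:
        raise ValueError(f"duplicate declaration: {name}")
    symbols.add(name)


recognize("var x; var y;")           # recognition only: returns silently
recognize("var x; var y;", declare)  # with the action: fills `symbols`
```

Without the callback the recognizer only answers whether the input conforms; with it, the same traversal produces a symbol table as a side effect.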

Besides lexers and parsers, ANTLR can be used to generate tree parsers. These are recognizers that process abstract syntax trees, which can be automatically generated by parsers. These tree parsers are unique to ANTLR and help process abstract syntax trees. [citation needed] [6]
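The idea of a recognizer that consumes a tree rather than a token stream can be illustrated with a minimal Python sketch. It is not ANTLR output: the tuple encoding of the tree and the function name `walk` are assumptions made for this example. The walker accepts either an integer leaf or a `+` node with two children, and rejects any other shape.

```python
def walk(node):
    """Recognize and evaluate a tree of the form INT or ('+', expr, expr)."""
    if isinstance(node, int):
        return node
    if isinstance(node, tuple) and len(node) == 3 and node[0] == "+":
        return walk(node[1]) + walk(node[2])
    # Any other shape does not conform to the tree "grammar".
    raise SyntaxError(f"unexpected tree node: {node!r}")


tree = ("+", ("+", 1, 2), 3)  # hypothetical AST for "1 + 2 + 3"
print(walk(tree))  # 6
```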

Licensing

ANTLR 3 [citation needed] and ANTLR 4 are free software, published under a three-clause BSD License. [7] Prior versions were released as public domain software. [8] Documentation, derived from Parr's book The Definitive ANTLR 4 Reference, is included with the BSD-licensed ANTLR 4 source. [7] [9]

Various plugins have been developed for the Eclipse development environment to support the ANTLR grammar, including ANTLR Studio, a proprietary product, as well as the "ANTLR 2" [10] and "ANTLR 3" [11] plugins for Eclipse hosted on SourceForge. [citation needed]

ANTLR 4

ANTLR 4 handles direct left recursion correctly, but not left recursion in general. It does not support indirect left recursion, in which a grammar rule x refers to a rule y that in turn refers back to x. [12]
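Why left recursion needs special handling in a top-down parser can be seen in a small Python sketch. A naive procedure for the directly left-recursive rule `expr : expr '-' INT | INT` would call itself before consuming any input and never terminate. The version below recognizes the same language with a loop, as if the rule were written `INT ('-' INT)*`, while keeping left-associative evaluation. This illustrates the general technique only; it is not the transformation ANTLR 4 performs internally, and the function name `parse_expr` is hypothetical.

```python
def parse_expr(tokens):
    """Parse  expr : expr '-' INT | INT  rewritten as  INT ('-' INT)* ."""
    pos = 0

    def next_int():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected INT at token {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    result = next_int()
    # The loop replaces the left-recursive call and folds to the left.
    while pos < len(tokens) and tokens[pos] == "-":
        pos += 1
        result = result - next_int()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(parse_expr("10 - 3 - 2".split()))  # 5
```

The result is 5, that is (10 - 3) - 2, which matches the grouping implied by the left-recursive form of the rule.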

Development

As reported on the tools page [13] of the ANTLR project, plug-ins that enable features such as syntax highlighting, syntax error checking, and code completion are freely available for the most common IDEs (IntelliJ IDEA, NetBeans, Eclipse, Visual Studio [14] and Visual Studio Code).

Projects

Software built using ANTLR includes:

Over 200 grammars implemented in ANTLR 4 are available on GitHub. [19] They range from grammars for a URL to grammars for entire languages like C, Java and Go.

Example

In the following example, an ANTLR parser describes sums of integers of the form "1 + 2 + 3":

    // Common options, for example, the target language
    options
    {
      language = "CSharp";
    }

    // Followed by the parser
    class SumParser extends Parser;
    options
    {
      k = 1; // Parser lookahead: 1 token
    }

    // Definition of an expression
    statement: INTEGER (PLUS^ INTEGER)*;

    // Here is the lexer
    class SumLexer extends Lexer;
    options
    {
      k = 1; // Lexer lookahead: 1 character
    }

    PLUS: '+';
    DIGIT: ('0'..'9');
    INTEGER: (DIGIT)+;

The following listing demonstrates the call of the parser in a program:

    TextReader reader;
    // (...) Fill TextReader with characters

    SumLexer lexer = new SumLexer(reader);
    SumParser parser = new SumParser(lexer);

    parser.statement();
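For readers without an ANTLR toolchain at hand, the behaviour that the grammar above describes can be approximated by a hand-written Python sketch. It is not generated code and makes several assumptions: the function names `sum_lexer` and `statement` are hypothetical, whitespace skipping is added (the grammar as shown defines no whitespace rule), the parser additionally checks that all input is consumed, and the `PLUS^` operator is modelled as making each `+` the root of the tree built so far.

```python
import re

# INTEGER : (DIGIT)+ ;   PLUS : '+' ;   (whitespace skipping is an added assumption)
TOKEN_RE = re.compile(r"\s*(?:([0-9]+)|(\+))")


def sum_lexer(text):
    """Turn the input text into a list of (kind, value) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group(1):
            tokens.append(("INTEGER", int(m.group(1))))
        else:
            tokens.append(("PLUS", "+"))
        pos = m.end()
    return tokens


def statement(tokens):
    """statement : INTEGER (PLUS^ INTEGER)* ;  returns a nested-tuple tree."""
    pos = 0

    def match(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        value = tokens[pos][1]
        pos += 1
        return value

    tree = match("INTEGER")
    # One token of lookahead (k = 1) decides whether to continue the loop.
    while pos < len(tokens) and tokens[pos][0] == "PLUS":
        match("PLUS")
        tree = ("+", tree, match("INTEGER"))  # PLUS becomes the new root
    if pos != len(tokens):  # added end-of-input check, not part of the rule
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


print(statement(sum_lexer("1 + 2 + 3")))  # ('+', ('+', 1, 2), 3)
```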

References

  1. "Comp.compilers: Purdue Compiler-Construction Tool Set 1.00 available". compilers.iecc.com. 10 Apr 1992. Retrieved 2023-05-05.
  2. "Comp.compilers: More on PCCTS". compilers.iecc.com. 30 Apr 1992. Retrieved 2023-05-05.
  3. SML/NJ Language Processing Tools: User Guide
  4. "Runtime Libraries and Code Generation Targets". GitHub. 6 January 2022.
  5. "The ANTLR4 C++ runtime reached home – Soft Gems".
  6. Parr, Terence (2013-01-15). The Definitive ANTLR 4 Reference. Pragmatic Bookshelf. ISBN 978-1-68050-500-9.
  7. "antlr4/LICENSE.txt". GitHub. 2017-03-30.
  8. Parr, Terence (2004-02-05). "licensing stuff". antlr-interest (Mailing list). Archived from the original on 2011-07-18. Retrieved 2009-12-15.
  9. "ANTLR 4 Documentation". GitHub. 2017-03-30.
  10. "ANTLR plugin for Eclipse".
  11. "ANTLR IDE. An eclipse plugin for ANTLR grammars".
  12. What is the difference between ANTLR 3 & 4
  13. "ANTLR Development Tools".
  14. "ANTLR Language Support - Visual Studio Marketplace".
  15. "GroovyRecognizer (Groovy 2.4.0)".
  16. "Jython: 31d97f0de5fe".
  17. Ebersole, Steve (2018-12-06). "Hibernate ORM 6.0.0.Alpha1 released". In Relation To, The Hibernate team blog on everything data. Retrieved 2020-07-11.
  18. "OpenJDK: Compiler Grammar".
  19. Grammars written for ANTLR v4; expectation that the grammars are free of actions.: antlr/grammars-v4, Antlr Project, 2019-09-25, retrieved 2019-09-25
