This article needs additional citations for verification .(March 2016) |
Original author(s) | Terence Parr and others |
---|---|
Initial release | April 10, 1992 |
Stable release | 4.13.2 / 3 August 2024 |
Repository | |
Written in | Java |
Platform | Cross-platform |
License | BSD License |
Website | www |
In computer-based language recognition, ANTLR (pronounced antler ), or ANother Tool for Language Recognition, is a parser generator that uses a LL(*) algorithm for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.[ citation needed ]
ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer of that language. While Version 3 supported generating code in the programming languages Ada95, ActionScript, C, C#, Java, JavaScript, Objective-C, Perl, Python, Ruby, and Standard ML, [3] Version 4 at present targets C#, C++, Dart, [4] [5] Java, JavaScript, Go, PHP, Python (2 and 3), and Swift.
A language is specified using a context-free grammar expressed using Extended Backus–Naur Form (EBNF).[ citation needed ] [6] ANTLR can generate lexers, parsers, tree parsers, and combined lexer-parsers. Parsers can automatically generate parse trees or abstract syntax trees, which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers.
By default, ANTLR reads a grammar and generates a recognizer for the language defined by the grammar (i.e., a program that reads an input stream and generates an error if the input stream does not conform to the syntax specified by the grammar). If there are no syntax errors, the default action is to simply exit without printing any message. In order to do something useful with the language, actions can be attached to grammar elements in the grammar. These actions are written in the programming language in which the recognizer is being generated. When the recognizer is being generated, the actions are embedded in the source code of the recognizer at the appropriate points. Actions can be used to build and check symbol tables and to emit instructions in a target language, in the case of a compiler.[ citation needed ] [6]
Other than lexers and parsers, ANTLR can be used to generate tree parsers. These are recognizers that process abstract syntax trees, which can be automatically generated by parsers. These tree parsers are unique to ANTLR and help processing abstract syntax trees.[ citation needed ] [6]
ANTLR 3[ citation needed ] and ANTLR 4 are free software, published under a three-clause BSD License. [7] Prior versions were released as public domain software. [8] Documentation, derived from Parr's book The Definitive ANTLR 4 Reference, is included with the BSD-licensed ANTLR 4 source. [7] [9]
Various plugins have been developed for the Eclipse development environment to support the ANTLR grammar, including ANTLR Studio, a proprietary product, as well as the "ANTLR 2" [10] and "ANTLR 3" [11] plugins for Eclipse hosted on SourceForge.[ citation needed ]
ANTLR 4 deals with direct left recursion correctly, but not with left recursion in general, i.e., grammar rules x that refer to y that refer to x. [12]
As reported on the tools [13] page of the ANTLR project, plug-ins that enable features like syntax highlighting, syntax error checking and code completion are freely available for the most common IDEs (Intellij IDEA, NetBeans, Eclipse, Visual Studio [14] and Visual Studio Code).
Software built using ANTLR includes:
Over 200 grammars implemented in ANTLR 4 are available on GitHub. [20] They range from grammars for a URL to grammars for entire languages like C, Java and Go.
In the following example, a parser in ANTLR describes the sum of expressions can be seen in the form of "1 + 2 + 3":
// Common options, for example, the target languageoptions{language="CSharp";}// Followed by the parser classSumParserextendsParser;options{k=1;// Parser Lookahead: 1 Token}// Definition of an expressionstatement:INTEGER(PLUS^INTEGER)*;// Here is the LexerclassSumLexerextendsLexer;options{k=1;// Lexer Lookahead: 1 characters}PLUS:'+';DIGIT:('0'..'9');INTEGER:(DIGIT)+;
The following listing demonstrates the call of the parser in a program:
TextReaderreader;// (...) Fill TextReader with characterSumLexerlexer=newSumLexer(reader);SumParserparser=newSumParser(lexer);parser.statement();
In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.
GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in Bison syntax, warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.
In computer science, Backus–Naur form is a notation used to describe the syntax of programming languages or other formal languages. It was developed by John Backus and Peter Naur. BNF can be described as a metasyntax notation for context-free grammars. Backus–Naur form is applied wherever exact descriptions of languages are needed, such as in official language specifications, in manuals, and in textbooks on programming language theory. BNF can be used to describe document formats, instruction sets, and communication protocols.
In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures where each such procedure implements one of the nonterminals of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.
In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.
An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text written in a formal language. Each node of the tree denotes a construct occurring in the text. It is sometimes called just a syntax tree.
Lexical tokenization is conversion of a text into meaningful lexical tokens belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives, punctuations etc. In case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is related to the type of tokenization used in large language models (LLMs) but with two differences. First, lexical tokenization is usually based on a lexical grammar, whereas LLM tokenizers are usually probability-based. Second, LLM tokenizers perform a second step that converts the tokens into numerical values.
SableCC is an open-source compiler generator in Java. Stable version is licensed under the GNU Lesser General Public License (LGPL). Rewritten version 4 is licensed under Apache License 2.0.
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.
Apache Groovy is a Java-syntax-compatible object-oriented programming language for the Java platform. It is both a static and dynamic language with features similar to those of Python, Ruby, and Smalltalk. It can be used as both a programming language and a scripting language for the Java Platform, is compiled to Java virtual machine (JVM) bytecode, and interoperates seamlessly with other Java code and libraries. Groovy uses a curly-bracket syntax similar to Java's. Groovy supports closures, multiline strings, and expressions embedded in strings. Much of Groovy's power lies in its AST transformations, triggered through annotations.
A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging from widely used languages for common domains, such as HTML for web pages, down to languages used by only one or a few pieces of software, such as MUSH soft code. DSLs can be further subdivided by the kind of language, and include domain-specific markup languages, domain-specific modeling languages, and domain-specific programming languages. Special-purpose computer languages have always existed in the computer age, but the term "domain-specific language" has become more popular due to the rise of domain-specific modeling. Simpler DSLs, particularly ones used by a single application, are sometimes informally called mini-languages.
Coco/R is a compiler generator that takes wirth syntax notation grammars of a source language and generates a scanner and a parser for that language.
In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.
In computer science, scannerless parsing performs tokenization and parsing in a single step, rather than breaking it up into a pipeline of a lexer followed by a parser, executing concurrently. A language grammar is scannerless if it uses a single formalism to express both the lexical and phrase level structure of the language.
This is a list of notable lexer generators and parser generators for various language classes.
Terence John Parr is a professor of computer science at the University of San Francisco. He is best known for his ANTLR parser generator and contributions to parsing theory. He also developed the StringTemplate engine for Java and other programming languages.
A syntactic predicate specifies the syntactic validity of applying a production in a formal grammar and is analogous to a semantic predicate that specifies the semantic validity of applying a production. It is a simple and effective means of dramatically improving the recognition strength of an LL parser by providing arbitrary lookahead. In their original implementation, syntactic predicates had the form “( α )?” and could only appear on the left edge of a production. The required syntactic condition α could be any valid context-free grammar fragment.
In computer science, SYNTAX is a system used to generate lexical and syntactic analyzers (parsers) for all kinds of context-free grammars (CFGs) as well as some classes of contextual grammars. It has been developed at INRIA in France for several decades, mostly by Pierre Boullier, but has become free software since 2007 only. SYNTAX is distributed under the CeCILL license.
Xtext is an open-source software framework for developing programming languages and domain-specific languages (DSLs). Unlike standard parser generators, Xtext generates not only a parser, but also a class model for the abstract syntax tree, as well as providing a fully featured, customizable Eclipse-based IDE.
OMeta is a specialized object-oriented programming language for pattern matching, developed by Alessandro Warth and Ian Piumarta in 2007 at the Viewpoints Research Institute. The language is based on parsing expression grammars (PEGs), rather than context-free grammars, with the intent to provide "a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree-transformers".