JetPAG

Last updated
JetPAG
Developer(s) Tareq H. Sharafy
Stable release
0.6.1 / February 7, 2007;11 years ago (2007-02-07)
Preview release
0.6.3 / 2007
Written in C++
Operating system Platform-independent
Type Parser generator
License GNU General Public License
Website JetPAG Homepage

JetPAG (Jet Parser Auto-Generator) is an open-source LL(k) parser and lexical analyzer generator, licensed under the GNU General Public License. It is a personal work of Tareq H. Sharafy, and is currently at final beta stages of development.

Open-source software software licensed to ensure source code usage rights

Open-source software (OSS) is a type of computer software in which source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. Open-source software may be developed in a collaborative public manner. Open-source software is a prominent example of open collaboration.

In computer science, an LL parser is a top-down parser for a subset of context-free languages. It parses the input from Left to right, performing Leftmost derivation of the sentence.

Parsing, syntax analysis, or syntactic analysis is the process of analysing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

Contents

History

Tareq started JetPAG as a small program written for practice purposes only. Soon when it started expanding many goals were added rapidly, and it was obvious that JetPAG is worthy being a complete project. Real development of JetPAG started in late 2005, targeting a complete framework for a powerful recursive descent lexical analyzer and parser generator with emphasis on ease of use, code readability and high performance of generated code. After a long period of in-house development and testing, the first development package of JetPAG was released through SourceForge on 18 November 2006. Development of JetPAG is current at beta stage, current version is 0.6.1. The development was delayed from mid-2007 until early 2009 but resumed after.

In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures where each such procedure implements one of the nonterminals of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

SourceForge web-based source code repository

SourceForge is a web-based service that offers software developers a centralized online location to control and manage free and open-source software projects. It provides a source code repository, bug tracking, mirroring of downloads for load balancing, a wiki for documentation, developer and user mailing lists, user-support forums, user-written reviews and ratings, a news bulletin, micro-blog for publishing project updates, and other features.

Software release life cycle sum of the phases of development and maturity for computer software

A software release life cycle is the sum of the stages of development and maturity for a piece of computer software: ranging from its initial development to its eventual release, and including updated versions of the released version to help improve software or fix software bugs still present in the software.

Overview

Jetpag incorporates several modules: the front end, the analyzers and the code generators.

The front end accepts the grammar metalanguages as an input.

The analyzers mainly perform two operations through tree traversal. The first is calculating strong lookahead sets for the elements in the grammar and the second is constructing lookahead paths from the lookahead sets. Lookahead paths group, factorize and perform many enhancements and optimizations to lookahead sets using special analysis. From lookahead paths, lookahead sets are transformed to a nested tree form, gaining a great overall efficiency and improvement in most cases.

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

In computer science, best, worst, and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively. Usually the resource being considered is running time, i.e. time complexity, but it could also be memory or other resource. Best case is the function which performs the minimum number of steps on input data of n elements. Worst case is the function which performs the maximum number of steps on input data of size n. Average case is the function which performs an average number of steps on input data of n elements.

Code generators generate source code for recognizers compatible with the input grammars based on them along with information collected from the analyzers. Currently, JetPAG generates source code in C++ only.

The nature of JetPAG's metalanguage and framework makes it easy and simple to integrate generated recognizers into larger applications. JetPAG also includes some facilities in the provided framework to aid developers with small utilities and save development time from many minimal language recognition tasks.

JetPAG grammars

JetPAG's garmmars are written in a meta language based on the EBNF form and regular expressions, with extensive additions and tweaks. The meta language of JetPAG grammars was designed to be maximally flexible handle both simple grammars and large, complicated ones easily. Parsers and lexical analyzers are similarly defined and generated for simplicity and ease of use. This is a simple example of a grammar for a basic calculator:

Broadly, any metalanguage is language or symbols used when language itself is being discussed or examined. In logic and linguistics, a metalanguage is a language used to make statements about statements in another language. Expressions in a metalanguage are often distinguished from those in an object language by the use of italics, quotation marks, or writing on a separate line. The structure of sentences and phrases in a metalanguage can be described by a metasyntax.

grammar Calc:  parser CalcP:  expression:  multiplicative  ( '+' multiplicative  | '-' multiplicative   )*  ;  multiplicative:  factor  ( '*' factor  | '/' factor   )*  ;  factor:    INT  | '(' expression ')'  ;  scanner CalcS:  INT:   '0'-'9'+; PLUS:  '+'; MINUS: '-'; STAR:  '*'; SLASH: '/'; LP:    '('; RP:    ')'; 

See also

Related Research Articles

In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language.

In computer science, LR parsers are a type of bottom-up parser that efficiently read deterministic context-free languages, in guaranteed linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parsers, GLR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

Yacc is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right (LALR) parser generator, generating a parser, the part of a compiler that tries to make syntactic sense of the source code, specifically a LALR parser, based on an analytic grammar written in a notation similar to Backus–Naur Form (BNF). Yacc is supplied as a standard utility on BSD and AT&T Unix. GNU-based Linux distributions include Bison, a forward-compatible Yacc replacement.

GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar. The generated parsers are portable: they do not require any specific compilers. Bison by default generates LALR(1) parsers but it can also generate canonical LR, IELR(1) and GLR parsers.

In computer science, Backus–Naur form or Backus normal form (BNF) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. They are applied wherever exact descriptions of languages are needed: for instance, in official language specifications, in manuals, and in textbooks on programming language theory.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The input may be a text file containing the grammar written in BNF or EBNF that defines the syntax of a programming language, and whose generated output is some source code of the parser for the programming language, although other definitions exist. Usually, the resulting source code will have to be extended upon before a complete compiler emerges.

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters into a sequence of tokens. A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyze the syntax of programming languages, web pages, and so forth.

Lex is a computer program that generates lexical analyzers.

Flex is a free and open-source software alternative to lex. It is a computer program that generates lexical analyzers . It is frequently used as the lex implementation together with Berkeley Yacc parser generator on BSD-derived operating systems, or together with GNU bison in *BSD ports and in Linux distributions. Unlike Bison, flex is not part of the GNU Project and is not released under the GNU General Public License.

JavaCC is an open-source parser generator and lexical analyzer generator written in the Java programming language. JavaCC is similar to yacc in that it generates a parser from a formal grammar written in EBNF notation. Unlike yacc, however, JavaCC generates top-down parsers. JavaCC can resolve choices based on the next k input tokens, and so can handle LL(k) grammars automatically; by use of "lookahead specifications", it can also resolve choices requiring unbounded look ahead. JavaCC also generates lexical analyzers in a fashion similar to lex. The tree builder that accompanies it, JJTree, constructs its trees from the bottom up.

In computer-based language recognition, ANTLR, or Another Tool For Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.

In computer science, an operator precedence parser is a bottom-up parser that interprets an operator-precedence grammar. For example, most calculators use operator precedence parsers to convert from the human-readable infix notation relying on order of operations to a format that is optimized for evaluation such as Reverse Polish notation (RPN).

In computer science, scannerless parsing refers to performing both tokenization and parsing in a single step, rather than breaking it up into a pipeline of a lexer followed by a parser, executing concurrently. It also refers to the associated grammar: using a single formalism to express both the lexical grammar and phrase level grammar used to parse a language.

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

A lookahead left-to-right (LALR) parser generator is a software tool that reads a BNF grammar and creates an LALR parser which is capable of parsing files written in the computer language defined by the BNF grammar. LALR parsers are desirable because they are very fast and small in comparison to other types of parsers.

A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing and its variations, are shift-reduce methods. The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions.

OMeta is a specialized object-oriented programming language for pattern matching, developed by Alessandro Warth and Ian Piumarta in 2007 under the Viewpoints Research Institute. The language is based on Parsing Expression Grammars (PEGs) rather than Context-Free Grammars with the intent of providing “a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree-transformers”.

PackCC is a parser generator for C. Its main features are as follows: