SYNTAX

Developer(s): INRIA
Type: Parser generator
License: CeCILL
Website: sourcesup.renater.fr/projects/syntax

In computer science, SYNTAX is a system used to generate lexical and syntactic analyzers (parsers), both deterministic and non-deterministic, for all kinds of context-free grammars (CFGs) as well as some classes of contextual grammars.[citation needed] It has been developed at INRIA in France for several decades, mostly by Pierre Boullier, but has been free software only since 2007. SYNTAX is distributed under the CeCILL license.[citation needed]

Context-free parsing

SYNTAX handles most classes of deterministic (unambiguous) grammars (LR, LALR, RLR) as well as general context-free grammars. The deterministic version has been used in operational contexts (e.g., Ada[1]) and is currently used in the domain of compilation.[2] The non-deterministic features include an Earley parser generator used for natural language processing.[3] Parsers generated by SYNTAX include powerful error recovery mechanisms, and allow the execution of semantic actions and attribute evaluation on the abstract tree or on the shared parse forest.
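As an illustration of the general (non-deterministic) side, the following is a minimal Earley recognizer sketch in Python. It shows the predict/scan/complete mechanics behind Earley parsing of an ambiguous grammar; it is not SYNTAX's implementation, the grammar encoding is an illustrative assumption, and it assumes the grammar has no epsilon rules.

    # Toy grammar encoding: nonterminal -> list of right-hand sides (tuples of
    # symbols); a symbol is a terminal iff it has no rules in the grammar.
    def earley_recognize(grammar, start, tokens):
        # chart[i] holds Earley items (lhs, rhs, dot, origin) ending at position i.
        chart = [set() for _ in range(len(tokens) + 1)]
        for rhs in grammar[start]:
            chart[0].add((start, rhs, 0, 0))
        for i in range(len(tokens) + 1):
            work = list(chart[i])
            while work:
                lhs, rhs, dot, origin = work.pop()
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in grammar:                          # predict
                        for prod in grammar[sym]:
                            item = (sym, prod, 0, i)
                            if item not in chart[i]:
                                chart[i].add(item); work.append(item)
                    elif i < len(tokens) and tokens[i] == sym:  # scan
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                                           # complete
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item); work.append(item)
        return any((start, rhs, len(rhs), 0) in chart[len(tokens)]
                   for rhs in grammar[start])

    # An ambiguous grammar, E -> E + E | n, which a deterministic LR parser
    # would reject but an Earley parser handles directly.
    g = {"E": [("E", "+", "E"), ("n",)]}
    print(earley_recognize(g, "E", ["n", "+", "n", "+", "n"]))  # True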

Contextual parsing

The current version of SYNTAX (version 6.0 beta) also includes parser generators for other formalisms, used for natural language processing as well as bio-informatics. These formalisms are context-sensitive (TAG, RCG) or rely on context-free grammars extended through attribute evaluation, in particular for natural language processing (LFG).
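To give a flavour of the attribute-evaluation side, here is a minimal sketch of synthesized-attribute evaluation over a parse tree in Python. The tree encoding and node labels are illustrative assumptions, not SYNTAX's representation; the point is only that an attribute of a node is computed bottom-up from the attributes of its children.

    # Each node is a tuple: (label, child, child, ...); "num" leaves carry a value.
    def evaluate(node):
        """Compute a synthesized 'value' attribute bottom-up."""
        label, children = node[0], node[1:]
        if label == "num":
            return children[0]                # leaf: the literal value itself
        left, right = evaluate(children[0]), evaluate(children[1])
        return left + right if label == "add" else left * right

    # Parse tree for "2 + 3 * 4"; the value attribute of the root is 14.
    tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
    print(evaluate(tree))  # 14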

Error recovery

A distinctive feature of SYNTAX (compared to Lex/Yacc) is its built-in algorithm[4] for automatically recovering from lexical and syntactic errors, by deleting extra characters or tokens, inserting missing characters or tokens, permuting characters or tokens, and so on. This algorithm has a default behaviour that can be modified by providing a custom set of recovery rules adapted to the language for which the lexer and parser are built.
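The following Python sketch illustrates the general idea of such single-edit repairs (deletion, insertion, permutation) on a token stream. It is far simpler than the Boullier–Jourdan scheme cited above and is not SYNTAX's algorithm; the `parses` predicate and the vocabulary are assumptions standing in for a real parser's validity check.

    def repair(tokens, parses, vocabulary):
        """Try to make `tokens` valid with one deletion, insertion, or swap."""
        if parses(tokens):
            return tokens
        for i in range(len(tokens) + 1):
            if i < len(tokens):                       # delete an extra token
                cand = tokens[:i] + tokens[i + 1:]
                if parses(cand):
                    return cand
            for tok in vocabulary:                    # insert a missing token
                cand = tokens[:i] + [tok] + tokens[i:]
                if parses(cand):
                    return cand
            if i + 1 < len(tokens):                   # permute adjacent tokens
                cand = tokens[:i] + [tokens[i + 1], tokens[i]] + tokens[i + 2:]
                if parses(cand):
                    return cand
        return None                                   # no single-edit repair found

    # Trivial stand-in for "syntactically valid": balanced parentheses.
    def balanced(ts):
        depth = 0
        for t in ts:
            depth += {"(": 1, ")": -1}.get(t, 0)
            if depth < 0:
                return False
        return depth == 0

    print(repair(["(", "(", ")"], balanced, ["(", ")"]))  # ['(', ')'] (drops one '(')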

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

In computer science, LR parsers are a type of bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, Canonical LR(1) parsers, Minimal LR(1) parsers, and GLR parsers. LR parsers can be generated by a parser generator from a formal grammar defining the syntax of the language to be parsed. They are widely used for the processing of computer languages.

Yacc is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating an LALR parser based on a formal grammar, written in a notation similar to Backus–Naur Form (BNF). Yacc is supplied as a standard utility on BSD and AT&T Unix. GNU-based Linux distributions include Bison, a forward-compatible Yacc replacement.

GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in the BNF notation, warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters into a sequence of lexical tokens. A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, although scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyze the syntax of programming languages, web pages, and so forth.
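As a concrete illustration of lexical analysis, here is a minimal regex-based tokenizer sketch in Python. The token names and patterns are illustrative assumptions; lexer generators (Lex-style tools, and SYNTAX's own lexer generator) typically compile such rules into finite automata rather than matching them interpretively as done here.

    import re

    # Illustrative token rules; each pairs a token name with a regular expression.
    TOKEN_SPEC = [
        ("NUMBER", r"\d+"),
        ("IDENT",  r"[A-Za-z_]\w*"),
        ("OP",     r"[+\-*/=()]"),
        ("SKIP",   r"\s+"),                  # whitespace, discarded below
    ]
    MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

    def tokenize(text):
        pos = 0
        while pos < len(text):
            m = MASTER.match(text, pos)
            if not m:
                raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
            pos = m.end()
            if m.lastgroup != "SKIP":
                yield (m.lastgroup, m.group())

    print(list(tokenize("x = 42 + y")))
    # [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]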

Lexical functional grammar (LFG) is a constraint-based grammar framework in theoretical linguistics. It posits two separate levels of syntactic structure, a phrase structure grammar representation of word order and constituency, and a representation of grammatical functions such as subject and object, similar to dependency grammar. The development of the theory was initiated by Joan Bresnan and Ronald Kaplan in the 1970s, in reaction to the theory of transformational grammar which was current in the late 1970s. It mainly focuses on syntax, including its relation with morphology and semantics. There has been little LFG work on phonology.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

In computer-based language recognition, ANTLR, or ANother Tool for Language Recognition, is a parser generator that uses ALL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.

Coco/R is a compiler generator that takes Wirth syntax notation grammars of a source language and generates a scanner and a parser for that language.

In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.
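The "first match wins" behaviour of the PEG choice operator can be shown in a few lines of Python; the combinator names below are illustrative assumptions, not a particular PEG library's API.

    # A parser here is a function from (text, pos) to the new position, or None.
    def lit(s):
        return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

    def choice(*alts):
        def parse(text, pos):
            for alt in alts:          # ordered choice: first success wins
                result = alt(text, pos)
                if result is not None:
                    return result
            return None
        return parse

    # PEG rule A <- "a" / "ab": on input "ab" it commits to "a" and never
    # considers "ab", whereas a CFG treats the two alternatives symmetrically.
    A = choice(lit("a"), lit("ab"))
    print(A("ab", 0))  # 1, i.e. only "a" was consumed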

In computer science, a Van Wijngaarden grammar is a two-level grammar which provides a technique to define potentially infinite context-free grammars in a finite number of rules. The formalism was invented by Adriaan van Wijngaarden to rigorously define some syntactic restrictions which previously had to be formulated in natural language, despite their essentially syntactical content.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the set of rules that defines the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

In computer science, scannerless parsing performs tokenization and parsing in a single step, rather than breaking them into a pipeline of a lexer followed by a parser executing concurrently. A language grammar is scannerless if it uses a single formalism to express both the lexical and phrase-level structure of the language.

A GLR parser is an extension of an LR parser algorithm to handle non-deterministic and ambiguous grammars. The theoretical foundation was provided in a 1974 paper by Bernard Lang. It describes a systematic way to produce such algorithms, and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques. The first actual implementation of GLR was described in a 1984 paper by Masaru Tomita; it has also been referred to as a "parallel parser". Tomita presented five stages in his original work, though in practice it is the second stage that is recognized as the GLR parser.

A syntactic predicate specifies the syntactic validity of applying a production in a formal grammar and is analogous to a semantic predicate that specifies the semantic validity of applying a production. It is a simple and effective means of dramatically improving the recognition strength of an LL parser by providing arbitrary lookahead. In their original implementation, syntactic predicates had the form “( α )?” and could only appear on the left edge of a production. The required syntactic condition α could be any valid context-free grammar fragment.

An adaptive grammar is a formal grammar that explicitly provides mechanisms within the formalism to allow its own production rules to be manipulated.

Glue semantics, or simply Glue, is a linguistic theory of semantic composition and the syntax–semantics interface which assumes that meaning composition is constrained by a set of instructions stated within a formal logic. These instructions, called meaning constructors, state how the meanings of the parts of a sentence can be combined to provide the meaning of the sentence.

<span class="mw-page-title-main">History of compiler construction</span>

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language into another computer language. The most common reason for transforming source code is to create an executable program.

Syntactic parsing is the automatic analysis of the syntactic structure of natural language, especially syntactic relations and the labelling of spans of constituents. It is motivated by the problem of structural ambiguity in natural language: a sentence can be assigned multiple grammatical parses, so some kind of knowledge beyond computational grammar rules is needed to tell which parse is intended. Syntactic parsing is one of the important tasks in computational linguistics and natural language processing, and has been a subject of research since the mid-20th century with the advent of computers.

References

  1. The first tool-translator for the Ada language was developed with SYNTAX by Pierre Boullier and others, as recalled in this page on the history of Ada. See also Pierre Boullier and Knut Ripken. Building an Ada compiler following meta-compilation methods. In Séminaires Langages et Traducteurs 1978–1981, pages 99–140. INRIA, Rocquencourt, France, 1981.
  2. E.g., by the VASY and CONVECS teams at INRIA, in particular for the development of CADP and Traian.
  3. E.g., in the SxLFG parser, whose first version is described in this paper.
  4. Pierre Boullier and Martin Jourdan. A New Error Repair and Recovery Scheme for Lexical and Syntactic Analysis. Science of Computer Programming 9(3): 271–286 (1987).