Syntax Definition Formalism

Last updated

The Syntax Definition Formalism (SDF) is a metasyntax used to define context-free grammars: that is, a formal way to describe formal languages. It can express the entire range of context-free grammars. Its current version is SDF3. [1] A parser and parser generator for SDF specifications are provided as part of the free ASF+SDF Meta Environment. These operate using the SGLR (Scannerless GLR parser). An SDF parser outputs parse trees or, in the case of ambiguities, parse forests.

Contents

Overview

Features of SDF:

Examples

The following example defines a simple Boolean expression syntax in SDF2:

module basic/Booleans  exports   sorts Boolean   context-free start-symbols Boolean  context-free syntax    "true"                      -> Boolean    "false"                     -> Boolean    lhs:Boolean "|" rhs:Boolean -> Boolean {left}             lhs:Boolean "&" rhs:Boolean -> Boolean {left}           "not" "(" Boolean ")"       -> Boolean               "(" Boolean ")"             -> Boolean   context-free priorities    Boolean "&" Boolean -> Boolean >    Boolean "|" Boolean -> Boolean

Program analysis and transformation systems using SDF

See also

Related Research Articles

Context-free grammar Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production rule is of the form

Formal language Words whose letters are taken from an alphabet and are well-formed according to a specific set of rules

In mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

In computer science, Backus–Naur form or Backus normal form (BNF) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. They are applied wherever exact descriptions of languages are needed: for instance, in official language specifications, in manuals, and in textbooks on programming language theory.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programming language. They are extensions of the basic Backus–Naur form (BNF) metasyntax notation.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

An attribute grammar is a formal way to define attributes for the productions of a formal grammar, associating these attributes with values. The evaluation occurs in the nodes of the abstract syntax tree, when the language is processed by some parser or compiler.

Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees.

In computer-based language recognition, ANTLR, or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.

In computer science, a parsing expression grammar, or PEG, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.

In computer science, a Van Wijngaarden grammar is a two-level grammar which provides a technique to define potentially infinite context-free grammars in a finite number of rules. The formalism was invented by Adriaan van Wijngaarden to define rigorously some syntactic restrictions which previously had to be formulated in natural language, despite their essentially syntactical content.

In computer science, scannerless parsing performs tokenization and parsing in a single step, rather than breaking it up into a pipeline of a lexer followed by a parser, executing concurrently. A language grammar is scannerless if it uses a single formalism to express both the lexical and phrase level structure of the language.

A syntactic predicate specifies the syntactic validity of applying a production in a formal grammar and is analogous to a semantic predicate that specifies the semantic validity of applying a production. It is a simple and effective means of dramatically improving the recognition strength of an LL parser by providing arbitrary lookahead. In their original implementation, syntactic predicates had the form “( α )?” and could only appear on the left edge of a production. The required syntactic condition α could be any valid context-free grammar fragment.

An adaptive grammar is a formal grammar that explicitly provides mechanisms within the formalism to allow its own production rules to be manipulated.

Stratego/XT language and toolset for constructing stand-alone program transformation systems

Stratego/XT is a language and toolset for constructing stand-alone program transformation systems. It combines the Stratego transformation language with the XT toolset of transformation components, providing a framework for constructing stand-alone program transformation systems. The Stratego language is based on a programming paradigm called strategic term rewriting. It provides rewrite rules for expressing basic transformation steps. The application of these rules can be controlled using strategies, a form of subroutines. The XT toolset provides reusable transformation components and declarative languages for deriving new components, such as parsing grammars using the Modular Syntax Definition Formalism (SDF) and implementing pretty-printing.

ASF+SDF Meta Environment

The ASF+SDF Meta-Environment is an IDE and toolset for interactive program analysis and transformation. It combines SDF, ASF and other technologies.

In formal language theory, a grammar describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for strings in a formal language.

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

In formal language theory, weak equivalence of two grammars means they generate the same set of strings, i.e. that the formal language they generate is the same. In compiler theory the notion is distinguished from strongequivalence, which additionally means that the two parse trees are reasonably similar in that the same semantic interpretation can be assigned to both.

In computational linguistics, the term mildly context-sensitive grammar formalisms refers to several grammar formalisms that have been developed with the ambition to provide adequate descriptions of the syntactic structure of natural language.

References

Further reading