An adaptive grammar is a formal grammar that explicitly provides mechanisms within the formalism to allow its own production rules to be manipulated.
John N. Shutt defines an adaptive grammar as a grammatical formalism that allows rule sets (sets of production rules) to be explicitly manipulated within the grammar. Types of manipulation include rule addition, deletion, and modification. [1]
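As a loose illustration of this definition (tied to no particular formalism below; the class and method names are invented for this sketch), a rule set supporting all three kinds of manipulation might look like the following Python sketch:

# Illustrative only: a rule set that can be manipulated while in use,
# per Shutt's definition. AdaptiveGrammar and its methods are invented
# names for this sketch, not drawn from any formalism in this article.
class AdaptiveGrammar:
    def __init__(self, rules):
        # rules maps a nonterminal to a list of right-hand sides
        self.rules = {lhs: [list(rhs) for rhs in rhss]
                      for lhs, rhss in rules.items()}

    def add_rule(self, lhs, rhs):          # rule addition
        self.rules.setdefault(lhs, []).append(list(rhs))

    def delete_rule(self, lhs, rhs):       # rule deletion
        self.rules[lhs].remove(list(rhs))

    def modify_rule(self, lhs, old, new):  # rule modification
        self.rules[lhs][self.rules[lhs].index(list(old))] = list(new)

g = AdaptiveGrammar({"S": [["a", "S"], ["a"]]})
g.add_rule("S", ["b", "S"])
g.modify_rule("S", ["a"], ["b"])
g.delete_rule("S", ["a", "S"])
print(g.rules)  # {'S': [['b'], ['b', 'S']]}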
The first description of grammar adaptivity (though not under that name) in the literature is generally [2] [3] [4] taken to be in a paper by Alfonso Caracciolo di Forino published in 1963. [5] The next generally accepted reference to an adaptive formalism (extensible context-free grammars) came from Wegbreit in 1970 [6] in the study of extensible programming languages, followed by the dynamic syntax of Hanford and Jones in 1973. [7]
Until fairly recently, much of the research into the formal properties of adaptive grammars was uncoordinated between researchers; it was first summarized by Henning Christiansen in 1990 [2] in response to a paper in ACM SIGPLAN Notices by Boris Burshteyn. [8] The Department of Engineering at the University of São Paulo has its Adaptive Languages and Techniques Laboratory (LTA), specifically focusing on research and practice in adaptive technologies and theory. The LTA also maintains a page naming researchers in the field. [9]
While early efforts made reference to dynamic syntax [7] and to extensible, [6] modifiable, [10] dynamic, [11] and adaptable [2] [12] grammars, more recent usage has tended towards the term adaptive (or some variant such as adaptativa, [13] [14] depending on the publication language of the literature). [3] Iwai refers to her formalism as adaptive grammars, [13] but this unqualified use of the term adaptive grammars is not typical of the current literature. Moreover, no standardization or categorization effort has been agreed upon among researchers, although several have made attempts in this direction. [3] [4]
Shutt categorizes adaptive grammar models into two main categories: imperative adaptive grammars, whose rules vary based on a global state changing over the time of a derivation, and declarative adaptive grammars, whose rules vary only over the space of a derivation (that is, over position in the syntax tree of the generated string). [3] [15]
Jackson refines Shutt's taxonomy, referring to changes over time as global and changes over space as local, and adding a hybrid time-space category. [4]
Adaptive formalisms may be divided into two main categories: full grammar formalisms (adaptive grammars), and adaptive machines, upon which some grammar formalisms have been based.
The following is a list (by no means complete) of grammar formalisms that, by Shutt's definition above, are considered to be (or have been classified by their own inventors as being) adaptive grammars. They are listed in their historical order of first mention in the literature.
Described in Wegbreit's doctoral dissertation in 1970, [6] an extensible context-free grammar consists of a context-free grammar whose rule set is modified according to instructions output by a finite state transducer when reading the terminal prefix during a leftmost derivation. Thus, the rule set varies over position in the generated string, but this variation ignores the hierarchical structure of the syntax tree. Extensible context-free grammars were classified by Shutt as imperative. [3]
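A rough sketch of this mechanism follows (an invented toy grammar, not Wegbreit's notation): each terminal fixed by the leftmost derivation is fed to a transducer, whose output may edit the rule set used for the remainder of the derivation.

# Rough sketch of an extensible CFG (invented toy, not Wegbreit's
# notation): a transducer reads each terminal as the leftmost
# derivation fixes it, and its output may add rules for the remainder.
def transducer(state, terminal, rules):
    # Hypothetical instruction: after reading 'b', add the rule A -> c.
    if terminal == "b":
        rules.setdefault("A", []).append(["c"])
    return state

def derives(target, rules):
    # Naive top-down recognizer; lowercase = terminal, uppercase = nonterminal.
    def expand(stack, pos, rules, state):
        if not stack:
            return pos == len(target)
        head, rest = stack[0], stack[1:]
        if head.islower():
            if pos < len(target) and target[pos] == head:
                rules = {k: [r[:] for r in v] for k, v in rules.items()}
                state = transducer(state, head, rules)
                return expand(rest, pos + 1, rules, state)
            return False
        return any(expand(list(rhs) + rest, pos, rules, state)
                   for rhs in rules.get(head, []))
    return expand(["S"], 0, rules, 0)

print(derives("abc", {"S": [["a", "b", "A"]]}))  # True: A -> c added after 'b'
print(derives("abd", {"S": [["a", "b", "A"]]}))  # False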
First introduced in 1985 as Generative Grammars [16] and elaborated upon in later work, [17] Christiansen grammars (apparently so dubbed by Shutt, possibly to avoid confusion with Chomsky's generative grammars) are an adaptive extension of attribute grammars. Christiansen grammars were classified by Shutt as declarative. [3]
The redoubling language { ww : w a nonempty string }, which is not context-free for alphabets of more than one symbol, is demonstrated as follows: [17]
<program↓G> → <dcl↓G↑w> <body↓{w-rule}>
where w-rule = <body↓G′> → w
<dcl↓G↑ch•w> → <char↓G↑ch> <dcl↓G↑w>
<dcl↓G↑<>> → <ε>
<char↓G↑a> → a
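The adaptive step above can be mimicked procedurally: the declaration part synthesizes the string w, and the new rule <body> → w is installed before the body is parsed. A minimal Python sketch of this idea (not Christiansen's notation; an alphabet of {a, b} is assumed, and the search over split points stands in for the parser's nondeterminism):

# Mimics the grammar above: parse <dcl>, synthesize w, then add the
# rule <body> -> w before parsing <body>. Not Christiansen's notation.
def recognizes(s):
    for i in range(1, len(s)):           # candidate end of the declaration
        w = s[:i]                        # attribute synthesized by <dcl>
        if any(c not in "ab" for c in w):
            return False                 # later splits contain the bad char too
        grammar = {"body": [w]}          # the w-rule <body> -> w, added here
        if s[i:] in grammar["body"]:     # parse <body> with the new rule
            return True
    return False

print(recognizes("abab"), recognizes("aab"))  # True False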
First introduced in May 1990 [8] and later expanded upon in December 1990, [10] modifiable grammars explicitly provide a mechanism for the addition and deletion of rules during a parse. Following the responses in ACM SIGPLAN Notices, Burshteyn later modified his formalism and introduced his adaptive Universal Syntax and Semantics Analyzer (USSA) in 1992. [18] These formalisms were classified by Shutt as imperative. [3]
Introduced in 1993, Recursive Adaptive Grammars (RAGs) were an attempt to introduce a Turing-powerful formalism that maintains much of the elegance of context-free grammars. [3] Shutt classifies his own RAGs as a declarative formalism.
Boullier's dynamic grammars, introduced in 1994, [11] appear to be the first family of adaptive grammars to rigorously introduce the notion of a time continuum of a parse as part of the notation of the grammar formalism itself. [4] A dynamic grammar is a sequence of grammars, with each grammar Gi differing in some way from the other grammars in the sequence as the parse progresses over time. Boullier's main paper on dynamic grammars also defines a dynamic parser, the machine that effects a parse against these grammars, and shows examples of how his formalism can handle such things as type checking, extensible languages, polymorphism, and other constructs typically considered to be in the semantic domain of programming language translation.
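For instance, a declaration-before-use discipline can be read as a sequence of grammars: parsing a declaration produces the next grammar Gi+1, in which the declared name is a legal variable. A crude Python sketch of that reading (invented token scheme, not Boullier's formalism):

# Crude sketch of a grammar sequence G0, G1, ... (not Boullier's
# formalism): each parsed declaration yields the next grammar, in
# which the declared name is a valid VAR.
def parse_decls_then_uses(tokens):
    grammars = [{"VAR": set()}]           # G0: no variables declared yet
    i = 0
    while i + 1 < len(tokens) and tokens[i] == "decl":
        g = {"VAR": set(grammars[-1]["VAR"])}
        g["VAR"].add(tokens[i + 1])       # G(i+1) adds VAR -> declared name
        grammars.append(g)
        i += 2
    while i + 1 < len(tokens) and tokens[i] == "use":
        if tokens[i + 1] not in grammars[-1]["VAR"]:
            return False                  # use of an undeclared name
        i += 2
    return i == len(tokens)

print(parse_decls_then_uses(["decl", "x", "use", "x"]))  # True
print(parse_decls_then_uses(["use", "y"]))               # False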
The work of Iwai in 2000 [13] takes the adaptive automata of Neto [19] further by applying adaptive automata to context-sensitive grammars. Iwai's adaptive grammars (note the qualifier by name) allow for three operations during a parse: ? query (similar in some respects to a syntactic predicate, but tied to inspecting the rules from which modifications are chosen), + addition, and − deletion (the latter two shared with its predecessor, the adaptive automata).
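A minimal sketch of the three actions follows (the helper names query, add, and delete are invented for this sketch, not Iwai's notation):

# Hypothetical helpers illustrating Iwai's three adaptive actions.
rules = {"S": [("a", "S", "b"), ("a", "b")]}

def query(lhs):                       # ? : inspect the current rules
    return rules.get(lhs, [])

def add(lhs, rhs):                    # + : rule addition
    rules.setdefault(lhs, []).append(rhs)

def delete(lhs, rhs):                 # - : rule deletion
    rules[lhs].remove(rhs)

# A modification is chosen only after inspecting the rule set:
if ("a", "b") in query("S"):
    add("S", ("c",))
    delete("S", ("a", "b"))
print(rules)                          # {'S': [('a', 'S', 'b'), ('c',)]}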
Introduced in 2000 [20] and most fully discussed in 2006, [4] the §-Calculus (§ here pronounced meta-ess) allows for the explicit addition, deletion, and modification of productions within a grammar, as well as providing for syntactic predicates. This formalism is self-classified by its creator as both imperative and adaptive, or, more specifically, as a time-space adaptive grammar formalism, and was further classified by others as being an analytic formalism. [14] [21]
The redoubling language is demonstrated as follows:
grammar ww { S ::= #phi(A.X<-"") R; R ::= $C('[ab]') #phi(A.X<-A.X C) #phi(N<=A.X) N | R; };
(Note on notation: in the above example, the #phi(...) statements identify the points in the production R that modify the grammar explicitly. #phi(A.X<-A.X C) represents a global modification (over time) and the #phi(N<=A.X) statement identifies a local modification (over space). The #phi(A.X<-"") statement in the S production effectively declares a global production called A.X by placing the empty string into that production before its reference by R.)
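The behavior of this grammar can be simulated directly; in the Python sketch below (plain Python, not the §-Calculus itself), the accumulation into A.X plays the global, over-time role, while the binding of N plays the local, over-space role:

# Simulates the ww grammar above in plain Python (not §-Calculus).
def parse_ww(s):
    if len(s) < 2 or any(c not in "ab" for c in s):
        return False                 # $C('[ab]') admits only a and b
    for split in range(1, len(s)):   # the choice between N and R
        AX = ""                      # #phi(A.X<-"") in S
        for c in s[:split]:
            AX += c                  # #phi(A.X<-A.X C): global, over time
        N = AX                       # #phi(N<=A.X): local, over space
        if s[split:] == N:
            return True
    return False

print(parse_ww("abab"), parse_ww("aba"))  # True False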
First described by Neto in 2001, [22] adaptive devices were later enhanced and expanded upon by Pistori in 2003. [23]
In 2002, [24] Adam Carmi introduced an LALR(1)-based adaptive grammar formalism known as Adapser. Specifics of the formalism do not appear to have been released.
In 2004, [14] César Bravo introduced the notion of merging the concept of appearance checking [25] with adaptive context-free grammars, a restricted form of Iwai's adaptive grammars, [13] showing these new grammars, called Adaptive CFGs with Appearance Checking, to be Turing powerful.
Some adaptive formalisms, while not grammar formalisms themselves, either serve as the basis of full grammar formalisms or are adaptive in nature. Chief among them are the adaptive automata of Neto, [19] upon which Iwai's adaptive grammars [13] and Neto's adaptive devices [22] build.