Adaptive grammar

An adaptive grammar is a formal grammar that explicitly provides mechanisms within the formalism to allow its own production rules to be manipulated.

Overview

John N. Shutt defines an adaptive grammar as a grammatical formalism that allows rule sets (that is, sets of production rules) to be explicitly manipulated within a grammar. Types of manipulation include rule addition, deletion, and modification. [1]

Early history

The first description of grammar adaptivity (though not under that name) in the literature is generally [2] [3] [4] taken to be in a paper by Alfonso Caracciolo di Forino published in 1963. [5] The next generally accepted reference to an adaptive formalism (extensible context-free grammars) came from Wegbreit in 1970 [6] in the study of extensible programming languages, followed by the dynamic syntax of Hanford and Jones in 1973. [7]

Collaborative efforts

For many years, research into the formal properties of adaptive grammars was largely uncoordinated between researchers. It was first summarized by Henning Christiansen in 1990 [2] in response to a paper in ACM SIGPLAN Notices by Boris Burshteyn. [8] The Department of Engineering at the University of São Paulo hosts the Adaptive Languages and Techniques Laboratory (LTA), which focuses specifically on research and practice in adaptive technologies and theory. The LTA also maintains a page naming researchers in the field. [9]

Terminology and taxonomy

While early efforts made reference to dynamic syntax [7] and extensible, [6] modifiable, [10] dynamic, [11] and adaptable [2] [12] grammars, more recent usage has tended towards the term adaptive (or some variant such as adaptativa, [13] [14] depending on the publication language of the literature). [3] Iwai refers to her formalism as adaptive grammars, [13] but the bare term adaptive grammars is not currently used in the literature for this specific formalism without a name qualification. Moreover, no coordinated standardization or categorization has been undertaken among researchers, although several have individually made efforts in this direction. [3] [4]

The Shutt classification (and extensions)

Shutt categorizes adaptive grammar models into two main categories: [3] [15]

  • Imperative adaptive grammars vary their rules based on a global state changing over the time of the generation of a language.
  • Declarative adaptive grammars vary their rules only over the space of the generation of a language (i.e., position in the syntax tree of the generated string).

Jackson refines Shutt's taxonomy, referring to changes over time as global and changes over space as local, and adding a hybrid time-space category: [4]

  • Time-space adaptive grammars (hybrids) vary their rules over either the time or the space (or both) of the generation of a language (and local and global operations are explicitly differentiated by the notation for such changes).

Adaptive formalisms in the literature

Adaptive formalisms may be divided into two main categories: full grammar formalisms (adaptive grammars), and adaptive machines, upon which some grammar formalisms have been based.

Adaptive grammar formalisms

The following is a list (by no means complete) of grammar formalisms that, by Shutt's definition above, are considered to be (or have been classified by their own inventors as being) adaptive grammars. They are listed in their historical order of first mention in the literature.

Extensible context-free grammars (Wegbreit)

Described in Wegbreit's doctoral dissertation in 1970, [6] an extensible context-free grammar consists of a context-free grammar whose rule set is modified according to instructions output by a finite state transducer when reading the terminal prefix during a leftmost derivation. Thus, the rule set varies over position in the generated string, but this variation ignores the hierarchical structure of the syntax tree. Extensible context-free grammars were classified by Shutt as imperative. [3]

Christiansen grammars (Christiansen)

First introduced in 1985 as Generative Grammars [16] and later elaborated upon, [17] Christiansen grammars (apparently so dubbed by Shutt, possibly due to the naming conflict with Chomsky generative grammars) are an adaptive extension of attribute grammars. Christiansen grammars were classified by Shutt as declarative. [3]

The redoubling language is demonstrated as follows: [17]

<program↓G>       →   <dcl↓G↑w> <body↓{w-rule}>
where w-rule  =  <body↓G’>   →   w
<dcl↓G↑chw>       →   <char↓G↑ch> <dcl↓G↑w>
<dcl↓G↑<>>        →   <ε>
<char↓G↑a>        →   a
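
The idea behind the grammar above can be sketched in Python. This is only an illustrative reading of the example, not Christiansen's notation or algorithm: the declaration part is taken to yield a string w, the grammar is extended with a single rule body → w, and the body is then matched against that extended rule set. The function name, the dictionary representation of the grammar, and the alphabet {a, b} are assumptions made for the sketch.

```python
def recognizes_redoubling(text: str, alphabet: str = "ab") -> bool:
    """Return True if `text` has the form ww with w over `alphabet`.

    Hypothetical sketch: the grammar is a dict from nonterminal to a list
    of right-hand sides, and the rule for <body> is added only after the
    declaration part has been read.
    """
    base_grammar = {"char": list(alphabet)}  # one <char> rule per letter

    # The end of the declaration is not marked, so try every split point.
    for split in range(len(text) + 1):
        dcl, body = text[:split], text[split:]

        # <dcl> must be built from <char> rules only.
        if any(ch not in base_grammar["char"] for ch in dcl):
            break  # every longer prefix contains the same bad character

        # Adapt the grammar: extend it with the w-rule <body> -> w.
        extended = dict(base_grammar)
        extended["body"] = [dcl]

        # <body> is parsed under the extended grammar.
        if body in extended["body"]:
            return True
    return False


if __name__ == "__main__":
    for sample in ["abab", "aa", "aba", "abba"]:
        print(sample, recognizes_redoubling(sample))
```

The base grammar is never mutated; each candidate split gets its own extended copy, which mirrors the declarative character of the formalism in that the extra rule is visible only to the body that follows the declaration.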

Bottom-up modifiable grammars, top-down modifiable grammars, and USSA (Burshteyn)

First introduced in May 1990 [8] and later expanded upon in December 1990, [10] modifiable grammars explicitly provide a mechanism for the addition and deletion of rules during a parse. Following the responses published in ACM SIGPLAN Notices, Burshteyn later modified his formalism and introduced his adaptive Universal Syntax and Semantics Analyzer (USSA) in 1992. [18] These formalisms were classified by Shutt as imperative. [3]

Recursive adaptive grammars (Shutt)

Introduced in 1993, Recursive Adaptive Grammars (RAGs) were an attempt to introduce a Turing powerful formalism that maintained much of the elegance of context-free grammars. [3] Shutt self-classifies RAGs as being a declarative formalism.

Dynamic grammars (Boullier)

Boullier's dynamic grammars, introduced in 1994, [11] appear to be the first family of adaptive grammars to rigorously introduce the notion of a time continuum of a parse as part of the notation of the grammar formalism itself. [4] A dynamic grammar is a sequence of grammars, with each grammar Gi differing in some way from the other grammars in the sequence, over time. Boullier's main paper on dynamic grammars also defines a dynamic parser, the machine that effects a parse against these grammars, and shows examples of how his formalism can handle such things as type checking, extensible languages, polymorphism, and other constructs typically considered to be in the semantic domain of programming language translation.
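
The notion of a sequence of grammars evolving over the time of a parse can be illustrated with a small Python sketch. It does not reproduce Boullier's notation or his dynamic parser; the toy token language (`let NAME` and `use NAME`), the function name, and the representation of each grammar as a frozen set of rules are all assumptions chosen for illustration. Each declaration produces the next grammar in the sequence, and a use is accepted only if the grammar current at that moment contains a rule for the name.

```python
from __future__ import annotations


def parse_with_dynamic_grammar(tokens: list[str]) -> tuple[bool, list[frozenset]]:
    """Parse a flat list of 'let NAME' / 'use NAME' pairs.

    Returns (accepted, grammars), where `grammars` holds the snapshots
    G0, G1, ... produced while reading the input (hypothetical sketch).
    """
    grammars = [frozenset()]  # G0: no identifier rules yet
    stream = iter(tokens)
    for keyword in stream:
        name = next(stream, None)
        if name is None:
            return False, grammars  # keyword without an operand
        current = grammars[-1]
        if keyword == "let":
            # A declaration yields the next grammar in the sequence.
            grammars.append(current | {("ident", name)})
        elif keyword == "use":
            # A use is checked against the grammar in force right now.
            if ("ident", name) not in current:
                return False, grammars
        else:
            return False, grammars  # unknown keyword
    return True, grammars


if __name__ == "__main__":
    ok, history = parse_with_dynamic_grammar(["let", "x", "use", "x"])
    print(ok, len(history))
```

Keeping every snapshot rather than mutating one rule set makes the time dimension explicit: a declare-before-use check, normally treated as semantic, becomes a question of which grammar in the sequence is current.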

Adaptive grammars (Iwai)

The work of Iwai in 2000 [13] takes the adaptive automata of Neto [19] further by applying adaptive automata to context-sensitive grammars. Iwai's adaptive grammars (note the qualifier by name) allow for three operations during a parse: ? query (similar in some respects to a syntactic predicate, but tied to inspection of rules from which modifications are chosen), + addition, and - deletion (which it shares with its predecessor adaptive automata).
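
The three operations can be pictured as methods on a mutable rule set. The Python sketch below is not Iwai's notation; the class names `Rule` and `RuleSet`, the wildcard convention for queries, and the sample rules are assumptions made for illustration. A query inspects the current rules, and its results are then used to choose which rules to delete and which to add.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    lhs: str
    rhs: tuple[str, ...]


@dataclass
class RuleSet:
    rules: set[Rule] = field(default_factory=set)

    def query(self, lhs: str | None = None,
              rhs: tuple[str, ...] | None = None) -> list[Rule]:
        """'?' operation: rules matching the pattern (None is a wildcard)."""
        matches = [r for r in self.rules
                   if (lhs is None or r.lhs == lhs)
                   and (rhs is None or r.rhs == rhs)]
        return sorted(matches, key=lambda r: (r.lhs, r.rhs))

    def add(self, rule: Rule) -> None:
        """'+' operation: insert a rule."""
        self.rules.add(rule)

    def delete(self, rule: Rule) -> None:
        """'-' operation: remove a rule if it is present."""
        self.rules.discard(rule)


if __name__ == "__main__":
    rs = RuleSet()
    rs.add(Rule("S", ("a", "S")))
    rs.add(Rule("S", ("b",)))
    # An adaptive action: find S -> b, delete it, and add S -> c instead.
    for found in rs.query(lhs="S", rhs=("b",)):
        rs.delete(found)
        rs.add(Rule("S", ("c",)))
    print(rs.query(lhs="S"))
```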

§-calculus (Jackson)

Introduced in 2000 [20] and most fully discussed in 2006, [4] the §-Calculus (§ here pronounced meta-ess) allows for the explicit addition, deletion, and modification of productions within a grammar, as well as providing for syntactic predicates. This formalism is self-classified by its creator as both imperative and adaptive, or, more specifically, as a time-space adaptive grammar formalism, and was further classified by others as being an analytic formalism. [14] [21]

The redoubling language is demonstrated as follows:

grammar ww {
  S ::= #phi(A.X<-"") R;
  R ::= $C('[ab]') #phi(A.X<-A.X C) #phi(N<=A.X) N | R;
};

(Note on notation: In the above example, the #phi(...) statements identify the points in the production R that modify the grammar explicitly. #phi(A.X<-A.X C) represents a global modification (over time) and the #phi(N<=A.X) statement identifies a local modification (over space). The #phi(A.X<-"") statement in the S production effectively declares a global production called A.X by placing the empty string into that production before its reference by R.)

Adaptive devices (Neto & Pistori)

First described by Neto in 2001, [22] adaptive devices were later enhanced and expanded upon by Pistori in 2003. [23]

Adapser (Carmi)

In 2002, [24] Adam Carmi introduced an LALR(1)-based adaptive grammar formalism known as Adapser. Specifics of the formalism do not appear to have been released.

Adaptive CFGs with appearance checking (Bravo)

In 2004, [14] César Bravo introduced the notion of merging the concept of appearance checking [25] with adaptive context-free grammars, a restricted form of Iwai's adaptive grammars, [13] and showed that these new grammars, called Adaptive CFGs with Appearance Checking, are Turing powerful.

Adaptive machine formalisms

The formalisms listed below, while not grammar formalisms, either serve as the basis of full grammar formalisms, or are included here because they are adaptive in nature. They are listed in their historical order of first mention in the literature.

Self-modifying finite state automata (Shutt & Rubinstein)

Introduced in 1994 by Shutt and Rubinstein, [26] Self-Modifying Finite State Automata (SMFAs) are shown to be, in a restricted form, Turing powerful.

Adaptive automata (Neto)

In 1994, [19] Neto introduced the machine he called a structured pushdown automaton, the core of adaptive automata theory as pursued by Iwai, [13] Pistori, [23] Bravo [14] and others. This formalism allows for the operations of inspection (similar to syntactic predicates, as noted above relating to Iwai's adaptive grammars), addition, and deletion of rules.

References

  1. Shutt, John N. "What is an Adaptive Grammar?". Retrieved 6 February 2019.
  2. Christiansen, Henning, "A Survey of Adaptable Grammars," ACM SIGPLAN Notices, Vol. 25 No. 11, pp. 35-44, Nov. 1990.
  3. Shutt, John N., Recursive Adaptable Grammars, Master’s Thesis, Worcester Polytechnic Institute, 1993. (16 December 2003 emended revision.)
  4. Jackson, Quinn Tyler, Adapting to Babel: Adaptivity and Context-Sensitivity in Parsing, Ibis Publications, Plymouth, Massachusetts, March 2006.
  5. Caracciolo di Forino, Alfonso, "Some Remarks on the Syntax of Symbolic Programming Languages," Communications of the ACM, Vol. 6 No. 8, pp. 456-460, August 1963.
  6. Wegbreit, Ben, Studies in Extensible Programming Languages, ESD-TR-70-297, Harvard University, Cambridge, Massachusetts, May 1970. In book form, Garland Publishing, Inc., New York, 1980.
  7. Hanford, K.V. & Jones, C.B., "Dynamic Syntax: A Concept for the Definition of the Syntax of Programming Languages," Annual Review in Automatic Programming 7, Pergamon Press, Oxford, pp. 115-142, 1973.
  8. Burshteyn, Boris, "On the Modification of the Formal Grammar at Parse Time," ACM SIGPLAN Notices, Vol. 25 No. 5, pp. 117-123, May 1990.
  9. http://www.pcs.usp.br/~lta/union/index.php?cp=4&categoria=28 (dead link)
  10. Burshteyn, Boris, "Generation and Recognition of Formal Languages by Modifiable Grammars," ACM SIGPLAN Notices, Vol. 25 No. 12, pp. 45-53, December 1990.
  11. Boullier, Pierre, "Dynamic Grammars and Semantic Analysis," INRIA Research Report No. 2322, August 1994.
  12. John Shutt originally called his Recursive Adaptive Grammars by the name Recursive Adaptable Grammars, and notes his change to adaptive at this URL: John Shutt's MS Thesis.
  13. Iwai, Margarete Keiko, Um formalismo gramatical adaptativo para linguagens dependentes de contexto, Doctoral thesis, Department of Engineering, University of São Paulo, Brazil, January 2000.
  14. Bravo, César, Gramáticas Livres de Contexto Adaptativas com verificação de aparência, Doctoral thesis, Department of Electrical Engineering, University of São Paulo, January 2004.
  15. Shutt, John N., "Imperative Adaptive Grammars" Web page dated 28 March 2001, at the URL: http://web.cs.wpi.edu/~jshutt/adapt/imperative.html
  16. Christiansen, Henning, "Syntax, Semantics, and Implementation Strategies for Programming Languages with Powerful Abstraction Mechanisms," Proceedings of the 18th Hawaii International Conference on System Sciences, Vol. 2, pp. 57-66, 1985.
  17. Christiansen, Henning, "The Syntax and Semantics of Extensible Languages," Datalogiske skrifter 14, Roskilde University, 1988.
  18. Burshteyn, Boris, "USSA–Universal Syntax and Semantics Analyzer," ACM SIGPLAN Notices, Vol. 27 No. 1, pp. 42-60, January 1992.
  19. Neto, João José, "Adaptive Automata for Context-Sensitive Languages," ACM SIGPLAN Notices, Vol. 29 No. 9, pp. 115-124, September 1994.
  20. Jackson, Quinn Tyler, "Adaptive Predicates in Natural Language Parsing," Perfection, Vol. 1 No. 4, April 2000.
  21. Okhotin, Alexander, Boolean Grammars: Expressive Power and Algorithms, Doctoral thesis, School of Computing, Queen's University, Kingston, Ontario, August 2004.
  22. Neto, João José, "Adaptive Rule-Driven Devices: General Formulation and Case Study," B. W. Watson, D. Wood (Eds.): Implementation and Application of Automata 6th International Conference, CIAA 2001, Lecture Notes in Computer Science, Vol. 2494, Pretoria, South Africa, Springer-Verlag, pp. 234–250, 23–25 July 2001.
  23. Pistori, Hemerson, Tecnologia Adaptativa em Engenharia de Computação: Estado da Arte e Aplicações, Doctoral thesis, Department of Electrical Engineering, University of São Paulo, 2003.
  24. Carmi, Adam, "Adapser: An LALR(1) Adaptive Parser," The Israeli Workshop on Programming Languages & Development Environments, Haifa, Israel, 1 July 2002.
  25. Salomaa, Arto, Formal Languages, Academic Press, 1973.
  26. Shutt, John & Rubinstein, Roy, "Self-Modifying Finite Automata," in B. Pehrson and I. Simon, editors, Technology and Foundations: Information Processing '94 Vol. I: Proceedings of 13th IFIP World Computer Congress, Amsterdam: North-Holland, pp. 493-498, 1994.