Tree-adjoining grammar

Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (see tree (graph theory) and tree (data structure)).

History

TAG originated in investigations by Joshi and his students into the family of adjunction grammars (AG), [1] the "string grammar" of Zellig Harris. [2] AGs handle exocentric properties of language in a natural and effective way, but do not have a good characterization of endocentric constructions; the converse is true of rewrite grammars, or phrase-structure grammar (PSG). In 1969, Joshi introduced a family of grammars that exploits this complementarity by mixing the two types of rules. A few very simple rewrite rules suffice to generate the vocabulary of strings for adjunction rules. This family is distinct from the Chomsky-Schützenberger hierarchy but intersects it in interesting and linguistically relevant ways. [3] The center strings and adjunct strings can also be generated by a dependency grammar, avoiding the limitations of rewrite systems entirely. [4] [5]

Description

Figure: Schematic illustration of the adjunction operation. The tree α combines with an auxiliary tree β at a node labelled with non-terminal symbol X, which must also be the root and foot node of the auxiliary tree. The resulting tree is deeper than the original one.

Figure: Schematic illustration of the substitution operation. Two trees (α and β; note that these need not be elementary trees) are joined on a node labelled with non-terminal X; this node is one of the leaf nodes marked for substitution in α, and the root of β is a node with the same non-terminal.

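A TAG can be defined as a 5-tuple (N, Σ, S, I, A), where N is a set of non-terminal symbols, Σ is a set of terminal symbols disjoint from N, S is a distinguished (start) non-terminal symbol in N, I is a set of initial trees, and A is a set of auxiliary trees. [6]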

Additionally, TAGs with adjunction constraints on nodes have been introduced. An adjunction constraint on a node can: completely disallow adjunction (NA, for null adjunction); make it obligatory (OA); or only allow selected auxiliary trees to adjoin (SA). [6]
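The three kinds of constraint can be sketched as a small check that a derivation procedure might consult before adjoining. This is only an illustrative sketch: the class name `AdjunctionConstraint`, the helper names, and the tree names `beta1` and `beta2` are hypothetical, and the usual requirement that root and foot labels match the node label is assumed to be checked elsewhere.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AdjunctionConstraint:
    """Hypothetical record of the constraint attached to one node."""
    kind: str = "none"                     # "none", "NA", "OA" or "SA"
    allowed: FrozenSet[str] = frozenset()  # tree names, used only for "SA"


def may_adjoin(constraint: AdjunctionConstraint, aux_name: str) -> bool:
    """Return True if the named auxiliary tree is permitted at the node."""
    if constraint.kind == "NA":   # null adjunction: nothing may adjoin
        return False
    if constraint.kind == "SA":   # selective adjunction: listed trees only
        return aux_name in constraint.allowed
    return True                   # unconstrained or obligatory adjunction


def needs_adjunction(constraint: AdjunctionConstraint, adjoined_so_far: int) -> bool:
    """Return True if an OA node has not yet received any adjunction."""
    return constraint.kind == "OA" and adjoined_so_far == 0


na = AdjunctionConstraint("NA")
oa = AdjunctionConstraint("OA")
sa = AdjunctionConstraint("SA", frozenset({"beta1"}))

print(may_adjoin(na, "beta1"), may_adjoin(sa, "beta1"), may_adjoin(sa, "beta2"))
print(needs_adjunction(oa, 0), needs_adjunction(oa, 1))
```

Under this reading, a derivation would count as complete only when `needs_adjunction` is false for every node of the derived tree.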

The two types of basic tree in TAG, initial trees (often denoted by 'α') and auxiliary trees ('β'), are together called elementary trees. Initial trees represent basic valency relations, while auxiliary trees allow for recursion. [7]

A derivation starts with an initial tree, which is combined with further trees via either substitution or adjunction. Substitution replaces a frontier node with an initial tree whose root node has the same label as the leaf for which it is substituted. Adjunction inserts an auxiliary tree at a node, which may be a frontier node or an internal node, whose label matches both the root label and the foot label of the auxiliary tree. Adjunction can thus insert an auxiliary tree into the middle of another tree, and the operation may be applied recursively. [4]
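The two operations can be sketched on a simple in-memory tree. The following is a minimal illustration rather than a reference implementation: the `Node` class, the function names, and the toy trees for "Harry often sleeps" are assumptions made for the example, and adjunction constraints are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # frontier node marked for substitution
    foot: bool = False   # foot node of an auxiliary tree


def copy_tree(n: Node) -> Node:
    """Deep-copy a tree so elementary trees can be reused."""
    return Node(n.label, [copy_tree(c) for c in n.children], n.subst, n.foot)


def find_foot(n: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if n.foot:
        return n
    for c in n.children:
        hit = find_foot(c)
        if hit is not None:
            return hit
    return None


def substitute(site: Node, initial: Node) -> None:
    """Replace the substitution leaf `site` with a copy of `initial`."""
    assert site.subst and not site.children
    assert site.label == initial.label
    site.children = copy_tree(initial).children
    site.subst = False


def adjoin(site: Node, aux: Node) -> None:
    """Splice a copy of `aux` in at `site`; the old subtree hangs off the foot."""
    new = copy_tree(aux)
    new_foot = find_foot(new)
    assert new_foot is not None
    assert site.label == new.label == new_foot.label
    # The material below the site moves under the foot node.
    new_foot.children = site.children
    new_foot.foot, new_foot.subst = site.foot, site.subst
    # The site now plays the role of the auxiliary tree's root.
    site.children = new.children
    site.foot = site.subst = False


def frontier(n: Node) -> List[str]:
    """Read the leaf labels from left to right."""
    if not n.children:
        return [n.label]
    return [leaf for c in n.children for leaf in frontier(c)]


# Toy elementary trees (hypothetical example grammar).
alpha = Node("S", [Node("NP", subst=True),
                   Node("VP", [Node("V", [Node("sleeps")])])])
np_harry = Node("NP", [Node("Harry")])
beta = Node("VP", [Node("Adv", [Node("often")]), Node("VP", foot=True)])

derived = copy_tree(alpha)
substitute(derived.children[0], np_harry)  # fill the NP slot
adjoin(derived.children[1], beta)          # insert "often" above the VP
print(" ".join(frontier(derived)))
```

Copying the elementary tree before each operation keeps the grammar's trees unchanged, so the same auxiliary tree can be adjoined repeatedly.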

Complexity and application

For every context-free grammar, a tree-adjoining grammar can be constructed that generates the same string language. Thus, TAGs can generate all context-free languages; [8] they can also generate some, but not all, context-sensitive languages.

Two examples of context-sensitive, non-context-free languages that TAGs (with adjunction constraints) can generate are the copy language { ww : w ∈ {a, b}* } and the language { a^n b^n c^n d^n : n ≥ 1 }. [8]

Tree-adjoining grammars are more powerful (in terms of weak generative capacity) than context-free grammars, but less powerful than linear context-free rewriting systems, [9] indexed, [note 1] or context-sensitive grammars.
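As an illustration of this extra power, the non-context-free language of strings a^n b^n c^n d^n can be derived with one initial tree, one auxiliary tree, and null-adjunction (NA) constraints. The sketch below uses a commonly presented grammar of this kind; the class and function names are hypothetical, and because the initial tree here spans the empty string, the sketch also yields the empty string for n = 0.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False
    na: bool = False  # null-adjunction constraint: no adjunction allowed here


def aux_tree() -> Node:
    """Build a fresh copy of beta: S(NA)[ a, S[ b, S*(NA), c ], d ]."""
    foot = Node("S", foot=True, na=True)
    inner = Node("S", [Node("b"), foot, Node("c")])
    return Node("S", [Node("a"), inner, Node("d")], na=True)


def find(n: Node, pred: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node, in preorder, that satisfies `pred`."""
    if pred(n):
        return n
    for c in n.children:
        hit = find(c, pred)
        if hit is not None:
            return hit
    return None


def adjoin(site: Node, aux: Node) -> None:
    """Adjoin `aux` at `site`, respecting the NA constraint on the site."""
    foot = find(aux, lambda n: n.foot)
    assert foot is not None and not site.na
    assert site.label == aux.label == foot.label
    foot.children, foot.foot = site.children, False  # old subtree goes under the foot
    site.children, site.na = aux.children, aux.na    # site becomes the aux root


def frontier(n: Node) -> str:
    """Concatenate leaf labels; the empty label stands for the empty string."""
    if not n.children:
        return n.label
    return "".join(frontier(c) for c in n.children)


def derive(n: int) -> str:
    """Adjoin beta n times, starting from the initial tree S over the empty string."""
    tree = Node("S", [Node("")])
    for _ in range(n):
        # After each step exactly one S node remains open for adjunction.
        site = find(tree, lambda x: x.label == "S" and not x.na and bool(x.children))
        adjoin(site, aux_tree())
    return frontier(tree)


for k in range(4):
    print(k, repr(derive(k)))
```

The NA constraints on the root and foot of the auxiliary tree are what force every adjunction to happen at the single inner S node, which keeps the four blocks of symbols growing in step.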

Two examples of context-sensitive languages that TAGs cannot generate are { www : w ∈ {a, b}* } and { a^n b^n c^n d^n e^n : n ≥ 1 }. [8]

The processing of languages that TAGs can generate may be represented by an embedded pushdown automaton.

Tree-adjoining grammars are often described as mildly context-sensitive. Mildly context-sensitive grammar classes are conjectured to be powerful enough to model natural languages while remaining efficiently parsable in the general case. [8]

Equivalences

Vijay-Shanker and Weir (1994) [10] demonstrated that linear indexed grammars, combinatory categorial grammar, tree-adjoining grammars, and head grammars are weakly equivalent formalisms, in that they all define the same string languages.

Variants

Lexicalized tree-adjoining grammars (LTAG) are a variant of TAG in which each elementary tree (initial or auxiliary) is associated with a lexical item. Each tree has at least one terminal as a leaf node, which is then called the (lexical) anchor of the tree. A lexicalized grammar for English has been developed by the XTAG Research Group of the Institute for Research in Cognitive Science at the University of Pennsylvania. [5]
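The lexicalization condition can be checked mechanically. The sketch below is an assumption-laden illustration: trees are written as hypothetical `(label, children)` pairs, the terminal vocabulary is passed in explicitly, and the toy trees are not taken from the XTAG grammar.

```python
from typing import List, Set, Tuple

Tree = Tuple[str, list]  # (label, list of child trees); a leaf has no children


def leaves(tree: Tree) -> List[str]:
    """Collect leaf labels from left to right."""
    label, children = tree
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]


def anchors(tree: Tree, terminals: Set[str]) -> List[str]:
    """Return the terminal leaves, i.e. the candidate lexical anchors."""
    return [leaf for leaf in leaves(tree) if leaf in terminals]


def is_lexicalized(trees: List[Tree], terminals: Set[str]) -> bool:
    """True if every elementary tree has at least one terminal leaf."""
    return all(anchors(t, terminals) for t in trees)


terminals = {"sleeps", "often"}
alpha = ("S", [("NP", []), ("VP", [("V", [("sleeps", [])])])])  # anchored by "sleeps"
beta = ("VP", [("Adv", [("often", [])]), ("VP", [])])           # anchored by "often"
bare = ("S", [("NP", []), ("VP", [])])                          # no terminal leaf

print(is_lexicalized([alpha, beta], terminals))
print(is_lexicalized([alpha, beta, bare], terminals))
```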

Other variants of TAG allow multi-component trees, trees with multiple foot nodes, and other extensions.

Notes

  1. This holds because, for each tree-adjoining grammar, a linear indexed grammar can be found that produces the same language (see the Equivalences section), and for the latter a weakly equivalent (proper) indexed grammar can be found in turn (see Indexed grammar, section Computational power).

References

  1. Joshi, Aravind; Kosaraju, S. R.; Yamada, H. (1969). "String Adjunct Grammars". Proceedings Tenth Annual Symposium on Automata Theory, Waterloo, Canada. Joshi, Aravind K.; Kosaraju, S. Rao; Yamada, H. M. (1972). "String Adjunct Grammars: I. Local and Distributed Adjunction". Information and Control. 21 (2): 93–116. doi:10.1016/S0019-9958(72)90051-4. Joshi, Aravind K.; Kosaraju, S. Rao; Yamada, H. M. (1972). "String Adjunct Grammars: II. Equational Representation, Null Symbols, and Linguistic Relevance". Information and Control. 21 (3): 235–260. doi:10.1016/S0019-9958(72)80005-6.
  2. Harris, Zellig S. (1962). String analysis of sentence structure. Papers on Formal Linguistics. Vol. 1. The Hague: Mouton & Co.
  3. Joshi, Aravind (1969). "Properties of Formal Grammars with Mixed Types of Rules and Their Linguistic Relevance". Proceedings Third International Symposium on Computational Linguistics, Stockholm, Sweden.
  4. Joshi, Aravind; Rambow, Owen (2003). "A Formalism for Dependency Grammar Based on Tree Adjoining Grammar". Proceedings of the Conference on Meaning-Text Theory.
  5. "A Lexicalized Tree Adjoining Grammar for English".
  6. Joshi, Aravind K.; Schabes, Yves (March 1991). Tree-adjoining grammars and lexicalized grammars. MS-CIS-91-22 (Technical report). Department of Computer and Information Science, University of Pennsylvania.
  7. Jurafsky, Daniel; Martin, James H. (2000). Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall. p. 354.
  8. Joshi, Aravind (1985). "How much context-sensitivity is necessary for characterizing structural descriptions". In Dowty, D.; Karttunen, L.; Zwicky, A. (eds.). Natural Language Processing: Theoretical, Computational, and Psychological Perspectives. New York, NY: Cambridge University Press. pp. 206–250. ISBN 9780521262033.
  9. Kallmeyer, Laura (2010). Parsing Beyond Context-Free Grammars. Springer. pp. 215–216.
  10. Vijay-Shanker, K.; Weir, David J. (1994). "The Equivalence of Four Extensions of Context-Free Grammars". Mathematical Systems Theory. 27 (6): 511–546.