Literal movement grammar

Last updated

In linguistics and theoretical computer science, literal movement grammars (LMGs) are a grammar formalism intended to characterize certain extraposition phenomena of natural language such as topicalization and cross-serial dependency. LMGs extend the class of context free grammars (CFGs) by adding introducing pattern-matched function-like rewrite semantics, as well as the operations of variable binding and slash deletion. LMGs were introduced by A.V. Groenink in 1995. [1]

Contents

Description

The basic rewrite operation of an LMG is very similar to that of a CFG, with the addition of arguments to the non-terminal symbols. Where a context-free rewrite rule obeys the general schema for some non-terminal and some string of terminals and/or non-terminals , an LMG rewrite rule obeys the general schema , where X is a non-terminal with arity n (called a predicate in LMG terminology), and is a string of "items", as defined below. The arguments are strings of terminal symbols and/or variable symbols defining an argument pattern. In the case where an argument pattern has multiple adjacent variable symbols, the argument pattern will match any and all partitions of the actual value that unify. Thus, if the predicate is and the actual pattern is , there are three valid matches: . In this way, a single rule is actually a family of alternatives.

An "item" in a literal movement grammar is one of

In a rule like , the variable y is bound to whatever terminal string the g predicate produces, and in and , all occurrences of y are replaced by that string, and and are produced as if terminal string had always been there.

An item , where x is something that produces a terminal string (either a terminal string itself or some predicate), and y is a string of terminals and/or variables, is rewritten as the empty string () if and only if , and otherwise cannot be rewritten at all.

Example

LMGs can characterize the non-CF language as follows:

The derivation for aabbcc, using parentheses also for grouping, is therefore

Computational power

Languages generated by LMGs contain the context-free languages as a proper subset, as every CFG is an LMG where all predicates have arity 0 and no production rule contains variable bindings or slash deletions.

Related Research Articles

Context-free grammar Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form

Pushdown automaton Type of automaton

In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack.

In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

Grammar theory to model symbol strings originated from work in computational linguistics aiming to understand the structure of natural languages. Probabilistic context free grammars (PCFGs) have been applied in probabilistic modeling of RNA structures almost 40 years after they were introduced in computational linguistics.

In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language and a suffix. For instance, can be recognized as a sum because it can be broken into , also a sum, and , a suitable suffix.

Affine arithmetic (AA) is a model for self-validated numerical analysis. In AA, the quantities of interest are represented as affine combinations of certain primitive variables, which stand for sources of uncertainty in the data or approximations made during the computation.

ID/LP Grammars are a subset of Phrase Structure Grammars, differentiated from other formal grammars by distinguishing between immediate dominance (ID) and linear precedence (LP) constraints. Whereas traditional phrase structure rules incorporate dominance and precedence into a single rule, ID/LP Grammars maintains separate rule sets which need not be processed simultaneously. ID/LP Grammars are used in Computational Linguistics.

Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction can be used, in particular, to specify intersection of languages. A further extension of conjunctive grammars known as Boolean grammars additionally allows explicit negation.

Covariant formulation of classical electromagnetism

The covariant formulation of classical electromagnetism refers to ways of writing the laws of classical electromagnetism in a form that is manifestly invariant under Lorentz transformations, in the formalism of special relativity using rectilinear inertial coordinate systems. These expressions both make it simple to prove that the laws of classical electromagnetism take the same form in any inertial coordinate system, and also provide a way to translate the fields and forces from one frame to another. However, this is not as general as Maxwell's equations in curved spacetime or non-rectilinear coordinate systems.

A matrix grammar is a formal grammar in which instead of single productions, productions are grouped together into finite sequences. A production cannot be applied separately, it must be applied in sequence. In the application of such a sequence of productions, the rewriting is done in accordance to each production in sequence, the first one, second one etc. till the last production has been used for rewriting. The sequences are referred to as matrices.

Interaction nets are a graphical model of computation devised by Yves Lafont in 1990 as a generalisation of the proof structures of linear logic. An interaction net system is specified by a set of agent types and a set of interaction rules. Interaction nets are an inherently distributed model of computation in the sense that computations can take place simultaneously in many parts of an interaction net, and no synchronisation is needed. The latter is guaranteed by the strong confluence property of reduction in this model of computation. Thus interaction nets provide a natural language for massive parallelism. Interaction nets are at the heart of many implementations of the lambda calculus, such as efficient closed reduction and optimal, in Lévy's sense, Lambdascope.

Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of flags, or index symbols. The language produced by an indexed grammar is called an indexed language.

Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structure, quantification and information structure. The formalism generates constituency-based structures and is therefore a type of phrase structure grammar.

Head grammar (HG) is a grammar formalism introduced in Carl Pollard (1984) as an extension of the context-free grammar class of grammars. Head grammar is therefore a type of phrase structure grammar, as opposed to a dependency grammar. The class of head grammars is a subset of the linear context-free rewriting systems.

Range concatenation grammar (RCG) is a grammar formalism developed by Pierre Boullier in 1998 as an attempt to characterize a number of phenomena of natural language, such as Chinese numbers and German word order scrambling, which are outside the bounds of the Mildly context-sensitive languages.

Generalized context-free grammar (GCFG) is a grammar formalism that expands on context-free grammars by adding potentially non-context free composition functions to rewrite rules. Head grammar is an instance of such a GCFG which is known to be especially adept at handling a wide variety of non-CF properties of natural language.

Controlled grammars are a class of grammars that extend, usually, the context-free grammars with additional controls on the derivations of a sentence in the language. A number of different kinds of controlled grammars exist, the four main divisions being Indexed grammars, grammars with prescribed derivation sequences, grammars with contextual conditions on rule application, and grammars with parallelism in rule application. Because indexed grammars are so well established in the field, this article will address only the latter three kinds of controlled grammars.

In mathematics, the Kodaira–Spencer map, introduced by Kunihiko Kodaira and Donald C. Spencer, is a map associated to a deformation of a scheme or complex manifold X, taking a tangent space of a point of the deformation space to the first cohomology group of the sheaf of vector fields on X.

Buchholz's psi-functions are a hierarchy of single-argument ordinal functions introduced by German mathematician Wilfried Buchholz in 1986. These functions are a simplified version of the -functions, but nevertheless have the same strength as those. Later on this approach was extended by Jaiger and Schütte.

Roth's theorem on arithmetic progressions is a result in additive combinatorics concerning the existence of arithmetic progressions in subsets of the natural numbers. It was first proven by Klaus Roth in 1953. Roth's Theorem is a special case of Szemerédi's Theorem for the case .

References

  1. Groenink, Annius V. 1995. Literal Movement Grammars. In Proceedings of the 7th EACL Conference.