Range concatenation grammars

Last updated May 04, 2019

Range concatenation grammar (RCG) is a grammar formalism developed by Pierre Boullier^[1] in 1998 as an attempt to characterize a number of phenomena of natural language, such as Chinese numbers and German word order scrambling, which are outside the bounds of the mildly context-sensitive languages.^[2]

Though intended as a variant on Groenink's literal movement grammars (LMGs), RCGs treat the grammatical process more as a proof than as a production. Whereas LMGs produce a terminal string from a start predicate, RCGs aim to reduce a start predicate (which is predicated of a terminal string) to the empty string, which constitutes a proof of the terminal string's membership in the language.

Literal movement grammars (LMGs) are a grammar formalism introduced by Groenink in 1995, intended to characterize certain extraposition phenomena of natural language such as topicalization and cross-serial dependencies. LMGs extend the class of CFGs by introducing pattern-matched, function-like rewrite semantics, as well as the operations of variable binding and slash deletion.

Description

Formal definition

A Positive Range Concatenation Grammar (PRCG) is a tuple $G=(N,~T,~V,~S,~P)$ , where:

$N$ , $T$ and $V$ are disjoint finite sets of (respectively) predicate names, terminal symbols and variable names. Each predicate name has an associated arity given by the function $\dim :N\rightarrow \mathbb {N} \setminus \{0\}$ .
$S\in N$ is the start predicate name and satisfies $\dim(S)=1$ .
$P$ is a finite set of clauses of the form $\psi _{0}\rightarrow \psi _{1}\ldots \psi _{m}$ , where the $\psi _{i}$ are predicates of the form $A_{i}(\alpha _{1},\ldots ,\alpha _{\dim(A_{i})})$ with $A_{i}\in N$ and each argument $\alpha _{j}\in (T\cup V)^{\star }$ .
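
The tuple above can be held in a small data structure. The following is a minimal sketch, not part of Boullier's definition: the names `Predicate`, `Clause`, `PRCG` and `well_formed` are hypothetical, and every symbol is assumed to be a single character so that an argument can be written as an ordinary string.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Predicate:
    name: str               # an element of N
    args: Tuple[str, ...]   # each argument is a string over T ∪ V


@dataclass(frozen=True)
class Clause:
    lhs: Predicate
    rhs: Tuple[Predicate, ...]  # empty tuple stands for the empty string


@dataclass
class PRCG:
    dim: Dict[str, int]         # predicate names N together with their arity
    terminals: FrozenSet[str]   # T
    variables: FrozenSet[str]   # V
    start: str                  # S
    clauses: Tuple[Clause, ...]  # P


def well_formed(g: PRCG) -> bool:
    """Check the side conditions stated in the definition."""
    names = set(g.dim)
    # N, T and V must be pairwise disjoint.
    if names & g.terminals or names & g.variables or g.terminals & g.variables:
        return False
    # Arities are positive, and the start predicate has arity 1.
    if any(k < 1 for k in g.dim.values()) or g.dim.get(g.start) != 1:
        return False
    symbols = g.terminals | g.variables
    for clause in g.clauses:
        for pred in (clause.lhs, *clause.rhs):
            if g.dim.get(pred.name) != len(pred.args):
                return False
            if any(sym not in symbols for arg in pred.args for sym in arg):
                return False
    return True


# A three-copy grammar, used here only as sample data.
A_xyz = Predicate("A", ("x", "y", "z"))
WWW = PRCG(
    dim={"S": 1, "A": 3},
    terminals=frozenset("ab"),
    variables=frozenset("xyz"),
    start="S",
    clauses=(
        Clause(Predicate("S", ("xyz",)), (A_xyz,)),
        Clause(Predicate("A", ("ax", "ay", "az")), (A_xyz,)),
        Clause(Predicate("A", ("bx", "by", "bz")), (A_xyz,)),
        Clause(Predicate("A", ("", "", "")), ()),
    ),
)
```

Calling `well_formed(WWW)` checks disjointness, the arity of every predicate occurrence, and $\dim(S)=1$.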

A Negative Range Concatenation Grammar (NRCG) is defined like a PRCG, but with the addition that some predicates occurring in the right-hand side of a clause can have the form ${\overline {A_{i}(\alpha _{1},\ldots ,\alpha _{\dim(A_{i})})}}$ . Such predicates are called negative predicates.

A Range Concatenation Grammar (RCG) is either a PRCG or an NRCG. Although PRCGs are technically NRCGs, the terms are used to highlight the absence (PRCG) or presence (NRCG) of negative predicates.

A range in a word $w\in T^{\star }$ is a pair $\langle l,r\rangle _{w}$ , with $0\leq l\leq r\leq n$ , where $n$ is the length of $w$ . Two ranges $\langle l_{1},r_{1}\rangle _{w}$ and $\langle l_{2},r_{2}\rangle _{w}$ can be concatenated iff $r_{1}=l_{2}$ , and we then have: $\langle l_{1},r_{1}\rangle _{w}\cdot \langle l_{2},r_{2}\rangle _{w}=\langle l_{1},r_{2}\rangle _{w}$ .
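
As an illustration of the concatenation rule, here is a minimal Python sketch. The names `Range`, `make_range`, `concat` and `covered` are hypothetical, and the sketch assumes that $l$ and $r$ are positions between symbols, so that $\langle l,r\rangle _{w}$ covers the Python slice `w[l:r]`.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    left: int   # l
    right: int  # r


def make_range(word: str, left: int, right: int) -> Range:
    """Build <left, right> in word, enforcing 0 <= l <= r <= n."""
    if not 0 <= left <= right <= len(word):
        raise ValueError("need 0 <= l <= r <= len(word)")
    return Range(left, right)


def concat(a: Range, b: Range) -> Optional[Range]:
    """Return a . b when it is defined (a.right == b.left), else None."""
    if a.right != b.left:
        return None
    return Range(a.left, b.right)


def covered(word: str, rng: Range) -> str:
    """Substring covered by the range (assumption: positions sit between symbols)."""
    return word[rng.left:rng.right]
```

For the word `abbabb`, the ranges $\langle 0,3\rangle$ and $\langle 3,6\rangle$ concatenate to $\langle 0,6\rangle$, whereas $\langle 0,3\rangle$ and $\langle 4,6\rangle$ do not concatenate because $3\neq 4$.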

For a word $w=w_{1}w_{2}\ldots w_{n}$ , with $w_{i}\in T$ , the dotted notation for ranges is: $\langle l,r\rangle _{w}=w_{1}\ldots w_{l}\bullet w_{l+1}\ldots w_{r}\bullet w_{r+1}\ldots w_{n}$ , so that the range covers the symbols $w_{l+1}\ldots w_{r}$ between the two dots.
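
The dotted notation can be produced mechanically. The helper below is a hypothetical sketch that reads $l$ and $r$ as positions between symbols (so the range $\langle 0,n\rangle _{w}$ is the whole word), which is the reading used in the worked example later in the article.

```python
def dotted(word: str, left: int, right: int) -> str:
    """Render the range <left, right> of word with a bullet at each boundary."""
    if not 0 <= left <= right <= len(word):
        raise ValueError("need 0 <= l <= r <= len(word)")
    return word[:left] + "•" + word[left:right] + "•" + word[right:]
```

For instance, `dotted("abbabbabb", 3, 6)` marks the middle copy of `abb`, and an empty range places the two bullets next to each other.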

Recognition of strings

Like LMGs, RCG clauses have the general schema $A(x_{1},\ldots ,x_{n})\to \alpha$ , where in an RCG, $\alpha$ is either the empty string or a string of predicates. The arguments $x_{i}$ consist of strings of terminal symbols and/or variable symbols, which pattern match against actual argument values as in LMGs. Adjacent variables constitute a family of matches against partitions, so that the argument $xy$ , with two variables, matches the literal string $ab$ in three different ways: $x=\epsilon ,\ y=ab;\ x=a,\ y=b;\ x=ab,\ y=\epsilon$ .
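
The family of matches for two adjacent variables is simply the set of ways to cut the string in two. A small sketch (the function name `splits` is hypothetical):

```python
from typing import List, Tuple


def splits(text: str) -> List[Tuple[str, str]]:
    """All (x, y) with x + y == text, ordered by the length of x."""
    return [(text[:i], text[i:]) for i in range(len(text) + 1)]
```

A string of length $n$ has $n+1$ such splits, which gives the three matches listed above for $ab$.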

Predicate terms come in two forms: positive terms, which produce the empty string on success, and negative terms, which produce the empty string exactly when the corresponding positive term fails to produce it. Negative terms are written like positive terms but with an overbar, as in ${\overline {A(x_{1},\ldots ,x_{n})}}$ .

The rewrite semantics for RCGs is rather simple, identical to the corresponding semantics of LMGs. Given a predicate string $A(\alpha _{1},...,\alpha _{n})$ , where the symbols $\alpha _{i}$ are terminal strings, if there is a rule $A(x_{1},...,x_{n})\to \beta$ in the grammar that the predicate string matches, the predicate string is replaced by $\beta$ , substituting for the matched variables in each $x_{i}$ .

For example, given the rule $A(x,ayb)\to B(axb,y)$ , where $x$ and $y$ are variable symbols and $a$ and $b$ are terminal symbols, the predicate string $A(a,abb)$ can be rewritten as $B(aab,b)$ , because $A(a,abb)$ matches $A(x,ayb)$ when $x=a,\ y=b$ . Similarly, if there were a rule $A(x,ayb)\to A(x,x)\ A(y,y)$ , $A(a,abb)$ could be rewritten as $A(a,a)\ A(b,b)$ .
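
The matching and substitution steps in this example can be sketched in Python. This is a simplified illustration that works on plain strings rather than ranges; the names `VARIABLES`, `match`, `match_args` and `substitute` are hypothetical, and it assumes single-character symbols with $x$, $y$, $z$ as the only variables.

```python
from typing import Dict, Iterator, List, Sequence

VARIABLES = {"x", "y", "z"}  # assumption: every other character is a terminal


def match(pattern: str, text: str, env: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of env under which pattern spells out text."""
    if not pattern:
        if not text:
            yield env
        return
    head, rest = pattern[0], pattern[1:]
    if head in VARIABLES:
        if head in env:
            # Variable already bound: it must reproduce the same string.
            bound = env[head]
            if text.startswith(bound):
                yield from match(rest, text[len(bound):], env)
        else:
            # Unbound variable: try every prefix of the remaining text.
            for i in range(len(text) + 1):
                yield from match(rest, text[i:], {**env, head: text[:i]})
    elif text.startswith(head):
        # Terminal symbol: it must match literally.
        yield from match(rest, text[1:], env)


def match_args(patterns: Sequence[str], args: Sequence[str]) -> List[Dict[str, str]]:
    """All variable bindings that match each pattern against its argument."""
    envs: List[Dict[str, str]] = [{}]
    for pattern, arg in zip(patterns, args):
        envs = [new for env in envs for new in match(pattern, arg, env)]
    return envs


def substitute(pattern: str, env: Dict[str, str]) -> str:
    """Replace each bound variable in pattern by its value."""
    return "".join(env.get(symbol, symbol) for symbol in pattern)


# Left-hand side A(x, ayb) matched against A(a, abb):
bindings = match_args(("x", "ayb"), ("a", "abb"))
env = bindings[0]
# Right-hand side B(axb, y) under that binding:
rewritten = (substitute("axb", env), substitute("y", env))
```

Here `bindings` contains the single match $x=a,\ y=b$, and `rewritten` holds the arguments of $B(aab,b)$.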

A proof/recognition of a string $\alpha$ is done by showing that $S(\alpha )$ produces the empty string. For the individual rewrite steps, when multiple alternative variable matches are possible, any rewrite which could lead the whole proof to succeed is considered. Thus, if there is at least one way to produce the empty string from the initial string $S(\alpha )$ , the proof is considered a success, regardless of how many other ways to fail exist.

Example

RCGs are capable of recognizing the non-linear index language $\{www:w\in \{a,b\}^{*}\}$ as follows:

Letting x, y, and z be variable symbols:

$S(xyz)\to A(x,y,z)$

$A(ax,ay,az)\to A(x,y,z)$

$A(bx,by,bz)\to A(x,y,z)$

$A(\epsilon ,\epsilon ,\epsilon )\to \epsilon$

The proof for abbabbabb is then

$S(abbabbabb)\Rightarrow A(abb,abb,abb)\Rightarrow A(bb,bb,bb)\Rightarrow A(b,b,b)\Rightarrow A(\epsilon ,\epsilon ,\epsilon )\Rightarrow \epsilon$

Or, using the more precise dotted notation for ranges:

$S(\bullet {}abbabbabb\bullet {})\Rightarrow A(\bullet {}abb\bullet {}abbabb,abb\bullet {}abb\bullet {}abb,abbabb\bullet {}abb\bullet {})\Rightarrow A(a\bullet {}bb\bullet {}abbabb,abba\bullet {}bb\bullet {}abb,abbabba\bullet {}bb\bullet {})\Rightarrow A(ab\bullet {}b\bullet {}abbabb,abbab\bullet {}b\bullet {}abb,abbabbab\bullet {}b\bullet {})\Rightarrow A(\epsilon ,\epsilon ,\epsilon )\Rightarrow \epsilon$
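
The derivation above can be reproduced by a small recognizer that hard-codes the four clauses and works on ranges of the input. This is a sketch written for this one grammar, not a general RCG parser; the name `recognize_www` is hypothetical, and ranges are represented as `(left, right)` pairs of positions between symbols.

```python
from functools import lru_cache
from typing import Tuple

Span = Tuple[int, int]  # a range <left, right> of the input word


def recognize_www(w: str) -> bool:
    """Decide membership in {www : w in {a,b}*} using the example clauses."""
    n = len(w)

    @lru_cache(maxsize=None)
    def A(x: Span, y: Span, z: Span) -> bool:
        # A(ε, ε, ε) → ε
        if x[0] == x[1] and y[0] == y[1] and z[0] == z[1]:
            return True
        # A(ax, ay, az) → A(x, y, z)  and  A(bx, by, bz) → A(x, y, z)
        if x[0] < x[1] and y[0] < y[1] and z[0] < z[1]:
            first = w[x[0]]
            if first in "ab" and first == w[y[0]] == w[z[0]]:
                return A((x[0] + 1, x[1]), (y[0] + 1, y[1]), (z[0] + 1, z[1]))
        return False

    # S(xyz) → A(x, y, z): try every way to cut the word into three ranges.
    return any(
        A((0, i), (i, j), (j, n))
        for i in range(n + 1)
        for j in range(i, n + 1)
    )
```

The outer `any` reflects the non-deterministic choice of a match for $xyz$: the word is accepted as soon as one split leads to the empty string, regardless of how many other splits fail.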

References

[1] Boullier, Pierre (Jan 1998). Proposal for a Natural Language Processing Syntactic Backbone (PDF) (Technical report). 3342. INRIA Rocquencourt (France).
[2] Boullier, Pierre (1999). "Chinese Numbers, MIX, Scrambling, and Range Concatenation Grammars". Proc. EACL (PDF). pp. 53–60. Archived from the original (PDF) on 2003-05-15.
[3] Kallmeyer, Laura (2010). Parsing Beyond Context-Free Grammars. Springer Science & Business Media. p. 37. ISBN 978-3-642-14846-0. Citing http://mjn.host.cs.st-andrews.ac.uk/publications/2001d.pdf

