ID/LP Grammars are a subset of Phrase Structure Grammars, differentiated from other formal grammars by distinguishing between immediate dominance (ID) and linear precedence (LP) constraints. Whereas traditional phrase structure rules incorporate dominance and precedence into a single rule, ID/LP Grammars maintains separate rule sets which need not be processed simultaneously. ID/LP Grammars are used in Computational Linguistics.
For example, a typical phrase structure rule such as , indicating that an S-node dominates an NP-node and a VP-node, and that the NP precedes the VP in the surface string. In ID/LP Grammars, this rule would only indicate dominance, and a linear precedence statement, such as , would also be given.
The idea first came to prominence as part of Generalized Phrase Structure Grammar; [1] [2] the ID/LP Grammar approach is also used in head-driven phrase structure grammar, [3] lexical functional grammar, and other unification grammars.
Current work in the Minimalist Program also attempts to distinguish between dominance and ordering. For instance, recent papers by Noam Chomsky have proposed that, while hierarchical structure is the result of the syntactic structure-building operation Merge, linear order is not determined by this operation, and is simply the result of externalization (oral pronunciation, or, in the case of sign language, manual signing). [4] [5] [6]
Immediate dominance is the asymmetrical relationship between the mother node of a parse tree and its daughters, where the mother node (to the left of the arrow) is said to immediately dominate the daughter nodes (those to the right of the arrow), but the daughters do not immediately dominate the mother. The daughter nodes are also dominated by any node that immediately dominates the mother node, however this is not an immediate dominance relation.
For example, the context free rule , shows that the node labelled A (mother node) immediately dominates nodes labelled B, C, and D, (daughter nodes) and nodes labelled B, C, and D can be immediately dominated by a node labelled A.
Linear precedence is the order relationship of sister nodes. LP constraints specify in what order sister nodes under the same mother can appear. Nodes that surface earlier in strings precede their sisters. [2] LP can be shown in phrase structure rules in the form to mean B precedes C precedes D, as shown in the tree below.
A rule that has ID constraints but not LP is written with commas between the daughter nodes, for example . Since there is no fixed order for the daughter nodes, it is possible that all three of the trees shown here are generated by this rule.
Alternatively, these relationships can be expressed through linear precedence statements, such as , to mean that anytime B and C are sisters, B must precede C. [2] [7]
The principle of transitivity can be applied to LP relations which means that if and , then as well. LP relationships are asymmetric: if B precedes C, C can never precede B. An LP relationship where there can be no intervening nodes is called immediate precedence, while an LP where there can be intervening nodes (those derived from the principle of transitivity) are said to have weak precedence. [8]
For a string to be grammatical in an ID/LP Grammar, it must belong to a local subtree that follows at least one ID rule and all LP statements of the grammar. If every possible string generated by the grammar fits this criterion, than it is an ID/LP Grammar. Additionally, for a grammar to be able to be written in ID/LP format, it must have the property of Exhaustive Constant Partial Ordering (ECPO): namely that at least part of the ID/LP relations in one rule is observed in all other rules. [2] For example, the set of rules:
(1)
(2)
does not have the ECPO property, because (1) says that C must always precede D, while (2) says that D must always precede C.
Since LP statements apply regardless of the ID rule context, they allow us to make generalizations across the whole grammar. [2] [7] For example, given the LP statement , where V is the head of a VP, this means that in any clause in any sentence, V will always surface before its DP sister [7] in any context, as seen in the following examples.
Lucy won the race.
Ava told Sara to read a book.
This can be generalized into a rule that holds across English, , where X is the head of any phrase and YP is its complement. Non-ID/LP Grammars are unable to make such generalizations across the whole grammar, and so must repeat ordering restrictions for each individual context. [7]
Separating LP requirements from ID rules also accounts for the phenomena of free word order in natural language. For example, in English it is possible to put adverbs before or after a verb and have both strings be grammatical. [7]
John suddenly screamed.John screamed suddenly.
A traditional PS rule would require two separate rules, but this can be described by the single ID/LP rule .This property of ID/LP Grammars enables easier cross linguistic generalizations by describing the language specific differences in constituent order with LP statements, separately from the ID rules that are similar across languages. [7]
Two parsing algorithms used to parse ID/LP Grammars are the Earley Parser and Shieber's algorithm. [9]
ID and LP rules impose constraints on sentence strings; [9] when dealing with large strings, [9] the constraints of these rules can cause the parsed string to become infinite, thus making parsing difficult. The Earley Parser solves this by altering [10] the format of an ID/LP Grammar into a Context Free Grammar (CFG), dividing the ID/LP Grammar into an Ordered Context Free Grammar (CFG) and Unordered Context Free Grammar (UCFG). This allows the two algorithms to parse strings more efficiently; [9] in particular, the Earley Parser uses a point tracking method that follows a linear path established by the LP rules. [9] In a CFG, LP rules do not allow repeated constituents in the parsed string, but an UCFG allows repeated constituents within the parsed strings. [9] If the ID/LP Grammar is converted to a UCFG then the LP rules do not dominate during the parsing process, however, it still follows the point tracking method.
After the ID/LP Grammar has been converted to the equivalent form within a CFG, the algorithm will analyze the string. Let stand for start and stand for the elements of the string and it also represents Syntactic Categories. The algorithm then analyzes the string and identifies the following:
(1)
(2) (is being predicted)
(3)
The parsed strings are then used together to form a Parse List [10] for example:
which the list will help determine whether the completed production element () is accepted within the main string. It does this by looking whether the produced individual strings are found in the Parse List. If one or all of the individual strings are not found within the Parse List, then the overall string will fail. If one or all of the individual strings are found in the Parse List, then the overall string will be accepted. [10]
The UCFG is the appropriate equivalent to convert the ID/LP Grammar into in order to use the Earley Parser. [9] This algorithm read strings similarly to how it parses CFG, however, in this case the order of elements is not enforced; resulting in lack of LP rule enforcement. This allows some elements to be repeated within the parsed strings and the UCFG accepts empty multi-sets along with filled multi-sets within its strings. [9] For example:
(1)
(2) (are being predicted)
(3)
When parsing a string that contains a number of unordered elements, the Earley Parser treats it as a Permutation, Y!, and generates each string individually instead of using one string to represent the repeated parsed strings. [9] Whenever the dot moves over one element, the algorithm begins to generate parsed strings of the elements on the right edge in random positions until there are no more elements in the right edge set. Let X0 represent the original string and X1 as the first parsed string; for example:
string X1 will produce, 3! = 6, different parsed strings of the right edge set:
(1) (4)
(2) (5)
(3) (6)
The Earley Parser applies each string to the individual rules of a grammar [9] and this results in very large sets.The large sets is partly resulted in the conversion of ID/LP Grammar into an equivalent grammar, however, parsing the overall ID/LP Grammar is difficult to begin with. [9]
The foundation of Shieber's Algorithm [9] is based on the Earley Parser for CFG, [10] however, it does not require the ID/LP Grammar to be converted into a different grammar [10] in order to be parsed. ID rules can be parsed in a separate form, S →ID {V, NP, S}, from the LP rules, V < S. [10] Shieber compared parsing of a CFG to an ordered ID/LP Grammar string and Barton compared the parsing of a UCFG to an unordered ID/LP Grammar string.
Parsing an ID/LP Grammar, directly, generates a set list that will determine whether the production of the string will be accepted or fail. The algorithm follows 6 Steps (the symbols used can also represents the Syntactic Categories):
Steps 2-3 are repeated exhaustively [10] until no more new items can be added and then continue on to Step 4. Steps 5-6 are also exhaustively repeated [10] until no further new items can be added to the set list. The string will be accepted if a string behaves or resembles the production, is an element of . [10] For example:
Set Lists | Items |
---|---|
The complete production of is accepted and produces the following production string: . [10]
The Unordered ID/LP Grammar follow the above, 6 step algorithm, in order to parse the strings. The noticeable difference is in the productions of each set list; there is one string that represents the many individual strings within one list. [9] The table below shows the set list of Table 1.0, in an unordered grammar:
Set List | Items |
---|---|
The complete production string, results in a similar string as the ordered ID/LP Grammar; however, the order of the items within the string is not enforced. The final string is accepted as it matches the original string's elements.
Notice how the Unordered version of ID/LP Grammar contains mush less strings than the UCFG; Shieber's Algorithm use one string to represent the multiple different strings for repeated elements. Both algorithms can parse Ordered Grammars equally, however, Shieber's Algorithm seems to be more efficient [9] when parsing an Unordered Grammar.
The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a language's vocabulary that are valid according to the language's syntax. The linguist Noam Chomsky theorized that four different classes of formal grammars existed that could generate increasingly complex languages. Each class can also completely generate the language of all inferior classes.
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form
In computer science, the Earley parser is an algorithm for parsing strings that belong to a given context-free language, though it may suffer problems with certain nullable grammars. The algorithm, named after its inventor, Jay Earley, is a chart parser that uses dynamic programming; it is mainly used for parsing in computational linguistics. It was first introduced in his dissertation in 1968.
In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.
In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.
In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language and a suffix. For instance, can be recognized as a sum because it can be broken into , also a sum, and , a suitable suffix.
The simply typed lambda calculus, a form of type theory, is a typed interpretation of the lambda calculus with only one type constructor that builds function types. It is the canonical and simplest example of a typed lambda calculus. The simply typed lambda calculus was originally introduced by Alonzo Church in 1940 as an attempt to avoid paradoxical use of the untyped lambda calculus.
In computer science, a Wirth–Weber relationship between a pair of symbols is necessary to determine if a formal grammar is a simple precedence grammar. In such a case, the simple precedence parser can be used. The relationship is named after computer scientists Niklaus Wirth and Helmut Weber.
In mathematics, the interior product is a degree −1 (anti)derivation on the exterior algebra of differential forms on a smooth manifold. The interior product, named in opposition to the exterior product, should not be confused with an inner product. The interior product is sometimes written as
The Duffing equation, named after Georg Duffing (1861–1944), is a non-linear second-order differential equation used to model certain damped and driven oscillators. The equation is given by where the (unknown) function is the displacement at time t, is the first derivative of with respect to time, i.e. velocity, and is the second time-derivative of i.e. acceleration. The numbers and are given constants.
In the mathematical discipline of set theory, there are many ways of describing specific countable ordinals. The smallest ones can be usefully and non-circularly expressed in terms of their Cantor normal forms. Beyond that, many ordinals of relevance to proof theory still have computable ordinal notations. However, it is not possible to decide effectively whether a given putative ordinal notation is a notation or not ; various more-concrete ways of defining ordinals that definitely have notations are available.
In mathematics, the Leray–Hirsch theorem is a basic result on the algebraic topology of fiber bundles. It is named after Jean Leray and Guy Hirsch, who independently proved it in the late 1940s. It can be thought of as a mild generalization of the Künneth formula, which computes the cohomology of a product space as a tensor product of the cohomologies of the direct factors. It is a very special case of the Leray spectral sequence.
Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structure, quantification and information structure. The formalism generates constituency-based structures and is therefore a type of phrase structure grammar.
Head grammar (HG) is a grammar formalism introduced in Carl Pollard (1984) as an extension of the context-free grammar class of grammars. Head grammar is therefore a type of phrase structure grammar, as opposed to a dependency grammar. The class of head grammars is a subset of the linear context-free rewriting systems.
Range concatenation grammar (RCG) is a grammar formalism developed by Pierre Boullier in 1998 as an attempt to characterize a number of phenomena of natural language, such as Chinese numbers and German word order scrambling, which are outside the bounds of the mildly context-sensitive languages.
The table of chords, created by the Greek astronomer, geometer, and geographer Ptolemy in Egypt during the 2nd century AD, is a trigonometric table in Book I, chapter 11 of Ptolemy's Almagest, a treatise on mathematical astronomy. It is essentially equivalent to a table of values of the sine function. It was the earliest trigonometric table extensive enough for many practical purposes, including those of astronomy. Since the 8th and 9th centuries, the sine and other trigonometric functions have been used in Islamic mathematics and astronomy, reforming the production of sine tables. Khwarizmi and Habash al-Hasib later produced a set of trigonometric tables.
In topology, a branch of mathematics, a cosheaf is a dual notion to that of a sheaf that is useful in studying Borel-Moore homology.
In general relativity, the Komar superpotential, corresponding to the invariance of the Hilbert–Einstein Lagrangian , is the tensor density:
This article summarizes several identities in exterior calculus, a mathematical notation used in differential geometry.
In differential geometry, a branch of mathematics, the Moser's trick is a method to relate two differential forms and on a smooth manifold by a diffeomorphism such that , provided that one can find a family of vector fields satisfying a certain ODE.