Phrase structure rules

Phrase structure rules are a type of rewrite rule used to describe a given language's syntax and are closely associated with the early stages of transformational grammar, proposed by Noam Chomsky in 1957. [1] They are used to break down a natural language sentence into its constituent parts, also known as syntactic categories, including both lexical categories (parts of speech) and phrasal categories. A grammar that uses phrase structure rules is a type of phrase structure grammar. Phrase structure rules as they are commonly employed operate according to the constituency relation, and a grammar that employs phrase structure rules is therefore a constituency grammar; as such, it stands in contrast to dependency grammars, which are based on the dependency relation. [2]


Definition and examples

Phrase structure rules are usually of the following form:

    A → B C

meaning that the constituent A is separated into the two subconstituents B and C. Some examples for English are as follows:

    S → NP VP
    NP → (Det) N
    N → (AP) N (PP)

The first rule reads: An S (sentence) consists of an NP (noun phrase) followed by a VP (verb phrase). The second rule reads: A noun phrase consists of an optional Det (determiner) followed by an N (noun). The third rule means that an N (noun) can be preceded by an optional AP (adjective phrase) and followed by an optional PP (prepositional phrase). The round brackets indicate optional constituents.
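The effect of the round brackets can be made concrete with a short sketch. The Python code below encodes the three rules just described and lists every bracket-free variant that each rule abbreviates. The trailing "?" used to mark an optional symbol, the `RULES` dictionary, and the function name `expand_optionals` are conventions invented for this illustration, not standard notation.

```python
from itertools import product

# The three rules described in the text. A trailing "?" stands in for the
# round brackets that mark an optional constituent (ad hoc convention).
RULES = {
    "S": ["NP", "VP"],
    "NP": ["Det?", "N"],
    "N": ["AP?", "N", "PP?"],
}

def expand_optionals(rhs):
    """Return every right-hand side obtained by keeping or dropping each optional symbol."""
    choices = []
    for symbol in rhs:
        if symbol.endswith("?"):
            choices.append([[symbol[:-1]], []])  # keep the symbol, or omit it
        else:
            choices.append([[symbol]])           # obligatory symbol
    return [[s for part in combo for s in part] for combo in product(*choices)]

for lhs, rhs in RULES.items():
    for variant in expand_optionals(rhs):
        print(lhs, "->", " ".join(variant))
```

A rule with n optional constituents thus abbreviates 2^n plain rules: the NP rule stands for two rules and the N rule for four.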

Beginning with the sentence symbol S, and applying the phrase structure rules successively, finally applying replacement rules to substitute actual words for the abstract symbols, it is possible to generate many proper sentences of English (or whichever language the rules are specified for). If the rules are correct, then any sentence produced in this way ought to be grammatically (syntactically) correct. It is also to be expected that the rules will generate syntactically correct but semantically nonsensical sentences, such as the following well-known example:

Colorless green ideas sleep furiously

This sentence was constructed by Noam Chomsky as an illustration that phrase structure rules are capable of generating syntactically correct but semantically incorrect sentences. Phrase structure rules break sentences down into their constituent parts. These constituents are often represented as tree structures (dendrograms). The tree for Chomsky's sentence can be rendered as follows:


A constituent is any word or combination of words that is dominated by a single node. Thus each individual word is a constituent. Further, the subject NP Colorless green ideas, the minor NP green ideas, and the VP sleep furiously are constituents. Phrase structure rules and the tree structures that are associated with them are a form of immediate constituent analysis.
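The definition of a constituent as whatever a single node dominates can be checked mechanically. The sketch below encodes one plausible tree for Chomsky's sentence as nested Python tuples and lists the word string under every node. The category labels (A, N, V, Adv) and the exact shape of the tree are assumptions made for illustration; only the constituents named in the text are taken from the source.

```python
# A node is (label, children); a leaf is (category, word).
# Labels and bracketing are assumed for illustration.
TREE = ("S", [
    ("NP", [
        ("A", "Colorless"),
        ("NP", [("A", "green"), ("N", "ideas")]),
    ]),
    ("VP", [("V", "sleep"), ("Adv", "furiously")]),
])

def words(node):
    """Return the words dominated by a node, in left-to-right order."""
    _, content = node
    if isinstance(content, str):
        return [content]
    return [w for child in content for w in words(child)]

def constituents(node):
    """Return (label, word string) for the node and for every node below it."""
    label, content = node
    result = [(label, " ".join(words(node)))]
    if not isinstance(content, str):
        for child in content:
            result.extend(constituents(child))
    return result

for label, string in constituents(TREE):
    print(f"{label}: {string}")
```

Under this tree, a string such as "ideas sleep" is not a constituent, because no single node dominates exactly those two words.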

In transformational grammar, systems of phrase structure rules are supplemented by transformation rules, which act on an existing syntactic structure to produce a new one (performing such operations as negation, passivization, etc.). These transformations are not strictly required for generation, as the sentences they produce could be generated by a suitably expanded system of phrase structure rules alone, but transformations provide greater economy and enable significant relations between sentences to be reflected in the grammar.
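Generation by phrase structure rules alone, as described earlier in this section, can be sketched in a few lines of Python. The code starts from the symbol S, applies rules successively, and finally substitutes words for the lexical categories. The grammar is a deliberately tiny, non-recursive toy: the VP and AP rules and the four-word lexicon are assumptions chosen so that the full set of generated sentences stays finite.

```python
from itertools import product

# Toy grammar: phrase structure rules plus replacement (lexical) rules.
# Each symbol maps to a list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["AP", "N"]],
    "VP": [["V"], ["V", "Adv"]],
    "AP": [["A"]],
    "N": [["ideas"]],
    "A": [["green"]],
    "V": [["sleep"]],
    "Adv": [["furiously"]],
}

def generate(symbol):
    """Yield every word sequence derivable from symbol (the grammar must be non-recursive)."""
    if symbol not in GRAMMAR:  # an actual word: nothing left to rewrite
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Expand each symbol on the right-hand side, then combine the pieces in order.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [word for part in parts for word in part]

sentences = [" ".join(ws) for ws in generate("S")]
for sentence in sentences:
    print(sentence)
```

The generator produces "green ideas sleep furiously" as readily as any other string: nothing in the rules checks whether the result makes sense, only whether it fits the syntactic pattern.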

Top down

An important aspect of phrase structure rules is that they view sentence structure from the top down. The category on the left of the arrow is a greater constituent, and the immediate constituents to the right of the arrow are lesser constituents. Constituents are successively broken down into their parts as one moves down a list of phrase structure rules for a given sentence. This top-down view of sentence structure stands in contrast to much work done in modern theoretical syntax. In Minimalism, [3] for instance, sentence structure is generated from the bottom up. The operation Merge merges smaller constituents to create greater constituents until the greatest constituent (i.e. the sentence) is reached. In this regard, theoretical syntax abandoned phrase structure rules long ago, although their importance for computational linguistics seems to remain intact.
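The bottom-up direction can be illustrated with a very rough sketch. The Python code below starts from individual words and combines two objects at a time until the sentence is reached, which is the reverse of rewriting S downward. It is a simplification, not an implementation of Minimalist theory: the function names are invented for this example, and the category labels are supplied by hand purely to keep the output readable.

```python
def merge(label, left, right):
    """Combine two syntactic objects into a larger one (label supplied by hand)."""
    return (label, left, right)

def yield_of(obj):
    """Return the words contained in a syntactic object, left to right."""
    if isinstance(obj, str):
        return [obj]
    _, left, right = obj
    return yield_of(left) + yield_of(right)

# Build the structure from the smallest pieces upward.
inner_np = merge("NP", "green", "ideas")
np = merge("NP", "Colorless", inner_np)
vp = merge("VP", "sleep", "furiously")
s = merge("S", np, vp)

print(s)
print(" ".join(yield_of(s)))
```

Each call combines exactly two existing objects, so the greatest constituent is created last rather than first.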

Alternative approaches

Constituency vs. dependency

Phrase structure rules as they are commonly employed result in a view of sentence structure that is constituency-based. Thus, grammars that employ phrase structure rules are constituency grammars (= phrase structure grammars), as opposed to dependency grammars, [4] which view sentence structure as dependency-based. This means that for phrase structure rules to be applicable at all, one has to pursue a constituency-based understanding of sentence structure. The constituency relation is a one-to-one-or-more correspondence. For every word in a sentence, there is at least one node in the syntactic structure that corresponds to that word. The dependency relation, in contrast, is a one-to-one relation; for every word in the sentence, there is exactly one node in the syntactic structure that corresponds to that word. The distinction is illustrated with the following trees:

[Figure: a constituency tree (left) and a dependency tree (right)]

The constituency tree on the left could be generated by phrase structure rules. The sentence S is broken down into smaller and smaller constituent parts. The dependency tree on the right could not, in contrast, be generated by phrase structure rules (at least not as they are commonly interpreted).
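The difference in node counts between the two relations can be made concrete with a small sketch. The Python code below encodes one constituency analysis and one dependency analysis of Chomsky's sentence and counts the nodes in each. Both analyses, including the category labels and the choice of heads, are assumptions made for illustration and are not taken from the trees in the figure.

```python
WORDS = ["Colorless", "green", "ideas", "sleep", "furiously"]

# Dependency structure: one node per word; each word points to its head
# (None marks the root, here the finite verb).
HEAD = {
    "Colorless": "ideas",
    "green": "ideas",
    "ideas": "sleep",
    "sleep": None,
    "furiously": "sleep",
}

# Constituency structure: nested (label, child, ...) tuples; bare strings are words.
CONSTITUENCY = (
    "S",
    ("NP", ("A", "Colorless"), ("NP", ("A", "green"), ("N", "ideas"))),
    ("VP", ("V", "sleep"), ("Adv", "furiously")),
)

def count_nodes(tree):
    """Count labelled nodes in a constituency tree (the words themselves are not nodes)."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_nodes(child) for child in tree[1:])

print("words:", len(WORDS))
print("dependency nodes:", len(HEAD))
print("constituency nodes:", count_nodes(CONSTITUENCY))
```

In the dependency analysis the number of nodes equals the number of words, whereas the constituency analysis adds phrasal nodes (S, NP, VP) on top of the word-level ones.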

Representational grammars

A number of representational phrase structure theories of grammar never acknowledged phrase structure rules, but have pursued instead an understanding of sentence structure in terms of the notion of schema. Here phrase structures are not derived from rules that combine words, but from the specification or instantiation of syntactic schemata or configurations, often expressing some kind of semantic content independently of the specific words that appear in them. This approach is essentially equivalent to a system of phrase structure rules combined with a noncompositional semantic theory, since grammatical formalisms based on rewriting rules are generally equivalent in power to those based on substitution into schemata.

So in this type of approach, instead of being derived from the application of a number of phrase structure rules, the sentence Colorless green ideas sleep furiously would be generated by filling the words into the slots of a schema having the following structure:


And which would express the following conceptual content:


Though they are non-compositional, such models are monotonic. This approach is highly developed within Construction Grammar [5] and has had some influence in Head-Driven Phrase Structure Grammar [6] and Lexical Functional Grammar, [7] the latter two clearly qualifying as phrase structure grammars.


  1. For general discussions of phrase structure rules, see for instance Borsley (1991:34ff.), Brinton (2000:165), Falk (2001:46ff.).
  2. Dependency grammars are associated above all with the work of Lucien Tesnière (1959).
  3. See for instance Chomsky (1995).
  4. The most comprehensive source on dependency grammar is Ágel et al. (2003/6).
  5. Concerning Construction Grammar, see Goldberg (2006).
  6. Concerning Head-Driven Phrase Structure Grammar, see Pollard and Sag (1994).
  7. Concerning Lexical Functional Grammar, see Bresnan (2001).
