Immediate constituent analysis

In linguistics, immediate constituent analysis or IC analysis is a method of sentence analysis that was proposed by Wilhelm Wundt and named by Leonard Bloomfield. The approach developed into a full-blown strategy for analyzing sentence structure in the distributionalist works of Zellig Harris and Charles F. Hockett,[1] and in glossematics in the work of Knud Togeby.[2] The practice is now widespread. Most tree structures employed to represent the syntactic structure of sentences are products of some form of IC-analysis. The process and result of IC-analysis can, however, vary greatly depending on whether one chooses the constituency relation of phrase structure grammars (= constituency grammars) or the dependency relation of dependency grammars as the underlying principle that organizes constituents into hierarchical structures.

IC-analysis in phrase structure grammars

Given a phrase structure grammar (= constituency grammar), IC-analysis divides a sentence into its major parts or immediate constituents, and these constituents are in turn divided into further immediate constituents.[3] The process continues until irreducible constituents are reached, i.e., until each constituent consists of only a word or a meaningful part of a word. The end result of IC-analysis is often presented in a visual diagrammatic form that reveals the hierarchical immediate constituent structure of the sentence at hand. These diagrams are usually trees. For example:

[Tree diagram: constituency-based IC-analysis of the sentence This tree illustrates IC-analysis according to the constituency relation]

This tree illustrates the manner in which the entire sentence is divided first into the two immediate constituents "this tree" and "illustrates IC-analysis according to the constituency relation"; these two constituents are then divided into the immediate constituents "this" and "tree", and "illustrates IC-analysis" and "according to the constituency relation"; and so on.

An important aspect of IC-analysis in phrase structure grammars is that each individual word is a constituent by definition. The process of IC-analysis always ends when the smallest constituents are reached, which are often words (although the analysis can also be extended into the words to acknowledge the manner in which words are structured). The process is, however, different in dependency grammars, where many individual words do not end up as constituents.
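The recursive division at the heart of phrase-structure IC-analysis is easy to mimic computationally. The following Python sketch assumes one plausible binary bracketing of the example sentence (chosen for illustration, not read off the figure above) and divides it into immediate constituents until only words remain; each printed line corresponds to one constituent, with indentation mirroring the hierarchy:

```python
# A sketch of IC-analysis over a constituency tree. The bracketing below is
# one plausible analysis of the example sentence, assumed for illustration.
# A tree is either a word (str) or a tuple of its immediate constituents.

tree = (
    ("this", "tree"),                                   # subject NP
    (
        ("illustrates", "IC-analysis"),                 # verb + object
        ("according", ("to", (("the", "constituency"), "relation"))),
    ),
)

def words(node):
    """Left-to-right word string (the yield) of a constituent."""
    if isinstance(node, str):
        return [node]
    return [w for part in node for w in words(part)]

def ic_analysis(node, depth=0):
    """Print a constituent, then divide it into its immediate constituents."""
    print("  " * depth + " ".join(words(node)))
    if isinstance(node, str):          # a single word: the division stops here
        return
    for part in node:
        ic_analysis(part, depth + 1)

ic_analysis(tree)
```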

IC-analysis in dependency grammars

As a rule, dependency grammars do not employ IC-analysis, since their principle of syntactic ordering is not inclusion but, rather, asymmetrical dominance-dependency between words; when an attempt is made to incorporate IC-analysis into a dependency-type grammar, the result is some kind of hybrid system. In actuality, IC-analysis simply takes a different form in dependency grammars.[4] Since dependency grammars view the finite verb as the root of all sentence structure, they cannot and do not acknowledge the initial binary subject-predicate division of the clause associated with phrase structure grammars. For the general understanding of constituent structure, this means that dependency grammars do not acknowledge a finite verb phrase (VP) constituent, and that many individual words do not qualify as constituents and hence do not show up as constituents in the IC-analysis. Thus in the example sentence This tree illustrates IC-analysis according to the dependency relation, many of the phrase structure grammar constituents do not qualify as dependency grammar constituents:

[Tree diagram: dependency-based IC-analysis of the sentence This tree illustrates IC-analysis according to the dependency relation]

This IC-analysis views neither the finite verb phrase "illustrates IC-analysis according to the dependency relation" nor the individual words "tree", "illustrates", "according", "to", and "relation" as constituents.
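In a dependency tree the constituents fall out of the dominance relation directly: each word heads exactly one constituent, namely itself plus everything it dominates. The following Python sketch encodes the example sentence with one plausible set of head assignments (an illustrative assumption, not the article's figure) and lists the word strings that come out as constituents:

```python
# A sketch of the example sentence as a dependency tree: each word maps to
# its direct dependents, and the finite verb is the root. The head choices
# are one plausible dependency analysis, assumed for illustration.

ORDER = ["this", "tree", "illustrates", "IC-analysis",
         "according", "to", "the", "dependency", "relation"]

dependents = {
    "illustrates": ["tree", "IC-analysis", "according"],   # root
    "tree": ["this"],
    "according": ["to"],
    "to": ["relation"],
    "relation": ["the", "dependency"],
    "this": [], "IC-analysis": [], "the": [], "dependency": [],
}

def dominated(word):
    """The word plus every word it dominates: the constituent it heads."""
    out = {word}
    for dep in dependents[word]:
        out |= dominated(dep)
    return out

# Only these word strings are constituents. Words like "tree" or
# "illustrates" head larger strings, so they never surface as
# one-word constituents; leaves like "this" do.
for word in ORDER:
    print(" ".join(sorted(dominated(word), key=ORDER.index)))
```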

While the structures that IC-analysis identifies for dependency and constituency grammars differ in significant ways, as the two trees above illustrate, both views of sentence structure acknowledge constituents. The constituent can be defined in a theory-neutral manner:

Constituent
A given word/node plus all the words/nodes that that word/node dominates

This definition is neutral with respect to the dependency vs. constituency distinction. It allows one to compare the IC-analyses across the two types of structure. A constituent is always a complete tree or a complete subtree of a tree, regardless of whether the tree at hand is a constituency or a dependency tree.
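The definition can be stated operationally: for each node, collect its own word together with the words of everything beneath it. The following Python sketch runs the same extraction over both encodings of the phrase this tree (both tiny trees are illustrative assumptions):

```python
# A sketch of the theory-neutral definition. A node is (word, children):
# constituency trees carry words only at leaves (internal nodes get None),
# dependency trees carry a word at every node. In both cases a constituent
# is whatever a complete subtree dominates.

def constituent(node):
    """The node's own word (if any) plus all the words it dominates."""
    word, children = node
    out = {word} if word is not None else set()
    for child in children:
        out |= constituent(child)
    return out

def all_constituents(node):
    """The constituent of every subtree, root included."""
    yield constituent(node)
    for child in node[1]:
        yield from all_constituents(child)

# The phrase "this tree", encoded both ways (illustrative analyses):
constituency_np = (None, [("this", []), ("tree", [])])   # NP over two words
dependency_np = ("tree", [("this", [])])                 # "tree" heads "this"

print(list(all_constituents(constituency_np)))
# [{'this', 'tree'}, {'this'}, {'tree'}]  -> each word is a constituent
print(list(all_constituents(dependency_np)))
# [{'this', 'tree'}, {'this'}]            -> "tree" alone is not
```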

Constituency tests

The IC-analysis for a given sentence is usually arrived at by way of constituency tests. Constituency tests (e.g. topicalization, clefting, pseudoclefting, pro-form substitution, answer ellipsis, passivization, omission, coordination, etc.) identify the constituents, large and small, of English sentences. Two illustrations of how constituency tests deliver clues about constituent structure, and thus about the correct IC-analysis of a given sentence, follow. Consider the phrase The girl in the following trees:

[Tree diagrams: constituency (BPS) and dependency analyses of The girl is happy]

The acronym BPS stands for "bare phrase structure", an indication that the words themselves are used as the node labels in the tree. Focusing again on the phrase The girl, the tests unanimously confirm that it is a constituent, as both trees show:

...the girl is happy. - Topicalization (invalid test, because the test constituent is already at the front of the sentence)
It is the girl who is happy. - Clefting
(The one) who is happy is the girl. - Pseudoclefting
She is happy. - Pro-form substitution
Who is happy? - The girl. - Answer ellipsis

Based on these results, one can safely assume that the noun phrase The girl in the example sentence is a constituent and should therefore be shown as one in the corresponding IC-representation, which it is in both trees. Consider next what these tests tell us about the verb string is happy:

*...is happy, the girl. - Topicalization
*It is is happy that the girl. - Clefting
*What the girl is is happy. - Pseudoclefting
*The girl so/that/did that. - Pro-form substitution
What is the girl? - *Is happy. - Answer ellipsis

The star * indicates that the sentence is not acceptable English. Based on data like these, one might conclude that the finite verb string is happy in the example sentence is not a constituent and should therefore not be shown as a constituent in the corresponding IC-representation. Hence this result supports the IC-analysis in the dependency tree over the one in the constituency tree, since the dependency tree does not view is happy as a constituent.
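The test results can be checked directly against the two structures: under the theory-neutral definition above, a string is a constituent just in case it is exactly what some node dominates. In the following Python sketch, the constituent inventories are read off the two trees of The girl is happy by hand (illustrative analyses, not computed from the figures):

```python
# A sketch of checking a string against the two analyses of "The girl is
# happy". The inventories below are read off the two trees by hand.

constituency_constituents = [
    {"The", "girl", "is", "happy"},    # S
    {"The", "girl"}, {"is", "happy"},  # NP, finite VP
    {"The"}, {"girl"}, {"is"}, {"happy"},
]
dependency_constituents = [
    {"The", "girl", "is", "happy"},    # rooted in the finite verb "is"
    {"The", "girl"},                   # "girl" plus its dependent "The"
    {"The"}, {"happy"},                # leaves
]

def is_constituent(string, inventory):
    """True if the word string is exactly what some node dominates."""
    return set(string.split()) in inventory

print(is_constituent("is happy", constituency_constituents))  # True
print(is_constituent("is happy", dependency_constituents))    # False
```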

Notes

  1. Seuren, Pieter (2015). "Prestructuralist and structuralist approaches to syntax". In Kiss and Alexiadou (eds.). Syntax – Theory and Analysis: An International Handbook. De Gruyter. pp. 134–157. ISBN 9783110202762.
  2. Fudge, Erik (2006). "Glossematics". In Brown, Keith (ed.). Encyclopedia of Language and Linguistics. Elsevier. pp. 1439–1444.
  3. The basic concept of immediate constituents is widely employed in phrase structure grammars. See for instance Akmajian and Heny (1980:64), Chisholm (1981:59), Culicover (1982:21), Huddleston (1988:7), Haegeman and Guéron (1999:51).
  4. Concerning dependency grammars, see Ágel et al. (2003/6).

Related Research Articles

<span class="mw-page-title-main">Syntax</span> System responsible for combining morphemes into complex structures

In linguistics, syntax is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituency), agreement, the nature of crosslinguistic variation, and the relationship between form and meaning (semantics). There are numerous approaches to syntax that differ in their central assumptions and goals.

A syntactic category is a syntactic unit that theories of syntax assume. Word classes, largely corresponding to traditional parts of speech, are syntactic categories. In phrase structure grammars, the phrasal categories are also syntactic categories. Dependency grammars, however, do not acknowledge phrasal categories.

In grammar, a phrase (called an expression in some contexts) is a group of words, or a single word, acting as a grammatical unit. For instance, the English expression "the very happy squirrel" is a noun phrase which contains the adjective phrase "very happy". Phrases can consist of a single word or a complete sentence. In theoretical linguistics, phrases are often analyzed as units of syntactic structure such as a constituent.

Phrase structure rules are a type of rewrite rule used to describe a given language's syntax and are closely associated with the early stages of transformational grammar, proposed by Noam Chomsky in 1957. They are used to break down a natural language sentence into its constituent parts, also known as syntactic categories, including both lexical categories and phrasal categories. A grammar that uses phrase structure rules is a type of phrase structure grammar. Phrase structure rules as they are commonly employed operate according to the constituency relation, and a grammar that employs phrase structure rules is therefore a constituency grammar; as such, it stands in contrast to dependency grammars, which are based on the dependency relation.

An adjective phrase is a phrase whose head is an adjective. Almost any grammar or syntax textbook or dictionary of linguistics terminology defines the adjective phrase in a similar way, e.g. Kesner Bland (1996:499), Crystal (1996:9), Greenbaum (1996:288ff.), Haegeman and Guéron (1999:70f.), Brinton (2000:172f.), Jurafsky and Martin (2000:362). The adjective can initiate the phrase, conclude the phrase, or appear in a medial position. The dependents of the head adjective—i.e. the other words and phrases inside the adjective phrase—are typically adverb or prepositional phrases, but they can also be clauses. Adjectives and adjective phrases function in two basic ways, attributively or predicatively. An attributive adjective (phrase) precedes the noun of a noun phrase. A predicative adjective (phrase) follows a linking verb and serves to describe the preceding subject, e.g. The man is very happy.

<span class="mw-page-title-main">Parse tree</span> Tree in formal language theory

A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree itself is used primarily in computational linguistics; in theoretical syntax, the term syntax tree is more common.

In linguistics, a verb phrase (VP) is a syntactic unit composed of a verb and its arguments except the subject of an independent clause or coordinate clause. Thus, in the sentence A fat man quickly put the money into the box, the words quickly put the money into the box constitute a verb phrase; it consists of the verb put and its arguments, but not the subject a fat man. A verb phrase is similar to what is considered a predicate in traditional grammars.

Dependency grammar (DG) is a class of modern grammatical theories that are all based on the dependency relation and that can be traced back primarily to the work of Lucien Tesnière. Dependency is the notion that linguistic units, e.g. words, are connected to each other by directed links. The (finite) verb is taken to be the structural center of clause structure. All other syntactic units (words) are either directly or indirectly connected to the verb in terms of the directed links, which are called dependencies. Dependency grammar differs from phrase structure grammar in that while it can identify phrases it tends to overlook phrasal nodes. A dependency structure is determined by the relation between a word and its dependents. Dependency structures are flatter than phrase structures in part because they lack a finite verb phrase constituent, and they are thus well suited for the analysis of languages with free word order, such as Czech or Warlpiri.

In syntactic analysis, a constituent is a word or a group of words that function as a single unit within a hierarchical structure. The constituent structure of sentences is identified using tests for constituents. These tests apply to a portion of a sentence, and the results provide evidence about the constituent structure of the sentence. Many constituents are phrases. A phrase is a sequence of one or more words built around a head lexical item and working as a unit within a sentence. A word sequence is shown to be a phrase/constituent if it exhibits one or more of the behaviors discussed below. The analysis of constituent structure is associated mainly with phrase structure grammars, although dependency grammars also allow sentence structure to be broken down into constituent parts.

In generative grammar, non-configurational languages are languages characterized by a flat phrase structure, which allows syntactically discontinuous expressions, and a relatively free word order.

An adpositional phrase is a syntactic category that includes prepositional phrases, postpositional phrases, and circumpositional phrases. Adpositional phrases contain an adposition as head and usually a complement such as a noun phrase. Language syntax treats adpositional phrases as units that act as arguments or adjuncts. Prepositional and postpositional phrases differ by the order of the words used. Languages that are primarily head-initial such as English predominantly use prepositional phrases whereas head-final languages predominantly employ postpositional phrases. Many languages have both types, as well as circumpositional phrases.

In theoretical linguistics, a distinction is made between endocentric and exocentric constructions. A grammatical construction is said to be endocentric if it fulfils the same linguistic function as one of its parts, and exocentric if it does not. The distinction reaches back at least to Bloomfield's work of the 1930s, who based it on terms by Pāṇini and Patañjali in Sanskrit grammar. Such a distinction is possible only in phrase structure grammars, since in dependency grammars all constructions are necessarily endocentric.

Topicalization is a mechanism of syntax that establishes an expression as the sentence or clause topic by having it appear at the front of the sentence or clause. This involves a phrasal movement of determiners, prepositions, and verbs to sentence-initial position. Topicalization often results in a discontinuity and is thus one of a number of established discontinuity types, the other three being wh-fronting, scrambling, and extraposition. Topicalization is also used as a constituency test; an expression that can be topicalized is deemed a constituent. The topicalization of arguments in English is rare, whereas circumstantial adjuncts are often topicalized. Most languages allow topicalization, and in some languages, topicalization occurs much more frequently and/or in a much less marked manner than in English. Topicalization in English has also received attention in the pragmatics literature.

Antecedent-contained deletion (ACD), also called antecedent-contained ellipsis, is a phenomenon whereby an elided verb phrase appears to be contained within its own antecedent. For instance, in the sentence "I read every book that you did", the verb phrase in the main clause appears to license ellipsis inside the relative clause which modifies its object. ACD is a classic puzzle for theories of the syntax-semantics interface, since it threatens to introduce an infinite regress. It is commonly taken as motivation for syntactic transformations such as quantifier raising, though some approaches explain it using semantic composition rules or by adopting more flexible notions of what it means to be a syntactic unit.

Merge is one of the basic operations in the Minimalist Program, a leading approach to generative syntax, by which two syntactic objects are combined to form a new syntactic unit. Merge also has the property of recursion in that it may apply to its own output: the objects combined by Merge are either lexical items or sets that were themselves formed by Merge. This recursive property of Merge has been claimed to be a fundamental characteristic that distinguishes language from other cognitive faculties. As Noam Chomsky (1999) puts it, Merge is "an indispensable operation of a recursive system ... which takes two syntactic objects A and B and forms the new object G={A,B}" (p. 2).

In linguistics, negative inversion is one of many types of subject–auxiliary inversion in English. A negation, a word that implies negation, or a phrase containing one of these words precedes the finite auxiliary verb, necessitating that the subject and finite verb undergo inversion. Negative inversion is a phenomenon of English syntax. Other Germanic languages have a more general V2 word order, which allows inversion to occur much more often than in English, so they may not acknowledge negative inversion as a specific phenomenon. While negative inversion is a common occurrence in English, a solid understanding of just what elicits the inversion has not yet been established. It is, namely, not entirely clear why certain fronted expressions containing a negation elicit negative inversion, but others do not.

In linguistics, a catena is a unit of syntax and morphology, closely associated with dependency grammars. It is a more flexible and inclusive unit than the constituent and its proponents therefore consider it to be better suited than the constituent to serve as the fundamental unit of syntactic and morphosyntactic analysis.

Pseudogapping is an ellipsis mechanism that elides most but not all of a non-finite verb phrase; at least one part of the verb phrase remains, which is called the remnant. Pseudogapping occurs in comparative and contrastive contexts, so it appears often after subordinators and coordinators such as if, although, but, than, etc. It is similar to verb phrase ellipsis (VP-ellipsis) insofar as the ellipsis is introduced by an auxiliary verb, and many grammarians take it to be a particular type of VP-ellipsis. The distribution of pseudogapping is more restricted than that of VP-ellipsis, however, and in this regard, it has some traits in common with gapping. But unlike gapping, pseudogapping occurs in English but not in closely related languages. The analysis of pseudogapping can vary greatly depending in part on whether the analysis is based in a phrase structure grammar or a dependency grammar. Pseudogapping was first identified, named, and explored by Stump (1977) and has since been studied in detail by Levin (1986) among others, and now enjoys a firm position in the canon of acknowledged ellipsis mechanisms of English.

Stripping or bare argument ellipsis is an ellipsis mechanism that elides everything from a clause except one constituent. It occurs exclusively in the non-initial conjuncts of coordinate structures. One prominent analysis of stripping sees it as a particular manifestation of the gapping mechanism; the difference between stripping and gapping lies merely in the number of remnants left behind by ellipsis: gapping leaves two constituents behind, whereas stripping leaves just one. Stripping occurs in many languages and is a frequent occurrence in colloquial conversation. As with many other ellipsis mechanisms, stripping challenges theories of syntax in part because the elided material often fails to qualify as a constituent in a straightforward manner.

In linguistics, the term right node raising (RNR) denotes a sharing mechanism that sees the material to the immediate right of parallel structures being in some sense "shared" by those parallel structures, e.g. [Sam likes] but [Fred dislikes] the debates. The parallel structures of RNR are typically the conjuncts of a coordinate structure, although the phenomenon is not limited to coordination, since it can also appear with parallel structures that do not involve coordination. The term right node raising itself is due to Postal (1974). Postal assumed that the parallel structures are complete clauses below the surface. The shared constituent was then raised rightward out of each conjunct of the coordinate structure and attached as a single constituent to the structure above the level of the conjuncts, hence "right node raising" was occurring in a literal sense. While the term right node raising survives, the actual analysis that Postal proposed is not widely accepted. RNR occurs in many languages, including English and related languages.
