Metrical phonology

Last updated January 26, 2025

Metrical phonology is a theory of stress or linguistic prominence.^[1]^[2] The innovative feature of this theory is that the prominence of a unit is defined relative to other units in the same phrase. For example, in the most common pronunciation of the phrase "doctors use penicillin" (if said out-of-the-blue), the syllable '-ci-' is the strongest or most stressed syllable in the phrase, but the syllable 'doc-' is more stressed than the syllable '-tors'. Previously, generative phonologists and the American Structuralists represented prosodic prominence as a feature that applied to individual phonemes (segments) or syllables.^[3] This feature could take on multiple values to indicate various levels of stress. Stress was assigned using the cyclic reapplication of rules to words and phrases.

Metrical phonology holds that stress is separate from pitch accent and has phonetic effects on the realization of syllables beyond their intonation, including effects on their duration and amplitude.^[2] The perceived stress of a syllable results from its position in the metrical tree and metrical grid for the phrase it appears in.

Metrical trees

Linguistic prominence in metrical phonology is partially determined by the relations between nodes in a branching tree, in which one node is Strong (S) and the other node or nodes are Weak (W). The labels Strong and Weak have no inherent phonetic realization, and only have meaning relative to the rest of the labels in the tree. A Strong node is stronger than its Weak sister node. The most prominent syllable in a phrase is the one that does not have any Weak nodes above it. This syllable is called the Designated Terminal Element. In the example tree (1), the syllable '-ci-' is the Designated Terminal Element.
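
The search for the Designated Terminal Element can be sketched in Python: starting at the top of the tree, keep following the Strong branch until a syllable is reached. The `Node` class, the helper names, and the bracketing assumed here for "doctors use penicillin" are illustrative assumptions; only the S/W labels and the resulting syllable '-ci-' come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # "S", "W", or "ROOT" for the unlabeled top node
    syllable: Optional[str] = None  # filled in only for terminal nodes
    children: List["Node"] = field(default_factory=list)


def designated_terminal_element(node: Node) -> Optional[str]:
    """Walk down from `node`, always taking the S child, until a terminal is reached."""
    while node.children:
        # Assumption: every branching node has exactly one S child.
        node = next(child for child in node.children if child.label == "S")
    return node.syllable


def syl(label: str, text: str) -> Node:
    """Build a terminal node carrying one syllable."""
    return Node(label, syllable=text)


# Assumed bracketing: [W doc-tors] [S [W use] [S [W pe-ni] [S ci-llin]]]
tree = Node("ROOT", children=[
    Node("W", children=[syl("S", "doc"), syl("W", "tors")]),
    Node("S", children=[
        syl("W", "use"),
        Node("S", children=[
            Node("W", children=[syl("S", "pe"), syl("W", "ni")]),
            Node("S", children=[syl("S", "ci"), syl("W", "llin")]),
        ]),
    ]),
])

print(designated_terminal_element(tree))  # ci
```

The same function applied to a subtree returns that constituent's own Designated Terminal Element, for example 'doc-' for the word "doctors".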

(1)

Metrical trees allow us to change the stress pattern for a phrase by switching S and W sister nodes. The tree in (1) represents the metrical structure for the sentence "Doctors use penicillin" when the sentence is providing all new information. This is called broad focus, and might be used in response to a question like "What did you learn at the hospital today?" The same metrical structure would be used when the sentence has narrow focus on the word 'penicillin', for example, in response to a question like "What do doctors use to treat that disease?" However, we need a new metrical structure to put narrow focus on the word 'doctors', for example, if the phrase is used in response to the question "Who uses penicillin?" In this case, the S and W nodes at the intermediate phrase (ip) level have to switch, resulting in (2).

(2)

In metrical phonology there has been debate about whether nodes on metrical trees must have two children, making them binary branching,^[2]^[4] or whether they can have any number of children, making them n-ary branching.^[5]^[6] Proponents of binary branching trees have claimed that such trees can constrain the restructuring of very long and very short constituents because new nodes created in this restructuring have to correspond to nodes in the original tree.^[4] Proponents of n-ary branching trees point out that only multiple branches allow a limited number of tree levels, which can correspond to predetermined levels of prosodic constituents, whereas binary branching trees require intermediate levels that do not correspond to any prosodic constituent.

A number of levels of prosodic constituents have been proposed, including: moras, syllables, feet, phonological words, clitic groups, phonological phrases, intermediate phrases, intonational phrases, and phonological utterances. The relations between prosodic constituents at different levels are commonly thought to be governed by the Strict Layer Hypothesis (SLH).^[7] This hypothesis states that in metrical trees, all prosodic constituents at a particular level consist exclusively of constituents from the level below. The SLH forbids a number of types of tree structures, including trees in which: a node has two parents in the level above, a node has two or more different types of children, a node has children from a level that is not the level immediately below it, a node does not correspond to any of the specified levels, or a node has children of the same type as itself.
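
The Strict Layer Hypothesis lends itself to a mechanical check. The following Python sketch assumes one particular ordering of the levels listed above and represents each constituent as a `(level, children)` pair; the function name and the data layout are assumptions made for illustration. Because each constituent is stored inside exactly one parent, the "two parents" violation cannot arise in this representation and is not tested.

```python
# Assumed hierarchy, lowest level first.
LEVELS = [
    "mora", "syllable", "foot", "phonological word", "clitic group",
    "phonological phrase", "intermediate phrase", "intonational phrase",
    "phonological utterance",
]


def slh_violations(node):
    """Return a list of messages describing Strict Layer violations under `node`.

    `node` is a pair (level, children), where children is a list of such pairs.
    """
    level, children = node
    problems = []
    known = level in LEVELS
    if not known:
        problems.append(f"'{level}' is not one of the specified levels")
    expected = None
    if known and LEVELS.index(level) > 0:
        expected = LEVELS[LEVELS.index(level) - 1]
    for child in children:
        # A child must belong to the level immediately below its parent.
        if known and child[0] != expected:
            problems.append(f"'{level}' dominates '{child[0]}' (expected '{expected}')")
        problems.extend(slh_violations(child))
    return problems


syllable = ("syllable", [("mora", [])])
well_formed = ("phonological word", [("foot", [syllable, syllable])])
# A word that directly dominates a syllable skips the foot level.
ill_formed = ("phonological word", [("foot", [syllable]), syllable])

print(slh_violations(well_formed))
print(slh_violations(ill_formed))
```

A single comparison against the expected level covers three of the forbidden configurations at once: mixed child types, children from a non-adjacent level, and children of the same type as the parent.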

The various levels of the prosodic hierarchy are independently justified by the phonological phenomena that make reference to them. For instance, in English the sounds /p/, /t/, and /k/ are aspirated (followed by a puff of air) only if they are the first segment in a foot.^[5] Similarly, in Tuscan Italian, the intonational phrase is the domain of the gorgia toscana, a rule that changes voiceless plosives (/p/, /t/, /k/) between vowels into fricative consonants, like /θ/ (th) and /h/.^[5]

In addition to describing prominence relations between words, metrical trees can also describe prominence relations within words. Indeed, a set of rules developed by Liberman and Prince^[2] can be used to quite accurately predict stress in English words. Their Lexical Category Prominence Rule states that the second node in a pair of sister nodes is labeled W unless one of a number of conditions is met, such as the node branching or dominating a particular suffix, in which case it is labeled S. Allowable tree structures and node labels for a particular word in Liberman and Prince's system are constrained by the two-value feature [± stress], which can be assigned to segments or syllables by separate rules that refer to the number and type of segments in the syllable and the syllable's position in the word.^[8] Syllables that are [- stress] can only be immediately dominated by a W node. However, syllables that are [+ stress] can be immediately dominated by S or W nodes.

Metrical grids

In a metrical grid, the syllables of the phrase are arranged along the bottom and the rows of the grid indicate different levels of prominence, as in (3).

					X
X			X		X
X	X	X	X	X	X	X
doc	tors	use	pe	ni	ci	llin

(3) Example metrical grid

The higher the column of Xs above a syllable, the more prominent the syllable is. The metrical grid and the metrical tree for a particular utterance are related in such a way that the Designated Terminal Element of an S node must be more prominent than the Designated Terminal Element of its sister W node.^[2] So in (3), the metrical grid for the utterance in (1), '-ci-' must be more prominent than 'doc-' because '-ci-' is the Designated Terminal Element of the highest S node and 'doc-' is the Designated Terminal Element of its sister W node.
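
This relation between tree and grid can be checked mechanically. The Python sketch below encodes a subtree either as a syllable string or as a list of `(label, subtree)` pairs, and tests whether a set of column heights respects the tree. The column heights are read off grid (3); the bracketing of the tree, the function names, and the use of syllable strings as dictionary keys (which only works because no syllable repeats in this phrase) are assumptions for illustration.

```python
def dte(subtree):
    """Return the Designated Terminal Element of a subtree."""
    if isinstance(subtree, str):
        return subtree
    # Assumption: each branching node has exactly one S child.
    return next(dte(child) for label, child in subtree if label == "S")


def grid_respects_tree(subtree, heights):
    """True if, at every node, the DTE of the S child has a taller
    column than the DTE of each W sister."""
    if isinstance(subtree, str):
        return True
    strong_height = heights[dte(subtree)]
    for label, child in subtree:
        if label == "W" and heights[dte(child)] >= strong_height:
            return False
    return all(grid_respects_tree(child, heights) for _, child in subtree)


# Assumed bracketing: [W doc-tors] [S [W use] [S [W pe-ni] [S ci-llin]]]
tree = [
    ("W", [("S", "doc"), ("W", "tors")]),
    ("S", [
        ("W", "use"),
        ("S", [
            ("W", [("S", "pe"), ("W", "ni")]),
            ("S", [("S", "ci"), ("W", "llin")]),
        ]),
    ]),
]

# Column heights from grid (3).
grid_3 = {"doc": 2, "tors": 1, "use": 1, "pe": 2, "ni": 1, "ci": 3, "llin": 1}

print(grid_respects_tree(tree, grid_3))  # True
```

Raising the column over 'doc-' to the height of '-ci-' would make the check fail, since the W sister's Designated Terminal Element would no longer be less prominent.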

The structure of the metrical grid explains a number of otherwise surprising features of prominence patterns in language. For example, the main stress in English phrases may be placed several syllables away from the end of the phrase, even though the rule assigning this stress looks for a lexically stressed syllable near this boundary.^[9] Using a metrical grid, this rule can simply apply to the rightmost element in the highest row of the grid. Therefore, what seemed to be a non-local application of the phrasal stress rule is reinterpreted as the local application of the rule to the highest row of the metrical grid.
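
As a minimal sketch of this idea in Python, the rule below looks only at the highest row of the grid and promotes its rightmost element. The function name is hypothetical, and the input heights are a simplified, assumed word-level grid for "doctors use penicillin" in which only 'doc-' and '-ci-' reach the second row; they are not taken from the sources cited here.

```python
def apply_phrasal_stress(heights):
    """Add one grid mark to the rightmost column of the highest row.

    `heights` lists the column height over each syllable, left to right.
    Returns the index of the promoted syllable and the new heights.
    """
    top = max(heights)
    # Columns that reach the highest row, in left-to-right order.
    top_row = [i for i, h in enumerate(heights) if h == top]
    target = top_row[-1]
    promoted = list(heights)
    promoted[target] += 1
    return target, promoted


syllables = ["doc", "tors", "use", "pe", "ni", "ci", "llin"]
word_level = [2, 1, 1, 1, 1, 2, 1]  # assumed input grid

index, phrase_level = apply_phrasal_stress(word_level)
print(syllables[index], phrase_level)
```

Although '-ci-' is the second-to-last syllable of the phrase, it is the last element of the highest row, so the rule never has to skip over intervening material.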

Metrical grids were originally developed to handle a phenomenon that appears in some languages, including English, German, and Masoretic Hebrew, in which stress shifts to avoid a 'stress clash'.^[2] A stress clash can occur when two stressed syllables are too close to each other. For example, the word 'nineteen' spoken in isolation has stress on the second syllable. But when it is placed before 'girls' the stress on 'nineteen' can shift to the first syllable. Two syllables exhibit stress clash if there are two successive rows in the grid in which their columns are adjacent (i.e. there is no X between them). For example, in grid (4) the columns for 'teen' and 'girls' are adjacent in both the first and second rows, indicating a stress clash.

		X
	X	X
X	X	X
nine	teen	girls

(4) Pre-stress-shift metrical grid
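
The definition of a stress clash given above can be turned into a short check. This Python sketch represents a grid by its column heights, so grid (4) is `[1, 2, 3]`; the function names are assumptions for illustration. Two columns count as adjacent in a row when both have a mark in that row and no marked column stands between them.

```python
def adjacent_pairs(heights, row):
    """Pairs of column indices that are neighbors within the given row (rows count from 1)."""
    marked = [i for i, h in enumerate(heights) if h >= row]
    return set(zip(marked, marked[1:]))


def stress_clashes(heights):
    """Pairs of columns that are adjacent in two successive rows."""
    clashes = set()
    for row in range(1, max(heights)):
        clashes |= adjacent_pairs(heights, row) & adjacent_pairs(heights, row + 1)
    return sorted(clashes)


syllables = ["nine", "teen", "girls"]
grid_4 = [1, 2, 3]

for left, right in stress_clashes(grid_4):
    print(syllables[left], syllables[right])  # teen girls
```

With the stress moved to the first syllable (heights `[2, 1, 3]`), 'nine' and 'girls' are neighbors in the second row only, so the function reports no clash.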

Stress clashes can be resolved by the Rhythm Rule, which reverses the S-W relation for some pair of sister nodes, as long as such a reversal does not put a Designated Terminal Element of an Intonational Phrase under any W node, and does not put a [- stress] syllable directly under an S node. In (4) the W and S nodes over 'nine-' and '-teen' can be reversed, leading to the non-clashing grid in (5).

		X
X		X
X	X	X
nine	teen	girls

(5) Post-stress-shift metrical grid
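
The Rhythm Rule and its two conditions can be sketched as a guarded label swap. In the Python sketch below, the `Node` class, the `stressed` flag standing in for [± stress], the function names, and the assumption that the root of the tree is the Intonational Phrase are all illustrative choices rather than part of Liberman and Prince's formulation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # "S", "W", or "ROOT"
    syllable: Optional[str] = None  # set on terminal nodes only
    stressed: bool = True           # stands in for [+ stress] / [- stress]
    children: List["Node"] = field(default_factory=list)


def dte(node: Node) -> Node:
    """Follow S children down to the Designated Terminal Element."""
    while node.children:
        node = next(c for c in node.children if c.label == "S")
    return node


def rhythm_rule(root: Node, parent: Node) -> bool:
    """Swap the labels of `parent`'s two children if both conditions hold.

    Returns True if the swap was kept, False if it was undone.
    """
    left, right = parent.children  # assumes binary branching at this node
    phrase_dte = dte(root)
    left.label, right.label = right.label, left.label
    # Condition 1: the phrase's DTE must not end up under a W node.
    dte_kept = dte(root) is phrase_dte
    # Condition 2: no [- stress] syllable directly under an S label.
    heads_ok = all(n.stressed for n in (left, right)
                   if n.label == "S" and not n.children)
    if dte_kept and heads_ok:
        return True
    left.label, right.label = right.label, left.label  # undo the swap
    return False


nine, teen, girls = Node("W", "nine"), Node("S", "teen"), Node("S", "girls")
nineteen = Node("W", children=[nine, teen])
phrase = Node("ROOT", children=[nineteen, girls])

print(rhythm_rule(phrase, nineteen), nine.label, teen.label)  # True S W
print(rhythm_rule(phrase, phrase))                            # False
```

The second call is rejected because swapping 'nineteen' and 'girls' would place 'girls', the Designated Terminal Element of the phrase, under a W node.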

This process is optional, and seems to be applied more often in some circumstances than others.

Metrical parameters

Hayes (1981)^[10] describes four metrical parameters which can be used to group languages according to their word-level stress patterns.

Right-dominant vs. left-dominant: In a right-dominant language nodes on the right are labeled S, while in a left-dominant language nodes on the left are labeled S.
Bounded vs. unbounded: In a bounded language the main stress appears a fixed distance from the word boundary and the secondary stress appears at fixed intervals from other stressed syllables. In an unbounded language the main stress is drawn to 'heavy' syllables (syllables with long vowels and/or consonants at the end of the syllable). Within bounded languages, two more parameters apply: left-to-right vs. right-to-left and quantity sensitive vs. insensitive.
Left-to-right vs. right-to-left: In a left-to-right language metrical trees are constructed starting at the left edge of the word, while in a right-to-left language, they start at the right edge of the word.
Quantity-sensitive vs. quantity-insensitive: In a quantity-sensitive language a W node cannot dominate a heavy syllable, while in a quantity-insensitive language tree construction is not influenced by the internal makeup of the syllables.

Hayes (1995)^[11] describes metrical parameters that can be used to analyse or predict word-level stress placement:

Quantity-sensitive vs. quantity-insensitive: whether stress is sensitive to syllable weight.
Foot type: whether feet are iambs or trochees.
Parsing directionality: whether the feet are built from the left edge of the word to the right, or from right to left.
Main stress: whether the main stress falls towards the right or the left edge of the word.
Extrametricality: whether there is a unit consistently ignored for stress assignment, such as a final consonant, mora, syllable, or foot.
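
To show how such parameters interact, the following Python sketch derives a stress pattern from four of them. It is a deliberately simplified, quantity-insensitive toy: feet are always two syllables, leftover syllables stay unfooted, and only a final syllable can be extrametrical. The function name, parameter names, and the numeric encoding (2 for main stress, 1 for secondary, 0 for unstressed) are assumptions, not Hayes's notation.

```python
def assign_stress(n_syllables, foot="trochee", direction="left",
                  main="left", final_extrametrical=False):
    """Return one stress level per syllable: 2 main, 1 secondary, 0 none."""
    stress = [0] * n_syllables
    # An extrametrical final syllable is invisible to foot construction.
    usable = n_syllables - 1 if final_extrametrical else n_syllables

    # Index of the first syllable of each two-syllable foot.
    if direction == "left":
        starts = list(range(0, usable - 1, 2))
    else:
        starts = sorted(range(usable - 2, -1, -2))

    # Trochees are head-initial; iambs are head-final.
    heads = [s if foot == "trochee" else s + 1 for s in starts]
    for head in heads:
        stress[head] = 1
    if heads:
        stress[heads[0] if main == "left" else heads[-1]] = 2
    return stress


# Trochees built from the right, main stress on the rightmost foot:
print(assign_stress(5, "trochee", "right", "right"))        # [0, 1, 0, 2, 0]
# The same settings with an extrametrical final syllable:
print(assign_stress(4, "trochee", "right", "right", True))  # [0, 2, 0, 0]
```

Changing a single setting shifts the whole pattern: the first call yields penultimate main stress, while adding extrametricality in the second moves main stress to the antepenultimate syllable.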

Music

Hierarchical patterns of prominence like those represented in metrical trees can also apply to rhythm in music.^[12] The prominence level of a note is determined by the relative prominence of all the nodes above it. The timing of notes also depends on the metrical tree for a particular tune. Each node at the bottom level of the tree (terminal nodes) receives a beat. Empty terminal nodes correspond to rests or form part of a note that spans several beats. Syncopation in music can result when relatively strong nodes are empty.

Advantages

Metrical phonology offers a number of advantages over a system representing stress as a feature that applies to individual segments or syllables, without reference to the other syllables in a phrase. Creators of traditional feature systems posited the stress feature, which differed from other phonological features in several key ways. For instance, the feature stress had an arbitrary number of values or levels, rather than two or some justified number more than two. In addition, the non-primary stress values in these systems were only defined relative to the primary stress value, and did not have local acoustic or articulatory effects. By not treating stress as a feature of an individual segment, metrical phonology avoids the inexplicable differences between the stress feature and other phonological features.^[2]

Metrical phonology also correctly predicts the ambiguity between broad and narrow focus.^[13] There are two possible metrical patterns for two-word phrases: S-W and W-S. However, there are three possible patterns of focus for such phrases: narrow focus on the first word, narrow focus on the second word, and broad focus. For instance, the phrase "Gus skied" can either be pronounced "GUS skied" (S-W) or "Gus SKIED" (W-S). These two realizations are the only options for answering the three questions: Who skied? (narrow focus on 'Gus'), What did Gus do? (narrow focus on 'skied'), and What happened yesterday? (broad focus).

Finally, metrical phonology is consistent with patterns of deaccenting in which accents can shift both left and right.^[14] This is because swapping S and W nodes will cause stress to move left if the S node was originally on the right, and move right if it was originally on the left. Such bi-directional movement is more difficult to predict under a stress-shift rule, which would specify the direction of movement.

References

[1] Liberman, Mark (1975). "The intonational system of English". PhD thesis, MIT. Distributed 1978 by IULC.
[2] Liberman, Mark; Prince, Alan (1977). "On stress and linguistic rhythm". Linguistic Inquiry 8. pp. 249–336.
[3] Chomsky, Noam; Halle, Morris (1968). "The sound pattern of English". Harper and Row: New York.
[4] Nespor, Marina; Vogel, Irene (1982). "Prosodic domains of external sandhi rules". In H. van der Hulst and N. Smith (eds.), The Structure of Phonological Representations, Part I. Foris Publications: Dordrecht. pp. 225–255.
[5] Nespor, Marina; Vogel, Irene (1986). "Prosodic phonology". Foris Publications: Dordrecht.
[6] Beckman, Mary (1986). "Stress and Non-Stress Accent". Foris Publications: Dordrecht.
[7] Selkirk, Elizabeth (1984). "Phonology and Syntax: The relation between sound and structure". MIT Press: Cambridge, MA.
[8] Halle, Morris (1973). "Stress rules in English: A new version". Linguistic Inquiry 4. pp. 451–464.
[9] Hayes, Bruce (1995). "Metrical stress theory". The University of Chicago Press: Chicago.
[10] Hayes, Bruce (1981). "A metrical theory of stress rules". PhD thesis, MIT. Distributed by Indiana University Linguistics Club.
[11] Hayes, Bruce (1995). Metrical Stress Theory: Principles and Case Studies. London: The University of Chicago Press, Ltd.
[12] Martin, James (1972). "Rhythmic (hierarchical) versus serial structure in speech and other behavior". Psychological Review 79 (6): 487–509. doi:10.1037/h0033467. PMID 4634593.
[13] Ladd, D. Robert (1996). "Intonational Phonology". Cambridge University Press: Cambridge, UK.
[14] Ladd, D. Robert (1980). "The structure of intonational meaning: Evidence from English". Indiana University Press: Bloomington.

