
ToBI (/ˈtoʊbi/; [1] an abbreviation of tones and break indices) is a set of conventions for transcribing and annotating the prosody of speech. The term "ToBI" is sometimes used to refer specifically to the conventions for American English, [2] the first ToBI system, which was developed by Mary Beckman and Janet Pierrehumbert, among others. [3] Other ToBI systems have been defined for a number of languages; for example, J-ToBI refers to the ToBI conventions for Tokyo Japanese, [4] and Carlos Gussenhoven developed an adaptation of ToBI for Dutch intonation, called ToDI. [5] Another variation of ToBI, called IViE (Intonational Variation in English), was established in 1998 to enable comparison between several dialects of British English. [6]



A full ToBI transcription consists of six parts: (a) an audio recording, (b) an electronic print-out or paper record of the F0 (fundamental frequency) contour, (c) a tones tier, with an analysis of the tonal events in terms of H and L, (d) a words tier with the words of the utterance in ordinary writing, (e) a break-index tier showing the strength of the junctures, and (f) a miscellaneous tier with comments. [7]
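The six parts listed above can be pictured as a simple record. The following is a minimal sketch, not a standard file format: the class names, field names, file paths, time stamps, and the labels in the example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Label:
    """One annotation on a tier: a time stamp in seconds and its text."""
    time: float
    text: str


@dataclass
class ToBITranscription:
    """Hypothetical container mirroring parts (a) to (f) of a transcription."""
    audio_path: str                                            # (a) audio recording
    f0_path: str                                               # (b) F0 record
    tones: List[Label] = field(default_factory=list)           # (c) tones tier
    words: List[Label] = field(default_factory=list)           # (d) words tier
    break_indices: List[Label] = field(default_factory=list)   # (e) break-index tier
    misc: List[Label] = field(default_factory=list)            # (f) comments

    def is_aligned(self) -> bool:
        """True when each word has one break index at the same time stamp."""
        word_times = [w.time for w in self.words]
        break_times = [b.time for b in self.break_indices]
        return word_times == break_times


# Invented example: file names, times, and labels are assumptions.
example = ToBITranscription(
    audio_path="mary.wav",
    f0_path="mary.f0",
    tones=[Label(0.15, "H*"), Label(1.10, "H*"), Label(1.30, "L-L%")],
    words=[Label(0.30, "Mary"), Label(0.55, "went"), Label(0.70, "to"),
           Label(0.80, "the"), Label(1.30, "store")],
    break_indices=[Label(0.30, "1"), Label(0.55, "1"), Label(0.70, "1"),
                   Label(0.80, "1"), Label(1.30, "4")],
)
```

Keeping each tier as its own list of time-stamped labels reflects the fact that tones and words are annotated independently: a tone label need not coincide with a word boundary.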

Tonal events

Tonal events include pitch accents, phrase accents, and boundary tones.

Pitch accents, written as H* or L* (high and low tones, respectively), are typically realized on words that carry the most information in a sentence. For example, in the sentence "Mary went to the store to get some milk", a natural pronunciation would include pitch accents on "Mary", "store", and "milk". Other kinds of pitch accents include L*+H (a syllable which starts with a low accent and then rises) and L+H* (again low-high on one syllable, but with the second part accented). [8]

Phrase accents, written H- or L-, are the tones between a pitch accent and a boundary tone. For example, the intonation at the end of a question might be H*L-H%, indicating that the pitch starts high, falls to a low, and rises again; or L*H-H%, indicating that the pitch starts low, then rises steadily to a high. [8]

Boundary tones, written H% and L%, are associated not with words but with phrase edges. For example, the sentence "Mary went to the store" can be pronounced as a statement or a question ("Mary went to the store." vs. "Mary went to the store?"). The contrast between the statement and the question is signalled by a boundary tone at the end of the phrase: a low boundary tone causes a falling pitch contour, signalling the statement, whereas a high boundary tone causes a rising pitch contour, signalling the question. [citation needed]
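The three kinds of tonal event can be told apart by their diacritics alone: `*` marks a pitch accent, `-` a phrase accent, and `%` a boundary tone. The sketch below splits a tone string such as H*L-H% into its events on that basis. The function name `classify_tones` and the category names are assumptions made for this example, and the pattern covers only the labels mentioned in this article, not every label in a full ToBI inventory.

```python
import re

# One alternative per kind of tonal event. The bitonal pitch accents
# (L+H*, L*+H) are tried before the simple ones (H*, L*).
_TONE = re.compile(
    r"""
      (?P<pitch_accent>[HL]\+[HL]\*|[HL]\*\+[HL]|[HL]\*)
    | (?P<phrase_accent>[HL]-)
    | (?P<boundary_tone>[HL]%)
    """,
    re.VERBOSE,
)


def classify_tones(label):
    """Split a tone string into (event, category) pairs, left to right."""
    events = []
    pos = 0
    while pos < len(label):
        if label[pos].isspace():
            pos += 1
            continue
        match = _TONE.match(label, pos)
        if match is None:
            raise ValueError(f"unrecognised tone label at position {pos}: {label!r}")
        events.append((match.group(), match.lastgroup))
        pos = match.end()
    return events


question_contour = classify_tones("H*L-H%")
```

For the question contour H*L-H% this yields a pitch accent, then a phrase accent, then a boundary tone, in that order.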

Break indices

Break indices are numbers indicating how strong the break is between words. [8]

The English ToBI standard distinguishes four or five levels of boundary strength, corresponding roughly to breaks between constituents at different levels of the Prosodic Hierarchy. [9] [10] One signal of boundary strength is lengthening of the preceding syllable: the stronger the boundary, the more that syllable is lengthened. [11] In some versions, level 2 is omitted.
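As a rough illustration of how break indices group words, the sketch below splits an utterance wherever the break after a word reaches a chosen strength. The numbering from 0 to 4 and the short glosses are a common reading of the English conventions rather than something stated in this article, and the index values assigned to the example sentence are invented. The names `BREAK_LEVELS`, `validate_break_indices`, and `split_at_breaks` are hypothetical.

```python
# Assumed glosses for the five levels; level 2 is omitted in some versions.
BREAK_LEVELS = {
    0: "no perceptible boundary (clitic-like juncture)",
    1: "ordinary word boundary inside a phrase",
    2: "juncture that fits neither level 1 nor level 3",
    3: "intermediate phrase boundary",
    4: "intonational phrase boundary",
}


def validate_break_indices(indices, allow_level_2=True):
    """Raise ValueError if any index is outside the levels in use."""
    allowed = set(BREAK_LEVELS)
    if not allow_level_2:
        allowed.discard(2)
    bad = [i for i in indices if i not in allowed]
    if bad:
        raise ValueError(f"unsupported break indices: {bad}")


def split_at_breaks(words, indices, threshold=3):
    """Group words into chunks, closing a chunk after any break >= threshold."""
    if len(words) != len(indices):
        raise ValueError("need exactly one break index per word")
    validate_break_indices(indices)
    chunks, current = [], []
    for word, index in zip(words, indices):
        current.append(word)
        if index >= threshold:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


# Invented annotation: a stronger break after "store" and at the very end.
words = "Mary went to the store to get some milk".split()
indices = [1, 1, 1, 1, 3, 1, 1, 1, 4]
phrases = split_at_breaks(words, indices)
```

With a threshold of 3 the example divides into "Mary went to the store" and "to get some milk"; raising the threshold to 4 would keep the whole sentence as one chunk.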


  1. Wells, John C. (2008). Longman Pronunciation Dictionary (3rd ed.). Longman. ISBN 978-1-4058-8118-0.
  2. Beckman, M. E., Hirschberg, J., & Shattuck-Hufnagel, S. (2005). The original ToBI system and the evolution of the ToBI framework. In S.-A. Jun (ed.) Prosodic Typology: The Phonology of Intonation and Phrasing.
  3. Silverman, Kim; Beckman, Mary; Pitrelli, John; Ostendorf, Mari; Wightman, Colin; Price, Patti; Pierrehumbert, Janet; Hirschberg, Julia (1992). "TOBI: A Standard for Labeling English Prosody". International Conference Spoken Language Processing. Banff, Canada: 867–870.
  4. Venditti, J. J. (2005). The J_ToBI model of Japanese intonation. In Sun-Ah Jun (ed.) Prosodic Typology: The Phonology of Intonation and Phrasing, pp. 172-200.
  5. Gussenhoven, Carlos (2010). "Transcription of Dutch Intonation" in Sun-Ah Jun Prosodic Typology: The Phonology of Intonation and Phrasing. Oxford Scholarship Online, chapter 5. DOI: 10.1093/acprof:oso/9780199249633.001.0001.
  6. Cooper, S. (2015) "Intonation in Anglesey Welsh". Bangor University PhD thesis, p. 32.
  7. Cooper, S. (2015) "Intonation in Anglesey Welsh". Bangor University PhD thesis, p. 29.
  8. Port, R. ToBI Intonation Transcription Summary.
  9. Selkirk, E. (1984). Phonology and syntax. MIT Press: Cambridge.
  10. Nespor, M. and I. Vogel. 1986. Prosodic Phonology. Foris.
  11. Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., & Price, P.J. (1992). Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America, 91(3), 1707-1717.