Auditory phonetics

Auditory phonetics is the branch of phonetics concerned with the hearing of speech sounds and with speech perception. It thus entails the study of the relationships between speech stimuli and a listener's responses to such stimuli, as mediated by mechanisms of the peripheral and central auditory systems, including certain areas of the brain. It is generally regarded as one of the three main branches of phonetics, alongside acoustic and articulatory phonetics, [1] [2] though the three overlap in methods and questions. [3]

Physical scales and auditory sensations

There is no simple one-to-one connection between auditory sensations and the physical properties of sound that give rise to them. While the physical (acoustic) properties are objectively measurable, auditory sensations are subjective and can only be studied by asking listeners to report on their perceptions. [4] The table below shows some correspondences between physical properties and auditory sensations.

Physical property        Auditory perception
amplitude or intensity   loudness
fundamental frequency    pitch
spectral structure       sound quality
duration                 length

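The mismatch between the physical and auditory scales can be made concrete with the mel scale, one standard engineering approximation of perceived pitch. The sketch below uses O'Shaughnessy's widely cited conversion formula; it is illustrative only and is not drawn from the sources cited in this article:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Map physical frequency (Hz) onto the mel scale of perceived pitch
    (O'Shaughnessy's formula, one common approximation)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Equal steps on the physical scale are not equal steps in sensation:
# a 100 Hz difference low in the frequency range is heard as a much
# larger pitch change than the same difference high in the range.
low_gap = hz_to_mel(200.0) - hz_to_mel(100.0)     # ~133 mels
high_gap = hz_to_mel(4100.0) - hz_to_mel(4000.0)  # ~24 mels
print(f"100->200 Hz: {low_gap:.1f} mels; 4000->4100 Hz: {high_gap:.1f} mels")
```

Similar nonlinear perceptual scales exist for the other rows of the table, for example the sone scale for loudness as a function of intensity.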
Segmental and suprasegmental

Auditory phonetics is concerned with both segmental (chiefly vowels and consonants) and prosodic (such as stress, tone, rhythm and intonation) aspects of speech. While it is possible to study the auditory perception of these phenomena without context, in continuous speech all these variables are processed in parallel with significant variability and complex interactions between them. [5] [6] For example, it has been observed that vowels, which are usually described as different from each other in the frequencies of their formants, also have intrinsic values of fundamental frequency (and presumably therefore of pitch) that are different according to the height of the vowel. Thus open vowels typically have lower fundamental frequency than close vowels in a given context, [7] and vowel recognition is likely to interact with the perception of prosody.

In speech research

If a distinction is to be made between auditory phonetics and speech perception, it is that the former is more closely associated with traditional, non-instrumental approaches to phonology and other areas of linguistics, while the latter is closer to experimental, laboratory-based study. Consequently, the term auditory phonetics often refers to the study of speech without instrumental analysis: the researcher may use technology such as recording equipment, or even simple pen and paper (as William Labov did in his study of the pronunciation of English in New York department stores), [8] but not laboratory techniques such as spectrography or speech synthesis, or methods such as EEG and fMRI that allow phoneticians to study the brain's response to sound directly.

Most research in sociolinguistics and dialectology has been based on auditory analysis of data, and almost all pronunciation dictionaries rest on impressionistic, auditory analysis of how words are pronounced. Some have claimed an advantage for auditory analysis over instrumental analysis: Kenneth L. Pike stated that "Auditory analysis is essential to phonetic study since the ear can register all those features of sound waves, and only those features, which are above the threshold of audibility ... whereas analysis by instruments must always be checked against auditory reaction". [9] Herbert Pilch attempted to define auditory phonetics in a way that avoids any reference to acoustic parameters. [10]

In the auditory analysis of phonetic data such as speech recordings, it is clearly an advantage to have been trained in analytical listening. Practical phonetic training has been seen since the 19th century as an essential foundation for phonetic analysis and for the teaching of pronunciation, and it remains a significant part of modern phonetics.
The best-known type of auditory training has been in the system of cardinal vowels; there is disagreement about the relative importance of auditory and articulatory factors underlying the system, but the importance of auditory training for those who are to use it is indisputable. [11]

Training in the auditory analysis of prosodic factors such as pitch and rhythm is also important. Not all research on prosody has been based on auditory techniques: some pioneering work on prosodic features using laboratory instruments was carried out in the 20th century (e.g. Elizabeth Uldall's work using synthesized intonation contours, [12] Dennis Fry's work on stress perception, [13] or Daniel Jones's early work analyzing pitch contours by manually operating the pickup arm of a gramophone to listen repeatedly to individual syllables, checking where necessary against a tuning fork). [14] However, the great majority of work on prosody was based on auditory analysis until the recent arrival of approaches explicitly based on computer analysis of the acoustic signal, such as ToBI, INTSINT or the IPO system. [15]


References

  1. O'Connor, J.D. (1973). Phonetics (1st ed.). Penguin. pp. 17, 96–124. ISBN 0-14-02-1560-3.
  2. Ello. "Auditory Phonetics". ello.uos.de. Retrieved 11 November 2020.
  3. Mack, M. (2004). "Auditory phonetics". In Malmkjaer, K. (ed.), The Linguistics Encyclopedia. Routledge. p. 51.
  4. Denes, Peter; Pinson, Elliott (1993). The Speech Chain (2nd ed.). W. H. Freeman. pp. 94–105. ISBN 0-7167-2344-1.
  5. Wood, Charles C. (1974). "Parallel processing of auditory and phonetic information in speech discrimination". Perception and Psychophysics. 15 (3): 501–508. doi:10.3758/BF03199292. S2CID 144044864.
  6. Elman, J.; McClelland, J. (1982). "Exploiting lawful variability in the speech wave". In J. S. Perkell; D. Klatt (eds.), Invariance and Variability in Speech Processes. Erlbaum. pp. 360–380.
  7. Turner, Paul; Verhoeven, Jo (2011). "Intrinsic vowel pitch: a gradient feature of vowel systems?" (PDF). Proceedings of the International Congress of Phonetic Sciences: 2038–2041. Retrieved 13 November 2020.
  8. Labov, William (1966). The Social Stratification of English in New York City. Washington, D.C.: Center for Applied Linguistics.
  9. Pike, Kenneth (1943). Phonetics. University of Michigan. p. 31.
  10. Pilch, Herbert (1978). "Auditory phonetics". Word. 29 (2): 148–160. doi:10.1080/00437956.1978.11435657.
  11. Ladefoged, Peter (1967). Three Areas of Experimental Phonetics. Oxford. pp. 74–75.
  12. Uldall, Elizabeth (1964). "Dimensions of meaning in intonation". In Abercrombie, D. et al. (eds.), In Honour of Daniel Jones. Longman.
  13. Fry, Dennis (1955). "Duration and intensity as physical correlates of linguistic stress". Journal of the Acoustical Society of America. 27 (4): 765–768. doi:10.1121/1.1908022.
  14. Jones, Daniel (1909). Intonation Curves. Leipzig: Teubner.
  15. 't Hart, J.; Collier, R.; Cohen, A. (1990). A Perceptual Study of Intonation. Cambridge.