Acoustic phonetics

Acoustic phonetics is a subfield of phonetics that deals with the acoustic aspects of speech sounds. It investigates time-domain features such as the mean squared amplitude of a waveform, its duration, and its fundamental frequency; frequency-domain features such as the frequency spectrum; and combined spectrotemporal features. It also studies the relationship of these properties to other branches of phonetics (e.g. articulatory or auditory phonetics) and to abstract linguistic concepts such as phonemes, phrases, or utterances.
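
The features named above can be computed directly from a digitized waveform. The following is a minimal sketch, assuming a mono recording named "vowel.wav" and a simple autocorrelation-based F0 estimate searched between 75 Hz and 500 Hz; none of these choices come from the article itself.

```python
import numpy as np
from scipy.io import wavfile

rate, signal = wavfile.read("vowel.wav")                 # sampling rate (Hz) and samples
signal = signal.astype(float) / np.max(np.abs(signal))   # normalise amplitude

duration = len(signal) / rate                            # time-domain feature: duration (s)
mean_squared_amplitude = np.mean(signal ** 2)            # time-domain feature: mean squared amplitude

# Crude fundamental-frequency (F0) estimate: largest autocorrelation peak
# between 75 Hz and 500 Hz, a typical range for adult voices.
ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
lo, hi = int(rate / 500), int(rate / 75)
f0 = rate / (lo + np.argmax(ac[lo:hi]))

# Frequency-domain feature: magnitude spectrum of the whole recording.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)

print(f"duration={duration:.3f} s, MS amplitude={mean_squared_amplitude:.4f}, F0 ~ {f0:.1f} Hz")
```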

The study of acoustic phonetics was greatly enhanced in the late 19th century by the invention of the Edison phonograph. The phonograph allowed the speech signal to be recorded and then later processed and analyzed. By replaying the same speech signal from the phonograph several times, filtering it each time with a different band-pass filter, a spectrogram of the speech utterance could be built up. A series of papers by Ludimar Hermann published in Pflügers Archiv in the last two decades of the 19th century investigated the spectral properties of vowels and consonants using the Edison phonograph, and it was in these papers that the term formant was first introduced. Hermann also played back vowel recordings made with the Edison phonograph at different speeds to distinguish between Willis' and Wheatstone's theories of vowel production.
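
The band-pass procedure described above can be imitated digitally: the same recording is "replayed" once per filter and the band energies are stacked over time into a crude spectrogram. The sketch below only illustrates that idea; the Butterworth filters, 300 Hz bands, and 10 ms frames are assumptions, not historical values.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def filterbank_spectrogram(signal, rate, band_width=300.0, frame_len=0.010):
    """Crude spectrogram built by repeatedly band-pass filtering one recording."""
    n_frame = int(frame_len * rate)
    n_frames = len(signal) // n_frame
    band_edges = np.arange(100.0, rate / 2 - band_width, band_width)
    spec = np.zeros((len(band_edges), n_frames))
    for i, lo in enumerate(band_edges):
        sos = butter(4, [lo, lo + band_width], btype="bandpass", fs=rate, output="sos")
        band = sosfiltfilt(sos, signal)              # one "replay" through one filter
        for t in range(n_frames):
            frame = band[t * n_frame:(t + 1) * n_frame]
            spec[i, t] = np.sum(frame ** 2)          # energy of this band in this frame
    return band_edges, spec                          # rows: frequency bands, columns: time frames
```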

Further advances in acoustic phonetics were made possible by the development of the telephone industry. (Incidentally, Alexander Graham Bell's father, Alexander Melville Bell, was a phonetician.) During World War II, work at the Bell Telephone Laboratories (which invented the spectrograph) greatly facilitated the systematic study of the spectral properties of periodic and aperiodic speech sounds, vocal tract resonances and vowel formants, voice quality, prosody, etc.

The integrated linear prediction residual (ILPR) is an effective feature, proposed by T. V. Ananthapadmanabha in 1995, that closely approximates the voice source signal. [1] It proved very effective for accurate estimation of epochs, or glottal closure instants. [2] In 2015, A. G. Ramakrishnan et al. showed that the discrete cosine transform coefficients of the ILPR contain speaker information that supplements the mel-frequency cepstral coefficients. [3] The plosion index is another scalar, time-domain feature, introduced by T. V. Ananthapadmanabha et al. for characterizing the closure-burst transition of stop consonants. [4]
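
The sketch below conveys only the general recipe behind these two features (linear-prediction inverse filtering followed by integration, and a peak-to-preceding-average amplitude ratio), not the exact formulations published in [1], [2], and [4]; the LP order, the leaky integrator, and the window lengths are all assumptions.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def ilpr(signal, rate, order=None):
    """Rough voice-source estimate: LP inverse filtering followed by a leaky integrator."""
    order = order or int(rate / 1000) + 2            # rule-of-thumb LP order
    a = librosa.lpc(signal, order=order)             # LP coefficients, a[0] == 1
    residual = lfilter(a, [1.0], signal)             # inverse filtering with A(z)
    return lfilter([1.0], [1.0, -0.99], residual)    # crude integration

def plosion_index(signal, n0, skip=6, window=160):
    """Ratio of |signal[n0]| to the average |signal| in a preceding window (all in samples).

    n0 is a candidate burst/epoch sample index, assumed to lie well inside the signal.
    """
    start = max(0, n0 - skip - window)
    baseline = np.mean(np.abs(signal[start:n0 - skip])) + 1e-12
    return np.abs(signal[n0]) / baseline
```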

On a theoretical level, speech acoustics can be modeled in a way analogous to electrical circuits. Lord Rayleigh was among the first to recognize that the new electrical theory could be used in acoustics, but it was not until 1941 that the circuit model was used effectively, in a book by Chiba and Kajiyama called "The Vowel: Its Nature and Structure". (This book by Japanese authors working in Japan was published in English at the height of World War II.) In 1952, Roman Jakobson, Gunnar Fant, and Morris Halle wrote "Preliminaries to Speech Analysis", a seminal work tying acoustic phonetics and phonological theory together. This little book was followed in 1960 by Fant's "Acoustic Theory of Speech Production", which has remained the major theoretical foundation for speech acoustic research in both academia and industry. (Fant was himself closely involved in the telephone industry.) Other important framers of the field include Kenneth N. Stevens, who wrote "Acoustic Phonetics", Osamu Fujimura, and Peter Ladefoged.

Related Research Articles

Approximants are speech sounds that involve the articulators approaching each other but not narrowly enough, nor with enough articulatory precision, to create turbulent airflow. Therefore, approximants fall between fricatives, which do produce a turbulent airstream, and vowels, which produce no turbulence. This class is composed of sounds like [ɹ], semivowels like [j] and [w], and lateral approximants like [l].

<span class="mw-page-title-main">Formant</span> Spectrum of phonetic resonance in speech production, or its peak

In speech science and phonetics, a formant is the broad spectral maximum that results from an acoustic resonance of the human vocal tract. In acoustics, a formant is usually defined as a broad peak, or local maximum, in the spectrum. For harmonic sounds, with this definition, the formant frequency is sometimes taken as that of the harmonic that is most augmented by a resonance. The difference between these two definitions resides in whether "formants" characterise the production mechanisms of a sound or the produced sound itself. In practice, the frequency of a spectral peak differs slightly from the associated resonance frequency, except when, by luck, harmonics are aligned with the resonance frequency.
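
One common way to make this distinction concrete is to estimate resonance frequencies from the roots of a linear-prediction polynomial and compare them with nearby peaks of the measured spectrum. This LPC-root method is a standard technique and is not drawn from this article; the LP order of 12 and the 300 Hz search band are assumptions.

```python
import numpy as np
import librosa

def resonance_frequencies(frame, rate, order=12):
    """Resonance (vocal-tract) estimates: angles of the complex LP poles, in Hz."""
    a = librosa.lpc(frame, order=order)
    poles = [r for r in np.roots(a) if np.imag(r) > 0]        # keep upper half-plane
    freqs = sorted(np.angle(poles) * rate / (2 * np.pi))      # rad/sample -> Hz
    return [f for f in freqs if f > 90]                       # drop near-DC roots

def spectral_peak_near(frame, rate, target_hz, half_width_hz=300):
    """Frequency of the largest spectral peak within a band around an expected resonance."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / rate)
    band = (freqs > target_hz - half_width_hz) & (freqs < target_hz + half_width_hz)
    return freqs[band][np.argmax(spectrum[band])]
```

For a voiced frame, the two functions typically return slightly different values, illustrating the gap between the resonance frequency and the spectral-peak frequency described above.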

<span class="mw-page-title-main">Phonetics</span> Branch of linguistics that comprises the study of the sounds of human language

Phonetics is a branch of linguistics that studies how humans produce and perceive sounds or, in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. The field of phonetics is traditionally divided into three sub-disciplines based on the research questions involved: how humans plan and execute movements to produce speech, how various movements affect the properties of the resulting sound, and how humans convert sound waves to linguistic information. Traditionally, the minimal linguistic unit of phonetics is the phone, a speech sound in a language, which differs from the phonological unit, the phoneme; the phoneme is an abstract categorization of phones and is also defined as the smallest unit that distinguishes meaning between sounds in any given language.

<span class="mw-page-title-main">Phonology</span> Branch of linguistics concerned with the systematic organization of sounds in languages

Phonology is the branch of linguistics that studies how languages or dialects systematically organize their phones or, for sign languages, their constituent parts of signs. The term can also refer specifically to the sound or sign system of a particular language variety. At one time, the study of phonology related only to the study of the systems of phonemes in spoken languages, but it may now relate to any linguistic analysis either at a level beneath the word or at all levels of language where sound or signs are structured to convey linguistic meaning.

In phonetics, a plosive, also known as an occlusive or simply a stop, is a pulmonic consonant in which the vocal tract is blocked so that all airflow ceases.

A vowel is a syllabic speech sound pronounced without any stricture in the vocal tract. Vowels are one of the two principal classes of speech sounds, the other being the consonant. Vowels vary in quality, in loudness and also in quantity (length). They are usually voiced and are closely involved in prosodic variation such as tone, intonation and stress.

The field of articulatory phonetics is a subfield of phonetics that studies articulation and ways that humans produce speech. Articulatory phoneticians explain how humans produce speech sounds via the interaction of different physiological structures. Generally, articulatory phonetics is concerned with the transformation of aerodynamic energy into acoustic energy. Aerodynamic energy refers to the airflow through the vocal tract. Its potential form is air pressure; its kinetic form is the actual dynamic airflow. Acoustic energy is variation in the air pressure that can be represented as sound waves, which are then perceived by the human auditory system as sound.

<span class="mw-page-title-main">Voiceless epiglottal trill</span> Consonantal sound represented by ⟨ʜ⟩ in IPA

The voiceless epiglottal or pharyngeal trill, or voiceless epiglottal fricative, is a type of consonantal sound, used in some spoken languages. The symbol in the International Phonetic Alphabet that represents this sound is ʜ, a small capital version of the Latin letter h, and the equivalent X-SAMPA symbol is H\.

The voiceless glottal fricative, sometimes called the voiceless glottal transition or the aspirate, is a type of sound used in some spoken languages that patterns like a fricative or approximant consonant phonologically but often lacks the usual phonetic characteristics of a consonant. The symbol in the International Phonetic Alphabet that represents this sound is h, and the equivalent X-SAMPA symbol is h. However, [h] has been described as a voiceless vowel because, in many languages, it lacks the place and manner of articulation of a prototypical consonant as well as the height and backness of a prototypical vowel:

[h and ɦ] have been described as voiceless or breathy voiced counterparts of the vowels that follow them [but] the shape of the vocal tract [...] is often simply that of the surrounding sounds. [...] Accordingly, in such cases it is more appropriate to regard h and ɦ as segments that have only a laryngeal specification, and are unmarked for all other features. There are other languages [such as Hebrew and Arabic] which show a more definite displacement of the formant frequencies for h, suggesting it has a [glottal] constriction associated with its production.

Linguolabials or apicolabials are consonants articulated by placing the tongue tip or blade against the upper lip, which is drawn downward to meet the tongue. They represent one extreme of a coronal articulatory continuum which extends from linguolabial to subapical palatal places of articulation. Cross-linguistically, linguolabial consonants are very rare, but they do not represent a particularly exotic combination of articulatory configurations, unlike click consonants or ejectives. They are found in a cluster of languages in Vanuatu, in the Kajoko dialect of Bijago in Guinea-Bissau, in Umotína, and as paralinguistic sounds elsewhere. They are also relatively common in disordered speech, and the diacritic is specifically provided for in the extensions to the IPA.

<span class="mw-page-title-main">Gunnar Fant</span>

Carl Gunnar Michael Fant was a leading researcher in speech science in general and speech synthesis in particular who spent most of his career as a professor at the Swedish Royal Institute of Technology (KTH) in Stockholm. He was a first cousin of the actors and directors George Fant and Kenne Fant.

Auditory phonetics is the branch of phonetics concerned with the hearing of speech sounds and with speech perception. It thus entails the study of the relationships between speech stimuli and a listener's responses to such stimuli, as mediated by mechanisms of the peripheral and central auditory systems, including certain areas of the brain. It is regarded as one of the three main branches of phonetics, along with acoustic and articulatory phonetics, though their methods and questions overlap.

<span class="mw-page-title-main">Peter Ladefoged</span> British phonetician (1925–2006)

Peter Nielsen Ladefoged was a British linguist and phonetician.

The source–filter model represents speech as a combination of a sound source, such as the vocal cords, and a linear acoustic filter, the vocal tract. While only an approximation, the model is widely used in applications such as speech synthesis and speech analysis because of its relative simplicity. It is also related to linear prediction. The development of the model is due, in large part, to the early work of Gunnar Fant, although others, notably Ken Stevens, have also contributed substantially to the models underlying acoustic analysis of speech and speech synthesis. Fant built on the work of Tsutomu Chiba and Masato Kajiyama, who first showed the relationship between a vowel's acoustic properties and the shape of the vocal tract.
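
A minimal sketch of the source–filter idea, assuming an impulse-train glottal source and an all-pole "vocal tract" built from two resonators; the formant frequencies and bandwidths below are illustrative values, not measurements of any real vowel.

```python
import numpy as np
from scipy.signal import lfilter

rate = 16000
f0 = 120                                   # source: fundamental frequency in Hz
source = np.zeros(rate)                    # one second of samples
source[::rate // f0] = 1.0                 # impulse train at the glottal period

def all_pole(formants_hz, bandwidths_hz, fs):
    """Denominator coefficients of a cascade of two-pole resonators."""
    a = np.array([1.0])
    for f, bw in zip(formants_hz, bandwidths_hz):
        r = np.exp(-np.pi * bw / fs)                       # pole radius from bandwidth
        theta = 2 * np.pi * f / fs                         # pole angle from frequency
        a = np.convolve(a, [1.0, -2 * r * np.cos(theta), r * r])
    return a

a = all_pole([700, 1200], [80, 100], rate)  # rough F1/F2 of an open vowel (illustrative)
speech = lfilter([1.0], a, source)          # pass the source through the "vocal tract" filter
```

Linear prediction estimates an all-pole filter of this form from recorded speech, which is why the model is closely related to linear prediction.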

<span class="mw-page-title-main">Ludimar Hermann</span> German physiologist and speech scientist

Ludimar Hermann was a German physiologist and speech scientist who used the Edison phonograph to test theories of vowel production, particularly those of Robert Willis and Charles Wheatstone. He coined the word formant, a term of importance in modern acoustic phonetics. The Hermann grid is named after him; he was the first to report the illusion in scientific literature.

<span class="mw-page-title-main">Kenneth N. Stevens</span> American computer scientist

Kenneth Noble Stevens was the Clarence J. LeBel Professor of Electrical Engineering and Computer Science, and professor of health sciences and technology, at the Research Laboratory of Electronics at MIT. Stevens was head of the Speech Communication Group in MIT's Research Laboratory of Electronics (RLE) and was one of the world's leading scientists in acoustic phonetics.

In some schools of phonetics, sounds are distinguished as grave or acute. This is primarily a perceptual classification, based on whether the sounds are perceived as sharp, high intensity, or as dull, low intensity. However, it can also be defined acoustically or in terms of the articulations involved.

<span class="mw-page-title-main">Vowel diagram</span> Schematic arrangement of vowels

A vowel diagram or vowel chart is a schematic arrangement of the vowels. Depending on the particular language being discussed, it can take the form of a triangle or a quadrilateral. Vertical position on the diagram denotes vowel closeness, with close vowels at the top of the diagram, and horizontal position denotes vowel backness, with front vowels at the left of the diagram. Vowels are unique in that their main features do not contain differences in voicing, manner, or place (articulators). Vowels differ only in the position of the tongue when voiced. The tongue moves vertically and horizontally within the oral cavity, and vowels are produced with the vocal tract largely unobstructed.

In linguistics, pre-stopping, also known as pre-occlusion or pre-plosion, is a phonological process involving the historical or allophonic insertion of a very short stop consonant before a sonorant, for example before a nasal or a lateral. The resulting sounds are called pre-stopped consonants, or sometimes pre-ploded or pre-occluded consonants, although technically they may be considered occlusives/stops without the pre-occlusion.

Osamu Fujimura (藤村靖) was a Japanese physicist, phonetician and linguist, recognized as one of the pioneers of speech science. Fujimura was also known for his influential work across the diverse fields of speech-related study, including acoustics, phonetics/phonology, instrumentation techniques, speech production mechanisms, and computational/theoretical linguistics.

References

  1. T. V. Ananthapadmanabha, "Acoustic factors determining perceived voice quality", in Vocal Fold Physiology: Voice Quality Control, O. Fujimura and M. Hirano, Eds. San Diego, CA: Singular Publishing Group, 1995, ch. 7, pp. 113–126.
  2. A. P. Prathosh, T. V. Ananthapadmanabha, and A. G. Ramakrishnan, "Epoch extraction based on integrated linear prediction residual using plosion index", IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 12, pp. 2471–2480, 2013.
  3. A. G. Ramakrishnan, B. Abhiram, and S. R. Mahadeva Prasanna, "Voice source characterization using pitch synchronous discrete cosine transform for speaker identification", Journal of the Acoustical Society of America Express Letters, vol. 137, 2015.
  4. T. V. Ananthapadmanabha, A. P. Prathosh, and A. G. Ramakrishnan, "Detection of the closure-burst transitions of stops and affricates in continuous speech using the plosion index", Journal of the Acoustical Society of America, vol. 137, 2015.