Speech science

Speech science is the study of the production, transmission and perception of speech. It draws on anatomy, in particular the anatomy of the orofacial region and neuroanatomy, as well as physiology and acoustics.

Speech production

The cartilaginous passageways (the bronchi and bronchioles) of the lungs.
Coronal section of larynx and upper part of trachea.
Sagittal section of nose, mouth, pharynx, and larynx.

The production of speech is a highly complex motor task that involves approximately 100 orofacial, laryngeal, pharyngeal, and respiratory muscles. [2] [3] Precise and rapid timing of these muscles is essential for producing temporally complex speech sounds, which are characterized by transitions as short as 10 ms between frequency bands [4] and an average speaking rate of approximately 15 sounds per second. Speech production requires airflow from the lungs (respiration) to be phonated through the vocal folds of the larynx (phonation) and resonated in the vocal cavities shaped by the jaw, soft palate, lips, tongue and other articulators (articulation).

Respiration

Respiration is the physical process of gas exchange between an organism and its environment, involving four steps (ventilation, distribution, perfusion and diffusion) and two processes (inspiration and expiration). Mechanically, air flows into and out of the lungs according to Boyle's law, which states that as the volume of a container increases, the pressure of the gas inside it decreases. The resulting relatively negative pressure causes air to enter the container until the pressures are equalized. During inspiration, the diaphragm contracts and the lungs expand, drawn outward by the pleurae through surface tension and negative intrapleural pressure. As the lungs expand, the pressure within them falls below atmospheric pressure, and air flows from the area of higher pressure to fill the lungs. Forced inspiration for speech uses accessory muscles to elevate the rib cage and enlarge the thoracic cavity in the vertical and lateral dimensions. During forced expiration for speech, muscles of the trunk and abdomen reduce the size of the thoracic cavity by compressing the abdomen or pulling the rib cage down, forcing air out of the lungs.
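
The pressure-volume relationship behind this mechanism can be illustrated with a short calculation. The sketch below (in Python, with purely hypothetical numbers rather than physiological measurements) applies Boyle's law to a small expansion of the lungs:

```python
# Boyle's law: at constant temperature, pressure x volume is constant
# (P1 * V1 = P2 * V2). The 5% expansion below is an illustrative figure,
# not a physiological measurement.

P_ATM = 101.325  # atmospheric pressure in kPa

def pressure_after_expansion(p1, v1, v2):
    """Pressure of a fixed quantity of gas after its volume changes (Boyle's law)."""
    return p1 * v1 / v2

p2 = pressure_after_expansion(P_ATM, 1.00, 1.05)  # lungs expand by 5%
print(round(p2, 2))  # 96.5 kPa, below atmospheric, so air flows inward
print(p2 < P_ATM)    # True
```

Because the expanded lungs hold the same quantity of gas in a larger volume, the computed pressure falls below atmospheric, which is exactly the condition under which air flows in to equalize the pressures.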

Phonation

Phonation is the production of a periodic sound wave by vibration of the vocal folds. Airflow from the lungs, together with laryngeal muscle contraction, sets the vocal folds in motion; their tension and elasticity allow them to be stretched, bunched, brought together and separated. During prephonation, the vocal folds move from an abducted to an adducted position. Subglottal pressure then builds until the airflow forces the folds apart, from bottom to top. If the volume of airflow is constant, its velocity increases at the glottal constriction, which lowers the pressure there (the Bernoulli effect). This negative pressure pulls the blown-open folds back together, and the cycle repeats until the vocal folds are abducted to stop phonation or to take a breath.
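
The pressure drop at the constriction follows from two standard results: the continuity equation (a constant volume flow speeds up where the channel narrows) and Bernoulli's principle. The sketch below uses illustrative, non-physiological numbers; the function and variable names are mine:

```python
RHO_AIR = 1.2  # density of air in kg/m^3, approximate room-temperature value

def pressure_drop(flow, a_wide, a_narrow, rho=RHO_AIR):
    """Pressure drop (Pa) at a constriction, for a constant volume flow (m^3/s).

    Continuity: v = flow / area, so velocity rises where the area shrinks.
    Bernoulli: p + 0.5*rho*v**2 is constant, so higher velocity means lower pressure.
    """
    v_wide = flow / a_wide
    v_narrow = flow / a_narrow
    return 0.5 * rho * (v_narrow**2 - v_wide**2)

# Illustrative numbers only: 0.2 L/s of airflow through a 3 cm^2 passage
# narrowing to a 0.2 cm^2 glottal opening.
dp = pressure_drop(0.2e-3, 3e-4, 0.2e-4)
print(round(dp, 1))  # ~59.7 Pa lower pressure at the narrow point
```

A positive result means the pressure at the narrow glottis is lower than in the wider passage, which is the suction that draws the folds back together.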

Articulation

In the third process of speech production, articulation, mobile and immobile structures of the face (the articulators) adjust the shape of the mouth, pharynx and nasal cavities (the vocal tract) as the sound of vocal fold vibration passes through them, producing varying resonant frequencies.
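
As a rough illustration of these resonances, a standard textbook simplification (not part of the article above) models the vocal tract during a neutral vowel as a uniform tube, closed at the glottis and open at the lips. Such a tube resonates at odd quarter-wavelength frequencies:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def tube_resonances(length_m, n=3, c=SPEED_OF_SOUND):
    """First n resonances of a uniform tube closed at one end and open at the other.

    F_k = (2k - 1) * c / (4 * L) for k = 1, 2, 3, ...
    """
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]

# An adult vocal tract of roughly 17 cm in a neutral, schwa-like position:
print([round(f) for f in tube_resonances(0.17)])
# [504, 1513, 2522] Hz, near the textbook ~500/1500/2500 Hz formant values
```

Changing the effective tube shape with the articulators shifts these resonant frequencies, which is how different vowels acquire different formant patterns.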

Central nervous control

The analysis of brain lesions and the correlation between lesion locations and behavioral deficits were the most important sources of knowledge about the cerebral mechanisms underlying speech production for many years. [5] [6] The seminal lesion studies of Paul Broca indicated that the production of speech relies on the functional integrity of the left inferior frontal gyrus. [7]

More recently, results from noninvasive neuroimaging techniques, such as functional magnetic resonance imaging (fMRI), have provided growing evidence that complex human skills are not located primarily in highly specialized brain areas (e.g., Broca's area) but are instead organized in networks connecting several different areas of both hemispheres. Functional neuroimaging has identified a complex neural network underlying speech production that includes cortical and subcortical areas such as the supplementary motor area, cingulate motor areas, primary motor cortex, basal ganglia, and cerebellum. [8] [9]

Speech perception

Speech perception refers to the understanding of speech, and the process begins with hearing the spoken message. The auditory system receives sound signals starting at the outer ear: sound enters the pinna and travels through the external auditory canal (ear canal) to the eardrum. In the middle ear, which consists of the malleus, the incus and the stapes, the sound is converted into mechanical energy and transmitted to the oval window, the beginning of the inner ear. Inside the inner ear, the signal becomes hydraulic energy as it passes through the fluid-filled cochlea to the organ of Corti, which converts it into neural impulses that travel along the auditory pathway to the brain. Sound is then processed in Heschl's gyrus and associated with meaning in Wernicke's area. Two broad classes of theories address how speech is perceived: motor theories and auditory theories. The motor theory rests on the premise that speech sounds are encoded in the acoustic signal rather than enciphered in it. The auditory theory puts greater emphasis on the sensory and filtering mechanisms of the listener, suggesting that speech-specific knowledge plays only a minor role and is drawn on mainly under difficult perceptual conditions.

Transmission of speech

Waveform (amplitude as a function of time) of the English word "above".
Spectrogram (frequency as a function of time) of the English word "buy".

Speech is transmitted through sound waves, which follow the basic principles of acoustics. The source of all sound is vibration. For sound to exist, a source (something put into vibration) and a medium (something to transmit the vibrations) are necessary.

Sound waves are produced by a vibrating body. As the vibrating object moves in one direction, it compresses the air directly in front of it; as it moves in the opposite direction, the pressure on the air is lessened and an expansion, or rarefaction, of air molecules occurs. One compression and one rarefaction make up one longitudinal wave. The vibrating air molecules move back and forth parallel to the direction of motion of the wave, receiving energy from adjacent molecules nearer the source and passing it to adjacent molecules farther from the source. Sound waves thus share the defining characteristic of all waves: a disturbance in some identifiable medium transmits energy from place to place, but the medium itself does not travel between the two places.

Important basic characteristics of waves are wavelength, amplitude, period, and frequency. Wavelength is the length of the repeating wave shape. Amplitude is the maximum displacement of the particles of the medium, determined by the energy of the wave. Each complete vibration of a sound wave is called a cycle. The period (measured in seconds) is the time for one cycle to pass a given point. The frequency of a wave is the number of cycles passing a given point per unit of time; it is measured in hertz (Hz), or cycles per second, and is perceived as pitch. Two other physical properties of sound are intensity and duration. Intensity is measured in decibels (dB) and is perceived as loudness.
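
These quantities are related by simple formulas: period is the reciprocal of frequency, wavelength is wave speed divided by frequency, and intensity level in decibels is ten times the base-10 logarithm of an intensity relative to a reference intensity. A minimal sketch (function names are mine; the reference intensity is the conventional threshold of hearing):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def period(frequency_hz):
    """Period in seconds: the time for one cycle (T = 1/f)."""
    return 1.0 / frequency_hz

def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength in meters: wave speed divided by frequency (lambda = c/f)."""
    return c / frequency_hz

def decibels(intensity, reference=1e-12):
    """Intensity level in dB re the conventional hearing threshold, 10^-12 W/m^2."""
    return 10 * math.log10(intensity / reference)

print(period(100))             # 0.01 s per cycle for a 100 Hz wave
print(wavelength(343))         # 1.0 m for a 343 Hz wave in air
print(round(decibels(1e-6)))   # 60 dB, roughly conversational loudness
```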

There are two types of tones: pure tones and complex tones. The musical note produced by a tuning fork is called a pure tone because it consists of a single tone at just one frequency. Instruments get their specific sounds, their timbre, from many different tones all sounding together at different frequencies. A single note played on a piano, for example, actually consists of several tones, the fundamental and its overtones, all sounding together.
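
The distinction can be sketched numerically: a pure tone is a single sinusoid, while a complex tone is a sum of harmonics of one fundamental, with the relative harmonic amplitudes (illustrative values below, not measurements of any instrument) shaping the timbre:

```python
import math

def pure_tone(t, frequency):
    """Sample a pure tone at time t: a single sinusoid at one frequency."""
    return math.sin(2 * math.pi * frequency * t)

def complex_tone(t, fundamental, amplitudes):
    """Sample a complex tone at time t: a sum of harmonics of one fundamental.

    amplitudes[k] scales the (k+1)-th harmonic, i.e. frequency (k+1)*fundamental.
    """
    return sum(a * math.sin(2 * math.pi * (k + 1) * fundamental * t)
               for k, a in enumerate(amplitudes))

# Two instruments playing the same 220 Hz note differ in their harmonic
# amplitudes (their timbre), not in the note itself.
sample = complex_tone(0.001, 220.0, [1.0, 0.5, 0.25])
```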

Related Research Articles

Acoustics (branch of physics involving mechanical waves)

Acoustics is a branch of physics that deals with the study of mechanical waves in gases, liquids, and solids, including topics such as vibration, sound, ultrasound and infrasound. A scientist who works in the field of acoustics is an acoustician, while someone working in the field of acoustics technology may be called an acoustical engineer. The application of acoustics is present in almost all aspects of modern society, with the most obvious being the audio and noise control industries.

Phonetics is a branch of linguistics that studies how humans produce and perceive sounds, or in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. The field of phonetics is traditionally divided into three sub-disciplines based on the research questions involved: how humans plan and execute movements to produce speech, how various movements affect the properties of the resulting sound, and how humans convert sound waves to linguistic information. Traditionally, the minimal linguistic unit of phonetics is the phone, a speech sound in a language, which differs from the phonological unit of the phoneme; the phoneme is an abstract categorization of phones and is defined as the smallest unit that discerns meaning between sounds in a given language.

The term phonation has slightly different meanings depending on the subfield of phonetics. Among some phoneticians, phonation is the process by which the vocal folds produce certain sounds through quasi-periodic vibration. This is the definition used among those who study laryngeal anatomy and physiology and speech production in general. Phoneticians in other subfields, such as linguistic phonetics, call this process voicing, and use the term phonation to refer to any oscillatory state of any part of the larynx that modifies the airstream, of which voicing is just one example. Voiceless and supra-glottal phonations are included under this definition.

Human voice (sound made by a human being using the vocal tract)

The human voice consists of sound made by a human being using the vocal tract, including talking, singing, laughing, crying, screaming, shouting, humming or yelling. The human voice frequency is specifically a part of human sound production in which the vocal folds are the primary sound source.

Larynx (voice box, an organ in the neck of amphibians, reptiles, and mammals)

The larynx, commonly called the voice box, is an organ in the top of the neck involved in breathing, producing sound and protecting the trachea against food aspiration. The opening of the larynx into the pharynx, known as the laryngeal inlet, is about 4–5 centimeters in diameter. The larynx houses the vocal cords and manipulates pitch and volume, which is essential for phonation. It is situated just below the point where the tract of the pharynx splits into the trachea and the esophagus. The word 'larynx' comes from the Ancient Greek word lárunx, 'larynx, gullet, throat'.

This is a glossary of medical terms related to communication disorders which are psychological or medical conditions that could have the potential to affect the ways in which individuals can hear, listen, understand, speak and respond to others.

Middle ear (portion of the ear internal to the eardrum and external to the oval window of the inner ear)

The middle ear is the portion of the ear medial to the eardrum, and distal to the oval window of the cochlea.

The field of articulatory phonetics is a subfield of phonetics that studies articulation and ways that humans produce speech. Articulatory phoneticians explain how humans produce speech sounds via the interaction of different physiological structures. Generally, articulatory phonetics is concerned with the transformation of aerodynamic energy into acoustic energy. Aerodynamic energy refers to the airflow through the vocal tract. Its potential form is air pressure; its kinetic form is the actual dynamic airflow. Acoustic energy is variation in the air pressure that can be represented as sound waves, which are then perceived by the human auditory system as sound.

Cochlea (snail-shaped part of the inner ear involved in hearing)

The cochlea is the part of the inner ear involved in hearing. It is a spiral-shaped cavity in the bony labyrinth, in humans making 2.75 turns around its axis, the modiolus. A core component of the cochlea is the Organ of Corti, the sensory organ of hearing, which is distributed along the partition separating the fluid chambers in the coiled tapered tube of the cochlea.

Hearing test (evaluation of the sensitivity of a person's sense of hearing)

A hearing test provides an evaluation of the sensitivity of a person's sense of hearing and is most often performed by an audiologist using an audiometer. An audiometer is used to determine a person's hearing sensitivity at different frequencies. There are other hearing tests as well, e.g., Weber test and Rinne test.

Auditory system (sensory system used for hearing)

The auditory system is the sensory system for the sense of hearing. It includes both the sensory organs and the auditory parts of the sensory system.

Acoustic reflex (small muscle contraction in the middle ear in response to loud sound)

The acoustic reflex is an involuntary muscle contraction that occurs in the middle ear in response to loud sound stimuli or when the person starts to vocalize.

Conductive hearing loss (medical condition)

Conductive hearing loss (CHL) occurs when there is a problem transferring sound waves anywhere along the pathway through the outer ear, tympanic membrane (eardrum), or middle ear (ossicles). If a conductive hearing loss occurs in conjunction with a sensorineural hearing loss, it is referred to as a mixed hearing loss. Depending upon the severity and nature of the conductive loss, this type of hearing impairment can often be treated with surgical intervention or pharmaceuticals to partially or, in some cases, fully restore hearing acuity to within normal range. However, cases of permanent or chronic conductive hearing loss may require other treatment modalities such as hearing aid devices to improve detection of sound and speech perception.

Tensor tympani muscle (muscle of the middle ear)

The tensor tympani is a muscle within the middle ear, located in the bony canal above the bony part of the auditory tube, and connects to the malleus bone. Its role is to dampen loud sounds, such as those produced from chewing, shouting, or thunder. Because its reaction time is not fast enough, the muscle cannot protect against hearing damage caused by sudden loud sounds, like explosions or gunshots.

Speech (human vocal communication using spoken language)

Speech is human vocal communication using language. Each language uses phonetic combinations of vowel and consonant sounds that form the sound of its words, deploying those words from the lexicon according to the syntactic constraints that govern their function in a sentence. In speaking, speakers perform many different intentional speech acts, e.g., informing, declaring, asking, persuading and directing, and can use enunciation, intonation, degrees of loudness, tempo, and other non-representational or paralinguistic aspects of vocalization to convey meaning. In their speech, speakers also unintentionally communicate many aspects of their social position, such as sex, age, place of origin, physical state, psychological state, education or experience, and the like.

Vocal cord paresis, also known as recurrent laryngeal nerve paralysis or vocal fold paralysis, is an injury to one or both recurrent laryngeal nerves (RLNs), which control all intrinsic muscles of the larynx except for the cricothyroid muscle. The RLN is important for speaking, breathing and swallowing.

Vocal resonance may be defined as "the process by which the basic product of phonation is enhanced in timbre and/or intensity by the air-filled cavities through which it passes on its way to the outside air." Throughout the vocal literature, various terms related to resonation are used, including: amplification, filtering, enrichment, enlargement, improvement, intensification, and prolongation. Acoustic authorities would question many of these terms from a strictly scientific perspective. However, the main point to be drawn from these terms by a singer or speaker is that the result of resonation is to make a better sound, or at least suitable to a certain esthetical and practical domain.

Sound (vibration that travels via pressure waves in matter)

In physics, sound is a vibration that propagates as an acoustic wave, through a transmission medium such as a gas, liquid or solid. In human physiology and psychology, sound is the reception of such waves and their perception by the brain. Only acoustic waves that have frequencies lying between about 20 Hz and 20 kHz, the audio frequency range, elicit an auditory percept in humans. In air at atmospheric pressure, these represent sound waves with wavelengths of 17 meters (56 ft) to 1.7 centimeters (0.67 in). Sound waves above 20 kHz are known as ultrasound and are not audible to humans. Sound waves below 20 Hz are known as infrasound. Different animal species have varying hearing ranges.

Hearing (sensory perception of sound by living organisms)

Hearing, or auditory perception, is the ability to perceive sounds through an organ, such as an ear, by detecting vibrations as periodic changes in the pressure of a surrounding medium. The academic field concerned with hearing is auditory science.

Oral skills are speech enhancers used to produce clear sentences that are intelligible to an audience, supporting effective communication: the transmission of messages and the correct interpretation of information between people. Speech production begins with the expiration of air from the lungs, which sets the vocal cords vibrating; the cartilages of the larynx adjust the shape, position and tension of the vocal cords. Clear articulation enhances the resonance of speech and enables people to speak intelligibly, while speaking at a moderate pace with clear pronunciation improves phonation. Speaking in a moderate tone also enables the audience to process the information word for word.

References

  1. Gray's Anatomy of the Human Body, 20th ed. 1918.
  2. Simonyan K, Horwitz B (April 2011). "Laryngeal motor cortex and control of speech in humans". Neuroscientist. 17 (2): 197–208. doi:10.1177/1073858410386727. PMC   3077440 . PMID   21362688.
  3. Levelt, Willem J. M. (1989). Speaking : from intention to articulation. Cambridge, Mass.: MIT Press. ISBN   978-0-262-12137-8. OCLC   18136175.
  4. Fitch RH, Miller S, Tallal P (1997). "Neurobiology of speech perception". Annu. Rev. Neurosci. 20: 331–53. doi:10.1146/annurev.neuro.20.1.331. PMID   9056717.
  5. Huber P, Gutbrod K, Ozdoba C, Nirkko A, Lövblad KO, Schroth G (January 2000). "[Aphasia research and speech localization in the brain]". Schweiz Med Wochenschr (in German). 130 (3): 49–59. PMID   10683880.
  6. Rorden C, Karnath HO (October 2004). "Using human brain lesions to infer function: a relic from a past era in the fMRI age?". Nat. Rev. Neurosci. 5 (10): 813–9. doi:10.1038/nrn1521. PMID   15378041.
  7. Broca, M. Paul (1861). "Remarques sur le siège de la faculté du langage articulé, suivies d'une observation d'aphémie (perte de la parole)". Bulletin de la Société Anatomique. 6: 330–357. Retrieved 20 December 2013.
  8. Riecker A, Mathiak K, Wildgruber D, et al. (February 2005). "fMRI reveals two distinct cerebral networks subserving speech motor control". Neurology. 64 (4): 700–6. doi:10.1212/01.WNL.0000152156.90779.89. PMID   15728295.
  9. Sörös P, Sokoloff LG, Bose A, McIntosh AR, Graham SJ, Stuss DT (August 2006). "Clustered functional MRI of overt speech production". NeuroImage. 32 (1): 376–87. doi:10.1016/j.neuroimage.2006.02.046. PMID   16631384.
