In speech communication, intelligibility is a measure of how comprehensible speech is in given conditions. Intelligibility is affected by the level (loud but not too loud) and quality of the speech signal, the type and level of background noise, reverberation (some reflections but not too many), and, for speech over communication devices, the properties of the communication system. A common standard metric for speech intelligibility is the Speech Transmission Index (STI). The concept of speech intelligibility is relevant to several fields, including phonetics, human factors, acoustical engineering, and audiometry.
Speech is considered the primary method of communication between humans. Humans alter the way they speak and listen according to many factors, such as the age, gender, native language, and social relationship of talker and listener. Speech intelligibility may also be affected by pathologies such as speech and hearing disorders. [1] [2]
Finally, speech intelligibility is influenced by the environment and by limitations of the communication channel. How well a spoken message can be understood in a room is influenced by the room's acoustics, chiefly the level of background noise and the amount of reverberation.
Intelligibility is degraded by background noise and by excessive reverberation. The relationship between speech and noise levels is generally described in terms of a signal-to-noise ratio (SNR). With a background noise level between 35 and 100 dB, the threshold for 100% intelligibility is usually an SNR of about 12 dB. [3] An SNR of 12 dB means that the speech signal has roughly four times the sound pressure (amplitude) of the background noise. The speech signal occupies roughly 200–20,000 Hz of the 20–20,000 Hz range of human hearing, or more precisely about 200–8,000 Hz, so the effect of masking depends on the frequency range of the masking noise. Additionally, different speech sounds make use of different parts of the speech frequency spectrum, so a continuous background noise such as white or pink noise will have a different effect on intelligibility than a variable or modulated background noise such as competing speech, multi-talker or "cocktail party" babble, or industrial machinery.
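The decibel arithmetic behind that claim can be made explicit. The short Python sketch below converts an SNR in decibels into the corresponding amplitude and power ratios; the function name is illustrative, not from any standard library.

```python
def snr_db_to_ratios(snr_db: float) -> tuple[float, float]:
    """Convert an SNR in dB to (amplitude ratio, power ratio).

    Decibels are defined on power: snr_db = 10 * log10(P_signal / P_noise).
    Power scales with the square of amplitude, so the amplitude ratio
    is 10 ** (snr_db / 20).
    """
    power_ratio = 10 ** (snr_db / 10)
    amplitude_ratio = 10 ** (snr_db / 20)
    return amplitude_ratio, power_ratio

amp, power = snr_db_to_ratios(12.0)
print(f"12 dB SNR: amplitude ratio ~{amp:.1f}x, power ratio ~{power:.1f}x")
# -> 12 dB SNR: amplitude ratio ~4.0x, power ratio ~15.8x
```

The "roughly four times" in the text refers to sound pressure (amplitude); in power terms the same 12 dB corresponds to a factor of about 16.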
Reverberation also affects the speech signal by blurring speech sounds over time. This enhances steady-state vowels while masking stops, glides, and vowel transitions, as well as prosodic cues such as pitch and duration. [4]
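One common way to quantify how much a room blurs speech is the C50 clarity index listed in the table below: the ratio, in decibels, of the energy in a room impulse response arriving within the first 50 ms (which reinforces speech) to the energy arriving later (which is heard as reverberant blur). A minimal sketch, assuming the impulse response is available as a NumPy array aligned to the direct sound:

```python
import numpy as np

def clarity_c50(impulse_response: np.ndarray, sample_rate: int) -> float:
    """C50 in dB: early-to-late energy ratio of a room impulse response,
    split 50 ms after the direct sound."""
    energy = impulse_response.astype(float) ** 2
    split = int(0.050 * sample_rate)   # 50 ms boundary in samples
    early = energy[:split].sum()       # arrives early enough to aid speech
    late = energy[split:].sum()        # arrives late, perceived as blur
    return 10 * np.log10(early / late)

# Toy example: a synthetic exponentially decaying impulse response at 48 kHz
fs = 48_000
t = np.arange(fs) / fs                       # one second of samples
h = np.exp(-t / 0.1) * np.random.randn(fs)   # ~0.1 s decay constant
print(f"C50 = {clarity_c50(h, fs):.1f} dB")  # > 3 dB is considered good
```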
The fact that background noise compromises intelligibility is exploited in speech audiometry and in some linguistic perception experiments, where noise is added to make listening tasks more difficult and thereby avoid ceiling effects.
| Unit of measurement | Quantity to be measured | Good values |
| --- | --- | --- |
| STI | Intelligibility (internationally known) | > 0.6 |
| CIS | Intelligibility (internationally known) | > 0.78 |
| %Alcons | Articulation loss (popular in the USA) | < 10% |
| C50 | Clarity index (widespread in Germany) | > 3 dB |
| RASTI (obsolete) | Intelligibility (internationally known) | > 0.6 |
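The STI and CIS rows describe the same underlying quantity: a commonly cited mapping between the two scales is CIS = 1 + log10(STI), which is consistent with the table's thresholds of 0.6 (STI) and 0.78 (CIS). A quick check in Python (the function name is illustrative):

```python
import math

def sti_to_cis(sti: float) -> float:
    """Common Intelligibility Scale value from an STI value, using the
    commonly cited relation CIS = 1 + log10(STI)."""
    return 1 + math.log10(sti)

print(round(sti_to_cis(0.60), 2))  # 0.78 -- matches the table's thresholds
```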
Word articulation remains high even when distortion leaves only 1–2% of the waveform unaffected. [5]
Speakers involuntarily alter speech produced in noise through a process called the Lombard effect. Such speech is more intelligible than normal speech: it is not only louder, but its fundamental frequency is raised and its vowel durations are prolonged. People also tend to make more noticeable facial movements. [6] [7]
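The loudness component of the Lombard effect is often summarized as a roughly linear rise in vocal level once noise exceeds some onset level. The sketch below encodes that simple linear model; the baseline level, onset, and slope are illustrative assumptions (reported slopes vary across studies and speakers), not measured constants.

```python
def lombard_vocal_level(noise_db: float,
                        quiet_level_db: float = 60.0,
                        onset_db: float = 45.0,
                        slope: float = 0.5) -> float:
    """Predicted vocal level (dB SPL) under a simple linear Lombard model:
    below `onset_db` of noise the speaker talks at their quiet level;
    above it, vocal effort rises by `slope` dB per dB of noise.
    All default parameters are illustrative assumptions."""
    return quiet_level_db + slope * max(0.0, noise_db - onset_db)

for noise in (40, 60, 80):
    print(f"{noise} dB noise -> {lombard_vocal_level(noise):.1f} dB voice")
# 40 dB noise -> 60.0 dB voice ... 80 dB noise -> 77.5 dB voice
```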
Shouted speech is less intelligible than Lombard speech because the increase in vocal energy comes with a loss of phonetic information. [8] However, "infinite peak clipping of shouted speech makes it almost as intelligible as normal speech." [9]
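Infinite peak clipping is the limiting case of amplitude distortion: the waveform is reduced to its sign, so only the zero-crossing times survive. A minimal NumPy sketch of the operation (the toy waveform stands in for real speech, which remains surprisingly intelligible after this treatment):

```python
import numpy as np

def infinite_peak_clip(signal: np.ndarray) -> np.ndarray:
    """Reduce a waveform to +/-1 according to its sign, preserving only
    its zero crossings -- the 'infinite peak clipping' of the classic
    intelligibility experiments."""
    return np.sign(signal)

# Toy waveform: a low "voicing" component plus a weaker high component
t = np.linspace(0.0, 1.0, 8000)
speech_like = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 700 * t)
clipped = infinite_peak_clip(speech_like)  # values are only -1, 0, or +1
```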
Clear speech is used when talking to a person with a hearing impairment. It is characterized by a slower speaking rate, more and longer pauses, elevated speech intensity, increased word duration, "targeted" vowel formants, increased consonant intensity compared to adjacent vowels, and a number of phonological changes (including fewer reduced vowels and more released stop bursts). [10] [11]
Infant-directed speech, or baby talk, uses simpler syntax and a smaller, easier-to-understand vocabulary than speech directed at adults. [12] Compared to adult-directed speech, it has a higher fundamental frequency, an exaggerated pitch range, and a slower rate. [13]
Citation speech occurs when people engage self-consciously in spoken language research. It has a slower tempo and fewer connected speech processes (e.g., shortening of nuclear vowels, devoicing of word-final consonants) than normal speech. [14]
Hyperspace speech, also known as the hyperspace effect, occurs when people are misled about the presence of environmental noise. It involves modifying the formants F1 and F2 of phonetic vowel targets to ease the listener's perceived difficulty in recovering information from the acoustic signal. [14]
Phonetics is a branch of linguistics that studies how humans produce and perceive sounds or, in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. The field of phonetics is traditionally divided into three sub-disciplines based on the research questions involved: how humans plan and execute movements to produce speech, how various movements affect the properties of the resulting sound, and how humans convert sound waves into linguistic information. Traditionally, the minimal linguistic unit of phonetics is the phone, a speech sound in a language, which differs from the phonological unit of the phoneme; the phoneme is an abstract categorization of phones, defined as the smallest unit that distinguishes meaning between sounds in a given language.
Reverberation, in acoustics, is a persistence of sound after it is produced. Reverberation is created when a sound or signal is reflected. This causes numerous reflections to build up and then decay as the sound is absorbed by the surfaces of objects in the space – which could include furniture, people, and air. This is most noticeable when the sound source stops but the reflections continue, their amplitude decreasing, until zero is reached.
Lip reading, also known as speechreading, is a technique of understanding a limited range of speech by visually interpreting the movements of the lips, face, and tongue without sound. Estimates of how much speech can be understood through lip reading alone vary, with some figures as low as 30%, because lip reading relies on context, language knowledge, and any residual hearing. Although lip reading is used most extensively by deaf and hard-of-hearing people, most people with normal hearing process some speech information from sight of the moving mouth.
The McGurk effect is a perceptual phenomenon that demonstrates an interaction between hearing and vision in speech perception. The illusion occurs when the auditory component of one sound is paired with the visual component of another sound, leading to the perception of a third sound. The visual information a person gets from seeing a person speak changes the way they hear the sound. If a person is getting poor-quality auditory information but good-quality visual information, they may be more likely to experience the McGurk effect. Integration abilities for audio and visual information may also influence whether a person will experience the effect. People who are better at sensory integration have been shown to be more susceptible to the effect. Many people are affected differently by the McGurk effect based on many factors, including brain damage and other disorders.
A hearing test provides an evaluation of the sensitivity of a person's sense of hearing and is most often performed by an audiologist using an audiometer. An audiometer is used to determine a person's hearing sensitivity at different frequencies. There are other hearing tests as well, e.g., Weber test and Rinne test.
Speech perception is the process by which the sounds of language are heard, interpreted, and understood. The study of speech perception is closely linked to the fields of phonology and phonetics in linguistics, and to cognitive psychology and perception in psychology. Research in speech perception seeks to understand how human listeners recognize speech sounds and use this information to understand spoken language. Speech perception research has applications in building computer systems that can recognize speech, in improving speech recognition for hearing- and language-impaired listeners, and in foreign-language teaching.
The ASA Silver Medal is an award presented by the Acoustical Society of America to individuals, without age limitation, for contributions to the advancement of science, engineering, or human welfare through the application of acoustic principles or through research accomplishments in acoustics. The medal is awarded in a number of categories depending on the technical committee responsible for making the nomination.
Japanese has one liquid phoneme, realized usually as an apico-alveolar tap and sometimes as an alveolar lateral approximant. English has two: rhotic and lateral, with varying phonetic realizations centered on the postalveolar approximant and on the alveolar lateral approximant, respectively. Japanese speakers who learn English as a second language later than childhood often have difficulty in hearing and producing the /r/ and /l/ of English accurately.
Hypocorrection is a sociolinguistic phenomenon that involves the purposeful addition of slang or a shift in pronunciation, word form, or grammatical construction and is propelled by a desire to appear less intelligible or to strike rapport. That contrasts with hesitation and modulation because rather than not having the right words to say or choosing to avoid them, the speaker chooses to adopt a nonstandard form of speech as a strategy to establish distance from or to become closer to their interlocutor.
Speech shadowing is a psycholinguistic experimental technique in which subjects repeat speech at a delay from the onset of hearing the phrase. The time between hearing the speech and responding is how long the brain takes to process and produce speech. The task instructs participants to shadow speech, which generates intent to reproduce the phrase while motor regions of the brain unconsciously process the syntax and semantics of the words spoken. Words repeated during the shadowing task also tend to imitate the parlance of the shadowed speech.
The Lombard effect or Lombard reflex is the involuntary tendency of speakers to increase their vocal effort when speaking in loud noise to enhance the audibility of their voice. This change includes not only loudness but also other acoustic features such as pitch, rate, and duration of syllables. This compensation effect maintains the auditory signal-to-noise ratio of the speaker's spoken words.
The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them.
Speech acquisition focuses on the development of vocal, acoustic and oral language by a child. This includes motor planning and execution, pronunciation, phonological and articulation patterns.
The phonemic restoration effect is a perceptual phenomenon in which, under certain conditions, sounds actually missing from a speech signal can be restored by the brain and may appear to be heard. The effect occurs when missing phonemes in an auditory signal are replaced with a noise that would have the physical properties to mask those phonemes, creating an ambiguity. In such ambiguity, the brain tends towards filling in absent phonemes. The effect can be so strong that some listeners may not even notice that phonemes are missing. It is commonly observed in conversation against heavy background noise, which makes it difficult to hear every phoneme being spoken. Different factors can change the strength of the effect, including the richness of contextual and linguistic cues in the speech, as well as the listener's state, such as their hearing status or age.
Temporal envelope (ENV) and temporal fine structure (TFS) are changes in the amplitude and frequency of sound perceived by humans over time. These temporal changes are responsible for several aspects of auditory perception, including loudness, pitch and timbre perception and spatial hearing.
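In auditory research, ENV and TFS are often separated within narrow frequency bands using the Hilbert transform: the magnitude of the analytic signal gives the envelope, and the cosine of its phase gives the fine structure. A minimal SciPy sketch (band-pass filtering is omitted for brevity; in practice the decomposition is applied per auditory filter band):

```python
import numpy as np
from scipy.signal import hilbert

def envelope_and_tfs(band_signal: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a band-limited signal into its temporal envelope (ENV) and
    temporal fine structure (TFS) via the Hilbert analytic signal."""
    analytic = hilbert(band_signal)
    env = np.abs(analytic)             # slow amplitude modulations
    tfs = np.cos(np.angle(analytic))   # rapid oscillations near band center
    return env, tfs

# Example: a 100 Hz amplitude modulation on a 1 kHz carrier, 16 kHz sampling
fs = 16_000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 100 * t)) * np.sin(2 * np.pi * 1000 * t)
env, tfs = envelope_and_tfs(x)  # env tracks the 100 Hz modulation
```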
Christian Lorenzi is Professor of Experimental Psychology at École Normale Supérieure in Paris, France, where he has served as Director of the Department of Cognitive Studies and Director of Scientific Studies. Lorenzi works on auditory perception.
Deniz Başkent is a Turkish-born Dutch auditory scientist who works on auditory perception. As of 2018, she is Professor of Audiology at the University Medical Center Groningen, Netherlands.
Robert V. Shannon is Research Professor of Otolaryngology-Head & Neck Surgery and Affiliated Research Professor of Biomedical Engineering at University of Southern California, CA, USA. Shannon investigates the basic mechanisms underlying auditory neural processing by users of cochlear implants, auditory brainstem implants, and midbrain implants.
Binaural unmasking is a phenomenon of auditory perception discovered by Ira Hirsh. In binaural unmasking, the brain combines information from the two ears in order to improve signal detection and identification in noise. The phenomenon is most commonly observed when there is a difference between the interaural phase of the signal and the interaural phase of the noise. When such a difference is present, there is an improvement in masking threshold compared to a reference situation in which the interaural phases are the same, or in which the stimulus has been presented monaurally; those two reference cases usually give very similar thresholds. The size of the improvement is known as the "binaural masking level difference" (BMLD), or simply as the "masking level difference".
Sandra Gordon-Salant is an American audiologist. She is a professor at the University of Maryland, College Park, where she is also director of the doctoral program in clinical audiology. Gordon-Salant investigates the effects of aging and hearing loss on auditory processes, as well as signal enhancement devices for hearing-impaired listeners. She is the senior editor of the 2010 book, The Aging Auditory System. Gordon-Salant has served as editor of the Journal of Speech, Language, and Hearing Research.