The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. [1] [2] [3] [4] [5] It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, [5] the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them.
The hypothesis has gained more interest outside the field of speech perception than inside it, particularly since the discovery of mirror neurons, which link the production and perception of motor movements, including those made by the vocal tract. [5] The theory was initially proposed at Haskins Laboratories in the 1950s by Alvin Liberman and Franklin S. Cooper, and developed further by Donald Shankweiler, Michael Studdert-Kennedy, Ignatius Mattingly, Carol Fowler and Douglas Whalen.
The hypothesis has its origins in research using pattern playback to create reading machines for the blind that would substitute sounds for orthographic letters. [6] This led to a close examination of how spoken sounds correspond to their acoustic spectrogram when treated as a sequence of auditory units. The examination found that successive consonants and vowels overlap in time with one another (a phenomenon known as coarticulation). [7] [8] [9] This suggested that speech is heard not as an acoustic "alphabet" or "cipher," but as a "code" of overlapping speech gestures.
Initially, the theory was associationist: infants mimic the speech they hear, and this leads to behavioristic associations between articulation and its sensory consequences. Later, this overt mimicry would be short-circuited and become speech perception. [8] This aspect of the theory was dropped, however, with the discovery that prelinguistic infants can already detect most of the phonetic contrasts used to separate different speech sounds. [1]
The behavioristic approach was replaced by a cognitivist one in which there was a speech module. [1] The module detected speech in terms of hidden distal objects rather than at the proximal or immediate level of their input. The evidence for this was research findings that speech processing is special, such as duplex perception. [10]
Initially, speech perception was assumed to link to speech objects that were the invariant motor commands underlying articulation. This was later revised to refer to phonetic gestures rather than motor commands, [1] and then to the gestures intended by the speaker at a prevocal, linguistic level, rather than actual movements. [12]
The "speech is special" claim has been dropped, [5] as it was found that speech perception could occur for nonspeech sounds (for example, slamming doors for duplex perception). [13]
The discovery of mirror neurons has led to renewed interest in the motor theory of speech perception, and the theory still has its advocates, [5] although there are also critics. [14]
If speech is identified in terms of how it is physically made, then nonauditory information should be incorporated into speech percepts even if it is still subjectively heard as "sounds". This is, in fact, the case.
Using a speech synthesizer, speech sounds can be varied in place of articulation along a continuum from /bɑ/ to /dɑ/ to /ɡɑ/, or in voice onset time on a continuum from /dɑ/ to /tɑ/ (for example). When listeners are asked to discriminate between two different sounds, they perceive sounds as belonging to discrete categories, even though the sounds vary continuously. In other words, 10 sounds (with the sound on one extreme being /dɑ/ and the sound on the other extreme being /tɑ/, and the ones in the middle varying on a scale) may all be acoustically different from one another, but the listener will hear all of them as either /dɑ/ or /tɑ/. Likewise, the English consonant /d/ may vary in its acoustic details across different phonetic contexts (the /d/ in /du/ does not technically sound the same as the one in /di/, for example), but all /d/'s as perceived by a listener fall within one category (voiced alveolar plosive) and that is because "linguistic representations are abstract, canonical, phonetic segments or the gestures that underlie these segments." [17] This suggests that humans identify speech using categorical perception, and thus that a specialized module, such as that proposed by the motor theory of speech perception, may be on the right track. [18]
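To make the categorical pattern concrete, the following sketch is an illustration added here, not a model from the cited studies; the boundary and slope values are assumptions. It labels a hypothetical 10-step /dɑ/ to /tɑ/ voice onset time (VOT) continuum with a logistic identification function and shows that predicted discriminability of adjacent steps is near zero within a category and peaks at the category boundary.

```python
# Toy model of categorical perception along a hypothetical /da/-/ta/ VOT continuum.
# The acoustic variable changes continuously, but every step is heard as one of
# two categories, and discriminability peaks at the category boundary.
import math

BOUNDARY_MS = 25.0   # assumed category boundary (hypothetical value)
SLOPE_MS = 3.0       # assumed steepness of the identification function

def p_ta(vot_ms: float) -> float:
    """Probability of labelling a stimulus /ta/ given its VOT."""
    return 1.0 / (1.0 + math.exp(-(vot_ms - BOUNDARY_MS) / SLOPE_MS))

continuum = [i * 50.0 / 9 for i in range(10)]   # 10 equally spaced VOT steps (0-50 ms)

for i, vot in enumerate(continuum):
    p = p_ta(vot)
    label = "/ta/" if p >= 0.5 else "/da/"
    print(f"step {i}: VOT={vot:4.1f} ms  P(/ta/)={p:.2f}  heard as {label}")

# Discriminability of adjacent steps, approximated by the difference in
# identification probabilities: flat within a category, maximal at the boundary.
for i in range(len(continuum) - 1):
    d = abs(p_ta(continuum[i + 1]) - p_ta(continuum[i]))
    print(f"steps {i}-{i + 1}: predicted discriminability {d:.2f}")
```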
If people can hear the gestures in speech, then the imitation of speech should be very fast, as when words heard in headphones are repeated immediately, as in speech shadowing. [19] People can repeat heard syllables more quickly than they would be able to produce them normally. [20]
Evidence exists that perception and production are generally coupled in the motor system. This is supported by the existence of mirror neurons, which are activated both by seeing (or hearing) an action and by carrying that action out. [29] Another source of evidence is the support for common coding theory, which posits shared representations for perception and action. [30]
The motor theory of speech perception is not widely held in the field of speech perception, though it is more popular in other fields, such as theoretical linguistics. As three of its advocates have noted, "it has few proponents within the field of speech perception, and many authors cite it primarily to offer critical commentary". [5] p. 361 Several critiques of it exist. [31] [32]
Speech perception is affected by nonproduction sources of information, such as context. Individual words are hard to understand in isolation but easy when heard in sentence context. It therefore seems that speech perception uses multiple sources that are integrated together in an optimal way. [31]
The motor theory of speech perception would predict that speech motor abilities in infants predict their speech perception abilities, but in actuality it is the other way around. [33] It would also predict that defects in speech production would impair speech perception, but they do not. [34] However, this only affects the first and already superseded behaviorist version of the theory, where infants were supposed to learn all production-perception patterns by imitation early in childhood. This is no longer the mainstream view of motor-speech theorists.
Several sources of evidence for a specialized speech module have failed to be supported. As a result, this part of the theory has been dropped by some researchers. [5]
The evidence provided for the motor theory of speech perception is limited to tasks such as syllable discrimination that use speech units rather than full spoken words or spoken sentences. As a result, "speech perception is sometimes interpreted as referring to the perception of speech at the sublexical level. However, the ultimate goal of these studies is presumably to understand the neural processes supporting the ability to process speech sounds under ecologically valid conditions, that is, situations in which successful speech sound processing ultimately leads to contact with the mental lexicon and auditory comprehension." [35] This, however, creates the problem of "a tenuous connection to their implicit target of investigation, speech recognition". [35]
It has been suggested that birds also hear each other's bird song in terms of vocal gestures. [36]
Phonetics is a branch of linguistics that studies how humans produce and perceive sounds, or in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. The field of phonetics is traditionally divided into three sub-disciplines based on the research questions involved: how humans plan and execute movements to produce speech, how various movements affect the properties of the resulting sound, and how humans convert sound waves to linguistic information. Traditionally, the minimal linguistic unit of phonetics is the phone, a speech sound in a language, which differs from the phonological unit, the phoneme; the phoneme is an abstract categorization of phones, defined as the smallest unit that distinguishes meaning between sounds in any given language.
Lip reading, also known as speechreading, is a technique of understanding a limited range of speech by visually interpreting the movements of the lips, face and tongue without sound. Estimates of the range of lip reading vary, with some figures as low as 30% because lip reading relies on context, language knowledge, and any residual hearing. Although lip reading is used most extensively by deaf and hard-of-hearing people, most people with normal hearing process some speech information from sight of the moving mouth.
The McGurk effect is a perceptual phenomenon that demonstrates an interaction between hearing and vision in speech perception. The illusion occurs when the auditory component of one sound is paired with the visual component of another sound, leading to the perception of a third sound. The visual information a person gets from seeing a person speak changes the way they hear the sound. If a person is getting poor-quality auditory information but good-quality visual information, they may be more likely to experience the McGurk effect. Integration abilities for audio and visual information may also influence whether a person will experience the effect. People who are better at sensory integration have been shown to be more susceptible to the effect. Many people are affected differently by the McGurk effect based on many factors, including brain damage and other disorders.
A mirror neuron is a neuron that fires both when an organism acts and when the organism observes the same action performed by another. Thus, the neuron "mirrors" the behavior of the other, as though the observer were itself acting. Mirror neurons are not always physiologically distinct from other types of neurons in the brain; their main differentiating factor is their response patterns. By this definition, such neurons have been directly observed in humans and primate species, and in birds.
In psycholinguistics, language processing refers to the way humans use words to communicate ideas and feelings, and how such communications are processed and understood. Language processing is considered to be a uniquely human ability that is not produced with the same grammatical understanding or systematicity even in humans' closest primate relatives.
Speech is human vocal communication using language. Each language uses phonetic combinations of vowel and consonant sounds to form the sound of its words, and those words are used, in their semantic character as items in the language's lexicon, according to the syntactic constraints that govern how words function in a sentence. In speaking, speakers perform many different intentional speech acts, e.g., informing, declaring, asking, persuading, directing, and can use enunciation, intonation, degrees of loudness, tempo, and other non-representational or paralinguistic aspects of vocalization to convey meaning. In their speech, speakers also unintentionally communicate many aspects of their social position, such as sex, age, place of origin, physical states, psychological states, physico-psychological states, education or experience, and the like.
The two-streams hypothesis is a model of the neural processing of vision as well as hearing. The hypothesis, given its initial characterisation in a paper by David Milner and Melvyn A. Goodale in 1992, argues that humans possess two distinct visual systems. Recently there seems to be evidence of two distinct auditory systems as well. As visual information exits the occipital lobe, and as sound leaves the phonological network, it follows two main pathways, or "streams". The ventral stream leads to the temporal lobe, which is involved with object and visual identification and recognition. The dorsal stream leads to the parietal lobe, which is involved with processing the object's spatial location relative to the viewer and with speech repetition.
Categorical perception is a phenomenon of perception of distinct categories when there is a gradual change in a variable along a continuum. It was originally observed for auditory stimuli but has since been found to apply to other perceptual modalities.
Speech perception is the process by which the sounds of language are heard, interpreted, and understood. The study of speech perception is closely linked to the fields of phonology and phonetics in linguistics, and to cognitive psychology and perception in psychology. Research in speech perception seeks to understand how human listeners recognize speech sounds and use this information to understand spoken language. Speech perception research has applications in building computer systems that can recognize speech, in improving speech recognition for hearing- and language-impaired listeners, and in foreign-language teaching.
Haskins Laboratories, Inc. is an independent 501(c) non-profit corporation, founded in 1935 and located in New Haven, Connecticut, since 1970. Haskins has formal affiliation agreements with both Yale University and the University of Connecticut; it remains fully independent, administratively and financially, of both Yale and UConn. Haskins is a multidisciplinary and international community of researchers that conducts basic research on spoken and written language. A guiding perspective of their research is to view speech and language as emerging from biological processes, including those of adaptation, response to stimuli, and conspecific interaction. Haskins Laboratories has a long history of technological and theoretical innovation, from creating systems of rules for speech synthesis and development of an early working prototype of a reading machine for the blind to developing the landmark concept of phonemic awareness as the critical preparation for learning to read an alphabetic writing system.
Alvin Meyer Liberman, born in St. Joseph, Missouri, was an American psychologist. His ideas set the agenda for fifty years of psychological research in speech perception.
Duplex perception refers to the linguistic phenomenon whereby "part of the acoustic signal is used for both a speech and a nonspeech percept." A listener is presented with two simultaneous, dichotic stimuli. One ear receives an isolated third-formant transition that sounds like a nonspeech chirp. At the same time the other ear receives a base syllable. This base syllable consists of the first two formants, complete with formant transitions, and the third formant without a transition. Normally, there would be peripheral masking in such a binaural listening task but this does not occur. Instead, the listener's percept is duplex, that is, the completed syllable is perceived and the nonspeech chirp is heard at the same time. This is interpreted as being due to the existence of a special speech module.
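As a rough illustration of how such a dichotic stimulus is put together, the sketch below is a hypothetical construction, not the original Haskins stimuli; all formant frequencies, durations, and the use of plain sine sweeps in place of real formant synthesis are assumptions. It writes a stereo file whose left channel carries the base syllable (first two formants with transitions plus a steady third formant) and whose right channel carries only the isolated third-formant transition "chirp".

```python
# Hypothetical duplex-perception style stimulus: base syllable in the left ear,
# isolated third-formant (F3) transition in the right ear. Sine sweeps stand in
# for real formant synthesis; all parameter values are illustrative assumptions.
import numpy as np
import wave

SR = 16000                       # sample rate in Hz
DUR = 0.25                       # total stimulus duration in seconds
TRANS = 0.05                     # formant-transition duration in seconds
t = np.arange(int(SR * DUR)) / SR

def sweep(f_start, f_end, trans_dur):
    """Sine component whose frequency moves linearly from f_start to f_end
    over trans_dur, then stays at f_end (a crude formant stand-in)."""
    f = np.where(t < trans_dur,
                 f_start + (f_end - f_start) * t / trans_dur,
                 f_end)
    phase = 2 * np.pi * np.cumsum(f) / SR
    return np.sin(phase)

# Base syllable: F1 and F2 with transitions, F3 held steady (no transition).
base = (0.5 * sweep(200, 700, TRANS)
        + 0.3 * sweep(900, 1200, TRANS)
        + 0.2 * sweep(2500, 2500, TRANS))
# Isolated F3 transition: the "chirp" presented to the other ear.
chirp = 0.2 * sweep(2000, 2500, TRANS) * (t < TRANS)

stereo = np.stack([base, chirp], axis=1)
stereo = (stereo / np.max(np.abs(stereo)) * 32767).astype(np.int16)

with wave.open("duplex_demo.wav", "wb") as f:
    f.setnchannels(2)
    f.setsampwidth(2)
    f.setframerate(SR)
    f.writeframes(stereo.tobytes())
```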
Phonological development refers to how children learn to organize sounds into meaning or language (phonology) during their stages of growth.
The concept of motor cognition captures the notion that cognition is embodied in action, and that the motor system participates in what is usually considered mental processing, including the processes involved in social interaction. The fundamental unit of the motor cognition paradigm is action, defined as the movements produced to satisfy an intention towards a specific motor goal, or in reaction to a meaningful event in the physical and social environments. Motor cognition takes into account the preparation and production of actions, as well as the processes involved in recognizing, predicting, mimicking, and understanding the behavior of other people. This paradigm has received a great deal of attention and empirical support in recent years from a variety of research domains including embodied cognition, developmental psychology, cognitive neuroscience, and social psychology.
Speech shadowing is a psycholinguistic experimental technique in which subjects repeat speech with a short delay after the onset of hearing the phrase. The time between hearing the speech and responding is how long the brain takes to process and produce speech. The task instructs participants to shadow speech, which generates the intent to reproduce the phrase while motor regions in the brain unconsciously process the syntax and semantics of the words spoken. Words repeated during the shadowing task also imitate the parlance of the shadowed speech.
The neuroscience of music is the scientific study of brain-based mechanisms involved in the cognitive processes underlying music. These behaviours include music listening, performing, composing, reading, writing, and ancillary activities. It also is increasingly concerned with the brain basis for musical aesthetics and musical emotion. Scientists working in this field may have training in cognitive neuroscience, neurology, neuroanatomy, psychology, music theory, computer science, and other relevant fields.
Speech repetition occurs when individuals speak the sounds that they have heard another person pronounce or say. In other words, it is the saying by one individual of the spoken vocalizations made by another individual. Speech repetition requires the person repeating the utterance to have the ability to map the sounds that they hear from the other person's oral pronunciation to similar places and manners of articulation in their own vocal tract.
Dichotic listening is a psychological test commonly used to investigate selective attention and the lateralization of brain function within the auditory system. It is used within the fields of cognitive psychology and neuroscience.
Sensory-motor coupling is the coupling or integration of the sensory system and motor system. Sensorimotor integration is not a static process. For a given stimulus, there is no one single motor command. "Neural responses at almost every stage of a sensorimotor pathway are modified at short and long timescales by biophysical and synaptic processes, recurrent and feedback connections, and learning, as well as many other internal and external variables".
Neurocomputational speech processing is the computer simulation of speech production and speech perception by reference to the natural neuronal processes of speech production and speech perception, as they occur in the human nervous system. The topic is based on neuroscience and computational neuroscience.