Speech production is the process by which thoughts are translated into speech. This includes the selection of words, the organization of relevant grammatical forms, and then the articulation of the resulting sounds by the motor system using the vocal apparatus. Speech production can be spontaneous, such as when a person creates the words of a conversation; reactive, such as when they name a picture or read aloud a written word; or imitative, such as in speech repetition. Speech production is not the same as language production, since language can also be produced manually, by signs.
In ordinary fluent conversation, people produce roughly four syllables, ten to twelve phonemes, and two to three words per second, drawn from a vocabulary that can contain 10,000 to 100,000 words. [1] Errors in speech production are relatively rare, occurring at a rate of about one in every 900 words in spontaneous speech. [2] Words that are commonly spoken, learned early in life, or easily imagined are quicker to say than words that are rarely said, learned later in life, or abstract. [3] [4]
Normally, speech is created with pulmonary pressure: air from the lungs generates sound by phonation through the glottis in the larynx, which is then modified by the vocal tract into different vowels and consonants. However, speech production can occur without the use of the lungs and glottis, as in alaryngeal speech, which uses the upper parts of the vocal tract. An example of such alaryngeal speech is Donald Duck talk. [5]
The vocal production of speech may be associated with the production of hand gestures that act to enhance the comprehensibility of what is being said. [6]
The development of speech production over an individual's life starts with an infant's first babble and is transformed into fully developed speech by the age of five. [7] The first stage of speech does not occur until around age one (the holophrastic phase). Between the ages of one and a half and two and a half, the infant can produce short sentences (the telegraphic phase). After two and a half years, the infant develops systems of lemmas used in speech production. Around four or five, the child's stock of lemmas has greatly increased; this enhances the child's production of correct speech, and they can now produce speech like an adult. An adult develops speech in four stages: activation of lexical concepts, selection of the lemmas needed, morphological and phonological encoding of the speech, and phonetic encoding of the word. [7]
The production of spoken language involves three major levels of processing: conceptualization, formulation, and articulation. [1] [8] [9]
The first stage is conceptualization, or conceptual preparation, in which the intention to create speech links a desired concept to the particular spoken words that will express it. Here the preverbal intended message is formulated, specifying the concepts to be expressed. [10]
The second stage is formulation, in which the linguistic form required for the expression of the desired message is created. Formulation includes grammatical encoding, morpho-phonological encoding, and phonetic encoding. [10] Grammatical encoding is the process of selecting the appropriate syntactic word, or lemma. The selected lemma then activates the appropriate syntactic frame for the conceptualized message. Morpho-phonological encoding is the process of breaking words down into the syllables to be produced in overt speech. Syllabification depends on the preceding and following words, for instance: I-com-pre-hend vs. I-com-pre-hen-dit ("I comprehend it"). [10] The final part of the formulation stage is phonetic encoding. This involves the activation of articulatory gestures dependent on the syllables selected in the morpho-phonological process, creating an articulatory score as the utterance is pieced together and the order of movements of the vocal apparatus is completed. [10]
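The cross-word syllabification example above can be sketched as a toy program. The onset inventory and the greedy maximal-onset rule below are illustrative assumptions (real syllabification operates over phonemes and language-specific phonotactics, not spelling):

```python
# Toy syllabifier illustrating the example above: syllable boundaries are
# computed over the whole phrase, not word by word, so the final /d/ of
# "comprehend" becomes the onset of a new syllable when "it" follows.
# Letters stand in for phonemes here; this is an illustration only.

VOWELS = set("aeiou")
# A tiny, hypothetical inventory of legal syllable onsets.
LEGAL_ONSETS = {"", "c", "m", "p", "r", "h", "n", "d", "t", "pr"}

def syllabify(phones: str) -> list[str]:
    """Split a flat phoneme string into syllables by maximal onset."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    syllables, start = [], 0
    for k, n in enumerate(nuclei):
        if k + 1 == len(nuclei):
            syllables.append(phones[start:])  # last syllable takes the rest
            break
        cluster = phones[n + 1 : nuclei[k + 1]]
        # Give the longest legal suffix of the cluster to the next onset.
        for cut in range(len(cluster) + 1):
            if cluster[cut:] in LEGAL_ONSETS:
                break
        boundary = n + 1 + cut
        syllables.append(phones[start:boundary])
        start = boundary
    return syllables

print(syllabify("comprehend"))    # ['com', 'pre', 'hend']
print(syllabify("comprehendit"))  # ['com', 'pre', 'hen', 'dit']
```

Note how the final consonant is a coda in the first call but resyllabifies as the onset of a new syllable in the second, mirroring the I-com-pre-hen-dit pattern.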
The third stage of speech production is articulation, which is the execution of the articulatory score by the lungs, glottis, larynx, tongue, lips, jaw and other parts of the vocal apparatus resulting in speech. [8] [10]
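The three stages can be sketched as a toy pipeline. The lexicon entries, function names, and representations below are hypothetical illustrations of the staged architecture, not an implementation of any published model:

```python
# A minimal sketch of the three processing levels described above:
# conceptualization -> formulation -> articulation.

LEXICON = {
    # concept: (lemma with syntactic class, phonological form)
    "FELINE_PET": ("cat/noun", ["k", "ae", "t"]),
    "CANINE_PET": ("dog/noun", ["d", "o", "g"]),
}

def conceptualize(intention: str) -> str:
    """Conceptualization: map a communicative intention to a preverbal
    message (here, just a concept label)."""
    return intention.upper().replace(" ", "_")

def formulate(concept: str) -> tuple[str, list[str]]:
    """Formulation: grammatical encoding (lemma selection) followed by
    morpho-phonological encoding (retrieving the sound form)."""
    return LEXICON[concept]

def articulate(phonemes: list[str]) -> str:
    """Articulation: execute the phonetic plan (here, emit the string)."""
    return "".join(phonemes)

lemma, phonemes = formulate(conceptualize("feline pet"))
print(lemma, "->", articulate(phonemes))  # cat/noun -> kaet
```

Each function stands in for a far richer process, but the pipeline shape (message, then linguistic form, then motor execution) matches the three levels described above.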
The motor control for speech production in right-handed people depends mostly upon areas in the left cerebral hemisphere. These areas include the bilateral supplementary motor area, the left posterior inferior frontal gyrus, the left insula, the left primary motor cortex, and the left temporal cortex. [11] Subcortical areas such as the basal ganglia and cerebellum are also involved. [12] [13] The cerebellum aids the sequencing of speech syllables into fast, smooth, and rhythmically organized words and longer utterances. [13]
Speech production can be affected by several disorders.
Until the late 1960s, research on speech focused on comprehension. As researchers collected greater volumes of speech error data, they began to investigate the psychological processes responsible for the production of speech sounds and to contemplate possible processes for fluent speech. [14] Findings from speech error research were soon incorporated into speech production models, and evidence from speech error data supports several conclusions about speech production.
Models of speech production must contain specific elements to be viable. These include the elements from which speech is composed, listed below. The accepted models of speech production discussed in more detail below all incorporate these stages either explicitly or implicitly, and the ones that are now outdated or disputed have been criticized for overlooking one or more of the following stages. [16]
The attributes of accepted speech models are:
a) a conceptual stage where the speaker abstractly identifies what they wish to express. [16]
b) a syntactic stage where a frame is chosen that words will be placed into; this frame is usually sentence structure. [16]
c) a lexical stage where a search for a word occurs based on meaning. Once the word is selected and retrieved, information about it becomes available to the speaker involving phonology and morphology. [16]
d) a phonological stage where the abstract information is converted into a speech-like form. [16]
e) a phonetic stage where instructions are prepared to be sent to the muscles of articulation. [16]
Also, models must allow for forward planning mechanisms, a buffer, and a monitoring mechanism.
Following are a few of the influential models of speech production that account for or incorporate the previously mentioned stages and include information discovered as a result of speech error studies and other disfluency data, [17] such as tip-of-the-tongue research.
The Utterance Generator Model was proposed by Fromkin (1971). [18] It is composed of six stages and was an attempt to account for the previous findings of speech error research. The stages of the Utterance Generator Model were based on possible changes in the representation of a particular utterance. In the first stage, a person generates the meaning they wish to convey. The second stage involves the message being translated into a syntactic structure; here, the message is given an outline. [19] In the third stage, the message gains different stresses and intonations based on its meaning. The fourth stage is concerned with the selection of words from the lexicon. After the words have been selected in stage four, the message undergoes phonological specification. [20] The fifth stage applies rules of pronunciation and produces the syllables that are to be output. The sixth and final stage of Fromkin's Utterance Generator Model is the coordination of the motor commands necessary for speech: phonetic features of the message are sent to the relevant muscles of the vocal tract so that the intended message can be produced. Despite the ingenuity of Fromkin's model, researchers have criticized this interpretation of speech production: although the Utterance Generator Model accounts for many nuances and data found by speech error studies, researchers decided it still had room to be improved. [21] [22]
A more recent attempt to explain speech production was published by Garrett in 1975. [23] Garrett also created his model by compiling speech error data. There are many overlaps between this model and the Fromkin model on which it was based, but Garrett added a few things that filled gaps pointed out by other researchers. Both the Garrett and Fromkin models distinguish three levels: a conceptual level, a sentence level, and a motor level. These three levels are common to contemporary understanding of speech production. [24]
In 1994, [25] Dell proposed a model of the lexical network that became fundamental to the understanding of the way speech is produced. [1] This model of the lexical network attempts to symbolically represent the lexicon and, in turn, explain how people choose the words they wish to produce and how those words are organized into speech. Dell's model is composed of three levels: semantics, words, and phonemes. The highest level of the model represents semantic features (in Dell's illustration, winter, footwear, feet, and snow represent the semantic categories of boot and skate). The second level represents the words that refer to those semantic features (in the illustration, boot and skate). The third level represents the phonemes (syllabic information including onsets, vowels, and codas). [26]
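A minimal spreading-activation sketch of this three-level network, using the nodes from the description above. The edge structure, weights, and feedforward-only update rule are simplifying assumptions for illustration, not Dell's published parameters:

```python
# Toy three-level lexical network: semantic features -> words -> phonemes.
# Node names follow the boot/skate example; connectivity is illustrative.
EDGES = {
    "winter": ["boot", "skate"],
    "footwear": ["boot", "skate"],
    "feet": ["boot", "skate"],
    "snow": ["boot", "skate"],
    "boot": ["b", "u", "t"],
    "skate": ["s", "k", "ei", "t"],
}

def spread(activation: dict[str, float], steps: int = 1,
           rate: float = 0.5) -> dict[str, float]:
    """Each step, every active node passes a fraction of its activation
    down to its neighbours (feedforward only, for simplicity)."""
    for _ in range(steps):
        new = dict(activation)
        for node, a in activation.items():
            for nbr in EDGES.get(node, []):
                new[nbr] = new.get(nbr, 0.0) + rate * a
        activation = new
    return activation

# Activating semantic features energizes candidate words, which in turn
# energize their phonemes.
act = spread({"winter": 1.0, "footwear": 1.0, "feet": 1.0}, steps=2)
print(act["boot"], act["skate"])  # word-level activation
print(act["b"], act["s"])         # phoneme level receives activation too
```

Because boot and skate share semantic features here, both words become active, which is how networks of this kind account for semantically related word substitution errors.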
Levelt further refined the lexical network proposed by Dell. Through the use of speech error data, Levelt recreated the three levels in Dell's model. The conceptual stratum, the top and most abstract level, contains the information a person has about particular concepts. [27] The conceptual stratum also contains ideas about how concepts relate to each other. This is where word selection occurs: a person chooses the words they wish to express. The next, or middle, level, the lemma stratum, contains information about the syntactic functions of individual words, including tense and function. [1] This level maintains syntax and places words correctly into sentence structures that make sense to the speaker. [27] The lowest and final level is the form stratum, which, similarly to the Dell model, contains syllabic information. From here, the information stored at the form stratum is sent to the motor cortex, where the vocal apparatus is coordinated to physically produce speech sounds.
The physical structure of the human nose, throat, and vocal cords allows for the production of many unique sounds; these can be further broken down by place of articulation. Different sounds are produced in different areas, with different muscles and breathing techniques. [28] Our ability to utilize these skills to create the various sounds needed to communicate effectively is essential to speech production. Speech is a psychomotor activity. Speech between two people is a conversation; conversations can be casual, formal, factual, or transactional, and the language structure and narrative genre employed differ depending upon the context. Affect is a significant factor in speech. Manifestations of affect that disrupt memory in language use include feelings of tension and states of apprehension, as well as physical signs like nausea. At the language level, affect can be observed in a speaker's hesitations, repetitions, false starts, incompletions, syntactic blends, and the like. Difficulties in manner of articulation can contribute to speech difficulties and impediments. [29] It has been suggested that infants are capable of making the entire spectrum of possible vowel and consonant sounds. The IPA provides a system for understanding and categorizing all possible speech sounds, including information about the manner in which a sound is produced and where it is produced. [29] This is extremely useful for understanding speech production because speech can be transcribed based on sounds rather than spelling, which may be misleading depending on the language being spoken. Average speaking rates fall in the range of 120 to 150 words per minute (wpm), and the same range is recommended for recording audiobooks. As people grow accustomed to a particular language, they are prone to lose not only the ability to produce certain speech sounds but also the ability to distinguish between them. [29]
Articulation, often associated with speech production, is how people physically produce speech sounds. For people who speak fluently, articulation is automatic and allows 15 speech sounds to be produced per second. [30]
Effective articulation of speech includes the following elements: fluency, complexity, accuracy, and comprehensibility. [31]
Before even producing a sound, infants imitate facial expressions and movements. [32] Around 7 months of age, infants start to experiment with communicative sounds by trying to coordinate producing sound with opening and closing their mouths.
During the first year of life infants cannot produce coherent words; instead they produce recurring babbling sounds. Babbling allows the infant to experiment with articulating sounds without having to attend to meaning. This repeated babbling starts the initial production of speech. Babbling works with object permanence and understanding of location to support the networks of our first lexical items, or words. [7] The infant's vocabulary grows substantially once they are able to understand that objects exist even when they are not present.
The first stage of meaningful speech does not occur until around the age of one. This is the holophrastic phase, in which infant speech consists of one word at a time (e.g., papa). [33]
The next stage is the telegraphic phase. In this stage infants can form short sentences (e.g., "Daddy sit" or "Mommy drink"). This typically occurs between the ages of one and a half and two and a half years. This stage is particularly noteworthy because of the explosive growth of the lexicon. During this stage, infants must select and match stored representations of words to specific perceptual target words in order to convey meaning or concepts. [32] With enough vocabulary, infants begin to extract sound patterns, and they learn to break words down into phonological segments, further increasing the number of words they can learn. [7] At this point in an infant's development of speech, their lexicon consists of 200 words or more, and they are able to understand even more than they can speak. [33]
When they reach two and a half years their speech production becomes increasingly complex, particularly in its semantic structure. With a more detailed semantic network the infant learns to express a wider range of meanings, helping the infant develop a complex conceptual system of lemmas.
Around the age of four or five, the child's lemmas have a wide range of diversity, which helps them select the right lemma needed to produce correct speech. [7] Reading to infants enhances their lexicon. At this age, children who have been read to and exposed to more uncommon and complex words have heard an estimated 32 million more words than a child who is linguistically impoverished. [34] At this age the child should be able to speak in full, complete sentences, similar to an adult.
Language acquisition is the process by which humans acquire the capacity to perceive and comprehend language, as well as to produce and use words and sentences to communicate.
Phonetics is a branch of linguistics that studies how humans produce and perceive sounds, or in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. The field of phonetics is traditionally divided into three sub-disciplines based on the research questions involved such as how humans plan and execute movements to produce speech, how various movements affect the properties of the resulting sound, or how humans convert sound waves to linguistic information. Traditionally, the minimal linguistic unit of phonetics is the phone—a speech sound in a language which differs from the phonological unit of phoneme; the phoneme is an abstract categorization of phones, and it is also defined as the smallest unit that discerns meaning between sounds in any given language.
Psycholinguistics or psychology of language is the study of the interrelation between linguistic factors and psychological aspects. The discipline is mainly concerned with the mechanisms by which language is processed and represented in the mind and brain; that is, the psychological and neurobiological factors that enable humans to acquire, use, comprehend, and produce language.
Vocabulary development is a process by which people acquire words. Babbling shifts towards meaningful speech as infants grow and produce their first words around the age of one year. In early word learning, infants build their vocabulary slowly. By the age of 18 months, infants can typically produce about 50 words and begin to make word combinations.
In linguistics, linguistic competence is the system of unconscious knowledge that one knows when they know a language. It is distinguished from linguistic performance, which includes all other factors that allow one to use one's language in practice.
Victoria Alexandra Fromkin was an American linguist who taught at UCLA. She studied slips of the tongue, mishearing, and other speech errors, which she applied to phonology, the study of how the sounds of a language are organized in the mind.
Speech is a human vocal communication using language. Each language uses phonetic combinations of vowel and consonant sounds that form the sound of its words, and using those words in their semantic character as words in the lexicon of a language according to the syntactic constraints that govern lexical words' function in a sentence. In speaking, speakers perform many different intentional speech acts, e.g., informing, declaring, asking, persuading, directing, and can use enunciation, intonation, degrees of loudness, tempo, and other non-representational or paralinguistic aspects of vocalization to convey meaning. In their speech, speakers also unintentionally communicate many aspects of their social position such as sex, age, place of origin, physical states, psychological states, physico-psychological states, education or experience, and the like.
A speech error, commonly referred to as a slip of the tongue or misspeaking, is a deviation from the apparently intended form of an utterance. Speech errors can be subdivided into spontaneously and inadvertently produced errors and intentionally produced word-plays or puns. Another distinction can be drawn between production and comprehension errors. Errors in speech production and perception are also called performance errors. Examples of speech errors include sound exchange and sound anticipation errors. In sound exchange errors, the order of two individual sounds is reversed, while in sound anticipation errors a sound from a later syllable replaces one from an earlier syllable. Slips of the tongue are a normal and common occurrence. One study suggests that most people make as many as 22 slips of the tongue per day.
Catherine Phebe Browman was an American linguist and speech scientist. She received her Ph.D. in linguistics from the University of California, Los Angeles (UCLA) in 1978. Browman was a research scientist at Bell Laboratories in New Jersey (1967–1972), where she was known for her work on speech synthesis using demisyllables. She later worked as a researcher at Haskins Laboratories in New Haven, Connecticut (1982–1998). She was best known for developing, with Louis Goldstein, the theory of articulatory phonology, a gesture-based approach to phonological and phonetic structure. The theoretical approach is incorporated in a computational model that generates speech from a gesturally specified lexicon. Browman was made an honorary member of the Association for Laboratory Phonology.
In linguistics, lexicalization is the process of adding words, set phrases, or word patterns to a language's lexicon.
Language production is the production of spoken or written language. In psycholinguistics, it describes all of the stages between having a concept to express and translating that concept into linguistic forms. These stages have been described in two types of processing models: lexical access models and serial models. Through these models, psycholinguists can study how speech is produced in different circumstances, such as when the speaker is bilingual. Psycholinguists learn more about these models and different kinds of speech by using language production research methods that include collecting speech errors and elicited production tasks.
The term linguistic performance was used by Noam Chomsky in 1960 to describe "the actual use of language in concrete situations". It is used to describe both the production, sometimes called parole, as well as the comprehension of language. Performance is defined in opposition to "competence"; the latter describes the mental knowledge that a speaker or listener has of language.
Sentence processing takes place whenever a reader or listener processes a language utterance, either in isolation or in the context of a conversation or a text. Many studies of the human language comprehension process have focused on reading of single utterances (sentences) without context. Extensive research has shown that language comprehension is affected by context preceding a given utterance as well as many other factors.
Phonological development refers to how children learn to organize sounds into meaning or language (phonology) during their stages of growth.
Apraxia of speech (AOS), also called verbal apraxia, is a speech sound disorder affecting an individual's ability to translate conscious speech plans into motor plans, which results in limited and difficult speech ability. By the definition of apraxia, AOS affects volitional movement pattern. However, AOS usually also affects automatic speech.
In psychology, a lemma is an abstract conceptual form of a word that has been mentally selected for utterance in the early stages of speech production. A lemma represents a specific meaning but does not have any specific sounds that are attached to it.
The mental lexicon is defined as a mental dictionary that contains information regarding the word store of a language user, such as their meanings, pronunciations, and syntactic characteristics. The mental lexicon is used in linguistics and psycholinguistics to refer to individual speakers' lexical, or word, representations. However, there is some disagreement as to the utility of the mental lexicon as a scientific construct.
Speech acquisition focuses on the development of vocal, acoustic and oral language by a child. This includes motor planning and execution, pronunciation, phonological and articulation patterns.
Prosodic bootstrapping in linguistics refers to the hypothesis that learners of a primary language (L1) use prosodic features such as pitch, tempo, rhythm, amplitude, and other auditory aspects from the speech signal as a cue to identify other properties of grammar, such as syntactic structure. Acoustically signaled prosodic units in the stream of speech may provide critical perceptual cues by which infants initially discover syntactic phrases in their language. Although these features by themselves are not enough to help infants learn the entire syntax of their native language, they provide various cues about different grammatical properties of the language, such as identifying the ordering of heads and complements in the language using stress prominence, indicating the location of phrase boundaries, and word boundaries. It is argued that prosody of a language plays an initial role in the acquisition of the first language helping children to uncover the syntax of the language, mainly due to the fact that children are sensitive to prosodic cues at a very young age.
Gary S. Dell is an American psycholinguist. He is Professor Emeritus of Psychology and Center for Advanced Study Professor at the University of Illinois at Urbana-Champaign.