Wolfgang von Kempelen's speaking machine

A replica of Kempelen's speaking machine, built 2007–09 at the Department of Phonetics, Saarland University, Saarbrücken, Germany

Wolfgang von Kempelen's speaking machine is a manually operated speech synthesizer that the Hungarian author and inventor Wolfgang von Kempelen began developing in 1769. In the same year he completed his far more famous contribution to history, The Turk, a chess-playing automaton later revealed to be an elaborate hoax: a human chess player hidden inside it made the moves. [1] But while the Turk took only six months to build, the speaking machine occupied the next twenty years of his life. [2] After two conceptual "dead ends" over the first five years of research, Kempelen's third approach led him to the design he was willing to call "final": a functional model of the human vocal tract. [3]

First design

Kempelen's first experiment with speech synthesis used only the most rudimentary elements of the vocal tract needed to produce speech-like sounds. A kitchen bellows, of the kind used to stoke fires in wood-burning stoves, served as the lungs and supplied the airflow. A reed taken from a common bagpipe served as the glottis, the source of the raw fundamental sound in the vocal tract. The bell of a clarinet made an adequate mouth, despite its rigid form. This basic model could produce only simple vowel sounds, although some additional articulation was possible by holding a hand at the bell opening to obstruct the airflow. It had no hardware for producing the nasals, plosives and fricatives that most consonants require. Like many other early pioneers of phonetics, Kempelen mistakenly attributed the perceived "higher frequencies" of certain sounds to the glottis rather than to the formants (resonances) of the entire vocal tract, so he abandoned his single-reed design for a multiple-reed approach. [2] [3]

Second design

The second design used a console, similar to that of an organ of the period, at which the operator played a set of keys, one for each letter. A common bellows fed air through various pipes, each with the shape and obstructions needed to produce its letter. Through experimentation, Kempelen found that the reed's resonant length was not crucial to creating the high-frequency components of certain vowels and fricatives, so he tuned all the reeds to the same pitch for consistency between letters. Although not every letter was yet represented, he had developed the technology to produce most vowels and several consonants, including the plosive /p/ and the nasal /m/, and was thus in a position to begin forming syllables and short words. This immediately exposed the primary flaw of the second design: because the reeds worked in parallel, more than one letter could sound at a time. When syllables and words were built up, the resulting sonic "overlap" (now referred to as co-articulation) produced sounds very unlike human speech, undermining the purpose of the design altogether. Kempelen comments:
"In order to continue my experiments it was necessary, above all, that I should have a perfect knowledge of what I wanted to imitate. I had to make a formal study of speech and continually consult nature as I conducted my experiments. In this way my talking machine and my theory concerning speech made equal progress, the one serving as guide to the other." [3]
"It was possible, following the methods I'd been using, to invent separate letters, but never to combine them to form syllables, and that it was absolutely necessary to follow nature which has only one glottis and one mouth, through which every sound emerges and which gives a unity to them." [2] [3]
Kempelen therefore began work on his third and ultimately final design, which in many ways represented the physiology of the vocal tract as closely as he could manage.

Third design

The third approach returned to the principle of the first design, which was conceptually more faithful to the human vocal tract than the second. As before, it consisted of a bellows, a reed and a simulated mouth (this time made of India rubber, so that it could be shaped by hand to form vowel sounds more effectively), but it also included a "throat" with an attached "nasal cavity" (complete with two "nostrils" for pronouncing nasal consonants), several levers and tubes dedicated to pronouncing /s/ and /ʃ/, a rod that interfered with the reed's vibration to articulate /r/, and separate, smaller bellows that let air pass the reed while the mouth was completely closed (a feature required for pronouncing /b/). A special valve intended to simulate /f/ was included for a time, but was removed when it turned out that the same sound could be produced simply by closing all of the machine's openings and letting air leak through the cracks. Similarly, the design at one point used an alternative "mouth" assembly: a wooden box with a pair of hinged shutters that acted as lips, containing a hinged, wooden, string-operated flap that acted as a tongue. This assembly was meant to mimic the mouth and tongue in producing plosives such as /b/ and /d/, but it was removed when Kempelen recognized that without a proper tongue the machine would never be able to produce /t/, /d/, /k/ and /ɡ/. He sidestepped the problem by replacing /t/ and /k/ with /p/, and /d/ and /ɡ/ with /b/ (which differs from /p/ only in voicing). In the context of a familiar word, listeners often did not notice the mispronunciation at all (a phenomenon later explored by researchers in cognitive science).
Kempelen believed that listeners were more forgiving of his machine's errors because the reed frequency and vocal-tract resonant length he chose gave it a voice much more like a young child's than an adult's. [2] [3] Unlike the earlier designs, this third design could speak complete phrases in French, Italian and English; German was possible but demanded greater skill from the operator, because German uses consonants more frequently. Its greatest limitation was the bellows: although it held six times the capacity of human lungs, it ran out of air much faster than its human counterpart. Because the design used a single reed as its glottal sound source, it avoided the co-articulation problems inherent in the second design, but that single reed also gave the Speaking Machine a monotone voice. [1] Kempelen spent some time trying to add mechanisms for prosodic pitch variation to the reed assembly, without success, and left that improvement to later experimenters. All of these additions to the third design grew out of Kempelen's two decades of intensive research into the vocal tract and its role in spoken language, in which he scrutinized each crucial physiological element of speech production and replicated its behavior acoustically, mechanically or both. [3]
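Kempelen's point about resonant length can be illustrated with the textbook quarter-wavelength model, which idealizes the vocal tract as a lossless uniform tube closed at the glottis and open at the lips, so that its resonances are F_n = (2n - 1)c / 4L. The Python sketch below is only an illustration under that assumption: the speed of sound (350 m/s), the function name `uniform_tube_formants` and both tube lengths are choices made for the example, not measurements of Kempelen's machine, whose rubber mouth was far from a uniform tube.

```python
from __future__ import annotations

SPEED_OF_SOUND = 350.0  # m/s; assumed value for warm, humid air in the tract


def uniform_tube_formants(length_m: float, count: int = 3,
                          c: float = SPEED_OF_SOUND) -> list[float]:
    """First `count` resonances (Hz) of a lossless uniform tube that is closed
    at the glottis end and open at the lip end: F_n = (2n - 1) * c / (4 * L)."""
    if length_m <= 0.0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


if __name__ == "__main__":
    # Illustrative lengths only; neither is a measurement of Kempelen's machine.
    for label, length in [("adult-sized tract, 17.5 cm", 0.175),
                          ("shorter, child-sized tract, 11 cm (assumed)", 0.11)]:
        print(label, [round(f) for f in uniform_tube_formants(length)])
```

Because every resonance is inversely proportional to L, shortening the tube from 17.5 cm to 11 cm raises all of them by the factor 17.5/11 (about 1.6): the script prints [500, 1500, 2500] for the longer tube and [795, 2386, 3977] for the shorter one. The reed's pitch does not appear in the formula; in this idealization the reed sets the fundamental frequency, while the tube length alone sets the resonances.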

A significant contribution

Before his death in 1804, von Kempelen completed and exhibited his Speaking Machine and published an extremely comprehensive account of his twenty years of research in phonetics. The 456-page book, Mechanismus der menschlichen Sprache nebst Beschreibung einer sprechenden Maschine (The Mechanism of Human Speech, with a Description of a Speaking Machine), published in 1791 [2] [1], documented every technical aspect of both his construction of the Speaking Machine (including the preliminary designs) and his studies of the human vocal tract. [3]

In 1837, Sir Charles Wheatstone revived Wolfgang von Kempelen's work by building an improved replica of his Speaking Machine. [3] [1] Using technology developed over the previous 50 years, Wheatstone was able to further analyze and synthesize components of acoustic speech, giving rise to a second wave of scientific interest in phonetics. After seeing Wheatstone's improved replica at an exposition, a young Alexander Graham Bell set out to construct his own speaking machine with his father's help and encouragement. [1] [4] Bell's experiments and research ultimately led to his invention of the telephone in 1876 [1], which revolutionized global communication.

In 1968, Marcel van den Broecke (University of Amsterdam) built a replica as part of an MA thesis. He reported on it in "Wolfgang von Kempelen's Speaking Machine as a Performer", chapter 2 (pp. 9–19) of Sound Structures, edited by Marcel van den Broecke, Vincent van Heuven and Wim Zonneveld (Dordrecht, Netherlands / Cinnaminson, USA: Foris Publications, 1983). Acoustic predictions from N-tube approximations of the vocal tract, applied to the replica's characteristics, confirmed what had already been established perceptually: the machine could produce only two vowel-like sounds, an /a/-like vowel and an /o/-like vowel. Among the consonants, the general-purpose plosive is very convincing and a general-purpose nasal is also easily identified, but the sibilants and the rattling /r/ are as unpleasant as the eyewitness von Windisch reported two centuries earlier.
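An N-tube approximation treats the vocal tract as a chain of short cylindrical sections and predicts its resonances from their lengths and cross-sectional areas. The Python sketch below illustrates that general technique with lossless chain (ABCD) matrices, under stated assumptions: plane-wave propagation, no wall or radiation losses, an ideal open end at the lips, and a speed of sound of 350 m/s. The function names and section dimensions are hypothetical, and this is not a reconstruction of van den Broecke's actual analysis or of the replica's geometry.

```python
"""N-tube (concatenated lossless tube) estimate of vocal-tract resonances.

Assumptions: plane waves, no wall or radiation losses, ideal open end at the
lips (pressure = 0), section lengths/areas supplied by the caller.
"""
from __future__ import annotations

import math

SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air


def chain_d(freq_hz: float, lengths_m: list[float], areas_m2: list[float],
            c: float = SPEED_OF_SOUND) -> float:
    """D element of the glottis-to-lips chain matrix [[A, jB], [jC, D]].

    Each section maps (P, U) at its output to (P, U) at its input via
    [[cos t, j*Z*sin t], [j*sin t / Z, cos t]], with t = k * length and
    Z = rho*c / area. Products of such matrices keep this form with A, B, C, D
    real, so the imaginary unit is tracked implicitly. rho*c cancels in D, so
    Z = 1 / area is used. With P_lips = 0, U_glottis = D * U_lips, so the
    resonances are the zeros of D.
    """
    k = 2.0 * math.pi * freq_hz / c  # wavenumber (rad/m)
    a, b, cm, d = 1.0, 0.0, 0.0, 1.0  # start from the identity matrix
    for length, area in zip(lengths_m, areas_m2):
        t = k * length
        z = 1.0 / area
        s_a, s_b, s_c, s_d = math.cos(t), z * math.sin(t), math.sin(t) / z, math.cos(t)
        # Right-multiply by the section matrix (sections ordered glottis -> lips).
        a, b, cm, d = (a * s_a - b * s_c, a * s_b + b * s_d,
                       cm * s_a + d * s_c, d * s_d - cm * s_b)
    return d


def tube_formants(lengths_m: list[float], areas_m2: list[float],
                  f_max: float = 4000.0, step_hz: float = 1.0,
                  c: float = SPEED_OF_SOUND) -> list[float]:
    """Resonances (Hz) below f_max: scan D(f) for sign changes, then bisect."""
    if len(lengths_m) != len(areas_m2) or not lengths_m:
        raise ValueError("need one area per section and at least one section")

    def d_at(f: float) -> float:
        return chain_d(f, lengths_m, areas_m2, c)

    found: list[float] = []
    f_prev, d_prev = step_hz, d_at(step_hz)
    for i in range(2, int(f_max / step_hz) + 1):
        f_cur = i * step_hz
        d_cur = d_at(f_cur)
        if d_cur == 0.0:
            found.append(f_cur)
        elif d_prev * d_cur < 0.0:
            lo, hi = f_prev, f_cur  # invariant: D(lo) has the sign of d_prev
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if d_prev * d_at(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            found.append(0.5 * (lo + hi))
        f_prev, d_prev = f_cur, d_cur
    return found


if __name__ == "__main__":
    # Uniform 17.5 cm tube split into 4 equal sections.
    print([round(f) for f in tube_formants([0.175 / 4] * 4, [3e-4] * 4)])
    # Hypothetical two-tube shape: narrow back half, wide front half.
    print([round(f) for f in tube_formants([0.0875, 0.0875], [1e-4, 7e-4])])
```

A uniform 17.5 cm tube gives resonances at about 500, 1500, 2500 and 3500 Hz however it is divided into sections, while the hypothetical narrow-back, wide-front shape gives roughly 770, 1230, 2770 and 3230 Hz. Widening the front relative to the back raises F1 and lowers F2, the pattern associated with open vowels such as /a/. A fuller model would add losses and a radiation impedance at the lips, which shift and broaden these peaks.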


References

  1. Standage, Tom. The Turk: The Life and Times of the Famous Eighteenth-Century Chess-Playing Machine. New York: Walker & Company, 2002, pp. 76–81.
  2. Dudley, Homer & Tarnoczy, T. H. "The Speaking Machine of Wolfgang von Kempelen". The Journal of the Acoustical Society of America, Vol. 22, No. 2, March 1950, pp. 151–166.
  3. Linggard, R. Electronic Synthesis of Speech. Cambridge: Cambridge University Press, 1985, pp. 4–9.
  4. Rossing, Thomas, et al. The Science of Sound. San Francisco: Addison-Wesley, 2002, p. 365.

Further reading