Pattern playback

The pattern playback is an early talking device that was built by Dr. Franklin S. Cooper and his colleagues, including John M. Borst and Caryl Haskins, at Haskins Laboratories in the late 1940s and completed in 1950. [1] [2] There were several different versions of this hardware device; only one survives today. The machine converts pictures of the acoustic patterns of speech, in the form of a spectrogram, back into sound. Using this device, Alvin Liberman, Franklin Cooper, and Pierre Delattre (later joined by Katherine Safford Harris, Leigh Lisker, and others) were able to discover acoustic cues for the perception of phonetic segments (consonants and vowels). This research was fundamental to the development of modern speech synthesis techniques and reading machines for the blind, to the study of speech perception and speech recognition, and to the motor theory of speech perception.

To create sound, the pattern playback directs an arc light source at a rotating disk with 50 concentric tracks whose transparencies vary systematically so as to produce 50 harmonics of a fundamental frequency. The light is then projected onto a spectrogram, whose reflectance at each point corresponds to the sound pressure level of the corresponding partial of the signal, and finally falls on a photovoltaic cell, which converts the variations in light into variations in electrical current and, ultimately, sound.
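
The same principle is easy to simulate digitally: treat the painted pattern as a matrix of harmonic amplitudes and sum the corresponding sinusoids. The sketch below is illustrative Python, not a model of the machine's actual optics or circuitry; the 120 Hz fundamental, frame duration, and test pattern are assumptions chosen for the example.

```python
import numpy as np

def pattern_playback(pattern, f0=120.0, fs=16000, frame_dur=0.01):
    """Additive synthesis from a (n_harmonics, n_frames) amplitude
    pattern, mimicking the machine's 50 harmonic tracks: pattern[k, j]
    is the amplitude of harmonic k+1 during frame j."""
    n_harm, n_frames = pattern.shape
    spf = int(frame_dur * fs)                  # samples per frame
    t = np.arange(n_frames * spf) / fs
    env = np.repeat(pattern, spf, axis=1)      # stretch frames to samples
    out = np.zeros(t.size)
    for k in range(1, n_harm + 1):
        if k * f0 >= fs / 2:                   # stay below Nyquist
            break
        out += env[k - 1] * np.sin(2 * np.pi * k * f0 * t)
    return out / max(np.abs(out).max(), 1e-9)

# Illustrative pattern: a three-harmonic band gliding upward over 0.5 s,
# playing the role of a painted formant band on the spectrogram.
pat = np.zeros((50, 50))
for j in range(50):
    c = 3 + j // 5                             # center harmonic of the band
    pat[c - 1:c + 2, j] = 1.0
audio = pattern_playback(pat)
```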

The pattern playback was last used in an experimental study by Robert Remez in 1976. The pattern playback now resides in the Museum at Haskins Laboratories in New Haven, Connecticut.

The term pattern playback now also refers, more generally, to algorithms or techniques for converting spectrograms, cochleagrams, and correlograms from pictures back into sounds.
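
In this generalized sense, inverting a magnitude spectrogram requires estimating the phase that the picture discards. The following is a minimal sketch of one standard approach, Griffin-Lim-style iterative phase retrieval, using SciPy's STFT routines; the window size and iteration count are arbitrary choices for the example.

```python
import numpy as np
from scipy.signal import stft, istft

def invert_magnitude(mag, fs=16000, nperseg=512, n_iter=50):
    """Griffin-Lim-style reconstruction: find a waveform whose STFT
    magnitude approximates `mag` (shape: n_freqs x n_frames) by
    alternating between the time and time-frequency domains."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random start
    for _ in range(n_iter):
        _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)
        if Z.shape[1] < mag.shape[1]:          # guard off-by-one frames
            Z = np.pad(Z, ((0, 0), (0, mag.shape[1] - Z.shape[1])))
        phase = np.exp(1j * np.angle(Z[:, :mag.shape[1]]))
    _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
    return x

# Round trip: analyze a signal, keep only the magnitude, resynthesize.
fs = 16000
y = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
_, _, Z = stft(y, fs=fs, nperseg=512)
y_hat = invert_magnitude(np.abs(Z))
```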

A demonstration of the device, presented as pioneering technology in psycholinguistics, appears in the TV show Adventure (CBS Television, 1953). [3]

Digital pattern playback

In the 1970s, digital pattern playbacks began to supplant the earlier hardware. An early prototype was developed by Patrick Nye, Philip Rubin, and colleagues at Haskins Laboratories. It combined a "Ubiquitous Spectrum Analyzer" for automatic spectral analysis with a DEC GT-40 display processor for graphic manipulation of the displayed spectrogram, a form of "synthesis by art", followed by re-synthesis using a 40-channel filter bank. This hybrid hardware/software digital pattern playback was eventually replaced at Haskins Laboratories by the HADES analysis and display system, designed by Philip Rubin and implemented in Fortran on the VAX family of computers. A more modern version has been described by Arai and colleagues, and an on-line demonstration is available.
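
A filter-bank re-synthesis stage of this kind can be sketched in the style of a channel vocoder: each channel's amplitude envelope scales a band-pass-filtered excitation, and the channels are summed. The code below is a generic illustration, not the Haskins design; the channel count, band edges, filter order, and noise excitation are all assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def filterbank_resynth(envelopes, edges, fs=16000):
    """Channel-vocoder-style re-synthesis: scale a band-pass-filtered
    excitation by each channel's amplitude envelope and sum.
    `envelopes` is (n_channels, n_samples); `edges` holds the
    n_channels + 1 band-edge frequencies in Hz."""
    n_ch, n = envelopes.shape
    rng = np.random.default_rng(0)
    excitation = rng.standard_normal(n)   # noise source; a pulse train
                                          # would give a voiced quality
    out = np.zeros(n)
    for c in range(n_ch):
        sos = butter(4, [edges[c], edges[c + 1]], btype='band',
                     fs=fs, output='sos')
        out += envelopes[c] * sosfilt(sos, excitation)
    return out / max(np.abs(out).max(), 1e-9)

# 40 channels log-spaced from 100 Hz to 7 kHz (illustrative values).
edges = np.geomspace(100.0, 7000.0, 41)
env = np.ones((40, 8000))                 # flat envelopes: shaped noise
audio = filterbank_resynth(env, edges)
```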

Related Research Articles

Additive synthesis is a sound synthesis technique that creates timbre by adding sine waves together.

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.

A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. When the data are represented in a 3D plot they may be called waterfall displays.

A reading machine is a piece of assistive technology that allows blind people to access printed materials. It scans text, converts the image into text by means of optical character recognition and uses a speech synthesizer to read out what it has found.

Kenneth Noble Stevens was the Clarence J. LeBel Professor of Electrical Engineering and Computer Science, and professor of health sciences and technology, at the Research Laboratory of Electronics at MIT. Stevens was head of the Speech Communication Group in MIT's Research Laboratory of Electronics (RLE), and was one of the world's leading scientists in acoustic phonetics.

Speech perception is the process by which the sounds of language are heard, interpreted, and understood. The study of speech perception is closely linked to the fields of phonology and phonetics in linguistics and cognitive psychology and perception in psychology. Research in speech perception seeks to understand how human listeners recognize speech sounds and use this information to understand spoken language. Speech perception research has applications in building computer systems that can recognize speech, in improving speech recognition for hearing- and language-impaired listeners, and in foreign-language teaching.

Haskins Laboratories, Inc. is an independent 501(c) non-profit corporation, founded in 1935 and located in New Haven, Connecticut, since 1970. Haskins has formal affiliation agreements with both Yale University and the University of Connecticut, but it remains fully independent, administratively and financially, of both Yale and UConn. Haskins is a multidisciplinary and international community of researchers that conducts basic research on spoken and written language. A guiding perspective of its research is to view speech and language as emerging from biological processes, including those of adaptation, response to stimuli, and conspecific interaction. Haskins Laboratories has a long history of technological and theoretical innovation, from creating systems of rules for speech synthesis and an early working prototype of a reading machine for the blind, to developing the landmark concept of phonemic awareness as the critical preparation for learning to read an alphabetic writing system.

Philip E. Rubin is an American cognitive scientist, technologist, and science administrator known for raising the visibility of behavioral and cognitive science, neuroscience, and ethical issues related to science, technology, and medicine, at a national level. His research career is noted for his theoretical contributions and pioneering technological developments, starting in the 1970s, related to speech synthesis and speech production, including articulatory synthesis and sinewave synthesis, and their use in studying complex temporal events, particularly understanding the biological bases of speech and language.

Alvin Meyer Liberman was an American psychologist, born in St. Joseph, Missouri. His ideas set the agenda for fifty years of psychological research in speech perception.

Carol Ann Fowler is an American experimental psychologist. She was president and director of research at Haskins Laboratories in New Haven, Connecticut from 1992 to 2008. She is also a professor of psychology at the University of Connecticut and adjunct professor of linguistics and psychology at Yale University. She received her undergraduate degree from Brown University in 1971, her M.A. from the University of Connecticut in 1973, and her Ph.D. in psychology from the University of Connecticut in 1977.

Sinewave synthesis, or sine wave speech, is a technique for synthesizing speech by replacing the formants with pure tone whistles. The first sinewave synthesis program (SWS) for the automatic creation of stimuli for perceptual experiments was developed by Philip Rubin at Haskins Laboratories in the 1970s. This program was subsequently used by Robert Remez, Philip Rubin, David Pisoni, and other colleagues to show that listeners can perceive continuous speech without traditional speech cues, i.e., pitch, stress, and intonation. This work paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space.
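
The core of the technique is simple: each formant track becomes the instantaneous frequency of a single sinusoid. Below is a minimal sketch with made-up formant trajectories; real sinewave speech uses formant tracks measured from an actual utterance.

```python
import numpy as np

def sinewave_speech(freqs, amps, fs=16000):
    """Replace each formant with one frequency-modulated sinusoid.
    `freqs` and `amps` are (n_formants, n_samples) arrays giving each
    tone's instantaneous frequency (Hz) and amplitude."""
    phase = 2 * np.pi * np.cumsum(freqs, axis=1) / fs  # integrate frequency
    out = (amps * np.sin(phase)).sum(axis=0)
    return out / max(np.abs(out).max(), 1e-9)

# Made-up tracks: three "formants" gliding over half a second.
n = 8000
f = np.vstack([np.linspace(700, 500, n),      # F1 falling
               np.linspace(1200, 1700, n),    # F2 rising
               np.full(n, 2500.0)])           # F3 steady
a = np.array([[1.0], [0.6], [0.3]]) * np.ones((3, n))
audio = sinewave_speech(f, a)
```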

Robert Remez is an American experimental psychologist and cognitive scientist, and is Professor of Psychology at Barnard College, Columbia University and Chair of the Columbia University Seminar on Language & Cognition. His teaching focuses on the relationships between cognition, perception and language. He is best known for his theoretical and experimental work on perceptual organization and speech perception.

Articulatory phonology is a linguistic theory originally proposed in 1986 by Catherine Browman of Haskins Laboratories and Louis Goldstein of the University of Southern California and Haskins. The theory identifies theoretical discrepancies between phonetics and phonology and aims to unify the two by treating them as low- and high-dimensional descriptions of a single system.

Franklin Seaney Cooper was an American physicist and inventor who was a pioneer in speech research.

Ignatius G. Mattingly (1927–2004) was a prominent American linguist and speech scientist. Prior to his academic career, he was an analyst for the National Security Agency from 1955 to 1966. He was a Lecturer and then Professor of Linguistics at the University of Connecticut from 1966 to 1996 and a researcher at Haskins Laboratories from 1966 until his death in 2004. He is best known for his pioneering work on speech synthesis and reading and for his theoretical work on the motor theory of speech perception in conjunction with Alvin Liberman. He received his B.A. in English from Yale University in 1947, his M.A. in Linguistics from Harvard University in 1959, and his Ph.D. in English from Yale University in 1968.

Katherine Safford Harris is a noted psychologist and speech scientist. She is Distinguished Professor Emerita in Speech and Hearing at the CUNY Graduate Center and a member of the Board of Directors of Haskins Laboratories. She is also the former President of the Acoustical Society of America and Vice President of Haskins Laboratories.

Catherine Phebe Browman was an American linguist and speech scientist. She received her Ph.D. in linguistics from the University of California, Los Angeles (UCLA) in 1978. Browman was a research scientist at Bell Laboratories in New Jersey (1967–1972). While at Bell Laboratories, she was known for her work on speech synthesis using demisyllables. She later worked as a researcher at Haskins Laboratories in New Haven, Connecticut (1982–1998). She was best known for developing, with Louis Goldstein, the theory of articulatory phonology, a gesture-based approach to phonological and phonetic structure. The theoretical approach is incorporated in a computational model that generates speech from a gesturally specified lexicon. Browman was made an honorary member of the Association for Laboratory Phonology.

Michael Studdert-Kennedy (1927–2017) was an American psychologist and speech scientist. He is well known for his contributions to studies of speech perception, the motor theory of speech perception, and the evolution of language, among other areas. He was professor emeritus of psychology at the University of Connecticut and professor emeritus of linguistics at Yale University, and served as president of Haskins Laboratories in New Haven, Connecticut from 1986 to 1992. He was also a member of the Haskins Laboratories Board of Directors and was chairman of the board from 1988 until 2001. He was the son of the priest and Christian socialist Geoffrey Studdert-Kennedy.

Donald P. Shankweiler is an eminent psychologist and cognitive scientist who has done pioneering work on the representation and processing of language in the brain. He is a Professor Emeritus of Psychology at the University of Connecticut, a Senior Scientist at Haskins Laboratories in New Haven, Connecticut, and a member of the Board of Directors at Haskins. He is married to the well-known American philosopher of biology, psychology, and language Ruth Millikan.

The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them.

References

  1. "Haskins Laboratories". Haskins.yale.edu. Retrieved 2016-10-21.
  2. "History of speech synthesis, 1770–1970". Ling.su.se. 1997-07-08. Archived from the original on 2015-03-06. Retrieved 2016-10-21.
  3. "【1950 | Pattern Playback Machine】 Dr. Franklin S. Cooper - An Early Talking Device in 1950". Retrieved 2023-02-26.
