Janet M. Baker

Alma mater: Tufts University; Carnegie Mellon University
Known for: Dragon Systems
Spouse: James K. Baker
Fields: speech recognition
Institutions: MIT Media Lab; Harvard Medical School; Dragon Systems
Thesis: A new time-domain analysis of human speech and other complex waveforms (1975)
Doctoral advisor: Raj Reddy

Janet MacIver Baker is an American computer scientist, neuroscientist, and entrepreneur. With her husband James K. Baker, she co-founded Dragon Systems, and the two are jointly credited with the creation of Dragon NaturallySpeaking.[1]

In 2012, she received the IEEE James L. Flanagan Speech and Audio Processing Award with her husband.[2]

Baker is currently affiliated with the MIT Media Lab and Harvard Medical School as a visiting scientist and lecturer.[3]

Related Research Articles

Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions. The energy contained in audio signals or sound power level is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.
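
The digital-domain case is easy to make concrete. Below is a minimal Python/NumPy sketch, added here purely for illustration: it measures a signal's RMS level in decibels and applies a gain expressed in decibels. The 440 Hz tone and 16 kHz sample rate are arbitrary demo choices, not taken from the article.

```python
import numpy as np

def rms_db(samples: np.ndarray) -> float:
    """Root-mean-square level of a signal, in dB relative to full scale."""
    rms = np.sqrt(np.mean(samples ** 2))
    return 20.0 * np.log10(rms + 1e-12)  # epsilon avoids log(0) on silence

def apply_gain(samples: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale a signal by a gain given in decibels."""
    return samples * 10.0 ** (gain_db / 20.0)

# Demo: a 440 Hz sine at half amplitude, sampled at 16 kHz.
t = np.arange(16000) / 16000.0
x = 0.5 * np.sin(2 * np.pi * 440 * t)
print(rms_db(x))                   # about -9.0 dBFS
print(rms_db(apply_gain(x, 6.0)))  # about -3.0 dBFS after +6 dB of gain
```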

Speech processing is the study of speech signals and of methods for processing them. The signals are usually handled in a digital representation, so speech processing can be regarded as a special case of digital signal processing applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer, and output of speech signals. Speech processing tasks include speech recognition, speech synthesis, speaker diarization, speech enhancement, speaker recognition, and others.

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
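
To make the recognition task concrete, here is a minimal Python/NumPy sketch of isolated-word recognition by dynamic time warping (DTW), an early template-matching approach to speech-to-text. It is an illustrative toy, not the method of any system or researcher named in this article, and it assumes feature extraction (producing the frames-by-features arrays) happens elsewhere.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Minimum alignment cost between two (frames x features) sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(utterance: np.ndarray, templates: dict) -> str:
    """Label the utterance with the word whose template aligns most cheaply."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Toy demo: two fake "word" templates made of random feature frames.
rng = np.random.default_rng(1)
templates = {"yes": rng.standard_normal((30, 13)),
             "no": rng.standard_normal((24, 13))}
utterance = templates["yes"] + 0.1 * rng.standard_normal((30, 13))
print(recognize(utterance, templates))  # -> "yes"
```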

Thomas Shi-Tao Huang (1936–2020) was a Chinese-born American computer scientist, electrical engineer, and writer. He was a researcher and professor emeritus at the University of Illinois at Urbana-Champaign (UIUC). Huang was one of the leading figures in computer vision, pattern recognition, and human-computer interaction.

Frederick Jelinek was a Czech-American researcher in information theory, automatic speech recognition, and natural language processing. He is well known for his oft-quoted statement, "Every time I fire a linguist, the performance of the speech recognizer goes up".

James Loton Flanagan was an American electrical engineer. He was Rutgers University's vice president for research until 2004. He was also director of Rutgers' Center for Advanced Information Processing and the Board of Governors Professor of Electrical and Computer Engineering. He is known for co-developing adaptive differential pulse-code modulation (ADPCM) with P. Cummiskey and Nikil Jayant at Bell Labs.

Fumitada Itakura is a Japanese scientist. He did pioneering work in statistical signal processing, and its application to speech analysis, synthesis and coding, including the development of the linear predictive coding (LPC) and line spectral pairs (LSP) methods.
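
For a sense of what LPC analysis involves, the following Python/NumPy sketch implements the autocorrelation method with the Levinson-Durbin recursion. It illustrates the general technique rather than Itakura's specific formulation, and the synthetic second-order test signal is an assumption made for the demo.

```python
import numpy as np

def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns A(z) coefficients [1, a1, ..., ap] such that
    x[n] is predicted as -(a1*x[n-1] + ... + ap*x[n-p]).
    """
    # Autocorrelation at lags 0..order (frame assumed nonzero)
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a_prev = a[1:i].copy()
        a[1:i] = a_prev + k * a_prev[::-1]   # update lower-order coefficients
        a[i] = k
        err *= (1.0 - k * k)                 # prediction error power shrinks
    return a

# Demo: recover the generator of a synthetic 2nd-order resonance.
rng = np.random.default_rng(0)
x = np.zeros(4000)
e = rng.standard_normal(4000)
for n in range(2, 4000):
    x[n] = 1.8 * x[n - 1] - 0.9 * x[n - 2] + e[n]
print(lpc(x, order=2))  # approximately [1, -1.8, 0.9]
```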

The IEEE James L. Flanagan Speech and Audio Processing Award is a Technical Field Award presented by the IEEE for an outstanding contribution to the advancement of speech and/or audio signal processing. It may be presented to an individual or a team of up to three people. The award was established by the IEEE Board of Directors in 2002 and is named after James L. Flanagan, a scientist at Bell Labs, where he worked on acoustics for many years.

James K. Baker is an American entrepreneur and computer scientist. With his wife Janet M. Baker, he co-founded Dragon Systems, and the two are jointly credited with the creation of Dragon NaturallySpeaking.

Alexander Waibel is a professor of computer science at Carnegie Mellon University and the Karlsruhe Institute of Technology (KIT). Waibel's research focuses on automatic speech recognition, translation, and human-machine interaction. His work has introduced cross-lingual communication systems, such as consecutive and simultaneous interpreting systems on a variety of platforms. In fundamental research on machine learning, he is known for the Time Delay Neural Network (TDNN), the first convolutional neural network (CNN) trained by gradient descent using backpropagation; he introduced the TDNN in 1987 at ATR in Japan.
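
The time-delay idea can be stated compactly: the same weights are applied to every shifted window of input frames, which is exactly a 1-D convolution over time. The Python/NumPy sketch below is a minimal illustration with made-up layer sizes, not Waibel's original architecture.

```python
import numpy as np

def tdnn_layer(frames: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """frames: (T, d_in); weights: (context, d_in, d_out), shared across time."""
    context = weights.shape[0]
    T = frames.shape[0]
    # Slide the same weight window over every time shift (a 1-D convolution).
    out = np.stack([
        np.tensordot(frames[t:t + context], weights, axes=([0, 1], [0, 1]))
        for t in range(T - context + 1)
    ])
    return np.tanh(out + bias)

# Demo with illustrative sizes: 100 frames of 16 features, a 3-frame
# time-delay window, and 8 hidden units.
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 16))
w = 0.1 * rng.standard_normal((3, 16, 8))
h = tdnn_layer(x, w, np.zeros(8))
print(h.shape)  # (98, 8)
```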

Nelson Harold Morgan is an American computer scientist and professor in residence (emeritus) of electrical engineering and computer science at the University of California, Berkeley. Morgan is the co-inventor of the Relative Spectral (RASTA) approach to speech signal processing, first described in a technical report published in 1991.

Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing.

An audio coding format is a content representation format for storage or transmission of digital audio. Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation capable of compressing and decompressing audio to and from a specific audio coding format is called an audio codec; LAME, for example, is one of several codecs that implement encoding and decoding of audio in the MP3 format in software.
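
As a toy example of the encode/decode round trip a codec performs, the Python/NumPy sketch below implements G.711-style mu-law companding, one of the simplest audio coding schemes, mapping samples in [-1, 1] to 8-bit codes and back. Real formats such as MP3 or Opus are far more elaborate; this is only an illustration, not the code of any codec named above.

```python
import numpy as np

MU = 255.0  # companding constant used by G.711 mu-law

def mulaw_encode(x: np.ndarray) -> np.ndarray:
    """Samples in [-1, 1] -> unsigned 8-bit codes."""
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

def mulaw_decode(codes: np.ndarray) -> np.ndarray:
    """Unsigned 8-bit codes -> samples in [-1, 1]."""
    y = codes.astype(np.float64) / 255.0 * 2.0 - 1.0
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

# Round trip: quantization error stays small across the full range.
x = np.linspace(-1.0, 1.0, 9)
print(np.max(np.abs(mulaw_decode(mulaw_encode(x)) - x)))
```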

Victor Waito Zue is a Chinese-American computer scientist and professor at the Massachusetts Institute of Technology.

Shrikanth Narayanan is an Indian-American professor at the University of Southern California. He is an interdisciplinary engineer-scientist focused on human-centered signal processing and machine intelligence, with speech and spoken language processing at its core. A prolific researcher, educator, and inventor with hundreds of publications and a number of patents to his credit, he has helped pioneer several research areas, including computational speech science, speech and human language technologies, audio, music, and multimedia engineering, human sensing and imaging technologies, emotions research and affective computing, behavioral signal processing, and computational media intelligence. His technical contributions span applications in defense, security, health, education, media, and the arts, including technologies that support awareness of diversity and inclusion. His patented work has contributed to the proliferation of speech technologies on the cloud and on mobile devices and to emotion-aware artificial intelligence.

Biing Hwang "Fred" Juang is a communication and information scientist, best known for his work in speech coding, speech recognition and acoustic signal processing. He joined Georgia Institute of Technology in 2002 as Motorola Foundation Chair Professor in the School of Electrical & Computer Engineering.

John Makhoul is a Lebanese-American computer scientist who works in the field of speech and language processing. Dr. Makhoul's work on linear predictive coding was used in the establishment of the Network Voice Protocol, which enabled the transmission of speech signals over the ARPANET. Makhoul is recognized in the field for his vital role in the areas of speech and language processing, including speech analysis, speech coding, speech recognition and speech understanding. He has made a number of significant contributions to the mathematical modeling of speech signals, including his work on linear prediction, and vector quantization. His patented work on the direct application of speech recognition techniques for accurate, language-independent optical character recognition (OCR) has had a dramatic impact on the ability to create OCR systems in multiple languages relatively quickly.

Mari Ostendorf is a professor of electrical engineering in the area of speech and language technology and the vice provost for research at the University of Washington.

Chin-Hui Lee is a Taiwanese information scientist best known for his work in speech recognition, speaker recognition, and acoustic signal processing. He joined the Georgia Institute of Technology in 2002 as a professor in the School of Electrical and Computer Engineering.

References

  1. "History of Speech Recognition". Dragon Medical Transcription. Archived from the original on 2015-08-13. Retrieved 17 January 2015.
  2. "IEEE James L. Flanagan Speech and Audio Processing Award Recipients". IEEE. Archived from the original on April 7, 2010. Retrieved 2 July 2017.
  3. Blanding, Michael (Fall 2012). "Speechless". Tufts Magazine. Retrieved 2 July 2017. (permanent dead link)