Tara N. Sainath is an American computer scientist whose research involves deep learning applied to speech recognition. She is a principal research scientist at Google Research.
Sainath was a student of electrical engineering and computer science at the Massachusetts Institute of Technology, where she received a bachelor's degree, a master's degree in 2005, and a Ph.D. in 2009. Her master's thesis was Acoustic Landmark Detection and Segmentation using the McAulay-Quatieri Sinusoidal Model, supervised by Timothy Hazen, [1] and her doctoral dissertation was Applications of Broad Class Knowledge for Noise Robust Speech Recognition, supervised by Victor Zue. [2] [3]
She worked for IBM Research at the Thomas J. Watson Research Center before moving to Google Research. [4]
Sainath was elected both as an IEEE Fellow and as a fellow of the International Speech Communication Association in 2022, in both cases "for contributions to deep learning for automatic speech recognition". [5] [6]
Additive synthesis is a sound synthesis technique that creates timbre by adding sine waves together.
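The idea of building a timbre by summing sine waves can be sketched in a few lines. This is a minimal illustration, not any particular synthesizer's implementation; the function name, frequencies, and amplitudes are chosen for the example.

```python
import numpy as np

def additive_synth(freqs, amps, duration=1.0, sr=44100):
    """Sum sine partials with given frequencies (Hz) and amplitudes."""
    t = np.arange(int(sr * duration)) / sr
    signal = np.zeros_like(t)
    for f, a in zip(freqs, amps):
        signal += a * np.sin(2 * np.pi * f * t)  # add one partial
    return signal

# Approximate a sawtooth-like timbre from the first four harmonics of 220 Hz,
# with amplitudes falling off as 1/n.
tone = additive_synth([220, 440, 660, 880], [1.0, 0.5, 0.33, 0.25])
```

Richer timbres follow the same pattern: more partials, each with its own amplitude (and, in practice, its own time-varying envelope).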
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
Lawrence R. Rabiner is an electrical engineer working in the fields of digital signal processing and speech processing; in particular in digital signal processing for automatic speech recognition. He has worked on systems for AT&T Corporation for speech recognition.
Thomas Shi-Tao Huang was a Chinese-born American computer scientist, electrical engineer, and writer. He was a researcher and professor emeritus at the University of Illinois at Urbana-Champaign (UIUC). Huang was one of the leading figures in computer vision, pattern recognition and human computer interaction.
Richard "Dick" Francis Lyon is an American inventor, scientist, and engineer. He is one of the two people who independently invented the first optical mouse devices in 1980. He has worked in signal processing and was a co-founder of Foveon, Inc., a digital camera and image sensor company.
Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, talks about these systems — "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents."
Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the objects may vary somewhat from different viewpoints, in many different sizes and scales, or even when they are translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems. Many approaches to the task have been implemented over multiple decades.
Time delay neural network (TDNN) is a multilayer artificial neural network architecture whose purpose is to 1) classify patterns with shift-invariance, and 2) model context at each layer of the network.
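The two properties above follow from the TDNN's structure: each output frame is computed from a fixed window of neighboring input frames, and the same weights are reused at every time step. The sketch below illustrates one such layer with NumPy; the function name, shapes, and random inputs are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def tdnn_layer(x, weights, bias):
    """One TDNN layer: each output frame is computed from a sliding
    window of input frames, and the same weights are applied at every
    time step (shift-invariance); the window supplies local context.

    x: (T, d_in) sequence of T frames
    weights: (win, d_in, d_out) for a context window of `win` frames
    """
    T, d_in = x.shape
    win, _, d_out = weights.shape
    out = np.zeros((T - win + 1, d_out))
    for t in range(T - win + 1):
        window = x[t:t + win]  # local temporal context around frame t
        out[t] = np.tanh(np.einsum('cd,cdo->o', window, weights) + bias)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 4))           # 10 frames, 4 features each
w = rng.standard_normal((5, 4, 3)) * 0.1   # window of 5 frames, 3 output units
y = tdnn_layer(x, w, np.zeros(3))
# y has shape (6, 3): each output frame pools 5 consecutive input frames
```

Stacking such layers widens the effective context at each level, which is how the architecture "models context at each layer"; the sliding, weight-shared window is also why the TDNN is regarded as an early convolutional network over time.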
Alexander Waibel is a professor of Computer Science at Carnegie Mellon University and Karlsruhe Institute of Technology (KIT). Waibel’s research focuses on automatic speech recognition, translation and human-machine interaction. His work has introduced cross-lingual communication systems, such as consecutive and simultaneous interpreting systems on a variety of platforms. In fundamental research on machine learning, he is known for the Time Delay Neural Network (TDNN), the first Convolutional Neural Network (CNN) trained by gradient descent, using backpropagation. Alex Waibel introduced the TDNN in 1987 at ATR in Japan.
Carol Yvonne Espy-Wilson is an electrical engineer and Professor of Electrical and Computer Engineering at the University of Maryland (UMD) at College Park. She received her Ph.D. in Electrical Engineering from the Massachusetts Institute of Technology in 1987.
Larry Paul Heck is the Rhesa Screven Farmer, Jr., Advanced Computing Concepts Chair, Georgia Research Alliance Eminent Scholar, Chief Scientist of the AI Hub, Executive Director of the Machine Learning Center, and Professor at the Georgia Institute of Technology. His career spans many of the sub-disciplines of artificial intelligence, including conversational AI, speech recognition and speaker recognition, natural language processing, web search, online advertising and acoustics. He is best known for his role as a co-founder of the Microsoft Cortana Personal Assistant and his early work in deep learning for speech processing.
Biing Hwang "Fred" Juang is a communication and information scientist, best known for his work in speech coding, speech recognition and acoustic signal processing. He joined Georgia Institute of Technology in 2002 as Motorola Foundation Chair Professor in the School of Electrical & Computer Engineering.
Stephen John Young is a British researcher, Professor of Information Engineering at the University of Cambridge and an entrepreneur. He is one of the pioneers of automated speech recognition and statistical spoken dialogue systems. He served as the Senior Pro-Vice-Chancellor of the University of Cambridge from 2009 to 2015, responsible for planning and resources. From 2015 to 2019, he held a joint appointment between his professorship at Cambridge and Apple, where he was a senior member of the Siri development team.
Thomas Francis Quatieri Jr. is an American electrical engineer and Senior Technical Staff member at the MIT Lincoln Laboratory. He is recognized for his contributions in speech signal processing, in conjunction with Petros Maragos and James Kaiser, by using the discrete Fourier transform to examine energy modulation in speech waveforms. In 1999 he was elected a Fellow of the IEEE "for contributions to sinusoidal speech and audio modeling and nonlinear signal processing".
Mads Græsbøll Christensen is a Danish Professor in Audio Processing at the Department of Architecture, Design & Media Technology, Aalborg University, where he is also head and founder of the Audio Analysis Lab, which conducts research in audio and acoustic signal processing. Before that he worked at the Department of Electronic Systems at Aalborg University and has held visiting positions at Philips Research Labs, ENST, UCSB, and Columbia University. He has published extensively on these topics in books, scientific journals and conference proceedings, and he has given tutorials and keynote talks at major international scientific conferences.
The Robotics Collaborative Technology Alliance (R-CTA) was a research program initiated and sponsored by the US Army Research Laboratory. The purpose was to "bring together government, industrial, and academic institutions to address research and development required to enable the deployment of future military unmanned ground vehicle systems ranging in size from man-portables to ground combat vehicles." Collaborative Technology and Research Alliances was a term for partnerships between Army laboratories and centers, private industry and academia for performing research and technology development intended to benefit the US Army. The partnerships were funded by the US Army.
Lori Faith Lamel is a speech processing researcher known for her work with the TIMIT corpus of American English speech and for her work on voice activity detection, speaker recognition, and other non-linguistic inferences from speech signals. She works for the French National Centre for Scientific Research (CNRS) as a senior research scientist in the Spoken Language Processing Group of the Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur.
Abeer Alwan is an American electrical engineer and speech processing researcher. She is a professor of electrical and computer engineering in the UCLA Henry Samueli School of Engineering and Applied Science, and vice chair for undergraduate affairs in the Department of Electrical & Computer Engineering.
Yang Liu is a Chinese and American computer scientist specializing in speech processing and natural language processing, and a senior principal scientist for Amazon.
Xiaoming Liu is a Chinese-American computer scientist and an academic. He is a Professor in the Department of Computer Science and Engineering, MSU Foundation Professor as well as Anil K. and Nandita Jain Endowed Professor of Engineering at Michigan State University.