| Nelson Harold Morgan | |
|---|---|
| Born | May 1949 (age 75), Buffalo, New York, USA [1] |
| Education | University of Chicago; University of California, Berkeley |
| Scientific career | |
| Institutions | National Semiconductor; University of California, Berkeley |
| Doctoral advisor | Robert W. Brodersen |
| Doctoral students | Oriol Vinyals |
| Website | www2 |
Nelson Harold Morgan (born May 1949) is an American computer scientist and professor in residence (emeritus) of electrical engineering and computer science at the University of California, Berkeley. [2] Morgan is the co-inventor of the Relative Spectral (RASTA) approach to speech signal processing, first described in a technical report published in 1991. [3] [4]
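The RASTA idea can be stated compactly: each band of the log spectrum is band-pass filtered over time, suppressing both slowly varying channel effects and very fast frame-to-frame fluctuations. The sketch below applies the classic RASTA filter coefficients from the published literature to a log spectrogram; the function name and array layout are illustrative assumptions, not Morgan's own implementation.

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spectrogram: np.ndarray) -> np.ndarray:
    """Band-pass filter each spectral band's trajectory over time.

    log_spectrogram: (n_frames, n_bands) log spectral energies.
    """
    # Classic RASTA transfer function: a 5-frame sliding-regression
    # numerator and a single-pole leaky-integrator denominator.
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    a = np.array([1.0, -0.98])
    return lfilter(b, a, log_spectrogram, axis=0)  # filter along the time axis
```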
Morgan was born in Buffalo, New York. [1] He studied at the University of Chicago and later received his PhD, as an NSF fellow, from the University of California, Berkeley, in 1980 under the supervision of Robert W. Brodersen. [5] Morgan worked at National Semiconductor before taking up a post as professor in residence at the University of California, Berkeley. At Berkeley, he founded ICSI's Realization Group in 1988; it later became known as the Speech Group. He served as director of ICSI from 1999 through 2011. [6]
In 1993, Morgan and Hervé Bourlard published their work on the hybrid approach to speech recognition, in which neural networks estimate probabilities for hidden Markov models (HMMs). [7] The system improved HMM-based automatic speech recognition by providing discriminative training, incorporating multiple input sources, and using a flexible architecture able to accommodate contextual inputs and feedback. The work has been described as "seminal". [8] Morgan won the 1996 IEEE Signal Processing Magazine Best Paper Award for a paper with Bourlard. [9] Morgan and Bourlard were awarded the 2022 IEEE James L. Flanagan Speech and Audio Processing Award "For contributions to neural networks for statistical speech recognition." [10]
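In such hybrid systems, the network is trained to estimate per-frame posterior probabilities of HMM states, which are then divided by the state priors to obtain scaled likelihoods usable as HMM emission scores (Bayes' rule, with the state-independent factor dropped). A minimal sketch, assuming posteriors of shape (frames, states) and a prior vector estimated from training alignments; the names and the epsilon guard are illustrative:

```python
import numpy as np

def scaled_log_likelihoods(posteriors: np.ndarray, priors: np.ndarray) -> np.ndarray:
    """Turn network outputs P(state | frame) into log scaled likelihoods,
    log P(state | frame) - log P(state), used in place of HMM emission
    log-probabilities during Viterbi decoding.

    posteriors: (n_frames, n_states); priors: (n_states,).
    """
    eps = 1e-10  # guard against log(0)
    return np.log(posteriors + eps) - np.log(priors + eps)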
Morgan was the principal investigator of the IARPA-funded project Outing Unfortunate Characteristics of HMMs, which sought to identify weaknesses in automatic speech recognition technology. [11] He also led a team of universities building speech recognition systems for low-resource languages as part of the IARPA Babel program. [12]
Morgan is a former director of the International Computer Science Institute (ICSI), where he also led the Speech Group. [13] More recently, he has focused on campaign reform through empowering volunteerism; in that work, he co-founded UpRise Campaigns with Antonia Scatton and later co-founded Neighbors Forward AZ with Alison Porter.
Morgan has produced more than 200 publications, including four books. [14] [15]
Morgan is a fellow of the IEEE [16] and the International Speech Communication Association. [17] Together with Hervé Bourlard, he won the 1996 IEEE Signal Processing Magazine Best Paper Award and was awarded the 2022 IEEE James L. Flanagan Speech and Audio Processing Award "For contributions to neural networks for statistical speech recognition." [10] He was on the editorial board of Speech Communication Magazine, of which he is a former co-editor-in-chief. [18]
Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions. The energy contained in audio signals or sound power level is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.
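As a small worked example of the decibel measure mentioned above, the level of a digital signal normalized to [-1, 1] can be expressed in dB relative to full scale as 20·log10(rms); the function name and silence floor below are illustrative assumptions:

```python
import numpy as np

def rms_dbfs(samples: np.ndarray) -> float:
    """RMS level of a [-1, 1] digital signal in dB relative to full scale."""
    rms = np.sqrt(np.mean(np.square(samples)))
    return float(20.0 * np.log10(max(rms, 1e-12)))  # floor keeps silence finite
```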
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
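That definition maps directly onto a short computation: power spectrum, mel-scale filterbank, logarithm, then a discrete cosine transform. A minimal sketch assuming a precomputed triangular mel filterbank; all names and defaults are illustrative:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_frame(frame: np.ndarray, mel_filterbank: np.ndarray,
                    n_coeffs: int = 13) -> np.ndarray:
    """Mel-frequency cepstral coefficients for one windowed frame.

    mel_filterbank: (n_bands, len(frame) // 2 + 1) matrix of triangular
    filters on the mel frequency scale, assumed precomputed.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2          # short-term power spectrum
    mel_energies = mel_filterbank @ power            # warp onto the mel scale
    log_energies = np.log(mel_energies + 1e-10)      # log power spectrum
    return dct(log_energies, type=2, norm='ortho')[:n_coeffs]  # linear cosine transform
```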
Lawrence R. Rabiner is an electrical engineer working in the fields of digital signal processing and speech processing, in particular digital signal processing for automatic speech recognition. He worked on speech recognition systems for AT&T Corporation.
Keyword spotting is a problem that was first defined in the context of speech processing, where it deals with the identification of keywords in utterances.
The International Computer Science Institute (ICSI) is an independent, non-profit research organization located in Berkeley, California, United States. Since its founding in 1988, ICSI has maintained an affiliation agreement with the University of California, Berkeley, where several of its members hold faculty appointments.
Thomas Shi-Tao Huang was a Chinese-born American computer scientist, electrical engineer, and writer. He was a researcher and professor emeritus at the University of Illinois at Urbana-Champaign (UIUC). Huang was one of the leading figures in computer vision, pattern recognition and human-computer interaction.
Richard "Dick" Francis Lyon is an American inventor, scientist, and engineer. He is one of the two people who independently invented the first optical mouse devices in 1980. He has worked in signal processing and was a co-founder of Foveon, Inc., a digital camera and image sensor company.
The IEEE James L. Flanagan Speech and Audio Processing Award is a Technical Field Award presented by the IEEE for an outstanding contribution to the advancement of speech and/or audio signal processing. It may be presented to an individual or a team of up to three people. The award was established by the IEEE Board of Directors in 2002. It is named after James L. Flanagan, a scientist at Bell Labs, where he worked on acoustics for many years.
Roberto Pieraccini is an Italian and US electrical engineer working in the field of speech recognition, natural language understanding, and spoken dialog systems. He has been an active contributor to speech language research and technology since 1981. He is currently the Chief Scientist of Uniphore, a conversational automation technology company.
Alexander Waibel is a professor of Computer Science at Carnegie Mellon University and Karlsruhe Institute of Technology. Waibel's research interests focus on speech recognition and translation and human communication signals and systems. He made pioneering contributions to speech translation systems, breaking down language barriers through cross-lingual speech communication. In fundamental research on machine learning, he is known for the Time Delay Neural Network (TDNN), the first convolutional neural network (CNN) trained by gradient descent using backpropagation. He introduced the TDNN in 1987 at ATR in Japan.
Yasuo Matsuyama is a Japanese researcher in machine learning and human-aware information processing.
V John Mathews is an Indian-American engineer and educator who is currently a Professor of Electrical Engineering and Computer Science (EECS) at Oregon State University, United States.
Bayya Yegnanarayana is an INSA Senior Scientist at the International Institute of Information Technology (IIIT) Hyderabad, Telangana, India. He is known for his contributions in digital signal processing, speech signal processing, artificial neural networks and related areas. He has guided about 39 PhD theses, 43 MS theses and 65 MTech projects. He was the General Chair for the international conference INTERSPEECH 2018, held at Hyderabad. He also holds positions as Distinguished Professor at IIT Hyderabad and adjunct faculty at IIT Tirupati.
Biing Hwang "Fred" Juang is a communication and information scientist, best known for his work in speech coding, speech recognition and acoustic signal processing. He joined Georgia Institute of Technology in 2002 as Motorola Foundation Chair Professor in the School of Electrical & Computer Engineering.
Stephen John Young is a British researcher, Professor of Information Engineering at the University of Cambridge and an entrepreneur. He is one of the pioneers of automated speech recognition and statistical spoken dialogue systems. He served as the Senior Pro-Vice-Chancellor of the University of Cambridge from 2009 to 2015, responsible for planning and resources. From 2015 to 2019, he held a joint appointment between his professorship at Cambridge and Apple, where he was a senior member of the Siri development team.
The IARPA Babel program developed speech recognition technology for noisy telephone conversations. The main goal of the program was to improve the performance of keyword search on languages with very little transcribed data, i.e. low-resource languages. Data from 26 languages was collected, with certain languages held out as "surprise" languages to test the ability of the teams to rapidly build a system for a new language.
John Makhoul is a Lebanese-American computer scientist who works in the field of speech and language processing. Dr. Makhoul's work on linear predictive coding was used in the establishment of the Network Voice Protocol, which enabled the transmission of speech signals over the ARPANET. Makhoul is recognized in the field for his vital role in the areas of speech and language processing, including speech analysis, speech coding, speech recognition and speech understanding. He has made a number of significant contributions to the mathematical modeling of speech signals, including his work on linear prediction, and vector quantization. His patented work on the direct application of speech recognition techniques for accurate, language-independent optical character recognition (OCR) has had a dramatic impact on the ability to create OCR systems in multiple languages relatively quickly.
Chin-Hui Lee is an information scientist, best known for his work in speech recognition, speaker recognition and acoustic signal processing. He joined Georgia Institute of Technology in 2002 as a professor in the School of Electrical and Computer Engineering.
Yang Liu is a Chinese and American computer scientist specializing in speech processing and natural language processing, and a senior principal scientist for Amazon.