Nelson Morgan

Nelson Harold Morgan
Born: May 1949 (age 75)
Education: University of Chicago; University of California, Berkeley
Scientific career
Institutions: National Semiconductor; University of California, Berkeley
Doctoral advisor: Robert W. Brodersen
Doctoral students: Oriol Vinyals
Website: www2.eecs.berkeley.edu/Faculty/Homepages/morgan.html

Nelson Harold Morgan (born May 1949) is an American computer scientist and professor in residence (emeritus) of electrical engineering and computer science at the University of California, Berkeley. [2] Morgan is the co-inventor of the Relative Spectral (RASTA) approach to speech signal processing, first described in a technical report published in 1991. [3] [4]
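RASTA suppresses slowly varying convolutional channel effects by band-pass filtering the time trajectory of each log spectral band. The following is a minimal illustrative sketch, not code from the original work: the filter coefficients follow the form commonly cited for the RASTA filter, and the input log spectrum is synthetic.

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spectrum):
    """Band-pass filter each spectral band's trajectory over time.

    log_spectrum: array of shape (n_frames, n_bands) of log energies.
    Coefficients as commonly cited for RASTA: numerator 0.1*[2,1,0,-1,-2]
    (a smoothed differentiator with zero DC gain), denominator [1, -0.98]
    (a pole near 1 that integrates, passing modulations around 4 Hz).
    """
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    denom = np.array([1.0, -0.98])
    # Filter along the time axis, independently for each band.
    return lfilter(numer, denom, log_spectrum, axis=0)

# Synthetic example: a constant channel offset plus a faster modulation.
t = np.arange(1000)[:, None]                         # 1000 frames
bands = 5.0 + np.sin(2 * np.pi * 0.05 * t) * np.ones((1, 8))
filtered = rasta_filter(bands)
```

Because the numerator coefficients sum to zero, a constant (channel-dependent) offset in the log spectrum is driven to zero after the filter's transient, while faster speech-driven modulations pass through.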

Education and career

Morgan was born in Buffalo, New York. [1] He studied at the University of Chicago and later received his PhD as an NSF fellow from the University of California, Berkeley, in 1980 under the supervision of Robert W. Brodersen. [5] Morgan worked at National Semiconductor before taking up a post as professor in residence at the University of California, Berkeley. At Berkeley, in 1988 he founded ICSI's Realization Group, which later became known as the Speech Group. He served as director of ICSI from 1999 through 2011. [6]

Research and contributions

In 1993, Morgan and Hervé Bourlard published their work on the hybrid system approach to speech recognition, which uses neural networks probabilistically with Hidden Markov Models (HMMs). [7] The system improved automatic speech recognition techniques based on HMMs by providing discriminative training, incorporating multiple input sources, and using a flexible architecture able to accommodate contextual inputs and feedback. The work has been described as "seminal". [8] Morgan won the 1996 IEEE Signal Processing Magazine Best Paper Award for a paper with Bourlard. [9] Morgan and Bourlard were awarded the 2022 IEEE James L. Flanagan Speech and Audio Processing Award "For contributions to neural networks for statistical speech recognition." [10]
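The core idea of the hybrid approach is that a network trained to output per-frame state posteriors P(q|x) can serve an HMM decoder once the posteriors are divided by the state priors P(q), yielding "scaled likelihoods" in place of emission probabilities. A minimal sketch under simplifying assumptions (toy posteriors stand in for MLP outputs; uniform priors and transitions; all names here are hypothetical):

```python
import numpy as np

def viterbi_scaled(posteriors, priors, log_trans, log_init):
    """Viterbi decoding using ANN posteriors as scaled likelihoods.

    In the hybrid HMM/ANN approach, p(x|q) is replaced, up to a factor
    independent of the state, by P(q|x) / P(q).
    posteriors: (T, Q); priors: (Q,); log_trans: (Q, Q); log_init: (Q,).
    """
    log_scaled = np.log(posteriors) - np.log(priors)   # (T, Q)
    T, Q = posteriors.shape
    delta = log_init + log_scaled[0]
    back = np.zeros((T, Q), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans            # predecessor x successor
        back[t] = scores.argmax(axis=0)                # best predecessor per state
        delta = scores.max(axis=0) + log_scaled[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                      # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 3-state example with uniform priors and transitions.
posteriors = np.array([[0.90, 0.05, 0.05],
                       [0.80, 0.10, 0.10],
                       [0.10, 0.85, 0.05],
                       [0.05, 0.10, 0.85]])
priors = np.full(3, 1.0 / 3.0)
log_trans = np.log(np.full((3, 3), 1.0 / 3.0))
log_init = np.log(priors)
path = viterbi_scaled(posteriors, priors, log_trans, log_init)
```

With uniform priors and transitions, the decoded path simply tracks the per-frame posterior maxima; non-uniform priors and transition structure are where the HMM contributes sequence-level constraints.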

Morgan was the principal investigator of the IARPA-funded project Outing Unfortunate Characteristics of HMMs, which sought to identify problems in automatic speech recognition technology. [11] He also led a team of universities to build speech recognition systems for low resource languages as part of the IARPA Babel program. [12]

Morgan is a former director of the International Computer Science Institute (ICSI), where he also led the Speech Group. [13] More recently, he has focused on campaign reform through empowering volunteerism, co-founding UpRise Campaigns with Antonia Scatton and later Neighbors Forward AZ with Alison Porter.

Morgan has produced more than 200 publications, including four books. [14] [15]

Honors and awards

Morgan is a fellow of the IEEE [16] and the International Speech Communication Association. [17] Together with Hervé Bourlard, he won the 1996 IEEE Signal Processing Magazine Best Paper Award and was awarded the 2022 IEEE James L. Flanagan Speech and Audio Processing Award "For contributions to neural networks for statistical speech recognition." [10] He served on the editorial board of the journal Speech Communication and is a former co-editor-in-chief. [18]

Related Research Articles

Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting of compressions and rarefactions. The energy contained in audio signals or sound power level is typically measured in decibels. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech-to-text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
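That definition maps directly onto a processing pipeline: short-term power spectrum, triangular mel filterbank, logarithm, then a linear cosine transform (DCT). A minimal sketch under stated assumptions (25 ms frames with a 10 ms hop, a Hamming window, and typical filterbank sizes; `mfcc` and its helpers are illustrative, not a reference implementation):

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Short-term power spectrum -> mel filterbank -> log -> DCT."""
    # Frame the signal (25 ms frames, 10 ms hop) and apply a Hamming window.
    frame, hop = int(0.025 * sr), int(0.010 * sr)
    n_frames = 1 + (len(signal) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters spaced linearly on the (nonlinear) mel scale.
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    logmel = np.log(power @ fbank.T + 1e-10)
    # Linear cosine transform of the log mel spectrum -> cepstral coefficients.
    return dct(logmel, type=2, axis=1, norm='ortho')[:, :n_ceps]

# Example: one second of a 440 Hz tone at 16 kHz.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
```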

Lawrence R. Rabiner is an electrical engineer working in the fields of digital signal processing and speech processing, in particular digital signal processing for automatic speech recognition. He has worked on speech recognition systems for AT&T Corporation.

Keyword spotting is a problem first defined in the context of speech processing, where it deals with the identification of keywords in utterances.

International Computer Science Institute

The International Computer Science Institute (ICSI) is an independent, non-profit research organization located in Berkeley, California, United States. Since its founding in 1988, ICSI has maintained an affiliation agreement with the University of California, Berkeley, where several of its members hold faculty appointments.

Thomas Huang, Chinese-American engineer and computer scientist (1936–2020)

Thomas Shi-Tao Huang was a Chinese-born American computer scientist, electrical engineer, and writer. He was a researcher and professor emeritus at the University of Illinois at Urbana-Champaign (UIUC). Huang was one of the leading figures in computer vision, pattern recognition and human computer interaction.

Richard F. Lyon, American inventor

Richard "Dick" Francis Lyon is an American inventor, scientist, and engineer. He is one of the two people who independently invented the first optical mouse devices in 1980. He has worked in signal processing and was a co-founder of Foveon, Inc., a digital camera and image sensor company.

The IEEE James L. Flanagan Speech and Audio Processing Award is a Technical Field Award presented by the IEEE for an outstanding contribution to the advancement of speech and/or audio signal processing. It may be presented to an individual or a team of up to three people. The award was established by the IEEE Board of Directors in 2002. The award is named after James L. Flanagan, who was a scientist from Bell Labs where he worked on acoustics for many years.

Roberto Pieraccini, Italian-American computer scientist

Roberto Pieraccini is an Italian and US electrical engineer working in the field of speech recognition, natural language understanding, and spoken dialog systems. He has been an active contributor to speech language research and technology since 1981. He is currently the Chief Scientist of Uniphore, a conversational automation technology company.

Alex Waibel, American computer scientist

Alexander Waibel is a professor of computer science at Carnegie Mellon University and the Karlsruhe Institute of Technology. His research interests focus on speech recognition and translation and on human communication signals and systems. He made pioneering contributions to speech translation systems, breaking down language barriers through cross-lingual speech communication. In fundamental research on machine learning, he is known for the Time Delay Neural Network (TDNN), the first convolutional neural network trained by gradient descent using backpropagation, which he introduced in 1987 at ATR in Japan.

Yasuo Matsuyama

Yasuo Matsuyama is a Japanese researcher in machine learning and human-aware information processing.

V John Mathews is an Indian-American engineer and educator who is currently a Professor of Electrical Engineering and Computer Science (EECS) at the Oregon State University, United States.

Bayya Yegnanarayana is an INSA Senior Scientist at International Institute of Information Technology (IIIT) Hyderabad, Telangana, India. He is an eminent professor and is known for his contributions in Digital Signal Processing, Speech Signal Processing, Artificial Neural Networks and related areas. He has guided about 39 PhD theses, 43 MS theses and 65 MTech projects. He was the General Chair for the international conference, INTERSPEECH 2018, held at Hyderabad. He also holds the positions as Distinguished Professor, IIT Hyderabad and an Adjunct Faculty, IIT Tirupati.

Biing Hwang "Fred" Juang is a communication and information scientist, best known for his work in speech coding, speech recognition and acoustic signal processing. He joined Georgia Institute of Technology in 2002 as Motorola Foundation Chair Professor in the School of Electrical & Computer Engineering.

Steve Young, British researcher (born 1951)

Stephen John Young is a British researcher, Professor of Information Engineering at the University of Cambridge and an entrepreneur. He is one of the pioneers of automated speech recognition and statistical spoken dialogue systems. He served as the Senior Pro-Vice-Chancellor of the University of Cambridge from 2009 to 2015, responsible for planning and resources. From 2015 to 2019, he held a joint appointment between his professorship at Cambridge and Apple, where he was a senior member of the Siri development team.

The IARPA Babel program developed speech recognition technology for noisy telephone conversations. The main goal of the program was to improve the performance of keyword search on languages with very little transcribed data, i.e. low-resource languages. Data from 26 languages was collected with certain languages being held-out as "surprise" languages to test the ability of the teams to rapidly build a system for a new language.

John Makhoul, American computer scientist

John Makhoul is a Lebanese-American computer scientist who works in the field of speech and language processing. Dr. Makhoul's work on linear predictive coding was used in the establishment of the Network Voice Protocol, which enabled the transmission of speech signals over the ARPANET. Makhoul is recognized in the field for his vital role in the areas of speech and language processing, including speech analysis, speech coding, speech recognition and speech understanding. He has made a number of significant contributions to the mathematical modeling of speech signals, including his work on linear prediction, and vector quantization. His patented work on the direct application of speech recognition techniques for accurate, language-independent optical character recognition (OCR) has had a dramatic impact on the ability to create OCR systems in multiple languages relatively quickly.

Chin-Hui Lee is an information scientist, best known for his work in speech recognition, speaker recognition and acoustic signal processing. He joined the Georgia Institute of Technology in 2002 as a professor in the School of Electrical and Computer Engineering.

Yang Liu is a Chinese and American computer scientist specializing in speech processing and natural language processing, and a senior principal scientist for Amazon.

References

  1. "Nelson Morgan, Class of 1967". Hamburg Alumni Foundation. Retrieved 2022-11-24.
  2. Author biography, Speech and Audio Signal Processing, Wiley Publishing, 2011
  3. RASTA-PLP Speech Analysis, Hynek Hermansky, Nelson Morgan, Aruna Bayya, and Phil Kohn, ICSI Technical Report TR-91-069, December 1991
  4. RASTA-PLP Speech Analysis Technique, Hynek Hermansky, Nelson Morgan, Aruna Bayya, and Phil Kohn, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92), vol. 1, pp. 121–124, March 1992. doi:10.1109/ICASSP.1992.225957
  5. NSF Graduate Research Fellows. Retrieved 28 February 2012.
  6. Wahlster, Wolfgang (2012). "Interview with Dr. Roberto Pieraccini, Director of the International Computer Science Institute (ICSI) at Berkeley, USA". Ki - Künstliche Intelligenz. 26 (3): 289–291. doi:10.1007/s13218-012-0217-0.
  7. Connectionist Speech Recognition: A Hybrid Approach, Nelson Morgan and Hervé Bourlard, Springer International Series in Engineering and Computer Science, Vol. 247, 1993
  8. Hybrid HMM/Neural Network Based Speech Recognition in Loquendo ASR, Roberto Gemello, Franco Mana, and Dario Albesano
  9. IEEE Signal Processing Magazine Best Paper Award Recipients, IEEE Signal Processing Society
  10. IEEE James L. Flanagan Speech and Audio Processing Award Recipients
  11. Speech Tech Blog, Michele Masterson, June 20, 2012
  12. "ICSI Leads Team Researching Ways to Build Speech Recognition Systems for New Languages Under Severe Data and Time Constraints". November 28, 2012. Retrieved 30 March 2018.
  13. SLTC Newsletter, Archived 2016-04-04 at the Wayback Machine, IEEE Signal Processing Society, May 2012
  14. Speech and Audio Signal Processing, Ben Gold and Nelson Morgan, Wiley Publishing, 1999
  15. Speech and Audio Signal Processing, Second Edition, Ben Gold, Nelson Morgan, and Dan Ellis, 2011
  16. IEEE Fellows, Archived 2010-06-19 at the Wayback Machine. Retrieved 28 February 2012.
  17. ISCA 2010 Fellows. Retrieved 28 February 2012
  18. Speech Communication Editorial Board, Elsevier Publishing.