Xuedong Huang

Born October 20, 1962 (age 61)
Citizenship United States
Alma mater University of Edinburgh, Tsinghua University, Hunan University
Awards National Academy of Engineering Member, American Academy of Arts and Sciences Member, IEEE Bose Industrial Leader Award, Asian American Corporate Leadership Award, ACM Fellow, IEEE Fellow
Scientific career
Fields Speech recognition, machine translation, natural language processing, AI, computer vision, software development
Institutions Zoom Video Communications, Microsoft, Carnegie Mellon University

Xuedong David Huang (born October 20, 1962) is a Chinese American computer scientist and technology executive known for his contributions to spoken language processing and artificial intelligence, including Azure AI Services. Since June 2023 he has been Zoom's chief technology officer; before that, he spent 30 years at Microsoft, most recently as a Technical Fellow and chief technology officer of Azure AI. Huang is a strong advocate of AI for Accessibility [1] and AI for Cultural Heritage. [2]

Education

Huang received his PhD from the University of Edinburgh in 1989 (sponsored by a British ORS award and an Edinburgh University Scholarship), his MS from Tsinghua University in 1984, and his BS from Hunan University in 1982.

Career

After receiving his PhD in 1989, Huang joined Carnegie Mellon University, where he worked with Raj Reddy and Kai-Fu Lee on speech recognition. At CMU he directed research on the Sphinx-II speech recognition system, which achieved the best performance in every category of DARPA's 1992 benchmark evaluations. In 1993, Microsoft Research recruited him to found and lead Microsoft's spoken language initiatives. His co-authored book Spoken Language Processing [3] and his historical review of speech recognition [4] summarize several generations of spoken language research. Known as Microsoft's "Mr. Speech" for three decades, Huang was instrumental in creating Microsoft's Speech Application Programming Interface (SAPI), shipping Microsoft Speech Server, and modernizing spoken language and integrative AI services [5] [6] through Azure AI. [7] These services are used by millions of third-party customers and also power Microsoft's Windows, Office, Teams, and Azure OpenAI Service.

Huang helped Microsoft and Azure Cognitive Services reach several industry-first human parity milestones on open research tasks: conversational speech transcription, [8] machine translation, [9] conversational question answering, [10] and image captioning. [11]

Huang has made significant contributions to the software and AI industry through his executive leadership and his scientific publications. He holds more than 170 US patents, and his work reaches billions of people through Azure AI-enabled products and services. In 2016, Wired magazine named him one of its "25 Geniuses Who Are Creating the Future of Business". [12] In 2021, Azure AI won InfoWorld's Technology of the Year Award. [13]

Huang received the Allen Newell Research Excellence Medal in 1992 and an IEEE Speech Processing Best Paper Award in 1993. He was named a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) in 2000 and a Fellow of the Association for Computing Machinery (ACM) in 2017, [14] and he is a member of the Washington State Academy of Sciences. Huang received the 2022 Asian American Corporate Leadership Award and the IEEE Amar Bose Industrial Leader Award. In 2023, he was elected a member of the US National Academy of Engineering (NAE) [15] and of the American Academy of Arts and Sciences. [16]

References

  1. "Azure AI for Accessibility". www.linkedin.com. Retrieved 2021-02-09.
  2. "Xuedong Huang on LinkedIn: Microsoft Introduces Inuktitut to Microsoft Translator - Microsoft". www.linkedin.com. Retrieved 2021-02-09.
  3. Huang, Xuedong; Acero, Alex; Hon, Hsiao-Wuen (2001). Spoken Language Processing. Prentice Hall.
  4. Huang, Xuedong; Baker, James; Reddy, Raj (January 2014). "A Historical Perspective of Speech Recognition". Communications of the ACM. 57 (1): 94–103.
  5. Stanford's Speech Transcription Bias Study in 2020.
  6. "XYZ-Code: A holistic representation toward integrative AI". Microsoft AI Blog.
  7. Azure AI Cognitive Services.
  8. Linn, Allison (October 18, 2016). "Historic Achievement: Microsoft researchers reach human parity in conversational speech recognition".
  9. Linn, Allison (March 14, 2018). "Microsoft reaches a historic milestone, using AI to match human performance in translating news from Chinese to English".
  10. "Machine Reading Systems Are Becoming More Conversational" (May 2019).
  11. Roach, John (October 14, 2020). "What's that? Microsoft's latest breakthrough, now in Azure AI, describes images as well as people do".
  12. "25 Geniuses Who Are Creating the Future of Business" (April 26, 2016). Wired.
  13. Borck, James R.; Heller, Martin; Nuñez, Steven; Oliver, Andrew C.; Pointer, Ian; Sacolick, Isaac; Yegulalp, Serdar (2021-02-03). "InfoWorld's 2021 Technology of the Year Award winners". InfoWorld. Retrieved 2021-02-08.
  14. "People of ACM - Xuedong Huang" (July 25, 2017).
  15. "National Academy of Engineering Elects 106 Members and 18 International Members" (February 7, 2023).
  16. "New Members Elected in 2023" (April 19, 2023). American Academy of Arts and Sciences.