Dong Yu is a Chinese-American computer scientist, AI researcher, and engineer, known for his research in speech recognition, deep learning, and multi-modal artificial intelligence. [1] He is a Fellow of the Association for Computing Machinery (ACM), the Institute of Electrical and Electronics Engineers (IEEE), and the International Speech Communication Association (ISCA). [2] Yu currently serves as Distinguished Scientist and Vice General Manager at Tencent AI Lab and Chief Scientist and Vice General Manager at Tencent Cloud AI. [3]
Yu received a Bachelor of Science in Electrical Engineering from Zhejiang University and a master's degree in Pattern Recognition and Intelligent Control from the Chinese Academy of Sciences. [2] He subsequently earned a Master of Science in Computer Science from Indiana University at Bloomington and a Ph.D. in Computer Science from the University of Idaho. [4]
Yu began his professional career at Microsoft Research in Redmond, Washington, in 1998, where he worked in the Speech and Dialog Research Group. [5] [6] During his time at Microsoft, he led research on automatic speech recognition, multi-modal AI, and deep learning frameworks, contributing to products such as Microsoft Cortana, Skype Translator, and Ford Sync. [7]
In 2017, Yu joined Tencent America, where he holds dual leadership roles at Tencent AI Lab and Tencent Cloud AI. [8] His work focuses on developing large language models, multi-modal AI systems, and research toward artificial general intelligence (AGI). [9] He has led teams that created systems for conversational AI, speech and music synthesis, multi-modal interaction, and intelligent web agents. [7]
He served as Chair of the IEEE Speech and Language Processing Technical Committee and was Technical Program Co-chair for ICASSP 2021. [10] In addition, he has contributed as an Associate Editor and Senior Area Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing and has acted as Guest Editor for multiple IEEE journals and conference special issues focused on deep learning, speech processing, and conversational AI. [11]
He was also an adjunct professor at Zhejiang University, holds approximately 60 patents, and was among the founders and core contributors of CNTK, an open-source deep learning framework. [1]
Yu is known for applying deep learning to large-vocabulary speech recognition, including the development of the context-dependent deep neural network hidden Markov model (CD-DNN-HMM), which significantly improved recognition accuracy and influenced both academia and industry. [12]
He has also advanced recurrent and convolutional neural networks, [13] end-to-end deep learning architectures, multi-modal AI, and speech synthesis, contributing to technologies that underpin modern virtual assistants and human-computer interaction systems. [11] Yu co-developed the Computational Network Toolkit (CNTK), an open-source deep learning framework, and introduced scalable training methods for multi-GPU systems. [14]
More recently, his research has focused on multi-modal large language models and AGI, resulting in models and systems such as SongGeneration, AlphaLLM, LiteSearch, WebVoyager, Cognitive Kernel, R-Zero, and WebEvolver, as well as audio front-end and speech synthesis systems used in Tencent products. [15]