Steve Young (software engineer)

Steve Young
Born: Stephen John Young, 1951 (age 72–73)
Alma mater: University of Cambridge
Thesis: Speech synthesis from concept with applications to speech output from systems (1978)
Doctoral advisor: Frank Fallside
Website: mi.eng.cam.ac.uk/~sjy

Stephen John Young CBE FRS FREng (born 1951) is a British researcher, [1] Professor of Information Engineering at the University of Cambridge and an entrepreneur. He is one of the pioneers of automated speech recognition [2] and statistical spoken dialogue systems. [3] [4] He served as the Senior Pro-Vice-Chancellor of the University of Cambridge from 2009 to 2015, responsible for planning and resources. From 2015 to 2019, he held a joint appointment between his professorship at Cambridge and Apple, where he was a senior member of the Siri development team. [5]

Early life and education

Young was born in Liverpool on 23 January 1951. He studied at the University of Cambridge, completing a BA in Electrical Sciences in 1973 and a PhD in speech recognition in 1978 under the supervision of Professor Frank Fallside in the Engineering Department. He held lectureships at both Manchester and Cambridge before being elected to the Chair of Information Engineering at the University of Cambridge in 1994. [6]

Research and academic career

He is best known as the lead author of the HTK toolkit, [2] a software package for building hidden Markov models of time series, used mainly for speech recognition. Young developed its first version at the Machine Intelligence Laboratory of the Cambridge University Engineering Department (CUED) in 1989, and in 1993 he co-founded the startup Entropic to distribute and maintain the toolkit. After acquiring Entropic, Microsoft responded to the toolkit's growing worldwide popularity by licensing the core HTK software back to CUED so that it could be made available again. The HTK Book, [7] the tutorial for the toolkit, has received more than 7,000 citations. [8]
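
In an HTK-style recogniser, each word or phone is represented by a hidden Markov model, and incoming acoustic observations are scored against those models to find the most likely word sequence. The short Python sketch below illustrates the underlying idea with the forward algorithm on a toy discrete-output HMM; the function name, model and all parameters are invented for illustration and are not taken from HTK itself.

    import numpy as np

    def forward_log_likelihood(log_pi, log_A, log_B, obs):
        """Return log P(obs | model) for a discrete-output HMM.

        log_pi : (S,)   log initial state probabilities
        log_A  : (S, S) log transition probabilities, log_A[i, j] = log P(j | i)
        log_B  : (S, V) log emission probabilities over a discrete alphabet
        obs    : list of observation symbol indices
        """
        alpha = log_pi + log_B[:, obs[0]]                # log alpha_1(s)
        for t in range(1, len(obs)):
            # log-sum-exp over predecessor states, then add the emission term
            alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, obs[t]]
        return np.logaddexp.reduce(alpha)                # log P(obs | model)

    # Toy 2-state model with a 3-symbol output alphabet (illustrative numbers only).
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
    B = np.array([[0.5, 0.4, 0.1],
                  [0.1, 0.3, 0.6]])
    obs = [0, 1, 2, 2]

    print(forward_log_likelihood(np.log(pi), np.log(A), np.log(B), obs))

In a real recogniser the emissions are continuous acoustic feature vectors modelled by Gaussian mixtures or neural networks, and decoding uses the Viterbi algorithm rather than the plain forward pass, but the recursion has the same shape.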

In the late 1990s, Young's research interests shifted to the design of statistical spoken dialogue systems. His most notable contribution to the field is the partially observable Markov decision process (POMDP) based dialogue management framework, [3] [9] [10] which includes the Hidden Information State (HIS) dialogue model, [11] the first practical dialogue management model based on the POMDP framework. His research focuses on developing spoken dialogue systems that are robust to the errors introduced by imperfect speech recognisers and that can adapt and scale online through interaction with real users. One notable instance of this approach is the application of Gaussian process based reinforcement learning for rapid policy optimisation. [12] [13] In recent years, Young's research group has applied deep learning techniques to various submodules of statistical dialogue systems, [14] [15] [16] [17] winning multiple best paper awards at major speech and NLP conferences.
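
A POMDP-based dialogue manager never observes the user's goal directly. Instead it maintains a belief state, a probability distribution over possible dialogue states, which it updates after every (possibly misrecognised) user turn, and the dialogue policy selects the next system action from that belief rather than from a single hypothesis. The sketch below shows one such Bayesian belief update on a deliberately tiny state space; the states, observation model and numbers are invented for illustration, and this is not the HIS model itself, which factors and prunes the state space to remain tractable.

    import numpy as np

    states = ["wants_cheap", "wants_expensive"]

    # P(s' | s): the user's goal is assumed to stay fixed between turns.
    T = np.eye(2)

    # P(o | s'): probability of the recognised user act given the true goal.
    # Rows: true goal; columns: observations "heard_cheap", "heard_expensive".
    O = np.array([[0.8, 0.2],
                  [0.3, 0.7]])

    def belief_update(belief, obs_index):
        """b'(s') is proportional to P(o | s') * sum_s P(s' | s) b(s)."""
        predicted = T.T @ belief           # prediction step (identity here)
        updated = O[:, obs_index] * predicted
        return updated / updated.sum()     # normalise to a distribution

    belief = np.array([0.5, 0.5])                  # uniform prior over goals
    belief = belief_update(belief, obs_index=0)    # recogniser reports "cheap"
    print(dict(zip(states, np.round(belief, 3))))  # mass shifts to wants_cheap

Because the recogniser is only right some of the time, the belief moves towards "wants_cheap" without collapsing onto it; accumulating evidence over several turns in this way is what makes the approach robust to recognition errors.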

Entrepreneurship

Apart from his academic and scientific contributions, Young is also a successful entrepreneur who has taken a leading role in three company acquisitions: [18] Entropic, the HTK startup co-founded in 1993 and acquired by Microsoft in 1999; Phonetic Arts, a speech synthesis company acquired by Google in 2010; and VocalIQ, a spoken dialogue systems company acquired by Apple in 2015.

Awards and honours

Young is a Fellow of the Royal Academy of Engineering, [19] the Institution of Engineering and Technology (IET), the Institute of Electrical and Electronics Engineers (IEEE), the RSA and the International Speech Communication Association (ISCA). [5]

He received the IEEE Signal Processing Society Technical Achievement Award in 2004, and the ISCA Medal for Scientific Achievement in 2010. He also received the European Signal Processing Society Individual Technical Achievement Award in 2013, and the IEEE James L Flanagan Speech and Audio Processing Award in 2015. [5]

In 2020, he was elected a Fellow of the Royal Society (FRS). [20]

Young was appointed Commander of the Order of the British Empire (CBE) in the 2022 Birthday Honours for services to software engineering. [21]

References

  1. "Steve Young – Google Scholar Citations". Google Scholar. Retrieved 2 May 2017.
  2. 1 2 "HTK Speech Recognition Toolkit". University of Cambridge.
  3. 1 2 Williams, Jason; Young, Steve (2007). "Partially observable Markov decision processes for spoken dialogue systems" (PDF). Computer Speech and Language. 21 (2): 393–422. doi:10.1016/j.csl.2006.06.008. S2CID   13903063.
  4. Young, Steve; et al. "The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management" (PDF). Computer Speech and Language.
  5. 1 2 3 "Professor Steve Young, Professor of Information Engineering". University of Cambridge.
  6. "Stephen Young, Emmanuel Fellow".
  7. Young, Steve. "The HTK book" (PDF). Cambridge University Engineering Department.
  8. "Google Scholar" . Retrieved 23 December 2020.
  9. Blaise Thompson and Steve Young (2010). "Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems" (PDF). Computer Speech and Language.
  10. Young, Steve (2013). "POMDP-based Statistical Spoken Dialogue Systems: a Review" (PDF). Proc IEEE.
  11. Steve Young; et al. (2010). "The Hidden Information State Model: a practical framework for POMDP-based spoken dialogue management" (PDF). Computer Speech and Language.
  12. Milica Gasic and Steve Young (2014). "Gaussian processes for POMDP-based dialogue manager optimization". IEEE Trans. Audio, Speech and Language Processing.
  13. Pei-Hao Su; et al. (2016). "On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems" (PDF). Proc ACL. arXiv:1605.07669.
  14. Lina Rojas-Barahona; et al. (2016). "Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding". Proc Coling. pp. 258–267.
  15. Nikola Mrkšić; et al. (2017). "The Neural Belief Tracker: Data-Driven Dialogue State Tracking" (PDF). Proc ACL.
  16. Tsung-Hsien Wen; et al. (2015). "Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems" (PDF). Proc EMNLP. arXiv:1508.01745.
  17. Tsung-Hsien Wen; et al. (2017). "A Network-based End-to-End Trainable Task-oriented Dialogue System". arXiv:1604.04562 [cs.CL].
  18. "Steve Young: Executive Profile & Biography". Bloomberg L.P.
  19. "Stephen Young". Royal Academy of Engineering. Retrieved 23 December 2020.
  20. "Stephen Young". Royal Society. Retrieved 20 September 2020.
  21. "No. 63714". The London Gazette (Supplement). 1 June 2022. p. B11.