Mike Phillips (speech recognition)

Last updated May 13, 2023

Michael Phillips (born August 1, 1961) is the CEO and co-founder of Sense Labs and a pioneer in machine learning, including mobile speech recognition and text-to-speech technology.

Education

Phillips studied electrical engineering at Carnegie Mellon University.^[1] He was also a researcher at Carnegie Mellon and later a research scientist in the Spoken Language Systems group at the Massachusetts Institute of Technology (MIT),^[2] where he helped develop VOYAGER, an “urban navigation and exploration system” that could recognize and interpret basic spoken queries.^[3] VOYAGER was one of the first research systems to combine speech recognition and natural language processing to hold a conversation with a user.^[4]

Career

In 1994, Phillips co-founded Boston-based SpeechWorks and became its CTO.^[5] SpeechWorks became one of the leading US-based vendors of speech recognition technology of its time, alongside Nuance Communications and IBM.^[6] The startup developed interactive voice response systems, such as call-center interfaces for clients including Amtrak^[7] and FedEx.^[8] SpeechWorks’ technology suited call centers because customers could answer, aloud, questions posed by a human-sounding automated system rather than navigating through a menu. The system also supported time-saving “barge-in,” meaning that a customer could interrupt it before it finished reading the full list of options. It could also “learn”: it kept a record of names and phrases that customers had used in the past, so that it could come to understand names and phrases that differed slightly from its original vocabulary.^[9]
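The sources do not describe how SpeechWorks implemented this learning. As a rough, text-level illustration of the idea only, the sketch below matches a transcribed caller response against a fixed vocabulary, accepts close spellings, and remembers variants once a caller confirms what they meant. The class name, the `confirm` step, and the 0.8 similarity cutoff are hypothetical choices; a real recognizer works on audio rather than on strings.

```python
# Hypothetical, text-level illustration; not SpeechWorks' implementation.
import difflib
from typing import Dict, List, Optional


class AdaptiveVocabulary:
    """Toy model: canonical vocabulary entries plus remembered caller variants."""

    def __init__(self, canonical: List[str], cutoff: float = 0.8) -> None:
        # Map lowercase form -> canonical spelling for case-insensitive lookup.
        self.canonical: Dict[str, str] = {c.lower(): c for c in canonical}
        self.learned: Dict[str, str] = {}  # remembered variant -> canonical entry
        self.cutoff = cutoff

    def recognize(self, heard: str) -> Optional[str]:
        key = heard.lower().strip()
        if key in self.learned:    # variant seen and confirmed before
            return self.learned[key]
        if key in self.canonical:  # exact hit in the original vocabulary
            return self.canonical[key]
        # Otherwise accept only a close spelling match to the original vocabulary.
        close = difflib.get_close_matches(key, list(self.canonical), n=1, cutoff=self.cutoff)
        return self.canonical[close[0]] if close else None

    def confirm(self, heard: str, intended: str) -> None:
        """Record that `heard` meant `intended`, e.g. after the caller confirms it."""
        self.learned[heard.lower().strip()] = intended
```

For example, with a vocabulary of Boston, New York, and Philadelphia, `recognize("philadelphea")` returns "Philadelphia" by close spelling, while `recognize("the big apple")` returns None until `confirm("the big apple", "New York")` has been called.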

SpeechWorks’ value more than tripled after its initial public offering,^[8] and the company was acquired by ScanSoft in 2003.^[2] As CTO of ScanSoft, Phillips worked on technology across the company's products, including the leading dictation software Dragon NaturallySpeaking.^[10] ScanSoft acquired Nuance Communications in 2005 and adopted its name.^[5]

Phillips returned to MIT as a visiting scientist and in 2006 co-founded Vlingo with former SpeechWorks colleague John Nguyen.^[5] Vlingo was an intelligent software assistant: a speech-to-text application integrated with user-facing apps for iPhone, Android, BlackBerry, and other smartphones.^[11] It let users send text messages and navigate their smartphones by voice.^[11] It was the first cell phone speech recognition software to successfully interpret user input and learn over time,^[12] and it was later adapted into the popular personal assistant software Siri.^[13]

In 2008, Nuance Communications sued Vlingo for patent infringement after offering Phillips a choice: sell Vlingo to Nuance or be sued.^[14] After six lengthy lawsuits, Phillips won, but the $3 million in legal fees drained his company's research and development funds.^[15] Vlingo was sold to Nuance in December 2011.^[16]

In 2013, Phillips co-founded the startup Sense Labs.^[17] The company, headquartered in Cambridge, Massachusetts, develops the Sense home energy monitor. Once attached to a home's electrical panel, the device “listens” to the home's electricity usage and identifies the wattage drawn by individual appliances.^[18] The first Sense energy monitors began shipping in early December 2015.
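The article does not describe how Sense identifies appliances. One classic approach to this kind of non-intrusive load monitoring is event-based: watch the whole-home power reading for sudden steps, then pair an “on” step with a later “off” step of similar size to estimate a single appliance's wattage. The following is a minimal sketch of that generic technique, not Sense's method; the function names, the 50 W threshold, and the 10% pairing tolerance are hypothetical.

```python
# Hypothetical sketch of generic event-based load disaggregation; not Sense's method.
from dataclasses import dataclass
from typing import List


@dataclass
class PowerEvent:
    index: int          # sample position where the step occurred
    delta_watts: float  # positive: something switched on; negative: switched off


def detect_step_changes(readings_w: List[float], threshold_w: float = 50.0) -> List[PowerEvent]:
    """Flag sample-to-sample jumps in whole-home power of at least threshold_w watts."""
    events: List[PowerEvent] = []
    for i in range(1, len(readings_w)):
        delta = readings_w[i] - readings_w[i - 1]
        if abs(delta) >= threshold_w:
            events.append(PowerEvent(index=i, delta_watts=delta))
    return events


def pair_on_off(events: List[PowerEvent], tolerance: float = 0.1) -> List[float]:
    """Pair each 'on' step with the next unused 'off' step of similar magnitude.

    Returns one estimated wattage per pair: the mean of the on and off step sizes.
    """
    estimates: List[float] = []
    used = set()
    for i, on in enumerate(events):
        if on.delta_watts <= 0:
            continue
        for j in range(i + 1, len(events)):
            off = events[j]
            if j in used or off.delta_watts >= 0:
                continue
            # Sizes match if they differ by at most `tolerance` of the on-step size.
            if abs(on.delta_watts + off.delta_watts) <= tolerance * on.delta_watts:
                used.add(j)
                estimates.append((on.delta_watts - off.delta_watts) / 2)
                break
    return estimates
```

With readings of [200, 200, 1700, 1705, 1705, 300, 300] watts, the sketch reports a 1,500 W step up and a 1,405 W step down, which it pairs into a single appliance of about 1,452.5 W. Real monitors face overlapping appliances and gradual loads that simple step detection cannot separate.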

Phillips has served on various boards and holds more than 20 patents.^[19]

Awards

2004: Top Leader in Speech from Speech Technology Magazine^[20]
2005: Winner of the Speech Technology Magazine Lifetime Achievement Award^[21]

Selected works

Zue, Victor; Glass, James; Phillips, Michael; Seneff, Stephanie (1989). "The MIT SUMMIT Speech Recognition system: A progress report". Proceedings of the workshop on Speech and Natural Language - HLT '89. pp. 179–189. doi:10.3115/100964.100983. S2CID 3538583.

References

[1] "CMU Robust Speech Recognition Home Page". www.cs.cmu.edu. Retrieved 2016-01-21.
[2] "Speech Industry Expert Mike Phillips Joins Tell-Eureka Advisory Board; MIT Scientist and a Founder of Speechworks (Now Part of Nuance) to Help Tell-Eureka Bring Next Generation Speech Applications to a Broader Market". www.businesswire.com (Press release). Retrieved 2016-01-21.
[3] Zue, Victor. "From Speech Recognition to Spoken Language Understanding: The Development of the MIT SUMMIT and VOYAGER Systems" (PDF).
[4] Zue, Victor (1989). "The VOYAGER Speech Understanding System: A Progress Report" (PDF).
[5] Fitzgerald, Michael (2008-01-27). "The Coming Wave of Gadgets That Listen and Obey". The New York Times. ISSN 0362-4331. Retrieved 2016-01-21.
[6] Fluss, Donna (June 2002). "Ripe for the picking. (Speech Recognition)".
[7] "Talk to the Phone". MIT Technology Review. Retrieved 2016-01-21.
[8] Kirsner, Scott (2012-05-25). "Former SpeechWorks chief executive out raising money for Xtone, startup that wants to speech-enable mobile apps". Boston.com. Retrieved 2016-01-21.
[9] "Thrifty speaks to its customers: car rental agency deploys speech recognition to improve customer experience while reducing costs". Customer Interface. October 2002.
[10] Akass, Clive (2005-07-01). "Voice on a sound footing. Speech input has become viable on PCs and will soon be available on mobiles. But it has a long way to go before you can throw away your keyboard, writes Clive Akass".
[11] Banks, Courtney. "A Safer Way to Text on the Road". Wall Street Journal. ISSN 0099-9660. Retrieved 2016-01-21.
[12] "Vlingo's Adaptive Speech Recognition Promises an End to Typing on your Phone Keyboard". Xconomy. Retrieved 2016-01-21.
[13] Farrell, Michael. "Does Siri soar on Dragon's wings?" (PDF).
[14] "Nuance Plays Hardball in Voice Recognition". BloombergView. Retrieved 2016-01-21.
[15] Duhigg, Charles; Lohr, Steve (2012-10-08). "The Patent, Used as a Sword". The New York Times. Retrieved 2016-01-21.
[16] Kelly, Samantha Murphy (2011-12-20). "Nuance Acquires Voice-Recognition Competitor Vlingo". Mashable. Retrieved 2016-01-21.
[17] Duhigg, Charles; Lohr, Steve (2012-10-07). "In Technology Wars, Using the Patent as a Sword". The New York Times. ISSN 0362-4331. Retrieved 2016-01-21.
[18] "Cambridge's Sense Labs starts production of new device to track what's happening at home". www.betaboston.com. Retrieved 2016-01-21.
[19] Cohan, Peter. "5 Reasons to Scrap Our Patent System: #1. Apple's Siri". Forbes. Retrieved 2016-01-21.
[20] "2004 Speech Solutions Winners". www.speechtechmag.com. 2004-09-11. Retrieved 2016-01-21.
[21] "2005 Speech Solutions Winners". www.speechtechmag.com. 2005-08-30. Retrieved 2016-01-21.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.