Michael Phillips (born August 1, 1961) is the CEO and co-founder of Sense Labs and a pioneer in machine learning, including mobile speech recognition and text-to-speech technology.
Phillips studied electrical engineering at Carnegie Mellon University. [1] He worked as a researcher at Carnegie Mellon and then as a research scientist in the Spoken Language Systems group at the Massachusetts Institute of Technology (MIT), [2] where he helped develop VOYAGER, an “urban navigation and exploration system” that could recognize and interpret basic spoken queries. [3] VOYAGER was one of the first research systems to combine speech recognition and natural language processing to hold a conversation with a user. [4]
In 1994, Phillips co-founded and became CTO of Boston-based SpeechWorks, [5] which became one of the leading US-based vendors of speech recognition technology at the time, alongside Nuance Communications and IBM. [6] The startup developed interactive voice response systems, including call-center interfaces for clients such as Amtrak [7] and FedEx. [8] SpeechWorks’ technology suited call-center interfaces because customers could answer aloud the questions posed by a human-sounding automated system, rather than navigating through a menu. The technology also had time-saving “barge-in” capabilities, meaning that a customer could interrupt the system before it finished reading the full list of options. The system could also “learn”: it kept a record of names and phrases customers had used in the past, so that it could come to understand names and phrases that differed slightly from its original vocabulary. [9]
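The “learning” behavior described above can be illustrated with a minimal Python sketch. It is not SpeechWorks' implementation: the `AdaptiveVocabulary` class, its `cutoff` parameter, and the station names are hypothetical, and plain text similarity from the standard-library `difflib` module stands in for the acoustic and phonetic matching a real recognizer would perform. Phrases close to a known entry are mapped to it, and variants a caller confirms are stored as aliases so that they match exactly the next time.

```python
import difflib
from typing import Optional


class AdaptiveVocabulary:
    """Hypothetical sketch: map heard phrases to canonical entries and
    remember caller-confirmed variants as aliases."""

    def __init__(self, entries, cutoff=0.75):
        # Each canonical entry starts out as its own (lower-cased) alias.
        self.aliases = {entry.lower(): entry for entry in entries}
        self.cutoff = cutoff

    def interpret(self, heard: str) -> Optional[str]:
        key = heard.lower()
        if key in self.aliases:
            return self.aliases[key]  # exact entry or previously learned variant
        # Fall back to the closest known alias above the similarity cutoff.
        close = difflib.get_close_matches(key, list(self.aliases), n=1, cutoff=self.cutoff)
        return self.aliases[close[0]] if close else None

    def confirm(self, heard: str, entry: str) -> None:
        # Record a variant the caller confirmed so it matches exactly next time.
        self.aliases[heard.lower()] = entry


vocab = AdaptiveVocabulary(["Boston South Station", "New York Penn Station"])
print(vocab.interpret("south station"))   # resolved by fuzzy matching
vocab.confirm("the south one", "Boston South Station")
print(vocab.interpret("The South One"))   # resolved by the learned alias
```

Storing confirmed variants as exact aliases keeps later lookups cheap and predictable; the fuzzy match is only a fallback for phrases the system has not seen before.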
SpeechWorks’ value more than tripled after its initial public offering, [8] and it was acquired by ScanSoft in 2003. [2] While Phillips was CTO at ScanSoft, he worked on technologies across the company's products, including the leading dictation software Dragon NaturallySpeaking. [10] ScanSoft then acquired Nuance Communications in 2005, and adopted the latter's name. [5]
Phillips returned to MIT as a visiting scientist and in 2006 co-founded Vlingo with former SpeechWorks colleague John Nguyen. [5] An intelligent software assistant, Vlingo was a speech-to-text application integrated with user-facing apps for iPhone, Android, BlackBerry, and other smartphones. [11] The software allowed users to send text messages and navigate their smartphones by voice. [11] It was the first mobile-phone speech recognition software to successfully interpret user input and learn over time, [12] and it was later adapted into the popular personal assistant software Siri. [13]
In 2008, Nuance Communications [14] sued Vlingo for patent infringement after offering Phillips a choice: sell Vlingo to Nuance or be sued. Phillips prevailed after six lengthy lawsuits, but the $3 million in legal fees drained his company's research and development funds. [15] Vlingo was sold to Nuance in December 2011. [16]
In 2013, Phillips co-founded Sense Labs, a startup headquartered in Cambridge, Massachusetts, that develops the Sense home energy monitor. [17] Once attached to a home's electrical panel, the device “listens” to the home's electricity usage and identifies the wattage drawn by individual appliances. [18] The first Sense energy monitors began shipping in early December 2015.
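How a device might identify appliances from whole-home readings can be sketched with a simple event-based approach to load disaggregation. This is an illustrative assumption, not Sense's actual method, which the source does not describe: the `detect_events` function, its `min_step` and `tolerance` parameters, and the appliance wattages are all hypothetical. The sketch flags sudden step changes in total power and labels each one with the known appliance whose rated wattage is closest.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    index: int                 # sample at which the step change occurred
    delta: float               # change in watts (positive = switched on)
    appliance: Optional[str]   # best-matching appliance, or None if unknown


def detect_events(readings: List[float], signatures: Dict[str, float],
                  min_step: float = 30.0, tolerance: float = 0.15) -> List[Event]:
    """Hypothetical event-based disaggregation: flag step changes in total
    power and label each with the closest known appliance wattage."""
    events = []
    for i in range(1, len(readings)):
        delta = readings[i] - readings[i - 1]
        if abs(delta) < min_step:
            continue  # ignore small fluctuations and measurement noise
        # Pick the appliance whose rated wattage is nearest the step size.
        name, watts = min(signatures.items(), key=lambda kv: abs(kv[1] - abs(delta)))
        # Accept the label only if the step is within `tolerance` of that rating.
        label = name if abs(watts - abs(delta)) <= tolerance * watts else None
        events.append(Event(i, delta, label))
    return events


signatures = {"kettle": 1500.0, "refrigerator": 150.0, "toaster": 900.0}
readings = [200, 205, 1705, 1700, 200, 350, 352, 202, 802]  # hypothetical samples, in watts
for event in detect_events(readings, signatures):
    print(event)
```

Here the kettle and refrigerator switching on and off are each labeled, while the final 600 W step matches no signature and is reported as unknown; a practical system would likely need richer signatures than a single wattage figure to tell such loads apart.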
Phillips has served on various boards and holds more than 20 patents. [19]
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable computers to recognize spoken language and translate it into text, with the main benefit of searchability. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
Optical character recognition (OCR) is a process that converts images of printed or handwritten text into machine-encoded text. It is used, for example, to turn scanned documents into editable, shareable, machine-readable files such as searchable PDFs. OCR can be seen in action when a receipt is scanned on a computer: the scan is saved as an image, and the words in it cannot be searched, edited, or counted until OCR converts the image into a text document with the content stored as text. OCR software can extract data from scanned documents, camera photos, and image-only PDFs, making static material editable and removing the need for manual data entry.
Computer Science and Artificial Intelligence Laboratory (CSAIL) is a research institute at the Massachusetts Institute of Technology (MIT) formed by the 2003 merger of the Laboratory for Computer Science (LCS) and the Artificial Intelligence Laboratory. Housed within the Ray and Maria Stata Center, CSAIL is the largest on-campus laboratory as measured by research scope and membership. It is part of the Schwarzman College of Computing but is also overseen by the MIT Vice President of Research.
SRI International (SRI) is an American nonprofit scientific research institute and organization headquartered in Menlo Park, California. The trustees of Stanford University established SRI in 1946 as a center of innovation to support economic development in the region.
SpeechWorks was a company founded in Boston in 1994 by speech recognition pioneer Mike Phillips and Bill O'Farrell. The company developed and supported speech-related computer software. Originally known as Applied Language Technologies, SpeechWorks went public in 2000 and tripled its value. ScanSoft acquired SpeechWorks in 2003 and, after acquiring Nuance in 2005, changed its name to Nuance Communications.
Nuance Communications, Inc. is an American multinational computer software technology corporation, headquartered in Burlington, Massachusetts, that markets speech recognition and artificial intelligence software.
Speaker recognition is the identification of a person from the characteristics of their voice. It is used to answer the question "Who is speaking?" The term voice recognition can refer to either speaker recognition or speech recognition. Speaker verification contrasts with speaker identification, and speaker recognition differs from speaker diarisation.
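The difference between identification and verification can be shown with a minimal Python sketch. It assumes each speaker has already been reduced to a fixed-length voice feature vector (real systems derive these from audio; the vectors, names, and the 0.9 `threshold` below are hypothetical) and compares vectors by cosine similarity. Identification chooses among all enrolled speakers, while verification accepts or rejects a single claimed identity.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def identify(sample: List[float], enrolled: Dict[str, List[float]]) -> str:
    """Identification ("Who is speaking?"): return the best-matching enrolled speaker."""
    return max(enrolled, key=lambda name: cosine(sample, enrolled[name]))


def verify(sample: List[float], enrolled: Dict[str, List[float]],
           claimed: str, threshold: float = 0.9) -> bool:
    """Verification ("Is this who they claim to be?"): accept or reject one claim."""
    return cosine(sample, enrolled[claimed]) >= threshold


# Hypothetical enrolled voiceprints and an incoming sample.
enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.0]}
sample = [0.9, 0.1, 0.25]
print(identify(sample, enrolled))
print(verify(sample, enrolled, "alice"), verify(sample, enrolled, "bob"))
```

Because verification checks only one claim against a threshold, its cost does not grow with the number of enrolled speakers, whereas identification must score every enrolled voice.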
Conexant Systems, Inc. was an American-based software developer and fabless semiconductor company that developed technology for voice and audio processing, imaging and modems. The company began as a division of Rockwell International, before being spun off as a public company. Conexant itself then spun off several business units, creating independent public companies which included Skyworks Solutions and Mindspeed Technologies.
Dragon NaturallySpeaking is a speech recognition software package developed by Dragon Systems of Newton, Massachusetts, which was acquired in turn by Lernout & Hauspie Speech Products, Nuance Communications, and Microsoft. It runs on Windows personal computers. Version 15, which supports 32-bit and 64-bit editions of Windows 7, 8 and 10, was released in August 2016.
A voice-user interface (VUI) makes spoken human interaction with computers possible, using speech recognition to understand spoken commands and answer questions, and typically text-to-speech to play a reply. A voice command device is a device controlled with a voice-user interface.
Voice portals are the voice equivalent of web portals, giving access to information through spoken commands and voice responses. Ideally, a voice portal could be an access point for any type of information, service, or transaction found on the Internet. Common uses include movie time listings and stock trading. In telecommunications circles, voice portals may be referred to as interactive voice response (IVR) systems, but this term also covers DTMF (touch-tone) services. With the emergence of conversational assistants such as Apple's Siri, Amazon Alexa, Google Assistant, Microsoft Cortana, and Samsung's Bixby, voice portals can now be accessed through mobile devices and far-field smart speakers such as the Amazon Echo and Google Home.
A virtual assistant (VA) is a software agent that can perform a range of tasks or services for a user based on user input such as commands or questions. Such technologies often incorporate chatbot capabilities to simulate human conversation, such as via online chat, to facilitate interaction with their users. The interaction may be via text, graphical interface, or voice, as some virtual assistants are able to interpret human speech and respond via synthesized voices.
Mobile translation is any electronic device or software application that provides audio translation. The concept includes any handheld electronic device specifically designed for audio translation. It also includes any machine translation service or software application for handheld devices, including mobile telephones, Pocket PCs, and PDAs. Mobile translation gives handheld device users instantaneous, non-mediated translation from one human language to another, usually for a service fee significantly smaller than what a human translator charges.
Siri is a virtual assistant that is part of Apple Inc.'s iOS, iPadOS, watchOS, macOS, tvOS, and audioOS operating systems. It uses voice queries, gesture-based control, focus-tracking and a natural-language user interface to answer questions, make recommendations, and perform actions by delegating requests to a set of Internet services. With continued use, it adapts to users' individual language usages, searches, and preferences, returning individualized results.
Dragon Dictation started as a speech recognition application for Apple's iOS platforms, including the iPhone, iPod Touch, and iPad. The app provided automatic speech-to-text capabilities. It was developed by Nuance Communications and released as a free app in December 2009. The technology is now commonly licensed for use in vehicle infotainment systems and healthcare equipment.
Vlingo was a speech recognition software company co-founded by speech-to-text pioneers Mike Phillips and John Nguyen in 2006. It was best known for its intelligent personal assistant and knowledge navigator, also named Vlingo, which functioned as a personal assistant application for Symbian, Android, iPhone, BlackBerry, and other smartphones. Vlingo was acquired by speech recognition giant Nuance Communications in 2012.
Stephanie Seneff is a senior research scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL) of the Massachusetts Institute of Technology (MIT). Working primarily in the Spoken Language Systems group, her research at CSAIL relates to human-computer interaction, and algorithms for language understanding and speech recognition. In 2011, she began publishing controversial papers in low-impact, open access journals on biology and medical topics; the articles have received "heated objections from experts in almost every field she's delved into," according to the food columnist Ari LeVaux.
Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Using technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best when it uses multiple modalities in context. To date, most work has focused on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables.
Babak Hodjat was the co-founder and CEO of Sentient Technologies and now holds the position of Vice President of Evolutionary AI at Cognizant. He is a specialist in the field of artificial intelligence and machine learning.