Timeline of speech and voice recognition

This is a timeline of speech and voice recognition, a technology that enables the recognition and translation of spoken language into text.

Overview

Time period | Key developments
1877–1971 | Speech recognition is at an early stage of development. Specialized devices can recognize only a few words, and accuracy is low. [1]
1971–1987 | Speech recognition improves rapidly, although the technology is still not commercially available. [1]
1987–2014 | Speech recognition continues to improve, becomes widely available commercially, and can be found in many products. [1]

Full timeline

Year (with date, if applicable) | Event type | Details
1877 | Invention | Thomas Edison's phonograph becomes the first device to record and reproduce sound. The recording medium is fragile, however, and prone to damage. [2]
1879 | Invention | Thomas Edison invents the first dictation machine, a slightly improved version of his phonograph. [2]
1936 | Invention | A team of engineers at Bell Labs, led by Homer Dudley, begins work on the Voder, the first electronic speech synthesizer. [3]
1939, March 21 | Invention | Dudley is granted a patent for the Voder, US patent 2,151,091. [3]
1939 | Demonstration | The Voder is demonstrated at the 1939 Golden Gate International Exposition in San Francisco. An operator uses a keyboard and foot pedals to make the machine emit speech. [3]
1939–1940 | Demonstration | The Voder is demonstrated at the 1939–1940 World's Fair in New York City. [3]
1952 | Invention | A team at Bell Labs designs Audrey, a machine capable of understanding spoken digits. [1]
1962 | Demonstration | IBM demonstrates the Shoebox, a machine that can understand up to 16 spoken words in English, at the 1962 Seattle World's Fair. [4]
1971 | Invention | IBM invents the Automatic Call Identification system, enabling engineers to talk to and receive spoken answers from a device. [5]
1971–1976 | Program | DARPA funds five years of speech recognition research with the goal of producing a machine capable of understanding at least 1,000 words. The program leads to the creation of Carnegie Mellon's Harpy, a machine capable of understanding 1,011 words. [1]
Early 1980s | Technique | The hidden Markov model begins to be used in speech recognition systems, allowing machines to recognize speech more accurately by modeling the probability that unknown sounds correspond to words. [1]
Mid-1980s | Invention | IBM develops the Tangora, a machine able to recognize 20,000 spoken words. [5]
1987 | Invention | The invention of Worlds of Wonder's Julie doll, a toy children can train to respond to their voice, brings speech recognition technology to the home. [1]
1990 | Invention | Dragon launches Dragon Dictate, the first speech recognition product for consumers. [1]
1993 | Invention | Apple introduces Speakable Items, the first built-in speech recognition and voice-enabled control software for Apple computers.
1993 | Invention | Sphinx-II, the first large-vocabulary continuous speech recognition system, is invented by Xuedong Huang. [6]
1996 | Invention | IBM launches MedSpeak, the first commercial product capable of recognizing continuous speech. [5]
2002 | Application | Microsoft integrates speech recognition into its Office products. [7]
2006 | Application | The National Security Agency begins using speech recognition to isolate keywords when analyzing recorded conversations. [8]
2007, January 30 | Application | Microsoft releases Windows Vista, the first version of Windows to incorporate speech recognition. [9]
2007 | Invention | Google introduces GOOG-411, a telephone-based directory service. It later serves as a foundation for the company's Voice Search product. [10]
2008, November 14 | Application | Google launches the Voice Search app for the iPhone, bringing speech recognition technology to mobile devices. [11]
2011, October 4 | Invention | Apple announces Siri, a digital personal assistant. In addition to recognizing speech, Siri can understand the meaning of what it is told and take appropriate action. [12]
2014, April 2 | Application | Microsoft announces Cortana, a digital personal assistant similar to Siri. [13]
2014, November 6 | Invention | Amazon announces the Echo, a voice-controlled speaker powered by Alexa, a digital personal assistant similar to Siri and Cortana. Unlike Siri and Cortana, which are secondary features of the devices they run on, the Echo is dedicated to Alexa. [14]
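The hidden Markov model approach noted in the early-1980s entry can be illustrated with a toy sketch. In classic isolated-word recognition, each word in the vocabulary gets its own HMM, an utterance is scored against every word's model with the forward algorithm, and the best-scoring word wins. Everything below (the two-word vocabulary, the binary acoustic symbols, and all probabilities) is invented for illustration and does not describe any historical system:

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of an observation sequence under one word's HMM.

    obs   -- list of discrete acoustic symbols (e.g. vector-quantized frames)
    start -- start[i]: probability of beginning in state i
    trans -- trans[i][j]: probability of moving from state i to state j
    emit  -- emit[i][o]: probability of state i emitting symbol o
    """
    n = len(start)
    # alpha[i]: probability of the observations so far, ending in state i
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

# Two toy word models over acoustic symbols 0 and 1; "yes" prefers to start
# with symbol 0 then drift to symbol 1, while "no" mostly emits symbol 1.
models = {
    "yes": ([0.8, 0.2], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "no":  ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.1, 0.9], [0.1, 0.9]]),
}

# Recognize: pick the word whose HMM best explains the observed sequence.
obs = [0, 0, 1, 1]
best = max(models, key=lambda w: forward_likelihood(obs, *models[w]))
print(best)  # prints "yes"
```

Real systems of the era worked in log probabilities and used many states per word (or per phoneme), but the core idea is the same: the word is whichever model assigns the observed sounds the highest probability.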

References

  1. Pinola, Melanie. "Speech Recognition Through the Decades: How We Ended Up With Siri". PCWorld. Retrieved 21 June 2016.
  2. Newville, Leslie J. "Development of the Phonograph at Alexander Graham Bell's Volta Laboratory". Retrieved 21 June 2016.
  3. "History of Information Database". Retrieved 21 June 2016.
  4. "IBM Shoebox". IBM. 23 January 2003. Archived from the original on January 19, 2005. Retrieved 21 June 2016.
  5. "Pioneering Speech Recognition". IBM. 7 March 2012. Archived from the original on April 3, 2012. Retrieved 21 June 2016.
  6. Lee, Kai-Fu. "An Overview of the SPHINX Speech Recognition System" (PDF). Carnegie Mellon University. Retrieved 21 June 2016.
  7. Thompson, Terry. "DO-IT". University of Washington. Retrieved 21 June 2016.
  8. Froomkin, Dan (5 May 2015). "The Computers Are Listening". The Intercept. Retrieved 21 June 2016.
  9. Shinder, Deb. "Speech recognition in Windows Vista". TechRepublic.
  10. Kincaid, Jason (13 February 2011). "The Power Of Voice: A Conversation With The Head Of Google's Speech Technology". TechCrunch. Retrieved 21 June 2016.
  11. Markoff, John (14 November 2008). "Google Is Taking Questions (Spoken, via iPhone)". New York Times. Retrieved 21 June 2016.
  12. Daw, David. "What Makes Siri Special?". PCWorld. Retrieved 21 June 2016.
  13. "Microsoft Announces Cortana, Siri-Like Personal Assistant". NBC News. 2 April 2014. Retrieved 21 June 2016.
  14. Welch, Chris (6 November 2014). "Amazon just surprised everyone with a crazy speaker that talks to you". The Verge. Retrieved 21 June 2016.