Speech recognition software is available for many computing platforms, operating systems, use models, and software licenses. The following lists group such software in several useful ways.
The following list presents notable speech recognition software engines with a brief synopsis of characteristics.
Application name | Description | Open-source | License | Operating system | Programming language | Supported language, note | Offline or online |
---|---|---|---|---|---|---|---|
CMU Sphinx | HMM | Yes | BSD style | Cross-platform | Java | English, German, French, Mandarin, Russian | Offline |
HTK | HMM neural net | No | HTK specific | Cross-platform | C | English; version 3.5 released December 2015 | |
Julius | HMM trigrams | Yes | BSD style, non-commercial | Cross-platform | C | Japanese, English | Offline |
Kaldi | Neural net | Yes | Apache | Cross-platform | C++ | English | |
RWTH ASR | RWTH Aachen University | No | RWTH ASR, non-commercial use only | Linux, macOS | C++ | English | |
Whisper | Encoder/decoder transformer | Yes | MIT license | Cross-platform | Python | Multilingual | Online (through API) and offline |
The following list presents notable speech recognition software for Apple Macintosh computers.
Application name | Description | Open-source | License | Price | Note |
---|---|---|---|---|---|
Dragon for Mac (discontinued 2018) | macOS; by Nuance | No | Proprietary | ||
Dragon Dictate (discontinued) | macOS; by Nuance | No | Proprietary | ||
MacSpeech Scribe (discontinued) | Transcription from recorded speech; acquired by Nuance | ||||
iListen (discontinued) | PowerPC Macintosh; discontinued by MacSpeech; acquired by Nuance | ||||
Speakable Items | Included with macOS | ||||
ViaVoice (discontinued) | IBM product; acquired by Nuance | ||||
Voice Navigator | Original GUI voice control; 1989 | ||||
The following list presents notable speech recognition software that operates in the Chrome browser as web apps, using the HTML5 Web Speech API. [1]
Application name | Description | Open-source | License | Price | Note |
---|---|---|---|---|---|
Speechmatics [2] | Cloud-based and on-premise automatic speech recognition | No | Proprietary | From £0.06 per minute of audio | |
Many mobile phone handsets, including feature phones and smartphones such as iPhones and BlackBerrys, have basic dial-by-voice features built in. Many third-party apps have implemented natural-language speech recognition support, including:
Application name | Description | Open-source | License | Price | Note |
---|---|---|---|---|---|
Assistant.ai | Assistant for Android, iOS and Windows Phone | No | Proprietary, freeware | Free | Discontinued |
Dragon Dictation | | No | Proprietary, freeware | Free | |
Google Now | Android voice search | No | Proprietary, freeware | Free | |
Google Voice Search | | No | Proprietary, freeware | Free | |
Microsoft Cortana | Microsoft voice search | No | Proprietary, freeware | Free | |
Siri Personal Assistant | Apple's virtual personal assistant | No | Proprietary, freeware | Free | |
Alexa – Amazon Echo | Amazon's personal assistant | No | Proprietary | ||
SILVIA | Android and iOS | No | |||
Vlingo | | | | | |
Windows Speech Recognition (version 8.0) by Microsoft is built into Windows Vista, Windows 7, Windows 8 and Windows 10. It is available only in English, French, Spanish, German, Japanese, Simplified Chinese, and Traditional Chinese, and only in the corresponding language version of Windows. The speech recognition engine for one language therefore cannot be used on a version of Windows in another language. Windows 7 Ultimate and Windows 8 Pro allow the user to change the system language, and with it the speech engine that is available. Windows Speech Recognition evolved into Cortana, a personal assistant included in Windows 10.
The first version of the Microsoft Speech API was released for Windows NT 3.51 and Windows 95 in 1994, and the API was then part of Windows up to Windows Vista. This initial version already contained the Direct Speech Recognition and Direct Text To Speech APIs, which applications could use to control engines directly. It also contained the simplified, higher-level Voice Command and Voice Talk APIs. Speech recognition functionality was also included as part of Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC Edition. It can also be downloaded as part of the Speech SDK 5.1 for Windows applications. However, because the SDK is aimed at developers building speech applications, in its pure form it lacks any user interface and is thus unsuitable for end users (although numerous applications were available).
The following are interactive voice response (IVR) systems:
A computing platform, digital platform, or software platform is the infrastructure on which software is executed. While the individual components of a computing platform may be hidden under layers of abstraction, together the required components make up the computing platform.
Microsoft Agent is a technology developed by Microsoft that employs animated characters, text-to-speech engines, and speech recognition software to enhance interaction with computer users. It came pre-installed as part of Windows 2000 and later versions of Microsoft Windows up to Windows Vista. It was not included with Windows 7, and was completely discontinued in Windows 8. Microsoft Agent functionality was exposed as an ActiveX control that could be used by web pages.
A screen reader is a form of assistive technology (AT) that renders text and image content as speech or braille output. Screen readers are essential to people who are blind, and are useful to people who are visually impaired, illiterate, or have a learning disability. Screen readers are software applications that attempt to convey what people with normal eyesight see on a display to their users via non-visual means, like text-to-speech, sound icons, or a braille device. They do this by applying a wide variety of techniques that include, for example, interacting with dedicated accessibility APIs, using various operating system features, and employing hooking techniques.
IBM ViaVoice was a range of language-specific continuous speech recognition software products offered by IBM. The current version is designed primarily for use in embedded devices. The latest stable version of IBM ViaVoice was 9.0, which could transfer text directly into Microsoft Word.
PlainTalk is the collective name for several speech synthesis (MacinTalk) and speech recognition technologies developed by Apple Inc. In 1990, Apple invested heavily in speech recognition technology, hiring many researchers in the field. The result was PlainTalk, released in 1993 with the AV models of the Macintosh Quadra series. It was made a standard system component in System 7.1.2, and has since shipped on all PowerPC and some 68k Macintoshes.
MacSpeech, Inc. was a New Hampshire-based technology company that produced software-based speech recognition and voice dictation solutions for the Apple ecosystem. The company's products included iListen, MacSpeech Dictate, MacSpeech Dictate Medical, MacSpeech Dictate Legal, MacSpeech Dictate International, and MacSpeech Scribe. On February 12, 2010, Nuance Communications, Inc. acquired MacSpeech.
Dragon NaturallySpeaking is a speech recognition software package developed by Dragon Systems of Newton, Massachusetts, which was acquired in turn by Lernout & Hauspie Speech Products, Nuance Communications, and Microsoft. It runs on Windows personal computers. Version 15, which supports 32-bit and 64-bit editions of Windows 7, 8 and 10, was released in August 2016.
A voice-user interface (VUI) enables spoken human interaction with computers. It uses speech recognition to understand spoken commands and answer questions, and typically uses text-to-speech to play a reply. A voice command device is a device controlled with a voice user interface.
The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. To date, a number of versions of the API have been released, which have shipped either as part of a Speech SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.
Microsoft Voice Command is an application that can control Windows Mobile devices by voice. The first version was announced in November 2003 and was supported in the United Kingdom, United States, France, and Germany.
Bing for mobile is a search tool for handheld mobile devices, offered by Microsoft as part of its Bing search engine. It is designed for mobile device displays. Bing Mobile is built into Windows Mobile and Windows Phone as proprietary software, accessed via the Search key on Windows Phone 7 and Windows Phone 8 devices. It is also available on Windows Phone 8.1, and can be downloaded for other platforms, including Android.
Several speech recognition (SR) software packages have existed for Linux since the early 2000s. Some of them are free and open-source software and others are proprietary software. Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for communicating operational commands to a computer.
Windows Speech Recognition (WSR) is speech recognition developed by Microsoft for Windows Vista that enables voice commands to control the desktop user interface, dictate text in electronic documents and email, navigate websites, perform keyboard shortcuts, and operate the mouse cursor. It supports custom macros to perform additional or supplementary tasks.
A natural-language user interface is a type of human-computer interface in which linguistic phenomena such as verbs, phrases, and clauses act as UI controls for creating, selecting, and modifying data in software applications.
A virtual assistant (VA) is a software agent that can perform a range of tasks or services for a user based on user input such as commands or questions, including verbal ones. Such technologies often incorporate chatbot capabilities to simulate human conversation, such as via online chat, to facilitate interaction with their users. The interaction may be via text, a graphical interface, or voice, as some virtual assistants are able to interpret human speech and respond via synthesized voices.
The Microsoft text-to-speech voices are speech synthesizers provided for use with applications that use the Microsoft Speech API (SAPI) or the Microsoft Speech Server Platform. There are client, server, and mobile versions of Microsoft text-to-speech voices. Client voices are shipped with Windows operating systems. Server voices are available for download for use with server applications such as Speech Server and Lync, on both Windows client and server platforms. Mobile voices are often shipped with more recent versions.
Tazti is a speech recognition software package developed and sold by Voice Tech Group, Inc. for Windows personal computers. The most recent package is version 3.2, which supports Windows 10, Windows 8.1, Windows 8 and Windows 7 64-bit editions. Earlier versions of Tazti supported Windows Vista and Windows XP. Tazti's primary features are playing PC video games by voice and controlling PC applications and programs by voice. It also lets users create speech commands that trigger a browser to open web pages or trigger the Windows operating system to open files, folders, or programs. Earlier versions of Tazti included a lite dictation feature that was removed from the latest version.
Cortana was a virtual assistant developed by Microsoft that used the Bing search engine to perform tasks such as setting reminders and answering questions for users.
One Voice Technologies was an artificial intelligence (AI)-based natural language processing (NLP) company founded in 1998 and based in San Diego, CA. One Voice was the developer of IVAN, an intelligent personal assistant, which launched commercially in 1999. Customers of One Voice Technologies included Telefonos de Mexico, S.A.B. de C.V. (TELMEX), Intel Corporation, the Government of India, Fry's Electronics, Inland Cellular, and Nex-Tec Wireless.
Braina is a virtual assistant and speech-to-text dictation application for Microsoft Windows developed by Brainasoft. Braina uses a natural-language interface, speech synthesis, and speech recognition technology to interact with its users, and allows them to use natural language sentences to perform various tasks on a computer. The name Braina is a short form of "Brain Artificial".