Speech recognition software is available for many computing platforms, operating systems, use models, and software licenses. This list groups notable programs by engine type and by platform.
The following table lists notable speech recognition engines with a brief summary of their characteristics.
Application name | Description | Open-source | License | Operating system | Programming language | Supported languages; notes | Offline or online |
---|---|---|---|---|---|---|---|
CMU Sphinx | HMM | Yes | BSD style | Cross-platform | Java | English, German, French, Mandarin, Russian | Offline |
HTK | HMM, neural net | No | HTK-specific | Cross-platform | C | English; version 3.5 released December 2015 | |
Julius | HMM, trigrams | Yes | BSD style, non-commercial | Cross-platform | C | Japanese, English | Offline |
Kaldi | Neural net | Yes | Apache | Cross-platform | C++ | English | |
RWTH ASR | Developed by RWTH Aachen University | No | RWTH ASR, non-commercial use only | Linux, macOS | C++ | English | |
Whisper | Encoder/decoder transformer | Yes | MIT license | Cross-platform | Python | Multilingual | Offline; online through API |
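Several engines in the table above are described as using hidden Markov models (HMMs). The Python sketch below illustrates the Viterbi search that an HMM decoder uses to find the most likely sequence of hidden states for a sequence of observed frames. It assumes a toy two-state model with invented "silence"/"speech" states, invented "low"/"high" frame labels and made-up probabilities; it is not code from CMU Sphinx, HTK or Julius, whose acoustic models are far larger and are combined with a language model.

```python
import math

# Toy two-state HMM; the state names, frame labels and probabilities are
# invented for illustration and are not taken from any engine in the table.
STATES = ("silence", "speech")
START = {"silence": 0.8, "speech": 0.2}
TRANS = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
EMIT = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}


def _log(p):
    """Log-probability that maps 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, its log-probability)."""
    # score[s]: best log-probability of any path ending in state s so far
    score = {s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}
    backpointers = []  # one dict per later frame: state -> best predecessor
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the score of entering s
            prev = max(states, key=lambda p: score[p] + _log(trans_p[p][s]))
            new_score[s] = score[prev] + _log(trans_p[prev][s]) + _log(emit_p[s][obs])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state to recover the whole path
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    frames = ["low", "high", "high", "low"]
    path, log_prob = viterbi(frames, STATES, START, TRANS, EMIT)
    print(path, math.exp(log_prob))
```

For the four frames shown, the most likely path is `['silence', 'speech', 'speech', 'silence']`, with a probability of about 0.0199. Working in log-probabilities avoids the numeric underflow that multiplying many small probabilities would cause over long utterances.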
The following table lists notable speech recognition software for macOS.
Application name | Description | Open-source | License | Price | Note |
---|---|---|---|---|---|
Dragon for Mac (discontinued 2018) | macOS; by Nuance | No | Proprietary | ||
Dragon Dictate (discontinued) | macOS; by Nuance | No | Proprietary | ||
MacSpeech Scribe (discontinued) | Transcription of recorded speech; acquired by Nuance | | | | |
iListen (discontinued) | PowerPC Macintosh; discontinued by MacSpeech; acquired by Nuance | ||||
Speakable Items | Included with macOS | | | | |
ViaVoice (discontinued) | IBM Product; acquired by Nuance | ||||
Voice Navigator | Original GUI voice control; 1989 |
The following table lists notable speech recognition software that runs as web apps in the Chrome browser, using the HTML5 Web Speech API.[1]
Application name | Description | Open-source | License | Price | Note |
---|---|---|---|---|---|
Speechmatics [2] | Cloud-based and on-premises automatic speech recognition | No | Proprietary | From £0.06 per minute of audio | |
Many mobile phone handsets, including feature phones and smartphones such as iPhones and BlackBerrys, have basic dial-by-voice features built in. Many apps have also implemented natural-language speech recognition support, including:
Application name | Description | Open-source | License | Price | Note |
---|---|---|---|---|---|
Assistant.ai | Assistant for Android, iOS and Windows Phone | No | Proprietary, freeware | Free | Discontinued |
Dragon Dictation | | No | Proprietary, freeware | Free | |
Google Now | Android voice search | No | Proprietary, freeware | Free | |
Google Voice Search | | No | Proprietary, freeware | Free | |
Microsoft Cortana | Microsoft voice search | No | Proprietary, freeware | Free | |
Siri Personal Assistant | Apple's virtual personal assistant | No | Proprietary, freeware | Free | |
Alexa – Amazon Echo | Amazon's personal assistant | No | Proprietary | ||
SILVIA | Android and iOS | No | |||
Vlingo |
Windows Speech Recognition version 8.0, developed by Microsoft, is built into Windows Vista, Windows 7, Windows 8 and Windows 10. It is available only in English, French, Spanish, German, Japanese, Simplified Chinese and Traditional Chinese, and each language works only with the matching language version of Windows: the speech recognition engine for one language cannot be used on a copy of Windows set up in another language. Windows 7 Ultimate and Windows 8 Pro allow the system language to be changed, which also changes which speech engine is available. Windows Speech Recognition evolved into Cortana, a personal assistant included in Windows 10.
The following are interactive voice response (IVR) systems:
A computing platform, digital platform, or software platform is an environment in which software is executed. It may be the hardware or the operating system (OS), a web browser and associated application programming interfaces, or other underlying software, as long as the program code is executed using the services provided by the platform. Computing platforms have different abstraction levels, including a computer architecture, an OS, or runtime libraries. A computing platform is the stage on which computer programs can run.
Microsoft Agent is a technology developed by Microsoft which employs animated characters, text-to-speech engines, and speech recognition software to enhance interaction with computer users. It came pre-installed as part of Windows 2000 and later versions of Microsoft Windows up to Windows Vista. It was not included with Windows 7, and was completely discontinued in Windows 8. Microsoft Agent functionality was exposed as an ActiveX control that could be used by web pages.
IBM ViaVoice was a range of language-specific continuous speech recognition software products offered by IBM. Its most recent version was designed primarily for use in embedded devices. The last stable version of IBM ViaVoice was 9.0, which could transfer dictated text directly into Microsoft Word.
MacSpeech, Inc. was a New Hampshire-based technology company that produced software-based speech recognition and voice dictation solutions for the Apple ecosystem. The company's products included iListen, MacSpeech Dictate, MacSpeech Dictate Medical, MacSpeech Dictate Legal, MacSpeech Dictate International, and MacSpeech Scribe. On February 12, 2010, Nuance Communications, Inc. acquired MacSpeech.
Dragon NaturallySpeaking is a speech recognition software package developed by Dragon Systems of Newton, Massachusetts, which was acquired in turn by Lernout & Hauspie Speech Products, Nuance Communications, and Microsoft. It runs on Windows personal computers. Version 15, which supports 32-bit and 64-bit editions of Windows 7, 8 and 10, was released in August 2016.
A voice-user interface (VUI) enables spoken human interaction with computers, using speech recognition to understand spoken commands and answer questions, and typically text-to-speech to play a reply. A voice command device is a device controlled with a voice-user interface.
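As a minimal sketch of that flow, the Python code below assumes a speech recognizer has already turned audio into a text transcript and that the returned string would be handed to a text-to-speech engine; neither engine is shown. The command phrases, the `LIGHTS` device state and all function names are hypothetical and are not drawn from any product listed here.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical state of a voice command device: room -> is the light on?
LIGHTS: dict[str, bool] = {}


def _switch_light(match: re.Match) -> str:
    state, room = match.group(1), match.group(2)
    LIGHTS[room] = state == "on"
    return f"{room.capitalize()} light turned {state}."


def _set_timer(match: re.Match) -> str:
    minutes = int(match.group(1))
    unit = "minute" if minutes == 1 else "minutes"
    return f"Timer set for {minutes} {unit}."


# Hypothetical command grammar: each pattern must match the whole utterance.
COMMANDS: list[tuple[re.Pattern, Callable[[re.Match], str]]] = [
    (re.compile(r"turn (on|off) the (\w+) lights?"), _switch_light),
    (re.compile(r"set a timer for (\d+) minutes?"), _set_timer),
]


def normalize(transcript: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace in recognizer output."""
    return " ".join(re.sub(r"[^\w\s]", "", transcript).lower().split())


def handle_utterance(transcript: str) -> str:
    """Return the reply text that would be passed to a text-to-speech engine."""
    text = normalize(transcript)
    for pattern, handler in COMMANDS:
        match = pattern.fullmatch(text)
        if match:
            return handler(match)
    return "Sorry, I did not understand that."
```

For example, `handle_utterance("Turn on the kitchen light!")` returns `"Kitchen light turned on."`. A fixed grammar like this keeps matching predictable, at the cost of rejecting paraphrases such as "switch the kitchen light on".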
The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. To date, a number of versions of the API have been released, which have shipped either as part of a Speech SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.
Microsoft Voice Command is an application which can control Windows Mobile devices by voice. The first version was announced in November 2003 and it was supported in the United Kingdom, United States, France, and Germany.
Bing for mobile is a search tool for handheld mobile devices from Microsoft as part of their Bing search engine. It is designed for mobile device displays. Bing Mobile is built into Windows Mobile and Windows Phone as proprietary software and is accessed via the Search key on Windows Phone 7 and Windows Phone 8 devices. It is also available on Windows Phone 8.1 and can be downloaded for other platforms, including Android.
Since the early 2000s, several speech recognition (SR) software packages have been available for Linux. Some of them are free and open-source software and others are proprietary software. Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for communicating operational commands to a computer.
Windows Speech Recognition (WSR) is speech recognition developed by Microsoft for Windows Vista that enables voice commands to control the desktop user interface, dictate text in electronic documents and email, navigate websites, perform keyboard shortcuts, and operate the mouse cursor. It supports custom macros to perform additional or supplementary tasks.
Natural-language user interface is a type of computer-human interface where linguistic phenomena such as verbs, phrases and clauses act as UI controls for creating, selecting and modifying data in software applications.
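To make the idea concrete, here is a toy Python sketch in which the verb of each sentence selects the operation (create, select or modify) and the rest of the phrase supplies the data. The grammar, the `contacts` store and the `run` function are hypothetical; practical natural-language interfaces rely on real parsers rather than a handful of regular expressions.

```python
import re

# Hypothetical in-memory data the interface manipulates: name -> city
contacts = {}


def run(sentence):
    """Interpret one English command; the verb decides which operation runs."""
    s = sentence.strip().rstrip(".").lower()
    # "add <name> from <city>": the verb "add" creates a record
    m = re.fullmatch(r"add (\w+) from (\w+)", s)
    if m:
        contacts[m[1]] = m[2]
        return f"Added {m[1]}."
    # "show contacts from <city>": "show" plus a phrase selects records
    m = re.fullmatch(r"show contacts from (\w+)", s)
    if m:
        names = sorted(n for n, city in contacts.items() if city == m[1])
        return ", ".join(names) or "No matches."
    # "move <name> to <city>": the verb "move" modifies an existing record
    m = re.fullmatch(r"move (\w+) to (\w+)", s)
    if m:
        if m[1] not in contacts:
            return f"No contact named {m[1]}."
        contacts[m[1]] = m[2]
        return f"Moved {m[1]} to {m[2]}."
    return "Sorry, I can't parse that."
```

Starting from an empty store, `run("Add alice from Paris.")` returns `"Added alice."`, after which `run("show contacts from paris")` returns `"alice"`.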
Apache Cordova is a mobile application development framework created by Nitobi. Adobe Systems purchased Nitobi in 2011, rebranded it as PhoneGap, and later released an open-source version of the software called Apache Cordova. Apache Cordova enables software programmers to build hybrid web applications for mobile devices using CSS3, HTML5, and JavaScript, instead of relying on platform-specific APIs like those in Android, iOS, or Windows Phone. It packages CSS, HTML, and JavaScript code for the platform of the target device and extends HTML and JavaScript so they can work with the device. The resulting applications are hybrid, meaning that they are neither truly native mobile applications nor purely web-based. They are not native because all layout rendering is done via web views instead of the platform's native UI framework. They are not web apps because they are packaged as apps for distribution and have access to native device APIs. Mixing native and hybrid code snippets has been possible since version 1.9.
The Microsoft text-to-speech voices are speech synthesizers provided for use with applications that use the Microsoft Speech API (SAPI) or the Microsoft Speech Server Platform. There are client, server, and mobile versions of Microsoft text-to-speech voices. Client voices are shipped with Windows operating systems; server voices can be downloaded for use with server applications such as Speech Server and Lync on both Windows client and server platforms; and mobile voices are often shipped with more recent versions.
Tazti is a speech recognition software package developed and sold by Voice Tech Group, Inc. for Windows personal computers. The most recent package is version 3.2, which supports Windows 10, Windows 8.1, Windows 8 and Windows 7 64-bit editions. Earlier versions of Tazti supported Windows Vista and Windows XP. Tazti's primary features are playing PC video games by voice, controlling PC applications and programs by voice, and creating speech commands that make a browser open web pages or make the Windows operating system open files, folders or programs. Earlier versions of Tazti included a lite dictation feature that was removed from the latest version.
Sensory, Inc. is an American company which develops software AI technologies for speech, sound and vision. It is based in Santa Clara, California.
Cortana is a discontinued virtual assistant developed by Microsoft that used the Bing search engine to perform tasks such as setting reminders and answering questions for users.
One Voice Technologies was an Artificial Intelligence (AI) based Natural Language Processing (NLP) company founded in 1998 and based in San Diego, CA. One Voice developed IVAN, an intelligent personal assistant that launched commercially in 1999. Its customers included Telefonos de Mexico, S.A.B. de C.V. (TELMEX), Intel Corporation, the Government of India, Fry's Electronics, Inland Cellular, and Nex-Tec Wireless.
Braina is a virtual assistant and speech-to-text dictation application for Microsoft Windows developed by Brainasoft. Braina uses a natural language interface, speech synthesis, and speech recognition technology to interact with its users, allowing them to perform various tasks on a computer with natural language sentences in most languages of the world. The name Braina is a short form of “Brain Artificial”.