List of speech recognition software


Speech recognition software is available for many computing platforms, operating systems, use models, and software licenses. The lists below group notable products by engine, platform, and use.


Acoustic models and speech corpus (compilation)

The following list presents notable speech recognition software engines with a brief synopsis of characteristics.

| Application name | Description | Open-source | License | Operating system | Programming language | Supported language, note | Offline or online |
|---|---|---|---|---|---|---|---|
| CMU Sphinx | HMM | Yes | BSD style | Cross-platform | Java | English, German, French, Mandarin, Russian | Offline |
| HTK | HMM, neural net | No | HTK specific | Cross-platform | C | English; version 3.5 released December 2015 | |
| Julius | HMM, trigrams | Yes | BSD style, non-commercial | Cross-platform | C | Japanese, English | Offline |
| Kaldi | Neural net | Yes | Apache | Cross-platform | C++ | English | |
| RWTH ASR | RWTH Aachen University | No | RWTH ASR, non-commercial use only | Linux, macOS | C++ | English | |
| Whisper | Encoder/decoder transformer | Yes | MIT license | Cross-platform | Python | Multilingual | Online (through API) and offline |
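
The engine list above can also be treated as a small set of records and filtered by column. The following is a minimal sketch, not part of any engine's API: the `Engine` class and the `open_source_offline` helper are hypothetical names introduced for illustration, and the `offline` field is left as `None` wherever the list gives no value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Engine:
    """One row of the engine list (hypothetical structure for illustration)."""
    name: str
    open_source: bool
    language: str             # implementation language from the list
    offline: Optional[bool]   # None where the list leaves the cell empty


# Values copied from the list above; empty cells are recorded as None.
ENGINES = [
    Engine("CMU Sphinx", True, "Java", True),
    Engine("HTK", False, "C", None),
    Engine("Julius", True, "C", True),
    Engine("Kaldi", True, "C++", None),
    Engine("RWTH ASR", False, "C++", None),
    Engine("Whisper", True, "Python", True),
]


def open_source_offline(engines: List[Engine]) -> List[str]:
    """Return names of engines listed as both open-source and usable offline."""
    return [e.name for e in engines if e.open_source and e.offline is True]


print(open_source_offline(ENGINES))
```

Engines whose offline status is not stated in the list are excluded rather than guessed, which is why Kaldi does not appear in the result.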

Macintosh

| Application name | Description | Open-source | License | Price | Note |
|---|---|---|---|---|---|
| Dragon for Mac (discontinued 2018) | macOS; by Nuance | No | Proprietary | | |
| Dragon Dictate (discontinued) | macOS; by Nuance | No | Proprietary | | |
| MacSpeech Scribe (discontinued) | Transcription from recorded text; acquired by Nuance | | | | |
| iListen (discontinued) | PowerPC Macintosh; discontinued by MacSpeech; acquired by Nuance | | | | |
| Speakable items | Included with macOS | | | | |
| ViaVoice (discontinued) | IBM product; acquired by Nuance | | | | |
| Voice Navigator | Original GUI voice control; 1989 | | | | |

Cross-platform web apps based on Chrome

The following list presents notable speech recognition software that operates in a Chrome browser as a web app. These apps make use of the HTML5 Web Speech API. [1]

| Application name | Description | Open-source | License | Price | Note |
|---|---|---|---|---|---|
| Speechmatics [2] | Cloud-based and on-premise automatic speech recognition | No | Proprietary | From £0.06 per minute of audio | |

Mobile devices and smartphones

Many mobile phone handsets, including feature phones and smartphones such as iPhones and BlackBerrys, have basic dial-by-voice features built in. Many third-party apps have implemented natural-language speech recognition support, including:

| Application name | Description | Open-source | License | Price | Note |
|---|---|---|---|---|---|
| Assistant.ai | Assistant for Android, iOS and Windows Phone | No | Proprietary, freeware | Free | Discontinued |
| Dragon Dictation | | No | Proprietary, freeware | Free | |
| Google Now | Android voice search | No | Proprietary, freeware | Free | |
| Google Voice Search | | No | Proprietary, freeware | Free | |
| Microsoft Cortana | Microsoft voice search | No | Proprietary, freeware | Free | |
| Siri Personal Assistant | Apple's virtual personal assistant | No | Proprietary, freeware | Free | |
| Alexa – Amazon Echo | Amazon's personal assistant | No | Proprietary | | |
| SILVIA | Android and iOS | No | | | |
| Vlingo | | | | | |

Windows

Windows built-in speech recognition

Windows Speech Recognition version 8.0 by Microsoft is built into Windows Vista, Windows 7, Windows 8 and Windows 10. It is available only in English, French, Spanish, German, Japanese, Simplified Chinese, and Traditional Chinese, and only in the corresponding language version of Windows: the speech recognition engine for one language cannot be used on a version of Windows in another language. Windows 7 Ultimate and Windows 8 Pro allow the system language to be changed, which also changes the available speech engine. Windows Speech Recognition evolved into Cortana, a personal assistant included in Windows 10.
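
The language restriction described above can be sketched as a simple check. This only models the rule as stated here and is not a Windows API: the function name `engine_available` and the use of plain language names as identifiers are assumptions made for illustration.

```python
# Languages in which Windows Speech Recognition is available, as listed above.
SUPPORTED_LANGUAGES = frozenset({
    "English",
    "French",
    "Spanish",
    "German",
    "Japanese",
    "Simplified Chinese",
    "Traditional Chinese",
})


def engine_available(system_language: str, engine_language: str) -> bool:
    """Return True if the engine for `engine_language` can be used.

    Hypothetical helper: the engine must be one of the supported languages
    and must match the language version of Windows that is installed.
    """
    return (
        engine_language in SUPPORTED_LANGUAGES
        and system_language == engine_language
    )


print(engine_available("German", "German"))    # matching, supported language
print(engine_available("German", "English"))   # engine differs from system language
print(engine_available("Italian", "Italian"))  # language not in the supported list
```

Under this model, changing the system language (as Windows 7 Ultimate and Windows 8 Pro allow) corresponds to calling the check with a different `system_language` value.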

Windows 7, 8, 10, 11 third-party speech recognition

Windows XP or 2000 only

Built-in software

Interactive voice response

The following are interactive voice response (IVR) systems:

Unix-like x86 and x86-64 speech transcription software

Discontinued software

See also

Related Research Articles

A computing platform, digital platform, or software platform is an environment in which software is executed. It may be the hardware or the operating system (OS), a web browser and associated application programming interfaces, or other underlying software, as long as the program code is executed using the services provided by the platform. Computing platforms have different abstraction levels, including a computer architecture, an OS, or runtime libraries. A computing platform is the stage on which computer programs can run.

Microsoft Agent: Virtual software agent technology

Microsoft Agent is a technology developed by Microsoft which employs animated characters, text-to-speech engines, and speech recognition software to enhance interaction with computer users. It came pre-installed as part of Windows 2000 and later versions of Microsoft Windows up to Windows Vista. It was not included with Windows 7, and was completely discontinued in Windows 8. Microsoft Agent functionality was exposed as an ActiveX control that can be used by web pages.

IBM ViaVoice was a range of language-specific continuous speech recognition software products offered by IBM. The current version is designed primarily for use in embedded devices. The latest stable version of IBM ViaVoice was 9.0 and was able to transfer text directly into Word.

MacSpeech: Speech recognition etc. software company

MacSpeech, Inc. was a New Hampshire-based technology company that produced software-based speech recognition and voice dictation solutions for the Apple ecosystem. The company's products included iListen, MacSpeech Dictate, MacSpeech Dictate Medical, MacSpeech Dictate Legal, MacSpeech Dictate International, and MacSpeech Scribe. On February 12, 2010, Nuance Communications, Inc. acquired MacSpeech.

Dragon NaturallySpeaking: Speech recognition software package

Dragon NaturallySpeaking is a speech recognition software package developed by Dragon Systems of Newton, Massachusetts, which was acquired in turn by Lernout & Hauspie Speech Products, Nuance Communications, and Microsoft. It runs on Windows personal computers. Version 15, which supports 32-bit and 64-bit editions of Windows 7, 8 and 10, was released in August 2016.

A voice-user interface (VUI) enables spoken human interaction with computers, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device is a device controlled with a voice user interface.

The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. To date, a number of versions of the API have been released, which have shipped either as part of a Speech SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.

Microsoft Voice Command is an application which can control Windows Mobile devices by voice. The first version was announced in November 2003 and it was supported in the United Kingdom, United States, France, and Germany.

Bing Mobile: Web search for mobile devices

Bing for mobile is a search tool for handheld mobile devices from Microsoft as part of their Bing search engine. It is designed for mobile device displays. Bing Mobile is built into Windows Mobile and Windows Phone as proprietary software, accessed via the Search key on Windows Phone 7 and Windows Phone 8 devices. It is also available on Windows Phone 8.1, and can be downloaded for other platforms, including Android.

As of the early 2000s, several speech recognition (SR) software packages exist for Linux. Some of them are free and open-source software and others are proprietary software. Speech recognition usually refers to software that attempts to distinguish thousands of words in a human language. Voice control may refer to software used for communicating operational commands to a computer.

Windows Speech Recognition: Speech recognition software

Windows Speech Recognition (WSR) is speech recognition developed by Microsoft for Windows Vista that enables voice commands to control the desktop user interface, dictate text in electronic documents and email, navigate websites, perform keyboard shortcuts, and operate the mouse cursor. It supports custom macros to perform additional or supplementary tasks.

A natural-language user interface is a type of human-computer interface where linguistic phenomena such as verbs, phrases and clauses act as UI controls for creating, selecting and modifying data in software applications.

Apache Cordova is a mobile application development framework created by Nitobi. Adobe Systems purchased Nitobi in 2011, rebranded it as PhoneGap, and later released an open-source version of the software called Apache Cordova. Apache Cordova enables software programmers to build hybrid web applications for mobile devices using CSS3, HTML5, and JavaScript, instead of relying on platform-specific APIs like those in Android, iOS, or Windows Phone. It wraps CSS, HTML, and JavaScript code according to the platform of the device, and extends the features of HTML and JavaScript to work with the device. The resulting applications are hybrid, meaning that they are neither truly native mobile applications nor purely Web-based. They are not native because all layout rendering is done via Web views instead of the platform's native UI framework. They are not Web apps because they are packaged as apps for distribution and have access to native device APIs. Mixing native and hybrid code snippets has been possible since version 1.9.

The Microsoft text-to-speech voices are speech synthesizers provided for use with applications that use the Microsoft Speech API (SAPI) or the Microsoft Speech Server Platform. There are client, server, and mobile versions of Microsoft text-to-speech voices. Client voices are shipped with Windows operating systems; server voices are available for download for use with server applications such as Speech Server, Lync etc. for both Windows client and server platforms, and mobile voices are often shipped with more recent versions.

Tazti is a speech recognition software package developed and sold by Voice Tech Group, Inc. for Windows personal computers. The most recent package is version 3.2, which supports Windows 10, Windows 8.1, Windows 8 and Windows 7 64-bit editions. Earlier versions of Tazti supported Windows Vista and Windows XP. Tazti's primary features are playing PC video games by voice, controlling PC applications and programs by voice, and creating speech commands that trigger a browser to open web pages or trigger the Windows operating system to open files, folders or programs. Earlier versions of Tazti included a lite dictation feature that has been removed from the latest version.

Sensory, Inc.

Sensory, Inc. is an American company which develops software AI technologies for speech, sound and vision. It is based in Santa Clara, California.

Cortana (virtual assistant): Discontinued personal assistant by Microsoft

Cortana is a discontinued virtual assistant developed by Microsoft that used the Bing search engine to perform tasks such as setting reminders and answering questions for users.

One Voice Technologies was an Artificial Intelligence (AI) based Natural Language Processing (NLP) company founded in 1998 and based in San Diego, CA. One Voice was the developer of IVAN, an intelligent personal assistant, which commercially launched in 1999. Some of the customers for One Voice Technologies are Telefonos de Mexico, S.A.B. de C.V. (TELMEX), Intel Corporation, the Government of India, Fry's Electronics, Inland Cellular, and Nex-Tec Wireless.

Braina: Intelligent personal assistant & dictation software

Braina is a virtual assistant and speech-to-text dictation application for Microsoft Windows developed by Brainasoft. Braina uses natural language interface, speech synthesis, and speech recognition technology to interact with its users and allows them to use natural language sentences to perform various tasks on a computer in most languages of the world. The name Braina is a short form of “Brain Artificial”.

References

  1. "Web Speech API Specification". dvcs.w3.org. Archived from the original on 2016-06-21.
  2. Orlowski, Andrew. "Total recog: British AI makes universal speech breakthrough". The Register. Situation Publishing. Retrieved 17 May 2018.
  3. "Speech Recognition Software for Windows PC – Braina". www.brainasoft.com. Archived from the original on 2015-04-07.
  4. "Dynamic Faceting-List of Most 57 Speech Recognition SWs and Web Services". Archived from the original on February 13, 2019. Retrieved February 23, 2019.
  5. O'Neill, Mark (2013-11-06). "Control your PC with these 5 speech recognition programs". PC World. Archived from the original on 2014-01-01. Retrieved 2013-12-30.
  6. "Interactive Voice Response". Genesys. Archived from the original on 2016-10-14.
  7. [dead link]
  8. Lavie, A.; Waibel, A.; Levin, L.; Finke, M.; Gates, D.; Gavalda, M.; Zeppenfeld, T.; Zhan, Puming (1 April 1997). "Janus-III: speech-to-speech translation in multiple languages". 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 1. IEEE Xplore. pp. 99–102. CiteSeerX 10.1.1.36.6967. doi:10.1109/ICASSP.1997.599557. ISBN 978-0-8186-7919-3. S2CID 1514209.
  9. "A TensorFlow implementation of Baidu's DeepSpeech architecture". Mozilla. 2017-12-05. Retrieved 2017-12-05.
  10. "IBM - Embedded ViaVoice - Embedded ViaVoice - Software". Archived from the original on 2010-08-08. Retrieved 2010-06-29.
  11. "Nuance product support for Microsoft Windows 7". Nuance Communications, Customer Help. Retrieved 2019-03-16.
  12. "ViaVoice for Mac OS X on Intel Chipset". Nuance Communications, Customer Help. Retrieved 2019-03-16.