NeoSpeech


NeoSpeech Inc. is an American company that specializes in text-to-speech (TTS) software for embedded devices, mobile, desktop, and network/server applications. The company was founded in 2002 in Fremont, California, by two speech engineers, Lin Chase and Yoon Kim. It is privately held and headquartered in Santa Clara, California.


Stephen Hawking was briefly a NeoSpeech TTS user in 2004, [1] but soon returned to using his iconic DECtalk voice synthesizer [2] since he identified with it so strongly. Adobe Systems has selected NeoSpeech speech synthesis for their e-learning authoring suite Adobe Captivate. [3]

History

NeoSpeech was a subsidiary of the Korean company Voiceware Co., Ltd., which was established in 2000. In January 2001 Voiceware released VoiceEz, a natural speech recognition technology, and VoiceCop, a speaker verification technology. [4] In September 2001 Voiceware released VoiceText English, featuring its first American English voice, named Kate. [5]

In February 2002 Voiceware announced the establishment of its US subsidiary, NeoSpeech, Inc. (www.neospeech.com), in San Jose, California. [6]

In January 2006, the Japanese photographic company Pentax acquired Voiceware Co., a producer of text-to-speech software technologies. [7]

In 2007–2008, Pentax was acquired by HOYA Corporation. [8]

Following the acquisition of ReadSpeaker (another text-to-speech company, the creator of rSpeak software) by HOYA in 2017, HOYA gathered all of its voice technology companies under the ReadSpeaker brand, encompassing the existing companies and brands of ReadSpeaker, rSpeak, Voiceware, VoiceText, and NeoSpeech. [9]

Products

VoiceText speech synthesis is the NeoSpeech software component that generates synthesized speech from input text. NeoSpeech uses Unit Selection Synthesis (USS), which draws on large databases of recorded sound segments to create synthesized speech. The VoiceText TTS Engine is mainly used to build custom stand-alone TTS applications such as AAC (augmentative and alternative communication) products, [10] [11] [12] gaming software, automated loudspeaker/paging systems, educational software, [13] and language learning apps. [14] It can also be used simply to read input text aloud through a provided desktop TTS program. [15]
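NeoSpeech's engine is proprietary, but the unit selection idea it relies on can be sketched briefly: for each sound the system wants to produce, it searches a database of pre-recorded segments for the candidate that best matches the target and joins most smoothly onto the previous selection, then concatenates the chosen recordings. The Python sketch below is purely illustrative, uses a toy in-memory database and a greedy search instead of the full lattice search a real engine performs, and none of its names correspond to NeoSpeech's actual API.

```python
# Toy illustration of unit selection synthesis (not NeoSpeech's actual API).
# Each "unit" is a pre-recorded segment labelled with its phone and a simple
# acoustic feature; synthesis picks the cheapest chain of candidates.
from dataclasses import dataclass

@dataclass
class Unit:
    phone: str     # phonetic label, e.g. "k", "ae", "t"
    pitch: float   # mean F0 of the recording, in Hz
    audio: bytes   # raw samples of the recorded segment

def target_cost(unit: Unit, wanted_pitch: float) -> float:
    """How far a candidate is from the unit we want to say."""
    return abs(unit.pitch - wanted_pitch)

def join_cost(prev: Unit, cur: Unit) -> float:
    """How audible the seam between two recordings would be (toy measure)."""
    return abs(prev.pitch - cur.pitch)

def select_units(targets, database):
    """Greedily pick, for each (phone, pitch) target, the recorded unit that
    minimises target cost plus the cost of joining it to the previous pick."""
    chosen = []
    for phone, pitch in targets:
        candidates = [u for u in database if u.phone == phone]
        if not candidates:
            raise ValueError(f"no recording available for phone {phone!r}")
        prev = chosen[-1] if chosen else None
        best = min(
            candidates,
            key=lambda u: target_cost(u, pitch)
            + (join_cost(prev, u) if prev else 0.0),
        )
        chosen.append(best)
    # A real engine smooths the joins; here the recordings are simply concatenated.
    return b"".join(u.audio for u in chosen)
```

A production system evaluates whole candidate sequences (typically with a Viterbi search over the candidate lattice) rather than committing greedily, but the target-cost/join-cost structure is the same.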

Languages

Languages include US and UK variants of English, Mexican Spanish, Canadian French, Chinese, Korean, and Japanese, with a variety of male and female voices.

The software is available for embedded devices, mobile, desktop, and network/server applications.


Related Research Articles

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.


A screen reader is a form of assistive technology (AT) that renders text and image content as speech or braille output. Screen readers are essential to people who are blind, and are useful to people who are visually impaired, illiterate, or have a learning disability. Screen readers are software applications that attempt to convey what people with normal eyesight see on a display to their users via non-visual means, like text-to-speech, sound icons, or a braille device. They do this by applying a wide variety of techniques that include, for example, interacting with dedicated accessibility APIs, using various operating system features, and employing hooking techniques.

FreeTTS is an open source speech synthesis system written entirely in the Java programming language. It is based upon Flite. FreeTTS is an implementation of Sun's Java Speech API.

PlainTalk is the collective name for several speech synthesis (MacinTalk) and speech recognition technologies developed by Apple Inc. In 1990, Apple invested significant effort and money in speech recognition technology, hiring many researchers in the field. The result was "PlainTalk", released with the AV models in the Macintosh Quadra series from 1993. It was made a standard system component in System 7.1.2, and has since been shipped on all PowerPC and some 68k Macintoshes.

A voice-user interface (VUI) enables spoken human interaction with computers, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device is a device controlled with a voice user interface.

Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis applications. It is a recommendation of the W3C's Voice Browser Working Group. SSML is often embedded in VoiceXML scripts to drive interactive telephony systems. However, it also may be used alone, such as for creating audio books. For desktop applications, other markup languages are popular, including Apple's embedded speech commands, and Microsoft's SAPI Text to speech (TTS) markup, also an XML language. It is also used to produce sounds via Azure Cognitive Services' Text to Speech API or when writing third-party skills for Google Assistant or Amazon Alexa.
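For illustration, the snippet below builds a minimal SSML document with Python's standard xml.etree.ElementTree module; the element and attribute names (speak, voice, prosody, break) come from the W3C recommendation, while the voice name is a placeholder rather than any particular vendor's voice.

```python
# Build a minimal SSML document with the standard library.
# Element names follow the W3C SSML recommendation; the voice name is a placeholder.
import xml.etree.ElementTree as ET

speak = ET.Element("speak", {
    "version": "1.0",
    "xmlns": "http://www.w3.org/2001/10/synthesis",  # namespaces written as plain attributes for brevity
    "xml:lang": "en-US",
})
voice = ET.SubElement(speak, "voice", {"name": "example-voice"})
prosody = ET.SubElement(voice, "prosody", {"rate": "slow", "pitch": "+2st"})
prosody.text = "Speech Synthesis Markup Language controls how this sentence is spoken."
ET.SubElement(voice, "break", {"time": "500ms"})

print(ET.tostring(speak, encoding="unicode"))
```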

Emacspeak is a free computer application, a speech interface, and an audio desktop. It employs Emacs, Emacs Lisp, and Tcl. Developed principally by T. V. Raman, it was first released in April 1995. It is portable to all POSIX-compatible OSs. It is tightly integrated with Emacs, allowing it to render intelligible and useful content rather than parsing the graphics; its default voice synthesizer can be replaced with other software synthesizers when a server module is installed. Emacspeak is one of the most popular speech interfaces for Linux, bundled with most major distributions. In 2014, Raman wrote an article describing how the software's design was impacted by shifts in computer technology and its general usage over 20 years.

The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. To date, a number of versions of the API have been released, which have shipped either as part of a Speech SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.
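As a small example of the synthesis side of this interface, the sketch below drives the SAPI SpVoice COM automation object from Python using the pywin32 package; it assumes a Windows system with pywin32 installed and speaks with whatever SAPI voice is set as the default.

```python
# Minimal SAPI text-to-speech call on Windows via COM automation.
# Assumes the pywin32 package (pip install pywin32); any installed SAPI voice will do.
import win32com.client

voice = win32com.client.Dispatch("SAPI.SpVoice")  # default SAPI voice
voice.Rate = 0                                    # speaking rate, from -10 (slow) to 10 (fast)
voice.Speak("This sentence is rendered by whichever SAPI voice is currently selected.")
```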


DECtalk was a speech synthesizer and text-to-speech technology developed by Digital Equipment Corporation in 1983, based largely on the work of Dennis Klatt at MIT, whose source-filter algorithm was variously known as KlattTalk or MITalk.


Hoya Corporation is a Japanese company manufacturing optical products such as photomasks, photomask blanks and hard disk drive platters, contact lenses and eyeglass lenses for the health-care market, medical photonics, lasers, photographic filters, medical flexible endoscopy equipment, and software. Hoya Corporation is one of the Forbes Global 2000 Leading Companies and an Industry Week 1000 company.

Chinese speech synthesis is the application of speech synthesis to the Chinese language. It poses additional difficulties due to Chinese characters frequently having different pronunciations in different contexts and the complex prosody, which is essential to convey the meaning of words, and sometimes the difficulty in obtaining agreement among native speakers concerning what the correct pronunciation is of certain phonemes.


eSpeak is a free and open-source, cross-platform, compact, software speech synthesizer. It uses a formant synthesis method, providing many languages in a relatively small file size. eSpeakNG is a continuation of the original developer's project with more feedback from native speakers.

Gnuspeech is an extensible text-to-speech computer software package that produces artificial speech output based on real-time articulatory speech synthesis by rules. That is, it converts text strings into phonetic descriptions, aided by a pronouncing dictionary, letter-to-sound rules, and rhythm and intonation models; transforms the phonetic descriptions into parameters for a low-level articulatory speech synthesizer; uses these to drive an articulatory model of the human vocal tract producing an output suitable for the normal sound output devices used by various computer operating systems; and does this at the same or faster rate than the speech is spoken for adult speech.

The Microsoft text-to-speech voices are speech synthesizers provided for use with applications that use the Microsoft Speech API (SAPI) or the Microsoft Speech Server Platform. There are client, server, and mobile versions of Microsoft text-to-speech voices. Client voices are shipped with Windows operating systems; server voices are available for download for use with server applications such as Speech Server, Lync etc. for both Windows client and server platforms, and mobile voices are often shipped with more recent versions.

Cepstral is a provider of speech synthesis technology and services. It was founded in June 2000 by scientists from Carnegie Mellon University including the computer scientists Kevin Lenzo and Alan W. Black. It is a privately held corporation with headquarters in Pittsburgh, Pennsylvania.

Loquendo is an Italian multinational computer software technology corporation, headquartered in Torino, Italy, that provides speech recognition, speech synthesis, speaker verification and identification applications. Loquendo, which was founded in 2001 under the Telecom Italia Lab, also had offices in the United Kingdom, Spain, Germany, France, and the United States.


CereProc is a speech synthesis company based in Edinburgh, Scotland, founded in 2005. The company specialises in creating natural and expressive-sounding text to speech voices, synthesis voices with regional accents, and in voice cloning.


Sensory, Inc. is an American company which develops software AI technologies for speech, sound and vision. It is based in Santa Clara, California.

Lessac Technologies, Inc. (LTI) is an American firm which develops voice synthesis software, licenses technology and sells synthesized novels as MP3 files. The firm currently has seven patents granted and three more pending for its automated methods of converting digital text into human-sounding speech, more accurately recognizing human speech and outputting the text representing the words and phrases of said speech, along with recognizing the speaker's emotional state.

WaveNet is a deep neural network for generating raw audio. It was created by researchers at London-based AI firm DeepMind. The technique, outlined in a paper in September 2016, is able to generate relatively realistic-sounding human-like voices by directly modelling waveforms using a neural network method trained with recordings of real speech. Tests with US English and Mandarin reportedly showed that the system outperforms Google's best existing text-to-speech (TTS) systems, although as of 2016 its text-to-speech synthesis still was less convincing than actual human speech. WaveNet's ability to generate raw waveforms means that it can model any kind of audio, including music.
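The building block behind this kind of model is the dilated causal convolution: each output sample depends only on current and past samples, and doubling the dilation at every layer makes the receptive field grow exponentially with depth. The NumPy sketch below only illustrates that receptive-field arithmetic with random weights; it is not DeepMind's implementation and omits everything else WaveNet uses (gated activations, residual connections, and a softmax over quantised sample values).

```python
# Illustrative stack of dilated causal convolutions (random weights, not WaveNet itself).
import numpy as np

def causal_dilated_conv(x, w0, w1, dilation):
    """out[t] = w0 * x[t] + w1 * x[t - dilation]; samples before t = 0 are treated as zero."""
    shifted = np.zeros_like(x)
    if dilation < len(x):
        shifted[dilation:] = x[:-dilation]
    return w0 * x + w1 * shifted

rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)      # one second of 16 kHz pseudo-audio

# With kernel size 2 and dilations 1, 2, 4, ..., the receptive field after
# n layers is 2**n samples, so long-range structure becomes reachable cheaply.
dilations = [1, 2, 4, 8, 16, 32, 64, 128]
for dilation in dilations:
    w0, w1 = rng.standard_normal(2)
    signal = np.tanh(causal_dilated_conv(signal, w0, w1, dilation))

print("receptive field:", 1 + sum(dilations), "samples")  # 256 = 2**8
```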

References

  1. "Stephen Hawking chooses a new voice". Gigmaz Online.
  2. "The Evolution of Text-to-Speech Voice Assistive Technology". William & Mary. Retrieved 2021-06-26.
  3. "Adobe Captivate 4 text-to-speech converter | Download location".
  4. "Voiceware > About Voiceware > History & Awards & Recognition". www.voiceware.co.kr. Archived from the original on 2013-09-29.
  5. "Voiceware > Customer Support > News & Announcements".
  6. "Voiceware > Customer Support > News & Announcements".
  7. "Pentax Corporation | Encyclopedia.com".
  8. "Public Notice of Execution of Merger Agreement" (PDF). Pentax Corporation. 2007-10-29. Archived from the original (PDF) on 2008-12-09. Retrieved 2007-10-31.
  9. "Now Gathering Voiceware, NeoSpeech, VoiceText & rSpeak Under a Unified Umbrella Brand, ReadSpeaker Consolidates its Position as a Leading Text-to-Speech Player". 28 February 2019.
  10. "NeoSpeech Enhances TTS for ADA Compliance". SpeechTech magazine. May 31, 2013.
  11. "AiSquared's ZoomText speech synthesis".
  12. "Neospeech Text-to-Speech offers enhanced accessibility in compliance with the ADA". Global Accessibility News Americas. May 31, 2013.
  13. "Kurzweil 1000 Windows Version 13 / Features and Highlights".
  14. "NeoSpeech TTS Powers New Japanese Learning App". SpeechTech magazine. June 17, 2013.
  15. "OnDemand Text to Speech".