Developer | Digital Equipment Corporation |
---|---|
Type | speech synthesizer, text-to-speech |
Release date | 1984 [1] [2] |
Introductory price | DTC01: US$4,000 (equivalent to $11,731 in 2023) [3] |
Connectivity | RS-232C serial interface [1] |
Platform | OpenVMS, ULTRIX, Digital UNIX, Windows NT |
Dimensions | DTC01: W 45.7 cm × D 30.48 cm × H 10.16 cm (18 in × 12 in × 4 in) |
Mass | DTC01: 7.2 kg (16 lb) |
DECtalk [4] was a speech synthesizer and text-to-speech technology developed by Digital Equipment Corporation in 1983, [1] based largely on the work of Dennis Klatt at MIT, whose source-filter algorithm was variously known as KlattTalk or MITalk. [5] [6]
Uses ranged from interacting with the public to allowing those with speech disabilities to verbalize, including giving a public speech. [7] [8]
DECtalk was announced in December 1983. The first few units arrived in February 1984, and larger quantities were delivered in March. [9]
The first units were standalone devices that connected to any device with an asynchronous serial port. Each unit also had two telephone jacks, one for a phone line and one for a telephone, so it could be connected to the telephone system. The units could recognize and generate any telephone touch tone, which allowed them to automate telephone-related tasks by handling both incoming and outgoing calls. Applications included acting as an interface to an email system and serving as an alerting system that placed calls and interacted via touch tones with the person who answered.
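As an illustration of what recognizing and generating touch tones involves, the following is a minimal Python sketch. It is not DECtalk's implementation, which the sources here do not describe. It relies only on the standard DTMF keypad layout, in which each key is the sum of one low-group and one high-group sine tone. The function names, the 8 kHz sample rate, the 50 ms tone length, and the use of the Goertzel recurrence for detection are all assumptions chosen for the example.

```python
import math

# Standard DTMF keypad: each key combines one row tone and one column tone (Hz).
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def key_frequencies(key):
    """Return the (row, column) frequency pair for a keypad symbol."""
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if len(key) == 1 and c != -1:
            return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def generate_tone(key, duration_s=0.05, sample_rate=8000):
    """Synthesize a touch tone as a list of float samples in [-1, 1]."""
    low, high = key_frequencies(key)
    n = int(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * i / sample_rate)
        for i in range(n)
    ]


def goertzel_power(samples, freq_hz, sample_rate=8000):
    """Estimate the signal power at one frequency with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone and map them back to a key."""
    r = max(range(4), key=lambda i: goertzel_power(samples, ROW_HZ[i], sample_rate))
    c = max(range(4), key=lambda i: goertzel_power(samples, COL_HZ[i], sample_rate))
    return KEYPAD[r][c]
```

Calling `detect_key(generate_tone("5"))` recovers the key from the synthesized samples. A real telephone interface would also need to reject speech and noise, for example by checking minimum tone duration and the relative strength of the two tones, which this sketch omits.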
Later units were produced for PCs with ISA bus slots. In addition, various software implementations were produced, most notably DECtalk Access32. Such implementations began as explorations into real-time software synthesis on general-purpose CPUs, [10] : 2 and subsequently led to a DECtalk Software product for Digital UNIX and for Windows NT on Alpha and Intel processors. [11] Certain versions of the synthesizer had undesirable characteristics. For example, alveolar stops often sounded more like dental stops, and versions such as Access32 produced faint electronic beeps at the end of phrases.
In early to mid-2000, [12] the DECtalk intellectual property was sold to Force Computers, Inc. In December 2001, Force Computers sold it [13] to Fonix Speech, Inc. (now SpeechFX, Inc.), which offers DECtalk as a small-footprint TTS system and as a computer program. [14]
The New York Times described the output as "like a scratchy recording of a person with a lisp" but added that it was "usually understandable." [4]
DECtalk had a number of built-in voices, identified by the following names: Perfect Paul (the default voice), Beautiful Betty, Huge Harry, Frail Frank, Kit the Kid, Rough Rita, Uppity Ursula, Doctor Dennis and Whispering Wendy. Each voice could be edited by adjusting various parameters (such as throat size and crossover frequencies).
DECtalk understood phonetic spellings of words, allowing customized pronunciation of unusual words. These phonetic spellings could also include a pitch and duration notation which DECtalk would use when enunciating the phonetic components. This allowed DECtalk to sing.
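To make the idea of phonetic spellings annotated with pitch and duration concrete, here is a small Python sketch that builds such an annotated string. The `symbol<duration,pitch>` layout inside square brackets is an assumption modeled on commonly circulated DECtalk song strings. The exact syntax, the phoneme symbols, and the meaning of the pitch value differ between versions and should be checked against the documentation for the version in use. The helper names are hypothetical.

```python
def sung_phoneme(symbol, duration_ms, pitch):
    """Format one phoneme as `symbol<duration,pitch>` (assumed notation)."""
    if not symbol:
        raise ValueError("phoneme symbol must not be empty")
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return f"{symbol}<{duration_ms},{pitch}>"


def sung_phrase(notes):
    """Join (symbol, duration_ms, pitch) triples into one bracketed phonetic string."""
    return "[" + "".join(sung_phoneme(*note) for note in notes) + "]"


# Placeholder phoneme symbols; the real symbol set depends on the synthesizer.
phrase = sung_phrase([("l", 100, 20), ("aa", 400, 20), ("l", 100, 22), ("aa", 400, 22)])
print(phrase)  # [l<100,20>aa<400,20>l<100,22>aa<400,22>]
```

Keeping the notes as plain triples separates the melody from the string format, so only `sung_phoneme` needs to change if the target notation differs.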