DECtalk

DECtalk DTC01 (with a cat for scale)
Developer Digital Equipment Corporation
Type speech synthesizer, text-to-speech
Release date 1984 [1] [2]
Introductory price DTC01: US$4,000 (equivalent to $11,731 in 2023) [3]
Connectivity RS-232C serial interface [1]
Platform OpenVMS, ULTRIX, Digital UNIX, Windows NT
Dimensions DTC01: W 45.7 cm × D 30.48 cm × H 10.16 cm (18 in × 12 in × 4 in)
Mass DTC01: 7.2 kg (16 lb)
DECtalk demo recording using the Perfect Paul and Uppity Ursula voices

DECtalk [4] was a speech synthesizer and text-to-speech technology developed by Digital Equipment Corporation in 1983, [1] based largely on the work of Dennis Klatt at MIT, whose source-filter algorithm was variously known as KlattTalk or MITalk. [5] [6]

Uses ranged from interacting with the public to allowing those with speech disabilities to verbalize, including giving a public speech. [7] [8]

History

DECtalk was announced in December 1983. A trickle of units shipped in February 1984, and larger quantities were delivered in March. [9]

The original units were standalone devices that connected to any device with an asynchronous serial port. Each unit also had two telephone jacks, one for a phone line and the other for a telephone, which let it connect to the telephone system. The units could recognize and generate any telephone touch tone, so they could automate telephone-related tasks by handling both incoming and outgoing calls. Applications included acting as an interface to an email system and serving as an alerting system that placed calls and interacted via touch tones with the person answering the phone.
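As an illustration of what generating a touch tone involves, the sketch below uses the standard DTMF scheme, in which each key is the sum of one low-group and one high-group sine wave. It is not DECtalk's own implementation, which this article does not describe; the function names, the 8 kHz sample rate, and the 100 ms duration are assumptions chosen for the example.

```python
import math

# Standard DTMF frequency grid in Hz: each key combines one row (low-group)
# tone with one column (high-group) tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for row_index, row in enumerate(KEYPAD):
        col_index = row.find(key)
        if col_index != -1:
            return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Return floating-point samples in [-1, 1] for one touch tone.

    The sample rate and duration are illustrative defaults, not values
    taken from DECtalk documentation.
    """
    low, high = dtmf_frequencies(key)
    count = round(duration_s * sample_rate)
    return [
        0.5 * (math.sin(2 * math.pi * low * n / sample_rate)
               + math.sin(2 * math.pi * high * n / sample_rate))
        for n in range(count)
    ]


if __name__ == "__main__":
    print(dtmf_frequencies("5"))
    print(len(dtmf_samples("5")))
```

Recognizing a tone is the reverse problem: a detector measures the energy at each of the eight frequencies and reports the key whose row and column tones are both present.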

Later units were produced for PCs with ISA bus slots. Various software implementations were also produced, most notably DECtalk Access32. These implementations began as explorations into real-time software synthesis on general-purpose CPUs, [10] :2 and led to a DECtalk Software product for Digital UNIX and for Windows NT on Alpha and Intel processors. [11] Certain versions of the synthesizer had undesirable characteristics. For example, alveolar stops often sounded more like dental stops, and versions such as Access32 produced faint electronic beeps at the end of phrases.

In early to mid-2000, [12] the DECtalk IP was sold to Force Computers, Inc. In December 2001, Force Computers sold the IP [13] to Fonix Speech, Inc. (now SpeechFX, Inc.), which offers DECtalk as a small-footprint TTS system and as a computer program. [14]

Features

The New York Times wrote that DECtalk sounded "like a scratchy recording of a person with a lisp" but added that it was "usually understandable." [4]

DECtalk had a number of built-in voices, identified by the following names: Perfect Paul (the default voice), Beautiful Betty, Huge Harry, Frail Frank, Kit the Kid, Rough Rita, Uppity Ursula, Doctor Dennis and Whispering Wendy. Each voice could be edited by adjusting parameters such as throat size and crossover frequencies.

Daisy Bell sung by DECtalk

DECtalk understood phonetic spellings of words, allowing customized pronunciation of unusual words. These phonetic spellings could also include a pitch and duration notation which DECtalk would use when enunciating the phonetic components. This allowed DECtalk to sing.
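The idea of attaching a pitch and a duration to each phonetic component can be sketched as a small string builder. The exact notation is not given in this article, so the bracketed `phoneme<duration_ms,pitch>` form below, the phoneme symbols, and the pitch numbers are all assumptions made for illustration and should be checked against DECtalk documentation before use.

```python
from typing import List, Tuple

# (phoneme symbol, duration in milliseconds, pitch number).
# All three fields use a hypothetical notation, not a verified DECtalk format.
SungPhoneme = Tuple[str, int, int]


def render_sung_phrase(phonemes: List[SungPhoneme]) -> str:
    """Join phonemes into one bracketed string with per-phoneme timing and pitch."""
    tokens = []
    for symbol, duration_ms, pitch in phonemes:
        if not symbol:
            raise ValueError("phoneme symbol must not be empty")
        if duration_ms <= 0:
            raise ValueError("duration must be positive")
        tokens.append(f"{symbol}<{duration_ms},{pitch}>")
    return "[" + "".join(tokens) + "]"


if __name__ == "__main__":
    # Two syllables, each split into a short consonant and a held vowel.
    phrase = render_sung_phrase([
        ("d", 100, 24), ("ey", 500, 24),
        ("z", 100, 21), ("iy", 500, 21),
    ])
    print(phrase)
```

Holding the vowel for most of each syllable and changing the pitch value from one syllable to the next is what turns a spoken phrase into a melody.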

Uses

References

  1. DECtalk lets micros read messages over phone, By Peggy Zientara, InfoWorld, Jan 9-16, 1984, Page 21 and 23, ...DECtalk, which the company says will be available in March at a price of $4000...The DECtalk hardware, which fits easily under a 12-inch monitor...The unit attaches to a personal computer via an RS-232C serial interface...
  2. Advertisement:We just turned every touch-tone phone into a financial clearinghouse., Computerworld, 18 Feb 1985, Page 39
  3. Advertisement: INTRODUCING DECTALK THE REVOLUTIONARY NEW TERMINAL THAT LETS YOUR COMPUTER SPEAK FOR ITSELF., Computerworld, 23 Jul 1984, Page 70-71, ...The DECtalk system is available now for $4000* or less depending on quantity...
  4. Andrew Pollack (January 5, 1984). "Technology; Audiotex: Data By Telephone". The New York Times.
  5. Klatt, Dennis (April 1987), "How Klattalk became DECtalk: An Academic's Experiences in the Business World", The official proceedings of Speech Tech '87, New York: Media Dimensions Inc./Penn State, pp. 293–294
  6. Computer talk: amazing new realism in synthetic speech, By T. A. Heppenheimer, Popular Science, Jan 1986, Page 42 and 44 and 48, ..with the creator of DECtalk, Dennis Klatt...Ironically having given computers the power of speech, he slowly losing his own...DECtalk - actually a micro computer itself -...
  7. Glenn Rifkin (December 16, 1990). "Technology; A Wider Work Force by Computer". The New York Times . the audience heard the DECtalk, voicing words that the educator typed into his computer.
  8. Kari Haskell (December 12, 2002). "The Neediest Cases; Battling Federal Bureaucracy To Have His Benefits Restored". The New York Times .
  9. Brian J. Edwards, ed. (February 1984). "DECtalk Speaks into the Future". HARDCOPY . pp. 84–85.
  10. Stewart, Lawrence C.; Payne, Andrew C.; Levergood, Thomas M. (November 16, 1992). Are DSP Chips Obsolete? (Technical report). Digital Equipment Corporation. Retrieved March 23, 2024.
  11. Hallahan, William I. (1995). "DECtalk Software: Text-to-Speech Technology and Implementation". Digital Technical Journal. 7 (4). Digital Equipment Corporation: 5–19. Retrieved March 23, 2024.
  12. "AllLinuxDevices: Force to Support Linux On DECtalk TTS For Strongarm and Intel Devices". Linux Today . October 26, 2000. Archived from the original on May 18, 2014. Retrieved September 16, 2012.
  13. "Fonix acquires DECtalk from Force Computers". Deseret News. September 16, 2012. Archived from the original on May 18, 2014. Retrieved September 16, 2012.
  14. "SpeechFX Text to Speech Solutions, FonixTalk, DecTalk". SpeechFX, Inc. April 19, 2021. Archived from the original on April 19, 2021. Retrieved October 8, 2023.
  15. "Voices Used in NOAA Weather Radio". Archived from the original on February 6, 2008.
  16. US Department of Commerce, NOAA. "New NOAA Weather Radio Management Platform is Coming Soon". www.weather.gov. Retrieved May 5, 2019.
  17. Hawking, Stephen. "Stephen Hawking and ALS". Archived from the original on April 26, 2021. Retrieved August 10, 2009. (Originally published as: Hawking, Stephen. "Disability - my experience with ALS". hawking.org.uk. Archived from the original on June 14, 2000.)
  18. Greenemeier, Larry (August 10, 2009). "Getting Back the Gift of Gab: NexGen Handheld Computers Allow the Mute to Converse". Scientific American. Retrieved August 10, 2009.
  19. Lange, Catherine de (December 30, 2011). "The man who saves Stephen Hawking's voice". New Scientist. Retrieved June 9, 2013.
  20. Hawking, Stephen. "The Computer". hawking.org.uk. Retrieved March 1, 2015.
  21. Fagone, Jason (March 18, 2018). "The quest to save Stephen Hawking's voice". sfchronicle.com. Retrieved March 20, 2018.
  22. "Plogue - chipspeech :: Vintage speech synthesizer". PLOGUE - Music Software - Developers. Retrieved April 27, 2016.
  23. Karl Bartos, "Der Klang der Maschine". pp. 427–428. Retrieved August 25, 2017.
  24. "El Rincon del UtaUtaUtau". utautautau.neocities.org. Retrieved October 16, 2022.