Speech technology

Speech technology comprises the technologies designed to duplicate and respond to the human voice. These technologies have many uses, including aids for people with voice, hearing, or visual impairments and communication with computers without a keyboard. They also enhance game software and aid in marketing goods or services by telephone.

The subject includes several subfields:

Related Research Articles

Assistive technology: Assistive devices for people with disabilities

Assistive technology (AT) is a term for assistive, adaptive, and rehabilitative devices for people with disabilities and the elderly. Disabled people often have difficulty performing activities of daily living (ADLs) independently, or even with assistance. ADLs are self-care activities that include toileting, mobility (ambulation), eating, bathing, dressing, grooming, and personal device care. Assistive technology can ameliorate the effects of disabilities that limit the ability to perform ADLs. It promotes greater independence by enabling people to perform tasks they were formerly unable to accomplish, or had great difficulty accomplishing, by providing enhancements to, or changing methods of interacting with, the technology needed to accomplish such tasks. For example, wheelchairs provide independent mobility for those who cannot walk, while assistive eating devices can enable people who cannot feed themselves to do so. Thanks to assistive technology, disabled people have an opportunity for a more positive and easygoing lifestyle, with an increase in "social participation," "security and control," and a greater chance to "reduce institutional costs without significantly increasing household expenses." In schools, assistive technology can be critical in allowing students with disabilities to access the general education curriculum. Students who experience challenges writing or keyboarding, for example, can use voice recognition software instead. Assistive technologies also help people who are recovering from strokes and people who have sustained injuries that affect their daily tasks.

Speech processing is the study of speech signals and of the methods used to process them. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of speech signals. The input is called speech recognition and the output is called speech synthesis.
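As a rough illustration of treating speech as a digital signal, the sketch below splits a sampled signal into short overlapping frames and computes the energy of each frame, a common first step before further analysis. The sample rate, frame sizes, and the synthetic tone standing in for recorded speech are assumptions chosen for the example, not values taken from the article.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

# Hypothetical input: 0.1 s of silence followed by 0.1 s of a 200 Hz tone,
# sampled at 8 kHz. Real speech would be read from a microphone or a file.
SAMPLE_RATE = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
signal = silence + tone

# 25 ms frames (200 samples) advanced in 12.5 ms steps (100 samples).
frames = frame_signal(signal, frame_len=200, hop=100)
energies = [short_time_energy(f) for f in frames]

for index, energy in enumerate(energies):
    print(f"frame {index:2d}: energy = {energy:.3f}")
```

Frames that fall entirely in the silent part have zero energy, while frames inside the tone have a clearly higher value, which is the kind of contrast a simple speech/non-speech detector can build on.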

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the main benefit of searchability. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.

Accessibility: Design of products, services, and environments for usability by disabled people

Accessibility is the design of products, devices, services, vehicles, or environments so as to be usable by people with disabilities. The concept of accessible design and the practice of accessible development ensure both "direct access" and "indirect access", the latter meaning compatibility with a person's assistive technology.

Telecommunications relay service

A telecommunications relay service, also known as TRS, relay service, IP-relay, or Web-based relay service, is an operator service that allows people who are deaf, hard of hearing, deafblind, or have a speech disorder to place calls to standard telephone users via a keyboard or assistive device. Originally, relay services were designed to be connected through a TDD, teletypewriter (TTY) or other assistive telephone device. Services have gradually expanded to include almost any technology capable of real-time text, such as a personal computer, laptop, mobile phone, PDA, and many other devices. The first TTY was invented by deaf scientist Robert Weitbrecht in 1964. The first relay service was established in 1974 by Converse Communications of Connecticut.

Speaker recognition is the identification of a person from characteristics of voices. It is used to answer the question "Who is speaking?" The term voice recognition can refer to speaker recognition or speech recognition. Speaker verification contrasts with identification, and speaker recognition differs from speaker diarisation.
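The contrast between identification and verification can be sketched in a few lines: identification picks the best match among all enrolled speakers (one-to-many), while verification checks a single claimed identity against a threshold (one-to-one). The feature vectors, speaker names, cosine-similarity scoring, and the 0.9 threshold below are illustrative assumptions; practical systems derive voice features from audio with far more elaborate models.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(enrolled, sample):
    """One-to-many: return the enrolled speaker whose vector is most similar."""
    return max(enrolled, key=lambda name: cosine_similarity(enrolled[name], sample))

def verify(enrolled, claimed_name, sample, threshold=0.9):
    """One-to-one: accept the claimed identity only if similarity reaches the threshold."""
    return cosine_similarity(enrolled[claimed_name], sample) >= threshold

# Hypothetical voice feature vectors for two enrolled speakers (made-up numbers).
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}
sample = [0.85, 0.15, 0.35]  # features of an unknown utterance

print(identify(enrolled, sample))
print(verify(enrolled, "alice", sample))
print(verify(enrolled, "bob", sample))
```

Identification always returns some enrolled speaker, even for an outsider, whereas verification can reject a claim outright; that is why the two tasks are usually evaluated separately.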

In linguistics, prosody is concerned with elements of speech that are not individual phonetic segments but are properties of syllables and larger units of speech, including linguistic functions such as intonation, stress, and rhythm. Such elements are known as suprasegmentals.

Augmentative and alternative communication: Techniques used for those with communication impairments

Augmentative and alternative communication (AAC) encompasses the communication methods used to supplement or replace speech or writing for those with impairments in the production or comprehension of spoken or written language. AAC is used by those with a wide range of speech and language impairments, including congenital impairments such as cerebral palsy, intellectual impairment and autism, and acquired conditions such as amyotrophic lateral sclerosis and Parkinson's disease. AAC can be a permanent addition to a person's communication or a temporary aid. Stephen Hawking used AAC to communicate through a speech-generating device.

A voice-user interface (VUI) makes spoken human interaction with computers possible, using speech recognition to understand spoken commands and answer questions, and typically text-to-speech to play a reply. A voice command device (VCD) is a device controlled with a voice-user interface.

Subvocal recognition

Subvocal recognition (SVR) is the process of taking subvocalization and converting the detected results to a digital output, aural or text-based.

Voice portals are the voice equivalent of web portals, giving access to information through spoken commands and voice responses. Ideally a voice portal could be an access point for any type of information, services, or transactions found on the Internet. Common uses include movie time listings and stock trading. In telecommunications circles, voice portals may be referred to as interactive voice response (IVR) systems, but this term also includes DTMF services. With the emergence of conversational assistants such as Apple's Siri, Amazon Alexa, Google Assistant, Microsoft Cortana, and Samsung's Bixby, voice portals can now be accessed through mobile devices and far-field voice smart speakers such as the Amazon Echo and Google Home.

LG VX8300

The LG VX8300 was one of Verizon's most popular mobile phones. This phone contains the following features:

A speech-to-text reporter (STTR), also known as a captioner, is a person who listens to what is being said and inputs it, word for word, using an electronic shorthand keyboard, speech recognition software, or a CAT software system. Their keyboard or speech recognition software is linked to a computer, which converts this information to properly spelled words. The reproduced text can then be read by deaf or hard-of-hearing people, English language learners, or persons with auditory processing disabilities.

Speech-generating device: Augmenting speech device

Speech-generating devices (SGDs), also known as voice output communication aids, are electronic augmentative and alternative communication (AAC) systems used to supplement or replace speech or writing for individuals with severe speech impairments, enabling them to verbally communicate. SGDs are important for people who have limited means of interacting verbally, as they allow individuals to become active participants in communication interactions. They are particularly helpful for patients with amyotrophic lateral sclerosis (ALS) but recently have been used for children with predicted speech deficiencies.

Virtual assistant: Mobile software agent

An intelligent virtual assistant (IVA) or intelligent personal assistant (IPA) is a software agent that can perform tasks or services for an individual based on commands or questions. The term "chatbot" is sometimes used to refer to virtual assistants generally, or specifically to those accessed by online chat. In some cases, online chat programs are exclusively for entertainment purposes. Some virtual assistants are able to interpret human speech and respond via synthesized voices. Users can ask their assistants questions, control home automation devices and media playback via voice, and manage other basic tasks such as email, to-do lists, and calendars with verbal commands. Dialogue systems are a similar concept, though with some differences.

Mobile translation is any electronic device or software application that provides audio translation. The concept includes any handheld electronic device that is specifically designed for audio translation. It also includes any machine translation service or software application for hand-held devices, including mobile telephones, Pocket PCs, and PDAs. Mobile translation provides hand-held device users with the advantage of instantaneous and non-mediated translation from one human language to another, usually for a service fee that is nevertheless significantly lower than what a human translator charges.

Sensory, Inc.

Sensory, Inc. is an American company which develops software AI technologies for speech, sound and vision. It is based in Santa Clara, California.

NeoSpeech is a company that specializes in text-to-speech (TTS) software for embedded devices, mobile, desktop, and network/server applications. NeoSpeech was founded by two speech engineers in Fremont, California, US, in 2002. NeoSpeech is privately held, headquartered in Santa Clara, California.

Alberto Ciaramella is an Italian computer engineer and scientist. He is notable for extensive pioneering contributions in the field of speech technologies and applied natural language processing, most of them at CSELT and Loquendo, amounting to 40 papers and four patents.