Automated Lip Reading (ALR) is a software technology developed by speech recognition expert Frank Hubner. The software analyses video of a person talking, examines the shapes made by the lips, and converts them into candidate sounds; the sounds are then compared against a dictionary to find matches for the words being spoken.
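The dictionary-matching stage described above can be sketched in miniature. This is a toy illustration under assumed data, not Hubner's actual method: the viseme labels, phoneme groupings, and word list below are all hypothetical.

```python
# Toy sketch of lip-shape-to-word matching. All labels and the
# mini-dictionary are hypothetical illustrations of the idea only.

# Each viseme (visually distinct lip shape) is ambiguous between
# several phonemes, which is why a dictionary lookup is needed.
VISEME_TO_PHONEMES = {
    "bilabial": {"p", "b", "m"},
    "labiodental": {"f", "v"},
    "open": {"a"},
    "rounded": {"o", "u"},
}

# Small pronunciation dictionary: word -> phoneme sequence
DICTIONARY = {
    "pam": ["p", "a", "m"],
    "bam": ["b", "a", "m"],
    "fan": ["f", "a", "n"],
}

def candidate_words(viseme_sequence):
    """Return dictionary words whose phoneme sequence is consistent,
    position by position, with the observed sequence of lip shapes."""
    matches = []
    for word, phonemes in DICTIONARY.items():
        if len(phonemes) != len(viseme_sequence):
            continue
        if all(p in VISEME_TO_PHONEMES.get(v, set())
               for p, v in zip(phonemes, viseme_sequence)):
            matches.append(word)
    return matches

print(candidate_words(["bilabial", "open", "bilabial"]))  # ['pam', 'bam']
```

Note that a single lip-shape sequence usually matches several words ("pam" and "bam" are visually identical), so a real system must also use language context to pick among candidates.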
The technology was used successfully to analyse silent home movie footage of Adolf Hitler shot by Eva Braun at the Berghof, their Bavarian retreat.
The video, with the recovered dialogue, was included in the documentary "Hitler's Private World" (Revealed Studios, 2006).
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. The reverse process is speech recognition.
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.
Lip reading, also known as speechreading, is a technique of understanding speech by visually interpreting the movements of the lips, face and tongue when normal sound is not available. It relies also on information provided by the context, knowledge of the language, and any residual hearing. Although lip reading is used most extensively by deaf and hard-of-hearing people, most people with normal hearing process some speech information from sight of the moving mouth.
A screen reader is a form of assistive technology (AT) that renders text and image content as speech or braille output. Screen readers are essential to people who are blind, and are useful to people who are visually impaired, illiterate, or have a learning disability. Screen readers are software applications that attempt to convey what people with normal eyesight see on a display to their users via non-visual means, like text-to-speech, sound icons, or a braille device. They do this by applying a wide variety of techniques that include, for example, interacting with dedicated accessibility APIs, using various operating system features, and employing hooking techniques.
IBM ViaVoice was a range of language-specific continuous speech recognition software products offered by IBM. The current version is designed primarily for use in embedded devices. The latest stable version of IBM ViaVoice was 9.0, which could transfer dictated text directly into Microsoft Word.
PlainTalk is the collective name for several speech synthesis (MacinTalk) and speech recognition technologies developed by Apple Inc. In 1990, Apple invested significant effort and resources in speech recognition technology, hiring many researchers in the field. The result was "PlainTalk", released with the AV models in the Macintosh Quadra series from 1993. It was made a standard system component in System 7.1.2, and has since been shipped on all PowerPC and some 68k Macintoshes.
Subvocalization, or silent speech, is the internal speech typically made when reading; it provides the sound of the word as it is read. This is a natural process when reading, and it helps the mind to access meanings to comprehend and remember what is read, potentially reducing cognitive load.
A video search engine is a web-based search engine which crawls the web for video content. Some video search engines parse externally hosted content while others allow content to be uploaded and hosted on their own servers. Some engines also allow users to search by video format type and by length of the clip. The video search results are usually accompanied by a thumbnail view of the video.
A viseme is any of several speech sounds that look the same on the speaker's face, for example when lip reading; several distinct phonemes can share a single viseme.
Subvocal recognition (SVR) is the process of taking subvocalization and converting the detected results to a digital output, aural or text-based.
Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural languages. The term applies both to the mental processes used by humans and to artificial processes of natural language processing.
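A simple artificial instance of the word-boundary problem above is segmenting an unspaced character stream against a lexicon. The greedy "maximal munch" sketch below uses a hypothetical mini-lexicon; real segmenters rely on statistical or neural models rather than greedy lookup.

```python
# Greedy longest-match word segmentation over an unspaced stream.
# The lexicon is a hypothetical toy example.

LEXICON = {"the", "cat", "sat", "on", "mat", "them"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)

def segment(stream):
    """Split a string of letters into lexicon words, always taking
    the longest lexicon match at the current position."""
    words, i = [], 0
    while i < len(stream):
        for j in range(min(len(stream), i + MAX_WORD_LEN), i, -1):
            if stream[i:j] in LEXICON:
                words.append(stream[i:j])
                i = j
                break
        else:
            return None  # no lexicon word fits at this position
    return words

print(segment("thecatsatonmat"))  # ['the', 'cat', 'sat', 'on', 'mat']
```

The greedy strategy also shows why segmentation is hard: `segment("themat")` fails, because taking the longest match "them" leaves the unsegmentable remainder "at", even though "the" + "mat" is valid.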
Robotic voice effects became a recurring element in popular music starting in the second half of the twentieth century. Several methods of producing variations on this effect have arisen.
Project LISTEN was a 25-year research project at Carnegie Mellon University to improve children's reading skills. The project created a computer-based Reading Tutor that listens to a child reading aloud, corrects errors, helps when the child is stuck or encounters a hard word, provides hints, assesses progress, and presents more advanced text when the child is ready. The Reading Tutor has been used daily by hundreds of children in field tests at schools in the United States, Canada, Ghana, and India. Thousands of hours of usage logged at multiple levels of detail, including millions of words read aloud, have been stored in a database that has been mined to improve the Tutor's interactions with students. An extensive list of publications can be found at Carnegie Mellon University.
A speech-to-text reporter (STTR), also known as a captioner, is a person who listens to what is being said and inputs it, word for word, as properly written text. Many captioners use tools that convert verbally communicated information into written words composed as text. The reproduced text can then be read by deaf or hard-of-hearing people, language learners, or people with auditory processing disabilities.
Fluency refers to continuity, smoothness, rate, and effort in speech production. It is also used to characterize language production, language ability or language proficiency.
Silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds. As such it is a type of electronic lip reading. It works by the computer identifying the phonemes that an individual pronounces from nonauditory sources of information about their speech movements. These are then used to recreate the speech using speech synthesis.
The following outline is provided as an overview of and topical guide to natural-language processing:
Speech Recognition & Synthesis, formerly known as Speech Services, is a screen reader application developed by Google for its Android operating system. It powers applications to read aloud (speak) the text on the screen with support for many languages. Text-to-Speech may be used by apps such as Google Play Books for reading books aloud, by Google Translate for reading aloud translations providing useful insight to the pronunciation of words, by Google TalkBack and other spoken feedback accessibility-based applications, as well as by third-party apps. Users must install voice data for each language.
Voice writing is a method used for court reporting, medical transcription, CART, and closed captioning. Using the voice writing method, a court reporter speaks directly into a stenomask or speech silencer, a hand-held mask containing one or two microphones and voice-dampening materials. As the reporter repeats the testimony into the mask, it prevents the reporter from being heard during testimony.