Subvocal recognition

Electrodes used in subvocal speech recognition research at NASA's Ames Research Center.

Subvocal recognition (SVR) is the process of taking subvocalization and converting the detected results to a digital output, aural or text-based. [1] A silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds. It works by having a computer identify the phonemes an individual pronounces from nonauditory sources of information about their speech movements; these are then used to recreate the speech using speech synthesis. [2]
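
The following minimal sketch illustrates this pipeline in Python. It is a hypothetical toy, not any published system: the phoneme set, the per-phoneme template vectors, and the feature dimension are all invented for illustration, and a simple nearest-template classifier stands in for the statistical models used in practice.

```python
import numpy as np

# Hypothetical illustration of the SVR pipeline described above: map
# nonauditory feature frames to phonemes, then hand the phoneme string
# to a synthesizer. The phoneme set and templates are invented for this
# sketch; a real system would learn them from labeled sensor recordings.
PHONEMES = ["h", "eh", "l", "ow"]
rng = np.random.default_rng(0)
TEMPLATES = {p: rng.normal(size=8) for p in PHONEMES}

def classify_frame(frame: np.ndarray) -> str:
    """Assign one feature frame to the nearest phoneme template (Euclidean)."""
    return min(TEMPLATES, key=lambda p: np.linalg.norm(frame - TEMPLATES[p]))

def recognize(frames: np.ndarray) -> str:
    """Convert a sequence of nonauditory feature frames to a phoneme string."""
    return " ".join(classify_frame(f) for f in frames)

# Simulated input: noisy copies of the templates for "h eh l ow".
frames = np.stack([TEMPLATES[p] + 0.1 * rng.normal(size=8) for p in PHONEMES])
print(recognize(frames))  # -> "h eh l ow", ready for a speech synthesizer
```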


Input methods

Silent speech interface systems have been created using ultrasound and optical camera input of tongue and lip movements. [3] Electromagnetic devices are another technique for tracking tongue and lip movements. [4]

The detection of speech movements by electromyography of speech articulator muscles and the larynx is another technique. [5] [6] Another source of information is the vocal tract resonance signals transmitted through bone conduction, known as non-audible murmurs. [7]
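
As an illustration of how such EMG signals might be prepared for recognition, the sketch below computes short-time root-mean-square (RMS) energy over fixed windows, a common generic first step in surface-EMG processing; the sampling rate and window length are assumptions for illustration, not values taken from the cited studies.

```python
import numpy as np

def emg_rms_features(signal: np.ndarray, fs: int, win_ms: float = 25.0) -> np.ndarray:
    """Short-time RMS energy of a single-channel EMG signal.

    Splits the signal into non-overlapping windows of win_ms milliseconds
    and returns one RMS value per window; frames like these would feed a
    phoneme or word classifier in an EMG-based silent speech recognizer.
    """
    win = int(fs * win_ms / 1000)
    n_frames = len(signal) // win
    frames = signal[: n_frames * win].reshape(n_frames, win)
    return np.sqrt((frames ** 2).mean(axis=1))

# Toy input: 1 s of simulated surface-EMG noise sampled at 2 kHz
# (both values are assumptions for illustration).
fs = 2000
sig = np.random.default_rng(1).normal(size=fs)
print(emg_rms_features(sig, fs).shape)  # (40,) -> one feature per 25 ms frame
```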

They have also been created as a brain–computer interface using brain activity in the motor cortex obtained from intracortical microelectrodes. [8]

Uses

Such devices are created as aids for those unable to produce the phonation needed for audible speech, such as after laryngectomy. [9] Another use is communication when speech is masked by background noise or distorted by self-contained breathing apparatus. A further practical use is where a need exists for silent communication, such as when privacy is required in a public place, or when silent hands-free data transmission is needed during a military or security operation. [3] [10]

In 2002, the Japanese company NTT DoCoMo announced it had created a silent mobile phone using electromyography and imaging of lip movement. The company stated that "the spur to developing such a phone was ridding public places of noise," adding that "the technology is also expected to help people who have permanently lost their voice." [11] The feasibility of using silent speech interfaces for practical communication has since been demonstrated. [12]

In 2019, Arnav Kapur, a researcher at the Massachusetts Institute of Technology, presented a silent speech interface known as AlterEgo. The system enables direct communication between the human brain and external devices by sensing neuromuscular signals produced when the user internally articulates words. By leveraging these signals associated with speech and language, the AlterEgo system deciphers the user's intended words and translates them into text or commands without the need for audible speech. [13]

Research and patents

With a grant from the U.S. Army, research into synthetic telepathy using subvocalization is taking place at the University of California, Irvine, under lead scientist Mike D'Zmura. [14]

NASA's Ames Research Center in Mountain View, California, under the supervision of Charles Jorgensen, is conducting subvocalization research. [citation needed]

The Brain–Computer Interface R&D program at the Wadsworth Center, under the New York State Department of Health, has confirmed the ability to decipher consonants and vowels from imagined speech, which allows for brain-based communication using imagined speech, [15] though it relies on electrocorticographic signals rather than subvocalization techniques.

US patents on silent communication technologies include: US Patent 6587729, "Apparatus for audibly communicating speech using the radio frequency hearing effect"; [16] US Patent 5159703, "Silent subliminal presentation system"; [17] US Patent 6011991, "Communication system and method including brain wave analysis and/or use of brain activity"; [18] and US Patent 3951134, "Apparatus and method for remotely monitoring and altering brain waves". [19] The latter two rely on brain wave analysis.

In fiction

See also

Related Research Articles

A universal translator is a device common to many science fiction works, especially on television. First described in Murray Leinster's 1945 novella "First Contact", the translator's purpose is to offer an instant translation of any language.

Cyberware is a term used chiefly in science fiction circles for the hardware or machine parts implanted in the human body that act as an interface between the central nervous system and the computers or machinery connected to it.

A brain–computer interface (BCI), sometimes called a brain–machine interface (BMI), is a direct communication link between the brain's electrical activity and an external device, most commonly a computer or robotic limb. BCIs are often directed at researching, mapping, assisting, augmenting, or repairing human cognitive or sensory-motor functions. They are often conceptualized as a human–machine interface that skips the intermediary of moving body parts (hands...), although they also raise the possibility of erasing the distinction between brain and machine. BCI implementations range from non-invasive and partially invasive to invasive, based on how physically close electrodes are to brain tissue.


Electromyography (EMG) is a technique for evaluating and recording the electrical activity produced by skeletal muscles. EMG is performed using an instrument called an electromyograph to produce a record called an electromyogram. An electromyograph detects the electric potential generated by muscle cells when these cells are electrically or neurologically activated. The signals can be analyzed to detect abnormalities, activation level, or recruitment order, or to analyze the biomechanics of human or animal movement. Needle EMG is an electrodiagnostic medicine technique commonly used by neurologists. Surface EMG is a non-medical procedure used to assess muscle activation by several professionals, including physiotherapists, kinesiologists and biomedical engineers. In computer science, EMG is also used as middleware in gesture recognition, allowing physical actions to be input to a computer as a form of human-computer interaction.


Subvocalization, or silent speech, is the internal speech typically made when reading; it provides the sound of the word as it is read. This is a natural process when reading, and it helps the mind to access meanings to comprehend and remember what is read, potentially reducing cognitive load.


Gesture recognition is an area of research and development in computer science and language technology concerned with the recognition and interpretation of human gestures. A subdiscipline of computer vision, it employs mathematical algorithms to interpret gestures.


Brain implants, often referred to as neural implants, are technological devices that connect directly to a biological subject's brain – usually placed on the surface of the brain, or attached to the brain's cortex. A common purpose of modern brain implants and the focus of much current research is establishing a biomedical prosthesis circumventing areas in the brain that have become dysfunctional after a stroke or other head injuries. This includes sensory substitution, e.g., in vision. Other brain implants are used in animal experiments simply to record brain activity for scientific reasons. Some brain implants involve creating interfaces between neural systems and computer chips. This work is part of a wider research field called brain–computer interfaces.

Neuroprosthetics is a discipline related to neuroscience and biomedical engineering concerned with developing neural prostheses. Neural prostheses are sometimes contrasted with brain–computer interfaces, which connect the brain to a computer rather than replacing missing biological functionality.

A bioamplifier is an electrophysiological device, a variation of the instrumentation amplifier, used to gather and increase the signal integrity of physiologic electrical activity for output to various sources. It may be an independent unit or integrated into the electrodes.


Mark N. Gasson is a British scientist and visiting research fellow at the Cybernetics Research Group, University of Reading, UK. He pioneered developments in direct neural interfaces between computer systems and the human nervous system, has developed brain–computer interfaces and is active in the research fields of human microchip implants, medical devices and digital identity. He is known for his experiments transmitting a computer virus into a human implant, and is credited with being the first human infected with a computer virus.


Human–computer interaction (HCI) is research into the design and use of computer technology, focused on the interfaces between people (users) and computers. HCI researchers observe the ways humans interact with computers and design technologies that allow humans to interact with computers in novel ways. A device that allows interaction between a human being and a computer is known as a human–computer interface.


The neurotrophic electrode is an intracortical device designed to read the electrical signals that the brain uses to process information. It consists of a small, hollow glass cone attached to several electrically conductive gold wires. The term neurotrophic means "relating to the nutrition and maintenance of nerve tissue" and the device gets its name from the fact that it is coated with Matrigel and nerve growth factor to encourage the expansion of neurites through its tip. It was invented by neurologist Dr. Philip Kennedy and was successfully implanted for the first time in a human patient in 1996 by neurosurgeon Roy Bakay.

Imagined speech is thinking in the form of sound – "hearing" one's own voice silently to oneself, without the intentional movement of any extremities such as the lips, tongue, or hands. Imagined speech has in principle been possible since the emergence of language; however, the phenomenon is most associated with its investigation through signal processing and detection within electroencephalograph (EEG) data, as well as data obtained using alternative non-invasive brain–computer interface (BCI) devices.

Frank H. Guenther is an American computational and cognitive neuroscientist whose research focuses on the neural computations underlying speech, including characterization of the neural bases of communication disorders and development of brain–computer interfaces for communication restoration. He is currently a professor of speech, language, and hearing sciences and biomedical engineering at Boston University.

Brain technology, or self-learning know-how systems, refers to technology that employs the latest findings in neuroscience. The term was first introduced by the Artificial Intelligence Laboratory in Zurich, Switzerland, in the context of the Roboy project. Brain technology can be employed in robots, know-how management systems and any other application with self-learning capabilities. In particular, brain technology applications allow the visualization of the underlying learning architecture, often coined "know-how maps".


Ingeborg J. Hochmair-Desoyer is an Austrian electrical engineer and the CEO and CTO of hearing implant company MED-EL. Dr Hochmair and her husband Prof. Erwin Hochmair co-created the first micro-electronic multi-channel cochlear implant in the world. She received the Lasker-DeBakey Clinical Medical Research Award for her contributions towards the development of the modern cochlear implant. She also received the 2015 Russ Prize for bioengineering.


Stentrode is a small stent-mounted electrode array permanently implanted into a blood vessel in the brain, without the need for open brain surgery. It is in clinical trials as a brain–computer interface (BCI) for people with paralyzed or missing limbs, who will use their neural signals or thoughts to control external devices, which currently include computer operating systems. The device may ultimately be used to control powered exoskeletons, robotic prostheses, computers or other devices.

A cortical implant is a subset of neuroprosthetics that is in direct connection with the cerebral cortex of the brain. By directly interfacing with different regions of the cortex, the cortical implant can provide stimulation to an immediate area and provide different benefits, depending on its design and placement. A typical cortical implant is an implantable microelectrode array, which is a small device through which a neural signal can be received or transmitted.

Neural dust is a hypothetical class of nanometer-sized devices operated as wirelessly powered nerve sensors; it is a type of brain–computer interface. The sensors may be used to study, monitor, or control the nerves and muscles and to remotely monitor neural activity. In practice, a medical treatment could introduce thousands of neural dust devices into human brains. The term is derived from "smart dust", as the sensors used as neural dust may also be defined by this concept.

Gerald E. Loeb is an American neurophysiologist, biomedical engineer, academic and author. He is a Professor of Biomedical Engineering, Pharmacy and Neurology at the University of Southern California, the President of Biomed Concepts, and the co-founder of SynTouch.

References

  1. Shirley, John (2013-05-01). New Taboos. PM Press. ISBN 9781604868715. Retrieved 14 April 2017.
  2. Denby B, Schultz T, Honda K, Hueber T, Gilbert J.M., Brumberg J.S. (2010). Silent speech interfaces. Speech Communication 52: 270–287. doi:10.1016/j.specom.2009.08.002
  3. Hueber T, Benaroya E-L, Chollet G, Denby B, Dreyfus G, Stone M. (2010). Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips. Speech Communication 52: 288–300. doi:10.1016/j.specom.2009.11.004
  4. Wang J., Samal A., Green J. R. (2014). Preliminary test of a real-time, interactive silent speech interface based on electromagnetic articulograph. 5th ACL/ISCA Workshop on Speech and Language Processing for Assistive Technologies, Baltimore, MD, 38–45.
  5. Jorgensen C, Dusan S. (2010). Speech interfaces based upon surface electromyography. Speech Communication 52: 354–366. doi:10.1016/j.specom.2009.11.003
  6. Schultz T, Wand M. (2010). Modeling Coarticulation in EMG-based Continuous Speech Recognition. Speech Communication 52: 341–353. doi:10.1016/j.specom.2009.12.002
  7. Hirahara T, Otani M, Shimizu S, Toda T, Nakamura K, Nakajima Y, Shikano K. (2010). Silent-speech enhancement using body-conducted vocal-tract resonance signals. Speech Communication 52: 301–313. doi:10.1016/j.specom.2009.12.001
  8. Brumberg J.S., Nieto-Castanon A, Kennedy P.R., Guenther F.H. (2010). Brain–computer interfaces for speech communication. Speech Communication 52: 367–379. doi:10.1016/j.specom.2010.01.001
  9. Deng Y., Patel R., Heaton J. T., Colby G., Gilmore L. D., Cabrera J., Roy S. H., De Luca C. J., Meltzner G. S. (2009). Disordered speech recognition using acoustic and sEMG signals. INTERSPEECH-2009, 644–647.
  10. Deng Y., Colby G., Heaton J. T., Meltzner G. S. (2012). Signal Processing Advances for the MUTE sEMG-Based Silent Speech Recognition System. Military Communication Conference, MILCOM 2012.
  11. Fitzpatrick M. (2002). Lip-reading cellphone silences loudmouths. New Scientist.
  12. Wand M, Schultz T. (2011). Session-independent EMG-based Speech Recognition. Proceedings of the 4th International Conference on Bio-inspired Systems and Signal Processing.
  13. "Project Overview ‹ AlterEgo". MIT Media Lab. Retrieved 2024-05-20.
  14. "Army developing 'synthetic telepathy'". NBC News. 13 October 2008.
  15. Pei, Xiaomei; Barbour, Dennis L; Leuthardt, Eric C; Schalk, Gerwin (2011). "Decoding vowels and consonants in spoken and imagined words using electrocorticographic signals in humans". Journal of Neural Engineering. 8 (4): 046028. Bibcode:2011JNEng...8d6028P. doi:10.1088/1741-2560/8/4/046028. PMC 3772685. PMID 21750369.
  16. US Patent 6587729: Apparatus for audibly communicating speech using the radio frequency hearing effect.
  17. US Patent 5159703: Silent subliminal presentation system.
  18. US Patent 6011991: Communication system and method including brain wave analysis and/or use of brain activity.
  19. US Patent 3951134: Apparatus and method for remotely monitoring and altering brain waves.
  20. Clarke, Arthur C. (1972). The Lost Worlds of 2001. London: Sidgwick and Jackson. ISBN 0-283-97903-8.
