IraqComm

IraqComm Pack (photo by SRI International)

IraqComm is a speech translation system that performs two-way, speech-to-speech machine translation between English and colloquial Iraqi Arabic. SRI International in Menlo Park, California, led development of the IraqComm system under the DARPA program Spoken Language Communication and Translation System for Tactical Use (TRANSTAC).

Vocabulary

IraqComm has a vocabulary of tens of thousands of words in English and in Iraqi Arabic. [1] The system is designed to enable soldiers or medics to converse with civilians in settings such as military checkpoints, door-to-door searches, or first aid situations. [2] It is tailored to translate spoken interactions about force protection, security, and basic medical services, and can be customized to include other topics as needed. Conversations can be logged for later review or archiving.

Users

After IraqComm demonstrated leading performance in competitive laboratory tests under Phase 1 of TRANSTAC, in Spring 2006 the U.S. military selected the IraqComm system for evaluation in real tactical operations in Iraq. [3]

Design

IraqComm integrates three software technologies: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). To start a dialog, the user speaks into the microphone and the system records his or her voice. The ASR module processes the recording and displays what it heard on the screen. The MT module translates the phrase into the target language (e.g., Iraqi Arabic). The TTS module then “speaks” this translation, which is also displayed on the screen.
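
The three-stage flow described above can be sketched in a few lines of Python. This is only an illustration of the ASR, MT, TTS ordering, not IraqComm's actual implementation: the names `Turn`, `run_turn`, `fake_asr`, `fake_mt`, and `fake_tts` are hypothetical, and the stand-in stages simply pass text around instead of processing real audio.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    recognized: str  # text produced by the ASR stage (shown on screen)
    translated: str  # text produced by the MT stage (shown on screen)
    spoken: bytes    # audio produced by the TTS stage


def run_turn(audio: bytes,
             asr: Callable[[bytes], str],
             mt: Callable[[str], str],
             tts: Callable[[str], bytes]) -> Turn:
    """Run one utterance through the stages in order: ASR, then MT, then TTS."""
    recognized = asr(audio)
    translated = mt(recognized)
    spoken = tts(translated)
    return Turn(recognized, translated, spoken)


# Toy stand-ins (assumptions for illustration only).
def fake_asr(audio: bytes) -> str:
    # Pretend the recording already contains its own transcript.
    return audio.decode("utf-8")


TOY_LEXICON = {"hello": "marhaba"}  # one-entry placeholder dictionary


def fake_mt(text: str) -> str:
    # Look the phrase up; fall back to the input if it is unknown.
    return TOY_LEXICON.get(text.lower(), text)


def fake_tts(text: str) -> bytes:
    # Pretend the encoded text is synthesized audio.
    return text.encode("utf-8")


turn = run_turn(b"hello", fake_asr, fake_mt, fake_tts)
print(turn.recognized, "->", turn.translated)
```

Passing the stages in as plain callables keeps the ordering explicit and lets each stage be swapped out independently, which mirrors the modular description in the text.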

To simplify and speed up communication, the IraqComm interface provides a shortcut menu of frequently used phrases. The user can change the phrase to be translated by editing it with the keyboard or by selecting a similar-sounding phrase with a single press of a button.
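
One way to picture the phrase-correction step is a lookup that offers stored phrases close to what the recognizer produced. The sketch below is a rough illustration under stated assumptions: the phrase list and the function `suggest_alternatives` are hypothetical, and spelling similarity from Python's standard `difflib` module stands in for whatever acoustic or phonetic similarity the real system uses.

```python
import difflib
from typing import List

# Hypothetical shortcut phrases; the real IraqComm phrase menu is not documented here.
SHORTCUT_PHRASES = [
    "Please step out of the vehicle.",
    "Do you need a doctor?",
    "Show me your identification.",
    "Where does it hurt?",
]


def suggest_alternatives(recognized: str,
                         phrases: List[str] = SHORTCUT_PHRASES,
                         limit: int = 3) -> List[str]:
    """Return up to `limit` stored phrases that resemble the recognized text."""
    # Compare in lowercase, but return the phrases in their original form.
    by_lower = {p.lower(): p for p in phrases}
    matches = difflib.get_close_matches(
        recognized.lower(), list(by_lower), n=limit, cutoff=0.5
    )
    return [by_lower[m] for m in matches]


suggestions = suggest_alternatives("do you need a doctor")
print(suggestions)
```

The best match comes first in the returned list, so an interface could bind it to a single button press and show the remaining candidates in a menu.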

SRI's DynaSpeak speech recognition engine is embedded in the IraqComm system. DynaSpeak speech recognition is also used in the Phraselator, a weatherproof handheld language translation device developed by VoxTec, a former division of the military contractor Marine Acoustics, located in Annapolis, MD.

Related Research Articles

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

A universal translator is a device common to many science fiction works, especially on television. First described in Murray Leinster's 1945 novella "First Contact", the translator's purpose is to offer an instant translation of any language. Technology companies are striving to develop a practical "universal" translator for common use.

SRI International (SRI) is an American nonprofit scientific research institute and organization headquartered in Menlo Park, California. The trustees of Stanford University established SRI in 1946 as a center of innovation to support economic development in the region.

Nuance is a U.S.-based multinational computer software technology corporation, headquartered in Burlington, Massachusetts, on the outskirts of Boston, that provides speech recognition and artificial intelligence software. Current business products focus on server and embedded speech recognition, telephone call steering systems, automated telephone directory services, and medical transcription software and systems. The company also maintains a small division, based in Westborough, Massachusetts and allegedly called Twined, that does software and system development for military and government agencies.

A dialogue system, or conversational agent (CA), is a computer system intended to converse with a human with a coherent structure. Dialogue systems have employed text, speech, graphics, haptics, gestures, and other modes for communication on both the input and output channel.

Google Translate is a free multilingual machine translation service developed by Google to translate text. It offers a website interface, mobile apps for Android and iOS, and an API that helps developers build browser extensions and software applications. Google Translate supports over 100 languages at various levels and, as of May 2017, serves over 500 million people daily.

Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translation.

Chinese speech synthesis is the application of speech synthesis to the Chinese language. It poses additional difficulties due to Chinese characters, the complex prosody that is essential to conveying the meaning of words, and sometimes the difficulty of obtaining agreement among native speakers on the correct pronunciation of certain phonemes.

ECTACO Inc. is a US-based developer and manufacturer of hardware and software products for speech recognition and electronic translation. They also make jetBook eBook readers.

Voice search, also called voice-enabled search, allows the user to use a voice command to search the Internet, a website, or an app.

A spoken dialog system is a computer system able to converse with a human by voice. It has two essential components that do not exist in a written text dialog system: a speech recognizer and a text-to-speech module. It can be further distinguished from command and control speech systems, which can respond to requests but do not attempt to maintain continuity over time.

An IP PBX is a system that connects telephone extensions to the public switched telephone network (PSTN) and provides internal communication for a business. It is a PBX system with IP connectivity and may provide additional audio, video, or instant messaging communication using the TCP/IP protocol stack.

SVOX is an embedded speech technology company founded in 2000 and headquartered in Zurich, Switzerland. SVOX was acquired by Nuance Communications in 2011. The company’s products included Automated Speech Recognition (ASR), Text-to-Speech (TTS) and Speech Dialog systems, with customers mostly being manufacturers and system integrators in automotive and mobile device industries.

Applications Technology (AppTek) is a U.S. software company specializing in human language technology, headquartered in McLean, Virginia.

Speech translation is the process by which conversational spoken phrases are instantly translated and spoken aloud in a second language. This differs from phrase translation, in which the system translates only a fixed and finite set of phrases that have been manually entered into it. Speech translation technology enables speakers of different languages to communicate, which makes it valuable for science, cross-cultural exchange, and global business.

Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing. She is currently the Percy K. and Vida L. W. Hudson Professor of Computer Science at Columbia University.

The Global Autonomous Language Exploitation (GALE) program was funded by DARPA starting in 2005 to develop technologies for automatic information extraction from multilingual newscasts, documents and other forms of communication. The program encompassed three main challenges: automatic speech recognition, machine translation, and information retrieval. The focus of the program was on recognizing speech in Mandarin and Arabic and translating it to English.

References