ULTRA (machine translation system)

ULTRA is a machine translation system covering five languages (Japanese, Chinese, Spanish, English, and German), created at the Computing Research Laboratory in 1991.

ULTRA (Universal Language Translator) is a machine translation system developed at the Computing Research Laboratory, [1] which can translate between five languages (Japanese, Chinese, Spanish, English and German). It uses artificial intelligence as well as linguistic and logic programming methods. The main goals of the system are to be robust, to cover general language, and to be simple to use. It uses bidirectional parsers/generators.

The system has a language-independent intermediate representation designed around the needs of expression (expression being one of the main elements of language), and it uses relaxation techniques to find the best available translation. It has an X Window user interface. [2]

ULTRA's databases

Operation

Users paste a sentence into the "source" window, choose a target language, and press Translate. [3] The tool translates the source text, taking into consideration what is said, how it is said, and why it is said.

Lexical entries in the system have two parts:

ULTRA uses an intermediate representation shared between the language systems, so no language-pair transfer step takes place. Each language has its own independent system, which brings an extra benefit: adding another language does not disrupt existing translations.
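The idea of independent language systems that meet only at a shared representation can be illustrated with a minimal Python sketch. It is not ULTRA's implementation: the class names, the two-word toy grammar, the concept labels (BIRD, SING), and the tiny lexicons are all hypothetical, chosen only to show why adding a language leaves existing translations untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IR:
    """Toy language-independent representation (hypothetical): predicate + arguments."""
    predicate: str
    args: tuple


class LanguageSystem:
    """One independent language module; the same lexicon serves analysis and generation."""

    def __init__(self, name, lexicon):
        self.name = name
        self.word_to_concept = dict(lexicon)
        # Invert the lexicon so the module works in both directions.
        self.concept_to_word = {concept: word for word, concept in lexicon.items()}

    def analyze(self, sentence):
        # Toy grammar (assumption): every sentence is "<subject> <verb>".
        subject, verb = sentence.lower().split()
        return IR(self.word_to_concept[verb], (self.word_to_concept[subject],))

    def generate(self, ir):
        subject = self.concept_to_word[ir.args[0]]
        verb = self.concept_to_word[ir.predicate]
        return f"{subject} {verb}"


SYSTEMS = {}


def register(system):
    """Add a language; no existing system is modified."""
    SYSTEMS[system.name] = system


def translate(sentence, source, target):
    # Source text -> IR -> target text; there is no per-pair transfer module.
    ir = SYSTEMS[source].analyze(sentence)
    return SYSTEMS[target].generate(ir)


register(LanguageSystem("en", {"birds": "BIRD", "sing": "SING"}))
register(LanguageSystem("es", {"pájaros": "BIRD", "cantan": "SING"}))
spanish = translate("birds sing", "en", "es")

# Adding German touches neither the English nor the Spanish system.
register(LanguageSystem("de", {"vögel": "BIRD", "singen": "SING"}))
german = translate("pájaros cantan", "es", "de")
```

With this layout, five languages need five language systems, whereas a design with one transfer module per directed language pair would need 5 × 4 = 20 of them.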

Intermediate representation

The developers, David Farwell and Yorick Wilks, created the IR (interlingual representation), which serves as the basis for analyzing and generating expressions. [4]

They analyzed many types of communication (business letters, documents, emails) to compare communication styles. ULTRA looks for the most suitable words for particular kinds of information, and for appropriate forms and equivalents of expressions in the target language.

References

  1. Farwell & Wilks, 1989
  2. Austermuhl, Frank (2001). Electronic Tools for Translators. Manchester: St. Jerome Publishing. ISBN 1900650347.
  3. Wilks, Yorick (2009). Machine Translation: Its Scope and Limits. Springer Science+Business Media LLC. ISBN 9780387727738.
  4. Farwell, David; Wilks, Yorick (1991). "ULTRA: A multilingual machine translator".
