Computer-assisted translation

Computer-aided translation (CAT), also referred to as computer-assisted translation or computer-aided human translation (CAHT), is the use of software to assist a human translator in the translation process. The translation is created by a human, and certain aspects of the process are facilitated by software; this is in contrast with machine translation (MT), in which the translation is created by a computer, optionally with some human intervention (e.g. pre-editing and post-editing). [1]

CAT tools are typically understood to mean programs that specifically facilitate the actual translation process. Most CAT tools have (a) the ability to translate a variety of source file formats in a single editing environment without needing to use the file format's associated software for most or all of the translation process, (b) translation memory, and (c) integration of various utilities or processes that increase productivity and consistency in translation.

Range of tools

Computer-assisted translation is a broad and imprecise term covering a range of tools, several of which are described in the sections that follow.

Concepts

Translation memory software

Translation memory programs store previously translated source texts and their equivalent target texts in a database and retrieve related segments during the translation of new texts. [4]

Such programs split the source text into manageable units known as "segments". A source-text sentence or sentence-like unit (a heading, a title or an element in a list) may be considered a segment. Texts may also be segmented into larger units such as paragraphs, or smaller ones such as clauses. As the translator works through a document, the software displays each source segment in turn and offers a previous translation for re-use if it finds a matching source segment in its database. If it does not, the program allows the translator to enter a translation for the new segment. After the translation for a segment is completed, the program stores the new translation and moves on to the next segment.

In the dominant paradigm, the translation memory is, in principle, a simple database of fields containing the source-language segment, the translation of the segment, and other information such as segment creation date, last access, translator name, and so on. Another translation memory approach does not involve the creation of a database, relying on aligned reference documents instead. [5]
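The segment, look up, store cycle described above can be illustrated with a minimal Python sketch. It is not the design of any particular product: the class and function names, the sentence-splitting rule, and the similarity threshold of 0.75 are all assumptions made for illustration, and the standard-library `difflib.SequenceMatcher` stands in for whatever matching a real tool performs.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from difflib import SequenceMatcher


@dataclass
class TranslationUnit:
    """One database record: source segment, its translation, and metadata."""
    source: str
    target: str
    created: date = field(default_factory=date.today)
    translator: str = "unknown"


def segment(text):
    """Split text into sentence-like segments.

    Naive assumed rule: a segment ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


class TranslationMemory:
    def __init__(self):
        self.units = {}  # source segment -> TranslationUnit

    def store(self, source, target, translator="unknown"):
        """Save a completed translation so it can be re-used later."""
        self.units[source] = TranslationUnit(source, target, translator=translator)

    def lookup(self, source, threshold=0.75):
        """Return (unit, score) for the most similar stored segment, or None.

        The threshold is an illustrative assumption, not an industry value.
        """
        best = None
        for unit in self.units.values():
            score = SequenceMatcher(None, source, unit.source).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (unit, score)
        return best


tm = TranslationMemory()
tm.store("Press the red button.", "Appuyez sur le bouton rouge.")

for seg in segment("Press the red button. Close the lid."):
    match = tm.lookup(seg)
    if match:
        print(seg, "->", match[0].target)
    else:
        print(seg, "-> no match, translator enters a new translation")
```

A dictionary keyed by source segment keeps the sketch short; a real translation memory would typically use a persistent database with the additional fields mentioned above.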

Some translation memory programs function as standalone environments, while others function as an add-on or macro for commercially available word-processing or other business software programs. Add-on programs allow source documents from other formats, such as desktop publishing files, spreadsheets, or HTML code, to be handled using the TM program. For an example, see MEMOrg.

Language search-engine software

New to the translation industry, language search-engine software is typically an Internet-based system that works similarly to Internet search engines. Rather than searching the Internet, however, a language search engine searches a large repository of translation memories to find previously translated sentence fragments, phrases, whole sentences, or even complete paragraphs that match source document segments.

Language search engines are designed to use modern search technology to search on the source words in context, so that the results match the meaning of the source segments. As with traditional TM tools, the value of a language search engine depends heavily on the translation memory repository it searches.
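As a rough illustration of searching a translation memory repository, the following Python sketch builds an inverted word index over stored source segments and ranks entries by how many query words they share. This is a deliberate simplification: it uses plain word overlap rather than the context-aware matching described above, and the function names and sample data are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and return its words."""
    return re.findall(r"\w+", text.lower())


def build_index(repository):
    """Map each word to the set of repository entries whose source contains it."""
    index = defaultdict(set)
    for i, (source, _target) in enumerate(repository):
        for word in tokenize(source):
            index[word].add(i)
    return index


def search(query, repository, index, limit=3):
    """Return up to `limit` (source, target) pairs sharing the most words with the query."""
    counts = defaultdict(int)
    for word in set(tokenize(query)):
        for i in index.get(word, ()):
            counts[i] += 1
    # Most shared words first; ties keep repository order.
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return [repository[i] for i in ranked[:limit]]


repository = [
    ("Save the file", "Guardar el archivo"),
    ("Open the log file", "Abrir el archivo de registro"),
    ("Close the window", "Cerrar la ventana"),
]
index = build_index(repository)

for source, target in search("open the file", repository, index, limit=2):
    print(source, "->", target)
```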

Terminology management software

Terminology management software provides the translator with a means of automatically searching a given terminology database for terms appearing in a document, either by automatically displaying terms in the translation memory software interface window or through the use of hotkeys to view the entry in the terminology database. Some programs have other hotkey combinations that allow the translator to add new terminology pairs to the terminology database on the fly during translation. Some of the more advanced systems enable translators to check, either interactively or in batch mode, whether the correct source/target term combination has been used within and across the translation memory segments in a given project. Independent terminology management systems also exist. These can provide workflow functionality and visual taxonomy, work as a type of term checker (similar to a spell checker, in that terms that have not been used correctly are flagged), and support other types of multilingual term facet classifications such as pictures, videos, or sound. [6] [4]
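The batch term check mentioned above can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions: the termbase is a plain dictionary, matching is a case-insensitive substring test with no handling of inflected forms, and `check_terms` is a hypothetical name rather than a function of any real tool.

```python
def check_terms(segments, termbase):
    """Flag segments whose source contains a term but whose target lacks the approved translation.

    segments: list of (source, target) pairs
    termbase: dict mapping source term -> approved target term
    Returns a list of (segment index, source term, expected target term).
    """
    issues = []
    for i, (source, target) in enumerate(segments):
        for src_term, tgt_term in termbase.items():
            in_source = src_term.lower() in source.lower()
            in_target = tgt_term.lower() in target.lower()
            if in_source and not in_target:
                issues.append((i, src_term, tgt_term))
    return issues


termbase = {"hard disk": "Festplatte"}
segments = [
    ("Format the hard disk.", "Formatieren Sie die Festplatte."),
    ("The hard disk is full.", "Das Laufwerk ist voll."),
]

for index, term, expected in check_terms(segments, termbase):
    print(f"Segment {index}: '{term}' should be translated as '{expected}'")
```

The second segment is flagged because its target uses a different word from the approved term, which is the kind of inconsistency a term checker is meant to surface.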

Alignment software

Alignment software binds each source-language segment to its corresponding target-language segment. The purpose is to create a translation memory database or to add to an existing one.
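A very naive form of alignment can be sketched in Python as follows. It assumes that source and target contain the same number of sentences in the same order, and it flags pairs whose lengths differ sharply for manual review. The function names and the `max_ratio` value are assumptions for illustration; practical aligners also have to handle merged, split, and reordered sentences.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def align(source_text, target_text, max_ratio=2.5):
    """Pair segments by position and return (source, target, plausible) triples.

    `plausible` is False when one segment is more than `max_ratio` times
    longer than the other, which suggests the pair needs manual review.
    """
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    if len(src) != len(tgt):
        raise ValueError("segment counts differ; manual alignment needed")
    pairs = []
    for s, t in zip(src, tgt):
        ratio = max(len(s), len(t)) / min(len(s), len(t))
        pairs.append((s, t, ratio <= max_ratio))
    return pairs


for s, t, ok in align("Hello. How are you?", "Hallo. Wie geht es dir?"):
    print(s, "<->", t, "" if ok else "(check manually)")
```

The resulting pairs could then be stored as translation units in a translation memory.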

Interactive machine translation

Interactive machine translation is a paradigm in which the automatic system attempts to predict the translation the human translator is going to produce by suggesting translation hypotheses. These hypotheses may be either the complete sentence or the part of the sentence that is yet to be translated.
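The prediction step can be illustrated with a small Python sketch. It assumes that some machine translation system has already produced a ranked list of hypotheses for the current sentence (the list here is hard-coded and hypothetical) and simply proposes the remainder of the best hypothesis that agrees with what the translator has typed so far. A real interactive system would typically generate new hypotheses from the prefix rather than filter a fixed list.

```python
def suggest_completion(hypotheses, typed_prefix):
    """Return the untranslated remainder of the best hypothesis matching the prefix.

    hypotheses: candidate translations, best first
    typed_prefix: the text the translator has entered so far
    Returns None when no hypothesis is compatible with the prefix.
    """
    for hypothesis in hypotheses:
        if hypothesis.startswith(typed_prefix):
            return hypothesis[len(typed_prefix):]
    return None


# Hypothetical ranked output of an MT system for one source sentence.
hypotheses = [
    "The house is small.",
    "The home is small.",
    "This house is little.",
]

for prefix in ["", "The ho", "The hom", "A "]:
    print(repr(prefix), "->", repr(suggest_completion(hypotheses, prefix)))
```

With an empty prefix the suggestion is the complete sentence; as the translator types, the suggestion shrinks to the part that is yet to be translated.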

Augmented translation

Augmented translation is a form of human translation carried out within an integrated technology environment that provides translators access to subsegment adaptive machine translation (MT) and translation memory (TM), terminology lookup (CAT), and automatic content enrichment (ACE) to aid their work, and that automates project management, file handling, and other ancillary tasks. [7] [8]

Based on the concept of augmented reality, augmented translation seeks to make translators more productive by providing them with relevant information on an as-needed basis. This information adapts to the habits and style of individual translators in order to accelerate their work and increase productivity. It differs from classical post-editing of MT, which has linguists revise entire texts translated by machines, in that it provides machine translation and information as suggestions that can be adopted in their entirety, edited, or ignored, as appropriate. [7]

Augmented translation extends principles first developed in the 1980s that made their way into CAT tools. However, it integrates several functions that have previously been discrete into one environment. For example, translators historically have had to leave their translation environments to do terminology research, but in an augmented environment, an ACE component would automatically provide links to information about terms and concepts found in the text directly within the environment.

As of May 2017, no full implementations of an augmented translation environment exist, although individual developers have created partial systems.

References

  1. Bowker, Lynne; Fisher, Des (2010). "Computer-aided translation". In Gambier, Y.; van Doorslaer, L. (eds.). Handbook of Translation Studies. John Benjamins Publishing Company. p. 60. ISBN 978-90-272-0331-1. Retrieved 4 March 2024.
  2. Christensen, Tina Paulsen; Schjoldager, Anne. "Translation-Memory (TM) Research: What Do We Know and How Do We Know It?" (PDF). Hermes. 44.
  3. "Bitext Alignment". ResearchGate.
  4. "Terminology Management and MT" (PDF). Circuit. 117.
  5. "CAT Tools vs. Machine Translation: What's the Best Method?". Asian Absolute. 21 December 2015. Retrieved 29 January 2017.
  6. "Archived copy" (PDF). Archived from the original (PDF) on 25 April 2012. Retrieved 3 October 2011.
  7. DePalma, Donald A.; Lommel, Arle (15 February 2017). "Augmented Translation Powers up Language Services". Common Sense Advisory. Retrieved 19 May 2017.
  8. Eggers, William D.; Schatsky, David; Viechnicki, Peter (26 April 2017). "AI-augmented government: Using cognitive technologies to redesign public sector work". Deloitte University Press. Retrieved 19 May 2017.