OmegaT

Original author(s) Keith Godfrey
Developer(s) Aaron Madlon-Kay, Didier Briel, Alex Buloichik, Zoltan Bartko, Tiago Saboga, etc.
Initial release November 28, 2002
Stable release 4.3.3 (March 18, 2022)
Preview release 5.7.1 (March 18, 2022)
Operating system Microsoft Windows, macOS, Linux, Solaris
Type Computer-assisted translation
License GPLv3+ [1]
Website omegat.org

OmegaT is a computer-assisted translation tool written in the Java programming language. It is free software originally developed by Keith Godfrey in 2000, and is currently developed by a team led by Aaron Madlon-Kay.


OmegaT is intended for professional translators. Its features include customisable segmentation using regular expressions, translation memory with fuzzy matching and match propagation, glossary matching, dictionary matching, translation memory and reference material searching, and inline spell-checking using Hunspell spelling dictionaries.
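Rule-based segmentation with regular expressions can be sketched as follows. The sketch assumes SRX-style rules, in which each rule pairs a "before break" pattern with an "after break" pattern and is marked either as a break or as an exception. The rule set, the names `Rule` and `segment`, and the "first matching rule decides" policy are illustrative assumptions. They do not reproduce OmegaT's shipped rules or its implementation.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    before: str     # pattern that must match the text just before the candidate break
    after: str      # pattern that must match the text just after it
    is_break: bool  # True: break here; False: exception, do not break

# Hypothetical English rules. Order matters: the first rule that matches at a
# position decides, so exceptions are listed before the general break rule.
RULES = [
    Rule(r"\b(?:Dr|Mr|Mrs|e\.g)\.", r"\s", False),
    Rule(r"[.?!]+", r"\s+[A-Z]", True),
]

def segment(text, rules=RULES):
    """Split a paragraph into sentence segments using ordered break/no-break rules."""
    compiled = [(re.compile("(?:" + rule.before + r")\Z"), re.compile(rule.after), rule.is_break)
                for rule in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for before, after, is_break in compiled:
            # `before` must end exactly at pos; `after` must start exactly at pos.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides this position
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

With these rules, `segment("Dr. Smith arrived. He sat down!")` keeps "Dr. Smith arrived." together, because the abbreviation exception is checked before the general sentence-break rule.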

OmegaT runs on Linux, macOS, Microsoft Windows and Solaris, and requires Java 8. [2] It is available in 27 languages. According to a 2010 survey [3] of 458 professional translators, OmegaT was used one-third as often as Wordfast, Déjà Vu and MemoQ, and one-eighth as often as the market leader, Trados.

History

OmegaT was first developed by Keith Godfrey in 2000. It was originally written in C++.

The first public release, in February 2001, [4] was written in Java. This version used a proprietary translation memory format. It could translate unformatted text files and HTML, and it performed only block-level segmentation (i.e. by paragraphs rather than by sentences).

Development and software releases

The development of OmegaT is hosted on SourceForge, and the development team is led by Aaron Madlon-Kay. As with many open-source projects, new versions of OmegaT are released frequently, each usually bringing two or three bug fixes and feature updates. There is a "standard" version, which always has a complete user manual, and a "latest" version, which includes features not yet documented in the user manual. [5] The updated sources are always available from the SourceForge code repository. [6]

How OmegaT works

OmegaT handles a translation job as a project: a hierarchy of folders with specific names. The user copies the documents to be translated into the folder named /source/ (or its subfolders). The Editor pane presents the source documents as individual “segments”, which are translated one at a time. When instructed, OmegaT generates the (partially) translated versions in the /target/ subfolder.

Other named folders hold material that the program consults automatically: /tm/ for existing translation pairs in .tmx format, /tm/auto/ for automatic translation of 100% matches, /glossary/ for glossaries, and /dictionary/ for StarDict (and .tbx) dictionaries.
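The folder layout described above can be sketched as a small script that creates the named folders. This is only an illustration. A real OmegaT project also contains files that OmegaT writes itself, such as its project settings file, so in practice the project should be created from within OmegaT. The function name `make_project_skeleton` is hypothetical.

```python
from pathlib import Path

# Folder names taken from the project description; OmegaT's own project files
# (settings, internal data) are deliberately not created here.
PROJECT_FOLDERS = ["source", "target", "tm", "tm/auto", "glossary", "dictionary"]

def make_project_skeleton(root):
    """Create the named project folders under `root` and list the folders that now exist."""
    root = Path(root)
    for name in PROJECT_FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)  # safe to run twice
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())
```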

When the user goes to translate a segment in the Editor pane, OmegaT automatically searches the .tmx files in the /tm/ hierarchy for previous translation pairs with similar source sentences and displays them in the Fuzzy Matches pane for insertion into the Editor pane with a keyboard shortcut. The Glossary and Dictionary panes provide similar automatic look-up functions for any glossaries and dictionaries in the corresponding named folders in the project. The optional Machine Translation pane shows machine translations from Google Translate and similar services.
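The idea behind fuzzy matching can be sketched as follows. The current segment is compared with the source side of every stored pair, each pair gets a percentage score, and those above a threshold are listed best first. OmegaT's own scoring differs in detail. This sketch substitutes a plain word-level edit distance, and the 70% threshold and function names are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two token lists (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        cur = [i]
        for j, tok_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                      # delete tok_a
                           cur[j - 1] + 1,                   # insert tok_b
                           prev[j - 1] + (tok_a != tok_b)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def similarity(s, t):
    """Percentage similarity of two sentences, compared word by word, ignoring case."""
    a, b = s.lower().split(), t.lower().split()
    if not a and not b:
        return 100
    return round(100 * (1 - levenshtein(a, b) / max(len(a), len(b))))

def fuzzy_matches(segment, memory, threshold=70):
    """Return (score, source, target) for memory pairs scoring at least `threshold`, best first."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory]
    return sorted((m for m in scored if m[0] >= threshold), key=lambda m: m[0], reverse=True)
```

For "Click the Open button." against a stored "Click the Save button.", one of four words differs, which gives a 75% match.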

When the user leaves a segment, OmegaT normally first adds the source-target pair to its database in memory. It then saves that database to disk in Translation Memory eXchange (.tmx) format, so that it can be reused later, in other projects, by other translators, and even with other CAT tools. If the segment has not been changed, nothing is updated. Version 3.1 added a setting that blocks targets identical to their sources, a common slip, together with a keyboard shortcut for overriding the block where identical text is legitimate, such as numbers or source code in programming manuals.
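The save cycle can be sketched in two steps. First, pairs are recorded in memory when a segment is left, and a target identical to its source is refused unless explicitly allowed. Second, the pairs are serialised as a minimal TMX 1.4 document. The function names, the dictionary keyed by source text, and the header values are assumptions for illustration. They are not OmegaT's internal format or code.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialised as xml:lang

def add_pair(memory, source, target, allow_identical=False):
    """Record a pair when a segment is left; return True if the memory changed.

    An empty target (segment left untranslated) is not recorded, and a target
    identical to its source is refused unless `allow_identical` is set, e.g.
    for numbers or source code.
    """
    if not target:
        return False
    if target == source and not allow_identical:
        return False
    memory[source] = target  # assumption: one translation per source text
    return True

def to_tmx(memory, srclang, tgtlang):
    """Serialise the in-memory pairs as a minimal TMX 1.4 document string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", creationtool="sketch", creationtoolversion="0.1",
                  segtype="sentence", datatype="plaintext", adminlang="en",
                  srclang=srclang, **{"o-tmf": "sketch"})
    body = ET.SubElement(tmx, "body")
    for source, target in memory.items():
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```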

At any point, the user can create partially translated versions of the source files; segments that have not yet been translated are copied verbatim from the source. Before doing so, the user is advised to use the Validate menu command to check for tag and other errors. Version 3.1 added a menu command (and keyboard shortcut) that limits this operation to the current file, for example for partial delivery or a quick update.
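The two behaviours described here can be sketched as follows. When target text is generated, untranslated segments fall back to their source text. A tag check compares the tags in each target with those in its source. The tag pattern (short tags such as `<b0>`, `</b0>`, `<x1/>`) and the function names are assumptions. Joining segments with a single space also simplifies how real documents are rebuilt.

```python
import re
from collections import Counter

# Assumed tag shape: a letter-based name followed by a number, e.g. <b0>, </b0>, <x1/>.
TAG = re.compile(r"</?[a-z]+\d+/?>")

def build_target(segments, translations):
    """Join segments, using the translation when present and the source text otherwise."""
    return " ".join(translations.get(seg) or seg for seg in segments)

def tag_problems(source, target):
    """Report tags missing from, or added to, the target compared with the source."""
    src, tgt = Counter(TAG.findall(source)), Counter(TAG.findall(target))
    return {"missing": list((src - tgt).elements()),
            "extra": list((tgt - src).elements())}
```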

Features of OmegaT

OmegaT shares many features with proprietary CAT tools. These include creating, importing and exporting translation memories, fuzzy matching from translation memories, glossary look-up, and reference and concordance searching.

OmegaT also has additional features that are not always available in other CAT tools. These include:

Document formats support

Several file types can be translated directly in OmegaT. OmegaT determines the file type by the file extension. The file extension handling and preferred encoding can be customised to override default settings.
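Choosing a file filter by extension, with user settings overriding the defaults, can be sketched as a simple lookup. The extension table, the filter names and the encoding values are hypothetical. They stand in for OmegaT's configurable file-filter settings and are not a copy of them.

```python
from pathlib import Path

# Hypothetical defaults: extension -> (filter name, source encoding or None for the default).
DEFAULT_FILTERS = {
    ".txt": ("Text", None),
    ".html": ("HTML", None),
    ".htm": ("HTML", None),
    ".po": ("PO", "UTF-8"),
}

def pick_filter(filename, overrides=None):
    """Choose a filter from the file extension; user overrides take precedence."""
    table = {**DEFAULT_FILTERS, **(overrides or {})}
    ext = Path(filename).suffix.lower()  # matching is case-insensitive here
    return table.get(ext)  # None: the file type is not handled directly
```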

OmegaT handles formatted documents by converting formatting to tags, much as commercial CAT tools do.
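The conversion of formatting into tags can be sketched for inline HTML. Each paired tag is replaced by a short, numbered placeholder, so `<b>Save</b>` becomes `<b0>Save</b0>`, and the translator only has to keep the placeholders in place. The naming scheme used here (first letter of the tag name plus a running number) and the function name are assumptions for illustration. Self-closing and unpaired tags are not handled.

```python
import re

MARKUP = re.compile(r"<(/?)([a-zA-Z]+)[^>]*>")

def to_short_tags(html):
    """Replace paired inline HTML tags with numbered short tags such as <b0>...</b0>."""
    counter = 0
    open_stack = []  # (tag name, short label) for tags that are still open

    def replace(m):
        nonlocal counter
        closing, name = m.group(1), m.group(2).lower()
        if not closing:
            label = f"{name[0]}{counter}"  # attributes are dropped from the visible tag
            counter += 1
            open_stack.append((name, label))
            return f"<{label}>"
        # A closing tag reuses the label of the most recent open tag with the same name.
        for i in range(len(open_stack) - 1, -1, -1):
            if open_stack[i][0] == name:
                return f"</{open_stack.pop(i)[1]}>"
        return m.group(0)  # stray closing tag: left unchanged

    return MARKUP.sub(replace, html)
```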

Directly supported formats

Indirectly supported formats

Several processes allow OmegaT to handle formats that it does not support directly:

Support for XLIFF

The program Rainbow from the Okapi Framework can convert certain file formats to XLIFF, which OmegaT supports. Rainbow can also create complete OmegaT project folders from such documents, for easier handling in OmegaT. [8]

Support for Gettext PO

A number of file formats can be converted to Gettext Portable Object (PO) files, which can be translated in OmegaT. The Debian program po4a can convert formats such as LaTeX, TeX and POD to Gettext PO. [9] The Translate Toolkit can convert Mozilla .properties and .dtd files, CSV files, certain Qt .ts files, and certain XLIFF files to Gettext PO.
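A PO file pairs each source string (`msgid`) with its translation (`msgstr`), and an empty `msgstr` marks an untranslated entry. The minimal reader below shows that structure. It handles only single-line entries and skips the header entry, whose `msgid` is empty. Multi-line strings, plural forms, `msgctxt` and escape sequences are ignored, so real files need a proper PO parser. The function name is hypothetical.

```python
import re

# One single-line entry: a msgid line immediately followed by a msgstr line.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)

def read_po(text):
    """Extract (msgid, msgstr) pairs from single-line PO entries, skipping the header."""
    return [(m.group(1), m.group(2)) for m in ENTRY.finditer(text) if m.group(1)]
```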

Support for Office Open XML and ODF

Documents from Microsoft Word, Excel and PowerPoint 97 to 2003 can be converted to Office Open XML (Microsoft Office 2007/2010) or ODF (OpenOffice.org) format, which OmegaT can translate. The conversion is not always lossless, and some formatting may be lost.

Support for Trados® .ttx files

Trados® .ttx files can be processed using the Okapi TTX Filter.

Supported memory and glossary formats

Translation memories in TMX format

OmegaT's internal translation memory format is not visible to the user, but every time it autosaves the translation project, all new or updated translation units are automatically exported and added to three external TMX memories: a native OmegaT TMX, a level 1 TMX and a level 2 TMX.

Exported level 2 files include OmegaT's internal tags encapsulated in TMX tags, which allows such files to generate matches in CAT tools that support TMX level 2. Tests with Trados and SDLX have been positive.
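The difference between the two export levels can be sketched on a single segment. TMX level 1 stores plain text only. Level 2 keeps inline codes wrapped in TMX elements such as `<bpt>`, `<ept>` and `<ph>`, with the original code stored as escaped text. The short-tag pattern and the way tags are wrapped below are assumptions for illustration, not OmegaT's exporter. The sketch also assumes every closing tag has a matching opening tag.

```python
import re
from xml.sax.saxutils import escape

SHORT_TAG = re.compile(r"<(/?)([a-z]+\d+)(/?)>")  # assumed shape: <b0>, </b0>, <x1/>

def tmx_level1(segment):
    """Level 1: inline tags are dropped, leaving plain text."""
    return SHORT_TAG.sub("", segment)

def tmx_level2(segment):
    """Level 2: each tag is kept as escaped native code inside <bpt>, <ept> or <ph>."""
    pairs = {}  # short label -> index i shared by a <bpt> and its <ept>
    out, pos = [], 0
    for m in SHORT_TAG.finditer(segment):
        out.append(escape(segment[pos:m.start()]))  # text between tags
        closing, label, empty = m.groups()
        code = escape(m.group(0))  # e.g. <b0> is stored as &lt;b0&gt;
        if empty:
            out.append(f"<ph>{code}</ph>")  # standalone placeholder
        elif closing:
            out.append(f'<ept i="{pairs[label]}">{code}</ept>')
        else:
            pairs[label] = len(pairs) + 1
            out.append(f'<bpt i="{pairs[label]}">{code}</bpt>')
        pos = m.end()
    out.append(escape(segment[pos:]))
    return "".join(out)
```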

OmegaT can import TMX files up to version 1.4b, at level 1 as well as level 2. Level 2 files imported into OmegaT generate matches of the same level, because OmegaT converts the level 2 tags of the foreign TMX. Here again, tests with TMX files created by Transit have been positive.
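Importing a level 2 TMX can be sketched as reading each translation unit and converting the foreign inline elements into short placeholder tags, so that the tags survive into matches. The placeholder names (`<t1>`, `<x0/>`) and the function names are hypothetical and do not reproduce OmegaT's conversion. `<hi>` and `<sub>` elements, which carry translatable text, are not handled.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def seg_to_short_tags(seg):
    """Rebuild a <seg> as text, turning TMX inline elements into placeholder tags."""
    parts, ph_count = [seg.text or ""], 0
    for child in seg:
        if child.tag == "bpt":
            parts.append(f"<t{child.get('i')}>")
        elif child.tag == "ept":
            parts.append(f"</t{child.get('i')}>")
        else:  # <ph>, <it>, ...: treated as a standalone placeholder
            parts.append(f"<x{ph_count}/>")
            ph_count += 1
        parts.append(child.tail or "")  # native code inside the element is discarded
    return "".join(parts)

def read_tmx(tmx_text, srclang, tgtlang):
    """Return (source, target) pairs, matching languages on their primary subtag."""
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""  # TMX 1.4 uses xml:lang
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[lang.lower().split("-")[0]] = seg_to_short_tags(seg)
        if srclang in by_lang and tgtlang in by_lang:
            pairs.append((by_lang[srclang], by_lang[tgtlang]))
    return pairs
```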

Glossaries

For glossaries, OmegaT mainly uses tab-delimited plain-text files in UTF-8 encoding with the .txt extension. The structure of a glossary file is extremely simple: the first column contains the source-language term, the second contains the corresponding target-language term(s), and the optional third column can contain anything, such as comments on context. Such glossaries can easily be created in a text editor.
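Reading such a file and finding the terms that occur in a segment can be sketched in a few lines. The whole-word, case-insensitive matching rule and the function names are assumptions. OmegaT's own glossary matching may differ, for example in how it handles inflected forms.

```python
import re

def load_glossary(text):
    """Parse tab-delimited lines: source term, target term, optional comment."""
    entries = []
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) >= 2 and cols[0].strip():  # skip blank or malformed lines
            comment = cols[2].strip() if len(cols) > 2 else ""
            entries.append((cols[0].strip(), cols[1].strip(), comment))
    return entries

def glossary_hits(segment, entries):
    """Return entries whose source term occurs in the segment as a whole word."""
    return [e for e in entries
            if re.search(r"\b" + re.escape(e[0]) + r"\b", segment, re.IGNORECASE)]
```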

Similarly structured files in standard CSV format are also supported, as well as TBX files.

Involvement by community of users

The OmegaT Project

OmegaT is open-source software and benefits from the help of volunteers. Programming is the most important contribution, but the project would benefit from more volunteer support in almost all areas. Because the source code is freely available, users may also modify OmegaT to suit their own requirements. [10]

Localization of OmegaT

OmegaT's user interface and documentation have been translated into about 30 languages. Volunteer translators can translate either the user interface, the "Instant Start" short tutorial, or the entire user manual (or all three components). All the language files and all translations of the user manual are included in the standard distribution of OmegaT.

User-created programs

A characteristic of the OmegaT user community is that deficiencies in OmegaT often prompt users to create macros, scripts and programs that perform those functions, although sometimes those features later become available in OmegaT itself. When OmegaT offered only paragraph segmentation, a user created OpenOffice.org macros for segmenting by sentence. When automatic leveraging of TMs in OmegaT still required TMs to be merged, a user created a TMX merging script. When OmegaT offered no spell-checking support, several users created scripts or found solutions to provide spell-checking as part of an OmegaT based translation process. [11]

Other software built on OmegaT

OmegaT in DGT

Latest update: 2021-03-21

The Directorate-General for Translation of the European Commission (DGT) uses OmegaT as an alternative CAT tool alongside a mainstream commercial tool. DGT maintains a fork of OmegaT (DGT-OmegaT) with adaptations, improvements and new features that meet DGT-specific requirements. It also maintains a number of helper applications that integrate OmegaT into its workflow: a Wizard that automates the creation, updating, revision and delivery of projects; Tagwipe, which removes superfluous tags from .docx documents; and TeamBase, which allows memories to be shared in real time. DGT makes these applications available as free, open-source software. [12]

Benten

Latest update: 2018-04-07

Benten is an Eclipse-based XLIFF editor. It uses OmegaT code to handle the TM matching process. It is partly funded by the Japanese government. [13]

Autshumato translation suite

Latest update: 2017-02-28

Autshumato consists of a CAT tool, an aligner, a PDF extractor, a TMX editor, and a public TM based on crawled data. The finished version will include a terminology manager and a machine translator. The CAT tool element is built upon OmegaT, and requires OpenOffice.org to run. Development is funded by the South African government's Department of Arts and Culture. [14]

OmegaT+

Latest update: 2012-10-24

OmegaT+ is a CAT tool that was forked from OmegaT version 1.4.5 in 2005. It works in a similar way to OmegaT and has developed its own features, but its projects are not compatible with OmegaT. [15]

Boltran

Latest update: 2010-10-12

Boltran is a web-based tool that mimics the workflow of an OmegaT project. It is built on the OmegaT source code and can export OmegaT projects. [16]

See also


DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation.

A translation memory (TM) is a database that stores "segments", which can be sentences, paragraphs or sentence-like units that have previously been translated, in order to aid human translators. The translation memory stores the source text and its corresponding translation in language pairs called “translation units”. Individual words are handled by terminology bases and are not within the domain of TM.

Computer-aided translation (CAT), also referred to as computer-assisted translation or computer-aided human translation (CAHT), is the use of software to assist a human translator in the translation process. The translation is created by a human, and certain aspects of the process are facilitated by software; this is in contrast with machine translation (MT), in which the translation is created by a computer, optionally with some human intervention.

Okapi Framework

The Okapi Framework is a cross-platform and open-source set of components and applications that offer extensive support for localizing and translating documentation and software.

Wordfast

The name Wordfast is used for any number of translation memory products developed by Wordfast LLC. The original Wordfast product, now called Wordfast Classic, was developed by Yves Champollion in 1999 as a cheaper alternative to Trados, a well-known translation memory program. The current Wordfast products run on a variety of platforms but use largely compatible translation memory formats, and often also have similar workflows. The software is most popular with freelance translators, although some of the products are also suited for corporate environments.

Pootle

Pootle is an online translation management tool with a translation interface. It is written in the Python programming language using the Django framework and is free software originally developed and released by Translate.org.za in 2004. It was further developed as part of the WordForge project and the African Network for Localisation and is now maintained by Translate.org.za.

XLIFF is an XML-based bitext format created to standardize the way localizable data are passed between tools during a localization process, and to provide a common exchange format for CAT tools. The XLIFF Technical Committee (TC) first convened at OASIS in December 2001, but the first fully ratified version, XLIFF 1.2, appeared in February 2008. Its current specification is v2.1, released on 2018-02-13, which is backwards compatible with v2.0, released on 2014-08-05.

The Translate Toolkit is a localization and translation toolkit. It provides a set of tools for working with localization file formats and files that might need localization. The toolkit also provides an API on which to develop other localization tools.

Virtaal

Virtaal is a computer-assisted translation tool written in the Python programming language. It is free software developed and maintained by Translate.org.za.

Trados Studio is a computer-assisted translation software tool that offers a complete, centralized translation environment for editing, reviewing and managing translation projects and terminology. It can be used both offline, as a desktop tool, and online in the cloud. Trados Studio is part of the Trados product portfolio, a suite of intelligent translation products owned by RWS that enables freelance translators, language service providers (LSPs) and corporations to streamline processes and improve efficiencies while keeping costs down.

Open Language Tools is a Java project released by Sun Microsystems under the terms of Sun's CDDL.

openTMS is an acronym for Open Source Translation Management System.

Google Translator Toolkit was an online computer-assisted translation (CAT) tool: a web application designed to let translators edit the translations that Google Translate generated automatically, using its own and/or user-uploaded glossaries and translation memories. The toolkit was designed to let translators organize their work and use shared translations, glossaries and translation memories, and was compatible with Microsoft Word, HTML, and other formats.

Segmentation Rules eXchange or SRX is an XML-based standard that was maintained by the Localization Industry Standards Association until that organization became insolvent in 2011, and since then by the Globalization and Localization Association (GALA).

TranslateCAD is a computer-aided translation tool designed to extract translatable text from CAD drawings saved in the industry-standard DXF format, regardless of the CAD software used to create them, so that professional translators can translate the text as plain text in any of a number of available CAT tools.

Moses for Mere Mortals (MMM) is free, open-source software consisting of a set of scripts that automate the installation and operation of the Moses Open Source Translation System, a statistical machine translation system.

Swordfish Translation Editor is a computer-assisted translation tool.

The name MetaTexis is used for several software products developed by MetaTexis Software and Services. The main software products are MetaTexis for Word and the MetaTexis Server. MetaTexis for Word is a translation memory software, also called a Computer-assisted translation tool, that runs inside Microsoft Word. The MetaTexis Server is a server software for translation memories (TMs) and terminology databases (TDBs) that allows numerous translators to work with the same TMs and TDBs via LAN or Internet.

memoQ is a proprietary computer-assisted translation software suite which runs on Microsoft Windows operating systems. It is developed by the Hungarian software company memoQ Fordítástechnológiai Zrt., formerly Kilgray, a provider of translation management software established in 2004 and cited as one of the fastest-growing companies in the translation technology sector in 2012 and 2013. memoQ provides translation memory, terminology, machine translation integration and reference information management in desktop, client/server and web application environments.

MateCat

MateCat is a web-based computer-assisted translation (CAT) tool. MateCat is released as open source software under the Lesser General Public License (LGPL) from the Free Software Foundation.

References

  1. "OmegaT - multiplatform CAT tool / Code / [86775c] /Release/OmegaT-license.TXT".
  2. "Chapter 1. Installing and running OmegaT". omegat.sourceforge.io. Retrieved 2019-08-14.
  3. "Results of the June translation tools surveys".
  4. "Close Windows. Open Doors".
  5. OmegaT's "standard" and "latest" versions. https://sourceforge.net/projects/omegat/files/
  6. The latest source files are always available from the SourceForge code repository. https://archive.today/20120717155731/http://omegat.svn.sourceforge.net/viewvc/omegat/trunk/
  7. Open Document Format for Office Applications – ISO/IEC 26300:2006 format
  8. Okapi Framework – Text Extraction utility can create an OmegaT project folder tree
  9. po4a Archived 2006-06-22 at the Wayback Machine – A conversion utility to and from the Portable Object format, perl application packaged under Debian
  10. The OmegaT project and You Archived 2011-05-23 at the Wayback Machine
  11. "OmegaT, free memory translation tool". www.omegat.org. Archived from the original on 2008-05-09.
  12. DGT-OmegaT
  13. Benten
  14. Autshumato
  15. OmegaT+
  16. "Boltran". Archived from the original on 2022-01-01. Retrieved 2013-10-11.
