Moses for Mere Mortals

Moses for Mere Mortals (MMM) [1] is a free, open-source set of scripts that automates the installation and operation of the Moses Open Source Translation System, a statistical machine translation system.

MMM builds a translation chain prototype with Moses + IRSTLM + RandLM + MGIZA. [2] [3]

The first version of Moses for Mere Mortals was published in November 2009, and it has since been updated and tested on Ubuntu Linux distributions. MMM is available on GitHub. [1]

Overview

Its main aims are to:

Although the main focus is on Linux, two Windows add-ins help bridge the workflow from Windows to Linux and back again.

General features

[Figure: Overview of Moses for Mere Mortals]

Moses allows the training of corpora where every word is presented together with, for instance, its respective lemma and/or part of speech tag (“factored training”). The scripts do not cover this type of training.
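For readers unfamiliar with factored corpora, the following minimal sketch shows what such input can look like. It assumes the pipe-separated factor convention (surface form, lemma, part-of-speech tag); the function name `to_factored` and the sample tags are hypothetical and are not part of MMM, whose scripts do not handle this format.

```python
def to_factored(tokens):
    """Join (surface, lemma, pos) triples into pipe-delimited factored tokens."""
    return " ".join("|".join(factors) for factors in tokens)


# Hypothetical annotated sentence: each word carries its lemma and POS tag.
sentence = [
    ("houses", "house", "NNS"),
    ("are", "be", "VBP"),
    ("big", "big", "JJ"),
]

factored_line = to_factored(sentence)
print(factored_line)
```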

MMM consists of seven scripts for Linux, thoroughly tested with Ubuntu (12.04 and 14.04, 64-bit):

MMM comes with a 200,000-segment demonstration corpus. It is too small to do justice to the quality achievable with Moses, but it gives a realistic view of the relative duration of the steps involved and is useful for testing whether the installation was done correctly. Good results generally require a corpus with several million segments. Each parallel corpus consists of two strictly aligned UTF-8 files, one in the source language and the other in the target language. No grammar knowledge is required, though some language pairs give better results than others. In general, morphologically rich languages give worse results.
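The requirement of two strictly aligned UTF-8 files can be checked before training. The sketch below is an illustration only and is not one of the MMM scripts: the function name `check_parallel` and the demo file names are assumptions. It verifies that both files decode as UTF-8, have the same number of lines, and contain no empty segments.

```python
import os
import tempfile


def check_parallel(src_path, tgt_path):
    """Return the segment count if the two files form a well-aligned corpus.

    Raises UnicodeDecodeError if a file is not valid UTF-8, and ValueError
    if the line counts differ or a segment is empty on either side.
    """
    with open(src_path, encoding="utf-8") as src:
        src_lines = [line.rstrip("\n") for line in src]
    with open(tgt_path, encoding="utf-8") as tgt:
        tgt_lines = [line.rstrip("\n") for line in tgt]

    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line counts differ: {len(src_lines)} vs {len(tgt_lines)}"
        )
    for number, (s, t) in enumerate(zip(src_lines, tgt_lines), start=1):
        if not s.strip() or not t.strip():
            raise ValueError(f"empty segment at line {number}")
    return len(src_lines)


# Demo with two tiny hypothetical files (English source, Portuguese target).
with tempfile.TemporaryDirectory() as workdir:
    src_file = os.path.join(workdir, "demo.en")
    tgt_file = os.path.join(workdir, "demo.pt")
    with open(src_file, "w", encoding="utf-8") as f:
        f.write("Good morning.\nThank you.\n")
    with open(tgt_file, "w", encoding="utf-8") as f:
        f.write("Bom dia.\nObrigado.\n")
    count = check_parallel(src_file, tgt_file)

print(count)
```

A check like this only confirms the line-by-line structure; it cannot tell whether line N of one file is actually a translation of line N of the other.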

Add-ins

MMM also contains (for Windows and Linux):

MMM also contains the file Nonbreaking_prefix.pt, a list of abbreviations specific to the Portuguese language, based on English and German versions already available with the Moses package.
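A list of abbreviations of this kind is typically used to avoid treating the period after an abbreviation as a sentence boundary. The sketch below illustrates the idea under simplifying assumptions: it is not the Moses implementation, the function names are hypothetical, and the sample entries are invented for the example rather than copied from Nonbreaking_prefix.pt.

```python
def load_prefixes(lines):
    """Parse a prefix list: one entry per line, skipping blanks and '#' comments."""
    prefixes = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            prefixes.add(entry)
    return prefixes


def split_sentences(text, prefixes):
    """Split on '.', '!' or '?' unless the period follows a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "!", "?")):
            # A period after a listed abbreviation does not end the sentence.
            if token.endswith(".") and token[:-1] in prefixes:
                continue
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


# Hypothetical sample entries, in the one-abbreviation-per-line style.
sample_list = ["# titles", "Sr", "Dra", ""]
prefixes = load_prefixes(sample_list)
result = split_sentences("O Sr. Silva chegou. A Dra. Costa saiu.", prefixes)
print(result)
```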

Software features

Moses for Mere Mortals also has some original features:

References

  1. "moses-for-mere-mortals". GitHub. Retrieved 2014-11-28.
  2. "Welcome to Moses!". Retrieved 2012-01-29.
  3. "mosesdecoder". Retrieved 2012-01-29.