Postediting

Post-editing (or postediting) is the process whereby humans amend machine-generated translation to achieve an acceptable final product. A person who post-edits is called a post-editor. The concept of post-editing is linked to that of pre-editing. When a text is translated via machine translation, the best results may be obtained by pre-editing the source text, for example by applying the principles of controlled language, and then post-editing the machine output. Post-editing is distinct from editing, which refers to the process of improving human-generated text (a process often known as revision in the field of translation). Post-edited text may afterwards be revised to ensure the quality of the language choices, or proofread to correct simple mistakes.

Post-editing involves the correction of machine translation output to ensure that it meets a level of quality negotiated in advance between the client and the post-editor. Light post-editing aims at making the output simply understandable; full post-editing aims at making it stylistically appropriate as well. With advances in machine translation, full post-editing is becoming an alternative to manual translation. Practically all computer-assisted translation (CAT) tools now support post-editing of machine-translated output.

Post-editing and machine translation

Machine translation left the labs and began to be used for its actual purpose in the late seventies at some big institutions such as the European Commission and the Pan-American Health Organization, and later at some corporations such as Caterpillar and General Motors. The first studies on post-editing appeared in the eighties, linked to those implementations. [1] [2] To develop appropriate guidelines and training, members of the Association for Machine Translation in the Americas (AMTA) and the European Association for Machine Translation (EAMT) set up a Post-editing Special Interest Group in 1999. [3] [4]

After the nineties, advances in computer power and connectivity sped up machine translation development and allowed for its deployment through the web browser, including as a free, useful adjunct to the main search engines (Google Translate, Bing Translator, Yahoo! Babel Fish). A wider acceptance of less-than-perfect machine translation was also accompanied by a wider acceptance of post-editing. With the demand for localisation of goods and services growing at a pace that could not be met by human translation, even when assisted by translation memory and other translation management technologies, industry bodies such as the Translation Automation Users Society (TAUS) expect machine translation and post-editing to play a much bigger role within the next few years. [5]

The use of machine translation sometimes calls for pre-editing of the source text.

Light and full post-editing

For many years, no widely accepted, standardized post-editing guidelines existed; [6] however, in 2017, ISO standard 18587:2017: Translation services — Post-editing of machine translation output — Requirements was published. Studies in the eighties distinguished between degrees of post-editing which, in the context of the European Commission Translation Service, were first defined as conventional and rapid [7] or full and rapid. [8] Light and full post-editing seems the wording most used today.

Light post-editing implies minimal intervention by the post-editor, with the aim of ensuring that quality is "good enough" or "understandable"; [6] the expectation is that the client will use the text for inbound purposes only, often when it is needed urgently or has a short lifespan.

Full post-editing involves a greater level of intervention to achieve a degree of quality to be negotiated between client and post-editor; the expectation is that the outcome will be a text that is not only understandable but presented in some stylistically appropriate way, so it can be used for assimilation and even for dissemination, for inbound and for outbound purposes. The quality is expected to be publishable and equivalent to that of a human translation. [6]

The assumption, however, has been that it takes less effort for translators to work directly from the source text than to post-edit the machine-generated version. With advances in machine translation, this may be changing. For some language pairs and some tasks, and with engines that have been customised with good-quality, domain-specific data, some clients are already asking translators to post-edit instead of translating from scratch, in the belief that they will attain similar quality at a lower cost.

The light/full classification, developed in the nineties when machine translation still came on a CD-ROM, may not suit advances in machine translation at the light post-editing end either. For some language pairs and some tasks, particularly if the source has been pre-edited, raw machine output may be good enough for gisting purposes without requiring subsequent human intervention.

Post-editing efficiency

Post-editing is used when raw machine translation is not good enough and human translation is not required. Industry guidance recommends post-editing when it can at least double the productivity of manual translation, or even quadruple it in the case of light post-editing (1000 words per hour vs. 250 wph). [9] [10]

However, post-editing efficiency is difficult to predict. Various studies from both academia and industry have claimed that post-editing is generally faster than translating from scratch, regardless of language pair or translators' experience. [11] There is, however, no agreement about how much time, if any, can be saved through post-editing in practice: while industry reports time savings of around 40%, [12] some academic studies suggest that time savings under actual working conditions are more likely to be between 0 and 20%, or that they may depend on the terminological proximity between the source and target languages. [13] Professionals have also reported negative productivity gains, where corrections take more time than translating from scratch. [14] [15]
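The figures in this section are quoted in two different units: productivity multipliers (double, fourfold) and percentage time savings (40%, 0–20%). The following is a minimal sketch of how one converts into the other, assuming the same volume of text is processed in both cases. The function names are illustrative and do not come from any of the cited studies.

```python
def time_saving(speedup: float) -> float:
    """Fraction of working time saved when throughput is multiplied by `speedup`."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def speedup(time_saving_fraction: float) -> float:
    """Throughput multiplier implied by saving a given fraction of working time."""
    if not 0.0 <= time_saving_fraction < 1.0:
        raise ValueError("time saving must be in [0, 1)")
    return 1.0 / (1.0 - time_saving_fraction)


if __name__ == "__main__":
    # Industry thresholds quoted above: 2x, and 4x (1000 wph vs. 250 wph).
    for factor in (2.0, 4.0):
        print(f"{factor:.0f}x throughput -> {time_saving(factor):.0%} time saved")
    # Time savings quoted above: about 40% (industry), up to 20% (academic).
    for saving in (0.20, 0.40):
        print(f"{saving:.0%} time saved -> {speedup(saving):.2f}x throughput")
```

Under this conversion, doubling productivity corresponds to a 50% time saving, whereas a 40% time saving corresponds to a multiplier of roughly 1.67, and a 20% saving to 1.25. On these numbers, the reported savings fall short of the "at least double" threshold.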

Post-editing and the language industry

After some thirty years, post-editing is still "a nascent profession". [16] The right profile for a post-editor has not yet been fully studied. Post-editing overlaps with translating and editing, but only partially. Most think the ideal post-editor is a translator keen to be trained in the specific skills required, but some think a bilingual person without a background in translation may be easier to train. [17] Little is known about who the actual post-editors are: whether they tend to be professional translators, whether they work mostly as in-house employees or are self-employed, and under what conditions. Many professional translators dislike post-editing, among other reasons because it tends to be paid at lower rates than conventional translation; the International Association of Professional Translators and Interpreters (IAPTI) has been particularly vocal about this. [18]

The quality of machine translation output for post-editing is higher, and the output therefore requires less post-editing effort, when the machine translation is provided by a neural, vertical or customised machine translation engine. Translation efficiency gains can be measured by tracking the time linguists need to correct the machine translation within a single translation environment, such as XTM Cloud, [19] a translation management system and computer-assisted translation tool, where post-editing times and linguistic quality assessment results for the post-edited texts can be compared.
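As a rough illustration of the time-tracking approach described above, the sketch below computes words per hour from per-segment timing logs and compares post-editing against translation from scratch. The `Segment` record, the function name and the sample timings are all hypothetical. They do not reflect the data model of XTM Cloud or any other tool, and the numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One logged segment: source word count and time spent on it (hypothetical schema)."""
    source_words: int
    seconds: float


def words_per_hour(segments: list[Segment]) -> float:
    """Throughput over a set of logged segments, in source words per hour."""
    total_words = sum(s.source_words for s in segments)
    total_seconds = sum(s.seconds for s in segments)
    if total_seconds <= 0:
        raise ValueError("no working time logged")
    return total_words * 3600 / total_seconds


# Made-up timings for the same two segments handled in two ways.
post_edited = [Segment(20, 60.0), Segment(30, 90.0)]
from_scratch = [Segment(20, 120.0), Segment(30, 180.0)]

pe_rate = words_per_hour(post_edited)    # 50 words in 150 s
hs_rate = words_per_hour(from_scratch)   # 50 words in 300 s
print(f"post-editing: {pe_rate:.0f} wph, from scratch: {hs_rate:.0f} wph, "
      f"ratio: {pe_rate / hs_rate:.2f}")
```

A throughput comparison of this kind only means something if the quality assessment results for the two outputs are comparable, which is why the text pairs timing data with linguistic quality assessment.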

There are no clear figures on how big the post-editing pie is within the translation industry. A recent survey showed that 50% of language service providers offered it, but for 85% of them it accounted for less than 10% of their throughput. [20] Memsource, a web-based translation tool, claims that over 50 percent of translations between English and Spanish, French and other languages have been done on its platform by combining translation memory with machine translation. [21] Post-editing is also being done through translation crowdsourcing portals such as Unbabel, which by November 2014 claimed to have post-edited over 11 million words. [22]

Productivity and volume estimates are, in any case, moving targets: advances in machine translation are driven in significant part by post-edited text being fed back into the engines, so the more post-editing is done, the higher the quality of machine translation and the more widespread post-editing will become. [citation needed]

References

  1. Senez, Dorothy (12–13 November 1998). "Post-editing service for machine translation users at the European Commission". Translating and the Computer 20. Proceedings from ASLIB Conference. CiteSeerX 10.1.1.477.4105.
  2. Vasconcellos, Muriel; Léon, Marjorie (1985). "SPANAM and ENGSPA: Machine Translation at the Pan American Health Organization". Computational Linguistics. 11: 122–136. CiteSeerX 10.1.1.14.9212.
  3. Allen, Jeffrey H. (2003). "16. Post-editing". In Somers, H. L. (ed.). Computers and translation: a translator's guide. Amsterdam/Philadelphia: J. Benjamins. p. 312. ISBN 978-90-272-1640-3. OCLC 52938937.
  4. Allen, Jeffrey (2003). "Post-editing". In Somers, Harold (ed.). Computers and Translation: A translator's guide. Amsterdam/Philadelphia: Benjamins. p. 312. ISBN 978-90-272-1640-3.
  5. "TAUS website". YouTube. [dead link]
  6. Hu, Ke; Cadwell, Patrick (2016). "A Comparative Study of Post-editing Guidelines". Baltic Journal of Modern Computing. 4: 346–353.
  7. Loffler-Laurian, Anne-Marie (1986). "Post-édition rapide et post-édition conventionelle: deux modalités d'une activité spécifique". Multilingua – Journal of Cross-Cultural and Interlanguage Communication. 5 (2). Walter de Gruyter GmbH: 81–88. doi:10.1515/mult.1986.5.2.81. ISSN 0167-8507. S2CID 201700030.
  8. Wagner, Elisabeth (10–11 November 1983). "Rapid post-editing of Systran". Translating and the Computer 5. Proceedings from ASLIB Conference: 199–213.
  9. Boitet, Christian; Blanchon, Hervé (1994). "Promesses et problèmes de la 'TAO pour tous'. Après LIDIA-1, une première maquette". Langages. 28 (116): 20–47. doi:10.3406/lgge.1994.1692.
  10. "Fiche métier – Post-édition | Société française des traducteurs : syndicat professionnel (SFT)". www.sft.fr. Retrieved 16 August 2022.
  11. Green, Spence; Jeffrey Heer; Christopher D. Manning (2013). "The Efficacy of Human Post-Editing for Language Translation" (PDF). ACM Human Factors in Computing Systems.
  12. Plitt, Mirko; Masselot, Francois (2010). "A Productivity Test of Statistical Machine Translation Post-Editing in A Typical Localisation Context" (PDF). Prague Bulletin of Mathematical Linguistics. 93: 7–16. doi:10.2478/v10108-010-0010-x.
  13. Shah, Ritesh; Boitet, Christian; Bhattacharyya, Pushpak; Padmakumar, Mithun; Zilio, Leonardo; Kalitvianski, Ruslan; Nasiruddin, Mohammad; Tomokiyo, Mutsuko; Páez, Sandra Castellanos (2015). "Post-editing a chapter of a specialized textbook into 7 languages: importance of terminological proximity with English for productivity". Proceedings of the 12th International Conference on Natural Language Processing. Trivandrum, India: NLP Association of India: 325–332.
  14. Marcello Federico; Alessandro Cattelan; Marco Trombetti (2012). "Measuring user productivity in machine translation enhanced computer assisted translation" (PDF). Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas (AMTA), San Diego, CA, 28 October – 1 November.
  15. Läubli, Samuel; Mark Fishel; Gary Massey; Maureen Ehrensberger-Dow; Martin Volk (2013). "Assessing post-editing efficiency in a realistic translation environment" (PDF). Proceedings of the 2nd Workshop on Post-editing Technology and Practice. pp. 83–91.
  16. "TAUS website".
  17. Hutchins, John (1995). "Reflections on the History and present state of machine translation" (PDF).
  18. "IAPTI website".
  19. "XTM International official website".
  20. "Postediting in Practice. A TAUS Report" (PDF). March 2010. p. 13.
  21. "Memsource website".
  22. "Unbabel Launches A Human-Edited Machine Translation Service To Help Businesses Go Global, Localize Customer Support".

Further reading