Translation unit

In the field of translation, a translation unit is a segment of a text which the translator treats as a single cognitive unit for the purposes of establishing an equivalence. It may be a single word, a phrase, one or more sentences, or even a larger unit.

When a translator segments a text into translation units, larger units give a better chance of obtaining an idiomatic translation. This holds not only for unaided human translation but also when human translators use computer-assisted translation tools, such as translation memories, and when translations are produced by machine translation systems.

Perceptions on the concept of unit

Vinay and Darbelnet drew on Saussure's original concept of the linguistic sign when they began to discuss the idea of a single word as a translation unit. [1] According to Saussure, the sign is inherently arbitrary, so it can derive meaning only from its contrast with other signs in the same system.

However, the Russian scholar Leonid Barkhudarov [2] stated that in poetry, for instance, a translation unit can take the form of a complete text. This seems to relate to his conception of a translation unit as the smallest unit in the source language that has an equivalent in the target language; when its parts are taken individually, they become untranslatable. Such units can be as small as phonemes or morphemes, or as large as entire texts.

Susan Bassnett extended Barkhudarov's view of poetry to prose, adding that in this type of translation the text is the prime unit and that sentence-by-sentence translation could cause the loss of important structural features.

The Swiss linguist Werner Koller connected Barkhudarov's idea of unit size to the difference between the two languages involved, stating that the more different or unrelated the languages are, the larger the unit will be. [3]

A final view of the unit came from the linguist Eugene Nida. For him, translation units tend to be small groups of language that build up into sentences, forming what he called "meaningful mouthfuls of language".

Points of view towards translation units

Process-oriented POV

According to this point of view, a translation unit is a stretch of text on which the translator's attention is focused so that it can be represented as a whole in the target language. A relevant method here is the think-aloud protocol, supported by the German linguist Wolfgang Lörscher, in which units are isolated using the self-reports of translating subjects. The size of the unit also depends on the translator's experience: language learners take the word as a translation unit, whereas experienced translators isolate and translate units of meaning in the form of phrases, clauses or sentences.

Keylogging [4] and eye-tracking [5] technologies were introduced into Translation Process Research in 1996 and 2005, respectively. These more advanced and non-invasive research methods made possible a finer-grained assessment of translation units as loops of (source or target text) reading and target text typing. Loops of translation units are thought to be the basic units by which translations are produced. Thus, Malmkjaer [6], for instance, defines process-oriented translation units as a “stretch of the source text that the translator keeps in mind at any one time, in order to produce translation equivalents in the text he or she is creating” (p. 286). Records of keystrokes and eye movements make it possible to investigate these mental constructs through their physical (observable) behavioral traces in the translation process data. Empirical Translation Process Research has deployed numerous theories to explain and model the behavioral traces of these assumed mental units.
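As a rough illustration of how typing activity in a keystroke log might be divided into candidate units, the sketch below splits a log into typing bursts wherever the translator pauses. It is a minimal sketch under stated assumptions, not the method of any particular study or tool: the `Keystroke` record, the `segment_by_pause` function and the 1000 ms pause threshold are all hypothetical, and real process data would also include reading behavior, deletions and cursor movements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Keystroke:
    """One logged key press (hypothetical record format)."""
    time_ms: int  # timestamp of the key press, in milliseconds
    char: str     # character typed into the target text


def segment_by_pause(log: List[Keystroke], pause_ms: int = 1000) -> List[str]:
    """Split a keystroke log into typing bursts separated by pauses.

    A new burst starts whenever the gap between two consecutive
    keystrokes is at least pause_ms. The threshold is an assumption
    chosen for illustration only.
    """
    bursts: List[str] = []
    current: List[str] = []
    previous_time: Optional[int] = None
    for key in log:
        long_pause = (
            previous_time is not None
            and key.time_ms - previous_time >= pause_ms
        )
        if long_pause:
            # Close the burst that ended before the pause.
            bursts.append("".join(current))
            current = []
        current.append(key.char)
        previous_time = key.time_ms
    if current:
        bursts.append("".join(current))
    return bursts


# Invented example log: "the " is typed fluently, then a pause, then "cat".
example_log = [
    Keystroke(0, "t"), Keystroke(120, "h"), Keystroke(250, "e"),
    Keystroke(400, " "), Keystroke(2600, "c"), Keystroke(2710, "a"),
    Keystroke(2850, "t"),
]
print(segment_by_pause(example_log))  # ['the ', 'cat']
```

Each burst in the result is only a candidate for the typing phase of a unit; linking it to the preceding reading of source or target text would require eye-movement data recorded alongside the keystrokes.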

Product-oriented POV

Here, the target-text unit can be mapped onto an equivalent source-text unit. Gideon Toury reported a case study on this matter in which 27 student-produced English-Hebrew translations were mapped onto a source text. The less experienced students had larger numbers of small units, at word and morpheme level, in their translations, while one student with translation experience had approximately half as many units, mostly at phrase or clause level.

References

  1. "Dr. Shadia Y. Banjar: Lecture Notes: The unit of translation". 2009-11-05.
  2. Barkhudarov, Leonid (1969). Urovni yazykovoy iyerarkhii i perevod (Levels of language hierarchy and translation). In: Tetradi perevodchika (The Translator's Notebooks). pp. 3–12.
  3. Koller, Werner (1992). Einführung in die Übersetzungswissenschaft (Introduction to Translation Studies). Heidelberg: Quelle & Meyer.
  4. Jakobsen, Arnt L., and Lasse Schou (1999). "Translog Documentation, Version 1.0". In Gyde Hansen (Ed.), Probing the Process in Translation: Methods and Results (pp. 1–36). Copenhagen: Samfundslitteratur.
  5. Hvelplund (2017). Eye Tracking in Translation Process Research. In John W. Schwieter and Aline Ferreira (Eds.), The Handbook of Translation and Cognition. Wiley. https://doi.org/10.1002/9781119241485.ch14
  6. Malmkjaer (1998). Unit of Translation. In M. Baker (Ed.), Routledge Encyclopedia of Translation Studies (pp. 286–88). London: Routledge.