History of machine translation

Machine translation is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another.

Research into machine translation began in earnest in the 1950s, although the idea can be traced back as far as the 17th century. The Georgetown experiment of 1954, which involved the successful fully automatic translation of more than sixty Russian sentences into English, was one of the earliest recorded projects. [1] [2] Researchers of the Georgetown experiment asserted their belief that machine translation would be a solved problem within a few years. [3] Similar experiments were performed in the Soviet Union shortly afterwards. [4] The success of the Georgetown experiment ushered in an era of significant funding for machine translation research in the United States. Progress, however, was much slower than expected; in 1966, the ALPAC report found that ten years of research had not fulfilled the expectations raised by the Georgetown experiment, and funding was dramatically reduced.[ citation needed ]

Starting in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation.

Although no system yet achieves "fully automatic high-quality translation of unrestricted text", [5] [6] [7] many programs are now available that can provide useful output within strict constraints. Several of these programs are available online, such as Google Translate and the SYSTRAN system that powered AltaVista's Babel Fish (which was replaced by Microsoft's Bing Translator in May 2012).

The beginning

The origins of machine translation can be traced back to the work of Al-Kindi, a 9th-century Arabic cryptographer who developed techniques for systemic language translation, including cryptanalysis, frequency analysis, and probability and statistics, which are used in modern machine translation. [8] The idea of machine translation later appeared in the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol. [9]

In the mid-1930s the first patents for "translating machines" were applied for by Georges Artsrouni, for an automatic bilingual dictionary using paper tape. The Russian Peter Troyanskii submitted a more detailed proposal [10] [11] that included both a bilingual dictionary and a method for dealing with grammatical roles between languages, based on the grammatical system of Esperanto. The system was separated into three stages: in the first, a native-speaking editor in the source language organized the words into their logical forms and syntactic functions; in the second, the machine "translated" these forms into the target language; and in the third, a native-speaking editor in the target language normalized the output. Troyanskii's proposal remained unknown until the late 1950s, by which time computers were well known and widely used.

The early years

The first set of proposals for computer-based machine translation was presented in 1949 by Warren Weaver, a researcher at the Rockefeller Foundation, in his "Translation" memorandum. [12] These proposals were based on information theory, successes in code breaking during the Second World War, and theories about the universal principles underlying natural language.

A few years after Weaver submitted his proposals, research began in earnest at many universities in the United States. On 7 January 1954 the Georgetown–IBM experiment was held in New York at the head office of IBM. This was the first public demonstration of a machine translation system. The demonstration was widely reported in the newspapers and garnered public interest. The system itself, however, was no more than a "toy" system. It had only 250 words and translated 49 carefully selected Russian sentences into English – mainly in the field of chemistry. Nevertheless, it encouraged the idea that machine translation was imminent and stimulated the financing of the research, not only in the US but worldwide. [3]

Early systems used large bilingual dictionaries and hand-coded rules for fixing the word order in the final output, an approach that was eventually considered too restrictive; developments in linguistics at the time, such as generative linguistics and transformational grammar, were exploited in attempts to improve the quality of translations. During this period operational systems were installed. The United States Air Force used a system produced by IBM and Washington University in St. Louis, while the Atomic Energy Commission and Euratom, in Italy, used a system developed at Georgetown University. While the quality of the output was poor, it met many of the customers' needs, particularly in terms of speed.[ citation needed ]

At the end of the 1950s, Yehoshua Bar-Hillel was asked by the US government to look into machine translation and to assess the possibility of fully automatic high-quality translation by machines. Bar-Hillel described the problem of semantic ambiguity, or double meaning, as illustrated in the following sentence:

Little John was looking for his toy box. Finally he found it. The box was in the pen.

The word pen here may have two meanings: an instrument used for writing in ink, or an enclosure of some kind (in Bar-Hillel's example, a child's playpen). To a human the intended meaning is obvious, but Bar-Hillel claimed that without a "universal encyclopedia" a machine would never be able to deal with this problem. At the time, this type of semantic ambiguity could only be avoided by writing source texts for machine translation in a controlled language that uses a vocabulary in which each word has exactly one meaning.[ citation needed ]

The 1960s, the ALPAC report and the seventies

Research in the 1960s in both the Soviet Union and the United States concentrated mainly on the Russian–English language pair. The objects of translation were chiefly scientific and technical documents, such as articles from scientific journals. The rough translations produced were sufficient to get a basic understanding of the articles. If an article discussed a subject deemed to be of interest, it was sent to a human translator for a complete translation; if not, it was discarded.

A great blow came to machine translation research in 1966 with the publication of the ALPAC report. The report was commissioned by the US government and delivered by ALPAC, the Automatic Language Processing Advisory Committee, a group of seven scientists convened in 1964 amid concern that little progress was being made despite significant expenditure. The report concluded that machine translation was more expensive, less accurate and slower than human translation, and that, despite the expenditure, it was not likely to reach the quality of a human translator in the near future.

The report recommended, however, that tools be developed to aid translators – automatic dictionaries, for example – and that some research in computational linguistics should continue to be supported.

The publication of the report had a profound impact on machine translation research in the United States and, to a lesser extent, in the Soviet Union and the United Kingdom. Research, at least in the US, was almost completely abandoned for over a decade. In Canada, France and Germany, however, research continued. In the US the main exceptions were the founders of SYSTRAN (Peter Toma) and Logos (Bernard Scott), who established their companies in 1968 and 1970 respectively and served the US Department of Defense. In 1970 the SYSTRAN system was installed for the United States Air Force, and it was subsequently adopted by the Commission of the European Communities in 1976. The METEO System, developed at the Université de Montréal, was installed in Canada in 1977 to translate weather forecasts from English to French, and was translating close to 80,000 words per day, or 30 million words per year, until it was replaced by a competitor's system on 30 September 2001. [13]

While research in the 1960s concentrated on limited language pairs and input, demand in the 1970s was for low-cost systems that could translate a range of technical and commercial documents. This demand was spurred by increasing globalisation and the growing demand for translation in Canada, Europe, and Japan.[ citation needed ]

The 1980s and early 1990s

By the 1980s, both the diversity and the number of installed systems for machine translation had increased. A number of systems relying on mainframe technology were in use, such as SYSTRAN, Logos, Ariane-G5, and Metal.[ citation needed ]

As a result of the improved availability of microcomputers, there was a market for lower-end machine translation systems. Many companies took advantage of this in Europe, Japan, and the USA. Systems were also brought onto the market in China, Eastern Europe, Korea, and the Soviet Union.[ citation needed ]

During the 1980s there was a great deal of machine translation activity, especially in Japan. With the fifth-generation computer project, Japan intended to leap over its competition in computer hardware and software, and one project in which many large Japanese electronics firms became involved was creating software for translating to and from English (Fujitsu, Toshiba, NTT, Brother, Catena, Matsushita, Mitsubishi, Sharp, Sanyo, Hitachi, NEC, Panasonic, Kodensha, Nova, Oki).[ citation needed ]

Research during the 1980s typically relied on translation through some variety of intermediary linguistic representation involving morphological, syntactic, and semantic analysis.[ citation needed ]

At the end of the 1980s there was a surge of novel methods for machine translation. One system, developed at IBM, was based on statistical methods. Makoto Nagao and his group used methods based on large numbers of translation examples, a technique now termed example-based machine translation. [14] [15] A defining feature of both approaches was the neglect of syntactic and semantic rules and a reliance instead on the manipulation of large text corpora.
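
In broad outline, the statistical approach treats translation as a search for the most probable target-language sentence given the source sentence. A common way to express the idea, sketched here in its generic noisy-channel form rather than as any particular system's model, is

    \hat{e} = \arg\max_e P(e \mid f) = \arg\max_e P(f \mid e)\, P(e)

where f is the source sentence, P(e) is a language model estimated from monolingual target-language text, and P(f | e) is a translation model estimated from a parallel corpus.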

During the 1990s, encouraged by successes in speech recognition and speech synthesis, research began into speech translation with the development of the German Verbmobil project.

The Forward Area Language Converter (FALCon) system, a machine translation technology designed by the Army Research Laboratory, was fielded in 1997 to translate documents for soldiers in Bosnia. [16]

There was significant growth in the use of machine translation as a result of the advent of low-cost and more powerful computers. It was in the early 1990s that machine translation began to move away from large mainframe computers toward personal computers and workstations. Two companies that led the PC market for a time were Globalink and MicroTac; the two companies merged in December 1994, a move judged to be in the corporate interest of both. Intergraph and Systran also began to offer PC versions around this time. Machine translation also became available on the internet, through sites such as AltaVista's Babel Fish (using Systran technology) and Google Language Tools (also initially using Systran technology exclusively).

2000s

The field of machine translation saw major changes in the 2000s. A large amount of research was done into statistical machine translation and example-based machine translation. In the area of speech translation, research focused on moving from domain-limited systems to domain-unlimited translation systems. In research projects in Europe (such as TC-STAR) [17] and in the United States (STR-DUST and DARPA's Global Autonomous Language Exploitation program), solutions for automatically translating parliamentary speeches and broadcast news were developed. In these scenarios the domain of the content was no longer limited to any special area; rather, the speeches to be translated covered a variety of topics. The French–German project Quaero investigated the possibility of making use of machine translation for a multilingual internet. The project sought to translate not only webpages, but also videos and audio files on the internet.

2010s

The 2010s saw neural machine translation (NMT) methods largely replace statistical machine translation. The term neural machine translation was coined by Bahdanau et al. [18] and Sutskever et al., [19] who also published the first research on the topic in 2014. Neural networks needed only a fraction of the memory required by statistical models, and whole sentences could be modeled in an integrated manner. The first large-scale NMT system was launched by Baidu in 2015, followed by Google Neural Machine Translation (GNMT) in 2016. These were followed by other translation services such as DeepL Translator and by the adoption of NMT technology in older translation services such as Microsoft Translator.

Neural machine translation uses a single end-to-end neural network architecture known as sequence-to-sequence (seq2seq), which originally consisted of two recurrent neural networks (RNNs): an encoder RNN and a decoder RNN. The encoder RNN compresses the source sentence into a context vector, and the decoder RNN generates the target sentence from that vector. [20] Further advances in attention layers, transformer architectures, and back-propagation techniques have made NMT flexible enough to be adopted in most machine translation, summarization, and chatbot technologies. [21]
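
To make the encoder–decoder data flow concrete, the following minimal sketch illustrates the idea in plain NumPy. It is only an illustration: the weights are random rather than learned, the five-word vocabularies are invented, and the output is therefore meaningless until the parameters are trained on a parallel corpus.

    # A minimal, untrained seq2seq sketch: random weights, invented vocabularies.
    # It only illustrates the encoder/decoder data flow, not a usable translator.
    import numpy as np

    rng = np.random.default_rng(0)

    SRC_VOCAB = ["<s>", "</s>", "la", "maison", "bleue"]   # hypothetical source words
    TGT_VOCAB = ["<s>", "</s>", "the", "blue", "house"]    # hypothetical target words
    H = 16                                                 # hidden-state size

    def one_hot(index, size):
        v = np.zeros(size)
        v[index] = 1.0
        return v

    # Random matrices standing in for weights that would normally be learned.
    W_enc_x = rng.normal(scale=0.1, size=(H, len(SRC_VOCAB)))
    W_enc_h = rng.normal(scale=0.1, size=(H, H))
    W_dec_x = rng.normal(scale=0.1, size=(H, len(TGT_VOCAB)))
    W_dec_h = rng.normal(scale=0.1, size=(H, H))
    W_out = rng.normal(scale=0.1, size=(len(TGT_VOCAB), H))

    def encode(source_tokens):
        """Encoder RNN: fold the whole source sentence into one context vector."""
        h = np.zeros(H)
        for tok in source_tokens:
            x = one_hot(SRC_VOCAB.index(tok), len(SRC_VOCAB))
            h = np.tanh(W_enc_x @ x + W_enc_h @ h)
        return h

    def decode(context, max_len=10):
        """Decoder RNN: generate target tokens greedily from the context vector."""
        h, tok, output = context, "<s>", []
        for _ in range(max_len):
            x = one_hot(TGT_VOCAB.index(tok), len(TGT_VOCAB))
            h = np.tanh(W_dec_x @ x + W_dec_h @ h)
            tok = TGT_VOCAB[int(np.argmax(W_out @ h))]
            if tok == "</s>":
                break
            output.append(tok)
        return output

    print(decode(encode(["la", "maison", "bleue"])))  # gibberish until trained

Real systems replace the plain tanh recurrence with LSTM or GRU cells, train the weights on parallel corpora, and add an attention mechanism so that the decoder can consult every encoder state instead of a single context vector.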

Notes

  1. Nye, Mary Jo (2016). "Speaking in Tongues: Science's centuries-long hunt for a common language". Distillations. 2 (1): 40–43. Retrieved 22 March 2018.
  2. Gordin, Michael D. (2015). Scientific Babel: How Science Was Done Before and After Global English. Chicago, Illinois: University of Chicago Press. ISBN   978-0-226-00029-9.[ page needed ]
  3. Plumb, Robert K. (8 January 1954). "Russian Is Turned Into English By a Fast Electronic Translator". New York Times.
  4. Madsen, Mathias Winther (23 December 2009). The Limits of Machine Translation (Thesis). University of Copenhagen. p. 11.
  5. Melby, Alan K. (1995). The Possibility of Language. Amsterdam: J. Benjamins. pp. 27–41. ISBN   978-90-272-1614-4.
  6. Wooten, Adam (14 February 2006). "A Simple Model Outlining Translation Technology". T&I Business. Archived from the original on 16 July 2012.
  7. "Appendix III of 'The present status of automatic translation of languages'" (PDF). Advances in Computers. 1960. pp. 158–163. Reprinted in Y.Bar-Hillel (1964). Language and information. Massachusetts: Addison-Wesley. pp. 174–179.
  8. DuPont, Quinn (January 2018). "The Cryptological Origins of Machine Translation: From al-Kindi to Weaver". Amodern (8).
  9. 浜口, 稔 (30 April 1993). 英仏普遍言語計画 (in Japanese). 工作舎. pp. 70–71. ISBN   978-4-87502-214-5. "It has become the accepted convention that any account of the early attempts to construct a universal script begins with the letter Descartes addressed to Mersenne in November 1629. However, when the many factors that first prompted interest in this problem are examined, the idea of some kind of common writing seems clearly to have been relatively familiar long before that. … Francis Bacon, in The Advancement of Learning published in 1605, had stated that such a system of real characters would be useful." Translated from
     Knowlson, James (1975). Universal Language Schemes in England and France 1600–1800. ISBN   978-0-8020-5296-4.
  10. 別所, 照彦; 棚橋, 善照 (15 October 1960). "自動翻訳". In 玉木, 英彦; 喜安, 善市 (eds.). 自動翻訳 デ・ユ・パノフ著 (in Japanese) (1 ed.). Tokyo: (株)みすず書房. pp. 10–11. "The first attempt at mechanizing translation to some degree was probably made by P. P. Troyanskii in 1933. He proposed building 'a machine that selects words and prints them when translating from one language into another language, or into several languages simultaneously.' Troyanskii obtained a patent for this invention, but it could not be successfully implemented at the time."
  11. 別所, 照彦; 沢辺, 弘 (25 February 1964). 翻訳機械 (文庫クセジュ 現代知識の焦点) (in Japanese) (1 ed.). Tokyo: (株)白水社. p. 39. "The invention of the Russian Smirnov-Troyanskii, patented in Moscow in 1933, appeared to make it possible to translate several languages simultaneously and to transmit them over long distances." Translated from
     Delaveney, Émile. La machine à traduire (Collection Que sais-je? No. 834) (in French). Presses Universitaires de France.
  12. "Weaver memorandum". March 1949. Archived from the original on 5 October 2006.
  13. "PROCUREMENT PROCESS". Canadian International Trade Tribunal. 30 July 2002. Archived from the original on 6 July 2011. Retrieved 10 February 2007.
  14. Nagao, Makoto (October 1984). "A framework of a mechanical translation between Japanese and English by analogy principle". Proc. of the International NATO Symposium on Artificial and Human Intelligence. North-Holland. pp. 173–180. ISBN   978-0-444-86545-8.
  15. "the Association for Computational Linguistics – 2003 ACL Lifetime Achievement Award". Association for Computational Linguistics. Archived from the original on 12 June 2010. Retrieved 10 March 2010.
  16. Weisgerber, John; Yang, Jin; Fisher, Pete (2000). "Pacific Rim Portable Translator". Envisioning Machine Translation in the Information Future. Lecture Notes in Computer Science. Vol. 1934. pp. 196–201. doi:10.1007/3-540-39965-8_21. ISBN   978-3-540-41117-8. S2CID   36571004.
  17. "TC-Star" . Retrieved 25 October 2010.
  18. Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation". Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics: 1724–1734. arXiv: 1406.1078 . doi:10.3115/v1/d14-1179. S2CID   5590763.
  19. Tachioka, Yuuki; Watanabe, Shinji; Le Roux, Jonathan; Hershey, John R (December 2014). "Sequence discriminative training for low-rank deep neural networks". 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE. pp. 572–576. doi:10.1109/globalsip.2014.7032182. ISBN   978-1-4799-7088-9. S2CID   767028.
  20. De-Yu, Chao (4 June 2021). "Machine Translation". Medium. Retrieved 9 December 2022.
  21. "What is Neural Machine Translation & How does it work?". TranslateFX. Retrieved 9 December 2022.
