Automatic Language Translator

IBM's Automatic Language Translator was a machine translation system that converted Russian documents into English. It used an optical disc that stored 170,000 word-for-word and statement-for-statement translations and a custom computer to look them up at high speed. Built for the US Air Force's Foreign Technology Division, the AN/GSQ-16 (or XW-2), as it was known to the Air Force, was primarily used to convert Soviet technical documents for distribution to western scientists. The translator was installed in 1959, dramatically upgraded in 1964, and was eventually replaced by a mainframe running SYSTRAN in 1970.

History

Photoscopic store

The translator began in a June 1953 contract from the US Navy to the International Telemeter Corporation (ITC) of Los Angeles. This was not for a translation system, but a pure research and development contract for a high-performance photographic online storage medium consisting of small black rectangles embedded in a plastic disk. When the initial contract ran out, what was then the Rome Air Development Center (RADC) took up further funding in 1954 and onwards. [1]

The system was developed by Gilbert King, chief of engineering at ITC, along with a team that included Louis Ridenour. It evolved into a 16-inch plastic disk with data recorded as a series of microscopic black rectangles or clear spots. Only the outermost 4 inches of the disk were used for storage, which increased the linear speed of the portion being accessed. When the disk spun at 2,400 RPM it had an access speed of about 1 Mbit/sec. In total, the system stored 30 Mbits, making it the highest-density online storage system of its era. [1] [note 1]
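The figures quoted above can be related with some simple arithmetic. The following is only a back-of-the-envelope sketch: it assumes that "Mbit" means 10^6 bits and that the disk was read serially at the quoted rate, neither of which is stated in the sources, and the figures themselves may describe the later Mark II disk (see Notes).

```python
# Figures quoted in the text; the variable names are illustrative.
RPM = 2400                          # rotation speed
ACCESS_RATE_BITS_PER_S = 1_000_000  # "about 1 Mbit/sec" (assumed 10^6 bits)
CAPACITY_BITS = 30_000_000          # "30 Mbits" (assumed 10^6 bits each)

# Revolutions per second at the quoted rotation speed.
revs_per_second = RPM / 60

# Bits passing the reader during one revolution at the quoted rate.
bits_per_revolution = ACCESS_RATE_BITS_PER_S / revs_per_second

# Time to read the whole store once, assuming one serial read path.
full_read_seconds = CAPACITY_BITS / ACCESS_RATE_BITS_PER_S

# Capacity expressed in modern units (8-bit bytes, decimal megabytes).
capacity_megabytes = CAPACITY_BITS / 8 / 1_000_000

print(revs_per_second, bits_per_revolution, full_read_seconds, capacity_megabytes)
```

Under these assumptions the disk turns 40 times per second, about 25,000 bits pass the reader per revolution, a full pass over the store takes about 30 seconds, and the total capacity is roughly 3.75 megabytes.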

Mark I

In 1954 IBM gave an influential demonstration of machine translation, known today as the "Georgetown–IBM experiment". Run on an IBM 701 mainframe, the translation system knew only 250 words of Russian limited to the field of organic chemistry, and only six grammar rules for combining them. Nevertheless, the results were extremely promising, and widely reported in the press. [2]

At the time, most researchers in the nascent machine translation field felt that the major challenge to providing reasonable translations was building a large library, as storage devices of the era were both too small and too slow to be useful in this role. [3] King felt that the photoscopic store was a natural solution to the problem, and pitched the idea of an automated translation system based on the photostore to the Air Force. RADC proved interested, and provided a research grant in May 1956. At the same time, the Air Force also provided a grant to researchers at the University of Washington who were working on the problem of producing an optimal translation dictionary for the project.

King advocated a simple word-for-word approach to translations. He thought that the natural redundancies in language would allow even a poor translation to be understood, and that local context alone was enough to provide reasonable guesses when faced with ambiguous terms. He stated that "the success of the human in achieving a probability of .50 in anticipating the words in a sentence is largely due to his experience and the real meanings of the words already discovered." [4] In other words, simply translating the words alone would allow a human to effectively read a document, because they would be able to reason out the proper meaning from the context provided by earlier words.

In 1958 King moved to IBM's Thomas J. Watson Research Center, and continued development of the photostore-based translator. Over time, King changed the approach from a pure word-for-word translator to one that stored "stems and endings", which broke words into parts that could be combined back together to form complete words again. [4]
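The "stems and endings" idea can be illustrated with a small lookup routine. This is a minimal sketch, not the IBM design: the transliterated entries, the table layout, the mapping of Russian endings to English suffixes, and the function name `translate_word` are all hypothetical and chosen only to show how storing stems and endings separately lets a small table cover many inflected forms.

```python
# Toy tables (hypothetical, transliterated Russian).
STEMS = {"knig": "book", "stol": "table"}

# Each ending maps to an English suffix; "" means no change.
ENDINGS = {"": "", "a": "", "i": "s", "y": "s"}


def translate_word(word):
    """Split a word into stem + ending, trying the longest stem first."""
    for split in range(len(word), 0, -1):
        stem, ending = word[:split], word[split:]
        if stem in STEMS and ending in ENDINGS:
            return STEMS[stem] + ENDINGS[ending]
    return None  # no stem/ending combination found


for w in ["kniga", "knigi", "stol", "stoly", "sputnik"]:
    print(w, "->", translate_word(w))
```

Here two stems and four endings cover four inflected forms, while a full-form dictionary would need one entry per form. A real system would also have to handle stems that change under inflection and endings that are ambiguous, which this sketch ignores.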

The first machine, "Mark I", was demonstrated in July 1959 and consisted of a 65,000 word dictionary and a custom tube-based computer to do the lookups. [3] Texts were hand-copied onto punched cards using custom Cyrillic terminals, and then input into the machine for translation. The results were less than impressive, but were enough to suggest that a larger and faster machine would be a reasonable development. In the meantime, the Mark I was applied to translations of the Soviet newspaper Pravda. The results continued to be questionable, but King declared it a success, stating in Scientific American that the system was "...found, in an operational evaluation, to be quite useful by the Government." [3]

Mark II

On 4 October 1957 the USSR launched Sputnik 1, the first artificial satellite. This caused a wave of concern in the US, whose own Project Vanguard was caught flat-footed and then failed repeatedly in spectacular fashion. This embarrassing turn of events led to a huge investment in US science and technology, including the formation of DARPA, NASA and a variety of intelligence efforts that would attempt to avoid being surprised in this fashion again.

After a short period, the intelligence efforts centralized at the Wright-Patterson Air Force Base as the Foreign Technology Division (FTD, now known as the National Air and Space Intelligence Center), run by the Air Force with input from the DIA and other organizations. FTD was tasked with the translation of Soviet and other Warsaw Bloc technical and scientific journals so researchers in the "west" could keep up to date on developments behind the Iron Curtain. Most of these documents were publicly available, but FTD also made a number of one-off translations of other materials upon request.

Assuming there was a shortage of qualified translators, the FTD became extremely interested in King's efforts at IBM. Funding for an upgraded machine was soon forthcoming, and work began on a "Mark II" system based around a transistorized computer with a faster and higher-capacity 10-inch glass-based optical disc spinning at 2,400 RPM. Another addition was an optical character reader provided by a third party, which they hoped would eliminate the time-consuming process of copying the Russian text into machine-readable cards. [3]

In 1960 the Washington team also joined IBM, bringing their dictionary efforts with them. The dictionary continued to expand as additional storage was made available, reaching 170,000 words and terms by the time it was installed at the FTD. A major software update was also incorporated in the Mark II, which King referred to as "dictionary stuffing". Stuffing was an attempt to deal with the problems of ambiguous words by "stuffing" prefixes onto them from earlier words in the text. [3] These modified words would match with similarly stuffed words in the dictionary, reducing the number of false positives.
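The description of dictionary stuffing above suggests a lookup in which a word is first tried together with a prefix taken from the preceding word, and only then on its own. The following is a loose sketch of that idea under strong assumptions: the sources do not describe the actual encoding, so the "previous+current" key format, the toy transliterated entries, and the function name `translate` are all hypothetical.

```python
# Toy dictionary (hypothetical). Plain keys are ordinary entries;
# keys of the form "previous+current" are "stuffed" entries that
# only match when the word follows that particular earlier word.
DICTIONARY = {
    "za": "for",
    "ves": "whole",
    "mir": "world",     # default sense of an ambiguous word
    "za+mir": "peace",  # sense selected when preceded by "za"
}


def translate(words):
    """Translate word by word, preferring stuffed entries."""
    output = []
    previous = None
    for word in words:
        entry = None
        if previous is not None:
            # Try the stuffed form built from the earlier word first.
            entry = DICTIONARY.get(f"{previous}+{word}")
        if entry is None:
            # Fall back to the plain entry; mark unknown words.
            entry = DICTIONARY.get(word, f"[{word}]")
        output.append(entry)
        previous = word
    return output


print(translate(["za", "mir"]))
print(translate(["ves", "mir"]))
```

In this sketch the ambiguous word is resolved differently in the two inputs because only one of them has a matching stuffed entry. The trade-off is a larger dictionary, which is consistent with the text's remark that the dictionary grew as more storage became available.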

In 1962 King left IBM for Itek, a military contractor in the process of rapidly acquiring new technologies. Development at IBM continued, and the system went fully operational at FTD in February 1964. The system was demonstrated at the 1964 New York World's Fair. The version at the Fair included a 150,000 word dictionary, with about 1/3 of the words in phrases. About 3,500 of these were stored in core memory to improve performance, and an average speed of 20 words per minute was claimed. The results from the carefully selected input text were quite impressive. [5] After its return to the FTD, it was used continually until 1970, when it was replaced by a machine running SYSTRAN. [6]

ALPAC Report

In 1964 the United States Department of Defense commissioned the United States National Academy of Sciences (NAS) to prepare a report on the state of machine translation. The NAS formed the "Automatic Language Processing Advisory Committee", or ALPAC, and published their findings in 1966. The report, Language and Machines: Computers in Translation and Linguistics, was highly critical of the existing efforts. It demonstrated that the systems were no faster than human translations, and that the supposed lack of translators was in fact a surplus; as a result of supply and demand issues, human translation was relatively inexpensive – about $6 per 1,000 words. Worse, the FTD was slower as well; tests using physics papers as input demonstrated that the translator was "10 percent less accurate, 21 percent slower, and had a comprehension level 29 percent lower than when he used human translation." [7]

The ALPAC report was as influential as the Georgetown experiment had been a decade earlier; in the immediate aftermath of its publication, the US government suspended almost all funding for machine translation research. [8] Ongoing work at IBM and Itek had ended by 1966, leaving the field to the Europeans, who continued development of systems like SYSTRAN and Logos.

Notes

  1. These numbers for the early disk systems appear to be inaccurate – another document from the same author suggests that these figures are actually for the later version used on the Mark II translator.

References

Citations

  1. Hutchins, pg. 171
  2. John Hutchins, "The first public demonstration of machine translation: the Georgetown-IBM system, 7th January 1954" Archived 3 March 2016 at the Wayback Machine
  3. Hutchins, pg. 172
  4. King, 1956
  5. Hutchins, pg. 174
  6. Hutchins, pg. 175
  7. ALPAC, pg. 20
  8. John Hutchins, "ALPAC: the (in)famous report" Archived 6 October 2007 at the Wayback Machine

Bibliography

  • G. W. King, G. W. Brown and L. N. Ridenour, "Photographic Techniques for Information Storage", Proceedings of the IRE, Volume 41 Issue 10 (October 1953), pp. 1421–1428
  • G. W. King, "Stochastic Methods of Mechanical Translation", Mechanical Translation, Volume 3 Issue 2 (1956), pp. 38–39
  • J. L. Craft, E. H. Goldman, W. B. Strohm, "A Table Look-up Machine for Processing of Natural Languages", IBM Journal, July 1961, pp. 192–203
  • Automatic Language Processing Advisory Committee, "Language and Machines: Computers in Translation and Linguistics", National Research Council, 1966 (widely known as the "ALPAC Report")
  • John Hutchins (ed), "Gilbert W. King and the IBM-USAF Translator", Early Years in Machine Translation, John Benjamins, 2000, ISBN 90-272-4586-X (RADC-TDR-62-105)
  • Charles Bourne and Trudi Bellardo Hahn, "A History of Online Information Services, 1963–1976", MIT Press, 2003, ISBN 0-262-02538-8