AN/GSQ-16 Automatic Language Translator

IBM's Automatic Language Translator was a machine translation system that converted Russian documents into English. It used an optical disc that stored 170,000 word-for-word and statement-for-statement translations and a custom computer to look them up at high speed. Built for the US Air Force's Foreign Technology Division, the AN/GSQ-16 (or XW-2), as it was known to the Air Force, was primarily used to convert Soviet technical documents for distribution to western scientists. The translator was installed in 1959, dramatically upgraded in 1964, and was eventually replaced by a mainframe running SYSTRAN in 1970.

History

Photoscopic store

The translator began with a June 1953 contract from the US Navy to the International Telemeter Corporation (ITC) of Los Angeles. This was not for a translation system, but a pure research and development contract for a high-performance photographic online storage medium consisting of small black rectangles embedded in a plastic disk. When the initial contract ran out, what was then the Rome Air Development Center (RADC) took over funding from 1954 onwards. [1]

The system was developed by Gilbert King, chief of engineering at ITC, along with a team that included Louis Ridenour. It evolved into a 16-inch plastic disk with data recorded as a series of microscopic black rectangles or clear spots. Only the outermost 4 inches of the disk were used for storage, which increased the linear speed of the portion being accessed. When the disk spun at 2,400 RPM it had an access speed of about 1 Mbit/s. In total, the system stored 30 Mbits, making it the highest-density online system of its era. [1] [note 1]
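The quoted figures imply a few timing numbers that can be worked out directly. The following is a minimal sketch that uses only the values stated above (2,400 RPM, about 1 Mbit/s, 30 Mbits); the function names are illustrative, and the note on these figures applies to the results as well.

```python
# Back-of-the-envelope timing for the photoscopic store, using only the
# figures quoted in the text. Function names are illustrative assumptions.

RPM = 2_400
TRANSFER_RATE_BITS_PER_S = 1_000_000  # "about 1 Mbit/s"
CAPACITY_BITS = 30_000_000            # "30 Mbits"


def revolution_time_ms(rpm: float) -> float:
    """Time for one full revolution of the disk, in milliseconds."""
    return 60_000 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a given spot to come around: half a revolution."""
    return revolution_time_ms(rpm) / 2


def full_scan_time_s(capacity_bits: float, rate_bits_per_s: float) -> float:
    """Time to stream the whole store once at the quoted transfer rate."""
    return capacity_bits / rate_bits_per_s


if __name__ == "__main__":
    print(revolution_time_ms(RPM))             # 25.0
    print(average_rotational_latency_ms(RPM))  # 12.5
    print(full_scan_time_s(CAPACITY_BITS, TRANSFER_RATE_BITS_PER_S))  # 30.0
```

On these numbers, one revolution takes 25 ms, so a lookup waits about 12.5 ms on average for the right part of the disk to arrive, and streaming the entire store once would take roughly 30 seconds.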

Mark I

In 1954 IBM gave an influential demonstration of machine translation, known today as the "Georgetown–IBM experiment". Run on an IBM 701 mainframe, the translation system knew only 250 words of Russian, limited to the field of organic chemistry, and only six grammar rules for combining them. Nevertheless, the results were extremely promising, and widely reported in the press. [2]

At the time, most researchers in the nascent machine translation field felt that the major challenge to providing reasonable translations was building a large library, as storage devices of the era were both too small and too slow to be useful in this role. [3] King felt that the photoscopic store was a natural solution to the problem, and pitched the idea of an automated translation system based on the photostore to the Air Force. RADC proved interested, and provided a research grant in May 1956. At the time, the Air Force also provided a grant to researchers at the University of Washington who were working on the problem of producing an optimal translation dictionary for the project.

King advocated a simple word-for-word approach to translations. He thought that the natural redundancies in language would allow even a poor translation to be understood, and that local context was alone enough to provide reasonable guesses when faced with ambiguous terms. He stated that "the success of the human in achieving a probability of .50 in anticipating the words in a sentence is largely due to his experience and the real meanings of the words already discovered." [4] In other words, simply translating the words alone would allow a human to effectively read a document, because they would be able to reason out the proper meaning from the context provided by earlier words.

In 1958 King moved to IBM's Thomas J. Watson Research Center, and continued development of the photostore-based translator. Over time, King changed the approach from a pure word-for-word translator to one that stored "stems and endings", which broke words into parts that could be combined back together to form complete words again. [4]
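The "stems and endings" idea can be illustrated with a small lookup routine. This is only a sketch under stated assumptions: the tiny `STEMS` and `ENDINGS` tables, the longest-stem-first rule, and the mapping of endings straight to English suffixes are all hypothetical simplifications, not the contents or logic of the actual IBM dictionary.

```python
# Toy "stems and endings" lookup. All table entries are hypothetical
# illustrations; the real dictionary contents are not given in the source.

STEMS = {
    "книг": "book",
    "машин": "machine",
    "перевод": "translation",
}

# Crude assumption: each Russian ending maps directly to an English suffix.
ENDINGS = {
    "": "",
    "а": "",
    "и": "s",
    "ы": "s",
    "ов": "s",
}


def split_word(word):
    """Return (stem, ending) using the longest known stem, or None."""
    for cut in range(len(word), 0, -1):
        stem, ending = word[:cut], word[cut:]
        if stem in STEMS and ending in ENDINGS:
            return stem, ending
    return None


def translate_word(word):
    """Translate one word by recombining its stem and ending translations."""
    parts = split_word(word)
    if parts is None:
        return word  # unknown words are passed through untranslated
    stem, ending = parts
    return STEMS[stem] + ENDINGS[ending]


if __name__ == "__main__":
    for w in ["книги", "перевод", "переводов", "спутник"]:
        print(w, "->", translate_word(w))
```

Storing stems and endings separately means one stem entry covers every inflected form of a word, which matters for a heavily inflected language like Russian when dictionary space is limited.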

The first machine, "Mark I", was demonstrated in July 1959 and consisted of a 65,000 word dictionary and a custom tube-based computer to do the lookups. [3] Texts were hand-copied onto punched cards using custom Cyrillic terminals, and then input into the machine for translation. The results were less than impressive, but were enough to suggest that a larger and faster machine would be a reasonable development. In the meantime, the Mark I was applied to translations of the Soviet newspaper Pravda. The results continued to be questionable, but King declared it a success, stating in Scientific American that the system was "...found, in an operational evaluation, to be quite useful by the Government." [3]

Mark II

On 4 October 1957 the USSR launched Sputnik 1, the first artificial satellite. This caused a wave of concern in the US, whose own Project Vanguard was caught flat-footed and then failed repeatedly in spectacular fashion. This embarrassing turn of events led to a huge investment in US science and technology, including the formation of DARPA and NASA, and to a variety of intelligence efforts intended to avoid being surprised in this fashion again.

After a short period, the intelligence efforts were centralized at Wright-Patterson Air Force Base as the Foreign Technology Division (FTD, now known as the National Air and Space Intelligence Center), run by the Air Force with input from the DIA and other organizations. FTD was tasked with translating Soviet and other Warsaw Pact technical and scientific journals so that researchers in the "west" could keep up to date on developments behind the Iron Curtain. Most of these documents were publicly available, but FTD also made a number of one-off translations of other materials upon request.

Believing there was a shortage of qualified translators, the FTD became extremely interested in King's efforts at IBM. Funding for an upgraded machine was soon forthcoming, and work began on a "Mark II" system based around a transistorized computer with a faster and higher-capacity 10-inch glass-based optical disc spinning at 2,400 RPM. Another addition was an optical character reader provided by a third party, which they hoped would eliminate the time-consuming process of copying the Russian text onto machine-readable cards. [3]

In 1960 the Washington team also joined IBM, bringing their dictionary efforts with them. The dictionary continued to expand as additional storage was made available, reaching 170,000 words and terms by the time it was installed at the FTD. A major software update was also incorporated in the Mark II, which King referred to as "dictionary stuffing". Stuffing was an attempt to deal with the problems of ambiguous words by "stuffing" prefixes onto them from earlier words in the text. [3] These modified words would match with similarly stuffed words in the dictionary, reducing the number of false positives.
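The source describes dictionary stuffing only in outline, so the following is a loose sketch of the general idea rather than a reconstruction of King's method. It assumes that a word is "stuffed" with the single preceding word, that the `+` separator and the sample entries are hypothetical, and that an unmatched stuffed key falls back to the plain entry.

```python
# Loose sketch of "dictionary stuffing": try a context-prefixed key first,
# then fall back to the plain word. Separator and entries are hypothetical.

SEP = "+"  # assumed marker joining a context word to the word it prefixes

DICTIONARY = {
    "весь": "whole",
    "за": "for",
    "мир": "peace/world",           # plain entry: ambiguous on its own
    "весь" + SEP + "мир": "world",  # stuffed entries: resolved by the
    "за" + SEP + "мир": "peace",    # preceding word
}


def translate(words):
    """Translate a list of words, preferring stuffed dictionary entries."""
    output = []
    previous = None
    for word in words:
        stuffed = f"{previous}{SEP}{word}" if previous is not None else None
        if stuffed in DICTIONARY:
            output.append(DICTIONARY[stuffed])
        else:
            # No stuffed match: use the plain entry, or pass the word through.
            output.append(DICTIONARY.get(word, word))
        previous = word
    return output


if __name__ == "__main__":
    print(translate(["весь", "мир"]))  # ['whole', 'world']
    print(translate(["за", "мир"]))    # ['for', 'peace']
    print(translate(["мир"]))          # ['peace/world']
```

Because the context is folded into the lookup key, disambiguation costs nothing beyond an ordinary dictionary match, which suits a machine built around fast table lookup.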

In 1962 King left IBM for Itek, a military contractor in the process of rapidly acquiring new technologies. Development at IBM continued, and the system went fully operational at FTD in February 1964. The system was demonstrated at the 1964 New York World's Fair. The version at the Fair included a 150,000 word dictionary, with about 1/3 of the words in phrases. About 3,500 of these were stored in core memory to improve performance, and an average speed of 20 words per minute was claimed. The results from the carefully selected input text were quite impressive. [5] After its return to the FTD, it was used continually until 1970, when it was replaced by a machine running SYSTRAN. [6]

ALPAC Report

In 1964 the United States Department of Defense commissioned the United States National Academy of Sciences (NAS) to prepare a report on the state of machine translation. The NAS formed the "Automatic Language Processing Advisory Committee", or ALPAC, and published its findings in 1966. The report, Language and Machines: Computers in Translation and Linguistics, was highly critical of the existing efforts. It found that the systems were no faster than human translators, and that the supposed shortage of translators was in fact a surplus; as a result of supply and demand, human translation was relatively inexpensive, at about $6 per 1,000 words. Worse, the FTD system's output was harder to work with as well; in tests using physics papers as input, a reader working from the machine translation was "10 percent less accurate, 21 percent slower, and had a comprehension level 29 percent lower than when he used human translation." [7]

The ALPAC report was as influential as the Georgetown experiment had been a decade earlier; in the immediate aftermath of its publication, the US government suspended almost all funding for machine translation research. [8] Ongoing work at IBM and Itek had ended by 1966, leaving the field to the Europeans, who continued development of systems like SYSTRAN and Logos.

Notes

  1. These numbers for the early disk systems appear to be inaccurate – another document from the same author suggests that these figures are actually for the later version used on the Mark II translator.

References

Citations

  1. Hutchins, pg. 171
  2. John Hutchins, "The first public demonstration of machine translation: the Georgetown-IBM system, 7th January 1954" Archived 3 March 2016 at the Wayback Machine
  3. Hutchins, pg. 172
  4. King, 1956
  5. Hutchins, pg. 174
  6. Hutchins, pg. 175
  7. ALPAC, pg. 20
  8. John Hutchins, "ALPAC: the (in)famous report" Archived 6 October 2007 at the Wayback Machine

Bibliography

  • G. W. King, G. W. Brown and L. N. Ridenour, "Photographic Techniques for Information Storage", Proceedings of the IRE, Volume 41 Issue 10 (October 1953), pp. 1421–1428
  • G. W. King, "Stochastic Methods of Mechanical Translation", Mechanical Translation, Volume 3 Issue 2 (1956), pp. 38–39
  • J. L. Craft, E. H. Goldman, W. B. Strohm, "A Table Look-up Machine for Processing of Natural Languages", IBM Journal, July 1961, pp. 192–203
  • Automatic Language Processing Advisory Committee, "Language and Machines: Computers in Translation and Linguistics", National Research Council, 1966 (widely known as the "ALPAC Report")
  • John Hutchins (ed), "Gilbert W. King and the IBM-USAF Translator", Early Years in Machine Translation, John Benjamins, 2000, ISBN 90-272-4586-X (RADC-TDR-62-105)
  • Charles Bourne and Trudi Bellardo Hahn, "A History of Online Information Services, 1963–1976", MIT Press, 2003, ISBN 0-262-02538-8
