ALPAC (Automatic Language Processing Advisory Committee) was a committee of seven scientists led by John R. Pierce, established in 1964 by the United States government in order to evaluate the progress in computational linguistics in general and machine translation in particular. Its report, issued in 1966, gained notoriety for being very skeptical of research done in machine translation so far, and emphasizing the need for basic research in computational linguistics; this eventually caused the U.S. government to reduce its funding of the topic dramatically. This marked the beginning of the first AI winter.
The ALPAC was set up in April 1964 with John R. Pierce as the chairman.
The committee consisted of:
Testimony was heard from:
ALPAC's final recommendations (p. 34) were, therefore, that research should be supported on:
Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.
Machine translation is use of either rule-based or probabilistic machine learning approaches to translation of text or speech from one language to another, including the contextual, idiomatic and pragmatic nuances of both languages.
Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.
Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference. The large collections of text allow linguistics to run quantitative analyses on linguistic concepts, otherwise harder to quantify.
Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.
Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subset of natural-language processing in artificial intelligence that deals with machine reading comprehension. Natural-language understanding is considered an AI-hard problem.
The Compatible Time-Sharing System (CTSS) was the first general purpose time-sharing operating system. Compatible Time Sharing referred to time sharing which was compatible with batch processing; it could offer both time sharing and batch processing concurrently.
The Georgetown–IBM experiment was an influential demonstration of machine translation, which was performed on January 7, 1954. Developed jointly by the Georgetown University and IBM, the experiment involved completely automatic translation of more than sixty Russian sentences into English.
In the history of artificial intelligence, an AI winter is a period of reduced funding and interest in artificial intelligence research. The field has experienced several hype cycles, followed by disappointment and criticism, followed by funding cuts, followed by renewed interest years or even decades later.
Programming language theory (PLT) is a branch of computer science that deals with the design, implementation, analysis, characterization, and classification of formal languages known as programming languages. Programming language theory is closely related to other fields including mathematics, software engineering, and linguistics. There are a number of academic conferences and journals in the area.
Statistical machine translation (SMT) was a machine translation approach, that superseded the previous, rule-based approach because it required explicit description of each and every linguistic rule, which was costly, and which often did not generalize to other languages. Since 2003, the statistical approach itself has been gradually superseded by the deep learning-based neural network approach.
Frederick Jelinek was a Czech-American researcher in information theory, automatic speech recognition, and natural language processing. He is well known for his oft-quoted statement, "Every time I fire a linguist, the performance of the speech recognizer goes up".
Martin Kay was a computer scientist, known especially for his work in computational linguistics.
Machine translation is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another.
The Linguistics Research Center (LRC) at the University of Texas is a center for computational linguistics research & development. It was directed by Prof. Winfred Lehmann until his death in 2007, and subsequently by Dr. Jonathan Slocum. Since its founding, virtually all projects at the LRC have involved processing natural language texts with the aid of computers. The principal activities of the Center at present focus on Indo-European languages and comprise historical study, lexicography, and web-based teaching; staff members engage in several independent but often complementary projects in these fields using a variety of software, almost all of it developed in-house.
Various methods for the evaluation for machine translation have been employed. This article focuses on the evaluation of the output of machine translation, rather than on performance or usability evaluation.
IBM's Automatic Language Translator was a machine translation system that converted Russian documents into English. It used an optical disc that stored 170,000 word-for-word and statement-for-statement translations and a custom computer to look them up at high speed. Built for the US Air Force's Foreign Technology Division, the AN/GSQ-16, as it was known to the Air Force, was primarily used to convert Soviet technical documents for distribution to western scientists. The translator was installed in 1959, dramatically upgraded in 1964, and was eventually replaced by a mainframe running SYSTRAN in 1970.
The history of natural language processing describes the advances of natural language processing. There is some overlap with the history of machine translation, the history of speech recognition, and the history of artificial intelligence.
The following outline is provided as an overview of and topical guide to natural-language processing:
LEPOR is an automatic language independent machine translation evaluation metric with tunable parameters and reinforced factors.