History of natural language processing

Last updated

The history of natural language processing describes the advances of natural language processing (Outline of natural language processing). There is some overlap with the history of machine translation, the history of speech recognition, and the history of artificial intelligence.

Contents

Research and development

The history of machine translation dates back to the seventeenth century, when philosophers such as Leibniz and Descartes put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine.

The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni was simply an automatic bilingual dictionary using paper tape. The other proposal, by Peter Troyanskii, a Russian, was more detailed. It included both the bilingual dictionary, and a method for dealing with grammatical roles between languages, based on Esperanto.

In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence. This criterion depends on the ability of a computer program to impersonate a human in a real-time written conversation with a human judge, sufficiently well that the judge is unable to distinguish reliably — on the basis of the conversational content alone — between the program and a real human.

In 1957, Noam Chomsky’s Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule-based system of syntactic structures. [1]

The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem. [2] However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years long research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted until the late 1980s, when the first statistical machine translation systems were developed.

Some notably successful NLP systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies.

In 1969 Roger Schank introduced the conceptual dependency theory for natural language understanding. [3] This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner.

In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. [4] Instead of phrase structure rules ATNs used an equivalent set of finite-state automata that were called recursively. ATNs and their more general format called "generalized ATNs" continued to be used for a number of years. During the 1970s many programmers began to write 'conceptual ontologies', which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, many chatterbots were written including PARRY, Racter, and Jabberwacky.

Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in NLP with the introduction of machine learning algorithms for language processing. This was due both to the steady increase in computational power resulting from Moore's law and the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. [5] Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-then rules similar to existing hand-written rules. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data. The cache language models upon which many speech recognition systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks.

Many of the notable early successes occurred in the field of machine translation, due especially to work at IBM Research, where successively more complicated statistical models were developed. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, most other systems depended on corpora specifically developed for the tasks implemented by these systems, which was (and often continues to be) a major limitation in the success of these systems. As a result, a great deal of research has gone into methods of more effectively learning from limited amounts of data.

Recent research has increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms are able to learn from data that has not been hand-annotated with the desired answers, or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results.

Software

SoftwareYearCreatorDescriptionReference
Georgetown experiment 1954 Georgetown University and IBM involved fully automatic translation of more than sixty Russian sentences into English.
STUDENT 1964 Daniel Bobrow could solve high school algebra word problems. [6]
ELIZA 1964 Joseph Weizenbaum a simulation of a Rogerian psychotherapist, rephrasing her response with a few grammar rules. [7]
SHRDLU 1970 Terry Winograd a natural language system working in restricted "blocks worlds" with restricted vocabularies, worked extremely well
PARRY 1972 Kenneth Colby A chatterbot
KL-ONE 1974Sondheimer et al.a knowledge representation system in the tradition of semantic networks and frames; it is a frame language.
MARGIE 1975 Roger Schank
TaleSpin (software) 1976Meehan
QUALM Lehnert
LIFER/LADDER 1978Hendrixa natural language interface to a database of information about US Navy ships.
SAM (software) 1978Cullingford
PAM (software) 1978 Robert Wilensky
Politics (software) 1979Carbonell
Plot Units (software) 1981Lehnert
Jabberwacky 1982 Rollo Carpenter chatterbot with stated aim to "simulate natural human chat in an interesting, entertaining and humorous manner".
MUMBLE (software) 1982McDonald
Racter 1983William Chamberlain and Thomas Etter chatterbot that generated English language prose at random.
MOPTRANS [8] 1984Lytinen
KODIAK (software) 1986Wilensky
Absity (software) 1987Hirst
Dr. Sbaitso 1991 Creative Labs
Watson (artificial intelligence software) 2006 IBM A question answering system that won the Jeopardy! contest, defeating the best human players in February 2011.
Siri 2011 Apple A virtual assistant developed by Apple.
Cortana 2014 Microsoft A virtual assistant developed by Microsoft.
Amazon Alexa 2014 Amazon A virtual assistant developed by Amazon.
Google Assistant 2016 Google A virtual assistant developed by Google.

Related Research Articles

Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of other living beings, primarily of humans. It is a field of study in computer science that develops and studies intelligent machines. Such machines may be called AIs.

Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference. The large collections of text allow linguistics to run quantitative analyses on linguistic concepts, otherwise harder to quantify.

In linguistics and natural language processing, a corpus or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated.

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, coherence, and inference.

Cognitive linguistics is an interdisciplinary branch of linguistics, combining knowledge and research from cognitive science, cognitive psychology, neuropsychology and linguistics. Models and theoretical accounts of cognitive linguistics are considered as psychologically real, and research in cognitive linguistics aims to help understand cognition in general and is seen as a road into the human mind.

Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subset of natural-language processing in artificial intelligence that deals with machine reading comprehension. Natural-language understanding is considered an AI-hard problem.

Natural language generation (NLG) is a software process that produces natural language output. A widely-cited survey of NLG methods describes NLG as "the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems than can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information".

<span class="mw-page-title-main">Symbolic artificial intelligence</span> Methods in artificial intelligence research

In artificial intelligence, symbolic artificial intelligence is the term for the collection of all methods in artificial intelligence research that are based on high-level symbolic (human-readable) representations of problems, logic and search. Symbolic AI used tools such as logic programming, production rules, semantic nets and frames, and it developed applications such as knowledge-based systems, symbolic mathematics, automated theorem provers, ontologies, the semantic web, and automated planning and scheduling systems. The Symbolic AI paradigm led to seminal ideas in search, symbolic programming languages, agents, multi-agent systems, the semantic web, and the strengths and limitations of formal knowledge and reasoning systems.

In the history of artificial intelligence, neat and scruffy are two contrasting approaches to artificial intelligence (AI) research. The distinction was made in the 70s and was a subject of discussion until the middle 80s.

Roger Carl Schank was an American artificial intelligence theorist, cognitive psychologist, learning scientist, educational reformer, and entrepreneur. Beginning in the late 1960s, he pioneered conceptual dependency theory and case-based reasoning, both of which challenged cognitivist views of memory and reasoning. He began his career teaching at Yale University and Stanford University. In 1989, Schank was granted $30 million in a ten-year commitment to his research and development by Andersen Consulting, through which he founded the Institute for the Learning Sciences (ILS) at Northwestern University in Chicago.

<span class="mw-page-title-main">History of artificial intelligence</span>

The history of artificial intelligence (AI) began in antiquity, with myths, stories and rumors of artificial beings endowed with intelligence or consciousness by master craftsmen. The seeds of modern AI were planted by philosophers who attempted to describe the process of human thinking as the mechanical manipulation of symbols. This work culminated in the invention of the programmable digital computer in the 1940s, a machine based on the abstract essence of mathematical reasoning. This device and the ideas behind it inspired a handful of scientists to begin seriously discussing the possibility of building an electronic brain.

In the history of artificial intelligence, an AI winter is a period of reduced funding and interest in artificial intelligence research. The field has experienced several hype cycles, followed by disappointment and criticism, followed by funding cuts, followed by renewed interest years or even decades later.

Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

In natural language processing, language identification or language guessing is the problem of determining which natural language given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods.

Conceptual dependency theory is a model of natural language understanding used in artificial intelligence systems.

Error-driven learning is a type of reinforcement learning method. This method tweaks a model’s parameters based on the difference between the proposed and actual results. These models stand out as they depend on environmental feedback instead of explicit labels or categories. They are based on the idea that language acquisition involves the minimization of the prediction error (MPSE). By leveraging these prediction errors, the models consistently refine expectations and decrease computational complexity. Typically, these algorithms are operated by the GeneRec algorithm.

The following outline is provided as an overview of and topical guide to natural-language processing:

References

  1. "SEM1A5 - Part 1 - A brief history of NLP" . Retrieved 2010-06-25.
  2. Hutchins, J. (2005)
  3. Roger Schank, 1969, A conceptual dependency parser for natural language Proceedings of the 1969 conference on Computational linguistics, Sång-Säby, Sweden, pages 1-3
  4. Woods, William A (1970). "Transition Network Grammars for Natural Language Analysis". Communications of the ACM 13 (10): 591–606
  5. Chomskyan linguistics encourages the investigation of "corner cases" that stress the limits of its theoretical models (comparable to pathological phenomena in mathematics), typically created using thought experiments, rather than the systematic investigation of typical phenomena that occur in real-world data, as is the case in corpus linguistics. The creation and use of such corpora of real-world data is a fundamental part of machine-learning algorithms for NLP. In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called "poverty of the stimulus" argument entail that general learning algorithms, as are typically used in machine learning, cannot be successful in language processing. As a result, the Chomskyan paradigm discouraged the application of such models to language processing.
  6. McCorduck 2004 , p. 286, Crevier 1993 , pp. 76−79, Russell & Norvig 2003 , p. 19
  7. McCorduck 2004 , pp. 291–296, Crevier 1993 , pp. 134−139
  8. Janet L. Kolodner, Christopher K. Riesbeck; Experience, Memory, and Reasoning; Psychology Press; 2014 reprint

Bibliography