Explanation-based learning

Explanation-based learning (EBL) is a form of machine learning that exploits a very strong, or even perfect, domain theory (i.e. a formal theory of an application domain, akin to a domain model in ontology engineering, and not to be confused with Scott's domain theory) in order to make generalizations or form concepts from training examples. [1] It has also been linked to encoding in memory as a process that supports learning. [2]

Details

An example of EBL using a perfect domain theory is a program that learns to play chess from examples. A specific chess position that contains an important feature such as "Forced loss of black queen in two moves" also includes many irrelevant features, such as the specific scattering of pawns on the board. EBL can take a single training example and determine which features are relevant in order to form a generalization. [3]

A domain theory is perfect or complete if it contains, in principle, all information needed to decide any question about the domain. For example, the domain theory for chess is simply the rules of chess. Given the rules, it is in principle possible to deduce the best move in any situation. However, actually making such a deduction is impossible in practice due to combinatorial explosion. EBL uses training examples to make searching for deductive consequences of a domain theory efficient in practice.

In essence, an EBL system works by finding a way to deduce each training example from the system's existing database of domain theory. A short proof of the training example, once found, is generalized and added to the domain-theory database, enabling the EBL system to find and classify future examples that are similar to the training example very quickly. [4] The main drawback of the method, the cost of applying the learned proof macros as these become numerous, was analyzed by Minton. [5]
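
To make this concrete, here is a minimal sketch in Python of the explain-then-generalize step for a propositional, chess-flavoured domain theory. All rule and feature names (loses_queen, queen_attacked, and so on) are illustrative assumptions rather than part of any published EBL system.

```python
# Minimal EBL sketch: explain one training example from a tiny, hypothetical
# domain theory, then keep only the operational features the explanation used.

# Domain theory: head -> list of alternative bodies (Horn-clause style).
DOMAIN_THEORY = {
    "loses_queen": [["queen_attacked", "no_safe_square", "cannot_block"]],
    "no_safe_square": [["all_flight_squares_covered"]],
}

# Operationality criterion: features directly observable in a position.
OPERATIONAL = {"queen_attacked", "all_flight_squares_covered", "cannot_block"}

def explain(goal, facts, used):
    """Prove `goal` from `facts` and DOMAIN_THEORY, recording the operational
    facts the successful proof relies on. Returns True on success."""
    if goal in OPERATIONAL:
        if goal in facts:
            used.add(goal)
            return True
        return False
    for body in DOMAIN_THEORY.get(goal, []):
        local = set()
        if all(explain(subgoal, facts, local) for subgoal in body):
            used |= local
            return True
    return False

def learn_macro(goal, example):
    """EBL step: keep only the features the explanation used, discarding
    irrelevant detail such as the exact pawn layout."""
    used = set()
    return (frozenset(used), goal) if explain(goal, example, used) else None

# One training example: relevant features plus irrelevant clutter.
example = {"queen_attacked", "all_flight_squares_covered", "cannot_block",
           "pawn_on_a7", "pawn_on_h6"}
macro = learn_macro("loses_queen", example)

# Classifying a similar future position is now a single subset test.
new_position = {"queen_attacked", "all_flight_squares_covered",
                "cannot_block", "pawn_on_b2"}
print(macro[0] <= new_position)   # True: the macro rule fires in one step
```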

Basic formulation

EBL software takes four inputs: a hypothesis space (the set of all possible conclusions), a domain theory (axioms about the domain of interest), training examples (specific facts that rule out some possible hypotheses), and an operationality criterion (a specification of which features of the domain are efficiently recognizable, for example which features are directly observable). [6]
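
A rough sketch of how these four inputs might be packaged for a learner follows; the field names and example values are assumptions made for illustration, not a standard API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

@dataclass
class EBLInputs:
    """The four inputs to an EBL learner (illustrative packaging)."""
    hypothesis_space: List[str]                 # candidate conclusions the learner may form
    domain_theory: Dict[str, List[List[str]]]   # rules: head -> alternative bodies
    training_examples: List[Set[str]]           # observed facts, one set per example
    is_operational: Callable[[str], bool]       # operationality criterion for features

# Hypothetical instantiation for the chess example above.
inputs = EBLInputs(
    hypothesis_space=["loses_queen"],
    domain_theory={"loses_queen": [["queen_attacked", "no_safe_square"]]},
    training_examples=[{"queen_attacked", "no_safe_square", "pawn_on_a7"}],
    is_operational=lambda feature: feature in {"queen_attacked", "no_safe_square"},
)
```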

Application

An especially good application domain for EBL is natural language processing (NLP). Here a rich domain theory, i.e. a natural language grammar, though neither perfect nor complete, is tuned to a particular application or to particular language usage using a treebank (the training examples). Rayner pioneered this work. [7] The first successful industrial application was a commercial natural-language interface to relational databases. [8] The method has been successfully applied to several large-scale natural language parsing systems, [9] where the utility problem was addressed by omitting the original grammar (domain theory) and using specialized LR-parsing techniques, resulting in huge speed-ups, at a cost in coverage but with a gain in disambiguation. EBL-like techniques have also been applied to surface generation, the converse of parsing. [10]

When applying EBL to NLP, the operationality criteria can be hand-crafted [11] or can be inferred from the treebank, using either the entropy of its or-nodes [12] or a target coverage/disambiguation trade-off (equivalently, a recall/precision trade-off, i.e. f-score). [13] EBL can also be used to compile grammar-based language models for speech recognition from general unification grammars. [14] Note how the utility problem, first exposed by Minton, was solved by discarding the original grammar/domain theory, and that the cited articles tend to use the phrase grammar specialization, quite the opposite of the original term explanation-based generalization. Perhaps the best name for this technique would be data-driven search-space reduction. Other people who have worked on EBL for NLP include Guenther Neumann, Aravind Joshi, Srinivas Bangalore, and Khalil Sima'an.
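
As a rough, simplified illustration of the entropy-threshold idea (not Samuelsson's actual algorithm; the treebank counts and threshold below are invented), one can estimate the expansion entropy of each or-node from treebank counts and compile the dominant expansions of low-entropy nodes into specialized rules:

```python
from collections import Counter
from math import log2

# Hypothetical treebank statistics: nonterminal (or-node) -> counts of its expansions.
treebank_counts = {
    "NP": Counter({("Det", "N"): 90, ("Det", "Adj", "N"): 8, ("N",): 2}),
    "S":  Counter({("NP", "VP"): 100}),
    "VP": Counter({("V",): 40, ("V", "NP"): 35, ("V", "NP", "PP"): 25}),
}

def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of an or-node's expansion distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def specialize(counts_by_node, threshold=0.6):
    """Treat low-entropy or-nodes as 'cut' points: their dominant expansion is
    compiled into a specialized rule, trading coverage for speed and
    disambiguation, in the spirit of the grammar specialization described above."""
    rules = {}
    for node, counts in counts_by_node.items():
        if entropy(counts) <= threshold:
            rules[node] = counts.most_common(1)[0][0]
    return rules

print(specialize(treebank_counts))
# {'NP': ('Det', 'N'), 'S': ('NP', 'VP')}  -- VP is too unpredictable to cut
```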

References

  1. "Special issue on explanation in case-based reasoning". Artificial Intelligence Review. 24 (2). October 2005.
  2. Calin-Jageman, Robert J.; Horn Ratner, Hilary (2005-12-01). "The Role of Encoding in the Self-Explanation Effect". Cognition and Instruction. 23 (4): 523–543. doi:10.1207/s1532690xci2304_4. ISSN   0737-0008. S2CID   145410154.
  3. Black-queen example from Mitchell, Tom (1997). Machine Learning . McGraw-Hill. pp.  308–309. ISBN   0-07-042807-7.
  4. Mitchell, Tom (1997). Machine Learning . McGraw-Hill. pp.  320. ISBN   0-07-042807-7. In its pure form, EBL involves reformulating the domain theory to produce general rules that classify examples in a single inference step.
  5. Minton, Steven (1990). "Quantitative Results Concerning the Utility Problem in Explanation-Based Learning". Artificial Intelligence. 42 (2–3): 363–392. doi:10.1016/0004-3702(90)90059-9.
  6. Keller, Richard (1988). "Defining operationality for explanation-based learning" (PDF). Artificial Intelligence. 35 (2): 227–241. doi:10.1016/0004-3702(88)90013-6 . Retrieved 2009-02-22. Current Operationality Defn.: A concept description is operational if it can be used efficiently to recognize instances of the concept it denotes After stating the common definition, the paper actually argues against it in favor of more-refined criteria.
  7. Rayner, Manny (1988). "Applying Explanation-Based Generalization to Natural Language Processing". Procs. International Conference on Fifth Generation Computing, Kyoto. pp. 1267–1274.
  8. Samuelsson, Christer; Manny Rayner (1991). "Quantitative Evaluation of Explanation-Based Learning as an Optimization Tool for a Large-Scale Natural Language System". Procs. 12th International Joint Conference on Artificial Intelligence, Sydney. pp. 609–615.{{cite news}}: CS1 maint: location (link)
  9. Samuelsson, Christer (1994). Fast Natural-Language Parsing Using Explanation-Based Learning. Stockholm: Doctoral Dissertation, Royal Institute of Technology.
  10. Samuelsson, Christer (1996). "Example-Based Optimization of Surface-Generation Tables". in R. Mitkov and N. Nicolov (eds.) "Recent Advances in Natural Language Processing," vol. 136 of "Current Issues in Linguistic Theory": John Benjamins, Amsterdam.{{cite news}}: CS1 maint: location (link)
  11. Rayner, Manny; David Carter (1996). "Fast Parsing using Pruning and Grammar Specialization". Procs. ACL, Santa Cruz.
  12. Samuelsson, Christer (1994). "Grammar Specialization through Entropy Thresholds". Procs. ACL, Las Cruces. pp. 188–195.
  13. Cancedda, Nicola; Christer Samuelsson (2000). "Corpus-based Grammar Specialization". Procs 4th Computational Natural Language Learning Workshop.{{cite news}}: CS1 maint: location (link)
  14. Rayner, Manny; Beth Ann Hockey; Pierrette Bouillon (n.d.). Putting Linguistics into Speech Recognition: The Regulus Grammar Compiler. Center for the Study of Language and Information. ISBN   1-57586-526-2.