First-order inductive learner

Last updated April 01, 2019

In machine learning, first-order inductive learner (FOIL) is a rule-based learning algorithm.

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model of sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning, and focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics.

Background

Developed in 1990 by Ross Quinlan,^[1] FOIL learns function-free Horn clauses, a subset of first-order predicate calculus. Given positive and negative examples of some concept and a set of background-knowledge predicates, FOIL inductively generates a logical concept definition or rule for the concept. The induced rule must not involve any constants (color(X,red) becomes color(X,Y), red(Y)) or function symbols, but may allow negated predicates; recursive concepts are also learnable.

John Ross Quinlan is a computer science researcher in data mining and decision theory. He has contributed extensively to the development of decision tree algorithms, including inventing the canonical C4.5 and ID3 algorithms. He also contributed to early ILP literature with First Order Inductive Learner (FOIL). He is currently running the company RuleQuest Research which he founded in 1997.

In mathematical logic and logic programming, a Horn clause is a logical formula of a particular rule-like form which gives it useful properties for use in logic programming, formal specification, and model theory. Horn clauses are named for the logician Alfred Horn, who first pointed out their significance in 1951.

In mathematical logic, a predicate is commonly understood to be a Boolean-valued function P: X→ {true, false}, called the predicate on X. However, predicates have many different uses and interpretations in mathematics and logic, and their precise definition, meaning and use will vary from theory to theory. So, for example, when a theory defines the concept of a relation, then a predicate is simply the characteristic function of a relation. However, not all theories have relations, or are founded on set theory, and so one must be careful with the proper definition and semantic interpretation of a predicate.

Like the ID3 algorithm, FOIL hill climbs using a metric based on information theory to construct a rule that covers the data. Unlike ID3, however, FOIL uses a separate-and-conquer method rather than divide-and-conquer, focusing on creating one rule at a time and collecting uncovered examples for the next iteration of the algorithm.^{[ citation needed ]}

In decision tree learning, ID3 is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains.

Information theory studies the quantification, storage, and communication of information. It was originally proposed by Claude Shannon in 1948 to find fundamental limits on signal processing and communication operations such as data compression, in a landmark paper entitled "A Mathematical Theory of Communication". Applications of fundamental topics of information theory include lossless data compression, lossy data compression, and channel coding. Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields.

Algorithm

The FOIL algorithm is as follows:

InputList of examples

OutputRule in first-order predicate logic

FOIL(Examples)

Let Pos be the positive examples

Let Pred be the predicate to be learned

Until Pos is empty do:

Let Neg be the negative examples

Set Body to empty

Call LearnClauseBody

Add Pred ← Body to the rule

Remove from Pos all examples which satisfy Body

Procedure LearnClauseBody

Until Neg is empty do:

Choose a literal L

Conjoin L to Body

Remove from Neg examples that do not satisfy L

Example

Suppose FOIL's task is to learn the concept grandfather(X,Y) given the relations father(X,Y) and parent(X,Y). Furthermore, suppose our current Body consists of grandfather(X,Y) ← parent(X,Z). This can be extended by conjoining Body with any of the literals father(X,X), father(Y,Z), parent(U,Y), or many others – to create this literal, the algorithm must choose both a predicate name and a set of variables for the predicate (at least one of which is required to be present already in an unnegated literal of the clause). If FOIL extends a clause grandfather(X,Y) ← true by conjoining the literal parent(X,Z), it is introducing the new variable Z. Positive examples now consist of those values <X,Y,Z> such that grandfather(X,Y) is true and parent(X,Z) is true; negative examples are those where grandfather(X,Y) is true but parent(X,Z) is false.

On the next iteration of FOIL after parent(X,Z) has been added, the algorithm will consider all combinations of predicate names and variables such that at least one variable in the new literal is present in the existing clause. This results in a very large search space.^[2] Several extensions of the FOIL theory have shown that additions to the basic algorithm may reduce this search space, sometimes drastically.^{[ citation needed ]}

Extensions

The FOCL algorithm^[3] (First Order Combined Learner) extends FOIL in a variety of ways, which affect how FOCL selects literals to test while extending a clause under construction. Constraints on the search space are allowed, as are predicates that are defined on a rule rather than on a set of examples (called intensional predicates); most importantly a potentially incorrect hypothesis is allowed as an initial approximation to the predicate to be learned. The main goal of FOCL is to incorporate the methods of explanation-based learning (EBL) into the empirical methods of FOIL.

Explanation-based learning (EBL) is a form of machine learning that exploits a very strong, or even perfect, domain theory in order to make generalizations or form concepts from training examples.

Even when no additional knowledge is provided to FOCL over FOIL, however, it utilizes an iterative widening search strategy similar to depth-first search: first FOCL attempts to learn a clause by introducing no free variables. If this fails (no positive gain), one additional free variable per failure is allowed until the number of free variables exceeds the maximum used for any predicate.

In computer science, iterative deepening search or more specifically iterative deepening depth-first search is a state space/graph search strategy in which a depth-limited version of depth-first search is run repeatedly with increasing depth limits until the goal is found. IDDFS is optimal like breadth-first search, but uses much less memory; at each iteration, it visits the nodes in the search tree in the same order as depth-first search, but the cumulative order in which nodes are first visited is effectively breadth-first.

Constraints

Unlike FOIL, which does not put typing constraints on its variables, FOCL uses typing as an inexpensive way of incorporating a simple form of background knowledge. For example, a predicate livesAt(X,Y) may have types livesAt(person, location). Additional predicates may need to be introduced, though – without types, nextDoor(X,Y) could determine whether person X and person Y live next door to each other, or whether two locations are next door to each other. With types, two different predicates nextDoor(person, person) and nextDoor(location, location) would need to exist for this functionality to be maintained. However, this typing mechanism eliminates the need for predicates such as isPerson(X) or isLocation(Y), and need not consider livesAt(A,B) when A and B are defined to be person variables, reducing the search space. Additionally, typing can improve the accuracy of the resulting rule by eliminating from consideration impossible literals such as livesAt(A,B) which may nevertheless appear to have a high information gain.

Rather than implementing trivial predicates such as equals(X,X) or between(X,X,Y), FOCL introduces implicit constraints on variables, further reducing search space. Some predicates must have all variables unique, others must have commutativity (adjacent(X,Y) is equivalent to adjacent(Y,X)), still others may require that a particular variable be present in the current clause, and many other potential constraints.

Operational rules

Operational rules are those rules which are defined extensionally, or as a list of tuples for which a predicate is true. FOIL allows only operational rules; FOCL extends its knowledge base to allow combinations of rules called non-operational rules as well as partially defined or incorrect rules for robustness. Allowing for partial definitions reduces the amount of work needed as the algorithm need not generate these partial definitions for itself, and the incorrect rules do not add significantly to the work needed since they are discarded if they are not judged to provide positive information gain. Non-operational rules are advantageous as the individual rules which they combine may not provide information gain on their own, but are useful when taken in conjunction. If a literal with the most information gain in an iteration of FOCL is non-operational, it is operationalized and its definition is added to the clause under construction.

InputsLiteral to be operationalized, List of positive examples, List of negative examples

OutputLiteral in operational form

Operationalize(Literal, Positive examples, Negative examples)

If Literal is operational

Return Literal

Initialize OperationalLiterals to the empty set

For each clause in the definition of Literal

Compute information gain of the clause over Positive examples and Negative examples

For the clause with the maximum gain

For each literal L in the clause

Add Operationalize(L, Positive examples, Negative examples) to OperationalLiterals

An operational rule might be the literal lessThan(X,Y); a non-operational rule might be between(X,Y,Z) ← lessThan(X,Y), lessThan(Y,Z).

Initial rules

The addition of non-operational rules to the knowledge base increases the size of the space which FOCL must search. Rather than simply providing the algorithm with a target concept (e.g. grandfather(X,Y)), the algorithm takes as input a set of non-operational rules which it tests for correctness and operationalizes for its learned concept. A correct target concept will clearly improve computational time and accuracy, but even an incorrect concept will give the algorithm a basis from which to work and improve accuracy and time.^[3]

Related Research Articles

In computer science, the Boolean satisfiability problem is the problem of determining if there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE. If this is the case, the formula is called satisfiable. On the other hand, if no such assignment exists, the function expressed by the formula is FALSE for all possible variable assignments and the formula is unsatisfiable. For example, the formula "a AND NOT b" is satisfiable because one can find the values a = TRUE and b = FALSE, which make = TRUE. In contrast, "a AND NOT a" is unsatisfiable.

Logic programming is a type of programming paradigm which is largely based on formal logic. Any program written in a logic programming language is a set of sentences in logical form, expressing facts and rules about some problem domain. Major logic programming language families include Prolog, Answer set programming (ASP) and Datalog. In all of these languages, rules are written in the form of clauses:

Inductive logic programming (ILP) is a subfield of symbolic artificial intelligence which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

In boolean logic, a disjunctive normal form (DNF) is a standardization of a logical formula which is a disjunction of conjunctive clauses; it can also be described as an OR of ANDs, a sum of products, or a cluster concept. As a normal form, it is useful in automated theorem proving.

In Boolean logic, a formula is in conjunctive normal form (CNF) or clausal normal form if it is a conjunction of one or more clauses, where a clause is a disjunction of literals; otherwise put, it is an AND of ORs. As a normal form, it is useful in automated theorem proving. It is similar to the product of sums form used in circuit theory.

Stratification has several usages in mathematics.

Datalog is a declarative logic programming language that syntactically is a subset of Prolog. It is often used as a query language for deductive databases. In recent years, Datalog has found new application in data integration, information extraction, networking, program analysis, security, and cloud computing.

CARINE is a first-order classical logic automated theorem prover.

Answer set programming (ASP) is a form of declarative programming oriented towards difficult search problems. It is based on the stable model semantics of logic programming. In ASP, search problems are reduced to computing stable models, and answer set solvers—programs for generating stable models—are used to perform search. The computational process employed in the design of many answer set solvers is an enhancement of the DPLL algorithm and, in principle, it always terminates.

In mathematical logic and automated theorem proving, resolution is a rule of inference leading to a refutation theorem-proving technique for sentences in propositional logic and first-order logic. In other words, iteratively applying the resolution rule in a suitable way allows for telling whether a propositional formula is satisfiable and for proving that a first-order formula is unsatisfiable. Attempting to prove a satisfiable first-order formula as unsatisfiable may result in a nonterminating computation; this problem doesn't occur in propositional logic.

Constraint logic programming is a form of constraint programming, in which logic programming is extended to include concepts from constraint satisfaction. A constraint logic program is a logic program that contains constraints in the body of clauses. An example of a clause including a constraint is A(X,Y):-X+Y>0,B(X),C(Y) . In this clause, X+Y>0 is a constraint; A(X,Y), B(X), and C(Y) are literals as in regular logic programming. This clause states one condition under which the statement A(X,Y) holds: X+Y is greater than zero and both B(X) and C(Y) are true.

In mathematical logic, a literal is an atomic formula (atom) or its negation. The definition mostly appears in proof theory, e.g. in conjunctive normal form and the method of resolution.

The Rule Interchange Format (RIF) is a W3C Recommendation. RIF is part of the infrastructure for the semantic web, along with (principally) SPARQL, RDF and OWL. Although originally envisioned by many as a "rules layer" for the semantic web, in reality the design of RIF is based on the observation that there are many "rules languages" in existence, and what is needed is to exchange rules between them.

B-Prolog is a high-performance implementation of the standard Prolog language with several extended features including matching clauses, action rules for event handling, finite-domain constraint solving, arrays and hash tables, declarative loops, and tabling. First released in 1994, B-Prolog is now a widely used CLP system. The constraint solver of B-Prolog was ranked top in two categories in the Second International Solvers Competition, and it also took the second place in P class in the second ASP solver competition and the second place overall in the third ASP solver competition. B-Prolog underpins the PRISM system, a logic-based probabilistic reasoning and learning system. B-Prolog is a commercial product, but it can be used for learning and non-profit research purposes free of charge.

An alternating decision tree (ADTree) is a machine learning method for classification. It generalizes decision trees and has connections to boosting.

SLD resolution is the basic inference rule used in logic programming. It is a refinement of resolution, which is both sound and refutation complete for Horn clauses.

The syntax and semantics of the Prolog programming language are the set of rules that defines how a Prolog program is written and how it is interpreted. The rules are laid out in ISO standard ISO/IEC 13211 although there are differences in the Prolog implementations.

Range concatenation grammar (RCG) is a grammar formalism developed by Pierre Boullier in 1998 as an attempt to characterize a number of phenomena of natural language, such as Chinese numbers and German word order scrambling, which are outside the bounds of the Mildly context-sensitive languages.

In computer science, conflict-driven clause learning (CDCL) is an algorithm for solving the Boolean satisfiability problem (SAT). Given a Boolean formula, the SAT problem asks for an assignment of variables so that the entire formula evaluates to true. The internal workings of CDCL SAT solvers were inspired by DPLL solvers.

References

http://www.csc.liv.ac.uk/~frans/KDD/Software/FOIL_PRM_CPAR/foil.html

↑ J.R. Quinlan. Learning Logical Definitions from Relations. Machine Learning, Volume 5, Number 3, 1990.
↑ Let Var be the largest number of distinct variables for any clause in rule R, excluding the last conjunct. Let MaxP be the number of predicates with largest arity MaxA. Then an approximation of the number of nodes generated to learn R is: NodesSearched ≤ 2 * MaxP * (Var + MaxA – 1)^MaxA, as shown in Pazzani and Kibler (1992).
1 2 Michael Pazzani and Dennis Kibler. The Utility of Knowledge in Inductive Learning. Machine Learning, Volume 9, Number 1, 1992.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Quinlan1990-1] J.R. Quinlan. Learning Logical Definitions from Relations. Machine Learning, Volume 5, Number 3, 1990.

[2] Let Var be the largest number of distinct variables for any clause in rule R, excluding the last conjunct. Let MaxP be the number of predicates with largest arity MaxA. Then an approximation of the number of nodes generated to learn R is: NodesSearched ≤ 2 * MaxP * (Var + MaxA – 1)^MaxA, as shown in Pazzani and Kibler (1992).

[Pazzani-3] 1 2 Michael Pazzani and Dennis Kibler. The Utility of Knowledge in Inductive Learning. Machine Learning, Volume 9, Number 1, 1992.