Syntax and semantics of logic programming

Last updated

Logic programming is a programming paradigm that includes languages based on formal logic, including Datalog and Prolog. This article describes the syntax and semantics of the purely declarative subset of these languages. Confusingly, the name "logic programming" also refers to a specific programming language that roughly corresponds to the declarative subset of Prolog. Unfortunately, the term must be used in both senses in this article.

Contents

Declarative logic programs consist entirely of rules of the form

H:-B1,...,BN.

Each such rule can be read as an implication:

meaning "If each is true, then is true". Logic programs compute the set of facts that are implied by their rules.

Many implementations of Datalog, Prolog, and related languages add procedural features such as Prolog's cut operator or extra-logical features such as a foreign function interface. The formal semantics of such extensions are beyond the scope of this article.

Datalog

Datalog is the simplest widely-studied logic programming language. There are three major definitions of the semantics of Datalog, and they are all equivalent. The syntax and semantics of other logic programming languages are extensions and generalizations of those of Datalog.

Syntax

A Datalog program consists of a list of rules (Horn clauses). [1] If constant and variable are two countable sets of constants and variables respectively and relation is a countable set of predicate symbols, then the following BNF grammar expresses the structure of a Datalog program:

<program>::=<rule><program> | "" <rule>::=<atom> ":-" <atom-list> "." <atom>::=<relation> "(" <term-list> ")" <atom-list>::=<atom> | <atom> "," <atom-list> | "" <term>::=<constant> | <variable><term-list>::=<term> | <term> "," <term-list> | "" 

Atoms are also referred to as literals. The atom to the left of the :- symbol is called the head of the rule; the atoms to the right are the body. Every Datalog program must satisfy the condition that every variable that appears in the head of a rule also appears in the body (this condition is sometimes called the range restriction). [1] [2]

Rules with empty bodies are called facts. For example, the following rule is a fact:

r(x):-.

Syntactic sugar

Many implementations of logic programming extend the above grammar to allow writing facts without the :-, like so:

r(x).

Many also allow writing 0-ary relations without parentheses, like so:

p:-q.

These are merely abbreviations (syntactic sugar); they have no impact on the semantics of the program.

Example

The following program computes the relation path, which is the transitive closure of the relation edge.

edge(x,y).edge(y,z).path(A,B):-edge(A,B).path(A,C):-path(A,B),edge(B,C).

Semantics

There are three widely-used approaches to the semantics of Datalog programs: model-theoretic, fixed-point, and proof-theoretic. These three approaches can be proven to be equivalent. [3]

An atom is called ground if none of its subterms are variables. Intuitively, each of the semantics define the meaning of a program to be the set of all ground atoms that can be deduced from the rules of the program, starting from the facts.

Model theoretic

Hasse diagram of Herbrand interpretations of the Datalog program
e(x, y). e(y, z). p(A, B) :- e(A, B). p(A, C) :- p(A, B), e(B, C).
The interpretation
M
{\displaystyle M}
is the minimal Herbrand model. All interpretations above it are also models, all interpretations below it are not models. Herbrand-model-hasse.png
Hasse diagram of Herbrand interpretations of the Datalog program
e(x,y).e(y,z).p(A,B):-e(A,B).p(A,C):-p(A,B),e(B,C).
The interpretation is the minimal Herbrand model. All interpretations above it are also models, all interpretations below it are not models.

A rule is called ground if all of its atoms (head and body) are ground. A ground rule R1 is a ground instance of another rule R2 if R1 is the result of a substitution of constants for all the variables in R2.

The Herbrand base of a Datalog program is the set of all ground atoms that can be made with the constants appearing in the program. An interpretation (also known as a database instance) is a subset of the Herbrand base. A ground atom is true in an interpretation I if it is an element of I. A rule is true in an interpretation I if for each ground instance of that rule, if all the clauses in the body are true in I, then the head of the rule is also true in I.

A model of a Datalog program P is an interpretation I of P which contains all the ground facts of P, and makes all of the rules of P true in I. Model-theoretic semantics state that the meaning of a Datalog program is its minimal model (equivalently, the intersection of all its models). [4]

For example, this program:

edge(x,y).edge(y,z).path(A,B):-edge(A,B).path(A,C):-path(A,B),edge(B,C).

has this Herbrand universe: x, y, z

and this Herbrand base: edge(x, x), edge(x, y), ..., edge(z, z), path(x, x), ..., path(z, z)

and this minimal Herbrand model: edge(x, y), edge(y, z), path(x, y), path(y, z), path(x, z)

Fixed-point

Let I be the set of interpretations of a Datalog program P, that is, I = P(H), where H is the Herbrand base of P and P is the powerset operator. The immediate consequence operator for P is the following map T from I to I: For each ground instance of each rule in P, if every clause in the body is in the input interpretation, then add the head of the ground instance to the output interpretation. This map T is monotonic with respect to the partial order given by subset inclusion on T. By the Knaster–Tarski theorem, this map has a least fixed point; by the Kleene fixed-point theorem the fixed point is the supremum of the chain . The least fixed point of M coincides with the minimal Herbrand model of the program. [5]

The fixpoint semantics suggest an algorithm for computing the minimal Herbrand model: Start with the set of ground facts in the program, then repeatedly add consequences of the rules until a fixpoint is reached. This algorithm is called naïve evaluation.

Proof-theoretic

Proof tree showing the derivation of the ground atom path(x, z) from the program
edge(x, y). edge(y, z). path(A, B) :- edge(A, B). path(A, C) :- path(A, B), edge(B, C). Proof tree for Datalog transitive closure computation.svg
Proof tree showing the derivation of the ground atom path(x, z) from the program
edge(x,y).edge(y,z).path(A,B):-edge(A,B).path(A,C):-path(A,B),edge(B,C).

Given a program P, a proof tree of a ground atom A is a tree with a root labeled by A, leaves labeled by ground atoms from the heads of facts in P, and branches with children labeled by ground atoms G such that there exists a ground instance

G :- A1, …, An.

of a rule in P. The proof-theoretic semantics defines the meaning of a Datalog program to be the set of ground atoms that can be defived from such trees. This set coincides with the minimal Herbrand model. [6]

One might be interested in knowing whether or not a particular ground atom appears in the minimal Herbrand model of a Datalog program, perhaps without caring much about the rest of the model. A top-down reading of the proof trees described above suggests an algorithm for computing the results of such queries, such a reading informs the SLD resolution algorithm, which forms the basis for the evaluation of Prolog.

Other approaches

The semantics of Datalog have also been studied in the context of fixpoints over more general semirings. [7]

Logic programming

While the name "logic programming" is used to refer to the entire paradigm of programming languages including Datalog and Prolog, when discussing formal semantics, it generally refers to an extension of Datalog with function symbols. Logic programs are also called Horn clause programs. Logic programming as discussed in this article is closely related to the "pure" or declarative subset of Prolog.

Syntax

The syntax of logic programming extends the syntax of Datalog with function symbols. Logic programming drops the range restriction, allowing variables to appear in the heads of rules that do not appear in their bodies. [8]

Semantics

Due to the presence of function symbols, the Herbrand models of logic programs can be infinite. However, the semantics of a logic program is still defined to be its minimal Herbrand model. Relatedly, the fixpoint of the immediate consequence operator may not converge in a finite number of steps (or to a finite set). However, any ground atom in the minimal Herbrand model will have a finite proof tree. This is why Prolog is evaluated top-down. [8] Just as in Datalog, the three semantics can be proven equivalent.

Negation

Logic programming has the desirable property that all three major definitions of the semantics of logic programs agree. In contrast, there are many conflicting proposals for the semantics of logic programs with negation. The source of the disagreement is that logic programs have a unique minimal Herbrand model, but in general, logic programming (or even Datalog) programs with negation do not.

Syntax

Negation is written not, and can appear in front of any atom in the body of a rule.

<atom-list>::=<atom> | "not" <atom> | <atom> "," <atom-list> | "" 

Semantics

Stratified negation

A logic program with negation is stratified when it is possible to assign each relation to some stratum, such that if a relation R appears negated in the body of a relation S, then R is in a lower stratum than S. [9] The model-theoretic and fixed-point semantics of Datalog can be extended to handle stratified negation, and such extensions can be proved equivalent.

Many implementations of Datalog use a bottom-up evaluation model inspired by the fixed point semantics. Since this semantics can handle stratified negation, several implementations of Datalog implement stratified negation.

While stratified negation is a common extension to Datalog, there are reasonable programs that cannot be stratified. The following program describes a two-player game where a player wins if their opponent has no moves: [10]

move(a,b).win(X):-move(X,Y),notwin(Y).

This program is not stratified, but it seems reasonable to think that a should win the game.

Completion semantics

Perfect model semantics

Stable model semantics

The stable model semantics define a condition for calling certain Herbrand models of a program stable. Intuitively, stable models are the "possible sets of beliefs that a rational agent might hold, given [the program]" as premises. [11]

A program with negation may have many stable models or no stable models. For instance, the program

p:-notq.q:-notp.

has two stable models , . The one-rule program

p:-notp.

has no stable models.

Every stable model is a minimal Herbrand model. A Datalog program without negation has a single stable model, which is exactly its minimal Herbrand model. The stable model semantics defines the meaning of a logic program with negation to be its stable model, if there is exactly one. However, it can be useful to investigate all (or at least, several) of the stable models of a program; this is the goal of answer set programming.

Well-founded semantics

Further extensions

Several other extensions of Datalog have been proposed and studied, including variants with support for integer constants and functions (including Datalog), [12] [13] inequality constraints in the bodies of rules, and aggregate functions.

Constraint logic programming allows for constraints over domains such as the reals or integers to appear in the bodies of rules.

See also

Related Research Articles

First-order logic—also known as predicate logic, quantificational logic, and first-order predicate calculus—is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over non-logical objects, and allows the use of sentences that contain variables, so that rather than propositions such as "Socrates is a man", one can have expressions in the form "there exists x such that x is Socrates and x is a man", where "there exists" is a quantifier, while x is a variable. This distinguishes it from propositional logic, which does not use quantifiers or relations; in this sense, propositional logic is the foundation of first-order logic.

Logic programming is a programming, database and knowledge-representation and reasoning paradigm which is based on formal logic. A program, database or knowledge base in a logic programming language is a set of sentences in logical form, expressing facts and rules about some problem domain. Major logic programming language families include Prolog, Answer Set Programming (ASP) and Datalog. In all of these languages, rules are written in the form of clauses:

Prolog is a logic programming language that has its origins in artificial intelligence and computational linguistics.

In logic and computer science, unification is an algorithmic process of solving equations between symbolic expressions. For example, using x,y,z as variables, the singleton equation set { cons(x,cons(x,nil)) = cons(2,y) } is a syntactic first-order unification problem that has the substitution { x ↦ 2, ycons(2,nil) } as its only solution.

In computer science, declarative programming is a programming paradigm—a style of building the structure and elements of computer programs—that expresses the logic of a computation without describing its control flow.

Stratification has several usages in mathematics.

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.

In computer science, the occurs check is a part of algorithms for syntactic unification. It causes unification of a variable V and a structure S to fail if S contains V.

A deductive database is a database system that can make deductions based on rules and facts stored in its database. Datalog is the language typically used to specify facts, rules and queries in deductive databases. Deductive databases have grown out of the desire to combine logic programming with relational databases to construct systems that support a powerful formalism and are still fast and able to deal with very large datasets. Deductive databases are more expressive than relational databases but less expressive than logic programming systems such as Prolog. In recent years, deductive databases have found new application in data integration, information extraction, networking, program analysis, security, and cloud computing.

In mathematical logic, a ground term of a formal system is a term that does not contain any variables. Similarly, a ground formula is a formula that does not contain any variables.

Answer set programming (ASP) is a form of declarative programming oriented towards difficult search problems. It is based on the stable model semantics of logic programming. In ASP, search problems are reduced to computing stable models, and answer set solvers—programs for generating stable models—are used to perform search. The computational process employed in the design of many answer set solvers is an enhancement of the DPLL algorithm and, in principle, it always terminates.

Negation as failure is a non-monotonic inference rule in logic programming, used to derive from failure to derive . Note that can be different from the statement of the logical negation of , depending on the completeness of the inference algorithm and thus also on the formal logic system.

The concept of a stable model, or answer set, is used to define a declarative semantics for logic programs with negation as failure. This is one of several standard approaches to the meaning of negation in logic programming, along with program completion and the well-founded semantics. The stable model semantics is the basis of answer set programming.

In computer science, the well-founded semantics is a three-valued semantics for logic programming, which gives a precise meaning to general logic programs.

Abductive logic programming (ALP) is a high-level knowledge-representation framework that can be used to solve problems declaratively, based on abductive reasoning. It extends normal logic programming by allowing some predicates to be incompletely defined, declared as abducible predicates. Problem solving is effected by deriving hypotheses on these abducible predicates as solutions of problems to be solved. These problems can be either observations that need to be explained or goals to be achieved. It can be used to solve problems in diagnosis, planning, natural language and machine learning. It has also been used to interpret negation as failure as a form of abductive reasoning.

Flix is a functional, imperative, and logic programming language developed at Aarhus University, with funding from the Independent Research Fund Denmark, and by a community of open source contributors. The Flix language supports algebraic data types, pattern matching, parametric polymorphism, currying, higher-order functions, extensible records, channel and process-based concurrency, and tail call elimination. Two notable features of Flix are its type and effect system and its support for first-class Datalog constraints.

ProbLog is a probabilistic logic programming language that extends Prolog with probabilities. It minimally extends Prolog by adding the notion of a probabilistic fact, which combines the idea of logical atoms and random variables. Similarly to Prolog, ProbLog can query an atom. While Prolog returns the truth value of the queried atom, ProbLog returns the probability of it being true.

The Vadalog system is a Knowledge Graph Management System (KGMS) that offers a language for performing complex logic reasoning tasks over knowledge graphs. At the same time, Vadalog delivers a platform to support the entire spectrum of data science tasks: data integration, pre-processing, statistical analysis, machine learning, algorithmic modeling, probabilistic reasoning and temporal reasoning. Its language is based on an extension of the rule-based language Datalog, Warded Datalog±, a high-performance language using an aggressive termination control strategy. Vadalog can support the entire spectrum of data science activities and tools. The system can read from and connect to multiple sources, from relational databases, such as PostgreSQL and MySQL, to graph databases, such as Neo4j, as well as make use of machine learning tools, and a web data extraction tool, OXPath. Additional Python libraries and extensions can also be easily integrated into the system.

Disjunctive Datalog is an extension of the logic programming language Datalog that allows disjunctions in the heads of rules. This extension enables disjunctive Datalog to express several NP-hard problems that are not known to be expressable in plain Datalog. Disjunctive Datalog has been applied in the context of reasoning about ontologies in the semantic web. DLV is an implementation of disjunctive Datalog.

DatalogZ is an extension of Datalog with integer arithmetic and comparisons. The decision problem of whether or not a given ground atom (fact) is entailed by a DatalogZ program is RE-complete, which can be shown by a reduction to diophantine equations.

References

Notes

  1. 1 2 Ceri, Gottlob & Tanca 1989, p. 146.
  2. Eisner, Jason; Filardo, Nathaniel W. (2011). de Moor, Oege; Gottlob, Georg; Furche, Tim; Sellers, Andrew (eds.). Dyna: Extending Datalog for Modern AI. Datalog Reloaded, First International Workshop, Datalog 2010, Oxford, UK, March 16-19, 2010. Lecture Notes in Computer Science. Vol. 6702. Berlin, Heidelberg: Springer. pp. 181–220. doi:10.1007/978-3-642-24206-9_11. ISBN   978-3-642-24206-9.
  3. van Emden, M. H.; Kowalski, R. A. (1976-10-01). "The Semantics of Predicate Logic as a Programming Language". Journal of the ACM. 23 (4): 733–742. doi: 10.1145/321978.321991 . ISSN   0004-5411. S2CID   11048276.
  4. Ceri, Gottlob & Tanca 1989, p. 149.
  5. Ceri, Gottlob & Tanca 1989, p. 150.
  6. Abiteboul, Serge (1996). Foundations of databases. Addison-Wesley. ISBN   0-201-53771-0. OCLC   247979782.
  7. Khamis, Mahmoud Abo; Ngo, Hung Q.; Pichler, Reinhard; Suciu, Dan; Wang, Yisu Remy (2023-02-01). "Convergence of Datalog over (Pre-) Semirings". arXiv: 2105.14435 [cs.DB].
  8. 1 2 Abiteboul 1996, p. 299.
  9. Halevy, Alon Y.; Mumick, Inderpal Singh; Sagiv, Yehoshua; Shmueli, Oded (2001-09-01). "Static analysis in datalog extensions". Journal of the ACM. 48 (5): 971–1012. doi:10.1145/502102.502104. ISSN   0004-5411. S2CID   18868009.
  10. Leone, N; Rullo, P (1992-01-01). "Safe computation of the well-founded semantics of datalog queries". Information Systems. 17 (1): 17–31. doi:10.1016/0306-4379(92)90003-6. ISSN   0306-4379.
  11. Gelfond, Michael; Lifschitz, Vladimir (1988). "The Stable Model Semantics for Logic Programming". In Kowalski, Robert; Bowen, Kenneth (eds.). Proceedings of International Logic Programming Conference and Symposium. MIT Press. pp. 1070–1080.
  12. Kaminski, Mark; Grau, Bernardo Cuenca; Kostylev, Egor V.; Motik, Boris; Horrocks, Ian (2017-11-12). "Foundations of Declarative Data Analysis Using Limit Datalog Programs". arXiv: 1705.06927 [cs.AI].
  13. Grau, Bernardo Cuenca; Horrocks, Ian; Kaminski, Mark; Kostylev, Egor V.; Motik, Boris (2020-02-25). "Limit Datalog: A Declarative Query Language for Data Analysis". ACM SIGMOD Record. 48 (4): 6–17. doi:10.1145/3385658.3385660. ISSN   0163-5808. S2CID   211520719.

Sources