Datalog

Last updated
Datalog
Paradigm Logic, Declarative
Family Prolog
First appeared1977;47 years ago (1977)
Typing discipline Weak
Dialects
Datomic, .QL, Soufflé, XTDB, etc.
Influenced by
Prolog
Influenced
SQL
Datalog
Filename extension
.dl
Internet media type
Website datalog-specs.info

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.

Contents

Example

A Datalog program consists of facts, which are statements that are held to be true, and rules, which say how to deduce new facts from known facts. For example, here are two facts that mean xerces is a parent of brooke and brooke is a parent of damocles:

parent(xerces,brooke).parent(brooke,damocles).

The names are written in lowercase because strings beginning with an uppercase letter stand for variables. Here are two rules:

ancestor(X,Y):-parent(X,Y).ancestor(X,Y):-parent(X,Z),ancestor(Z,Y).

The :- symbol is read as "if", and the comma is read "and", so these rules mean:

The meaning of a program is defined to be the set of all of the facts that can be deduced using the initial facts and the rules. This program's meaning is given by the following facts:

parent(xerces,brooke).parent(brooke,damocles).ancestor(xerces,brooke).ancestor(brooke,damocles).ancestor(xerces,damocles).

Some Datalog implementations don't deduce all possible facts, but instead answer queries:

?-ancestor(xerces,X).

This query asks: Who are all the X that xerces is an ancestor of? For this example, it would return brooke and damocles.

Comparison to relational databases

The non-recursive subset of Datalog is closely related to query languages for relational databases, such as SQL. The following table maps between Datalog, relational algebra, and SQL concepts:

DatalogRelational algebraSQL
RelationRelation Table
Fact Tuple Row
Rulen/a Materialized view
QuerySelect Query

More formally, non-recursive Datalog corresponds precisely to unions of conjunctive queries, or equivalently, negation-free relational algebra.

Schematic translation from non-recursive Datalog into SQL
s(x,y).t(y).r(A,B):-s(A,B),t(B).
CREATETABLEs(z0TEXTNONNULL,z1TEXTNONNULL,PRIMARYKEY(z0,z1));CREATETABLEt(z0TEXTNONNULLPRIMARYKEY);INSERTINTOsVALUES('x','y');INSERTINTOtVALUES('y');CREATEVIEWrASSELECTs.z0,s.z1FROMs,tWHEREs.z1=t.z0;

Syntax

A Datalog program consists of a list of rules (Horn clauses). [1] If constant and variable are two countable sets of constants and variables respectively and relation is a countable set of predicate symbols, then the following BNF grammar expresses the structure of a Datalog program:

<program>::=<rule><program> | "" <rule>::=<atom> ":-" <atom-list> "." <atom>::=<relation> "(" <term-list> ")" <atom-list>::=<atom> | <atom> "," <atom-list> | "" <term>::=<constant> | <variable><term-list>::=<term> | <term> "," <term-list> | "" 

Atoms are also referred to as literals. The atom to the left of the :- symbol is called the head of the rule; the atoms to the right are the body. Every Datalog program must satisfy the condition that every variable that appears in the head of a rule also appears in the body (this condition is sometimes called the range restriction). [1] [2]

There are two common conventions for variable names: capitalizing variables, or prefixing them with a question mark ?. [3]

Note that under this definition, Datalog does not include negation nor aggregates; see § Extensions for more information about those constructs.

Rules with empty bodies are called facts. For example, the following rule is a fact:

r(x):-.

The set of facts is called the extensional database or EDB of the Datalog program. The set of tuples computed by evaluating the Datalog program is called the intensional database or IDB.

Syntactic sugar

Many implementations of logic programming extend the above grammar to allow writing facts without the :-, like so:

r(x).

Some also allow writing 0-ary relations without parentheses, like so:

p:-q.

These are merely abbreviations (syntactic sugar); they have no impact on the semantics of the program.

Semantics

There are three widely-used approaches to the semantics of Datalog programs: model-theoretic, fixed-point, and proof-theoretic. These three approaches can be proven equivalent. [4]

An atom is called ground if none of its subterms are variables. Intuitively, each of the semantics define the meaning of a program to be the set of all ground atoms that can be deduced from the rules of the program, starting from the facts.

Model theoretic

A rule is called ground if all of its atoms (head and body) are ground. A ground rule R1 is a ground instance of another rule R2 if R1 is the result of a substitution of constants for all the variables in R2. The Herbrand base of a Datalog program is the set of all ground atoms that can be made with the constants appearing in the program. The Herbrand model of a Datalog program is the smallest subset of the Herbrand base such that, for each ground instance of each rule in the program, if the atoms in the body of the rule are in the set, then so is the head. [5] The model-theoretic semantics define the minimal Herbrand model to be the meaning of the program.

Fixed-point

Let I be the power set of the Herbrand base of a program P. The immediate consequence operator for P is a map T from I to I that adds all of the new ground atoms that can be derived from the rules of the program in a single step. The least-fixed-point semantics define the least fixed point of T to be the meaning of the program; this coincides with the minimal Herbrand model. [6]

The fixpoint semantics suggest an algorithm for computing the minimal model: Start with the set of ground facts in the program, then repeatedly add consequences of the rules until a fixpoint is reached. This algorithm is called naïve evaluation.

Proof-theoretic

Proof tree showing the derivation of the ground atom path(x, z) from the program
edge(x, y). edge(y, z). path(A, B) :- edge(A, B). path(A, C) :- path(A, B), edge(B, C). Proof tree for Datalog transitive closure computation.svg
Proof tree showing the derivation of the ground atom path(x, z) from the program
edge(x,y).edge(y,z).path(A,B):-edge(A,B).path(A,C):-path(A,B),edge(B,C).

The proof-theoretic semantics defines the meaning of a Datalog program to be the set of facts with corresponding proof trees. Intuitively, a proof tree shows how to derive a fact from the facts and rules of a program.

One might be interested in knowing whether or not a particular ground atom appears in the minimal Herbrand model of a Datalog program, perhaps without caring much about the rest of the model. A top-down reading of the proof trees described above suggests an algorithm for computing the results of such queries. This reading informs the SLD resolution algorithm, which forms the basis for the evaluation of Prolog.

Evaluation

There are many different ways to evaluate a Datalog program, with different performance characteristics.

Bottom-up evaluation strategies

Bottom-up evaluation strategies start with the facts in the program and repeatedly apply the rules until either some goal or query is established, or until the complete minimal model of the program is produced.

Naïve evaluation

Naïve evaluation mirrors the fixpoint semantics for Datalog programs. Naïve evaluation uses a set of "known facts", which is initialized to the facts in the program. It proceeds by repeatedly enumerating all ground instances of each rule in the program. If each atom in the body of the ground instance is in the set of known facts, then the head atom is added to the set of known facts. This process is repeated until a fixed point is reached, and no more facts may be deduced. Naïve evaluation produces the entire minimal model of the program. [7]

Semi-naïve evaluation

Semi-naïve evaluation is a bottom-up evaluation strategy that can be asymptotically faster than naïve evaluation. [8]

Performance considerations

A parallel Datalog engine was evaluated on the Theta supercomputer at Argonne National Laboratory. Theta supercomputer - 389 071 002 (36954713450).jpg
A parallel Datalog engine was evaluated on the Theta supercomputer at Argonne National Laboratory.

Naïve and semi-naïve evaluation both evaluate recursive Datalog rules by repeatedly applying them to a set of known facts until a fixed point is reached. In each iteration, rules are only run for "one step", i.e., non-recursively. As mentioned above, each non-recursive Datalog rule corresponds precisely to a conjunctive query. Therefore, many of the techniques from database theory used to speed up conjunctive queries are applicable to bottom-up evaluation of Datalog, such as

Many such techniques are implemented in modern bottom-up Datalog engines such as Soufflé. Some Datalog engines integrate SQL databases directly. [17]

Bottom-up evaluation of Datalog is also amenable to parallelization. Parallel Datalog engines are generally divided into two paradigms:

Top-down evaluation strategies

SLD resolution is sound and complete for Datalog programs.

Magic sets

Top-down evaluation strategies begin with a query or goal. Bottom-up evaluation strategies can answer queries by computing the entire minimal model and matching the query against it, but this can be inefficient if the answer only depends on a small subset of the entire model. The magic sets algorithm takes a Datalog program and a query, and produces a more efficient program that computes the same answer to the query while still using bottom-up evaluation. [23] A variant of the magic sets algorithm has been shown to produce programs that, when evaluated using semi-naïve evaluation, are as efficient as top-down evaluation. [24]

Complexity

The decision problem formulation of Datalog evaluation is as follows: Given a Datalog program P split into a set of facts (EDB) E and a set of rules R, and a ground atom A, is A in the minimal model of P? In this formulation, there are three variations of the computational complexity of evaluating Datalog programs: [25]

With respect to data complexity, the decision problem for Datalog is P-complete. With respect to program complexity, the decision problem is EXPTIME-complete. In particular, evaluating Datalog programs always terminates; Datalog is not Turing-complete.

Some extensions to Datalog do not preserve these complexity bounds. Extensions implemented in some Datalog engines, such as algebraic data types, can even make the resulting language Turing-complete.

Extensions

Several extensions have been made to Datalog, e.g., to support negation, aggregate functions, inequalities, to allow object-oriented programming, or to allow disjunctions as heads of clauses. These extensions have significant impacts on the language's semantics and on the implementation of a corresponding interpreter.

Datalog is a syntactic subset of Prolog, disjunctive Datalog, answer set programming, DatalogZ, and constraint logic programming. When evaluated as an answer set program, a Datalog program yields a single answer set, which is exactly its minimal model. [26]

Many implementations of Datalog extend Datalog with additional features; see § Datalog engines for more information.

Aggregation

Datalog can be extended to support aggregate functions. [27]

Notable Datalog engines that implement aggregation include:

Negation

Adding negation to Datalog complicates its semantics, leading to whole new languages and strategies for evaluation. For example, the language that results from adding negation with the stable model semantics is exactly answer set programming.

Stratified negation can be added to Datalog while retaining its model-theoretic and fixed-point semantics. Notable Datalog engines that implement stratified negation include:

Comparison to Prolog

Unlike in Prolog, statements of a Datalog program can be stated in any order. Datalog does not have Prolog's cut operator. This makes Datalog a fully declarative language.

In contrast to Prolog, Datalog

This article deals primarily with Datalog without negation (see also Syntax and semantics of logic programming § Extending Datalog with negation). However, stratified negation is a common addition to Datalog; the following list contrasts Prolog with Datalog with stratified negation. Datalog with stratified negation

Expressiveness

Datalog generalizes many other query languages. For instance, conjunctive queries and union of conjunctive queries can be expressed in Datalog. Datalog can also express regular path queries.

When we consider ordered databases, i.e., databases with an order relation on their active domain, then the Immerman–Vardi theorem implies that the expressive power of Datalog is precisely that of the class PTIME: a property can be expressed in Datalog if and only if it is computable in polynomial time. [31]

The boundedness problem for Datalog asks, given a Datalog program, whether it is bounded, i.e., the maximal recursion depth reached when evaluating the program on an input database can be bounded by some constant. In other words, this question asks whether the Datalog program could be rewritten as a nonrecursive Datalog program, or, equivalently, as a union of conjunctive queries. Solving the boundedness problem on arbitrary Datalog programs is undecidable, [32] but it can be made decidable by restricting to some fragments of Datalog.

Datalog engines

Systems that implement languages inspired by Datalog, whether compilers, interpreters, libraries, or embedded DSLs, are referred to as Datalog engines. Datalog engines often implement extensions of Datalog, extending it with additional data types, foreign function interfaces, or support for user-defined lattices. Such extensions may allow for writing non-terminating or otherwise ill-defined programs.[ citation needed ]

Here is a short list of systems that are either based on Datalog or provide a Datalog interpreter:

Free software/open source

List of Datalog engines that are free software and/or open source
NameYear of latest releaseWritten inLicenceData sourcesDescriptionLinks
AbcDatalog2023 Java BSD Datalog engine that implements common evaluation algorithms; designed for extensibility, research use, and education Homepage
Ascent2023 Rust MIT License A logic programming language (similar to Datalog) embedded in Rust via macros, supporting a Lattice and customized datastructure. Repository
bddbddb2007 Java GNU LGPL Datalog implementation designed to query Java bytecode including points-to analysis on large Java programs; using BDDs internally. Homepage
Bloom (Bud)2017 Ruby BSD 3-ClauseRuby DSL for programming with data-centric constructs, based on the Dedalus extension of Datalog which adds a temporal dimension to the logic Homepage Repository
Cascalog2014 Clojure Apache 2.0 can query other DBMS Data processing and querying library for Clojure and Java, designed to be used on Hadoop Repository Homepage (archived)
Clingo2024 C++ MIT License Answer Set Programming system that supports Datalog as a special case; its standalone grounder gringo suffices for plain Datalog Homepage Repository Online demo
ConceptBase2023various BSD 2-Clausedeductive and object-oriented database system for conceptual modeling and metamodeling, which includes a Datalog query evaluator Homepage
Coral1997 C++ proprietary, free for some uses, open sourceA deductive database system written in C++ with semi-naïve datalog evaluation. Developed 1988-1997. Homepage
Crepe2023 Rust Apache 2.0 or MIT Rust library for expressing Datalog-like inferences, based on procedural macros Homepage
Datafrog2019 Rust Apache 2.0 or MIT Lightweight Datalog engine intended to be embedded in other Rust programs Homepage
Datafun2016 Racket open source, no license in repositoryFunctional programming language that generalized Datalog on semilattices Homepage Repository
Datahike2024 Clojure Eclipse Public License 1.0 built-in database (in-memory or file)Fork of DataScript with a durable backend based on a hitchhiker tree, using Datalog as query language Homepage
Datalevin2024 Clojure Eclipse Public License 1.0 LMDB bindingsFork of DataScript optimized for LMDB durable storage, using Datalog as query language Homepage
Datalog (Erlang)2019 Erlang Apache 2.0 Library to support Datalog queries in Erlang, with data represented as streams of tuples Homepage
Datalog (MITRE)2016 Lua GNU LGPL Lightweight deductive database system, designed to be small and usable on memory constrained devices Homepage Online demo
Datalog (OCaml)2019 OCaml BSD 2-clauseIn-memory Datalog implementation for OCaml featuring bottom-up and top-down algorithms Homepage
Datalog (Racket)2022 Racket Apache 2.0 or MIT Racket package for using Datalog Homepage Repository
Datalog Educational System2021 Prolog GNU LGPL DBMS connectorsOpen-source implementation intended for teaching Datalog [33] Homepage
DataScript2024 Clojure Eclipse Public License 1.0 in-memory databaseImmutable database that runs in a browser, using Datalog as query language Homepage
Datomic2024 Clojure closed source; binaries released under Apache 2.0 bindings for DynamoDB, Cassandra, PostgreSQL and othersDistributed database running on cloud architectures; uses Datalog as query language Homepage
DDlog2021 Rust MIT License Incremental, in-memory, typed Datalog engine; compiled in Rust; based on the differential dataflow [34] library Homepage
DLV 2023 C++ proprietary, free for some uses Answer Set Programming system that supports Datalog as a special case Homepage
Company
Dyna12013 Haskell GNU AGPL v3 Declarative programming language using Datalog for statistical AI programming; later Dyna versions do not use Datalog Repository Homepage (archived)
Flix 2024 Java Apache 2.0 Functional and logic programming language inspired by Datalog extended with user-defined lattices and monotone filter/transfer functions Homepage Online demo
Graal2018 Java CeCILL v2.1 RDF import, CSV import, DBMS connectorsJava toolkit dedicated to querying knowledge bases within the framework of existential rules (a.k.a. tuple-generating dependencies or Datalog+/-) Homepage
Inter4QL2020 C++ BSD Interpreter for a database query language based on four-valued logic, supports Datalog as a special case Homepage
IRIS2016 Java GNU LGPL v2.1Logic programming system supporting Datalog and negation under the well-founded semantics; support for RDFS Repository
Jena 2024 Java Apache 2.0 RDF importSemantic web framework that includes a Datalog implementation as part of its general purpose rule engine; compatibility with RDF Rule engine documentation
Mangle2024 Go Apache 2.0 Programming language for deductive database programming, supporting an extension of Datalog Homepage
Naga2021 Clojure Eclipse Public License 1.0 Asami graph database Query engine that executes Datalog queries over the graph database; runs in browsers (memory), on JVM (memory/files), or natively (memory/files). Homepage
Nemo2024 Rust Apache 2.0 or MIT RDF import, CSV importIn-memory rule engine for knowledge graph analysis and database transformations; compatible with RDF and SPARQL; supports tgds Homepage Online demo
pyDatalog2015 Python GNU LGPL DBMS connectors from PythonPython library for interpreting Datalog queries Homepage Repository
RDFox2024 C++ proprietary, free for some usesin-memory database, RDF import, CSV import, DBMS connectorsMain-memory based RDF triple store with Datalog reasoning; supports incremental evaluation and high availability setups Homepage
SociaLite2016 Java Apache 2.0 HDFS bindingsDatalog variant and engine for large-scale graph analysis Homepage (archived) Repository
Soufflé 2023 C++ UPL v1.0 CSV import, sqlite3 bindingsDatalog engine originally designed for applications static program analysis; rule sets are either compiled to C++ programs or interpreted Homepage
tclbdd2015 Tcl BSD Datalog implementation based on binary decision diagrams; designed to support development of an optimizing compiler for Tcl [35] Homepage
TerminusDB 2024 Prolog/Rust Apache 2.0 Graph database and document store, that also features a Datalog-based query language Homepage
XSB 2022 C GNU LGPL A logic programming and deductive database system based on Prolog with tabling giving Datalog-like termination and efficiency, including incremental evaluation [36] Homepage
XTDB (formerly Crux)2024 Clojure MPL 2.0 bindings for Apache Kafka and othersImmutable database with time-travel, Datalog used as query language in XTDB 1.x (may change in XTDB 2.x) Homepage Repository

Non-free software

Uses and influence

Datalog is quite limited in its expressivity. It is not Turing-complete, and doesn't include basic data types such as integers or strings. This parsimony is appealing from a theoretical standpoint, but it means Datalog per se is rarely used as a programming language or knowledge representation language. [41] Most Datalog engines implement substantial extensions of Datalog. However, Datalog has a strong influence on such implementations, and many authors don't bother to distinguish them from Datalog as presented in this article. Accordingly, the applications discussed in this section include applications of realistic implementations of Datalog-based languages.

Datalog has been applied to problems in data integration, information extraction, networking, security, cloud computing and machine learning. [42] [43] Google has developed an extension to Datalog for big data processing. [44]

Datalog has seen application in static program analysis. [45] The Soufflé dialect has been used to write pointer analyses for Java and a control-flow analysis for Scheme. [46] [47] Datalog has been integrated with SMT solvers to make it easier to write certain static analyses. [48] The Flix dialect is also suited to writing static program analyses. [49]

Some widely used database systems include ideas and algorithms developed for Datalog. For example, the SQL:1999 standard includes recursive queries, and the Magic Sets algorithm (initially developed for the faster evaluation of Datalog queries) is implemented in IBM's DB2. [50]

History

The origins of Datalog date back to the beginning of logic programming, but it became prominent as a separate area around 1977 when Hervé Gallaire and Jack Minker organized a workshop on logic and databases. [51] David Maier is credited with coining the term Datalog. [52]

See also

Notes

  1. 1 2 Ceri, Gottlob & Tanca 1989, p. 146.
  2. Eisner, Jason; Filardo, Nathaniel W. (2011). "Dyna: Extending Datalog for Modern AI". In de Moor, Oege; Gottlob, Georg; Furche, Tim; Sellers, Andrew (eds.). Datalog Reloaded. Lecture Notes in Computer Science. Vol. 6702. Berlin, Heidelberg: Springer. pp. 181–220. doi:10.1007/978-3-642-24206-9_11. ISBN   978-3-642-24206-9.
  3. Maier, David; Tekle, K. Tuncay; Kifer, Michael; Warren, David S. (2018-09-01), "Datalog: concepts, history, and outlook", Declarative Logic Programming: Theory, Systems, and Applications, vol. 20, Association for Computing Machinery and Morgan & Claypool, pp. 3–100, doi:10.1145/3191315.3191317, ISBN   978-1-970001-99-0, S2CID   69379310 , retrieved 2023-03-02
  4. Van Emden, M. H.; Kowalski, R. A. (1976-10-01). "The Semantics of Predicate Logic as a Programming Language". Journal of the ACM . 23 (4): 733–742. doi: 10.1145/321978.321991 . ISSN   0004-5411. S2CID   11048276.
  5. Ceri, Gottlob & Tanca 1989, p. 149.
  6. Ceri, Gottlob & Tanca 1989, p. 150.
  7. Ceri, Gottlob & Tanca 1989, p. 154.
  8. Alvarez-Picallo, Mario; Eyers-Taylor, Alex; Peyton Jones, Michael; Ong, C.-H. Luke (2019). "Fixing Incremental Computation: Derivatives of Fixpoints, and the Recursive Semantics of Datalog". In Caires, Luís (ed.). Programming Languages and Systems. Lecture Notes in Computer Science. Vol. 11423. Cham: Springer International Publishing. pp. 525–552. doi:10.1007/978-3-030-17184-1_19. ISBN   978-3-030-17184-1. S2CID   53430789.
  9. 1 2 Gilray, Thomas; Sahebolamri, Arash; Kumar, Sidharth; Micinski, Kristopher (2022-11-21). "Higher-Order, Data-Parallel Structured Deduction". arXiv: 2211.11573 [cs.PL].
  10. Subotić, Pavle; Jordan, Herbert; Chang, Lijun; Fekete, Alan; Scholz, Bernhard (2018-10-01). "Automatic index selection for large-scale datalog computation". Proceedings of the VLDB Endowment. 12 (2): 141–153. doi:10.14778/3282495.3282500. ISSN   2150-8097. S2CID   53569679.
  11. Antoniadis, Tony; Triantafyllou, Konstantinos; Smaragdakis, Yannis (2017-06-18). "Porting doop to Soufflé". Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis. SOAP 2017. New York, NY, USA: Association for Computing Machinery. pp. 25–30. doi:10.1145/3088515.3088522. ISBN   978-1-4503-5072-3. S2CID   3074689. "The LogicBlox engine performs full query optimization."
  12. Arch, Samuel; Hu, Xiaowen; Zhao, David; Subotić, Pavle; Scholz, Bernhard (2022). "Building a Join Optimizer for Soufflé". In Villanueva, Alicia (ed.). Logic-Based Program Synthesis and Transformation. Lecture Notes in Computer Science. Vol. 13474. Cham: Springer International Publishing. pp. 83–102. doi:10.1007/978-3-031-16767-6_5. ISBN   978-3-031-16767-6.
  13. Nappa, Patrick; Zhao, David; Subotic, Pavle; Scholz, Bernhard (2019). "Fast Parallel Equivalence Relations in a Datalog Compiler". 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). pp. 82–96. doi:10.1109/PACT.2019.00015. ISBN   978-1-7281-3613-4. S2CID   204827819 . Retrieved 2023-11-28.
  14. Jordan, Herbert; Subotić, Pavle; Zhao, David; Scholz, Bernhard (2019-02-17). "Brie: A Specialized Trie for Concurrent Datalog". Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores. New York, NY, USA: Association for Computing Machinery. pp. 31–40. doi:10.1145/3303084.3309490. ISBN   978-1-4503-6290-0. S2CID   239258588.
  15. Whaley, John; Avots, Dzintars; Carbin, Michael; Lam, Monica S. (2005). "Using Datalog with Binary Decision Diagrams for Program Analysis". In Yi, Kwangkeun (ed.). Programming Languages and Systems. Lecture Notes in Computer Science. Vol. 3780. Berlin, Heidelberg: Springer. pp. 97–118. doi:10.1007/11575467_8. ISBN   978-3-540-32247-4. S2CID   5223577.
  16. Hoder, Kryštof; Bjørner, Nikolaj; de Moura, Leonardo (2011). "μZ– an Efficient Engine for Fixed Points with Constraints". In Gopalakrishnan, Ganesh; Qadeer, Shaz (eds.). Computer Aided Verification. Lecture Notes in Computer Science. Vol. 6806. Berlin, Heidelberg: Springer. pp. 457–462. doi:10.1007/978-3-642-22110-1_36. ISBN   978-3-642-22110-1.
  17. Fan, Zhiwei; Zhu, Jianqiao; Zhang, Zuyu; Albarghouthi, Aws; Koutris, Paraschos; Patel, Jignesh (2018-12-10). "Scaling-Up In-Memory Datalog Processing: Observations and Techniques". arXiv: 1812.03975 [cs.DB].
  18. Shovon, Ahmedur Rahman; Dyken, Landon Richard; Green, Oded; Gilray, Thomas; Kumar, Sidharth (November 2022). "Accelerating Datalog applications with cuDF". 2022 IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms (IA3). IEEE. pp. 41–45. doi:10.1109/IA356718.2022.00012. ISBN   978-1-6654-7506-8. S2CID   256565728.
  19. Jordan, Herbert; Subotić, Pavle; Zhao, David; Scholz, Bernhard (2019-02-16). "A specialized B-tree for concurrent datalog evaluation". Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. PPoPP '19. New York, NY, USA: Association for Computing Machinery. pp. 327–339. doi:10.1145/3293883.3295719. ISBN   978-1-4503-6225-2. S2CID   59617209.
  20. Wu, Jiacheng; Wang, Jin; Zaniolo, Carlo (2022-06-11). "Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines". Proceedings of the 2022 International Conference on Management of Data. SIGMOD '22. New York, NY, USA: Association for Computing Machinery. pp. 1433–1446. doi:10.1145/3514221.3517853. ISBN   978-1-4503-9249-5. S2CID   249578825. "These approaches implement the idea of parallel bottom-up evaluation by splitting the tables into disjoint partitions via discriminating functions, such as hashing, where each partition is then mapped to one of the parallel workers. After each iteration, workers coordinate with each other to exchange newly generated tuples where necessary.
  21. Shaw, Marianne; Koutris, Paraschos; Howe, Bill; Suciu, Dan (2012). "Optimizing Large-Scale Semi-Naïve Datalog Evaluation in Hadoop". In Barceló, Pablo; Pichler, Reinhard (eds.). Datalog in Academia and Industry. Lecture Notes in Computer Science. Vol. 7494. Berlin, Heidelberg: Springer. pp. 165–176. doi:10.1007/978-3-642-32925-8_17. ISBN   978-3-642-32925-8.
  22. Shkapsky, Alexander; Yang, Mohan; Interlandi, Matteo; Chiu, Hsuan; Condie, Tyson; Zaniolo, Carlo (2016-06-14). "Big Data Analytics with Datalog Queries on Spark". Proceedings of the 2016 International Conference on Management of Data. SIGMOD '16. Vol. 2016. New York, NY, USA: Association for Computing Machinery. pp. 1135–1149. doi:10.1145/2882903.2915229. ISBN   978-1-4503-3531-7. PMC   5470845 . PMID   28626296.
  23. Balbin, I.; Port, G. S.; Ramamohanarao, K.; Meenakshi, K. (1991-10-01). "Efficient bottom-up computation of queries on stratified databases". The Journal of Logic Programming . 11 (3): 295–344. doi: 10.1016/0743-1066(91)90030-S . ISSN   0743-1066.
  24. Ullman, J. D. (1989-03-29). "Bottom-up beats top-down for datalog". Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '89. New York, NY, USA: Association for Computing Machinery. pp. 140–149. doi:10.1145/73721.73736. ISBN   978-0-89791-308-9. S2CID   13269547.
  25. Dantsin, Evgeny; Eiter, Thomas; Gottlob, Georg; Voronkov, Andrei (2001-09-01). "Complexity and expressive power of logic programming". ACM Computing Surveys . 33 (3): 374–425. doi:10.1145/502807.502810. ISSN   0360-0300.
  26. Bembenek, Aaron; Greenberg, Michael; Chong, Stephen (2023-01-11). "From SMT to ASP: Solver-Based Approaches to Solving Datalog Synthesis-as-Rule-Selection Problems". Proceedings of the ACM on Programming Languages. 7 (POPL): 7:185–7:217. doi: 10.1145/3571200 . S2CID   253525805.
  27. Zaniolo, Carlo; Yang, Mohan; Das, Ariyam; Shkapsky, Alexander; Condie, Tyson; Interlandi, Matteo (September 2017). "Fixpoint semantics and optimization of recursive Datalog programs with aggregates*". Theory and Practice of Logic Programming. 17 (5–6): 1048–1065. arXiv: 1707.05681 . doi:10.1017/S1471068417000436. ISSN   1471-0684. S2CID   6272867.
  28. "Chapter 7. Rules - LogicBlox 3.10 Reference Manual". developer.logicblox.com. Retrieved 2023-03-04.
  29. "6.4. Negation - LogicBlox 3.10 Reference Manual". developer.logicblox.com. Retrieved 2023-03-04. "Additionally, negation is only allowed when the platform can determine a way to stratify all rules and constraints that use negation."
  30. Michael Lam; Dr. Sin Min Lee. "Datalog". Course CS 157A. SAN JOSÉ STATE UNIVERSITY, department of Computer Science. Archived from the original on 2017-03-25.
  31. Kolaitis, Phokion G.; Vardi, Moshe Y. (1990-04-02). "On the expressive power of datalog: Tools and a case study". Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. ACM. pp. 61–71. doi:10.1145/298514.298542. ISBN   978-0-89791-352-2.{{cite book}}: |journal= ignored (help)
  32. Hillebrand, Gerd G; Kanellakis, Paris C; Mairson, Harry G; Vardi, Moshe Y (1995-11-01). "Undecidable boundedness problems for datalog programs". The Journal of Logic Programming. 25 (2): 163–190. doi: 10.1016/0743-1066(95)00051-K . ISSN   0743-1066.
  33. Saenz-Perez (2011), "DES: A Deductive Database System", Electronic Notes in Theoretical Computer Science, 271, ES: 63–78, doi: 10.1016/j.entcs.2011.02.011 .
  34. Differential Dataflow, July 2022
  35. Kenny, Kevin B (12–14 November 2014). Binary decision diagrams, relational algebra, and Datalog: deductive reasoning for Tcl (PDF). Twenty-first Annual Tcl/Tk Conference. Portland, Oregon. Retrieved 29 December 2015.
  36. The XSB System, Version 3.7.x, Volume 1: Programmer's Manual (PDF).
  37. FoundationDB Datalog Tutorial, archived from the original on 2013-08-09.
  38. "Leapsight". Archived from the original on 2018-11-11.
  39. Semmle QL, 18 September 2019.
  40. "SecPAL". Microsoft Research. Archived from the original on 2007-02-23.
  41. Lifschitz, Vladimir. "Foundations of logic programming." Principles of knowledge representation 3 (1996): 69-127. "The expressive possibilities of [Datalog] are much too limited for meaningful applications to knowledge representation."
  42. Huang, Green, and Loo, "Datalog and Emerging applications", SIGMOD 2011 (PDF), UC Davis{{citation}}: CS1 maint: multiple names: authors list (link).
  43. Mei, Hongyuan; Qin, Guanghui; Xu, Minjie; Eisner, Jason (2020). "Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification". Proceedings of ICML 2020. arXiv: 2006.16723 .
  44. Chin, Brian; Dincklage, Daniel von; Ercegovac, Vuk; Hawkins, Peter; Miller, Mark S.; Och, Franz; Olston, Christopher; Pereira, Fernando (2015). Ball, Thomas; Bodik, Rastislav; Krishnamurthi, Shriram; Lerner, Benjamin S.; Morrisett, Greg (eds.). Yedalog: Exploring Knowledge at Scale. 1st Summit on Advances in Programming Languages (SNAPL 2015). Leibniz International Proceedings in Informatics (LIPIcs). Vol. 32. Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. pp. 63–78. doi: 10.4230/LIPIcs.SNAPL.2015.63 . ISBN   978-3-939897-80-4.
  45. Whaley, John; Avots, Dzintars; Carbin, Michael; Lam, Monica S. (2005). "Using Datalog with Binary Decision Diagrams for Program Analysis". In Yi, Kwangkeun (ed.). Programming Languages and Systems. Lecture Notes in Computer Science. Vol. 3780. Berlin, Heidelberg: Springer. pp. 97–118. doi:10.1007/11575467_8. ISBN   978-3-540-32247-4. S2CID   5223577.
  46. Scholz, Bernhard; Jordan, Herbert; Subotić, Pavle; Westmann, Till (2016-03-17). "On fast large-scale program analysis in Datalog". Proceedings of the 25th International Conference on Compiler Construction. CC 2016. New York, NY, USA: Association for Computing Machinery. pp. 196–206. doi:10.1145/2892208.2892226. ISBN   978-1-4503-4241-4. S2CID   7531543.
  47. Antoniadis, Tony; Triantafyllou, Konstantinos; Smaragdakis, Yannis (2017-06-18). "Porting doop to Soufflé". Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis. SOAP 2017. New York, NY, USA: Association for Computing Machinery. pp. 25–30. doi:10.1145/3088515.3088522. ISBN   978-1-4503-5072-3. S2CID   3074689.
  48. Bembenek, Aaron; Greenberg, Michael; Chong, Stephen (2020-11-13). "Formulog: Datalog for SMT-based static analysis". Proceedings of the ACM on Programming Languages. 4 (OOPSLA): 141:1–141:31. doi: 10.1145/3428209 . S2CID   226961727.
  49. Madsen, Magnus; Yee, Ming-Ho; Lhoták, Ondřej (2016-06-02). "From Datalog to flix: a declarative language for fixed points on lattices". ACM SIGPLAN Notices . 51 (6): 194–208. doi:10.1145/2980983.2908096. ISSN   0362-1340.
  50. Gryz; Guo; Liu; Zuzarte (2004). "Query sampling in DB2 Universal Database" (PDF). Proceedings of the 2004 ACM SIGMOD international conference on Management of data - SIGMOD '04. p. 839. doi:10.1145/1007568.1007664. ISBN   978-1581138597. S2CID   7775190.
  51. Gallaire, Hervé; Minker, John 'Jack', eds. (1978), "Logic and Data Bases, Symposium on Logic and Data Bases, Centre d'études et de recherches de Toulouse, 1977", Advances in Data Base Theory , New York: Plenum Press, ISBN   978-0-306-40060-5 .
  52. Abiteboul, Serge; Hull, Richard; Vianu, Victor (1995), Foundations of databases, Addison-Wesley, p. 305, ISBN   9780201537710 .

Related Research Articles

Logic programming is a programming, database and knowledge representation paradigm based on formal logic. A logic program is a set of sentences in logical form, representing knowledge about some problem domain. Computation is performed by applying logical reasoning to that knowledge, to solve problems in the domain. Major logic programming language families include Prolog, Answer Set Programming (ASP) and Datalog. In all of these languages, rules are written in the form of clauses:

Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving and computational linguistics.

In computer science, declarative programming is a programming paradigm—a style of building the structure and elements of computer programs—that expresses the logic of a computation without describing its control flow.

A deductive database is a database system that can make deductions based on rules and facts stored in its database. Datalog is the language typically used to specify facts, rules and queries in deductive databases. Deductive databases have grown out of the desire to combine logic programming with relational databases to construct systems that support a powerful formalism and are still fast and able to deal with very large datasets. Deductive databases are more expressive than relational databases but less expressive than logic programming systems such as Prolog. In recent years, deductive databases have found new application in data integration, information extraction, networking, program analysis, security, and cloud computing.

In computer science, the well-founded semantics is a three-valued semantics for logic programming, which gives a precise meaning to general logic programs.

XSB is the name of a dialect of the Prolog programming language and its implementation developed at Stony Brook University in collaboration with the Katholieke Universiteit Leuven, the New University of Lisbon, Uppsala University and software vendor XSB, Inc.

The Semantic Web Rule Language (SWRL) is a proposed language for the Semantic Web that can be used to express rules as well as logic, combining OWL DL or OWL Lite with a subset of the Rule Markup Language.

In database theory, a conjunctive query is a restricted form of first-order queries using the logical conjunction operator. Many first-order queries can be written as conjunctive queries. In particular, a large part of queries issued on relational databases can be expressed in this way. Conjunctive queries also have a number of desirable theoretical properties that larger classes of queries do not share.

<span class="mw-page-title-main">Georg Gottlob</span> Austrian computer scientist

Georg Gottlob FRS is an Austrian-Italian computer scientist who works in the areas of database theory, logic, and artificial intelligence and is Professor of Informatics at the University of Calabria. He was Professor at the University of Oxford.

In computer science, a rule-based system is a computer system in which domain-specific knowledge is represented in the form of rules and general-purpose reasoning is used to solve problems in the domain.

<span class="mw-page-title-main">Incremental computing</span> Software feature

Incremental computing, also known as incremental computation, is a software feature which, whenever a piece of data changes, attempts to save time by only recomputing those outputs which depend on the changed data. When incremental computing is successful, it can be significantly faster than computing new outputs naively. For example, a spreadsheet software package might use incremental computation in its recalculation features, to update only those cells containing formulas which depend on the changed cells.

The syntax and semantics of Prolog, a programming language, are the sets of rules that define how a Prolog program is written and how it is interpreted, respectively. The rules are laid out in ISO standard ISO/IEC 13211 although there are differences in the Prolog implementations.

Flix is a functional, imperative, and logic programming language developed at Aarhus University, with funding from the Independent Research Fund Denmark, and by a community of open source contributors. The Flix language supports algebraic data types, pattern matching, parametric polymorphism, currying, higher-order functions, extensible records, channel and process-based concurrency, and tail call elimination. Two notable features of Flix are its type and effect system and its support for first-class Datalog constraints.

Vadalog is a system for performing complex logic reasoning tasks over knowledge graphs. Its language is based on an extension of the rule-based language Datalog, Warded Datalog±.

Soufflé is an open source parallel logic programming language, influenced by Datalog. Soufflé includes both an interpreter and a compiler that targets parallel C++. Soufflé has been used to build static analyzers, disassemblers, and tools for binary reverse engineering. Soufflé is considered by academic researchers to be high-performance and "state of the art," and is often used in benchmarks in academic papers.

Disjunctive Datalog is an extension of the logic programming language Datalog that allows disjunctions in the heads of rules. This extension enables disjunctive Datalog to express several NP-hard problems that are not known to be expressable in plain Datalog. Disjunctive Datalog has been applied in the context of reasoning about ontologies in the semantic web. DLV is an implementation of disjunctive Datalog.

Logic programming is a programming paradigm that includes languages based on formal logic, including Datalog and Prolog. This article describes the syntax and semantics of the purely declarative subset of these languages. Confusingly, the name "logic programming" also refers to a specific programming language that roughly corresponds to the declarative subset of Prolog. Unfortunately, the term must be used in both senses in this article.

The LogicBlox system is a commercial, declarative, incremental logic programming language and deductive database inspired by Datalog. The LogiQL programming language extends Datalog with several features, including stratified negation, aggregation, and a module system. LogicBlox has been used to build pointer analyses for Java.

DatalogZ is an extension of Datalog with integer arithmetic and comparisons. The decision problem of whether or not a given ground atom (fact) is entailed by a DatalogZ program is RE-complete, which can be shown by a reduction to diophantine equations.

Probabilistic logic programming is a programming paradigm that combines logic programming with probabilities.

References