ProbLog

Last updated
ProbLog
Original author(s) DTAI research lab (KU Leuven)
Initial releaseNovember 11, 2007 (2007-11-11)
Stable release
2.1
Written in Python
Operating system Linux, Mac OS X, Microsoft Windows
Type Probabilistic logic
License Apache License, Version 2.0
Website dtai.cs.kuleuven.be/problog/

ProbLog is a probabilistic logic programming language that extends Prolog with probabilities. [1] [2] [3] It minimally extends Prolog by adding the notion of a probabilistic fact, which combines the idea of logical atoms and random variables. Similarly to Prolog, ProbLog can query an atom. While Prolog returns the truth value of the queried atom, ProbLog returns the probability of it being true.

Contents

Semantics

A probabilistic fact is a pair with an atom and the probability of being true. A rule is defined by an atom , called the head, and a finite set of literals , called the body.

ProbLog programs consist of a set of probabilistic facts and a set of rules . Using the distribution semantics, a probability distribution is defined over the two-valued well-founded models of the atoms in the program. The probability of a model is defined as

where the product runs over all the literals in the model . For a query atom the distribution semantics defines a probability for the query

in which the sum runs over all the models where is true.

ProbLog supports multiple tasks:

Example

ProbLog can for example be used to calculate the probability of getting wet given the probabilities for rain and the probabilities that someone brings an umbrella as follows:

0.4::rain(weekday).0.9::rain(weekend).0.8::umbrella_if_rainy(Day).0.2::umbrella_if_dry(Day).umbrella(Day):-rain(Day),umbrella_if_rainy(Day).umbrella(Day):-\+rain(Day),umbrella_if_dry(Day).wet(Day):-rain(Day),\+umbrella(Day).query(\+wet(weekend)).

The last rule before the query states that someone gets wet if it rains and no umbrella was brought. When ProbLog is asked to solve the "probabilistic inference" task, the query asks for the probability to stay dry on a weekend day. When solving the "most probable explanation" task, ProbLog will return the most likely reason for staying dry, i.e. because it is not raining or because the person has an umbrella.

Implementations

The ProbLog language has been implemented as a YAP Prolog library (ProbLog 1). [4] and as a stand-alone Python framework (ProbLog 2) [5] The source code of ProbLog 2 is licensed under Apache License, Version 2.0 and available on GitHub. [6] The ProbLog language has also been implemented as part of the cplint probabilistic logic programming package for SWI-Prolog, YAP and XSB. [7]

ProbLog variants

ProbLog has been extended or used as inspiration for several different variants, including:

Further reading

Related Research Articles

Prolog is a logic programming language that has its origins in artificial intelligence and computational linguistics.

Inductive logic programming (ILP) is a subfield of symbolic artificial intelligence which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. The term "inductive" here refers to philosophical rather than mathematical induction. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

In theoretical computer science, a probabilistic Turing machine is a non-deterministic Turing machine that chooses between the available transitions at each point according to some probability distribution. As a consequence, a probabilistic Turing machine can—unlike a deterministic Turing Machine—have stochastic results; that is, on a given input and instruction state machine, it may have different run times, or it may not halt at all; furthermore, it may accept an input in one execution and reject the same input in another execution.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

Grammar theory to model symbol strings originated from work in computational linguistics aiming to understand the structure of natural languages. Probabilistic context free grammars (PCFGs) have been applied in probabilistic modeling of RNA structures almost 40 years after they were introduced in computational linguistics.

Inferences are steps in reasoning, moving from premises to logical consequences; etymologically, the word infer means to "carry forward". Inference is theoretically traditionally divided into deduction and induction, a distinction that in Europe dates at least to Aristotle. Deduction is inference deriving logical conclusions from premises known or assumed to be true, with the laws of valid inference being studied in logic. Induction is inference from particular evidence to a universal conclusion. A third type of inference is sometimes distinguished, notably by Charles Sanders Peirce, contradistinguishing abduction from induction.

In the field of information retrieval, divergence from randomness, one of the first models, is one type of probabilistic model. It is basically used to test the amount of information carried in the documents. It is based on Harter's 2-Poisson indexing-model. The 2-Poisson model has a hypothesis that the level of the documents is related to a set of documents which contains words occur relatively greater than the rest of the documents. It is not a 'model', but a framework for weighting terms using probabilistic methods, and it has a special relationship for term weighting based on notion of eliteness.

Answer set programming (ASP) is a form of declarative programming oriented towards difficult search problems. It is based on the stable model semantics of logic programming. In ASP, search problems are reduced to computing stable models, and answer set solvers—programs for generating stable models—are used to perform search. The computational process employed in the design of many answer set solvers is an enhancement of the DPLL algorithm and, in principle, it always terminates.

<span class="mw-page-title-main">Kaplan–Meier estimator</span> Non-parametric statistic used to estimate the survival function

The Kaplan–Meier estimator, also known as the product limit estimator, is a non-parametric statistic used to estimate the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients living for a certain amount of time after treatment. In other fields, Kaplan–Meier estimators may be used to measure the length of time people remain unemployed after a job loss, the time-to-failure of machine parts, or how long fleshy fruits remain on plants before they are removed by frugivores. The estimator is named after Edward L. Kaplan and Paul Meier, who each submitted similar manuscripts to the Journal of the American Statistical Association. The journal editor, John Tukey, convinced them to combine their work into one paper, which has been cited more than 61,800 times since its publication in 1958.

A Markov logic network (MLN) is a probabilistic logic which applies the ideas of a Markov network to first-order logic, defining probability distributions on possible worlds on any given domain.

<span class="mw-page-title-main">Scoring rule</span> Measure for evaluating probabilistic forecasts

In decision theory, a scoring rule provides a summary measure for the evaluation of probabilistic predictions or forecasts. It is applicable to tasks in which predictions assign probabilities to events, i.e. one issues a probability distribution as prediction. This includes probabilistic classification of a set of mutually exclusive outcomes or classes.

The concept of a stable model, or answer set, is used to define a declarative semantics for logic programs with negation as failure. This is one of several standard approaches to the meaning of negation in logic programming, along with program completion and the well-founded semantics. The stable model semantics is the basis of answer set programming.

The Binary Independence Model (BIM) in computing and information science is a probabilistic information retrieval technique. The model makes some simple assumptions to make the estimation of document/query similarity probable and feasible.

The query likelihood model is a language model used in information retrieval. A language model is constructed for each document in the collection. It is then possible to rank each document by the probability of specific documents given a query. This is interpreted as being the likelihood of a document being relevant given a query.

Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically. It represents an attempt to unify probabilistic modeling and traditional general purpose programming in order to make the former easier and more widely applicable. It can be used to create systems that help make decisions in the face of uncertainty.

<span class="mw-page-title-main">Bayesian programming</span> Statistics concept

Bayesian programming is a formalism and a methodology for having a technique to specify probabilistic models and solve problems when less than the necessary information is available.

<span class="mw-page-title-main">ProbOnto</span> Knowledge base and ontology of probability distributions

ProbOnto is a knowledge base and ontology of probability distributions. ProbOnto 2.5 contains over 150 uni- and multivariate distributions and alternative parameterizations, more than 220 relationships and re-parameterization formulas, supporting also the encoding of empirical and univariate mixture distributions.

Logic programming is a programming paradigm that includes languages based on formal logic, including Datalog and Prolog. This article describes the syntax and semantics of the purely declarative subset of these languages. Confusingly, the name "logic programming" also refers to a specific programming language that roughly corresponds to the declarative subset of Prolog. Unfortunately, the term must be used in both senses in this article.

Probabilistic logic programming is a programming paradigm that combines logic programming with probabilities.


In artificial intelligence, a sentential decision diagram (SDD) is a type of knowledge representation used in knowledge compilation to represent Boolean functions. SDDs can be viewed as a generalization of the influential ordered binary decision diagram (OBDD) representation, by allowing decisions on multiple variables at once. Like OBDDs, SDDs allow for tractable Boolean operations, while being exponentially more succinct. For this reason, they have become an important representation in knowledge compilation.

References

  1. De Raedt, Luc; Kimmig, Angelika; Toivonen, Hannu (November 2007). ProbLog: A Probabilistic Prolog and Its Application in Link Discovery. IJCAI. Vol. 7.
  2. Fierens, D; Van den Broeck, G.; Bruynooghe, M.; De Raedt, L. (2012). Constraints for probabilistic logic programming. Proceedings of the NIPS Probabilistic Programming Workshop. pp. 1–4.
  3. De Raedt, Luc; Kimmig, Angelika (2015). "Probabilistic (logic) programming concepts". Machine Learning. 100 (1): 5–47. doi: 10.1007/s10994-015-5494-z . S2CID   3166992.
  4. "ProbLog1". dtai.cs.kuleuven.be.
  5. 1 2 "ProbLog: Probabilistic Programming". dtai.cs.kuleuven.be.
  6. 1 2 "ProbLog GitHub repository". github.com. 12 October 2022.
  7. "cplint – AI@UNIFE" . Retrieved 2023-11-13.
  8. Manhaeve, Robin; Dumancic, Sebastijan; Kimmig, Angelika; Demeester, Thomas; De Raedt, Luc (2018). DeepProbLog: Neural Probabilistic Logic Programming. NeurIPS 2018, Thirty-second Conference on Neural Information Processing Systems. pp. 3753–3760.
  9. Van den Broeck, Guy; Thon, Ingo; Van Otterlo, Martijn; De Raedt, Luc (2010). "DTProbLog: A decision-theoretic probabilistic Prolog". Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 24.
  10. Kimmig, A.; Van den Broeck, G.; De Raedt, L. (2011). An algebraic Prolog for reasoning about possible worlds. Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence. pp. 209–214.
  11. "PRISM: PRogramming In Statistical Modeling". rjida.meijo-u.ac.jp.
  12. Poole, David (2008). "The independent choice logic and beyond". In Luc Raedt; Paolo Frasconi; Kristian Kersting; Stephen Muggleton (eds.). Probabilistic Inductive Logic Programming. Lecture Notes in Computer Science. Vol. 4911. Springer. pp. 222–243. doi:10.1007/978-3-540-78652-8_8. ISBN   978-3-540-78651-1.
  13. Vennekens, Joost; Denecker, Marc; Bruynooghe, Maurice (2009). CP-logic: A language of causal probabilistic events and its relation to logic programming. Theory and practice of logic programming. Vol. 9. pp. 245–308. arXiv: 0904.1672 .
  14. "PITA: Probabilistic Inference with Tabling and Answer subsumption". ml.unife.it.
  15. "Distributional Clauses". dtai.cs.kuleuven.be.
  16. "ProbLog: ProbLog 2.1 documentation". problog.readthedocs.io.