Graphoid

A graphoid is a set of statements of the form, "X is irrelevant to Y given that we know Z" where X, Y and Z are sets of variables. The notions of "irrelevance" and "given that we know" may take on different interpretations, including probabilistic, relational and correlational ones, depending on the application. These interpretations share common properties that can be captured by paths in graphs (hence the name "graphoid"). The theory of graphoids characterizes these properties in a finite set of axioms that are common to informational irrelevance and its graphical representations.

History

Judea Pearl and Azaria Paz [1] coined the term "graphoids" after discovering that a set of axioms that govern conditional independence in probability theory is shared by undirected graphs. Variables are represented as nodes in a graph in such a way that variable sets X and Y are independent conditioned on Z in the distribution whenever node set Z separates X from Y in the graph. Axioms for conditional independence in probability were derived earlier by A. Philip Dawid [2] and Wolfgang Spohn. [3] The correspondence between dependence and graphs was later extended to directed acyclic graphs (DAGs) [4] [5] [6] and to other models of dependency. [1] [7]

Definition

A dependency model M is a subset of triplets (X,Z,Y) for which the predicate I(X,Z,Y), read "X is independent of Y given Z", is true. A graphoid is defined as a dependency model that is closed under the following five axioms:

  1. Symmetry: $I(X,Z,Y) \Leftrightarrow I(Y,Z,X)$
  2. Decomposition: $I(X,Z,Y \cup W) \Rightarrow I(X,Z,Y) \wedge I(X,Z,W)$
  3. Weak Union: $I(X,Z,Y \cup W) \Rightarrow I(X,Z \cup W,Y)$
  4. Contraction: $I(X,Z,Y) \wedge I(X,Z \cup Y,W) \Rightarrow I(X,Z,Y \cup W)$
  5. Intersection: $I(X,Z \cup W,Y) \wedge I(X,Z \cup Y,W) \Rightarrow I(X,Z,Y \cup W)$

A semi-graphoid is a dependency model closed under 1–4. These five axioms together are known as the graphoid axioms. [8] Intuitively, the weak union and contraction properties mean that irrelevant information should not alter the relevance status of other propositions in the system; what was relevant remains relevant and what was irrelevant remains irrelevant. [8]
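For a small finite dependency model, closure under these axioms can be checked mechanically. The following Python sketch is illustrative only; the encoding of statements as triples of frozensets is this example's own convention, not something fixed by the literature:

```python
from itertools import combinations

def subsets(s):
    """All subsets of s as frozensets, including the empty set."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def is_semi_graphoid(model):
    """Check that a finite dependency model, a set of triples
    (X, Z, Y) of frozensets, is closed under axioms 1-4."""
    stmts = set(model)
    for (X, Z, Y) in model:
        # 1. Symmetry: I(X,Z,Y) implies I(Y,Z,X).
        if (Y, Z, X) not in stmts:
            return False
        for W in subsets(Y):
            R = Y - W  # split Y into W and R
            if W and R and (X, Z, R) not in stmts:      # 2. Decomposition
                return False
            if W and R and (X, Z | W, R) not in stmts:  # 3. Weak union
                return False
    # 4. Contraction: I(X,Z,Y) and I(X, Z∪Y, W) imply I(X, Z, Y∪W).
    for (X, Z, Y) in model:
        for (X2, ZY, W) in model:
            if X2 == X and ZY == Z | Y and (X, Z, Y | W) not in stmts:
                return False
    return True
```

Checking closure under axiom 5 in addition would distinguish full graphoids from semi-graphoids in the same style.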

Types of graphoids

Probabilistic graphoids

Conditional independence, defined as

$I(X,Z,Y) \iff P(X \mid Z,Y) = P(X \mid Z)$ whenever $P(Y,Z) > 0$,

is a semi-graphoid which becomes a full graphoid when P is strictly positive. [1] [7]
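As a concrete illustration (a minimal sketch, not drawn from the cited sources; the tabular encoding of the joint distribution is this example's own convention), the definition can be tested on a finite distribution by checking the cross-multiplied form P(x,y,z)P(z) = P(x,z)P(y,z) on every cell:

```python
from collections import defaultdict

def ci_holds(joint, names, X, Z, Y, tol=1e-12):
    """Test I(X,Z,Y) in a finite joint distribution by checking
    P(x,y,z) * P(z) == P(x,z) * P(y,z) on every cell.
    `joint` maps value tuples (ordered as `names`) to probabilities."""
    idx = {n: i for i, n in enumerate(names)}
    proj = lambda cell, vs: tuple(cell[idx[v]] for v in vs)
    pXZ, pYZ, pZ, pXYZ = (defaultdict(float) for _ in range(4))
    for cell, p in joint.items():
        pXZ[proj(cell, X + Z)] += p
        pYZ[proj(cell, Y + Z)] += p
        pZ[proj(cell, Z)] += p
        pXYZ[proj(cell, X + Y + Z)] += p
    return all(abs(pXYZ[proj(c, X + Y + Z)] * pZ[proj(c, Z)]
                   - pXZ[proj(c, X + Z)] * pYZ[proj(c, Y + Z)]) <= tol
               for c in joint)

# Two fair independent coins: I(X, {}, Y) holds.
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
assert ci_holds(joint, ["X", "Y"], ["X"], [], ["Y"])
```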

Correlational graphoids

A dependency model is a correlational graphoid if in some probability function we have

$I(X,Z,Y) \iff \rho_{xy \cdot Z} = 0$ for every $x \in X$ and $y \in Y$,

where $\rho_{xy \cdot Z}$ is the partial correlation between x and y given the set Z.

In other words, the linear estimation error of the variables in X using measurements on Z would not be reduced by adding measurements of the variables in Y, thus making Y irrelevant to the estimation of X. Correlational and probabilistic dependency models coincide for normal distributions. [1] [7]
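A common way to obtain the partial correlation in practice is through the precision matrix: inverting the covariance matrix of {x, y} ∪ Z gives $\rho_{xy \cdot Z} = -K_{xy} / \sqrt{K_{xx} K_{yy}}$. A short sketch, assuming numpy is available and the covariance is given as an array (this computation is standard but not part of the cited sources):

```python
import numpy as np

def partial_corr(cov, names, x, y, Z):
    """Partial correlation rho_{xy.Z} from a covariance matrix:
    invert the covariance of {x, y} and Z, then normalize the
    off-diagonal entry of the resulting precision matrix K."""
    sel = [names.index(v) for v in [x, y] + list(Z)]
    K = np.linalg.inv(cov[np.ix_(sel, sel)])
    return -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
```

For a multivariate normal distribution, I(X,Z,Y) holds exactly when every such partial correlation vanishes, which is why the correlational and probabilistic models coincide there.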

Relational graphoids

A dependency model is a relational graphoid if it satisfies

$I(X,Z,Y) \iff \big( P(x,z) > 0 \wedge P(y,z) > 0 \big) \Rightarrow P(x,y,z) > 0$

In words, the range of values permitted for X is not restricted by the choice of Y, once Z is fixed. Independence statements belonging to this model are similar to embedded multi-valued dependencies (EMVDs) in databases. [1] [7]
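For a finite relation the condition says that, once restricted to a fixed z, the tuples form a Cartesian product in their X- and Y-coordinates. A small sketch (illustrative only; the row-tuple encoding of the relation is an assumption of this example):

```python
def relational_ci(rows, names, X, Z, Y):
    """I(X,Z,Y) in a relation: whenever (x,z) occurs in some row and
    (y,z) in some row, (x,y,z) must occur together in some row."""
    idx = {n: i for i, n in enumerate(names)}
    proj = lambda row, vs: tuple(row[idx[v]] for v in vs)
    x_by_z, y_by_z, xyz = {}, {}, set()
    for r in rows:
        z = proj(r, Z)
        x_by_z.setdefault(z, set()).add(proj(r, X))
        y_by_z.setdefault(z, set()).add(proj(r, Y))
        xyz.add((proj(r, X), proj(r, Y), z))
    return all((x, y, z) in xyz
               for z in x_by_z
               for x in x_by_z[z]
               for y in y_by_z[z])
```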

Graph-induced graphoids

If there exists an undirected graph G such that

$I(X,Z,Y) \iff \langle X,Z,Y \rangle_G,$

where $\langle X,Z,Y \rangle_G$ means that the node set Z intercepts all paths between the node sets X and Y in G, then the graphoid is called graph-induced. In other words, there exists an undirected graph G such that every independence statement in M is reflected as a vertex separation in G and vice versa. A necessary and sufficient condition for a dependency model to be a graph-induced graphoid is that it satisfies the following axioms: symmetry, decomposition, intersection, strong union and transitivity.

Strong union states that

$I(X,Z,Y) \Rightarrow I(X, Z \cup W, Y)$

Transitivity states that, for any single variable $\gamma$ not in $X \cup Y \cup Z$,

$I(X,Z,Y) \Rightarrow I(X,Z,\gamma) \vee I(\gamma,Z,Y)$

The axioms symmetry, decomposition, intersection, strong union and transitivity constitute a complete characterization of undirected graphs. [9]
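Vertex separation itself is a plain reachability question: Z separates X from Y exactly when no path from X to Y survives once the nodes in Z are blocked. A minimal breadth-first sketch (illustrative; it assumes X, Y and Z are disjoint and the graph is given as an adjacency map):

```python
from collections import deque

def separates(adj, X, Z, Y):
    """True iff Z separates X from Y in the undirected graph `adj`
    (dict: node -> set of neighbours): no X-Y path avoids Z."""
    blocked = set(Z)
    seen = set(X) - blocked
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        if u in Y:
            return False
        for v in adj[u]:
            if v not in blocked and v not in seen:
                seen.add(v)
                queue.append(v)
    return True
```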

DAG-induced graphoids

A graphoid is termed DAG-induced if there exists a directed acyclic graph D such that $I(X,Z,Y) \iff \langle X,Z,Y \rangle_D$, where $\langle X,Z,Y \rangle_D$ stands for d-separation in D. d-separation (the "d" connotes "directional") extends the notion of vertex separation from undirected graphs to directed acyclic graphs. It permits the reading of conditional independencies from the structure of Bayesian networks. However, conditional independencies in a DAG cannot be completely characterized by a finite set of axioms. [10]
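One standard way to decide d-separation reduces it to ordinary vertex separation: restrict the DAG to the ancestors of X ∪ Y ∪ Z, moralize (connect co-parents of a common child and drop edge directions), then test separation by Z in the resulting undirected graph. A sketch reusing `separates` from the previous section (illustrative; this is one known algorithm, with Bayes-ball being a common alternative):

```python
def d_separated(parents, X, Z, Y):
    """d-separation via the ancestral moral graph. `parents` maps
    each node of the DAG to its set of parents."""
    # 1. Ancestors of X ∪ Y ∪ Z (including the nodes themselves).
    anc, stack = set(), list(set(X) | set(Y) | set(Z))
    while stack:
        u = stack.pop()
        if u not in anc:
            anc.add(u)
            stack.extend(parents[u])
    # 2. Moralize the induced subgraph into an undirected adjacency map.
    adj = {u: set() for u in anc}
    for u in anc:
        for p in parents[u]:
            adj[u].add(p); adj[p].add(u)
        for p in parents[u]:
            for q in parents[u]:
                if p != q:
                    adj[p].add(q); adj[q].add(p)
    # 3. Ordinary vertex separation in the moral graph.
    return separates(adj, X, Z, Y)
```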

Inclusion and construction

Graph-induced and DAG-induced graphoids are both contained in probabilistic graphoids. [11] This means that for every graph G there exists a probability distribution P such that every conditional independence in P is represented in G, and vice versa. The same is true for DAGs. However, there are probabilistic distributions that are not graphoids and, moreover, there is no finite axiomatization for probabilistic conditional dependencies. [12]

Thomas Verma showed that for every semi-graphoid there is a recursive way of constructing a DAG in which every d-separation is valid. [13] The construction is similar to that used in Bayesian networks and goes as follows:

  1. Arrange the variables in some arbitrary order 1, 2, ..., i, ..., N and, starting with i = 1,
  2. choose for each node i a set of nodes PAi such that i is independent of all its predecessors 1, 2, ..., i − 1, conditioned on PAi.
  3. Draw arrows from PAi to i and continue.

The DAG created by this construction will represent all the conditional independencies that follow from those used in the construction. Furthermore, every d-separation shown in the DAG will be a valid conditional independence in the graphoid used in the construction.
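A sketch of this construction as code (illustrative only: the independence oracle and the greedy minimization of PAi are this example's own choices, since the construction only requires some PAi that renders i independent of its remaining predecessors):

```python
def verma_dag(order, independent):
    """Recursive DAG construction from a semi-graphoid.
    `independent(X, Z, Y)` is an oracle answering I(X, Z, Y),
    with X, Z, Y given as frozensets of variable names.
    Returns a dict mapping each node i to its parent set PA_i."""
    parents = {}
    for k, i in enumerate(order):
        preds = frozenset(order[:k])
        # Start with all predecessors as parents and greedily drop
        # any v while i stays independent of the removed predecessors
        # given the remaining candidate parents.
        pa = set(preds)
        for v in order[:k]:
            smaller = frozenset(pa - {v})
            if independent(frozenset([i]), smaller, preds - smaller):
                pa = set(smaller)
        parents[i] = frozenset(pa)  # draw arrows PA_i -> i
    return parents
```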


References

  1. Pearl, Judea; Paz, Azaria (1985). "Graphoids: A Graph-Based Logic for Reasoning About Relevance Relations" (PDF).
  2. Dawid, A. Philip (1979). "Conditional independence in statistical theory". Journal of the Royal Statistical Society, Series B: 1–31.
  3. Spohn, Wolfgang (1980). "Stochastic independence, causal independence, and shieldability". Journal of Philosophical Logic. 9: 73–99. doi:10.1007/bf00258078.
  4. Pearl, Judea (1986). "Fusion, propagation and structuring in belief networks". Artificial Intelligence. 29 (3): 241–288. doi:10.1016/0004-3702(86)90072-x.
  5. Verma, Thomas; Pearl, Judea (1988). "Causal networks: Semantics and expressiveness". Proceedings of the 4th Workshop on Uncertainty in Artificial Intelligence: 352–359.
  6. Lauritzen, S. L. (1996). Graphical Models. Oxford: Clarendon Press.
  7. Geiger, Dan (1990). Graphoids: A Qualitative Framework for Probabilistic Inference (PhD dissertation, Technical Report R-142). Computer Science Department, University of California, Los Angeles.
  8. Pearl, Judea (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
  9. Paz, A.; Pearl, J.; Ur, S. (1996). "A New Characterization of Graphs Based on Interception Relations". Journal of Graph Theory. 22 (2): 125–136.
  10. Geiger, D. (1987). "The non-axiomatizability of dependencies in directed acyclic graphs" (PDF). UCLA Computer Science Tech Report R-83.
  11. Geiger, D.; Pearl, J. (1993). "Logical and algorithmic properties of conditional independence and graphical models". The Annals of Statistics. 21 (4): 2001–2021. doi:10.1214/aos/1176349407.
  12. Studeny, M. (1992). "Conditional independence relations have no finite complete characterization". In Kubik, S.; Visek, J. A. (eds.). Information Theory, Statistical Decision Functions and Random Processes: Transactions of the 11th Prague Conference. Dordrecht: Kluwer. B: 377–396.
  13. Verma, T.; Pearl, J. (1990). "Causal Networks: Semantics and Expressiveness". In Shachter, R.; Levitt, T. S.; Kanal, L. N. (eds.). Uncertainty in AI 4. Elsevier Science Publishers: 69–76.