Causal Markov condition

Last updated

The Markov condition, sometimes called the Markov assumption, is an assumption made in Bayesian probability theory, that every node in a Bayesian network is conditionally independent of its nondescendants, given its parents. Stated loosely, it is assumed that a node has no bearing on nodes which do not descend from it. In a DAG, this local Markov condition is equivalent to the global Markov condition, which states that d-separations in the graph also correspond to conditional independence relations. [1] [2] This also means that a node is conditionally independent of the entire network, given its Markov blanket.

Contents

The related Causal Markov (CM) condition states that, conditional on the set of all its direct causes, a node is independent of all variables which are not effects or direct causes of that node. [3] In the event that the structure of a Bayesian network accurately depicts causality, the two conditions are equivalent. However, a network may accurately embody the Markov condition without depicting causality, in which case it should not be assumed to embody the causal Markov condition.

Definition

Let G be an acyclic causal graph (a graph in which each node appears only once along any path) with vertex set V and let P be a probability distribution over the vertices in V generated by G. G and P satisfy the Causal Markov Condition if every node X in V is independent of given [4]

Motivation

Statisticians are enormously interested in the ways in which certain events and variables are connected. The precise notion of what constitutes a cause and effect is necessary to understand the connections between them. The central idea behind the philosophical study of causation is that causes raise the probabilities of their effects, all else being equal.

A deterministic interpretation of causation means that if A causes B, then A must always be followed by B. In this sense, smoking does not cause cancer because some smokers never develop cancer.

On the other hand, a probabilistic interpretation simply means that causes raise the probability of their effects. In this sense, changes in meteorological readings associated with a storm do cause that storm, since they raise its probability. (However, simply looking at a barometer does not change the probability of the storm, for a more detailed analysis, see: [5] ).


Implications

Dependence and Causation

It follows from the definition that if X and Y are in V and are probabilistically dependent, then either X causes Y, Y causes X, or X and Y are both effects of some common cause Z in V. [3] This definition was seminally introduced by Hans Reichenbach as the Common Cause Principle (CCP) [6]

Screening

It once again follows from the definition that the parents of X screen X from other "indirect causes" of X (parents of Parents(X)) and other effects of Parents(X) which are not also effects of X. [3]

Examples

In a simple view, releasing one's hand from a hammer causes the hammer to fall. However, doing so in outer space does not produce the same outcome, calling into question if releasing one's fingers from a hammer always causes it to fall.

A causal graph could be created to acknowledge that both the presence of gravity and the release of the hammer contribute to its falling. However, it would be very surprising if the surface underneath the hammer affected its falling. This essentially states the Causal Markov Condition, that given the existence of gravity the release of the hammer, it will fall regardless of what is beneath it.

See also

Notes

  1. Geiger, Dan; Pearl, Judea (1990). "On the Logic of Causal Models". Machine Intelligence and Pattern Recognition. 9: 3–14. doi:10.1016/b978-0-444-88650-7.50006-8.
  2. Lauritzen, S. L.; Dawid, A. P.; Larsen, B. N.; Leimer, H.-G. (August 1990). "Independence properties of directed markov fields". Networks. 20 (5): 491–505. doi:10.1002/net.3230200503.
  3. 1 2 3 Hausman, D.M.; Woodward, J. (December 1999). "Independence, Invariance, and the Causal Markov Condition" (PDF). British Journal for the Philosophy of Science. 50 (4): 521–583. doi:10.1093/bjps/50.4.521.
  4. Spirtes, Peter; Glymour, Clark; Scheines, Richard (1993). Causation, Prediction, and Search. Lecture Notes in Statistics. Vol. 81. New York, NY: Springer New York. doi:10.1007/978-1-4612-2748-9. ISBN   9781461276500.
  5. Pearl, Judea (2009). Causality. Cambridge: Cambridge University Press. doi:10.1017/cbo9780511803161. ISBN   9780511803161.
  6. Reichenbach, Hans (1956). The Direction of Time. Los Angeles: University of California Press. ISBN   9780486409269.

Related Research Articles

Causality is influence by which one event, process, state, or object contributes to the production of another event, process, state, or object where the cause is partly responsible for the effect, and the effect is partly dependent on the cause. In general, a process has many causes, which are also said to be causal factors for it, and all lie in its past. An effect can in turn be a cause of, or causal factor for, many other effects, which all lie in its future. Some writers have held that causality is metaphysically prior to notions of time and space.

<span class="mw-page-title-main">Simpson's paradox</span> Probability and statistics phenomenon

Simpson's paradox is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined. This result is often encountered in social-science and medical-science statistics, and is particularly problematic when frequency data are unduly given causal interpretations. The paradox can be resolved when confounding variables and causal relations are appropriately addressed in the statistical modeling. Simpson's paradox has been used to illustrate the kind of misleading results that the misuse of statistics can generate.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

<span class="mw-page-title-main">Graphical model</span> Probabilistic model

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.

In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution, when direct sampling is difficult. This sequence can be used to approximate the joint distribution ; to approximate the marginal distribution of one of the variables, or some subset of the variables ; or to compute an integral. Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief propagation is commonly used in artificial intelligence and information theory, and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.

<span class="mw-page-title-main">Markov blanket</span>

In statistics and machine learning, when one wants to infer a random variable with a set of variables, usually a subset is enough, and other variables are useless. Such a subset that contains all the useful information is called a Markov blanket. If a Markov blanket is minimal, meaning that it cannot drop any variable without losing information, it is called a Markov boundary. Identifying a Markov blanket or a Markov boundary helps to extract useful features. The terms of Markov blanket and Markov boundary were coined by Judea Pearl in 1988.

An influence diagram (ID) is a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems can be modeled and solved.

<span class="mw-page-title-main">Markov random field</span>

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties. The concept originates from the Sherrington–Kirkpatrick model.

<span class="mw-page-title-main">Granger causality</span> Statistical hypothesis test for forecasting

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of "true causality" is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only "predictive causality". Using the term "causality" alone is a misnomer, as Granger-causality is better described as "precedence", or, as Granger himself later claimed in 1977, "temporally related". Rather than testing whether Xcauses Y, the Granger causality tests whether X forecastsY.

<span class="mw-page-title-main">Conditional random field</span> Class of statistical modeling methods

Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. What kind of graph is used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.

<span class="mw-page-title-main">Causal model</span> Conceptual model in philosophy of science

In the philosophy of science, a causal model is a conceptual model that describes the causal mechanisms of a system. Causal models can improve study designs by providing clear rules for deciding which independent variables need to be included/controlled for.

Probabilistic causation is a concept in a group of philosophical theories that aim to characterize the relationship between cause and effect using the tools of probability theory. The central idea behind these theories is that causes raise the probabilities of their effects, all else being equal.

Clark N. Glymour is the Alumni University Professor Emeritus in the Department of Philosophy at Carnegie Mellon University. He is also a senior research scientist at the Florida Institute for Human and Machine Cognition.

<span class="mw-page-title-main">Collider (statistics)</span>

In statistics and causal graphs, a variable is a collider when it is causally influenced by two or more variables. The name "collider" reflects the fact that in graphical models, the arrow heads from variables that lead into the collider appear to "collide" on the node that is the collider. They are sometimes also referred to as inverted forks.

A graphoid is a set of statements of the form, "X is irrelevant to Y given that we know Z" where X, Y and Z are sets of variables. The notion of "irrelevance" and "given that we know" may obtain different interpretations, including probabilistic, relational and correlational, depending on the application. These interpretations share common properties that can be captured by paths in graphs. The theory of graphoids characterizes these properties in a finite set of axioms that are common to informational irrelevance and its graphical representations.

In statistics, econometrics, epidemiology, genetics and related disciplines, causal graphs are probabilistic graphical models used to encode assumptions about the data-generating process.

<span class="mw-page-title-main">Forensic epidemiology</span>

The discipline of forensic epidemiology (FE) is a hybrid of principles and practices common to both forensic medicine and epidemiology. FE is directed at filling the gap between clinical judgment and epidemiologic data for determinations of causality in civil lawsuits and criminal prosecution and defense.

<span class="mw-page-title-main">Richard Neapolitan</span>

Richard Eugene Neapolitan was an American scientist. Neapolitan is most well-known for his role in establishing the use of probability theory in artificial intelligence and in the development of the field Bayesian networks.

Dependency networks (DNs) are graphical models, similar to Markov networks, wherein each vertex (node) corresponds to a random variable and each edge captures dependencies among variables. Unlike Bayesian networks, DNs may contain cycles. Each node is associated to a conditional probability table, which determines the realization of the random variable given its parents.