Probabilistic causation

Probabilistic causation is a concept in a group of philosophical theories that aim to characterize the relationship between cause and effect using the tools of probability theory. The central idea behind these theories is that causes raise the probabilities of their effects, all else being equal.

Deterministic versus probabilistic theory

Interpreting causation as a deterministic relation means that if A causes B, then A must always be followed by B. In this sense, war does not cause deaths, nor does smoking cause cancer. As a result, many turn to a notion of probabilistic causation. Informally, A probabilistically causes B if A's occurrence increases the probability of B. This is sometimes interpreted to reflect imperfect knowledge of a deterministic system but other times interpreted to mean that the causal system under study has an inherently indeterministic nature. (Propensity probability is an analogous idea, according to which probabilities have an objective existence and are not just limitations in a subject's knowledge).

Philosophers such as Hugh Mellor [1] and Patrick Suppes [2] have defined causation in terms of a cause preceding and increasing the probability of the effect. (Additionally, Mellor claims that cause and effect are both facts - not events - since even a non-event, such as the failure of a train to arrive, can cause effects such as my taking the bus. Suppes, by contrast, relies on events defined set-theoretically, and much of his discussion is informed by this terminology.) [3]

Pearl [4] argues that the entire enterprise of probabilistic causation has been misguided from the very beginning, because the central notion that causes "raise the probabilities" of their effects cannot be expressed in the language of probability theory. In particular, the inequality Pr(effect | cause) > Pr(effect | ~cause) which philosophers invoked to define causation, as well as its many variations and nuances, fails to capture the intuition behind "probability raising", which is inherently a manipulative or counterfactual notion. The correct formulation, according to Pearl, should read:

Pr(effect | do(cause)) > Pr(effect | do(~cause))

where do(C) stands for an external intervention that compels the truth of C. The conditional probability Pr(E | C), in contrast, represents a probability resulting from a passive observation of C, and rarely coincides with Pr(E | do(C)). Indeed, observing the barometer falling increases the probability of a storm coming, but does not "cause" the storm; were the act of manipulating the barometer to change the probability of storms, the falling barometer would qualify as a cause of storms. In general, formulating the notion of "probability raising" within the calculus of do-operators [4] resolves the difficulties that probabilistic causation has encountered in the past half-century, [2] [5] [6] among them the infamous Simpson's paradox, and clarifies precisely what relationships exist between probabilities and causation.
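
To make the distinction concrete, here is a minimal simulation sketch under an assumed toy model (all numbers hypothetical) in which low atmospheric pressure causes both a falling barometer and a storm. Conditioning on the barometer reading differs sharply from intervening on it:

```python
import random

random.seed(0)

def sample(do_barometer=None):
    """One draw from the toy model: low pressure drives both the
    barometer and the storm. Passing do_barometer=True/False overrides
    the reading, modeling the intervention do(barometer falls)."""
    low_pressure = random.random() < 0.3
    barometer_falls = low_pressure if do_barometer is None else do_barometer
    storm = random.random() < (0.8 if low_pressure else 0.1)
    return barometer_falls, storm

N = 100_000

# Passive observation: keep only the draws where the barometer fell.
observed = [storm for falls, storm in (sample() for _ in range(N)) if falls]
print("Pr(storm | barometer falls)     ~", sum(observed) / len(observed))

# Intervention: force the barometer down on every draw.
forced = [storm for _, storm in (sample(do_barometer=True) for _ in range(N))]
print("Pr(storm | do(barometer falls)) ~", sum(forced) / len(forced))
```

The observational probability comes out near 0.8, while the interventional one stays near the marginal Pr(storm) ≈ 0.31: manipulating the barometer leaves storms unchanged, so by Pearl's criterion it is not a cause.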

Establishing cause and effect, even with this relaxed reading, is notoriously difficult, as expressed by the widely accepted maxim "correlation does not imply causation". For instance, the observation that smokers have a dramatically increased lung cancer rate does not by itself establish that smoking causes that increase: perhaps some genetic defect causes both cancer and a craving for nicotine, or perhaps nicotine craving is a symptom of very early-stage lung cancer that is not otherwise detectable. Scientists always seek the exact mechanisms by which event A produces event B, but they are also comfortable making a statement such as "smoking probably causes cancer" when the statistical association between the two is far stronger than chance would allow. In this dual approach, scientists accept both deterministic and probabilistic causation in their terminology.

In statistics, it is generally accepted that observational studies (like counting cancer cases among smokers and among non-smokers and then comparing the two) can give hints, but can never establish cause and effect. Often, however, qualitative causal assumptions (e.g., absence of causation between some variables) may permit the derivation of consistent causal effect estimates from observational studies. [4]
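
As a brief illustration of how such qualitative assumptions are used, consider the back-door adjustment from Pearl's framework, stated here informally: if a set of observed variables Z is assumed to block every confounding path between cause and effect, the interventional probability can be computed from observational data alone:

Pr(effect | do(cause)) = Σz Pr(effect | cause, z) Pr(z)

The causal content lies entirely in the assumption about Z; the right-hand side involves only observed frequencies.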

The gold standard for establishing causation here is the randomized experiment: take a large number of people, randomly divide them into two groups, force one group to smoke and prohibit the other from smoking, then determine whether one group develops a significantly higher lung cancer rate. Random assignment plays a crucial role in the inference to causation because, in the long run, it renders the two groups equivalent in terms of all other possible influences on the outcome (cancer), so that any change in the outcome reflects only the manipulation (smoking). For ethical reasons this particular experiment cannot be performed, but the method is widely applicable to less damaging experiments. One limitation of experiments, however, is that while they do a good job of testing for the presence of a causal effect, they do less well at estimating the size of that effect in a population of interest. (This is a common criticism of studies of the safety of food additives that use doses much higher than people consuming the product would actually ingest.)
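
A small simulation sketch can make the role of random assignment concrete. All numbers below are illustrative assumptions: a hidden "gene" stands in for every confounder, and smoking is assumed to add 0.10 to cancer risk. Because assignment is by coin flip, the gene ends up balanced across groups and the estimated difference recovers the assumed effect:

```python
import random

random.seed(1)

def risk(smokes, gene):
    # Hypothetical numbers: the gene raises baseline risk on its own,
    # and smoking adds a further 0.10 regardless of the gene.
    return (0.20 if gene else 0.05) + (0.10 if smokes else 0.0)

population = [random.random() < 0.4 for _ in range(100_000)]  # hidden gene

treated, control = [], []
for gene in population:
    group = treated if random.random() < 0.5 else control  # coin-flip assignment
    smokes = group is treated
    group.append(random.random() < risk(smokes, gene))

rate_t = sum(treated) / len(treated)
rate_c = sum(control) / len(control)
print(f"treated {rate_t:.3f}  control {rate_c:.3f}  difference {rate_t - rate_c:.3f}")
```

Had the groups instead been formed by self-selection correlated with the gene, the observed difference would mix the gene's effect with smoking's and would no longer estimate the causal effect.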

Closed versus open systems

In a closed system, the data may suggest that a cause A ∧ B precedes an effect C within a defined time interval τ, and this relationship can support a causal claim with confidence bounded by τ. In an open system, however, where uncontrolled factors may affect the result, the same relationship need not hold deterministically. [7]

As an example, consider a system of three known variables A, B and C, whose observed behavior over a given time window (such as 50 ms, or 50 hours) is characterized as follows:

¬A ∧ ¬B ⇒ ¬C (99.9999998027%)

A ∧ ¬B ⇒ ¬C (99.9999998027%)

¬A ∧ B ⇒ ¬C (99.9999998027%)

A ∧ B ⇒ C (99.9999998027%)

One can reasonably claim, at the six-standard-deviation level, that A ∧ B causes C within the given time bound (such as 50 ms, or 50 hours) if and only if A, B and C are the only parts of the system in question. Any result outside of this may be considered a deviation.
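
A sketch of how such a table might be checked against data, under the stated closed-system assumption. The event log, time window, and error rate below are fabricated for illustration only:

```python
import random
from collections import Counter
from itertools import product

random.seed(2)

# Hypothetical event log: each record is one (A, B, C) observation taken
# within the chosen time window (e.g. 50 ms). The log is generated to
# follow the table above, flipping C with probability ~2e-9.
log = []
for _ in range(1_000_000):
    a, b = random.random() < 0.5, random.random() < 0.5
    expected = a and b  # the claimed rule: C if and only if A AND B
    c = expected if random.random() < 0.999999998027 else not expected
    log.append((a, b, c))

# Tally how often each (A, B) pattern is followed by the C the rule predicts.
counts, hits = Counter(), Counter()
for a, b, c in log:
    counts[a, b] += 1
    hits[a, b] += (c == (a and b))

for a, b in product((False, True), repeat=2):
    share = 100 * hits[a, b] / counts[a, b]
    print(f"A={a!s:5} B={b!s:5}  rule holds in {share:.7f}% of {counts[a, b]} cases")
```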

Notes

  1. Mellor, D. H. (1995). The Facts of Causation, Routledge, ISBN 0-415-19756-2.
  2. Suppes, P. (1970). A Probabilistic Theory of Causality, Amsterdam: North-Holland Publishing.
  3. Stanford Encyclopedia of Philosophy: Interpretations of Probability.
  4. Pearl, Judea (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.
  5. Cartwright, N. (1989). Nature's Capacities and Their Measurement, Clarendon Press, Oxford.
  6. Eells, E. (1991). Probabilistic Causality, Cambridge University Press, Cambridge.
  7. Markov Condition: Interpretations of Philosophy
