Causal analysis

Causal analysis is the field of experimental design and statistics pertaining to establishing cause and effect. [1] Typically it involves establishing four elements: correlation, sequence in time (that is, causes must occur before their proposed effect), a plausible physical or information-theoretical mechanism for an observed effect to follow from a possible cause, and eliminating the possibility of common and alternative ("special") causes. Such analysis usually involves one or more artificial or natural experiments. [2]

Motivation

Data analysis is primarily concerned with causal questions. [3] [4] [5] [6] [7] For example, did the fertilizer cause the crops to grow? [8] Or, can a given sickness be prevented? [9] Or, why is my friend depressed? [10] Potential outcomes and regression analysis techniques handle such queries when data is collected using designed experiments. Data collected in observational studies require different techniques for causal inference (because, for example, of issues such as confounding). [11] Causal inference techniques used with experimental data require additional assumptions to produce reasonable inferences when applied to observational data. [12] The difficulty of causal inference under such circumstances is often summed up as "correlation does not imply causation".

In philosophy and physics

The nature of causality is systematically investigated in several academic disciplines, including philosophy and physics.

In academia, there are a significant number of theories on causality; The Oxford Handbook of Causation (Beebee, Hitchcock & Menzies 2009) encompasses 770 pages. Among the more influential theories within philosophy are Aristotle's Four causes and Al-Ghazali's occasionalism. [13] David Hume argued that beliefs about causality are based on experience, and that experience is similarly based on the assumption that the future resembles the past, which in turn can only be based on experience – leading to circular logic. He concluded that causality is not based on actual reasoning: only correlation can actually be perceived. [14] Immanuel Kant, according to Beebee, Hitchcock & Menzies (2009), held that "a causal principle according to which every event has a cause, or follows according to a causal law, cannot be established through induction as a purely empirical claim, since it would then lack strict universality, or necessity".

Outside the field of philosophy, theories of causation can be identified in classical mechanics, statistical mechanics, quantum mechanics, spacetime theories, biology, social sciences, and law. [13] To establish a correlation as causal within physics, it is normally understood that the cause and the effect must connect through a local mechanism (cf. for instance the concept of impact) or a nonlocal mechanism (cf. the concept of field), in accordance with known laws of nature.

From the point of view of thermodynamics, universal properties of causes as compared to effects have been identified through the Second law of thermodynamics, confirming the ancient, medieval and Cartesian [15] view that "the cause is greater than the effect" for the particular case of thermodynamic free energy. This, in turn, is challenged by popular interpretations of the concepts of nonlinear systems and the butterfly effect, in which small events cause large effects due to, respectively, unpredictability and an unlikely triggering of large amounts of potential energy.

Causality construed from counterfactual states

Intuitively, causation seems to require not just a correlation, but a counterfactual dependence. Suppose that a student performed poorly on a test and guesses that the cause was his not studying. To test this, one imagines the counterfactual – the same student writing the same test under the same circumstances but having studied the night before. If one could rewind history and change only one small thing (making the student study for the exam), then causation could be observed (by comparing version 1 to version 2). Because one cannot rewind history and replay events after making small controlled changes, causation can only be inferred, never exactly known. This is referred to as the Fundamental Problem of Causal Inference – it is impossible to directly observe causal effects. [16]
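The student example can be put in code as a minimal, hypothetical simulation: each unit carries two potential outcomes, but any real data set records only one of them. All names and numbers here are illustrative, not drawn from any actual study.

```python
import random

random.seed(0)

# Hypothetical potential outcomes for 5 students: the test score if the
# student studies (y1) and if the student does not (y0).
units = [{"y0": random.gauss(60, 5), "y1": random.gauss(75, 5)}
         for _ in range(5)]

for u in units:
    u["studied"] = random.random() < 0.5              # treatment assignment
    u["observed"] = u["y1"] if u["studied"] else u["y0"]
    # The counterfactual outcome is never observed in real data:
    u["unobserved"] = u["y0"] if u["studied"] else u["y1"]

# Each individual causal effect y1 - y0 exists on paper, but it cannot be
# computed from observed data alone, because one term is always missing.
true_effects = [u["y1"] - u["y0"] for u in units]
print([round(e, 1) for e in true_effects])
```

This is the Fundamental Problem of Causal Inference in miniature: `true_effects` is computable only because the simulation, unlike reality, stores both potential outcomes.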

A major goal of scientific experiments and statistical methods is to approximate as best possible the counterfactual state of the world. [17] For example, one could run an experiment on identical twins who were known to consistently get the same grades on their tests. One twin is sent to study for six hours while the other is sent to the amusement park. If their test scores suddenly diverged by a large degree, this would be strong evidence that studying (or going to the amusement park) had a causal effect on test scores. In this case, correlation between studying and test scores would almost certainly imply causation.

Well-designed experimental studies replace equality of individuals, as in the previous example, with equality of groups. The objective is to construct two groups that are similar except for the treatment that the groups receive. This is achieved by selecting subjects from a single population and randomly assigning them to two or more groups. The likelihood of the groups behaving similarly to one another (on average) rises with the number of subjects in each group. If the groups are essentially equivalent except for the treatment they receive, and a difference in the outcome for the groups is observed, then this constitutes evidence that the treatment is responsible for the outcome, or in other words that the treatment causes the observed effect. However, an observed effect could also be caused "by chance", for example as a result of random perturbations in the population. Statistical tests exist to quantify the likelihood that a difference at least as large as the one observed would arise by chance alone (see, for example, P-value).
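A minimal simulation of this design, assuming a hypothetical population of test scores and a treatment that shifts the mean by about five points (all numbers are illustrative):

```python
import random
import statistics

random.seed(42)

# One population, randomly split into two groups of 100.
population = [random.gauss(70, 10) for _ in range(200)]
random.shuffle(population)
control = population[:100]                        # receives no treatment
treated = [x + 5 for x in population[100:]]       # treatment adds ~5 points

diff = statistics.mean(treated) - statistics.mean(control)

# Welch-style t-statistic: the observed difference relative to its
# standard error. Values well above ~2 are unlikely to arise by chance.
se = (statistics.variance(treated) / len(treated)
      + statistics.variance(control) / len(control)) ** 0.5
t = diff / se
print(round(diff, 1), round(t, 1))
```

This sketch stops at the t-statistic; a full analysis would convert it to a p-value using the t-distribution, which is what standard statistical packages do.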

Operational definitions of causality

Clive Granger created the first operational definition of causality in 1969. [18] Granger made the definition of probabilistic causality proposed by Norbert Wiener operational as a comparison of variances. [19]
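The spirit of that comparison of variances can be sketched on a toy pair of time series. This is an illustrative simplification, not Granger's full test: the proper procedure fits autoregressions that include lags of both series and compares the nested models with an F-test, whereas here each model uses a single lag and the residual variances are compared directly (intercepts are omitted since the series are near zero mean).

```python
import random
import statistics

random.seed(1)

# Toy series in which x "Granger-causes" y: y[t] depends on x[t-1].
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.9 * x[t - 1] + random.gauss(0, 0.5))

def ols_slope(u, v):
    """Least-squares slope of v on u."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sum((a - mu) ** 2 for a in u)
    return num / den

# Restricted model: predict y[t] from y[t-1] only.
b_own = ols_slope(y[:-1], y[1:])
res_restricted = [y[t] - b_own * y[t - 1] for t in range(1, n)]

# Augmented model: predict y[t] from x[t-1], the candidate cause.
b_x = ols_slope(x[:-1], y[1:])
res_augmented = [y[t] - b_x * x[t - 1] for t in range(1, n)]

# Granger's core idea: if x helps forecast y, adding x's history
# should shrink the prediction-error variance.
v_r = statistics.variance(res_restricted)
v_a = statistics.variance(res_augmented)
print(round(v_r, 2), round(v_a, 2))
```

Here the residual variance drops substantially once the lagged `x` is used, which is the pattern the Granger test formalizes.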

Verification by "truth"

Peter Spirtes, Clark Glymour, and Richard Scheines introduced the idea of explicitly not providing a definition of causality. [3] Spirtes and Glymour introduced the PC algorithm for causal discovery in 1990. [20] Many recent causal discovery algorithms follow the Spirtes-Glymour approach to verification. [21]
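The conditional-independence tests on which PC-style discovery rests can be illustrated with a simulated chain X → Z → Y: X and Y are marginally correlated, but independent given Z. The sketch below estimates the partial correlation by regressing out Z and correlating the residuals; all parameters are illustrative.

```python
import random
import statistics

random.seed(7)

# Chain X -> Z -> Y.
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + random.gauss(0, 0.5) for xi in x]
y = [zi + random.gauss(0, 0.5) for zi in z]

def corr(u, v):
    """Sample Pearson correlation."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    su, sv = statistics.stdev(u), statistics.stdev(v)
    return sum((a - mu) * (b - mv)
               for a, b in zip(u, v)) / ((len(u) - 1) * su * sv)

def residuals(u, given):
    """Residuals of u after regressing out `given` (simple OLS)."""
    b = corr(u, given) * statistics.stdev(u) / statistics.stdev(given)
    mu, mg = statistics.mean(u), statistics.mean(given)
    return [a - mu - b * (g - mg) for a, g in zip(u, given)]

r_xy = corr(x, y)                                  # marginal: strong
r_xy_z = corr(residuals(x, z), residuals(y, z))    # partial: near zero
print(round(r_xy, 2), round(abs(r_xy_z), 2))
```

A PC-style algorithm would read the vanishing partial correlation as licence to remove the edge between X and Y, leaving Z on the path between them.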

Exploratory

Exploratory causal analysis (ECA), also known as "data causality" or "causal discovery", [3] is the use of statistical algorithms to infer associations in observed data sets that are potentially causal under strict assumptions. ECA is a type of causal inference distinct from causal modeling and treatment effects in randomized controlled trials. [4] It is exploratory research, usually preceding more formal causal research in the same way that exploratory data analysis often precedes statistical hypothesis testing in data analysis. [22] [23]

Related Research Articles

Econometrics is an application of statistical methods to economic data in order to give empirical content to economic relationships. More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference." An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships." Jan Tinbergen is one of the two founding fathers of econometrics. The other, Ragnar Frisch, also coined the term in the sense in which it is used today.

Causality (also called causation, or cause and effect) is an influence by which one event, process, state, or object (a cause) contributes to the production of another event, process, state, or object (an effect) where the cause is partly responsible for the effect, and the effect is partly dependent on the cause. In general, a process has many causes, which are also said to be causal factors for it, and all lie in its past. An effect can in turn be a cause of, or causal factor for, many other effects, which all lie in its future. Some writers have held that causality is metaphysically prior to notions of time and space.

The phrase "correlation does not imply causation" refers to the inability to legitimately deduce a cause-and-effect relationship between two events or variables solely on the basis of an observed association or correlation between them. The idea that "correlation implies causation" is an example of a questionable-cause logical fallacy, in which two events occurring together are taken to have established a cause-and-effect relationship. This fallacy is also known by the Latin phrase cum hoc ergo propter hoc. This differs from the fallacy known as post hoc ergo propter hoc, in which an event following another is seen as a necessary consequence of the former event, and from conflation, the errant merging of two events, ideas, databases, etc., into one.

Spurious relationship: Apparent, but false, correlation between causally independent variables

In statistics, a spurious relationship or spurious correlation is a mathematical relationship in which two or more events or variables are associated but not causally related, due to either coincidence or the presence of a certain third, unseen factor.

Trygve Haavelmo: Norwegian economist and econometrician

Trygve Magnus Haavelmo, born in Skedsmo, Norway, was an economist whose research interests centered on econometrics. He received the Nobel Memorial Prize in Economic Sciences in 1989.

Granger causality: Statistical hypothesis test for forecasting

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of "true causality" is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only "predictive causality". Using the term "causality" alone is a misnomer, as Granger causality is better described as "precedence", or, as Granger himself later claimed in 1977, "temporally related". Rather than testing whether X causes Y, the Granger causality test evaluates whether X forecasts Y.

Structural equation modeling: Form of causal modeling that fits networks of constructs to data

Structural equation modeling (SEM) is a diverse set of methods used by scientists doing both observational and experimental research. SEM is used mostly in the social and behavioral sciences but it is also used in epidemiology, business, and other fields. A definition of SEM is difficult without reference to technical language, but a good starting place is the name itself.

Confounding: Variable or factor in causal inference

In causal inference, a confounder is a variable that influences both the dependent variable and independent variable, causing a spurious association. Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations. The existence of confounders is an important quantitative explanation why correlation does not imply causation. Some notations are explicitly designed to identify the existence, possible existence, or non-existence of confounders in causal relationships between elements of a system.
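A small simulation makes the point concrete: a binary confounder raises both X and Y, producing a marginal association even though X has no effect on Y, and the association vanishes within each stratum of the confounder. All numbers are illustrative.

```python
import random
import statistics

random.seed(3)

# Binary confounder Z shifts both X and Y upward; X does not affect Y.
n = 4000
rows = []
for _ in range(n):
    z = random.random() < 0.5
    x = random.gauss(2.0 if z else 0.0, 1)
    y = random.gauss(2.0 if z else 0.0, 1)
    rows.append((z, x, y))

def corr(pairs):
    """Sample Pearson correlation of a list of (u, v) pairs."""
    u, v = zip(*pairs)
    mu, mv = statistics.mean(u), statistics.mean(v)
    su, sv = statistics.stdev(u), statistics.stdev(v)
    return sum((a - mu) * (b - mv)
               for a, b in zip(u, v)) / ((len(u) - 1) * su * sv)

marginal = corr([(x, y) for _, x, y in rows])        # spurious association
within_z1 = corr([(x, y) for z, x, y in rows if z])  # vanishes per stratum
within_z0 = corr([(x, y) for z, x, y in rows if not z])
print(round(marginal, 2), round(within_z1, 2), round(within_z0, 2))
```

Stratifying on Z here plays the role that "adjusting for" or "controlling for" a confounder plays in observational analyses.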

The Rubin causal model (RCM), also known as the Neyman–Rubin causal model, is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes, named after Donald Rubin. The name "Rubin causal model" was first coined by Paul W. Holland. The potential outcomes framework was first proposed by Jerzy Neyman in his 1923 Master's thesis, though he discussed it only in the context of completely randomized experiments. Rubin extended it into a general framework for thinking about causation in both observational and experimental studies.

Causal model: Conceptual model in philosophy of science

In the philosophy of science, a causal model is a conceptual model that describes the causal mechanisms of a system. Several types of causal notation may be used in the development of a causal model. Causal models can improve study designs by providing clear rules for deciding which independent variables need to be included/controlled for.

Probabilistic causation is a concept in a group of philosophical theories that aim to characterize the relationship between cause and effect using the tools of probability theory. The central idea behind these theories is that causes raise the probabilities of their effects, all else being equal.

Causal reasoning is the process of identifying causality: the relationship between a cause and its effect. The study of causality extends from ancient philosophy to contemporary neuropsychology; assumptions about the nature of causality may be shown to be functions of a previous event preceding a later one. The first known protoscientific study of cause and effect occurred in Aristotle's Physics. Causal inference is an example of causal reasoning.

The Bradford Hill criteria, otherwise known as Hill's criteria for causation, are a group of nine principles that can be useful in establishing epidemiologic evidence of a causal relationship between a presumed cause and an observed effect and have been widely used in public health research. They were established in 1965 by the English epidemiologist Sir Austin Bradford Hill.

Clark N. Glymour is the Alumni University Professor Emeritus in the Department of Philosophy at Carnegie Mellon University. He is also a senior research scientist at the Florida Institute for Human and Machine Cognition.

Collider (statistics): Variable that is causally influenced by two or more variables

In statistics and causal graphs, a variable is a collider when it is causally influenced by two or more variables. The name "collider" reflects the fact that in graphical models, the arrow heads from variables that lead into the collider appear to "collide" on the node that is the collider. They are sometimes also referred to as inverted forks.
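Collider bias can be demonstrated in a few lines: two independent variables become correlated once the analysis conditions on (here, selects on) their common effect. The threshold and sample size below are illustrative.

```python
import random
import statistics

random.seed(5)

# X and Y are independent causes of a collider C = X + Y.
n = 4000
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

def corr(pairs):
    """Sample Pearson correlation of a list of (u, v) pairs."""
    u, v = zip(*pairs)
    mu, mv = statistics.mean(u), statistics.mean(v)
    su, sv = statistics.stdev(u), statistics.stdev(v)
    return sum((a - mu) * (b - mv)
               for a, b in zip(u, v)) / ((len(u) - 1) * su * sv)

overall = corr(data)                      # near zero: X, Y independent
# Condition on the collider by keeping only cases where C is large:
selected = [(x, y) for x, y in data if x + y > 1.0]
conditioned = corr(selected)
print(round(overall, 2), round(conditioned, 2))
```

Among the selected cases a large X makes a large Y less necessary to clear the threshold, inducing a negative correlation; this is why causal graph methods warn against conditioning on colliders.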

There have been many criticisms of econometrics' usefulness as a discipline and perceived widespread methodological shortcomings in econometric modelling practices.

Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system. The main difference between causal inference and inference of association is that causal inference analyzes the response of an effect variable when a cause of the effect variable is changed. The study of why things occur is called etiology, and can be described using the language of scientific causal notation. Causal inference is said to provide the evidence of causality theorized by causal reasoning.

In statistics, econometrics, epidemiology, genetics and related disciplines, causal graphs are probabilistic graphical models used to encode assumptions about the data-generating process.

The Book of Why: 2018 book by Judea Pearl and Dana Mackenzie

The Book of Why: The New Science of Cause and Effect is a 2018 nonfiction book by computer scientist Judea Pearl and writer Dana Mackenzie. The book explores the subject of causality and causal inference from statistical and philosophical points of view for a general audience.

References

  1. Rohlfing, Ingo; Schneider, Carsten Q. (2018). "A Unifying Framework for Causal Analysis in Set-Theoretic Multimethod Research" (PDF). Sociological Methods & Research. 47 (1): 37–63. doi:10.1177/0049124115626170. S2CID 124804330. Retrieved 29 February 2020.
  2. Brady, Henry E. (7 July 2011). "Causation and Explanation in Social Science". The Oxford Handbook of Political Science. doi:10.1093/oxfordhb/9780199604456.013.0049 . Retrieved 29 February 2020.
  3. Spirtes, P.; Glymour, C.; Scheines, R. (2012). Causation, Prediction, and Search. Springer Science & Business Media. ISBN 978-1461227489.
  4. Rosenbaum, Paul (2017). Observation and Experiment: An Introduction to Causal Inference. Harvard University Press. ISBN 9780674975576.
  5. Pearl, Judea; Mackenzie, Dana (2018). The Book of Why: The New Science of Cause and Effect. Basic Books. ISBN   978-0465097616.
  6. Kleinberg, Samantha (2015). Why: A Guide to Finding and Using Causes. O'Reilly Media, Inc. ISBN   978-1491952191.
  7. Illari, P.; Russo, F. (2014). Causality: Philosophical Theory meets Scientific Practice. OUP Oxford. ISBN   978-0191639685.
  8. Fisher, R. (1937). The design of experiments. Oliver And Boyd.
  9. Hill, B. (1955). Principles of Medical Statistics. Lancet Limited.
  10. Halpern, J. (2016). Actual Causality. MIT Press. ISBN   978-0262035026.
  11. Pearl, J.; Glymour, M.; Jewell, N. P. (2016). Causal inference in statistics: a primer. John Wiley & Sons. ISBN   978-1119186847.
  12. Stone, R. (1993). "The Assumptions on Which Causal Inferences Rest". Journal of the Royal Statistical Society. Series B (Methodological). 55 (2): 455–466. doi:10.1111/j.2517-6161.1993.tb01915.x.
  13. Beebee, Hitchcock & Menzies 2009.
  14. Morris, William Edward (2001). "David Hume". The Stanford Encyclopedia of Philosophy.
  15. Lloyd, A.C. (1976). "The principle that the cause is greater than its effect". Phronesis. 21 (2): 146–156. doi:10.1163/156852876x00101. JSTOR   4181986.
  16. Holland, Paul W. (1986). "Statistics and Causal Inference". Journal of the American Statistical Association . 81 (396): 945–960. doi:10.1080/01621459.1986.10478354. S2CID   14377504.
  17. Pearl, Judea (2000). Causality: Models, Reasoning, and Inference . Cambridge University Press. ISBN   9780521773621.
  18. Granger, C. W. J. (1969). "Investigating Causal Relations by Econometric Models and Cross-spectral Methods". Econometrica. 37 (3): 424–438. doi:10.2307/1912791. JSTOR   1912791.
  19. Granger, Clive. "Prize Lecture". NobelPrize.org. Nobel Media AB 2018.
  20. Spirtes, P.; Glymour, C. (1991). "An algorithm for fast recovery of sparse causal graphs". Social Science Computer Review. 9 (1): 62–72. doi:10.1177/089443939100900106. S2CID   38398322.
  21. Guo, Ruocheng; Cheng, Lu; Li, Jundong; Hahn, P. Richard; Liu, Huan (2020). "A Survey of Learning Causality with Data". ACM Computing Surveys. 53 (4): 1–37. arXiv: 1809.09337 . doi:10.1145/3397269. S2CID   52822969.
  22. McCracken, James (2016). Exploratory Causal Analysis with Time Series Data (Synthesis Lectures on Data Mining and Knowledge Discovery). Morgan & Claypool Publishers. ISBN   978-1627059343.
  23. Tukey, John W. (1977). Exploratory Data Analysis. Pearson. ISBN   978-0201076165.

Bibliography