Exploratory causal analysis


Causal analysis is the field of experimental design and statistical analysis pertaining to establishing cause and effect. [1] [2] Exploratory causal analysis (ECA), also known as data causality or causal discovery, [3] is the use of statistical algorithms to infer associations in observed data sets that are potentially causal under strict assumptions. ECA is a type of causal inference distinct from causal modeling and from the estimation of treatment effects in randomized controlled trials. [4] It is exploratory research that usually precedes more formal causal research, in the same way that exploratory data analysis often precedes statistical hypothesis testing. [5] [6]


Motivation

Data analysis is primarily concerned with causal questions. [3] [4] [7] [8] [9] For example, did the fertilizer cause the crops to grow? [10] Or, can a given sickness be prevented? [11] Or, why is my friend depressed? [12] Potential-outcomes and regression analysis techniques handle such queries when data are collected using designed experiments. Data collected in observational studies require different techniques for causal inference because of issues such as confounding. [13] Causal inference techniques developed for experimental data require additional assumptions to produce reasonable inferences from observational data. [14] The difficulty of causal inference under such circumstances is often summed up as "correlation does not imply causation".

Overview

ECA postulates that there exist data analysis procedures, performed on specific subsets of variables within a larger set, whose outputs might be indicative of causality between those variables. [3] For example, if we assume every relevant covariate in the data is observed, then propensity score matching can be used to estimate the causal effect of one observed variable on another. [4] Granger causality can also be used to infer causality between two observed variables under different, but similarly strict, assumptions. [15]
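As an illustration of the propensity score example above, the following is a minimal sketch on synthetic data, not a reference implementation. Everything in it is an assumption made for illustration: the single discrete covariate `z`, the treatment probabilities, the true effect of 2.0, and the simplified matching rule (each treated unit is compared with the mean outcome of the controls whose estimated score is nearest). It also assumes that `z` is the only confounder, which is exactly the "every relevant covariate is observed" assumption.

```python
import random
from collections import defaultdict

rng = random.Random(1)
TRUE_EFFECT = 2.0                        # assumed effect of treatment on outcome
TREAT_PROB = {0: 0.2, 1: 0.5, 2: 0.8}    # assumed P(treated | z): z confounds

# Synthetic observational data: z influences both treatment and outcome.
units = []
for _ in range(3000):
    z = rng.choice([0, 1, 2])
    t = 1 if rng.random() < TREAT_PROB[z] else 0
    y = TRUE_EFFECT * t + 3.0 * z + rng.gauss(0, 1)
    units.append((z, t, y))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Step 1: estimate the propensity score P(treated | z) by observed frequency.
counts = defaultdict(lambda: [0, 0])     # z -> [number of units, number treated]
for z, t, _ in units:
    counts[z][0] += 1
    counts[z][1] += t
score = {z: treated / n for z, (n, treated) in counts.items()}

# Step 2: collect control outcomes, grouped by estimated score.
control_outcomes = defaultdict(list)
for z, t, y in units:
    if t == 0:
        control_outcomes[score[z]].append(y)
control_mean = {s: mean(ys) for s, ys in control_outcomes.items()}


def matched_control(s):
    """Mean outcome of the controls whose estimated score is nearest to s."""
    nearest = min(control_mean, key=lambda c: abs(c - s))
    return control_mean[nearest]


# Step 3: average the treated-minus-matched-control differences.
att = mean(y - matched_control(score[z]) for z, t, y in units if t == 1)

# For comparison: the unadjusted difference in means, biased by confounding.
naive = (mean(y for _, t, y in units if t == 1)
         - mean(y for _, t, y in units if t == 0))

print(f"matched estimate: {att:.2f}, naive difference: {naive:.2f}")
```

In this setup the matched estimate should land near the assumed effect, while the naive difference is inflated because treated units tend to have larger `z`. If an unobserved confounder were added to the data-generating step, the matched estimate would no longer be trustworthy.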

The two broad approaches to developing such procedures are using operational definitions of causality [5] or verification by "truth" (i.e., explicitly ignoring the problem of defining causality and showing that a given algorithm implies a causal relationship in scenarios when causal relationships are known to exist, e.g., using synthetic data [3] ).

Operational definitions of causality

Clive Granger created the first operational definition of causality in 1969. [16] Granger made the definition of probabilistic causality proposed by Norbert Wiener operational as a comparison of variances. [17]
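The "comparison of variances" idea can be sketched as follows. This is a simplified illustration rather than Granger's full procedure: it uses a single lag, ordinary least squares, and a plain ratio of residual variances instead of a formal significance test. The function names, the synthetic series, and the coefficients are all assumptions introduced for the example.

```python
import random


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_variance(target, predictors):
    """Residual variance of a least-squares fit on one or two predictors."""
    y = _center(target)
    cols = [_center(p) for p in predictors]
    if len(cols) == 1:
        (p,) = cols
        beta = [_dot(p, y) / _dot(p, p)]
    else:
        p, q = cols
        spp, sqq, spq = _dot(p, p), _dot(q, q), _dot(p, q)
        spy, sqy = _dot(p, y), _dot(q, y)
        det = spp * sqq - spq * spq          # 2x2 normal equations
        beta = [(sqq * spy - spq * sqy) / det,
                (spp * sqy - spq * spy) / det]
    resid = [yi - sum(b * c[i] for b, c in zip(beta, cols))
             for i, yi in enumerate(y)]
    return _dot(resid, resid) / len(resid)


def granger_variance_ratio(cause, effect):
    """Variance ratio at lag 1: (own past + cause's past) / (own past only).

    Values well below 1 mean the past of `cause` improves the prediction
    of `effect` beyond what the past of `effect` already provides.
    """
    now, own_past, other_past = effect[1:], effect[:-1], cause[:-1]
    restricted = residual_variance(now, [own_past])
    unrestricted = residual_variance(now, [own_past, other_past])
    return unrestricted / restricted


# Synthetic series in which x drives y, but y does not drive x.
rng = random.Random(0)
x, y = [0.0], [0.0]
for _ in range(2000):
    x_next = 0.5 * x[-1] + rng.gauss(0, 1)
    y_next = 0.5 * y[-1] + 0.8 * x[-1] + rng.gauss(0, 1)
    x.append(x_next)
    y.append(y_next)

ratio_x_to_y = granger_variance_ratio(x, y)
ratio_y_to_x = granger_variance_ratio(y, x)
print(f"x -> y: {ratio_x_to_y:.3f}, y -> x: {ratio_y_to_x:.3f}")
```

Under these assumptions the ratio for x to y should be clearly below 1, while the ratio for y to x should stay close to 1. A practical analysis would choose the lag order from the data and replace the raw ratio with a statistical test.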

Some authors prefer using ECA techniques developed using operational definitions of causality because they believe it may help in the search for causal mechanisms. [5] [18]

Verification by "truth"

Peter Spirtes, Clark Glymour, and Richard Scheines introduced the idea of explicitly not providing a definition of causality. [3] Spirtes and Glymour introduced the PC algorithm for causal discovery in 1990. [19] Many recent causal discovery algorithms follow the Spirtes-Glymour approach to verification. [20]
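Constraint-based algorithms such as PC rely on conditional independence tests and are checked against data whose causal structure is known. The sketch below illustrates only that building block, not the PC algorithm itself: it generates synthetic data from an assumed chain X → Y → Z and compares the marginal correlation of X and Z with their partial correlation given Y. The variable names, coefficients, and the use of partial correlation as the test statistic are assumptions chosen for the example.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def partial_corr(u, v, given):
    """Partial correlation of u and v after removing the linear effect of `given`."""
    r_uv = pearson(u, v)
    r_ug = pearson(u, given)
    r_vg = pearson(v, given)
    return (r_uv - r_ug * r_vg) / math.sqrt((1 - r_ug ** 2) * (1 - r_vg ** 2))


# Synthetic "ground truth": a causal chain X -> Y -> Z with Gaussian noise.
rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [0.8 * a + rng.gauss(0, 1) for a in x]
z = [0.8 * b + rng.gauss(0, 1) for b in y]

marginal = pearson(x, z)             # X and Z are associated through Y
conditional = partial_corr(x, z, y)  # the association vanishes given Y

print(f"corr(X, Z) = {marginal:.3f}, corr(X, Z | Y) = {conditional:.3f}")
```

Because the data were generated from a known chain, the near-zero partial correlation can be checked against the truth: an algorithm that drops the X–Z edge on this evidence recovers the structure that was built in.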

Techniques

There are many surveys of causal discovery techniques. [3] [5] [20] [21] [22] [23] This section lists the well-known techniques.

Bivariate (or "pairwise")

Multivariate

Many of these techniques are discussed in the tutorials provided by the Center for Causal Discovery (CCD).

Use-case examples

Social science

The PC algorithm has been applied to several different social science data sets. [3]

Medicine

The PC algorithm has been applied to medical data. [28] Granger causality has been applied to fMRI data. [29] CCD tested their tools using biomedical data.

Physics

ECA is used in physics to understand the physical causal mechanisms of a system, e.g., in geophysics using the PC-stable algorithm (a variant of the original PC algorithm) [30] and in dynamical systems using pairwise asymmetric inference (a variant of convergent cross mapping). [31]

Criticism

There is debate over whether or not the relationships between data found using causal discovery are actually causal. [3] [25] Judea Pearl has emphasized that causal inference requires a causal model developed by "intelligence" through an iterative process of testing assumptions and fitting data. [7]

Responses to this criticism point out that the assumptions used for developing ECA techniques may not hold for a given data set [3] [14] [32] [33] [34] and that any causal relationships discovered during ECA are contingent on these assumptions holding true. [25] [35]

Software packages

Comprehensive toolkits

Specific techniques

Granger causality

Convergent cross mapping

LiNGAM

There is also a collection of tools and data maintained by the Causality Workbench team and the CCD team.


References

  1. Rohlfing, Ingo; Schneider, Carsten Q. (2018). "A Unifying Framework for Causal Analysis in Set-Theoretic Multimethod Research" (PDF). Sociological Methods & Research. 47 (1): 37–63. doi:10.1177/0049124115626170. S2CID 124804330. Archived from the original (PDF) on 9 October 2022. Retrieved 29 February 2020.
  2. Brady, Henry E. (7 July 2011). "Causation and Explanation in Social Science". The Oxford Handbook of Political Science. doi:10.1093/oxfordhb/9780199604456.013.0049. Retrieved 29 February 2020.
  3. Spirtes, P.; Glymour, C.; Scheines, R. (2012). Causation, Prediction, and Search. Springer Science & Business Media. ISBN 978-1461227489.
  4. Rosenbaum, Paul (2017). Observation and Experiment: An Introduction to Causal Inference. Harvard University Press. ISBN 9780674975576.
  5. McCracken, James (2016). Exploratory Causal Analysis with Time Series Data (Synthesis Lectures on Data Mining and Knowledge Discovery). Morgan & Claypool Publishers. ISBN 978-1627059343.
  6. Tukey, John W. (1977). Exploratory Data Analysis. Pearson. ISBN 978-0201076165.
  7. Pearl, Judea (2018). The Book of Why: The New Science of Cause and Effect. Basic Books. ISBN 978-0465097616.
  8. Kleinberg, Samantha (2015). Why: A Guide to Finding and Using Causes. O'Reilly Media, Inc. ISBN 978-1491952191.
  9. Illari, P.; Russo, F. (2014). Causality: Philosophical Theory meets Scientific Practice. OUP Oxford. ISBN 978-0191639685.
  10. Fisher, R. (1937). The design of experiments. Oliver And Boyd.
  11. Hill, B. (1955). Principles of Medical Statistics. Lancet Limited.
  12. Halpern, J. (2016). Actual Causality. MIT Press. ISBN 978-0262035026.
  13. Pearl, J.; Glymour, M.; Jewell, N. P. (2016). Causal inference in statistics: a primer. John Wiley & Sons. ISBN 978-1119186847.
  14. Stone, R. (1993). "The Assumptions on Which Causal Inferences Rest". Journal of the Royal Statistical Society. Series B (Methodological). 55 (2): 455–466. doi:10.1111/j.2517-6161.1993.tb01915.x.
  15. Granger, C. (1980). "Testing for causality: a personal viewpoint". Journal of Economic Dynamics and Control. 2: 329–352. doi:10.1016/0165-1889(80)90069-X.
  16. Granger, C. W. J. (1969). "Investigating Causal Relations by Econometric Models and Cross-spectral Methods". Econometrica. 37 (3): 424–438. doi:10.2307/1912791. JSTOR 1912791.
  17. Granger, Clive. "Prize Lecture". NobelPrize.org. Nobel Media AB 2018.
  18. Woodward, James (2004). Making Things Happen: A Theory of Causal Explanation (Oxford Studies in the Philosophy of Science). Oxford University Press. ISBN 978-1435619999.
  19. Spirtes, P.; Glymour, C. (1991). "An algorithm for fast recovery of sparse causal graphs". Social Science Computer Review. 9 (1): 62–72. doi:10.1177/089443939100900106. S2CID 38398322.
  20. Guo, Ruocheng; Cheng, Lu; Li, Jundong; Hahn, P. Richard; Liu, Huan (2020). "A Survey of Learning Causality with Data". ACM Computing Surveys. 53 (4): 1–37. arXiv:1809.09337. doi:10.1145/3397269. S2CID 52822969.
  21. Malinsky, Daniel; Danks, David (2018). "Causal discovery algorithms: A practical guide". Philosophy Compass. 13 (1): e12470. doi:10.1111/phc3.12470.
  22. Spirtes, P.; Zhang, K. (2016). "Causal discovery and inference: concepts and recent methodological advances". Appl Inform (Berl). 3: 3. doi:10.1186/s40535-016-0018-x. PMC 4841209. PMID 27195202.
  23. Yu, Kui; Li, Jiuyong; Liu, Lin; Richard Hahn, P.; Liu, Huan (2016). "A review on algorithms for constraint-based causal discovery". arXiv:1611.03977 [cs.AI].
  24. Sun, Jie; Bollt, Erik M.; Li, Jundong; Richard Hahn, P.; Liu, Huan (2014). "Causation entropy identifies indirect influences, dominance of neighbors and anticipatory couplings". Physica D: Nonlinear Phenomena. 267: 49–57. arXiv:1504.03769. Bibcode:2014PhyD..267...49S. doi:10.1016/j.physd.2013.07.001. S2CID 14422483.
  25. Freedman, David; Humphreys, Paul (1999). "Are there algorithms that discover causal structure?". Synthese. 121 (1–2): 29–54. doi:10.1023/A:1005277613752. S2CID 6826436.
  26. Raghu, V. K.; Ramsey, J. D.; Morris, A.; Manatakis, D. V.; Spirtes, P.; Chrysanthis, P. K.; Glymour, C.; Benos, P. V. (2018). "Comparison of strategies for scalable causal discovery of latent variable models from mixed data". International Journal of Data Science and Analytics. 6 (33): 33–45. doi:10.1007/s41060-018-0104-3. PMC 6096780. PMID 30148202.
  27. Shimizu, S. (2014). "LiNGAM: non-Gaussian methods for estimating causal structures". Behaviormetrika. 41 (1): 65–98. doi:10.2333/bhmk.41.65. S2CID 49238101.
  28. Cheek, C.; Zheng, H.; Hallstrom, B. R.; Hughes, R. E. (2018). "Application of a Causal Discovery Algorithm to the Analysis of Arthroplasty Registry Data". Biomedical Engineering and Computational Biology. 9: 117959721875689. doi:10.1177/1179597218756896. PMC 5826097. PMID 29511363.
  29. Wen, X.; Rangarajan, G.; Ding, M. (2013). "Is Granger Causality a Viable Technique for Analyzing fMRI Data?". PLOS ONE. 8 (7): e67428. Bibcode:2013PLoSO...867428W. doi:10.1371/journal.pone.0067428. PMC 3701552. PMID 23861763.
  30. Ebert-Uphoff, Imme; Deng, Yi (2017). "Causal discovery in the geosciences—Using synthetic data to learn how to interpret results". Computers & Geosciences. 99: 50–60. Bibcode:2017CG.....99...50E. doi:10.1016/j.cageo.2016.10.008.
  31. McCracken, J. M.; Weigel, R. S.; Li, Jundong; Richard Hahn, P.; Liu, Huan (2014). "Convergent cross-mapping and pairwise asymmetric inference". Phys. Rev. E. 90 (6): 062903. arXiv:1407.5696. Bibcode:2014PhRvE..90f2903M. doi:10.1103/PhysRevE.90.062903. PMID 25615160. S2CID 7506718.
  32. Scheines, R. (1997). "An introduction to causal inference" (PDF). Causality in Crisis: 185–199.
  33. Holland, P. W. (1986). "Statistics and causal inference". Journal of the American Statistical Association. 81 (396): 945–960. doi:10.1080/01621459.1986.10478354. S2CID 14377504.
  34. Imbens, G. W.; Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press. ISBN 978-0521885881.
  35. Morgan, S. L.; Winship, C. (2015). Counterfactuals and causal inference. Cambridge University Press. ISBN 978-1107065079.
  36. "Causal Models and Statistical Data, The Tetrad Project".
  37. "Tools, Center for Causal Discovery, University of Pittsburgh". 10 August 2016.