The Book of Why

The Book of Why: The New Science of Cause and Effect
First edition (US)
Authors Judea Pearl and Dana Mackenzie
Language English
Subjects Causality, Causal Inference, Statistics
Publisher Basic Books (US)
Penguin (UK)
Publication date 2018
ISBN 9780141982410
Preceded by Causal Inference in Statistics: A Primer

The Book of Why: The New Science of Cause and Effect is a 2018 nonfiction book by computer scientist Judea Pearl and writer Dana Mackenzie. The book explores the subject of causality and causal inference from statistical and philosophical points of view for a general audience.

Summary

The book consists of ten chapters and an introduction.

Introduction: Mind over Data

The introduction describes the inadequacy of early 20th-century statistical methods for making statements about causal relationships between variables. The authors then describe what they term 'The Causal Revolution', which started in the middle of the 20th century and provided new conceptual and mathematical tools for describing causal relationships.

Chapter 1: The Ladder of Causation

Chapter 1 introduces the 'ladder of causation', a diagram used to illustrate three levels of causal reasoning. The first level, 'Association', concerns associations between variables. Questions such as 'is variable X associated with variable Y?' can be answered at this level, but causality is not invoked. An example of first-level reasoning is the observation that a crowing rooster is associated with the sunrise. Such reasoning cannot describe causal relations: it cannot say whether the sunrise causes the rooster to crow or the rooster causes the sun to rise. Many early 20th-century statistical tools, such as correlation and regression, operate on this level.

The second level (or 'rung') on the ladder of causation is labelled 'Intervention'. Reasoning on this level answers questions of the form 'if I make the intervention X, how will this affect the probability of the outcome Y?'. For example, the question 'does smoking increase my chance of lung cancer?' belongs to the second level of the ladder of causation. This kind of reasoning invokes causality and can answer questions that first-rung reasoning cannot.

The third rung of the ladder of causation is labelled 'Counterfactuals' and involves answering questions which ask what might have been, had circumstances been different. Such reasoning invokes causality to a greater degree than the previous level. An example counterfactual question given in the book is 'Would Kennedy be alive if Oswald had not killed him?'

Chapter 2: From Buccaneers to Guinea Pigs: The Genesis of Causal Inference

Chapter 2 starts with a brief summary of the contributions of Francis Galton and Karl Pearson to the development of statistics in the late 19th and early 20th centuries. The authors blame Galton for keeping the study of statistics on the first rung of the ladder of causation and discouraging any discussion of causality in statistics. Causal analysis using path diagrams is then introduced through an account of the work of Sewall Wright.

Chapter 3: From Evidence to Causes: Reverend Bayes meets Mr Holmes

Chapter 3 provides an introduction to Bayes' theorem and then introduces Bayesian networks. Finally, the links between Bayesian networks and causal diagrams are discussed.
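As a rough illustration of the rule this chapter starts from, the sketch below applies Bayes' theorem to a diagnostic-test setting. The function name and all numbers are hypothetical and are not taken from the book.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    prior               = P(condition)
    sensitivity         = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Total probability of a positive test (the denominator in Bayes' theorem).
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


# Hypothetical numbers: a rare condition and a fairly accurate test.
print(round(posterior(prior=0.01, sensitivity=0.9, false_positive_rate=0.05), 3))
# prints 0.154
```

Even with a sensitive test, the posterior stays low here because the assumed prior is small.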

Chapter 4: Confounding and Deconfounding, or, Slaying the Lurking Variable

This chapter introduces the idea of confounding and describes how causal diagrams can be used to identify confounding variables and determine their effect. Pearl explains that randomized controlled trials (RCTs) can be used to nullify the effect of confounders, but shows that, provided one has a causal model of confounding, an RCT does not necessarily have to be performed to get results.
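The idea that a causal model can stand in for a randomized trial can be sketched with a small, fully hypothetical example. It assumes a causal diagram in which a binary confounder Z influences both the treatment X and the outcome Y, and it applies the standard adjustment formula P(Y | do(X)) = Σ_z P(Y | X, z) P(z). All probabilities below are invented for illustration.

```python
from itertools import product

# Hypothetical model: Z -> X, Z -> Y, X -> Y (all variables binary).
P_Z1 = 0.5
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.5, (1, 1): 0.7}


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Observational joint distribution over (z, x, y).
joint = {
    (z, x, y): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_XZ[(x, z)], y)
    for z, x, y in product((0, 1), repeat=3)
}


def prob(z=None, x=None, y=None):
    """Marginal probability of the given values; None means 'sum over'."""
    want = (z, x, y)
    return sum(p for cell, p in joint.items()
               if all(w is None or w == c for w, c in zip(want, cell)))


def naive(x):
    """P(Y=1 | X=x): plain conditioning, which mixes in the effect of Z."""
    return prob(x=x, y=1) / prob(x=x)


def adjusted(x):
    """P(Y=1 | do(X=x)) by adjusting for Z."""
    return sum(prob(z=z, x=x, y=1) / prob(z=z, x=x) * prob(z=z) for z in (0, 1))


print(round(naive(1) - naive(0), 2))        # prints 0.44
print(round(adjusted(1) - adjusted(0), 2))  # prints 0.2
```

In this toy model the naive comparison overstates the effect of X, while the adjusted estimate recovers the 0.2 difference built into each stratum.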

Chapter 5: The Smoke-filled Debate: Clearing the Air

This chapter takes a historical approach to the question 'does smoking cause lung cancer?', focusing on the arguments made by Abraham Lilienfeld, Jacob Yerushalmy, Ronald Fisher and Jerome Cornfield. The authors explain that, though cigarette smoking was clearly correlated with lung cancer, some, such as Fisher and Yerushalmy, believed that the two variables were confounded and argued against the hypothesis that cigarettes caused the cancer. The authors then explain how causal reasoning (as developed in the rest of the book) can be used to argue that cigarettes do indeed cause cancer.

Chapter 6: Paradoxes Galore!

This chapter examines several paradoxes, including the Monty Hall Problem, Simpson's paradox, Berkson's paradox and Lord's paradox. The authors show how these paradoxes can be resolved using causal reasoning.
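Simpson's paradox can be reproduced with a few lines of arithmetic. The counts below are hypothetical and chosen only to make the reversal visible: a treatment looks better within each severity group but worse once the groups are pooled.

```python
# (group, treated) -> (recovered, total); hypothetical counts.
counts = {
    ("mild", True): (18, 20),
    ("severe", True): (48, 80),
    ("mild", False): (64, 80),
    ("severe", False): (10, 20),
}


def rate(treated, group=None):
    """Recovery rate for treated/untreated patients, optionally within one group."""
    rows = [v for (g, t), v in counts.items() if t == treated and group in (None, g)]
    recovered = sum(r for r, _ in rows)
    total = sum(n for _, n in rows)
    return recovered / total


for group in ("mild", "severe", None):
    print(group or "pooled", rate(True, group), rate(False, group))
# prints:
# mild 0.9 0.8
# severe 0.6 0.5
# pooled 0.66 0.74
```

Which comparison to trust is not decided by the numbers alone. If, as assumed here, severity affects both who is treated and who recovers, the within-group comparison is the one that reflects the treatment's effect.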

Chapter 7: Beyond Adjustment: The Conquest of Mount Intervention

This chapter looks at the 'second rung' of the ladder of causation introduced in chapter 1. The authors describe how to use causal diagrams to ascertain the causal effect of performing interventions (e.g. smoking) on outcomes (such as lung cancer). The 'front-door criterion' and the 'do-calculus' are introduced as tools for doing this. The chapter finishes with two examples that introduce the use of instrumental variables to estimate causal relationships. The first is John Snow's discovery that cholera is caused by unsanitary water supplies. The second is the relationship between cholesterol levels and the likelihood of a heart attack.
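A minimal sketch of front-door adjustment follows, under assumptions that are not from the book: a hidden confounder U affects both X and Y, and X affects Y only through an observed mediator Z. The front-door formula, P(y | do(x)) = Σ_z P(z | x) Σ_x' P(y | x', z) P(x'), uses only the observed variables. The code checks it against the effect computed directly from the invented model, which includes U.

```python
from itertools import product

# Hypothetical model: U -> X, U -> Y, X -> Z, Z -> Y (all binary, U unobserved).
P_U1 = 0.5
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}
P_Z1_GIVEN_X = {0: 0.1, 1: 0.9}
P_Y1_GIVEN_ZU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Observational joint over (x, z, y) with the hidden U summed out.
observed = {}
for u, x, z, y in product((0, 1), repeat=4):
    p = (bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
         * bern(P_Z1_GIVEN_X[x], z) * bern(P_Y1_GIVEN_ZU[(z, u)], y))
    observed[(x, z, y)] = observed.get((x, z, y), 0.0) + p


def prob(x=None, z=None, y=None):
    """Marginal probability of the given observed values; None means 'sum over'."""
    want = (x, z, y)
    return sum(p for cell, p in observed.items()
               if all(w is None or w == c for w, c in zip(want, cell)))


def front_door(x):
    """P(Y=1 | do(X=x)) from observed data only, via the front-door formula."""
    total = 0.0
    for z in (0, 1):
        p_z_given_x = prob(x=x, z=z) / prob(x=x)
        inner = sum(prob(x=xp, z=z, y=1) / prob(x=xp, z=z) * prob(x=xp)
                    for xp in (0, 1))
        total += p_z_given_x * inner
    return total


def true_effect(x):
    """P(Y=1 | do(X=x)) computed from the full model, including U."""
    return sum(bern(P_U1, u) * bern(P_Z1_GIVEN_X[x], z) * P_Y1_GIVEN_ZU[(z, u)]
               for u, z in product((0, 1), repeat=2))


print(round(front_door(1), 3), round(true_effect(1), 3))  # prints 0.615 0.615
print(round(prob(x=1, y=1) / prob(x=1), 3))               # prints 0.762
```

The last line shows that plain conditioning on X gives a different, confounded answer in this toy model.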

Chapter 8: Counterfactuals: Mining worlds that could have been

This chapter examines the third rung of the ladder of causation: counterfactuals. The chapter introduces 'structural causal models', which allow reasoning about counterfactuals in a way that traditional (non-causal) statistics does not. Then, the applications of counterfactual reasoning are explored in the areas of climate science and the law.
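One common way to evaluate a counterfactual in a structural causal model is a three-step procedure: infer the unobserved background term from what was observed, set the variable of interest to its hypothetical value, and recompute the outcome. The sketch below assumes a single invented structural equation, Y := 2·X + U. The equation, the names and the numbers are illustrative only.

```python
SLOPE = 2.0  # hypothetical causal effect of X on Y


def f_y(x, u):
    """Hypothetical structural equation Y := SLOPE * X + U."""
    return SLOPE * x + u


def counterfactual_y(x_obs, y_obs, x_new):
    """What Y would have been for this unit had X been x_new."""
    # Step 1 (abduction): recover the unit's background term U from the evidence.
    u = y_obs - SLOPE * x_obs
    # Steps 2 and 3 (action, prediction): set X to x_new and recompute Y with the same U.
    return f_y(x_new, u)


# A unit observed with X = 1 and Y = 3 has U = 1; had X been 2, Y would have been 5.
print(counterfactual_y(x_obs=1.0, y_obs=3.0, x_new=2.0))  # prints 5.0
```

The answer is specific to the observed unit because U is held fixed. A purely statistical model, which describes only the distribution of X and Y across a population, has no such unit-level term to hold fixed.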

Chapter 9: Mediation: The Search for Mechanism

This chapter discusses mediation: the mechanism by which a cause leads to an effect. The authors discuss the work of Barbara Stoddard Burks on the causes of intelligence of children, the 'algebra for all' policy by Chicago public schools, and the use of tourniquets to treat combat wounds.

Chapter 10: Big Data, Artificial Intelligence and the Big Questions

The final chapter discusses the use of causal reasoning in big data and artificial intelligence (AI) and the philosophical problem that AI would have to reflect on its own actions, which requires counterfactual (and therefore causal) reasoning.

Reviews

Scientific background, excerpts, errata, and a list of 37 reviews of The Book of Why are provided on Judea Pearl's web page. [1]

The Book of Why was reviewed by Jonathan Knee in The New York Times. The review was positive, with Knee calling the book "illuminating". However, he described some parts of the book as "challenging", stating that the book is "not always fully accessible to readers who do not share the author's fondness for equations". [2]

Tim Maudlin gave the book a mixed review in The Boston Review, calling it a "splendid overview of the state of the art in causal analysis". However, Maudlin criticizes the inclusion of "counterfactuals" as a separate rung on the "ladder of causation", stating "[c]ounterfactuals are so closely entwined with causal claims that it is not possible to think causally but not counterfactually". Maudlin also criticizes the section on free will for its "imprecision and lack of familiarity with the philosophical literature". Finally, he points to the work of several scientists (including Clark Glymour) who developed ideas similar to Pearl's, and claims that Pearl "could have saved himself literally years of effort had he been apprised of this work". [3]

In a rebuttal, Pearl notes that not only was he apprised of these scientists' work, but he actively collaborated in its creation. He adds that the key developments described in The Book of Why, among them (1) identification analysis, (2) the algorithmization of counterfactuals, (3) mediation analysis, and (4) external validity, far surpass the narrow philosophical literature of the pre-2000 era.

Zoe Hackett, writing in Chemistry World, gave The Book of Why a positive review, with the caveat that "[i]t requires concentration, and a studious effort to work through the mind-bending statistical problems posited in the text". The review concludes by stating that "[t]his book is a must for any serious student of philosophy of science, and should be required reading for any first-year undergraduate statistics class". [4]

Lisa R. Goldberg wrote a detailed, technical review in Notices of the American Mathematical Society. [5]

References

  1. Judea Pearl. Information page for The Book of Why. http://bayes.cs.ucla.edu/WHY/
  2. Jonathan Knee (1 June 2018). "Review: The Book of Why Examines the Science of Cause and Effect". The New York Times.
  3. Tim Maudlin (4 September 2019). "The Why of the World". The Boston Review.
  4. Zoe Hackett (18 January 2019). "The Book of Why: The New Science of Cause and Effect". Chemistry World.
  5. Lisa R. Goldberg (August 2019). "The Book of Why" (PDF). Notices of the American Mathematical Society.