Rubin causal model


The Rubin causal model (RCM), also known as the Neyman–Rubin causal model, [1] is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes, named after Donald Rubin. The name "Rubin causal model" was first coined by Paul W. Holland. [2] The potential outcomes framework was first proposed by Jerzy Neyman in his 1923 Master's thesis, [3] though he discussed it only in the context of completely randomized experiments. [4] Rubin extended it into a general framework for thinking about causation in both observational and experimental studies. [1]



The Rubin causal model is based on the idea of potential outcomes. For example, a person would have a particular income at age 40 if they had attended college, whereas they would have a different income at age 40 if they had not attended college. To measure the causal effect of going to college for this person, we need to compare the outcome for the same individual in both alternative futures. Since it is impossible to see both potential outcomes at once, one of the potential outcomes is always missing. This dilemma is the "fundamental problem of causal inference".

Because of the fundamental problem of causal inference, unit-level causal effects cannot be directly observed. However, randomized experiments allow for the estimation of population-level causal effects. [5] A randomized experiment assigns people randomly to treatments: college or no college. Because of this random assignment, the groups are (on average) equivalent, and the difference in income at age 40 can be attributed to the college assignment since that was the only difference between the groups. An estimate of the average causal effect (also referred to as the average treatment effect) can then be obtained by computing the difference in means between the treated (college-attending) and control (not-college-attending) samples.

In many circumstances, however, randomized experiments are not possible due to ethical or practical concerns. In such scenarios there is a non-random assignment mechanism. This is the case for the example of college attendance: people are not randomly assigned to attend college. Rather, people may choose to attend college based on their financial situation, parents' education, and so on. Many statistical methods have been developed for causal inference, such as propensity score matching. These methods attempt to correct for the assignment mechanism by finding control units similar to treatment units.

An extended example

Rubin defines a causal effect:

"Intuitively, the causal effect of one treatment, E, over another, C, for a particular unit and an interval of time from $t_1$ to $t_2$ is the difference between what would have happened at time $t_2$ if the unit had been exposed to E initiated at $t_1$ and what would have happened at $t_2$ if the unit had been exposed to C initiated at $t_1$: 'If an hour ago I had taken two aspirins instead of just a glass of water, my headache would now be gone,' or 'because an hour ago I took two aspirins instead of just a glass of water, my headache is now gone.' Our definition of the causal effect of the E versus C treatment will reflect this intuitive meaning." [5]

According to the RCM, the causal effect of your taking or not taking aspirin one hour ago is the difference between how your head would have felt in case 1 (taking the aspirin) and case 2 (not taking the aspirin). If your headache would remain without aspirin but disappear if you took aspirin, then the causal effect of taking aspirin is headache relief. In most circumstances, we are interested in comparing two futures, one generally termed "treatment" and the other "control". These labels are somewhat arbitrary.

Potential outcomes

Suppose that Joe is participating in an FDA test for a new hypertension drug. If we were omniscient, we would know the outcomes for Joe under both treatment (the new drug) and control (either no treatment or the current standard treatment). The causal effect, or treatment effect, is the difference between these two potential outcomes.


$Y_t(u)$ is Joe's blood pressure if he takes the new pill. In general, this notation expresses the potential outcome which results from a treatment, t, on a unit, u. Similarly, $Y_c(u)$ is the potential outcome under a different treatment, c or control, on a unit, u. In this case, $Y_c(u)$ is Joe's blood pressure if he doesn't take the pill. $Y_t(u) - Y_c(u)$ is the causal effect of taking the new drug.

From this table we only know the causal effect on Joe. Everyone else in the study might have an increase in blood pressure if they take the pill. However, regardless of what the causal effect is for the other subjects, the causal effect for Joe is lower blood pressure, relative to what his blood pressure would have been if he had not taken the pill.

Consider a larger sample of patients:


The causal effect is different for every subject, but the drug works for Joe, Mary and Bob because the causal effect is negative. Their blood pressure is lower with the drug than it would have been if each did not take the drug. For Sally, on the other hand, the drug causes an increase in blood pressure.

In order for a potential outcome to make sense, it must be possible, at least a priori. For example, if there is no way for Joe, under any circumstance, to obtain the new drug, then $Y_t(u)$ is impossible for him. It can never happen. And if $Y_t(u)$ can never be observed, even in theory, then the causal effect of treatment on Joe's blood pressure is not defined.

No causation without manipulation

The causal effect of the new drug is well defined because it is the simple difference of two potential outcomes, both of which might happen. In this case, we (or something else) can manipulate the world, at least conceptually, so that it is possible that one thing or a different thing might happen.

This definition of causal effects becomes much more problematic if there is no way for one of the potential outcomes to happen, ever. For example, what is the causal effect of Joe's height on his weight? Naively, this seems similar to our other examples. We just need to compare two potential outcomes: what would Joe's weight be under the treatment (where treatment is defined as being 3 inches taller) and what would Joe's weight be under the control (where control is defined as his current height).

A moment's reflection highlights the problem: we can't increase Joe's height. There is no way to observe, even conceptually, what Joe's weight would be if he were taller because there is no way to make him taller. We can't manipulate Joe's height, so it makes no sense to investigate the causal effect of height on weight. Hence the slogan: No causation without manipulation.

Stable unit treatment value assumption (SUTVA)

We require that "the [potential outcome] observation on one unit should be unaffected by the particular assignment of treatments to the other units" (Cox 1958, §2.4). This is called the stable unit treatment value assumption (SUTVA), which goes beyond the concept of independence.

In the context of our example, Joe's blood pressure should not depend on whether or not Mary receives the drug. But what if it does? Suppose that Joe and Mary live in the same house and Mary always cooks. The drug causes Mary to crave salty foods, so if she takes the drug she will cook with more salt than she would have otherwise. A high salt diet increases Joe's blood pressure. Therefore, his outcome will depend on both which treatment he received and which treatment Mary receives.

SUTVA violation makes causal inference more difficult. We can account for dependent observations by considering more treatments. We create 4 treatments by taking into account whether or not Mary receives treatment.

subject | Joe = c, Mary = t | Joe = t, Mary = t | Joe = c, Mary = c | Joe = t, Mary = c

Recall that a causal effect is defined as the difference between two potential outcomes. In this case, there are multiple causal effects because there are more than two potential outcomes. Write $Y(\text{Joe}=a, \text{Mary}=b)$ for Joe's blood pressure when Joe receives treatment $a$ and Mary receives treatment $b$. One causal effect is that of the drug on Joe when Mary receives treatment, calculated as $Y(\text{Joe}=t, \text{Mary}=t) - Y(\text{Joe}=c, \text{Mary}=t)$. Another is the causal effect on Joe when Mary does not receive treatment, calculated as $Y(\text{Joe}=t, \text{Mary}=c) - Y(\text{Joe}=c, \text{Mary}=c)$. The third is the causal effect of Mary's treatment on Joe when Joe is not treated, calculated as $Y(\text{Joe}=c, \text{Mary}=t) - Y(\text{Joe}=c, \text{Mary}=c)$. The treatment Mary receives has a greater causal effect on Joe than the treatment which Joe received has on Joe, and it is in the opposite direction.

By considering more potential outcomes in this way, we can cause SUTVA to hold. However, if any units other than Joe are dependent on Mary, then we must consider further potential outcomes. The greater the number of dependent units, the more potential outcomes we must consider and the more complex the calculations become (consider an experiment with 20 different people, each of whose treatment status can affect outcomes for everyone else). In order to (easily) estimate the causal effect of a single treatment relative to a control, SUTVA should hold.
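
As a rough illustration of how quickly the bookkeeping grows once interference is allowed, the sketch below enumerates the joint treatment assignments on which a single unit's outcome could depend. It assumes binary treatments (t or c) and that every unit may affect every other; the function name joint_assignments is hypothetical.

```python
from itertools import product

def joint_assignments(units):
    """Enumerate every joint assignment of 'c' or 't' to the given units."""
    return [dict(zip(units, combo)) for combo in product("ct", repeat=len(units))]

# Joe and Mary: four joint assignments, one per treatment combination.
pairs = joint_assignments(["Joe", "Mary"])
print(len(pairs))

# With 20 mutually dependent people, each person has 2**20 potential outcomes.
print(2 ** 20)
```

With two units there are four joint assignments; with twenty mutually dependent units each person already has more than a million potential outcomes, which is why SUTVA is so convenient when it is plausible.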

Average causal effect



One may calculate the average causal effect by taking the mean of all the causal effects.
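
Writing $Y_t(u)$ and $Y_c(u)$ for the potential outcomes of unit $u$ under treatment and control, and assuming a finite set of $N$ units, this mean can be written out as follows. The symbols $N$ and $\bar{\tau}$ are introduced here only for illustration.

```latex
\bar{\tau} = \frac{1}{N} \sum_{u=1}^{N} \bigl[ Y_t(u) - Y_c(u) \bigr]
```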

How we measure the response affects what inferences we draw. Suppose that we measure changes in blood pressure as a percentage change rather than in absolute values. Then, depending on the exact numbers, the sign of the average causal effect can change. For example, assume that George's blood pressure would be 154 under control and 140 with treatment. The absolute size of the causal effect is −14, but the percentage difference (in terms of the treatment level of 140) is −10%. If Sarah's blood pressure is 200 under treatment and 184 under control, then the causal effect is 16 in absolute terms but 8% in terms of the treatment value. A smaller absolute change in blood pressure (−14 versus 16) yields a larger percentage change (−10% versus 8%) for George. Even though the average causal effect for George and Sarah is +1 in absolute terms, it is −1 in percentage terms.
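
The arithmetic for George and Sarah can be checked with a short sketch. The numbers are those given in the text; the dictionary layout and function names are only illustrative choices.

```python
# Potential outcomes taken from the George and Sarah example in the text.
outcomes = {
    "George": {"treatment": 140, "control": 154},
    "Sarah": {"treatment": 200, "control": 184},
}

def absolute_effect(o):
    """Causal effect in the original units: treatment minus control."""
    return o["treatment"] - o["control"]

def percent_effect(o):
    """Causal effect as a percentage of the treatment level, as in the text."""
    return 100 * absolute_effect(o) / o["treatment"]

abs_effects = [absolute_effect(o) for o in outcomes.values()]
pct_effects = [percent_effect(o) for o in outcomes.values()]

mean_abs = sum(abs_effects) / len(abs_effects)
mean_pct = sum(pct_effects) / len(pct_effects)
print(mean_abs, mean_pct)
```

The two averages have opposite signs, so the choice of response scale has to be fixed before an average causal effect is interpreted.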

The fundamental problem of causal inference

The results we have seen up to this point would never be measured in practice. It is impossible, by definition, to observe the effect of more than one treatment on a subject over a specific time period. Joe cannot both take the pill and not take the pill at the same time. Therefore, the data would look something like this:

subject | $Y_t(u)$ (treatment) | $Y_c(u)$ (control) | $Y_t(u) - Y_c(u)$ (causal effect)
Joe | 130 | ? | ?

Question marks are responses that could not be observed. The Fundamental Problem of Causal Inference [2] is that directly observing causal effects is impossible. However, this does not make causal inference impossible. Certain techniques and assumptions allow the fundamental problem to be overcome.

Assume that we have the following data:

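subject | $Y_t(u)$ (treatment) | $Y_c(u)$ (control) | $Y_t(u) - Y_c(u)$ (causal effect)
Joe | 130 | ? | ?
Mary | ? | 125 | ?
Sally | 100 | ? | ?
Bob | ? | 130 | ?
James | ? | 120 | ?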

We can infer what Joe's potential outcome under control would have been if we make an assumption of constant effect, that is, that the difference between the treatment and control outcomes takes the same value for every unit. Each missing potential outcome then follows from the observed one by adding or subtracting that constant.

The following table illustrates data consistent with the assumption of a constant effect.


All of the subjects have the same causal effect even though they have different outcomes under the treatment.
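
A minimal sketch of this imputation, using the observed values for Joe, Mary, Sally, Bob and James given above. One choice here is an assumption not stated in the text: the constant effect is estimated as the difference between the observed treated and control means.

```python
# Observed outcomes from the data above; None marks a missing potential outcome.
observed = {
    "Joe":   {"t": 130,  "c": None},
    "Mary":  {"t": None, "c": 125},
    "Sally": {"t": 100,  "c": None},
    "Bob":   {"t": None, "c": 130},
    "James": {"t": None, "c": 120},
}

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Assumption: estimate the constant effect by the difference in observed means.
treated = [o["t"] for o in observed.values() if o["t"] is not None]
control = [o["c"] for o in observed.values() if o["c"] is not None]
effect = mean(treated) - mean(control)

def impute(o, effect):
    """Fill in the missing potential outcome using treatment - control = effect."""
    t = o["t"] if o["t"] is not None else o["c"] + effect
    c = o["c"] if o["c"] is not None else o["t"] - effect
    return {"t": t, "c": c}

completed = {name: impute(o, effect) for name, o in observed.items()}
print(effect)
print(completed["Joe"])
```

Under these assumptions every unit ends up with the same causal effect by construction. The imputed values are only as credible as the constant-effect assumption itself.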

The assignment mechanism

The assignment mechanism, the method by which units are assigned treatment, affects the calculation of the average causal effect. One such assignment mechanism is randomization. For each subject we could flip a coin to determine if she receives treatment. If we wanted five subjects to receive treatment, we could assign treatment to the first five names we pick out of a hat. Different random assignments can yield different estimates of the average causal effect.

Assume that this data is the truth:


The true average causal effect is −8. But the causal effect for these individuals is never equal to this average. The causal effect varies, as it generally does in real life. After assigning treatments randomly, we might observe the following data, from which the causal effect must be estimated:

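subject | $Y_t(u)$ (treatment) | $Y_c(u)$ (control) | $Y_t(u) - Y_c(u)$ (causal effect)
Joe | 130 | ? | ?
Mary | 120 | ? | ?
Sally | ? | 125 | ?
Bob | ? | 130 | ?
James | 115 | ? | ?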

A different random assignment of treatments yields a different estimate of the average causal effect.

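subject | $Y_t(u)$ (treatment) | $Y_c(u)$ (control) | $Y_t(u) - Y_c(u)$ (causal effect)
Joe | 130 | ? | ?
Mary | 120 | ? | ?
Sally | 100 | ? | ?
Bob | ? | 130 | ?
James | ? | 120 | ?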

The estimated average causal effect varies because our sample is small and the responses have a large variance. If the sample were larger and the variance were less, the estimate would be closer to the true average causal effect regardless of the specific units randomly assigned to treatment.
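
The two estimates can be computed directly from the observed values in the two random assignments above. This is a minimal sketch of the difference-in-means estimator; the function names are illustrative.

```python
def mean(xs):
    return sum(xs) / len(xs)

def difference_in_means(treated, control):
    """Estimate the average causal effect from observed outcomes only."""
    return mean(treated) - mean(control)

# First random assignment: Joe, Mary, James treated; Sally, Bob control.
first = difference_in_means([130, 120, 115], [125, 130])

# Second random assignment: Joe, Mary, Sally treated; Bob, James control.
second = difference_in_means([130, 120, 100], [130, 120])

print(round(first, 2), round(second, 2))
```

Neither estimate equals the true value of −8 stated in the text, and the two differ from each other only because different units happened to be treated.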

Alternatively, suppose the mechanism assigns the treatment to all men and only to them.

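subject | $Y_t(u)$ (treatment) | $Y_c(u)$ (control) | $Y_t(u) - Y_c(u)$ (causal effect)
Joe | 130 | ? | ?
Bob | 110 | ? | ?
James | 105 | ? | ?
Mary | ? | 130 | ?
Sally | ? | 125 | ?
Susie | ? | 135 | ?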

Under this assignment mechanism, it is impossible for women to receive treatment and therefore impossible to determine the average causal effect on female subjects. In order to make any inferences of causal effect on a subject, the probability that the subject receives treatment must be greater than 0 and less than 1.
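
A simple check of this condition on the men-only assignment can be sketched as follows. The empirical share of treated units within a group is used as a stand-in for the assignment probability, which is an assumption of this sketch. The sex labels follow the text's description of the mechanism.

```python
# Observed assignment under the "treat all men, and only men" mechanism.
# Each record is (name, sex, treated?).
records = [
    ("Joe", "M", True), ("Bob", "M", True), ("James", "M", True),
    ("Mary", "F", False), ("Sally", "F", False), ("Susie", "F", False),
]

def treated_share(records, sex):
    """Empirical share of a group that received treatment."""
    group = [treated for _, s, treated in records if s == sex]
    return sum(group) / len(group)

def overlap_holds(records, sex):
    """True when the group contains both treated and control units."""
    share = treated_share(records, sex)
    return 0 < share < 1

for sex in ("M", "F"):
    print(sex, treated_share(records, sex), overlap_holds(records, sex))
```

By the criterion stated above, the check fails for both groups: no woman is treated, and no man is left untreated to serve as a comparison.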

The perfect doctor

Consider the use of the perfect doctor as an assignment mechanism. The perfect doctor knows how each subject will respond to the drug or the control and assigns each subject to the treatment that will most benefit her. The perfect doctor knows this information about a sample of patients:


Based on this knowledge she would make the following treatment assignments:

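subject | $Y_t(u)$ (treatment) | $Y_c(u)$ (control) | $Y_t(u) - Y_c(u)$ (causal effect)
Joe | ? | 115 | ?
Bob | 120 | ? | ?
James | 100 | ? | ?
Mary | 115 | ? | ?
Sally | 120 | ? | ?
Susie | ? | 105 | ?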

The perfect doctor distorts both averages by filtering out poor responses to both the treatment and control. The difference between means, which is the supposed average causal effect, is distorted in a direction that depends on the details. For instance, a subject like Susie who is harmed by taking the drug would be assigned to the control group by the perfect doctor and thus the negative effect of the drug would be masked.
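
The distortion can be seen in a small sketch. The potential outcomes below are hypothetical numbers invented for illustration, not the values from the article, and lower blood pressure is assumed to be the better outcome.

```python
# Hypothetical potential outcomes (blood pressure; lower is better).
# These numbers are invented for illustration and are not from the text.
units = {
    "A": {"t": 120, "c": 130},
    "B": {"t": 110, "c": 125},
    "C": {"t": 140, "c": 115},
    "D": {"t": 135, "c": 120},
}

def mean(xs):
    return sum(xs) / len(xs)

# True average causal effect, computable only because both outcomes are known here.
true_ace = mean([o["t"] - o["c"] for o in units.values()])

# The perfect doctor gives each unit whichever treatment yields the lower value.
treated = [o["t"] for o in units.values() if o["t"] < o["c"]]
control = [o["c"] for o in units.values() if o["t"] >= o["c"]]
naive = mean(treated) - mean(control)

print(true_ace, naive)
```

With these made-up numbers the true average effect is an increase in blood pressure, yet the difference in observed means suggests a decrease, because every unit harmed by the drug was assigned to control.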


The causal effect of a treatment on a single unit at a point in time is the difference between the outcome variable with the treatment and without the treatment. The Fundamental Problem of Causal Inference is that it is impossible to observe the causal effect on a single unit. You either take the aspirin now or you don't. As a consequence, assumptions must be made in order to estimate the missing counterfactuals.

The Rubin causal model has also been connected to instrumental variables (Angrist, Imbens, and Rubin, 1996) [6] and other techniques for causal inference. For more on the connections between the Rubin causal model, structural equation modeling, and other statistical methods for causal inference, see Morgan and Winship (2007). [7]


  1. Sekhon, Jasjeet (2007). "The Neyman–Rubin Model of Causal Inference and Estimation via Matching Methods" (PDF). The Oxford Handbook of Political Methodology.
  2. Holland, Paul W. (1986). "Statistics and Causal Inference". J. Amer. Statist. Assoc. 81 (396): 945–960. doi:10.1080/01621459.1986.10478354. JSTOR 2289064.
  3. Neyman, Jerzy. Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes. Master's Thesis (1923). Excerpts reprinted in English, Statistical Science, Vol. 5, pp. 463–472. (D. M. Dabrowska and T. P. Speed, Translators.)
  4. Rubin, Donald (2005). "Causal Inference Using Potential Outcomes". J. Amer. Statist. Assoc. 100 (469): 322–331. doi:10.1198/016214504000001880.
  5. Rubin, Donald (1974). "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies". J. Educ. Psychol. 66 (5): 688–701 [p. 689]. doi:10.1037/h0037350.
  6. Angrist, J.; Imbens, G.; Rubin, D. (1996). "Identification of Causal Effects Using Instrumental Variables" (PDF). J. Amer. Statist. Assoc. 91 (434): 444–455. doi:10.1080/01621459.1996.10476902.
  7. Morgan, S.; Winship, C. (2007). Counterfactuals and Causal Inference: Methods and Principles for Social Research. New York: Cambridge University Press. ISBN 978-0-521-67193-4.
