Regression discontinuity design

In statistics, econometrics, political science, epidemiology, and related disciplines, a regression discontinuity design (RDD) is a quasi-experimental pretest-posttest design that aims to determine the causal effects of interventions by exploiting a cutoff or threshold above or below which an intervention is assigned. By comparing observations lying close to either side of the threshold, it is possible to estimate the average treatment effect in environments in which randomisation is unfeasible. However, the method alone does not establish causality, as it does not automatically rule out effects caused by potential confounding variables. First applied by Donald Thistlethwaite and Donald Campbell to the evaluation of scholarship programs, [1] the RDD has become increasingly popular in recent years. [2] Recent comparisons of randomised controlled trials (RCTs) and RDDs have empirically demonstrated the internal validity of the design. [3]

Example

The intuition behind the RDD is well illustrated using the evaluation of merit-based scholarships. The main problem with estimating the causal effect of such an intervention is the endogeneity of treatment assignment (e.g. the scholarship award) with respect to performance. Since high-performing students are more likely to be awarded the merit scholarship and, at the same time, more likely to continue performing well, comparing the outcomes of awardees and non-recipients would lead to an upward bias of the estimates. Even if the scholarship did not improve grades at all, awardees would have performed better than non-recipients, simply because scholarships were given to students who were performing well ex-ante.

Despite the absence of an experimental design, an RDD can exploit exogenous characteristics of the intervention to elicit causal effects. If all students whose grades are above a given threshold, for example 80%, are given the scholarship, it is possible to elicit the local treatment effect by comparing students around the 80% cut-off. The intuition here is that a student scoring 79% is likely to be very similar to a student scoring 81%, given the pre-defined threshold of 80%. However, one student will receive the scholarship while the other will not. Comparing the outcome of the awardee (treatment group) to the counterfactual outcome of the non-recipient (control group) will hence deliver the local treatment effect.
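The scholarship intuition can be illustrated with a small, noise-free simulation. This is only a sketch under assumed numbers: the score range, the rule that later grades simply track the earlier score, and the function names `simulate_students` and `mean_gap` are all hypothetical. The true scholarship effect is set to zero, so any gap between awardees and non-recipients is pure selection.

```python
from statistics import mean

def simulate_students(true_effect=0.0, cutoff=80.0):
    """Toy data (assumed): later grade equals the earlier score plus any scholarship effect."""
    students = []
    for k in range(600, 1000):              # scores 60.0, 60.1, ..., 99.9
        score = k / 10
        awarded = score >= cutoff
        later_grade = score + (true_effect if awarded else 0.0)
        students.append((score, awarded, later_grade))
    return students

def mean_gap(students, cutoff=80.0, window=None):
    """Awardee mean minus non-recipient mean, optionally only within +/- window of the cutoff."""
    if window is not None:
        students = [s for s in students if abs(s[0] - cutoff) <= window]
    treated = [grade for _, awarded, grade in students if awarded]
    control = [grade for _, awarded, grade in students if not awarded]
    return mean(treated) - mean(control)

students = simulate_students(true_effect=0.0)
naive_gap = mean_gap(students)               # all awardees vs. all non-recipients
local_gap = mean_gap(students, window=1.0)   # only scores between 79 and 81
print(round(naive_gap, 2), round(local_gap, 2))
```

The naive gap is large even though the scholarship does nothing, while the gap near the cutoff is much smaller. The small remaining gap comes from the slope of grades in the score within the window, which is what the regression-based estimators described under Methodology adjust for.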

Methodology

The two most common approaches to estimation using an RDD are non-parametric and parametric (normally polynomial regression).

Non-parametric estimation

The most common non-parametric method used in the RDD context is a local linear regression. This is of the form:

${\displaystyle Y=\alpha +\tau D+\beta _{1}(X-c)+\beta _{2}D(X-c)+\varepsilon ,}$

where ${\displaystyle c}$ is the treatment cutoff and ${\displaystyle D}$ is a binary variable equal to one if ${\displaystyle X\geq c}$ and zero otherwise. Letting ${\displaystyle h}$ be the bandwidth of data used, we have ${\displaystyle c-h\leq X\leq c+h}$. Different slopes and intercepts fit data on either side of the cutoff, and ${\displaystyle \tau }$ measures the jump between the two fitted lines at the cutoff. Typically either a rectangular kernel (no weighting) or a triangular kernel is used. Research favours the triangular kernel, [4] but the rectangular kernel has a more straightforward interpretation. [5]

The major benefit of using non-parametric methods in an RDD is that they provide estimates based on data closer to the cut-off, which is intuitively appealing. This reduces some bias that can result from using data farther away from the cutoff to estimate the discontinuity at the cutoff. [5] More formally, local linear regressions are preferred because they have better bias properties [4] and have better convergence. [6] However, the use of both types of estimation, if feasible, is a useful way to argue that the estimated results do not rely too heavily on the particular approach taken.
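A minimal sketch of the local linear estimator with a triangular kernel follows. It relies on the fact that the fully interacted specification above is equivalent to fitting a separate weighted line on each side of the cutoff, so that the estimate of τ is the difference between the two intercepts at the cutoff. The function names and the toy data (a built-in jump of 3 at a cutoff of 80) are assumptions made for illustration, and the sketch does not address bandwidth selection or standard errors.

```python
def triangular_weight(distance, bandwidth):
    """Triangular kernel: weight falls linearly from 1 at the cutoff to 0 at the bandwidth edge."""
    return max(0.0, 1.0 - abs(distance) / bandwidth)

def weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def local_linear_rd(x, y, cutoff, bandwidth):
    """Estimate tau as the gap between the left and right fitted lines at the cutoff."""
    left = ([], [], [])
    right = ([], [], [])
    for xi, yi in zip(x, y):
        weight = triangular_weight(xi - cutoff, bandwidth)
        if weight <= 0.0:
            continue                          # outside the bandwidth
        xs, ys, ws = right if xi >= cutoff else left
        xs.append(xi - cutoff)                # centre the running variable at the cutoff
        ys.append(yi)
        ws.append(weight)
    alpha_right, _ = weighted_line(*right)
    alpha_left, _ = weighted_line(*left)
    return alpha_right - alpha_left

# Toy data (assumed): slope 0.5 below the cutoff, slope 0.7 and a jump of 3 above it.
x = [k / 10 for k in range(600, 1000)]
y = [0.5 * xi + (3.0 + 0.2 * (xi - 80.0) if xi >= 80.0 else 0.0) for xi in x]
tau_hat = local_linear_rd(x, y, cutoff=80.0, bandwidth=5.0)
print(round(tau_hat, 6))
```

Because the toy data are exactly linear on each side, the estimate recovers the built-in jump up to floating-point error. With real data the estimate would depend on the bandwidth, which is why results are usually reported for several bandwidths.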

Parametric estimation

An example of a parametric estimation is:

${\displaystyle Y=\alpha +\beta _{1}x_{i}+\beta _{2}c_{i}+\beta _{3}c_{i}^{2}+\beta _{4}c_{i}^{3}+\varepsilon ,}$

where

${\displaystyle x_{i}={\begin{cases}1{\text{ if }}c_{i}\geq {\bar {c}}\\0{\text{ if }}c_{i}<{\bar {c}}\end{cases}}}$

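and ${\displaystyle {\bar {c}}}$ is the treatment cutoff. Here ${\displaystyle c_{i}}$ denotes the assignment variable, so the symbol plays a different role than the cutoff ${\displaystyle c}$ in the local linear specification, and ${\displaystyle \beta _{1}}$ is the estimated discontinuity. The polynomial part can be shortened or extended as needed.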
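The cubic specification can be sketched with ordinary least squares written out by hand. This is an illustration under assumptions: the function names, the toy data (a built-in jump of 4 at a cutoff of 80) and the hand-rolled solver are all hypothetical, and in practice a linear algebra or statistics library would be used. The assignment variable is centred at the cutoff and rescaled for numerical stability; this only reparameterises the polynomial and leaves the coefficient on the treatment dummy unchanged.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def polynomial_rd(c, y, cutoff, scale=1.0, degree=3):
    """Return the coefficient on the treatment dummy x_i in a polynomial regression."""
    rows = []
    for ci in c:
        u = (ci - cutoff) / scale                       # centred, rescaled assignment variable
        dummy = 1.0 if ci >= cutoff else 0.0
        rows.append([1.0, dummy] + [u ** p for p in range(1, degree + 1)])
    return ols(rows, y)[1]

# Toy data (assumed): a smooth cubic trend plus a jump of 4 at the cutoff.
c = [k / 10 for k in range(600, 1000)]
u = [(ci - 80.0) / 20.0 for ci in c]
y = [2.0 + (4.0 if ci >= 80.0 else 0.0) + 0.3 * ui - 0.5 * ui ** 2 + 0.1 * ui ** 3
     for ci, ui in zip(c, u)]
beta_1 = polynomial_rd(c, y, cutoff=80.0, scale=20.0)
print(round(beta_1, 6))
```

Changing `degree` shortens or extends the polynomial part. Comparing the estimate across degrees is one way to see how sensitive the result is to the assumed functional form.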

Other examples

• Policies in which treatment is determined by an age eligibility criterion (e.g. pensions, minimum legal drinking age). [7] [8]
• Elections in which one politician wins by a marginal majority. [9] [10]
• Placement scores within education that sort students into treatment programs. [11]

Required assumptions

Regression discontinuity design requires that all potentially relevant variables besides the treatment variable and outcome variable be continuous at the point where the treatment and outcome discontinuities occur. One sufficient, though not necessary, [10] condition is that the treatment assignment is "as good as random" at the threshold for treatment. [9] If this holds, then those who just barely received treatment are comparable to those who just barely did not receive treatment, as treatment status is effectively random.

Treatment assignment at the threshold can be "as good as random" if there is randomness in the assignment variable and the agents considered (individuals, firms, etc.) cannot perfectly manipulate their treatment status. For example, suppose the treatment is passing an exam, where a grade of 50% is required. This is a valid regression discontinuity design so long as grades are somewhat random, due either to the randomness of grading or to the randomness of student performance.

Students must also not be able to manipulate their grade so as to determine their treatment status perfectly. Two examples of such manipulation are students being able to convince teachers to "mercy pass" them, and students being allowed to retake the exam until they pass. In the former case, those students who barely fail but are able to secure a "mercy pass" may differ from those who just barely fail but cannot secure a "mercy pass". This leads to selection bias, as the treatment and control groups now differ. In the latter case, some students may decide to retake the exam, stopping once they pass. This also leads to selection bias since only some students will decide to retake the exam. [5]

Testing the validity of the assumptions

It is impossible to test definitively whether agents are able to determine their treatment status perfectly. However, some tests can provide evidence that either supports or discounts the validity of the regression discontinuity design.

Density test

McCrary (2008) suggested examining the density of observations of the assignment variable. [12] A discontinuity in the density of the assignment variable at the threshold for treatment may suggest that some agents were able to manipulate their treatment status perfectly.

For example, if several students are able to get a "mercy pass", then there will be more students who just barely passed the exam than who just barely failed. Similarly, if students are allowed to retake the exam until they pass, then there will be a similar result. In both cases, this will likely show up when the density of exam grades is examined. "Gaming the system" in this manner could bias the treatment effect estimate.
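The bunching described here can be checked crudely by counting observations in a narrow bin on each side of the threshold. The sketch below is not McCrary's density estimator; it is a simpler exact binomial check under the assumption that, without manipulation, an observation in the window is equally likely to fall on either side. The function name, window width and exam data are hypothetical.

```python
from math import comb

def density_jump_check(scores, cutoff, width):
    """Count observations just below and just above the cutoff and return an
    exact two-sided binomial p-value for an even split (p = 0.5)."""
    below = sum(1 for s in scores if cutoff - width <= s < cutoff)
    above = sum(1 for s in scores if cutoff <= s < cutoff + width)
    n = below + above
    pmf = [comb(n, i) / 2 ** n for i in range(n + 1)]
    # Two-sided p-value: total probability of splits no more likely than the observed one.
    p_value = min(1.0, sum(q for q in pmf if q <= pmf[above] * (1 + 1e-9)))
    return below, above, p_value

# Toy exam data (assumed): ten students at every integer score from 40 to 59.
honest = [score for score in range(40, 60) for _ in range(10)]
# "Mercy pass": everyone scoring 48 or 49 is bumped up to the pass mark of 50.
bumped = [50 if score in (48, 49) else score for score in honest]

print(density_jump_check(honest, cutoff=50, width=5))
print(density_jump_check(bumped, cutoff=50, width=5))
```

In the manipulated data the counts are clearly lopsided and the p-value is very small, whereas the unmanipulated data split evenly. The equal-split assumption is only reasonable for narrow windows in which the underlying density is roughly flat.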

Continuity of observable variables

Since the validity of the regression discontinuity design relies on those who were just barely treated being the same as those who were just barely not treated, it makes sense to examine whether these groups are similar in terms of observable variables. For the earlier example, one could test whether those who just barely passed have different characteristics (demographics, family income, etc.) than those who just barely failed. Although some variables may differ between the two groups by random chance, most of these variables should be the same. [13]

Falsification tests

Predetermined variables

Similar to the continuity of observable variables, one would expect there to be continuity in predetermined variables at the treatment cutoff. Since these variables were determined before the treatment decision, treatment status should not affect them. Consider the earlier merit-based scholarship example. If the outcome of interest is future grades, then we would not expect the scholarship to affect previous grades. If a discontinuity in predetermined variables is present at the treatment cutoff, then this puts the validity of the regression discontinuity design into question.
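One way to carry out this check is to run the same discontinuity estimator on a predetermined variable in place of the outcome. The sketch below uses an unweighted (rectangular kernel) line on each side of the cutoff. The helper names and the toy scholarship data, in which previous grades are smooth in the score and future grades jump by 3, are assumptions for illustration.

```python
def intercept_of_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns the intercept a."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return y_bar - (sxy / sxx) * x_bar

def jump_at_cutoff(running, values, cutoff, bandwidth):
    """Gap between the right and left fitted lines at the cutoff (rectangular kernel)."""
    left = [(r - cutoff, v) for r, v in zip(running, values)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, v) for r, v in zip(running, values)
             if cutoff <= r <= cutoff + bandwidth]
    return intercept_of_line(*zip(*right)) - intercept_of_line(*zip(*left))

# Toy data (assumed): the scholarship is awarded at a score of 80.
score = [k / 10 for k in range(600, 1000)]
variables = {
    "previous_grade": [10.0 + 0.8 * s for s in score],                       # predetermined
    "future_grade": [5.0 + 0.9 * s + (3.0 if s >= 80.0 else 0.0) for s in score],
}
jumps = {name: jump_at_cutoff(score, values, cutoff=80.0, bandwidth=5.0)
         for name, values in variables.items()}
for name, jump in jumps.items():
    print(name, round(jump, 6))
```

In this constructed example the predetermined variable shows no jump while the outcome does, which is the pattern a valid design should produce. A non-zero jump in `previous_grade` would be a warning sign.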

Other discontinuities

If discontinuities are present at other points of the assignment variable, where these are not expected, then this may make the regression discontinuity design suspect. Consider the example of Carpenter and Dobkin (2011) who studied the effect of legal access to alcohol in the United States. [8] As the access to alcohol increases at age 21, this leads to changes in various outcomes, such as mortality rates and morbidity rates. If mortality and morbidity rates also increase discontinuously at other ages, then it throws the interpretation of the discontinuity at age 21 into question.
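A placebo-cutoff check can be sketched by re-estimating the discontinuity at thresholds where no policy change occurs. The data below are invented for illustration (an outcome that rises smoothly with age and jumps by 2 at age 21) and are not taken from Carpenter and Dobkin. The helper names and the placebo ages 19 and 23 are likewise assumptions, and the bandwidth is chosen so that placebo windows do not overlap the true cutoff.

```python
def intercept_of_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns the intercept a."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return y_bar - (sxy / sxx) * x_bar

def jump_at_cutoff(running, values, cutoff, bandwidth):
    """Gap between the right and left fitted lines at the cutoff (rectangular kernel)."""
    left = [(r - cutoff, v) for r, v in zip(running, values)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, v) for r, v in zip(running, values)
             if cutoff <= r <= cutoff + bandwidth]
    return intercept_of_line(*zip(*right)) - intercept_of_line(*zip(*left))

# Toy data (assumed): ages 17.0 to 25.0 in steps of 0.05, with a jump of 2 at age 21.
age = [k / 20 for k in range(340, 501)]
outcome = [10.0 + 0.5 * a + (2.0 if a >= 21.0 else 0.0) for a in age]

estimates = {cutoff: jump_at_cutoff(age, outcome, cutoff, bandwidth=1.0)
             for cutoff in (19.0, 21.0, 23.0)}
for cutoff, jump in estimates.items():
    print(cutoff, round(jump, 6))
```

Here only the true cutoff shows a discontinuity. Sizeable estimated jumps at the placebo ages would suggest that the discontinuity at 21 may reflect something other than the treatment.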

Inclusion and exclusion of covariates

If parameter estimates are sensitive to removing or adding covariates to the model, then this may cast doubt on the validity of the regression discontinuity design. A significant change may suggest that those who just barely got treatment differ in these covariates from those who just barely did not get treatment. Including covariates would remove some of this bias. If a large amount of bias is present, and the covariates explain a significant amount of this, then their inclusion or exclusion would significantly change the parameter estimate. [5]

Recent work has shown how to add covariates, under what conditions doing so is valid, and the potential for increased precision. [14]

Advantages

• When properly implemented and analysed, the RDD yields an unbiased estimate of the local treatment effect. [15] The RDD can be almost as good as a randomised experiment in measuring a treatment effect.
• RDD, as a quasi-experiment, does not require ex-ante randomisation and circumvents ethical issues of random assignment.
• Well-executed RDD studies can generate treatment effect estimates similar to estimates from randomised studies. [16]

Disadvantages

• The estimated effects are only unbiased if the functional form of the relationship between the assignment variable and the outcome is correctly modelled. The most common pitfall is a non-linear relationship that is mistaken for a discontinuity.
• Contamination by other treatments. Suppose another treatment occurs at the same cutoff value of the same assignment variable. In that case, the measured discontinuity in the outcome variable may be partially attributed to this other treatment. For example, suppose a researcher wishes to study the impact of legal access to alcohol on mental health using a regression discontinuity design at the minimum legal drinking age. The measured impact could be confused with legal access to gambling, which may occur at the same age.

Extensions

Fuzzy RDD

The identification of causal effects hinges on the crucial assumption that there is indeed a sharp cut-off, around which the probability of assignment jumps from 0 to 1. In reality, however, cutoffs are often not strictly implemented (e.g. discretion may be exercised for students who just fell short of the threshold), in which case estimates that assume a sharp cut-off will be biased.

In contrast to the sharp regression discontinuity design, a fuzzy regression discontinuity design (FRDD) does not require a sharp discontinuity in the probability of assignment. It is applicable as long as the probability of assignment differs on either side of the cutoff. The intuition behind it is related to the instrumental variable strategy and intention to treat.
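The link to instrumental variables can be made concrete with a Wald-type ratio: the jump in the outcome at the cutoff divided by the jump in the share actually treated. The sketch below uses simple means within a window on each side, a local-constant simplification; applied work typically uses local linear fits or two-stage least squares instead. The function name and the data, in which the treated share rises from 20% to 80% at the cutoff and treatment raises the outcome by 5, are assumptions for illustration.

```python
from statistics import mean

def fuzzy_rd_wald(x, treated, y, cutoff, bandwidth):
    """Return (jump in outcome, jump in treated share, their ratio) using window means."""
    below = [i for i, xi in enumerate(x) if cutoff - bandwidth <= xi < cutoff]
    above = [i for i, xi in enumerate(x) if cutoff <= xi <= cutoff + bandwidth]
    jump_y = mean(y[i] for i in above) - mean(y[i] for i in below)
    jump_d = mean(treated[i] for i in above) - mean(treated[i] for i in below)
    return jump_y, jump_d, jump_y / jump_d

# Toy data (assumed): 2 in 10 are treated below the cutoff of 50, 8 in 10 above it.
ks = list(range(450, 550))
x = [k / 10 for k in ks]
treated = [1.0 if (k % 10) < (8 if k >= 500 else 2) else 0.0 for k in ks]
y = [1.0 + 5.0 * d for d in treated]          # treatment raises the outcome by 5

jump_y, jump_d, effect = fuzzy_rd_wald(x, treated, y, cutoff=50.0, bandwidth=5.0)
print(round(jump_y, 6), round(jump_d, 6), round(effect, 6))
```

The raw jump in the outcome understates the effect of treatment because only part of the population changes treatment status at the cutoff. Dividing by the jump in the treated share rescales it, in the same way that an intention-to-treat effect is rescaled by the compliance rate.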

Regression kink design

When the treatment is continuous (e.g. student aid) and depends predictably on another observed variable (e.g. family income), one can identify treatment effects using sharp changes in the slope of the treatment function. The term regression kink design was coined for this technique by Nielsen, Sørensen, and Taber (2010), though they cite similar earlier analyses. [17] They write, "This approach resembles the regression discontinuity idea. Instead of a discontinuity in the level of the stipend-income function, we have a discontinuity in the slope of the function." Rigorous theoretical foundations were provided by Card et al. (2012) [18] and an empirical application by Bockerman et al. (2018). [19]
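As a sketch of the estimand, with notation assumed here rather than taken from the sources cited: let ${\displaystyle V}$ be the observed variable (e.g. family income), ${\displaystyle b(V)}$ the treatment it determines (e.g. student aid), with a kink at ${\displaystyle v_{0}}$, and ${\displaystyle Y}$ the outcome. In the sharp case, the kink estimand divides the change in the slope of the outcome at the kink by the change in the slope of the treatment function:

${\displaystyle \tau _{\text{RKD}}={\frac {\lim _{v\downarrow v_{0}}{\frac {d}{dv}}\operatorname {E} [Y\mid V=v]-\lim _{v\uparrow v_{0}}{\frac {d}{dv}}\operatorname {E} [Y\mid V=v]}{\lim _{v\downarrow v_{0}}b'(v)-\lim _{v\uparrow v_{0}}b'(v)}}}$

This mirrors the regression discontinuity logic, with slopes taking the place of levels in both the numerator and the denominator.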

Note that regression kinks (or kinked regression) can also mean a type of segmented regression, which is a different type of analysis.

Final Considerations

The RD design is a quasi-experimental research design with a clear structure that lacks the random assignment of a randomized experiment. Several features limit its use as a default approach. For instance, it is typically applied in settings where serious practical or ethical constraints leave no room for randomized experiments. In addition, the validity of its estimates depends on how accurately the relationship between the assignment variable and the outcome is modelled.

References

1. Thistlethwaite, D.; Campbell, D. (1960). "Regression-Discontinuity Analysis: An alternative to the ex post facto experiment". Journal of Educational Psychology. 51 (6): 309–317. doi:10.1037/h0044319.
2. Imbens, G.; Lemieux, T. (2008). "Regression Discontinuity Designs: A Guide to Practice" (PDF). Journal of Econometrics. 142 (2): 615–635. doi:10.1016/j.jeconom.2007.05.001.
3. Chaplin, Duncan D.; Cook, Thomas D.; Zurovac, Jelena; Coopersmith, Jared S.; Finucane, Mariel M.; Vollmer, Lauren N.; Morris, Rebecca E. (2018). "The Internal and External Validity of the Regression Discontinuity Design: A Meta-Analysis of 15 Within-Study Comparisons". Journal of Policy Analysis and Management. 37 (2): 403–429. ISSN 1520-6688.
4. Fan; Gijbels (1996). Local Polynomial Modelling and Its Applications. London: Chapman and Hall. ISBN 978-0-412-98321-4.
5. Lee; Lemieux (2010). "Regression Discontinuity Designs in Economics". Journal of Economic Literature. 48 (2): 281–355. doi:10.1257/jel.48.2.281. S2CID 14166110.
6. Porter (2003). "Estimation in the Regression Discontinuity Model" (PDF). Unpublished Manuscript.
7. Duflo (2003). "Grandmothers and Granddaughters: Old-age Pensions and Intrahousehold Allocation in South Africa". World Bank Economic Review. 17 (1): 1–25. doi:10.1093/wber/lhg013.
8. Carpenter; Dobkin (2011). "The Minimum Legal Drinking Age and Public Health". Journal of Economic Perspectives. 25 (2): 133–156. doi:10.1257/jep.25.2.133. JSTOR 23049457. PMID 21595328.
9. Lee (2008). "Randomized Experiments from Non-random Selection in U.S. House Elections". Journal of Econometrics. 142 (2): 675–697. doi:10.1016/j.jeconom.2007.05.004.
10. de la Cuesta, B.; Imai, K. (2016). "Misunderstandings About the Regression Discontinuity Design in the Study of Close Elections". Annual Review of Political Science. 19 (1): 375–396.
11. Moss, B. G.; Yeaton, W. H.; Lloyd, J. E. (2014). "Evaluating the Effectiveness of Developmental Mathematics by Embedding a Randomized Experiment Within a Regression Discontinuity Design". Educational Evaluation and Policy Analysis. 36 (2): 170–185. doi:10.3102/0162373713504988. S2CID 123440758.
12. McCrary (2008). "Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test". Journal of Econometrics. 142 (2): 698–714. doi:10.1016/j.jeconom.2007.05.005.
13. Lee; Moretti; Butler (2004). "Do Voters Affect or Elect Policies? Evidence from the U.S. House". Quarterly Journal of Economics. 119 (3): 807–859. doi:10.1162/0033553041502153.
14. Calonico; Cattaneo; Farrell; Titiunik (2018). "Regression Discontinuity Designs Using Covariates". arXiv [econ.EM].
15. Rubin (1977). "Assignment to Treatment on the Basis of a Covariate". Journal of Educational and Behavioral Statistics. 2 (1): 1–26. doi:10.3102/10769986002001001. S2CID 123013161.
16. Moss, B. G.; Yeaton, W. H.; Lloyd, J. E. (2014). "Evaluating the Effectiveness of Developmental Mathematics by Embedding a Randomized Experiment Within a Regression Discontinuity Design". Educational Evaluation and Policy Analysis. 36 (2): 170–185. doi:10.3102/0162373713504988. S2CID 123440758.
17. Nielsen, H. S.; Sørensen, T.; Taber, C. R. (2010). "Estimating the Effect of Student Aid on College Enrollment: Evidence from a Government Grant Policy Reform". American Economic Journal: Economic Policy. 2 (2): 185–215. doi:10.1257/pol.2.2.185. JSTOR 25760068.
18. Card, David; Lee, David S.; Pei, Zhuan; Weber, Andrea (2012). "Nonlinear Policy Rules and the Identification and Estimation of Causal Effects in a Generalized Regression Kink Design". NBER Working Paper No. W18564.
19. Bockerman, Petri; Kanninen, Ohto; Suoniemi, Ilpo (2018). "A Kink that Makes You Sick: The Effect of Sick Pay on Absence". Journal of Applied Econometrics. 33 (4): 568–579.