Multivalued treatment

In statistics, in particular in the design of experiments, a multi-valued treatment is a treatment that can take on more than two values. It is related to the dose-response model in the medical literature.

Description

Generally speaking, treatment levels may be finite or infinite as well as ordinal or cardinal, which leads to a large collection of possible treatment effects to be studied in applications. [1] One example is the effect of different levels of program participation (e.g. full-time and part-time) in a job training program. [2]

Assume the treatment takes values in a finite collection of levels {0, 1, ..., J}, with J some fixed integer. As in the potential outcomes framework, let Y_i(j) denote the potential outcome of unit i under treatment level j, let Y_i denote the observed outcome, and let D_i(j) be an indicator that equals 1 when the treatment equals j and 0 when it does not. Because only the potential outcome corresponding to the treatment actually received is observed, this leads to the fundamental problem of causal inference. [3] A general framework that analyzes ordered choice models in terms of marginal treatment effects and average treatment effects has been extensively discussed by Heckman and Vytlacil. [4]
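
As a concrete illustration of this setup, the following sketch (not from the source) simulates potential outcomes under a multi-valued treatment and shows that only one potential outcome per unit is observed. It assumes Python with NumPy; all names and data-generating choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 1000, 3                       # sample size and highest treatment level (illustrative)

# Potential outcomes Y_i(j) for j = 0, ..., J: each level shifts the mean outcome by j.
Y_potential = rng.normal(loc=np.arange(J + 1), scale=1.0, size=(n, J + 1))

# Treatment received, T_i in {0, ..., J}, and the indicators D_i(j) = 1{T_i = j}.
T = rng.integers(0, J + 1, size=n)
D = (T[:, None] == np.arange(J + 1)).astype(int)

# Only one potential outcome per unit is observed: Y_i = sum_j D_i(j) * Y_i(j).
Y_observed = (D * Y_potential).sum(axis=1)

# Fundamental problem of causal inference: Y_i(j) is missing whenever T_i != j,
# so level-specific means must be estimated from the corresponding subsamples.
for j in range(J + 1):
    print(j, Y_observed[T == j].mean())
```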

Recent work in the econometrics and statistics literature has focused on estimation and inference for multivalued treatments and ignorability conditions for identifying the treatment effects. In the context of program evaluation, the propensity score has been generalized to allow for multi-valued treatments, [5] while other work has also focused on the role of the conditional mean independence assumption. [6] Other recent work has focused more on the large sample properties of an estimator of the marginal mean treatment effect conditional on a treatment level in the context of a difference-in-differences model, [7] and on the efficient estimation of multi-valued treatment effects in a semiparametric framework. [8]
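
The generalized propensity score of Imbens (2000) [5] suggests, roughly, weighting observed outcomes by the inverse of the probability of receiving the treatment level actually received. The sketch below is a simplified, hypothetical illustration of that idea rather than the estimators from the cited papers; it assumes a single discrete covariate so the propensities can be estimated by stratum frequencies.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 5000, 2
X = rng.integers(0, 3, size=n)                       # discrete covariate with 3 strata

# Treatment probabilities depend on X (selection on observables / ignorability).
probs = np.array([[0.6, 0.3, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.1, 0.3, 0.6]])                  # rows: strata of X, columns: levels j
T = np.array([rng.choice(J + 1, p=probs[x]) for x in X])

# Outcome depends on both the treatment level and the covariate.
Y = 1.0 * T + 0.5 * X + rng.normal(size=n)

# Generalized propensity score r(j, x) = P(T = j | X = x), estimated by stratum frequencies.
r_hat = np.array([[np.mean(T[X == x] == j) for j in range(J + 1)] for x in range(3)])

# Inverse-probability-weighted estimate of the mean potential outcome E[Y(j)].
for j in range(J + 1):
    w = (T == j) / r_hat[X, j]
    print(j, np.sum(w * Y) / np.sum(w))              # roughly j + 0.5 here
```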

Related Research Articles

Econometrics is an application of statistical methods to economic data in order to give empirical content to economic relationships. More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference." An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships." Jan Tinbergen is one of the two founding fathers of econometrics. The other, Ragnar Frisch, also coined the term in the sense in which it is used today.

In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Intuitively, IVs are used when an explanatory variable of interest is correlated with the error term, in which case ordinary least squares and ANOVA give biased results. A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable.
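
A minimal two-stage least squares sketch, assuming Python with NumPy; the data-generating process and names are hypothetical and only meant to show why OLS is biased while the instrumented estimate is not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
Z = rng.normal(size=n)                   # instrument: relevant and exogenous
U = rng.normal(size=n)                   # unobserved confounder (part of the error term)
X = 0.8 * Z + U + rng.normal(size=n)     # endogenous regressor, correlated with the error
Y = 2.0 * X + U + rng.normal(size=n)     # true causal effect of X on Y is 2

def ols(W, y):
    """Least-squares coefficients of y on the columns of W."""
    return np.linalg.lstsq(W, y, rcond=None)[0]

const = np.ones(n)
beta_ols = ols(np.column_stack([const, X]), Y)[1]     # biased upward because of U

# Two-stage least squares: replace X by its projection on the instrument, then regress.
X_hat = np.column_stack([const, Z]) @ ols(np.column_stack([const, Z]), X)
beta_iv = ols(np.column_stack([const, X_hat]), Y)[1]  # close to 2
print(beta_ols, beta_iv)
```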

Difference in differences is a statistical technique used in econometrics and quantitative research in the social sciences that attempts to mimic an experimental research design using observational study data, by studying the differential effect of a treatment on a 'treatment group' versus a 'control group' in a natural experiment. It calculates the effect of a treatment on an outcome by comparing the average change over time in the outcome variable for the treatment group to the average change over time for the control group. Although it is intended to mitigate the effects of extraneous factors and selection bias, depending on how the treatment group is chosen, this method may still be subject to certain biases.
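
A small numerical sketch of the difference-in-differences computation under an assumed parallel-trends data-generating process (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
treated = rng.integers(0, 2, size=n)        # group indicator
post = rng.integers(0, 2, size=n)           # time indicator

# Parallel trends: both groups share the time trend; the effect (1.5) hits treated units post.
Y = 1.0 * treated + 0.5 * post + 1.5 * treated * post + rng.normal(size=n)

# Difference in differences of the group-by-period means recovers the treatment effect.
did = ((Y[(treated == 1) & (post == 1)].mean() - Y[(treated == 1) & (post == 0)].mean())
       - (Y[(treated == 0) & (post == 1)].mean() - Y[(treated == 0) & (post == 0)].mean()))
print(did)   # approximately 1.5
```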

In statistics, a fixed effects model is a statistical model in which the model parameters are fixed or non-random quantities. This is in contrast to random effects models and mixed models in which all or some of the model parameters are random variables. In many applications including econometrics and biostatistics a fixed effects model refers to a regression model in which the group means are fixed (non-random) as opposed to a random effects model in which the group means are a random sample from a population. Generally, data can be grouped according to several observed factors. The group means could be modeled as fixed or random effects for each grouping. In a fixed effects model each group mean is a group-specific fixed quantity.
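
A brief sketch of the within (demeaning) transformation that removes group-specific fixed effects, under an assumed data-generating process:

```python
import numpy as np

rng = np.random.default_rng(4)
n_groups, n_per = 50, 20
g = np.repeat(np.arange(n_groups), n_per)           # group labels
alpha = rng.normal(scale=2.0, size=n_groups)        # group-specific fixed effects
x = rng.normal(size=g.size) + 0.5 * alpha[g]        # regressor correlated with the effects
y = 1.0 * x + alpha[g] + rng.normal(size=g.size)    # true slope is 1

# Within (demeaning) transformation removes the group means, and with them the fixed effects.
def demean(v, g):
    means = np.array([v[g == k].mean() for k in range(n_groups)])
    return v - means[g]

x_w, y_w = demean(x, g), demean(y, g)
beta_fe = (x_w @ y_w) / (x_w @ x_w)                 # close to 1; pooled OLS would be biased
print(beta_fe)
```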

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model when there is a certain degree of correlation between the residuals in the regression model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.
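
A minimal sketch of the GLS formula with a known diagonal (heteroskedastic) error covariance, assuming NumPy; the setup is illustrative rather than a general recipe:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma2 = np.exp(rng.normal(size=n))                 # heteroskedastic error variances
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(sigma2)

# GLS: beta = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y, with Omega = diag(sigma2) assumed known.
Omega_inv = np.diag(1.0 / sigma2)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# OLS is still unbiased here but less efficient than GLS.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_gls, beta_ols)
```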

In statistics, semiparametric regression includes regression models that combine parametric and nonparametric models. They are often used in situations where the fully nonparametric model may not perform well or when the researcher wants to use a parametric model but the functional form with respect to a subset of the regressors or the density of the errors is not known. Semiparametric regression models are a particular type of semiparametric modelling and, since semiparametric models contain a parametric component, they rely on parametric assumptions and may be misspecified and inconsistent, just like a fully parametric model.

The average treatment effect (ATE) is a measure used to compare treatments in randomized experiments, evaluation of policy interventions, and medical trials. The ATE measures the difference in mean (average) outcomes between units assigned to the treatment and units assigned to the control. In a randomized trial, the average treatment effect can be estimated from a sample using a comparison in mean outcomes for treated and untreated units. However, the ATE is generally understood as a causal parameter that a researcher desires to know, defined without reference to the study design or estimation procedure. Both observational studies and experimental study designs with random assignment may enable one to estimate an ATE in a variety of ways.
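
Under random assignment, a simple difference in sample means estimates the ATE; the following toy simulation (illustrative values only) makes that concrete:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4000
T = rng.integers(0, 2, size=n)                  # random assignment
Y0 = rng.normal(size=n)                         # potential outcome under control
Y1 = Y0 + 2.0                                   # potential outcome under treatment (effect = 2)
Y = np.where(T == 1, Y1, Y0)                    # observed outcome

# Under random assignment the difference in sample means estimates the ATE.
ate_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(ate_hat)   # approximately 2
```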

In statistics, econometrics, political science, epidemiology, and related disciplines, a regression discontinuity design (RDD) is a quasi-experimental pretest-posttest design that aims to determine the causal effects of interventions by exploiting a cutoff or threshold above or below which an intervention is assigned. By comparing observations lying closely on either side of the threshold, it is possible to estimate the average treatment effect in environments in which randomisation is unfeasible. However, this method alone does not support true causal inference, as it does not automatically rule out the influence of potential confounding variables. First applied by Donald Thistlethwaite and Donald Campbell (1960) to the evaluation of scholarship programs, the RDD has become increasingly popular in recent years. Recent comparisons of randomised controlled trials (RCTs) and RDDs have empirically demonstrated the internal validity of the design.
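
A rough sharp-RDD sketch comparing local linear fits on either side of an assumed cutoff; the bandwidth, functional form, and all numerical values are illustrative choices, not a recommended procedure:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10000
running = rng.uniform(-1, 1, size=n)            # running (assignment) variable
cutoff = 0.0
D = (running >= cutoff).astype(int)             # treatment assigned above the cutoff
Y = 3.0 * running + 2.0 * D + rng.normal(scale=0.5, size=n)   # jump of 2 at the cutoff

# Sharp RDD: compare local linear fits on each side of the cutoff within a bandwidth h.
h = 0.2
def local_intercept(side):
    mask = side & (np.abs(running - cutoff) <= h)
    coefs = np.polyfit(running[mask] - cutoff, Y[mask], deg=1)
    return np.polyval(coefs, 0.0)               # fitted value at the cutoff

rdd_effect = local_intercept(running >= cutoff) - local_intercept(running < cutoff)
print(rdd_effect)   # approximately 2
```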

In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes among units that received the treatment versus those that did not. Paul R. Rosenbaum and Donald Rubin introduced the technique in 1983.
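
A simplified nearest-neighbour matching sketch on an estimated propensity score (a logistic regression fitted by a few Newton steps), targeting the effect on the treated; every modelling choice here is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3000
X = rng.normal(size=n)                              # single observed confounder
p_true = 1 / (1 + np.exp(-0.8 * X))                 # treatment more likely for high X
T = (rng.uniform(size=n) < p_true).astype(int)
Y = 1.5 * T + 2.0 * X + rng.normal(size=n)          # true effect 1.5; naive difference is biased

# Estimate the propensity score P(T=1|X) by logistic regression (Newton / IRLS steps).
W = np.column_stack([np.ones(n), X])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-W @ beta))
    grad = W.T @ (T - p)
    hess = W.T @ (W * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
pscore = 1 / (1 + np.exp(-W @ beta))

# Match each treated unit to the control with the closest propensity score (ATT).
treated_idx = np.where(T == 1)[0]
control_idx = np.where(T == 0)[0]
matches = control_idx[np.abs(pscore[control_idx][None, :]
                             - pscore[treated_idx][:, None]).argmin(axis=1)]
att_hat = (Y[treated_idx] - Y[matches]).mean()
print(att_hat)   # roughly 1.5
```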

The Heckman correction is a statistical technique to correct bias from non-randomly selected samples or otherwise incidentally truncated dependent variables, a pervasive issue in quantitative social sciences when using observational data. Conceptually, this is achieved by explicitly modelling the individual sampling probability of each observation together with the conditional expectation of the dependent variable. The resulting likelihood function is mathematically similar to the tobit model for censored dependent variables, a connection first drawn by James Heckman in 1974. Heckman also developed a two-step control function approach to estimate this model, which avoids the computational burden of having to estimate both equations jointly, albeit at the cost of inefficiency. Heckman received the Nobel Memorial Prize in Economic Sciences in 2000 for his work in this field.
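
A compact sketch of the two-step procedure (a probit selection equation, then OLS augmented with the inverse Mills ratio), assuming Python with NumPy and SciPy; the model and parameter values are illustrative:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(9)
n = 5000
x = rng.normal(size=n)
z = rng.normal(size=n)
# Correlated errors across the two equations generate sample-selection bias.
v = rng.normal(size=n)
u = 0.7 * v + rng.normal(scale=0.5, size=n)
select = (0.5 + 1.0 * z + v > 0).astype(int)          # selection equation
y = np.where(select == 1, 1.0 + 2.0 * x + u, np.nan)  # outcome observed only if selected

# Step 1: probit of the selection indicator on the (assumed) exclusion restriction z.
Wsel = np.column_stack([np.ones(n), z])
def neg_loglik(g):
    p = np.clip(stats.norm.cdf(Wsel @ g), 1e-10, 1 - 1e-10)
    return -np.sum(select * np.log(p) + (1 - select) * np.log(1 - p))
g_hat = optimize.minimize(neg_loglik, x0=np.zeros(2)).x

# Step 2: OLS on the selected sample, augmented with the inverse Mills ratio.
idx = select == 1
xb = Wsel[idx] @ g_hat
mills = stats.norm.pdf(xb) / stats.norm.cdf(xb)
Wout = np.column_stack([np.ones(idx.sum()), x[idx], mills])
beta = np.linalg.lstsq(Wout, y[idx], rcond=None)[0]
print(beta)   # coefficient on x is close to 2; naive OLS without the Mills term is biased
```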

Andrew Donald Roy was a British economist who is known for the Roy model of self-selection and income distribution and Roy's safety-first criterion.

Control functions are statistical methods to correct for endogeneity problems by modelling the endogeneity in the error term. The approach thereby differs in important ways from other models that try to account for the same econometric problem. Instrumental variables, for example, attempt to model the endogenous variable X as an (often invertible) function of a relevant and exogenous instrument Z. Panel analysis uses special data properties to difference out unobserved heterogeneity that is assumed to be fixed over time.
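
A minimal control-function sketch in the linear case, in which the first-stage residual is included as an extra regressor in the second stage (illustrative data-generating process):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 5000
Z = rng.normal(size=n)                              # exogenous instrument
U = rng.normal(size=n)                              # unobserved confounder
X = 1.0 * Z + U + rng.normal(size=n)                # endogenous regressor
Y = 2.0 * X + U + rng.normal(size=n)                # true effect of X is 2

const = np.ones(n)
# First stage: regress the endogenous X on the instrument and keep the residual.
first = np.linalg.lstsq(np.column_stack([const, Z]), X, rcond=None)[0]
v_hat = X - np.column_stack([const, Z]) @ first

# Second stage: include the first-stage residual as a control function term.
second = np.linalg.lstsq(np.column_stack([const, X, v_hat]), Y, rcond=None)[0]
print(second[1])   # coefficient on X is approximately 2
```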

Multiple treatments, like multivalued treatments, generalize the binary treatment effects framework. But rather than focusing on a treatment effect that can take on different values, the focus now is on different types of treatment. One example could be a job training program, where different types of job training are offered to the participants. The case of multiple treatments is relatively difficult to handle, as it can require additional functional form restrictions, especially when addressing the counterfactual or potential outcomes framework. Nevertheless, the general instrumental variable framework used to analyze binary treatment effects has been extended to allow for multiple treatments.

The Roy model, due to A. D. Roy, is one of the earliest works in economics on self-selection. The basic model considers two types of workers who choose their occupation in one of two sectors.

Two-step M-estimation deals with M-estimation problems that require a preliminary estimation step to obtain the parameter of interest. It differs from the usual M-estimation problem because the asymptotic distribution of the second-step estimator generally depends on the first-step estimator. Accounting for this change in the asymptotic distribution is important for valid inference.
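
A very small example of the two-step structure, in which the sample mean is estimated first and then plugged into the moment condition for the variance; the point is only to show the plug-in structure, not the inference:

```python
import numpy as np

rng = np.random.default_rng(12)
x = rng.normal(loc=1.0, scale=2.0, size=5000)

# Step 1: preliminary M-estimation of the nuisance parameter (here, the mean).
mu_hat = x.mean()

# Step 2: M-estimation of the parameter of interest, plugging in the first-step estimate.
# The moment condition is E[(x - mu)^2 - sigma2] = 0, evaluated at mu = mu_hat.
sigma2_hat = np.mean((x - mu_hat) ** 2)
print(sigma2_hat)   # close to 4; inference on sigma2_hat must account for estimating mu_hat
```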

In statistics and econometrics, optimal instruments are a technique for improving the efficiency of estimators in conditional moment models, a class of semiparametric models that generate conditional expectation functions. To estimate parameters of a conditional moment model, the statistician can derive an expectation function and use the generalized method of moments (GMM). However, there are infinitely many moment conditions that can be generated from a single model; optimal instruments provide the most efficient moment conditions.

In econometrics and related empirical fields, the local average treatment effect (LATE), also known as the complier average causal effect (CACE), is the effect of a treatment for subjects who comply with the experimental treatment assigned to their sample group. It is not to be confused with the average treatment effect (ATE), which includes compliers and non-compliers together. Compliance refers to the human-subject response to a proposed experimental treatment condition. The LATE is calculated in much the same way as the ATE, but it excludes non-compliant parties. If the goal is to evaluate the effect of a treatment in ideal, compliant subjects, the LATE will give a more precise estimate. However, it may lack external validity by ignoring the effect of non-compliance that is likely to occur in the real-world deployment of a treatment method. The LATE can be estimated by the ratio of the estimated intent-to-treat effect to the estimated proportion of compliers, or alternatively through an instrumental variable estimator.
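
A toy sketch of the Wald/LATE computation as the ratio of the intent-to-treat effect to the compliance rate, under an assumed one-sided non-compliance design:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 10000
Z = rng.integers(0, 2, size=n)                      # random assignment (instrument)
complier = rng.uniform(size=n) < 0.6                # 60% compliers, the rest never-takers
D = np.where(complier, Z, 0)                        # treatment actually received
Y = 1.0 + 2.0 * D + rng.normal(size=n)              # effect of 2 for those treated

# Intent-to-treat effect divided by the first-stage (compliance) effect: the Wald estimator.
itt = Y[Z == 1].mean() - Y[Z == 0].mean()
first_stage = D[Z == 1].mean() - D[Z == 0].mean()
late_hat = itt / first_stage
print(late_hat)   # approximately 2
```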

Xiaohong Chen is a Chinese economist who currently serves as the Malcolm K. Brachman Professor of Economics at Yale University. She is a fellow of the Econometric Society and a laureate of the China Economics Prize. As one of the leading experts in econometrics, her research focuses on econometric theory, semi/nonparametric estimation and inference methods, sieve methods, nonlinear time series, and semi/nonparametric models. She was elected to the American Academy of Arts and Sciences in 2019.

Petra Elisabeth (Crockett) Todd is an American economist whose research interests include labor economics, development economics, microeconomics, and econometrics. She is the Edward J. and Louise W. Kahn Term Professor of Economics at the University of Pennsylvania, and is also affiliated with the University of Pennsylvania Population Studies Center, the Human Capital and Equal Opportunity Global Working Group (HCEO), the IZA Institute of Labor Economics and the National Bureau of Economic Research.

Matias Damian Cattaneo is a Professor of Operations Research and Financial Engineering at Princeton University. His research focuses on statistics, econometrics, data science and decision science, with applications to program evaluation and causal inference. He is best known for his work on regression discontinuity designs.

References

  1. Cattaneo, M. D. (2010): Multi-valued Treatment Effects. Encyclopedia of Research Design, ed. by N. J. Salkind. Sage Publications.
  2. Wooldridge, J. (2002): Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, Mass.
  3. Cattaneo, M. D. (2010): Efficient Semiparametric Estimation of Multi-Valued Treatment Effects under Ignorability. Journal of Econometrics 155(2), pp. 138–154.
  4. Heckman, J. J., and E. J. Vytlacil (2007): Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to Forecast the Effects in New Environments. Handbook of Econometrics, Vol. 6, ed. by J. J. Heckman and E. E. Leamer. North Holland.
  5. Imbens, G. (2000): The Role of the Propensity Score in Estimating Dose-Response Functions. Biometrika 87(3), pp. 706–710.
  6. Lechner, M. (2001): Identification and Estimation of Causal Effects of Multiple Treatments under the Conditional Independence Assumption. Econometric Evaluation of Labour Market Policies, ed. by M. Lechner and F. Pfeiffer, pp. 43–58. Physica/Springer, Heidelberg.
  7. Abadie, A. (2005): Semiparametric Difference-in-Differences Estimators. Review of Economic Studies 72(1), pp. 1–19.
  8. Cattaneo, M. D. (2010): Efficient Semiparametric Estimation of Multi-Valued Treatment Effects under Ignorability. Journal of Econometrics 155(2), pp. 138–154.