Theory-driven evaluation (also theory-based evaluation) is an umbrella term for any approach to program evaluation that develops a theory of change and uses it to design, implement, analyze, and interpret findings from an evaluation. [1] [2] [3] More specifically, criteria have been proposed for identifying an evaluation as theory-driven. [4]
By investigating the mechanisms through which outcomes are achieved, theory-driven approaches facilitate learning to improve programs and how they are implemented, and help knowledge accumulate across apparently different programs. [5] [6] This is in contrast to methods-driven "black box" evaluations, which focus on following the steps of a method (for instance, a randomized experiment or a focus group) and only assess whether a program leads to its intended outcomes. [7] Theory-driven approaches can also improve the validity of evaluations, for instance by leading to more precise estimates of impact in randomized controlled trials. [8]
Theory-driven evaluation emerged in the 1970s and 1980s in response to the limitations of methods-driven "black box" evaluations. The term theory-driven evaluation was coined by Huey T. Chen and Peter H. Rossi. [9] Chen (1990) [10] wrote the first comprehensive introduction to conducting theory-driven evaluations, explaining, for example, how to develop a program theory of change and the different types of evaluation design. Its origins have been traced [11] to a book by Carol Weiss (1972) [12] and a rarely cited article by Carol Taylor Fitz-Gibbon and Lynn Lyons Morris (1975). [13] However, "the first published use of what we would recognize as program theory" was in an evaluation of training programs by Don Kirkpatrick in 1959. [14]
Funnell and Rogers (2011, pp. 23–24) comment on the confused nomenclature of the field, enumerating 22 approaches, such as theory-based evaluation and program theory-driven evaluation science, that are equivalent to or overlap significantly with theory-driven evaluation. The first definition of theory-based evaluation, by Fitz-Gibbon and Morris (1975), is near-identical to that of theory-driven evaluation: [15]
A theory-based evaluation of a program is one in which the selection of program features to evaluate is determined by an explicit conceptualization of the program in terms of a theory […] which attempts to explain how the program produces the desired effects. The theory might be psychological […] or social psychological […] or philosophical […]. The essential characteristic is that the theory points out a causal relationship between a process A and an outcome B.
Consequently, the terms theory-driven and theory-based evaluation are often used interchangeably in the literature. [16] [17] [18] However, theory-based evaluation is sometimes interpreted more narrowly to mean qualitative or small-n case study-based evaluations conducted without a comparison group, for example using process tracing or qualitative comparative analysis. [19] [20]
The theory of theory-driven evaluation seeks to be as close as possible to the proximal causes of a social problem and the site of intervention, rather than, for instance, a "grand" theory that tries to provide an overarching understanding of society, or a metaphysical theory about the nature of social reality: [21]
It advances evaluation practice very little to adopt one or another of current global theories in attacking, say, the problem of juvenile delinquency, but it does help a great deal to understand the authority structure in schools and the mechanisms of peer group influence and parental discipline in designing and evaluating a program that is supposed to reduce disciplinary problems in schools. [...T]he theory-driven perspective is closer to what econometricians call "model specification" than are more complicated and more abstract and general theories.
A distinction is also drawn between normative theory, concerning what a program is supposed to do and how it should be implemented, and causal theory, which specifies how the program is thought to work. [22] A program can then fail to produce the desired outcomes in two broad ways: (1) the program is implemented as intended according to the normative theory, but the causal theory is incorrect; or (2) the causal theory is correct, but the program is not implemented correctly. [23]
Graphical causal models (GCMs) may be used to formalize causal theories and to design, for example, theory-driven quasi-experiments. [24] One advantage of GCMs is that they can be used to determine automatically which variables need to be statistically adjusted for, or matched on, in order to estimate the causal effect of a program.
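As an illustrative sketch of this point (not taken from the cited sources), the backdoor criterion from the graphical causal models literature can be checked mechanically. The toy program theory below, written in Python with networkx, assumes a hypothetical school discipline program whose take-up is confounded by school authority structure; all variable names are invented for the example:

```python
import networkx as nx

# Hypothetical program theory as a directed acyclic graph (illustrative only)
g = nx.DiGraph([
    ("program", "peer_norms"),
    ("peer_norms", "discipline"),
    ("school_authority", "program"),      # confounder of program take-up
    ("school_authority", "discipline"),
])

def satisfies_backdoor(g, treatment, outcome, adjustment):
    """Check Pearl's backdoor criterion for a candidate adjustment set."""
    # An adjustment variable must not be a descendant of the treatment.
    if adjustment & nx.descendants(g, treatment):
        return False
    # Removing the treatment's outgoing edges leaves only backdoor paths;
    # the set is valid if it then d-separates treatment and outcome.
    backdoor_graph = g.copy()
    backdoor_graph.remove_edges_from(list(g.out_edges(treatment)))
    return nx.d_separated(backdoor_graph, {treatment}, {outcome}, adjustment)

print(satisfies_backdoor(g, "program", "discipline", set()))                  # False
print(satisfies_backdoor(g, "program", "discipline", {"school_authority"}))  # True
```

Here the empty set fails because the backdoor path through school authority remains open, while adjusting for the confounder satisfies the criterion; causal-inference libraries such as DoWhy automate this kind of identification step.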
Chen's action model/change model schema [25] provides an example of how a program theory and its context are conceptualized. The elements of the schema are then completed for each particular program.
The change model specifies how a program's intervention leads to outcomes via determinants, also known as intermediate or mediating variables.
The action model specifies how staff and delivery organizations deliver the intervention to beneficiaries.
The full range of research methods has been argued to apply to theory-driven evaluation. For instance, Chen (2015) provides examples using randomized experiments, quasi-experimental designs, process and outcome monitoring, and qualitative methods. [26] Although proponents of theory-driven evaluation are critical of "black box" experiments, Chen and Rossi (1983, p. 292) [27] argue that theory-driven experiments are possible and desirable:
[A]dvocates of the black box experimental paradigm often neglect the fact that after randomization exogenous variables are still correlated with outcome variables. Knowing how such exogenous factors affect outcomes makes it possible to construct more precise estimates of experimental effects by controlling for such exogenous variables.
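A minimal simulation of this argument (the setup and numbers are invented for illustration, not drawn from Chen and Rossi): in a randomized experiment, adjusting for a prognostic exogenous covariate leaves the treatment estimate essentially unchanged but makes it more precise.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)             # exogenous covariate, e.g. a baseline score
t = rng.integers(0, 2, size=n)     # randomized treatment assignment
y = 0.5 * t + 2.0 * x + rng.normal(size=n)

# "Black box" estimate: simple difference in means via OLS on treatment alone
unadjusted = sm.OLS(y, sm.add_constant(t)).fit()
# Theory-driven estimate: also control for the covariate the theory identifies
adjusted = sm.OLS(y, sm.add_constant(np.column_stack([t, x]))).fit()

print(unadjusted.params[1], unadjusted.bse[1])  # effect ~0.5, larger std. error
print(adjusted.params[1], adjusted.bse[1])      # effect ~0.5, smaller std. error
```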
It has been argued that theory-driven evaluation focuses too much on statistical approaches, such as randomized experiments, quasi-experiments, and structural equation modeling; [28] however, a case has also been made for the importance of qualitative methods, particularly when developing program theories and understanding implementation. [29]
There is also methodological debate concerning whether realist evaluations, considered a particular kind of theory-driven approach, may include randomized controlled trials in any form. Some evaluators think they may, and conduct what they call "realist trials". [30] [31] [32] Others argue that a realist trial is an "oxymoron" and recommend instead calling them "theory-oriented trials". [33] A 2023 review of purported realist trials concluded that whether they are really realist depends on the "ontological and epistemological" commitments of evaluators, and that the differences "cannot be resolved" by reviewing the studies conducted. [34]
A 2011 systematic review discussed examples drawn from 45 theory-driven evaluations. [35]
A 2014 review of theory-driven evaluation in school psychology highlighted two illustrative examples. [39]
Program evaluation is a systematic method for collecting, analyzing, and using information to answer questions about projects, policies and programs, particularly about their effectiveness and efficiency.
Field experiments are experiments carried out outside of laboratory settings.
The transtheoretical model of behavior change is an integrative theory of therapy that assesses an individual's readiness to act on a new healthier behavior, and provides strategies, or processes of change, to guide the individual. The model is composed of constructs such as stages of change, processes of change, levels of change, self-efficacy, and decisional balance.
The social norms approach, or social norms marketing, is an environmental strategy gaining ground in health campaigns. While conducting research in the mid-1980s, two researchers, H.W. Perkins and A.D. Berkowitz, reported that students at a small U.S. college held exaggerated beliefs about the normal frequency and consumption habits of other students with regard to alcohol. These inflated perceptions have been found in many educational institutions, with varying populations and locations. Despite the fact that college drinking is at elevated levels, the perceived amount almost always exceeds actual behavior. The social norms approach has shown signs of countering misperceptions; however, research on changes in behavior resulting from changed perceptions ranges from mixed to conclusively nonexistent.
Evidence-based policy is a concept in public policy that advocates for policy decisions to be grounded in, or influenced by, rigorously established objective evidence. This concept presents a stark contrast to policymaking predicated on ideology, 'common sense', anecdotes, or personal intuitions. The methodology employed in evidence-based policy often includes comprehensive research methods such as randomized controlled trials (RCTs). Good data, analytical skills, and political support for the use of scientific information are typically seen as the crucial elements of an evidence-based approach.
In metaphysics, a causal model is a conceptual model that describes the causal mechanisms of a system. Several types of causal notation may be used in the development of a causal model. Causal models can improve study designs by providing clear rules for deciding which independent variables need to be included/controlled for.
Logic models are hypothesized descriptions of the chain of causes and effects leading to an outcome of interest. While they can be in narrative form, logic models usually take the form of a graphical depiction of the "if-then" (causal) relationships between the various elements leading to the outcome. However, the logic model is more than the graphical depiction: it is also the theories, scientific evidence, assumptions, and beliefs that support it and the various processes behind it.
Impact evaluation assesses the changes that can be attributed to a particular intervention, such as a project, program or policy, both the intended changes and, ideally, the unintended ones. In contrast to outcome monitoring, which examines whether targets have been achieved, impact evaluation is structured to answer the question: how would outcomes such as participants' well-being have changed if the intervention had not been undertaken? This involves counterfactual analysis, that is, "a comparison between what actually happened and what would have happened in the absence of the intervention." Impact evaluations seek to answer cause-and-effect questions. In other words, they look for the changes in outcome that are directly attributable to a program.
Normalization process theory (NPT) is a sociological theory, generally used in the fields of science and technology studies (STS), implementation research, and healthcare system research. The theory deals with the adoption of technological and organizational innovations into systems; recent studies have used this theory to evaluate new practices in social care and education settings. It was developed out of the normalization process model.
In statistics, econometrics, political science, epidemiology, and related disciplines, a regression discontinuity design (RDD) is a quasi-experimental pretest–posttest design that aims to determine the causal effects of interventions by identifying a cutoff or threshold above or below which an intervention is assigned. By comparing observations lying closely on either side of the threshold, it is possible to estimate the average treatment effect in environments in which randomisation is unfeasible. However, it remains impossible to make true causal inference with this method alone, as it does not by itself rule out the influence of potential confounding variables. First applied by Donald Thistlethwaite and Donald Campbell (1960) to the evaluation of scholarship programs, the RDD has become increasingly popular in recent years. Recent comparisons of randomised controlled trials (RCTs) and RDDs have empirically demonstrated the internal validity of the design.
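As an illustrative sketch (entirely simulated, not from any cited study), the core RDD estimate is the jump in a local regression at the cutoff; the bandwidth and coefficients below are invented:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 4000
score = rng.uniform(-1, 1, size=n)        # assignment variable, cutoff at 0
treated = (score >= 0).astype(float)      # intervention assigned at the cutoff
y = 2.0 * treated + 1.5 * score + rng.normal(scale=0.5, size=n)

# Local linear regression within a bandwidth around the cutoff,
# allowing different slopes on each side of the threshold
h = 0.25
near = np.abs(score) <= h
X = np.column_stack([treated, score, treated * score])
fit = sm.OLS(y[near], sm.add_constant(X[near])).fit()

print(fit.params[1])   # estimated jump at the cutoff, close to the true 2.0
```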
Principal stratification is a statistical technique used in causal inference when adjusting results for post-treatment covariates. The idea is to identify underlying strata and then compute causal effects only within strata. It is a generalization of the local average treatment effect (LATE) in the sense of presenting applications besides all-or-none compliance. The LATE method, which was independently developed by Imbens and Angrist (1994) and Baker and Lindeman (1994), also included the key exclusion restriction and monotonicity assumptions for identifiability. For the history of early developments, see Baker, Kramer, and Lindeman.
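A minimal simulated sketch of the LATE idea under all-or-none compliance (the numbers and variable names are illustrative assumptions, not from the cited papers): with one-sided noncompliance, the Wald estimator divides the intent-to-treat effect by the share of compliers.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
z = rng.integers(0, 2, size=n)        # randomized encouragement/assignment
complier = rng.random(n) < 0.6        # 60% of units comply with assignment
d = np.where(complier, z, 0)          # non-compliers never take the treatment
y = 1.5 * d + rng.normal(size=n)      # true effect of 1.5 for those treated

# Wald / LATE estimator: intent-to-treat effect divided by the
# effect of assignment on actual take-up (the first stage)
itt = y[z == 1].mean() - y[z == 0].mean()
first_stage = d[z == 1].mean() - d[z == 0].mean()

print(itt / first_stage)   # close to 1.5, the effect among compliers
```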
A theory of change (ToC) is an explicit theory of how and why the activities of a social policy or program are thought to lead to outcomes and impacts. ToCs are used in the design of programs and in program evaluation, across a range of policy areas.
The PRECEDE–PROCEED model is a cost–benefit evaluation framework proposed in 1974 by Lawrence W. Green that can help health program planners, policy makers, and other evaluators analyze situations and design health programs efficiently. It provides a comprehensive structure for assessing health and quality of life needs, and for designing, implementing and evaluating health promotion and other public health programs to meet those needs. One purpose and guiding principle of the PRECEDE–PROCEED model is to direct initial attention to outcomes, rather than inputs. It guides planners through a process that starts with desired outcomes and then works backwards in the causal chain to identify a mix of strategies for achieving those objectives. A fundamental assumption of the model is the active participation of its intended audience, that is, that the participants ("consumers") will take an active part in defining their own problems, establishing their goals and developing their solutions.
A behavior change method, or behavior change technique, is a theory-based method for changing one or several determinants of behavior, such as a person's attitude or self-efficacy. Such behavior change methods are used in behavior change interventions. Although attempts to influence people's attitudes and other psychological determinants are of course much older, the definitions developed in the late 1990s yielded useful insights, in particular four important benefits.
Intervention mapping is a protocol for developing theory-based and evidence-based health promotion programs. It describes the process of health promotion program planning in six steps.
Realist evaluation or realist review is a type of theory-driven evaluation method used in evaluating social programmes. It is based on the epistemological foundations of critical realism, though one of the originators of realist evaluation, Ray Pawson, who was "initially impressed" by how critical realism explains generative causation in experimental science, later criticised its "philosophical grandstanding" and "explain-all Marxism". Based on specific theories, realist evaluation provides an alternative lens to empiricist evaluation techniques for the study and understanding of programmes and policies. This technique assumes that knowledge is a social and historical product; thus the social and political context, as well as theoretical mechanisms, need consideration in the analysis of programme or policy effectiveness.
Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system. The main difference between causal inference and inference of association is that causal inference analyzes the response of an effect variable when a cause of the effect variable is changed. The study of why things occur is called etiology, and can be described using the language of scientific causal notation. Causal inference is said to provide the evidence of causality theorized by causal reasoning.
Control functions are statistical methods to correct for endogeneity problems by modelling the endogeneity in the error term. The approach thereby differs in important ways from other models that try to account for the same econometric problem. Instrumental variables, for example, attempt to model the endogenous variable X as an often invertible function of a relevant and exogenous instrument Z. Panel analysis uses special data properties to difference out unobserved heterogeneity that is assumed to be fixed over time.
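A minimal two-stage sketch of the control function idea (simulated data; variable names and coefficients are illustrative assumptions): regress the endogenous variable on the instrument, then include the first-stage residual in the outcome regression to absorb the endogenous part of the error term.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                   # relevant, exogenous instrument
u = rng.normal(size=n)                   # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)     # endogenous regressor
y = 1.0 * x + u + rng.normal(size=n)     # true effect of x is 1.0

# Naive OLS is biased because u enters both x and y
naive = sm.OLS(y, sm.add_constant(x)).fit()

# Control function: first stage x ~ z, then include the residual v_hat
first_stage = sm.OLS(x, sm.add_constant(z)).fit()
v_hat = first_stage.resid
cf = sm.OLS(y, sm.add_constant(np.column_stack([x, v_hat]))).fit()

print(naive.params[1])   # biased away from 1.0
print(cf.params[1])      # close to the true effect of 1.0
```

In the linear case this control function estimate coincides with two-stage least squares; its advantage is that the residual's coefficient also provides a direct test for endogeneity.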
Experimental benchmarking allows researchers to learn about the accuracy of non-experimental research designs. Specifically, one can compare observational results to experimental findings to calibrate bias. Under ordinary conditions, carrying out an experiment gives the researchers an unbiased estimate of their parameter of interest. This estimate can then be compared to the findings of observational research. Note that benchmarking is an attempt to calibrate non-statistical uncertainty. When combined with meta-analysis this method can be used to understand the scope of bias associated with a specific area of research.
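As a simulated sketch of benchmarking (all names and numbers invented for illustration): comparing a naive observational estimate of a treatment effect to an experimental benchmark of the same treatment recovers an estimate of the observational design's bias.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
u = rng.normal(size=n)                          # confounder, e.g. motivation

# Observational study: take-up depends on the unobserved confounder
d_obs = (u + rng.normal(size=n) > 0).astype(float)
y_obs = 1.0 * d_obs + u + rng.normal(size=n)    # true effect is 1.0
naive = y_obs[d_obs == 1].mean() - y_obs[d_obs == 0].mean()

# Experimental benchmark: the same treatment, randomly assigned
d_exp = rng.integers(0, 2, size=n).astype(float)
y_exp = 1.0 * d_exp + u + rng.normal(size=n)
benchmark = y_exp[d_exp == 1].mean() - y_exp[d_exp == 0].mean()

print(naive - benchmark)   # estimated bias of the observational design
```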
Huey-tsyh Chen is a Taiwanese American sociologist and scholar of program evaluation. He is Professor in the Department of Public Health and Director of the Center for Evaluation and Applied Research at Mercer University.