Internal validity

Internal validity is the extent to which a piece of evidence supports a claim about cause and effect, within the context of a particular study. It is one of the most important properties of scientific studies and is an important concept in reasoning about evidence more generally. Internal validity is determined by how well a study can rule out alternative explanations for its findings (usually, sources of systematic error or 'bias'). It contrasts with external validity, the extent to which results can justify conclusions about other contexts (that is, the extent to which results can be generalized). Both internal and external validity can be described using qualitative or quantitative forms of causal notation.

Details

Inferences are said to possess internal validity if a causal relationship between two variables is properly demonstrated. [1] [2] A valid causal inference may be made when three criteria are satisfied:

  1. the "cause" precedes the "effect" in time (temporal precedence),
  2. the "cause" and the "effect" tend to occur together (covariation), and
  3. there are no plausible alternative explanations for the observed covariation (nonspuriousness). [2]

In scientific experimental settings, researchers often change the state of one variable (the independent variable) to see what effect it has on a second variable (the dependent variable). [3] For example, a researcher might manipulate the dosage of a particular drug between different groups of people to see what effect it has on health. In this example, the researcher wants to make a causal inference, namely, that different doses of the drug may be held responsible for observed changes or differences. When the researcher may confidently attribute the observed changes or differences in the dependent variable to the independent variable (that is, when the researcher observes an association between these variables and can rule out other explanations or rival hypotheses), then the causal inference is said to be internally valid. [4]

In many cases, however, the size of the effect found in the dependent variable may not depend solely on variations in the independent variable. Rather, a number of variables or circumstances uncontrolled for (or uncontrollable) may lead to additional or alternative explanations (a) for the effects found and/or (b) for the magnitude of the effects found. Internal validity is therefore a matter of degree rather than of either-or, and that is exactly why research designs other than true experiments may also yield results with a high degree of internal validity.

In order to allow for inferences with a high degree of internal validity, precautions may be taken during the design of the study. As a rule of thumb, conclusions based on direct manipulation of the independent variable allow for greater internal validity than conclusions based on an association observed without manipulation.

When considering only internal validity, highly controlled true experimental designs (i.e., with random selection, random assignment to either the control or experimental group, reliable instruments, reliable manipulation processes, and safeguards against confounding factors) may be the "gold standard" of scientific research. However, the very methods used to increase internal validity may also limit the generalizability, or external validity, of the findings. For example, studying the behavior of animals in a zoo may make it easier to draw valid causal inferences within that context, but these inferences may not generalize to the behavior of animals in the wild. In general, a typical laboratory experiment studying a particular process may leave out many variables that normally strongly affect that process in nature.
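The value of random assignment can be sketched in a short simulation. All numbers below (a true effect of 2.0, a confounding "health consciousness" variable, the noise levels) are hypothetical; the point is only that the observational comparison absorbs the confounder while the randomized one does not:

```python
import random

random.seed(42)

# Hypothetical setup: "health consciousness" raises both the chance of
# taking the treatment and the outcome itself, confounding a naive
# comparison of treated vs. untreated people.
TRUE_EFFECT = 2.0

def outcome(treated: bool, health: float) -> float:
    return TRUE_EFFECT * treated + 5.0 * health + random.gauss(0, 1)

def mean_difference(groups):
    treated = [outcome(True, h) for t, h in groups if t]
    control = [outcome(False, h) for t, h in groups if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

health = [random.random() for _ in range(20_000)]

# Observational: health-conscious people opt into treatment more often.
observational = [(h > random.random(), h) for h in health]

# Experimental: a coin flip decides treatment, independent of health.
randomized = [(random.random() < 0.5, h) for h in health]

obs_effect = mean_difference(observational)
rnd_effect = mean_difference(randomized)
print(f"observational estimate: {obs_effect:.2f}")  # inflated by confounding
print(f"randomized estimate:    {rnd_effect:.2f}")  # close to the true effect
```

The randomized estimate recovers a value near 2.0 because the coin flip makes treatment status independent of health consciousness; the observational estimate mixes the true effect with the confounder's influence.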

Example threats

To recall eight of these threats to internal validity, use the mnemonic acronym, THIS MESS, [5] which stands for:

Ambiguous temporal precedence

When it is not known which variable changed first, it can be difficult to determine which variable is the cause and which is the effect.

Confounding

A major threat to the validity of causal inferences is confounding: changes in the dependent variable may be attributable not to the manipulated variable but to a third variable that is related to it. Where such spurious relationships cannot be ruled out, rival hypotheses to the original causal inference may be developed.

Selection bias

Selection bias refers to the problem that, at pre-test, differences between groups exist that may interact with the independent variable and thus be 'responsible' for the observed outcome. Researchers and participants bring to the experiment a myriad of characteristics, some learned and others inherent: for example, sex, weight, hair, eye, and skin color, personality, mental capabilities, and physical abilities, but also attitudes such as motivation or willingness to participate.

If, during the selection step of the research study, the groups end up differing systematically on subject-related variables, there is a threat to internal validity. For example, suppose a researcher creates two test groups, an experimental and a control group. The subjects in the two groups differ not only with regard to the independent variable but also in one or more of the subject-related variables.

Self-selection also weakens the interpretive power of the dependent variable. This often occurs in online surveys, where individuals of specific demographics opt into the test at higher rates than other demographics.

History

Events outside of the study/experiment or between repeated measures of the dependent variable may affect participants' responses to experimental procedures. Often, these are large-scale events (natural disaster, political change, etc.) that affect participants' attitudes and behaviors such that it becomes impossible to determine whether any change on the dependent measures is due to the independent variable, or the historical event.

Maturation

Subjects change during the course of the experiment or even between measurements. For example, young children might mature, and their ability to concentrate may change as they grow up. Both permanent changes, such as physical growth, and temporary ones, like fatigue, provide "natural" alternative explanations; thus, they may change the way a subject reacts to the independent variable. Upon completion of the study, then, the researcher may not be able to determine whether the observed discrepancy is due to time or to the independent variable.

Repeated testing (also referred to as testing effects)

Repeatedly measuring the participants may lead to bias. Participants may remember the correct answers or may be conditioned by knowing that they are being tested. Repeatedly taking (the same or similar) intelligence tests usually leads to score gains; rather than concluding that the underlying skills have changed permanently, this threat to internal validity offers a good rival hypothesis.

Instrument change (instrumentality)

The instrument used during the testing process can change the experiment. This also refers to observers being more concentrated or primed, or having unconsciously changed the criteria they use to make judgments. This can also be an issue with self-report measures given at different times. In this case, the impact may be mitigated through the use of retrospective pretesting. If any instrumentation changes occur, the internal validity of the main conclusion is affected, as alternative explanations are readily available.

Regression toward the mean

This type of error occurs when subjects are selected on the basis of extreme scores (far from the mean) during a test. For example, when children with the worst reading scores are selected to participate in a reading course, improvements at the end of the course might be due to regression toward the mean and not the course's effectiveness. Had the children been tested again before the course started, they would likely have obtained better scores anyway. Likewise, extreme outliers on individual scores are more likely to be captured in a single instance of testing but will tend to move toward a more typical distribution with repeated testing.
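Regression toward the mean can be demonstrated with a minimal simulation; the score model below (stable ability plus test-day noise, both with a standard deviation of 10) is purely hypothetical:

```python
import random

random.seed(1)

# Hypothetical model: an observed reading score is stable ability plus
# test-day noise. Retesting the worst scorers shows an apparent
# "improvement" even though no intervention occurred at all.
N = 10_000
ability = [random.gauss(100, 10) for _ in range(N)]
test1 = [a + random.gauss(0, 10) for a in ability]
test2 = [a + random.gauss(0, 10) for a in ability]

# Select the bottom 10% on the first test, as a remedial course might.
cutoff = sorted(test1)[N // 10]
worst = [i for i in range(N) if test1[i] < cutoff]

mean1 = sum(test1[i] for i in worst) / len(worst)
mean2 = sum(test2[i] for i in worst) / len(worst)
print(f"selected group, first test: {mean1:.1f}")
print(f"selected group, retest:     {mean2:.1f}")  # higher, with no treatment
```

The retest mean rises simply because the group was partly selected for unlucky noise on the first test, and that noise does not repeat.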

Mortality/differential attrition

This error occurs if inferences are made on the basis of only those participants who took part from start to finish. Participants may have dropped out of the study before completion, perhaps even because of the study, programme, or experiment itself. For example, the percentage of group members who had quit smoking at post-test was found to be much higher in a group that had received a quit-smoking training program than in the control group; however, only 60% of the experimental group completed the program. If this attrition is systematically related to any feature of the study, to the administration of the independent variable, or to the instrumentation, or if dropping out leads to relevant bias between groups, a whole class of alternative explanations becomes possible that could account for the observed differences.
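Differential attrition of this kind can be sketched in a short simulation; the quit-rate model and all numbers below are hypothetical:

```python
import random

random.seed(2)

# Hypothetical model: the program has a modest true effect (+10
# percentage points on the quit rate), but unmotivated participants drop
# out of the treatment group, so analysing completers only inflates it.
N = 5_000

def quit_prob(treated: bool, motivation: float) -> float:
    return 0.10 * treated + 0.5 * motivation

treat = [(random.random() < quit_prob(True, m), m)
         for m in (random.random() for _ in range(N))]
ctrl = [(random.random() < quit_prob(False, m), m)
        for m in (random.random() for _ in range(N))]

# Only sufficiently motivated treatment participants complete the program.
completers = [(q, m) for q, m in treat if m > 0.4]

def rate(group):
    return sum(q for q, _ in group) / len(group)

print(f"control quit rate:          {rate(ctrl):.2f}")
print(f"treatment, all assigned:    {rate(treat):.2f}")
print(f"treatment, completers only: {rate(completers):.2f}")  # inflated
```

Comparing completers with the full control group mixes the program's effect with the motivation of those who stayed, which is exactly the class of alternative explanation described above.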

Selection-maturation interaction

This occurs when subject-related variables (hair color, skin color, etc.) and time-related variables (age, physical size, etc.) interact. If a discrepancy between the two groups emerges between tests, it may be due to differing rates of maturation across the groups rather than to the independent variable.

Diffusion

If treatment effects spread from treatment groups to control groups, a lack of differences between experimental and control groups may be observed. This does not mean, however, that the independent variable has no effect or that there is no relationship between dependent and independent variable.

Compensatory rivalry/resentful demoralization

Behavior in the control groups may alter as a result of the study. For example, control group members may work extra hard to see that the expected superiority of the experimental group is not demonstrated. Again, this does not mean that the independent variable produced no effect or that there is no relationship between dependent and independent variable. Conversely, changes in the dependent variable may arise only because a demoralized control group works less hard or is less motivated, not because of the independent variable.

Experimenter bias

Experimenter bias occurs when the individuals conducting an experiment inadvertently affect the outcome by non-consciously behaving in different ways toward members of the control and experimental groups. The possibility of experimenter bias can be eliminated through the use of double-blind study designs, in which the experimenter is not aware of the condition to which a participant belongs.

Mutual-internal-validity problem

Experiments that have high internal validity can produce phenomena and results that have no relevance in real life, resulting in the mutual-internal-validity problem. [6] [7] It arises when researchers use experimental results to develop theories and then use those theories to design theory-testing experiments. This mutual feedback between experiments and theories can lead to theories that explain only phenomena and results in artificial laboratory settings but not in real life.


References

  1. Brewer, M. (2000). Research Design and Issues of Validity. In Reis, H. and Judd, C. (eds.) Handbook of Research Methods in Social and Personality Psychology. Cambridge: Cambridge University Press.
  2. Shadish, W., Cook, T., and Campbell, D. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.
  3. Levine, G. and Parkinson, S. (1994). Experimental Methods in Psychology. Hillsdale, NJ: Lawrence Erlbaum.
  4. Liebert, R. M. & Liebert, L. L. (1995). Science and behavior: An introduction to methods of psychological research. Englewood Cliffs, NJ: Prentice Hall.
  5. Wortman, P. M. (1983). "Evaluation research – A methodological perspective". Annual Review of Psychology. 34: 223–260. doi:10.1146/annurev.ps.34.020183.001255.
  6. Schram, Arthur (2005-06-01). "Artificiality: The tension between internal and external validity in economic experiments". Journal of Economic Methodology. 12 (2): 225–237. doi:10.1080/13501780500086081. ISSN 1350-178X. S2CID 145588503.
  7. Lin, Hause; Werner, Kaitlyn M.; Inzlicht, Michael (2021-02-16). "Promises and Perils of Experimentation: The Mutual-Internal-Validity Problem". Perspectives on Psychological Science. 16 (4): 854–863. doi:10.1177/1745691620974773. ISSN 1745-6916. PMID 33593177. S2CID 231877717.