The decline effect may occur when scientific claims receive decreasing support over time. The term was first used by parapsychologist Joseph Banks Rhine in the 1930s to describe the disappearance of extrasensory perception (ESP) effects in his psychic experiments over the course of a study. In more general terms, Cronbach, in his review article "Beyond the Two Disciplines of Scientific Psychology", referred to the phenomenon as "generalizations decay." [1] The term was used again in a 2010 article by Jonah Lehrer published in The New Yorker. [2]
In his article, Lehrer gives several examples where the decline effect allegedly appears. The first concerns second-generation antipsychotic drugs: initial tests had demonstrated a dramatic decrease in subjects' psychiatric symptoms. [2] In repeated tests, however, this effect declined, and in the end it could not be documented that these drugs had any better effect than first-generation antipsychotics.
A well-known example of the decline effect can be seen in early experiments conducted by Professor Jonathan Schooler examining the effects of verbalization on non-verbal cognition. In an initial series of studies Schooler found evidence that verbal rehearsal of previously seen faces or colors markedly impaired subsequent recognition. [3] This phenomenon is referred to as verbal overshadowing. Although verbal overshadowing effects have been repeatedly observed by Schooler, as well as other researchers, they have also proven somewhat challenging to replicate. [2] [4] [5] Verbal overshadowing effects in a variety of domains were initially easy to find, but then became increasingly difficult to replicate, indicating a decline effect in the phenomenon. Schooler has since become one of the more prominent researchers examining the decline effect. He has argued that addressing the decline effect may require a major revision to the scientific process whereby scientists log their protocols before conducting their research and then, regardless of outcome, report their findings in an open access repository (such as Brian Nosek's "Project Implicit"). [6] Schooler is currently working with the Fetzer Foundation to organize a major meeting of scientists from various disciplines to consider alternative accounts of the decline effect and approaches for rigorously addressing it. [7]
In 1991, Danish zoologist Anders Møller discovered a connection between symmetry and the sexual preferences of female birds in nature. This sparked huge interest in the topic, and a great deal of follow-up research was published. In the three years following the original discovery, 90% of studies confirmed Møller's hypothesis. However, the same outcome was reported in just four out of eight research papers in 1995, and in only a third of studies over the next three years. [8]
A study published in 2022 reported perhaps one of the most striking examples of the decline effect in the field of ecology, where effect sizes of published studies testing for ocean acidification effects on fish behavior have declined by an order of magnitude over a decade of research on this topic. [9]
The decline effect comes in several types, each with different causes.
If the initial publication is a false positive, i.e. the null hypothesis is true but the initial publication mistakenly rejected it, then subsequent attempts at replication will discover that the effect size is not significantly different from zero. This is the simplest type of decline effect. [10]
For example, statistically significant phenomena in parapsychology are false positives, and so is facilitated communication. The estimated effects of these phenomena become closer to zero with more experimental data, giving a decline effect. [10]
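This simplest case can be sketched with a small simulation (all numbers are illustrative, not from any of the cited studies): under a true null effect, an initial "significant" study is just selected sampling noise, and the pooled estimate from unselected replications drifts back toward zero.

```python
import random
import statistics

random.seed(0)

def run_study(true_effect=0.0, n=20):
    """Return the mean of n noisy observations of a (here nonexistent) effect."""
    return statistics.mean(random.gauss(true_effect, 1.0) for _ in range(n))

# Keep sampling until a study clears an arbitrary "significance" bar,
# mimicking a false-positive initial publication under a true null.
# With sd = 1 and n = 20, a mean above ~0.45 corresponds to roughly p < .05.
first = run_study()
while abs(first) < 0.45:
    first = run_study()

# Unselected replications, pooled together.
replications = [run_study() for _ in range(50)]
pooled = statistics.mean(replications)

print(f"initial (selected) estimate: {first:+.2f}")
print(f"pooled replication estimate: {pooled:+.2f}")  # hovers near zero
```

The only thing distinguishing the "initial" study here is that it was selected for crossing the significance bar; the replications, facing no such filter, average out to the true effect of zero.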
If the initial publication discovered a genuine effect but did not identify certain relevant variables, then the effect size found in subsequent replications, conducted under different conditions, might be smaller. [10]
Concretely, consider this example. The outcome Y depends on X according to Y = X + Z, where Z is standard Gaussian noise. Suppose that in the initial publication, due to the experimental setup, Z = X, so the initial publication mistakenly concluded that Y = 2X.
In an attempt at replication, the uncontrolled variable Z no longer correlates with X, but varies independently according to the standard Gaussian distribution. Now the replication discovers that Y = X + Z, where Z ~ N(0, 1). Thus, the regression coefficient of Y on X declined by 50%.
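The unidentified-variable mechanism can be simulated directly. The sketch below assumes, as a reconstruction of the example, that the outcome Y equals X plus a noise term Z: in the "initial study" Z happens to track X, doubling the apparent slope, while in the replication Z varies independently and the slope halves.

```python
import random

random.seed(1)

def slope(xs, ys):
    """Ordinary least-squares slope of y on x, no intercept
    (xs are drawn with mean zero, so this approximates the regression slope)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

xs = [random.gauss(0, 1) for _ in range(10_000)]

# Initial study: the uncontrolled variable Z happens to equal X, so Y = X + Z = 2X.
ys_initial = [x + x for x in xs]

# Replication: Z varies independently as standard Gaussian noise, so Y = X + Z.
ys_replication = [x + random.gauss(0, 1) for x in xs]

print(f"initial slope:     {slope(xs, ys_initial):.2f}")      # exactly 2.00
print(f"replication slope: {slope(xs, ys_replication):.2f}")  # ~1.0, a 50% decline
```

Nothing about the effect itself changed between the two runs; only the hidden correlation between X and Z did.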
A real example is the drug Timolol for treating glaucoma, whose measured effect has steadily decreased. [11] This was explained by noting that the early studies used patients with advanced glaucoma, while later studies used less advanced patients. Because less sick patients have less room for improvement, the effect size of Timolol decreased.
One explanation of the effect is regression toward the mean, also known as "inflated decline". [10] This is a statistical phenomenon in which a variable that is extreme in early measurements tends to move back toward the average in later ones, although this does not explain why sequential results decline in a roughly linear fashion rather than fluctuating about the true mean, as would be expected. [5]
This is particularly likely when the initial study was stopped early because "the effect size is clearly large enough". If data collection stops as soon as the estimated effect size rises above a threshold that is higher than the true effect size, then subsequent replications will necessarily regress toward the mean. [12]
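The early-stopping mechanism can be sketched as follows (the true effect, stopping threshold, and sample sizes are all made-up illustrative values): studies that halt as soon as their running mean clears a high bar report inflated estimates, while fixed-sample replications recover the true effect.

```python
import random
import statistics

random.seed(2)

TRUE_EFFECT = 0.2  # small but genuine effect

def study_with_early_stopping(threshold=0.5, max_n=200, min_n=10):
    """Collect observations one at a time; stop as soon as the running
    mean exceeds the threshold (checked from min_n onward)."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += random.gauss(TRUE_EFFECT, 1.0)
        if n >= min_n and total / n > threshold:
            return total / n  # stopped early on an extreme running mean
    return total / max_n

def fixed_n_study(n=200):
    return statistics.mean(random.gauss(TRUE_EFFECT, 1.0) for _ in range(n))

early = [study_with_early_stopping() for _ in range(500)]
fixed = [fixed_n_study() for _ in range(500)]

# Only the early-stopped studies that actually crossed the bar get "published".
published = [e for e in early if e > 0.5]

print(f"mean published (early-stopped) estimate: {statistics.mean(published):.2f}")
print(f"mean fixed-n replication estimate:       {statistics.mean(fixed):.2f}")
```

The published early-stopped estimates sit above the stopping threshold by construction, well above the true effect of 0.2, so replications run to a fixed sample size appear to show a decline.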
If the true effect size is small, but the initial study has low power (i.e. small sample size), then the null hypothesis will only be rejected if the effect estimate is far from zero, as illustrated in the figure. This means that subsequent replications, with larger sample sizes, will discover effect estimates that are closer to the true effect, which is closer to zero than the initial estimate. [10]
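A sketch of the low-power mechanism, with illustrative parameters: when the sample is small, only estimates far above the true effect clear the significance threshold, so the significant small-sample studies overestimate the effect, while large replications do not.

```python
import random
import statistics

random.seed(3)

TRUE_EFFECT = 0.2
SD = 1.0

def estimate(n):
    """Sample mean of n noisy observations of the true effect."""
    return statistics.mean(random.gauss(TRUE_EFFECT, SD) for _ in range(n))

def significant(est, n):
    # Two-sided z-test at alpha = .05 with known sd: |est| > 1.96 * sd / sqrt(n)
    return abs(est) > 1.96 * SD / n ** 0.5

# Small initial studies: only the inflated estimates pass the filter.
small_sig = [e for e in (estimate(20) for _ in range(2000)) if significant(e, 20)]

# Large replications: nearly all pass, and estimates cluster near the truth.
large_sig = [e for e in (estimate(500) for _ in range(2000)) if significant(e, 500)]

print(f"mean significant estimate, n=20:  {statistics.mean(small_sig):.2f}")
print(f"mean significant estimate, n=500: {statistics.mean(large_sig):.2f}")  # ~0.20
```

At n = 20 the significance threshold (about 0.44) exceeds the true effect (0.2), so the significance filter itself guarantees overestimation; at n = 500 the threshold (about 0.09) is below the true effect and the filter barely distorts anything.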
Another reason may be publication bias: scientists and scientific journals prefer to publish positive results of experiments and tests over null results, especially for new ideas. [2] As a result, journals may refuse to publish papers that do not prove that the idea works. Later, once an idea is accepted, journals may refuse to publish papers that merely support it, since such results are no longer novel. [13]
In the debate that followed the original article, Lehrer answered some of the questions by claiming that scientific observations might be shaped by one's expectations and desires, sometimes even unconsciously, thus creating a bias towards the desired outcome. [8] This is known as the experimenter effect. For example, in parapsychology, the "experimenter effect" is used to explain how an experimenter who does not believe in psi finds no evidence for psi, while the same experiment yields evidence when performed by an experimenter who does believe in psi. [14]
A significant factor contributing to the decline effect can also be the sample size of the scientific research, since a smaller sample size is more likely to give extreme results, suggesting a significant breakthrough but also carrying a higher probability of error. Typical examples of this effect are opinion polls, where those including a larger number of respondents are closer to reality than those with a small pool of respondents. [15] This suggestion alone would not account for the observed decrease over time regardless of sample size. Researcher John Ioannidis offers a further explanation. He states that early research is usually small and more prone to strongly positive results supporting the original idea, including early confirmatory studies. Later, as larger studies are conducted, they often show regression to the mean and a failure to reproduce the early exaggerated results. [16] [17] [18]
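The opinion-poll analogy can be illustrated with a quick simulation (the true support share of 52% is invented for the example): small polls scatter widely around the truth, occasionally producing dramatic-looking outliers, while large polls cluster tightly.

```python
import random
import statistics

random.seed(4)

TRUE_SHARE = 0.52  # hypothetical true support share in the population

def poll(n):
    """Simulate polling n random voters; return the observed support share."""
    return sum(random.random() < TRUE_SHARE for _ in range(n)) / n

small_polls = [poll(50) for _ in range(1000)]
large_polls = [poll(2000) for _ in range(1000)]

print(f"spread of small polls (n=50):   +/-{statistics.stdev(small_polls):.3f}")
print(f"spread of large polls (n=2000): +/-{statistics.stdev(large_polls):.3f}")
print(f"extreme small-poll results: {min(small_polls):.2f} to {max(small_polls):.2f}")
```

The extremes of the small polls, not the large ones, are what look like breakthroughs; if only those get published first, later and larger studies will appear to show a decline.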
A 2012 report by National Public Radio's show "On The Media" [19] covered scientists who are exploring another option: that the act of observing the universe changes the universe, and that repeated measurement might actually be rendering earlier results invalid. In other words, antipsychotic drugs did work originally, but the more we measured their effectiveness, the more the laws governing those drugs changed so they ceased to be effective. Science fiction author Geoff Ryman explores this idea and its possible ramifications further in his 2012 short story What We Found, [20] which won the Nebula Award for Best Novelette in 2012. [21]
Another reason for some decline effects may be that certain researchers tend to publish larger effect sizes than others. For example, alongside publication bias and sample size effects, the decline effect in ocean acidification effects on fish behavior [9] was largely driven by exceptionally large effect sizes reported by two particular investigators from the same laboratory, who are currently under investigation for potential scientific misconduct and data fabrication. [22]
Several commenters have contested the view, presented in Jonah Lehrer's New Yorker article, that the decline effect reveals a deep problem with science. Lehrer wrote: "The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that's often not the case. Just because an idea is true doesn't mean it can be proved. And just because an idea can be proved doesn't mean it's true. When the experiments are done, we still have to choose what to believe." [2]
Steven Novella also challenges Lehrer's view of the decline effect, arguing that Lehrer concentrates on new discoveries at the cutting edge of scientific research and applies the conclusions to all areas of science. Novella points out that most of the examples used by Lehrer come from medicine, psychology and ecology, fields heavily influenced by complex human factors, and that there is not much evidence of the decline effect in other areas of science, such as physics. [23]
Another scientist, Paul Zachary Myers, also contests Lehrer's view of the decline effect as a surprising phenomenon in science, claiming: "This isn't surprising at all. It's what we expect, and there are many very good reasons for the shift." [24]
Lehrer's statements about the difficulty of proving anything, and about publication bias, find support from Jerry A. Coyne. Coyne holds that in the fields of genetics and evolutionary biology almost no research is replicated, and that a premium is placed on publishing positive results. However, he also contests Lehrer's approach of applying these conclusions to all fields of science, stating that in physics, chemistry and molecular biology, previous results are constantly repeated by others as a foundation for their own research. [25]
One concern that some [26] have expressed is that Lehrer's article may further fuel people's skepticism about academic science, since it hints that academic science is not as rigorous as people would like to believe. It is especially the article's ending that upset many scientists and led to broad criticism. Lehrer ends the article by saying: "Just because an idea is true doesn't mean it can be proved. And just because an idea can be proved doesn't mean it's true. When the experiments are done, we still have to choose what to believe." Many have written back to Lehrer and questioned his agenda. Some have characterized Lehrer's assertion as "absurd", while others claim that Lehrer is trying to use publication bias as an excuse for not believing in anything. [26]
In response to the many comments he received upon publishing the article, Lehrer published a post on his blog, The Frontal Cortex, [8] in which he denied that he was implicitly questioning science and scientific methods in any way. In the same post, Lehrer stated that he was not questioning fundamental scientific theories such as evolution by natural selection and global warming, calling them "two of the most robust and widely tested theories of modern science".
A further clarification was published as a follow-up note in The New Yorker. [8] In this note, entitled "More Thoughts on the Decline Effect", Lehrer answers his critics mainly by giving examples where scientific research has both failed and succeeded, taking Richard Feynman's 1974 commencement speech at Caltech as a starting point. In that speech, Feynman used Robert Millikan's and Harvey Fletcher's oil drop experiment to measure the charge of the electron to illustrate how selective reporting can bias scientific results. On the other hand, Feynman found solace in the fact that scientists repeat one another's experiments, and hence the truth wins out in the end.
Lehrer once again uses the follow-up note to deny that his original intention was to support people denying well verified scientific theories such as natural selection and climate change. Instead, he wishes that "we'd spend more time considering the value of second-generation antipsychotics or the verity of the latest gene-association study". In the other parts of the follow-up note, Lehrer briefly discusses some of the creative feedback he has received in order to reduce publication bias. He does not give explicit support to any specific idea. The follow-up article ends with Lehrer once again stating that the decline effect is a problem in today's science, but that science will eventually find a tool to deal with the problem.