In published academic research, publication bias occurs when the outcome of an experiment or research study biases the decision to publish or otherwise distribute it. Publishing only results that show a significant finding disturbs the balance of findings in favor of positive results. [1] The study of publication bias is an important topic in metascience.
Despite similar quality of execution and design, [2] papers with statistically significant results are three times more likely to be published than those with null results. [3] This unduly motivates researchers to manipulate their practices to ensure statistically significant results, such as by data dredging. [4]
Many factors contribute to publication bias. [5] [6] For instance, once a scientific finding is well established, it may become newsworthy to publish reliable papers that fail to reject the null hypothesis. [7] Most commonly, investigators simply decline to submit results, leading to non-response bias. Investigators may also assume they made a mistake, find that the null result fails to support a known finding, lose interest in the topic, or anticipate that others will be uninterested in the null results. [2] The nature of these issues and the resulting problems form the five diseases that threaten science: "significosis, an inordinate focus on statistically significant results; neophilia, an excessive appreciation for novelty; theorrhea, a mania for new theory; arigorium, a deficiency of rigor in theoretical and empirical work; and finally, disjunctivitis, a proclivity to produce many redundant, trivial, and incoherent works." [8]
Attempts to find unpublished studies often prove difficult or are unsatisfactory. [5] In an effort to combat this problem, some journals require that studies submitted for publication be pre-registered (before data collection and analysis) with organizations like the Center for Open Science.
Other proposed strategies to detect and control for publication bias [5] include p-curve analysis [9] and disfavoring small and non-randomized studies due to high susceptibility to error and bias. [2]
Publication bias occurs when the publication of research results depends not just on the quality of the research but also on the hypothesis tested, and the significance and direction of effects detected. [10] The subject was first discussed in 1959 by statistician Theodore Sterling, who referred to fields in which "successful" research is more likely to be published. As a result, "the literature of such a field consists in substantial part of false conclusions resulting from errors of the first kind in statistical tests of significance". [11] In the worst case, false conclusions could become canonized as true if the publication rate of negative results is too low. [12]
One effect of publication bias is sometimes called the file-drawer effect, or file-drawer problem. This term suggests that negative results, those that do not support the initial hypotheses of researchers, are often "filed away" and go no further than the researchers' file drawers, leading to a bias in published research. [13] The term "file drawer problem" was coined by psychologist Robert Rosenthal in 1979. [14]
Positive-results bias, a type of publication bias, occurs when authors are more likely to submit, or editors are more likely to accept, positive results than negative or inconclusive results. [15] Outcome reporting bias occurs when multiple outcomes are measured and analyzed, but the reporting of these outcomes is dependent on the strength and direction of their results. A generic term coined to describe these post-hoc choices is HARKing ("Hypothesizing After the Results are Known"). [16]
There is extensive meta-research on publication bias in the biomedical field. Investigators following clinical trials from the submission of their protocols to ethics committees (or regulatory authorities) until the publication of their results observed that those with positive results are more likely to be published. [18] [19] [20] In addition, studies often fail to report negative results when published, as demonstrated by research comparing study protocols with published articles. [21] [22]
The presence of publication bias has been investigated in meta-analyses. The largest such analysis investigated the presence of publication bias in systematic reviews of medical treatments from the Cochrane Library. [23] The study showed that statistically significant positive findings are 27% more likely to be included in meta-analyses of efficacy than other findings. Results showing no evidence of adverse effects have a 78% greater probability of inclusion in safety studies than statistically significant results showing adverse effects. Evidence of publication bias was found in meta-analyses published in prominent medical journals. [24]
Meta-analyses (reviews) have been performed in the field of ecology and environmental biology. In a study of 100 meta-analyses in ecology, only 49% tested for publication bias. [25] While multiple tests have been developed to detect publication bias, most perform poorly in the field of ecology because the data are highly heterogeneous and observations are often not fully independent. [26]
As of 1998, "No trial published in China or Russia/USSR found a test treatment to be ineffective." [27]
Where publication bias is present, published studies are no longer a representative sample of the available evidence. This bias distorts the results of meta-analyses and systematic reviews. For example, evidence-based medicine is increasingly reliant on meta-analysis to assess evidence.
Meta-analyses and systematic reviews can account for publication bias by including evidence from unpublished studies and the grey literature. The presence of publication bias can also be explored by constructing a funnel plot in which the estimate of the reported effect size is plotted against a measure of precision or sample size. The premise is that the scatter of points should reflect a funnel shape, indicating that the reporting of effect sizes is not related to their statistical significance. [29] However, when small studies are predominantly in one direction (usually the direction of larger effect sizes), asymmetry will ensue, and this may be indicative of publication bias. [30]
Because an inevitable degree of subjectivity exists in the interpretation of funnel plots, several tests have been proposed for detecting funnel plot asymmetry. [29] [31] [32] These are often based on linear regression, including the popular Egger's regression test, [33] and may adopt a multiplicative or additive dispersion parameter to adjust for the presence of between-study heterogeneity. Some approaches may even attempt to compensate for the (potential) presence of publication bias, [23] [34] [35] which is particularly useful to explore the potential impact on meta-analysis results. [36] [37] [38]
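As a rough illustration of a regression-based asymmetry test, the following Python sketch regresses each study's standardized effect (effect divided by its standard error) on its precision (the reciprocal of the standard error), which is the usual textbook form of Egger's test. The function name `egger_regression` and its return values are assumptions made for this example and are not taken from any cited source. An intercept far from zero hints at funnel plot asymmetry. The sketch returns a t statistic rather than a p-value, and it omits the dispersion adjustments for heterogeneity mentioned above.

```python
import math


def egger_regression(effects, std_errors):
    """Ordinary least squares of standardized effect on precision.

    Hypothetical helper for illustration only. Returns
    (intercept, slope, t_statistic_for_intercept, degrees_of_freedom).
    """
    n = len(effects)
    if n < 3 or n != len(std_errors):
        raise ValueError("need at least three studies with matching inputs")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    precision = [1.0 / se for se in std_errors]                    # x values
    standardized = [e / se for e, se in zip(effects, std_errors)]  # y values

    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(precision, standardized))

    slope = sxy / sxx                      # estimates the underlying effect
    intercept = mean_y - slope * mean_x    # asymmetry indicator

    residual_ss = sum((y - intercept - slope * x) ** 2
                      for x, y in zip(precision, standardized))
    sigma2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))

    # The t statistic is undefined when the fit is exact (zero residuals).
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_stat, n - 2
```

In this formulation the slope plays the role of the underlying effect estimate, while the intercept captures how strongly small, imprecise studies deviate from it. A formal test would compare the t statistic against a t distribution with the returned degrees of freedom.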
In ecology and environmental biology, a study found that publication bias impacted the effect size, statistical power, and magnitude. The prevalence of publication bias distorted confidence in meta-analytic results, with 66% of initially statistically significant meta-analytic means becoming non-significant after correcting for publication bias. [39] Ecological and evolutionary studies consistently had low statistical power (15%) with a 4-fold exaggeration of effects on average (Type M error rates = 4.4).
The presence of publication bias can be detected by time-lag bias tests, where time-lag bias occurs when larger or statistically significant effects are published more quickly than smaller or non-statistically significant effects. It can manifest as a decline in the magnitude of the overall effect over time. The key feature of time-lag bias tests is that, as more studies accumulate, the mean effect size is expected to converge on its true value. [26]
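One simple way to look for such a decline is a cumulative meta-analysis, in which the pooled effect is recomputed each time a study is added in order of publication year. The Python sketch below assumes a fixed-effect, inverse-variance pooling rule and a hypothetical input format of `(year, effect, variance)` tuples. It illustrates the idea only and is not the specific test used in the cited work.

```python
def cumulative_pooled_effects(studies):
    """Running inverse-variance pooled effect, ordered by publication year.

    `studies` is an iterable of (year, effect, variance) tuples
    (a hypothetical format chosen for this example).
    Returns a list of (year, pooled_effect_so_far) tuples.
    """
    ordered = sorted(studies, key=lambda study: study[0])
    sum_weights = 0.0
    sum_weighted_effects = 0.0
    trajectory = []
    for year, effect, variance in ordered:
        if variance <= 0:
            raise ValueError("variances must be positive")
        weight = 1.0 / variance            # more precise studies count more
        sum_weights += weight
        sum_weighted_effects += weight * effect
        trajectory.append((year, sum_weighted_effects / sum_weights))
    return trajectory
```

A pooled estimate that starts large and shrinks steadily as later studies are added is consistent with time-lag bias, although it can also reflect genuine changes in study populations or methods.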
Two meta-analyses of the efficacy of reboxetine as an antidepressant demonstrated attempts to detect publication bias in clinical trials. Based on positive trial data, reboxetine was originally passed as a treatment for depression in many countries in Europe and the UK in 2001 (though in practice it is rarely used for this indication). A 2010 meta-analysis concluded that reboxetine was ineffective and that the preponderance of positive-outcome trials reflected publication bias, mostly due to trials published by the drug manufacturer Pfizer. A subsequent meta-analysis published in 2011, based on the original data, found flaws in the 2010 analyses and suggested that the data indicated reboxetine was effective in severe depression (see Reboxetine § Efficacy). Examples of publication bias are given by Ben Goldacre [40] and Peter Wilmshurst. [41]
In the social sciences, a study of published papers exploring the relationship between corporate social and financial performance found that "in economics, finance, and accounting journals, the average correlations were only about half the magnitude of the findings published in Social Issues Management, Business Ethics, or Business and Society journals". [42]
One example cited as an instance of publication bias is the refusal by The Journal of Personality and Social Psychology (the original publisher of Bem's article) to publish attempted replications of Bem's work, which claimed evidence for precognition. [43]
An analysis [44] comparing studies of gene-disease associations originating in China to those originating outside China found that those conducted within the country reported a stronger association and a more statistically significant result. [45]
John Ioannidis argues that "claimed research findings may often be simply accurate measures of the prevailing bias." [46] He lists the following factors as those that make a paper with a positive result more likely to enter the literature and suppress negative-result papers:
Other factors include experimenter bias and white hat bias.
Publication bias can be contained through better-powered studies, enhanced research standards, and careful consideration of true and non-true relationships. [46] Better-powered studies refer to large studies that deliver definitive results or test major concepts and lead to low-bias meta-analysis. Enhanced research standards such as the pre-registration of protocols, the registration of data collections, and adherence to established protocols are other techniques. To avoid false-positive results, the experimenter must consider the chances that they are testing a true or non-true relationship. This can be undertaken by properly assessing the false positive report probability based on the statistical power of the test [47] and reconfirming (whenever ethically acceptable) established findings of prior studies known to have minimal bias.
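The dependence of the false positive report probability (FPRP) on statistical power can be made concrete with a short calculation. The Python sketch below uses the commonly stated form FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the significance threshold, 1 − β is the power, and π is the prior probability that the tested relationship is true. The function and parameter names are assumptions made for this example.

```python
def false_positive_report_probability(alpha, power, prior):
    """Probability that a statistically significant finding is a false positive.

    alpha: significance threshold (type I error rate)
    power: probability of detecting a true relationship (1 - beta)
    prior: assumed prior probability that the relationship is true
    """
    if not (0 < alpha < 1 and 0 < power <= 1 and 0 < prior < 1):
        raise ValueError("alpha and prior must lie in (0, 1), power in (0, 1]")
    false_positives = alpha * (1 - prior)   # significant, but no true effect
    true_positives = power * prior          # significant, and effect is real
    return false_positives / (false_positives + true_positives)
```

With α = 0.05, power of 0.8, and an even prior of 0.5, the result is about 0.06. With power of 0.2 and a prior of 0.1, it rises to about 0.69, which illustrates why underpowered tests of unlikely hypotheses yield significant findings that are mostly false.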
In September 2004, editors of prominent medical journals (including the New England Journal of Medicine, The Lancet, Annals of Internal Medicine, and JAMA) announced that they would no longer publish results of drug research sponsored by pharmaceutical companies unless that research was registered in a public clinical trials registry database from the start. [48] Furthermore, some journals (e.g. Trials) encourage publication of study protocols in their journals. [49]
The World Health Organization (WHO) agreed that basic information about all clinical trials should be registered at the study's inception and that this information should be publicly accessible through the WHO International Clinical Trials Registry Platform. Additionally, the public availability of complete study protocols, alongside reports of trials, is becoming more common for studies. [50]
In a megastudy, a large number of treatments are tested simultaneously. Given the inclusion of different interventions in the study, a megastudy's publication likelihood is less dependent on the statistically significant effect of any specific treatment, so it has been suggested that megastudies may be less prone to publication bias. [51] For example, an intervention found to be ineffective would be easier to publish as part of a megastudy as just one of many studied interventions. In contrast, it might go unreported due to the file-drawer problem if it were the sole focus of a contemplated paper. For the same reason, the megastudy research design may encourage researchers to study not only the interventions they consider more likely to be effective but also those interventions that researchers are less sure about and that they would not pick as the sole focus of the study due to the perceived high risk of a null effect.
Evidence-based medicine (EBM) is "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. ... [It] means integrating individual clinical expertise with the best available external clinical evidence from systematic research." The aim of EBM is to integrate the experience of the clinician, the values of the patient, and the best available scientific information to guide decision-making about clinical management. The term was originally used to describe an approach to teaching the practice of medicine and improving decisions by individual physicians about individual patients.
Meta-analysis is a method of synthesis of quantitative data from multiple independent studies addressing a common research question. An important part of this method involves computing a combined effect size across all of the studies. As such, this statistical approach involves extracting effect sizes and variance measures from various studies. By combining these effect sizes the statistical power is improved and can resolve uncertainties or discrepancies found in individual studies. Meta-analyses are integral in supporting research grant proposals, shaping treatment guidelines, and influencing health policies. They are also pivotal in summarizing existing research to guide future studies, thereby cementing their role as a fundamental methodology in metascience. Meta-analyses are often, but not always, important components of a systematic review.
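The computation of a combined effect size can be sketched with inverse-variance weighting under a fixed-effect model, one common choice among several. In the Python example below, the function name and input format are assumptions for illustration, and a random-effects model would additionally require an estimate of the between-study variance.

```python
import math


def pooled_fixed_effect(effects, variances):
    """Inverse-variance weighted mean effect and its standard error.

    effects: per-study effect size estimates
    variances: per-study sampling variances (squared standard errors)
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("effects and variances must be non-empty and aligned")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]   # precise studies get more weight
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error
```

Because the pooled standard error shrinks as the total weight grows, combining studies yields a more precise estimate than any single study, which is the gain in statistical power described above.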
A randomized controlled trial is a form of scientific experiment used to control factors not under direct experimental control. Examples of RCTs are clinical trials that compare the effects of drugs, surgical techniques, medical devices, diagnostic procedures, diets or other medical treatments.
In a blind or blinded experiment, information which may influence the participants of the experiment is withheld until after the experiment is complete. Good blinding can reduce or eliminate experimental biases that arise from a participant's expectations, the observer's effect on the participants, observer bias, confirmation bias, and other sources. A blind can be imposed on any participant of an experiment, including subjects, researchers, technicians, data analysts, and evaluators. In some cases, while blinding would be useful, it is impossible or unethical. For example, it is not possible to blind a patient to their treatment in a physical therapy intervention. A good clinical protocol ensures that blinding is as effective as possible within ethical and practical constraints.
Reboxetine, sold under the brand name Edronax among others, is a drug of the norepinephrine reuptake inhibitor (NRI) class, marketed as an antidepressant by Pfizer for use in the treatment of major depression, although it has also been used off-label for panic disorder and attention deficit hyperactivity disorder (ADHD). It is approved for use in many countries worldwide, but has not been approved for use in the United States. Although its effectiveness as an antidepressant has been challenged in multiple published reports, its popularity has continued to increase.
Data dredging is the misuse of data analysis to find patterns in data that can be presented as statistically significant, thus dramatically increasing and understating the risk of false positives. This is done by performing many statistical tests on the data and only reporting those that come back with significant results.
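The inflation of false positives under repeated testing can be quantified with a short calculation. Assuming the tests are independent and every null hypothesis is true (a simplifying assumption made for this example), the probability of at least one significant result among k tests is 1 − (1 − α)^k. The function name below is hypothetical.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive among independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** num_tests
```

At α = 0.05, running 20 such tests gives roughly a 64% chance of at least one spurious "significant" finding, which is why reporting only the significant tests understates the true false-positive risk.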
A systematic review is a scholarly synthesis of the evidence on a clearly presented topic using critical methods to identify, define and assess research on the topic. A systematic review extracts and interprets data from published studies on the topic, then analyzes, describes, critically appraises and summarizes interpretations into a refined evidence-based conclusion. For example, a systematic review of randomized controlled trials is a way of summarizing and implementing evidence-based medicine.
In statistics, sequential analysis or sequential hypothesis testing is statistical analysis where the sample size is not fixed in advance. Instead data is evaluated as it is collected, and further sampling is stopped in accordance with a pre-defined stopping rule as soon as significant results are observed. Thus a conclusion may sometimes be reached at a much earlier stage than would be possible with more classical hypothesis testing or estimation, at consequently lower financial and/or human cost.
In statistics, (between-) study heterogeneity is a phenomenon that commonly occurs when attempting to undertake a meta-analysis. In a simplistic scenario, studies whose results are to be combined in the meta-analysis would all be undertaken in the same way and to the same experimental protocols. Differences between outcomes would only be due to measurement error. Study heterogeneity denotes the variability in outcomes that goes beyond what would be expected due to measurement error alone.
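Two widely used summaries of heterogeneity are Cochran's Q, the weighted sum of squared deviations of study effects from the pooled effect, and I², the share of that variability exceeding what chance alone would produce. The Python sketch below uses the standard definitions with inverse-variance weights. The function name and input format are assumptions for this example.

```python
def heterogeneity_q_and_i2(effects, variances):
    """Cochran's Q and the I-squared statistic (as a fraction in [0, 1])."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    degrees_of_freedom = len(effects) - 1
    # I-squared is truncated at zero when Q is below its expected value.
    i2 = max(0.0, (q - degrees_of_freedom) / q) if q > 0 else 0.0
    return q, i2
```

When all variability is attributable to sampling error, Q is close to its degrees of freedom and I² is near zero, whereas values of I² approaching 1 indicate that most of the observed spread reflects genuine differences between studies.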
In medicine an intention-to-treat (ITT) analysis of the results of a randomized controlled trial is based on the initial treatment assignment and not on the treatment eventually received. ITT analysis is intended to avoid various misleading artifacts that can arise in intervention research such as non-random attrition of participants from the study or crossover. ITT is also simpler than other forms of study design and analysis, because it does not require observation of compliance status for units assigned to different treatments or incorporation of compliance into the analysis. Although ITT analysis is widely employed in published clinical trials, it can be incorrectly described and there are some issues with its application. Furthermore, there is no consensus on how to carry out an ITT analysis in the presence of missing outcome data.
In epidemiology, reporting bias is defined as "selective revealing or suppression of information" by subjects. In artificial intelligence research, the term reporting bias is used to refer to people's tendency to under-report all the information available.
A forest plot, also known as a blobbogram, is a graphical display of estimated results from a number of scientific studies addressing the same question, along with the overall results. It was developed for use in medical research as a means of graphically representing a meta-analysis of the results of randomized controlled trials. In the last twenty years, similar meta-analytical techniques have been applied in observational studies and forest plots are often used in presenting the results of such studies also.
A funnel plot is a graph designed to check for the existence of publication bias; funnel plots are commonly used in systematic reviews and meta-analyses. In the absence of publication bias, it assumes that studies with high precision will be plotted near the average, and studies with low precision will be spread evenly on both sides of the average, creating a roughly funnel-shaped distribution. Deviation from this shape can indicate publication bias.
John P. A. Ioannidis is a Greek-American physician-scientist, writer and Stanford University professor who has made contributions to evidence-based medicine, epidemiology, and clinical research. Ioannidis studies scientific research itself - in other words, meta-research - primarily in clinical medicine and the social sciences.
In science, a null result is a result without the expected content: that is, the proposed result is absent. It is an experimental outcome which does not show an otherwise expected effect. This does not imply a result of zero or nothing, simply a result that does not support the hypothesis.
PRISMA is an evidence-based minimum set of items aimed at helping scientific authors to report a wide array of systematic reviews and meta-analyses, primarily used to assess the benefits and harms of a health care intervention. PRISMA focuses on ways in which authors can ensure a transparent and complete reporting of this type of research. The PRISMA standard superseded the earlier QUOROM standard. It supports the replicability of a systematic literature review. Researchers have to define research objectives that answer the research question and state the keywords and a set of exclusion and inclusion criteria. In the review stage, relevant articles are searched for and irrelevant ones are removed. Articles are then analyzed according to pre-defined categories.
Estimation statistics, or simply estimation, is a data analysis framework that uses a combination of effect sizes, confidence intervals, precision planning, and meta-analysis to plan experiments, analyze data and interpret results. It complements hypothesis testing approaches such as null hypothesis significance testing (NHST) by going beyond the question of whether an effect is present and providing information about how large the effect is. Estimation statistics is sometimes referred to as the new statistics.
The replication crisis is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method, such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.
Preregistration is the practice of registering the hypotheses, methods, or analyses of a scientific study before it is conducted. Clinical trial registration is similar, although it may not require the registration of a study's analysis protocol. Finally, registered reports include the peer review and in principle acceptance of a study protocol prior to data collection.
Outcome switching is the practice of changing the primary or secondary outcomes of a clinical trial after its initiation. An outcome is the goal of the clinical trial, such as survival after five years for cancer treatment. Outcome switching can lead to bias and undermine the reliability of the trial, for instance when outcomes are switched after researchers already have access to trial data. That way, researchers can cherry pick an outcome which is statistically significant.