The Foundations of Statistics are the mathematical and philosophical bases for statistical methods. These bases are the theoretical frameworks that ground and justify methods of statistical inference, estimation, hypothesis testing, uncertainty quantification, and the interpretation of statistical conclusions. Further, a foundation can be used to explain statistical paradoxes, provide descriptions of statistical laws, [1] and guide the application of statistics to real-world problems.
Different statistical foundations may provide different, contrasting perspectives on the analysis and interpretation of data, and some of these contrasts have been subject to centuries of debate. [2] Examples include Bayesian inference versus frequentist inference; the distinction between Fisher's significance testing and Neyman-Pearson hypothesis testing; and whether the likelihood principle holds.
Certain frameworks may be preferred for specific applications, such as the use of Bayesian methods in fitting complex ecological models. [3]
Bandyopadhyay & Forster [4] identify four statistical paradigms: classical statistics (error statistics), Bayesian statistics, likelihood-based statistics, and information-based statistics using the Akaike Information Criterion. More recently, Judea Pearl has reintroduced formal mathematics for attributing causality in statistical systems, addressing fundamental limitations of both Bayesian and Neyman-Pearson methods, as discussed in his book Causality.
During the 20th century, the development of classical statistics led to the emergence of two competing foundations for inductive statistical testing. [5] [6] The merits of these models were extensively debated. [7] Although a hybrid approach combining elements of both methods is commonly taught and used, the philosophical questions raised during the debate remain unresolved.
Publications by Fisher, such as "Statistical Methods for Research Workers" in 1925 and "The Design of Experiments" in 1935, [8] contributed to the popularity of significance testing, which is a probabilistic approach to deductive inference. In practice, a statistic is computed from the experimental data, and the probability of obtaining a value at least that large under a default or "null" model is compared to a predetermined threshold. This threshold represents the level of discord required (typically established by convention). One common application of this method is to determine whether a treatment has a noticeable effect based on a comparative experiment. In this case, the null hypothesis corresponds to the absence of a treatment effect, implying that the treated group and the control group are drawn from the same population. Statistical significance measures probability and does not address practical significance; it can be viewed as a criterion for the statistical signal-to-noise ratio. The test cannot prove the hypothesis (of no treatment effect), but it can provide evidence against it.
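The logic can be illustrated with a minimal sketch in Python, assuming two hypothetical samples (`treated` and `control`, invented for the example); the test statistic is the difference in group means, and a one-sided p-value is estimated by permutation under the null model that both groups come from the same population.

```python
# A minimal sketch of Fisher-style significance testing (hypothetical data).
import numpy as np

rng = np.random.default_rng(0)
treated = np.array([5.1, 4.8, 6.0, 5.5, 5.9])   # invented measurements
control = np.array([4.2, 4.9, 4.4, 5.0, 4.6])

observed = treated.mean() - control.mean()       # test statistic
pooled = np.concatenate([treated, control])

n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    rng.shuffle(pooled)                          # relabel under the null model
    diff = pooled[:treated.size].mean() - pooled[treated.size:].mean()
    if diff >= observed:
        extreme += 1

p_value = extreme / n_perm
print(f"observed difference = {observed:.3f}, permutation p-value = {p_value:.4f}")
# A p-value below a conventional threshold (e.g. 0.05) is read as evidence
# against the null hypothesis of no treatment effect.
```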
The Fisher significance test involves a single hypothesis, but the choice of the test statistic requires an understanding of relevant directions of deviation from the hypothesized model.
Neyman and Pearson collaborated on the problem of selecting the most appropriate hypothesis based solely on experimental evidence, which differed from significance testing. Their most renowned joint paper, published in 1933, [9] introduced the Neyman-Pearson lemma, which states that a ratio of probabilities serves as an effective criterion for hypothesis selection (with the choice of the threshold being arbitrary). The paper demonstrated the optimality of the Student's t-test, one of the significance tests. Neyman believed that hypothesis testing represented a generalization and improvement of significance testing. The rationale for their methods can be found in their collaborative papers. [10]
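As a hedged illustration of the lemma's criterion, the following sketch compares two simple hypotheses about the mean of a normal sample with known variance; the observations, hypothesized means, and threshold are invented for the example and are not drawn from the 1933 paper.

```python
# A minimal sketch of the Neyman-Pearson likelihood-ratio criterion.
import numpy as np
from scipy.stats import norm

x = np.array([0.8, 1.3, 0.4, 1.1, 0.9])   # hypothetical observations
mu0, mu1, sigma = 0.0, 1.0, 1.0            # H0: mean = mu0, H1: mean = mu1

# Likelihood of the whole sample under each simple hypothesis.
L0 = np.prod(norm.pdf(x, loc=mu0, scale=sigma))
L1 = np.prod(norm.pdf(x, loc=mu1, scale=sigma))

ratio = L1 / L0
k = 1.0                                    # threshold; its choice is arbitrary
print(f"likelihood ratio = {ratio:.3f}; reject H0 if ratio > k: {ratio > k}")
```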
Hypothesis testing involves considering multiple hypotheses and selecting one among them, akin to making a multiple-choice decision. The absence of evidence is not an immediate factor to be taken into account. The method is grounded in the assumption of repeated sampling from the same population (the classical frequentist assumption), although Fisher criticized this assumption. [11]
The duration of the dispute allowed for a comprehensive discussion of various fundamental issues in the field of statistics.
Inductive behavior: Fisher's attack on inductive behavior has been largely successful because he selected the field of battle. While operational decisions are routinely made on a variety of criteria (such as cost), scientific conclusions from experimentation are typically made on the basis of probability alone. Fisher's theory of fiducial inference is flawed.
Type II errors: A purely probabilistic theory of tests requires an alternative hypothesis. Fisher's attacks on Type II errors have faded with time. In the intervening years, statistics has separated the exploratory from the confirmatory. In the current environment, the concept of Type II errors is used in power calculations to determine the sample size of confirmatory hypothesis tests.
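As a hedged illustration of how the Type II error rate enters such calculations, the sketch below applies the standard normal-approximation sample-size formula for a two-sample comparison; the effect size, standard deviation, and error rates are hypothetical inputs.

```python
# A minimal sketch of a power-based sample-size calculation (hypothetical inputs).
import math
from scipy.stats import norm

alpha, power = 0.05, 0.80        # Type I error rate; power = 1 - Type II error rate
delta, sigma = 0.5, 1.0          # assumed effect size and standard deviation

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

# Normal-approximation sample size per group for a two-sample z-test.
n_per_group = 2 * (sigma * (z_alpha + z_beta) / delta) ** 2
print(f"approximately {math.ceil(n_per_group)} subjects per group")
```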
Repeated sampling of the same population: Fisher's attack based on frequentist probability failed but was not without result. He identified a specific case (the 2×2 table) where the two schools of testing reach different results. This case is one of several that remain troubling. Commentators believe that the "right" answer is context-dependent. [14] Fiducial probability has not fared well, being virtually without advocates, while frequentist probability remains a mainstream interpretation.
During this exchange, Fisher also discussed the requirements for inductive inference, specifically criticizing cost functions that penalize erroneous judgments. Neyman countered by mentioning the use of such functions by Gauss and Laplace. These arguments occurred 15 years after textbooks began teaching a hybrid theory of statistical testing.
Fisher and Neyman held different perspectives on the foundations of statistics (though they both opposed the Bayesian viewpoint): [14]
Fisher and Neyman diverged in their attitudes and, perhaps, their language. Fisher was a scientist and an intuitive mathematician, and inductive reasoning came naturally to him. Neyman, on the other hand, was a rigorous mathematician who relied on deductive reasoning rather than probability calculations based on experiments. [5] Hence, there was an inherent clash between applied and theoretical approaches (between science and mathematics).
In 1938, Neyman relocated to the West Coast of the United States of America, effectively ending his collaboration with Pearson and their work on hypothesis testing. [5] Subsequent developments in the field were carried out by other researchers.
By 1940, textbooks began presenting a hybrid approach that combined elements of significance testing and hypothesis testing. [16] However, none of the main contributors were directly involved in the further development of the hybrid approach currently taught in introductory statistics. [6]
Statistics subsequently branched out into various directions, including decision theory, Bayesian statistics, exploratory data analysis, robust statistics, and non-parametric statistics. Neyman-Pearson hypothesis testing made significant contributions to decision theory, which is widely employed, particularly in statistical quality control. Hypothesis testing also extended its applicability to incorporate prior probabilities, giving it a Bayesian character. While Neyman-Pearson hypothesis testing has evolved into an abstract mathematical subject taught at the post-graduate level, [17] much of what is taught and used in undergraduate education under the umbrella of hypothesis testing can be attributed to Fisher.
There have been no major conflicts between the two classical schools of testing in recent decades, although occasional criticism and disputes persist. However, it is highly unlikely that one theory of statistical testing will completely supplant the other in the foreseeable future.
The hybrid approach, which combines elements from both competing schools of testing, can be interpreted in different ways. Some view it as an amalgamation of two mathematically complementary ideas, [14] while others see it as a flawed union of philosophically incompatible concepts. [18] Fisher's approach had certain philosophical advantages, while Neyman and Pearson emphasized rigorous mathematics. Hypothesis testing remains a subject of controversy for some users, but the most widely accepted alternative method, confidence intervals, is based on the same mathematical principles.
Due to the historical development of testing, there is no single authoritative source that fully encompasses the hybrid theory as it is commonly practiced in statistics. Additionally, the terminology used in this context may lack consistency. Empirical evidence indicates that individuals, including students and instructors in introductory statistics courses, often have a limited understanding of the meaning of hypothesis testing. [19]
Two distinct interpretations of probability have existed for a long time, one based on objective evidence and the other on subjective degrees of belief. The debate between Gauss and Laplace could have taken place more than 200 years ago, giving rise to two competing schools of statistics. Classical inferential statistics emerged primarily during the second quarter of the 20th century, [6] largely in response to the controversial principle of indifference used in Bayesian probability at that time. The resurgence of Bayesian inference was a reaction to the limitations of frequentist probability, leading to further developments and reactions.
While the philosophical interpretations have a long history, the specific statistical terminology is relatively recent. The terms "Bayesian" and "frequentist" became standardized in the second half of the 20th century. [20] However, the terminology can be confusing, as the "classical" interpretation of probability aligns with Bayesian principles, while "classical" statistics follow the frequentist approach. Moreover, even within the term "frequentist," there are variations in interpretation, differing between philosophy and physics.
The intricate details of philosophical probability interpretations are explored elsewhere. In the field of statistics, these alternative interpretations allow for the analysis of different datasets using distinct methods based on various models, aiming to achieve slightly different objectives. When comparing the competing schools of thought in statistics, pragmatic criteria beyond philosophical considerations are taken into account.
Fisher and Neyman were significant figures in the development of frequentist (classical) methods. [5] While Fisher had a unique interpretation of probability that differed from Bayesian principles, Neyman adhered strictly to the frequentist approach. In the realm of Bayesian statistical philosophy, mathematics, and methods, de Finetti, [21] Jeffreys, [22] and Savage [23] emerged as notable contributors during the 20th century. Savage played a crucial role in popularizing de Finetti's ideas in English-speaking regions and establishing rigorous Bayesian mathematics. In 1965, Dennis Lindley's two-volume work titled "Introduction to Probability and Statistics from a Bayesian Viewpoint" played a vital role in introducing Bayesian methods to a wide audience. Statistics has advanced significantly over three generations, and the views of early contributors are not necessarily considered authoritative in present times.
The earlier description briefly highlights frequentist inference, which encompasses Fisher's "significance testing" and Neyman-Pearson's "hypothesis testing." Frequentist inference incorporates various perspectives and allows for scientific conclusions, operational decisions, and parameter estimation with or without confidence intervals.
A classical frequency distribution provides information about the probability of the observed data. By applying Bayes' theorem, a more abstract concept is introduced, which involves estimating the probability of a hypothesis (associated with a theory) given the data. This concept, formerly referred to as "inverse probability," is realized through Bayesian inference. Bayesian inference involves updating the probability estimate for a hypothesis as new evidence becomes available. It explicitly considers both the evidence and prior beliefs, enabling the incorporation of multiple sets of evidence.
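A minimal sketch of this updating, assuming a conjugate Beta prior on an unknown success probability and two hypothetical batches of evidence, is shown below; yesterday's posterior becomes today's prior.

```python
# A minimal sketch of Bayesian updating with a Beta-Binomial model (hypothetical data).
from scipy.stats import beta

a, b = 1.0, 1.0                         # Beta(1, 1) prior: uniform on [0, 1]

successes, failures = 7, 3              # first batch of evidence
a, b = a + successes, b + failures      # posterior after the first batch
print("posterior mean after batch 1:", a / (a + b))

successes, failures = 2, 8              # second batch of evidence
a, b = a + successes, b + failures      # previous posterior acts as the prior
print("posterior mean after batch 2:", a / (a + b))
print("95% credible interval:", beta.ppf([0.025, 0.975], a, b))
```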
Frequentists and Bayesians employ distinct probability models. Frequentists typically view parameters as fixed but unknown, whereas Bayesians assign probability distributions to these parameters. As a result, Bayesians discuss probabilities that frequentists do not acknowledge. Bayesians consider the probability of a theory, whereas true frequentists can only assess the evidence's consistency with the theory. For instance, a frequentist does not claim a 95% probability that the true value of a parameter falls within a confidence interval; rather, they state that 95% of confidence intervals constructed in this way encompass the true value.
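This frequentist reading can be checked by simulation; in the hypothetical setup below (known standard deviation, fixed true mean), roughly 95% of the intervals constructed from repeated samples cover the true value.

```python
# A minimal sketch of confidence-interval coverage under repeated sampling.
import numpy as np

rng = np.random.default_rng(1)
true_mu, sigma, n, reps = 10.0, 2.0, 25, 10_000

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mu, sigma, size=n)
    half_width = 1.96 * sigma / np.sqrt(n)          # known-sigma 95% z-interval
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= true_mu <= hi)

print(f"empirical coverage: {covered / reps:.3f}")  # close to 0.95
```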
|  | Bayesian | Frequentist |
| --- | --- | --- |
| Basis | Belief (prior) | Behavior (method) |
| Resulting Characteristic | Principled Philosophy | Opportunistic Methods |
| Distributions | One distribution | Many distributions (bootstrap?) |
| Ideal Application | Dynamic (repeated sampling) | Static (one sample) |
| Target Audience | Individual (subjective) | Community (objective) |
| Modeling Characteristic | Aggressive | Defensive |
Both the frequentist and Bayesian schools are subject to mathematical critique, and neither readily embraces such criticism. For instance, Stein's paradox highlights the intricacy of determining a "flat" or "uninformative" prior probability distribution in high-dimensional spaces. [2] While Bayesians perceive this as tangential to their fundamental philosophy, they find the frequentist school plagued with inconsistencies, paradoxes, and unfavorable mathematical behavior. Frequentists can account for most of these issues. Certain "problematic" scenarios, like estimating the weight variability of a herd of elephants based on a single measurement (Basu's elephants), exemplify extreme cases that defy statistical estimation. The likelihood principle has been a contentious area of debate.
Both the frequentist and Bayesian schools have demonstrated notable accomplishments in addressing practical challenges. Classical statistics, with its reliance on mechanical calculators and specialized printed tables, boasts a longer history of obtaining results. Bayesian methods, on the other hand, have shown remarkable efficacy in analyzing sequentially sampled information, such as radar and sonar data. Several Bayesian techniques, as well as certain recent frequentist methods like the bootstrap, necessitate the computational capabilities that have become widely accessible in the past few decades. There is an ongoing discourse regarding the integration of Bayesian and frequentist approaches, [25] although concerns have been raised regarding the interpretation of results and the potential diminishment of methodological diversity.
Bayesians share a common stance against the limitations of frequentism, but they are divided into various philosophical camps (empirical, hierarchical, objective, personal, and subjective), each emphasizing different aspects. A philosopher of statistics from the frequentist perspective has observed a shift from the statistical domain to philosophical interpretations of probability over the past two generations. [27] Some perceive that the successes achieved with Bayesian applications do not sufficiently justify the associated philosophical framework. [28] Bayesian methods often develop practical models that deviate from traditional inference and have minimal reliance on philosophy. [29] Neither the frequentist nor the Bayesian philosophical interpretations of probability can be considered entirely robust. The frequentist view is criticized for being overly rigid and restrictive, while the Bayesian view can encompass both objective and subjective elements, among others.
In common usage, likelihood is often treated as synonymous with probability, but in statistics the two terms are distinct. In statistics, probability refers to variable data given a fixed hypothesis, whereas likelihood refers to variable hypotheses given a fixed set of data. For instance, when making repeated measurements with a ruler under fixed conditions, each set of observations corresponds to a probability distribution, and the observations can be seen as a sample from that distribution, following the frequentist interpretation of probability. On the other hand, a set of observations can also arise from sampling various distributions based on different observational conditions. The probabilistic relationship between a fixed sample and a variable distribution stemming from a variable hypothesis is referred to as likelihood, representing the Bayesian view of probability. For instance, a set of length measurements may represent readings taken by observers with specific characteristics and conditions.
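The distinction can be made concrete with a short sketch: with the parameter held fixed, the normal density describes variable data; with the data held fixed, the same expression, viewed as a function of the parameter, is the likelihood. The measurements and candidate means below are hypothetical.

```python
# A minimal sketch contrasting probability (fixed hypothesis) with likelihood (fixed data).
import numpy as np
from scipy.stats import norm

measurements = np.array([10.2, 9.8, 10.1, 10.4])   # fixed, hypothetical data
sigma = 0.3

# Probability view: fix the hypothesis (mean = 10) and evaluate the data.
density_of_data = norm.pdf(measurements, loc=10.0, scale=sigma)
print("densities under mean = 10:", np.round(density_of_data, 3))

# Likelihood view: fix the data and vary the hypothesized mean.
for mu in np.linspace(9.8, 10.4, 4):
    likelihood = np.prod(norm.pdf(measurements, loc=mu, scale=sigma))
    print(f"mean = {mu:.2f}  likelihood = {likelihood:.4f}")
```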
Likelihood is a concept that was introduced and developed by Fisher over a span of more than 40 years, although earlier references to the concept exist and Fisher's support for it was not wholehearted. [34] The concept was subsequently accepted and substantially revised by Jeffreys. [35] In 1962, Birnbaum "proved" the likelihood principle from premises that were widely accepted among statisticians, [36] although his proof has been disputed by statisticians and philosophers. Notably, by 1970, Birnbaum had rejected one of these premises (the conditionality principle) and had also abandoned the likelihood principle due to their incompatibility with the frequentist "confidence concept of statistical evidence." [37] [38] The likelihood principle asserts that all the information in a sample is contained within the likelihood function, which is considered a valid probability distribution by Bayesians but not by frequentists.
Certain significance tests employed by frequentists are not consistent with the likelihood principle. Bayesians, on the other hand, embrace the principle as it aligns with their philosophical standpoint (perhaps in response to frequentist discomfort). The likelihood approach is compatible with Bayesian statistical inference, where the posterior Bayes distribution for a parameter is derived by multiplying the prior distribution by the likelihood function using Bayes' Theorem. [34] Frequentists interpret the likelihood principle unfavourably, as it suggests a lack of concern for the reliability of evidence. The likelihood principle, according to Bayesian statistics, implies that information about the experimental design used to collect evidence does not factor into the statistical analysis of the data. [39] Some Bayesians, including Savage, acknowledge this implication as a vulnerability.
The likelihood principle's staunchest proponents argue that it provides a more solid foundation for statistics than the alternatives presented by the Bayesian and frequentist approaches. [40] These supporters include some statisticians and philosophers of science. [41] While Bayesians recognize the importance of likelihood for calculations, they contend that the posterior probability distribution serves as the appropriate basis for inference. [42]
Inferential statistics relies on statistical models. Classical hypothesis testing, for instance, has often relied on the assumption of data normality. To reduce reliance on this assumption, robust and nonparametric statistics have been developed. Bayesian statistics, on the other hand, interprets new observations in light of prior knowledge, assuming continuity between the past and present. Experimental design assumes some knowledge of the factors to be controlled, varied, randomized, and observed. Statisticians are aware of the challenges in establishing causation, often stating that "correlation does not imply causation," which is more of a limitation in modelling than a mathematical constraint.
As statistics and data sets have become more complex, questions have arisen regarding the validity of models and the inferences drawn from them. There is a wide range of conflicting opinions on modelling.
Models can be based on scientific theory or on ad hoc data analysis, each employing different methods, and advocates exist for each approach. [44] For example, principal components analysis is an empirical approach, while factor analysis and structural equation modeling tend to be theory-based. Model complexity is a trade-off, and less subjective criteria such as the Akaike information criterion and the Bayesian information criterion aim to strike a balance. [45]
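As a hedged illustration of such a trade-off, the sketch below computes AIC for a straight-line fit and a cubic fit to the same hypothetical data, using the Gaussian-error form of the criterion (AIC = n ln(RSS/n) + 2k, up to an additive constant).

```python
# A minimal sketch of model selection by AIC (hypothetical data; truth is linear).
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.2, size=x.size)

def aic_for_degree(deg):
    coeffs = np.polyfit(x, y, deg)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = deg + 1                                   # number of fitted parameters
    return x.size * np.log(rss / x.size) + 2 * k  # Gaussian-error AIC (up to a constant)

for deg in (1, 3):
    print(f"degree {deg}: AIC = {aic_for_degree(deg):.2f}")
# The lower AIC marks the preferred balance of fit against complexity.
```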
Concerns have been raised even about simple regression models used in the social sciences, as a multitude of assumptions underlying model validity are often neither mentioned nor verified. In some cases, a favorable comparison between observations and the model is considered sufficient. [46]
Bayesian statistics focuses so tightly on the posterior probability that it ignores the fundamental comparison of observations and model. [29]
Traditional observation-based models often fall short in addressing many significant problems, requiring the utilization of a broader range of models, including algorithmic ones. "If the model is a poor emulation of nature, the conclusions may be wrong." [47]
Modelling is frequently carried out inadequately, with improper methods employed, and the reporting of models is often subpar. [48]
Given the lack of a strong consensus on the philosophical review of statistical modeling, many statisticians adhere to the cautionary words of George Box: "All models are wrong, but some are useful."
For a concise introduction to the fundamentals of statistics, refer to Stuart, A.; Ord, J.K. (1994). "Ch. 8 – Probability and statistical inference" in Kendall's Advanced Theory of Statistics, Volume I: Distribution Theory (6th ed.), published by Edward Arnold.
In his book Statistics as Principled Argument, Robert P. Abelson presents the perspective that statistics serve as a standardized method for resolving disagreements among scientists, who could otherwise engage in endless debates about the merits of their respective positions. From this standpoint, statistics can be seen as a form of rhetoric. However, the effectiveness of statistical methods depends on the consensus among all involved parties regarding the chosen approach. [49]
... the purpose of statistics is to organize a useful argument from quantitative evidence, using a form of principled rhetoric.
Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief.
Frequentist probability or frequentism is an interpretation of probability; it defines an event's probability as the limit of its relative frequency in infinitely many trials. Probabilities can be found by a repeatable objective process. The continued use of frequentist methods in scientific inference, however, has been called into question.
In statistics, the likelihood principle is the proposition that, given a statistical model, all the evidence in a sample relevant to model parameters is contained in the likelihood function.
The word probability has been used in a variety of ways since it was first applied to the mathematical study of games of chance. Does probability measure the real, physical, tendency of something to occur, or is it a measure of how strongly one believes it will occur, or does it draw on both these elements? In answering such questions, mathematicians interpret the probability values of probability theory.
Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.
A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently supports a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. Then a decision is made, either by comparing the test statistic to a critical value or equivalently by evaluating a p-value computed from the test statistic. Roughly 100 specialized statistical tests have been defined.
Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Fundamentally, Bayesian inference uses prior knowledge, in the form of a prior distribution in order to estimate posterior probabilities. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".
In statistics, interval estimation is the use of sample data to estimate an interval of possible values of a parameter of interest. This is in contrast to point estimation, which gives a single value.
In scientific research, the null hypothesis is the claim that the effect being studied does not exist. The null hypothesis can also be described as the hypothesis in which no relationship exists between two sets of data or variables being analyzed. If the null hypothesis is true, any experimentally observed effect is due to chance alone, hence the term "null". In contrast with the null hypothesis, an alternative hypothesis is developed, which claims that a relationship does exist between two variables.
Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.
The Bayes factor is a ratio of two competing statistical models represented by their evidence, and is used to quantify the support for one model over the other. The models in question can have a common set of parameters, such as a null hypothesis and an alternative, but this is not necessary; for instance, it could also be a non-linear model compared to its linear approximation. The Bayes factor can be thought of as a Bayesian analog to the likelihood-ratio test, although it uses the integrated likelihood rather than the maximized likelihood. As such, both quantities only coincide under simple hypotheses. Also, in contrast with null hypothesis significance testing, Bayes factors support evaluation of evidence in favor of a null hypothesis, rather than only allowing the null to be rejected or not rejected.
In probability theory, inverse probability is an old term for the probability distribution of an unobserved variable.
Debabrata Basu was an Indian statistician who made fundamental contributions to the foundations of statistics. Basu invented simple examples that displayed some difficulties of likelihood-based statistics and frequentist statistics; Basu's paradoxes were especially important in the development of survey sampling. In statistical theory, Basu's theorem established the independence of a complete sufficient statistic and an ancillary statistic.
Fiducial inference is one of a number of different types of statistical inference. These are rules, intended for general application, by which conclusions can be drawn from samples of data. In modern statistical practice, attempts to work with fiducial inference have fallen out of fashion in favour of frequentist inference, Bayesian inference and decision theory. However, fiducial inference is important in the history of statistics since its development led to the parallel development of concepts and tools in theoretical statistics that are widely used. Some current research in statistical methodology is either explicitly linked to fiducial inference or is closely connected to it.
Statistics, in the modern sense of the word, began evolving in the 18th century in response to the novel needs of industrializing sovereign states.
Frequentist inference is a type of statistical inference based in frequentist probability, which treats “probability” in equivalent terms to “frequency” and draws conclusions from sample-data by means of emphasizing the frequency or proportion of findings in the data. Frequentist inference underlies frequentist statistics, in which the well-established methodologies of statistical hypothesis testing and confidence intervals are founded.
Statistical proof is the rational demonstration of degree of certainty for a proposition, hypothesis or theory that is used to convince others subsequent to a statistical test of the supporting evidence and the types of inferences that can be drawn from the test scores. Statistical methods are used to increase the understanding of the facts and the proof demonstrates the validity and logic of inference with explicit reference to a hypothesis, the experimental data, the facts, the test, and the odds. Proof has two essential aims: the first is to convince and the second is to explain the proposition through peer and public review.
The philosophy of statistics is the study of the mathematical, conceptual, and philosophical foundations and analyses of statistics and statistical inference. For example, Dennis Lindley argues for the more general analysis of statistics as the study of uncertainty. The subject involves the meaning, justification, utility, use and abuse of statistics and its methodology, and ethical and epistemological issues involved in the consideration of choice and interpretation of data and methods of statistics.
In marketing, Bayesian inference allows for decision making and market research evaluation under uncertainty and with limited data. The communication between marketer and market can be seen as a form of Bayesian persuasion.
Likelihoodist statistics or likelihoodism is an approach to statistics that exclusively or primarily uses the likelihood function. Likelihoodist statistics is a more minor school than the main approaches of Bayesian statistics and frequentist statistics, but has some adherents and applications. The central idea of likelihoodism is the likelihood principle: data are interpreted as evidence, and the strength of the evidence is measured by the likelihood function. Beyond this, there are significant differences within likelihood approaches: "orthodox" likelihoodists consider data only as evidence, and do not use it as the basis of statistical inference, while others make inferences based on likelihood, but without using Bayesian inference or frequentist inference. Likelihoodism is thus criticized for either not providing a basis for belief or action, or not satisfying the requirements of these other schools.