Philosophy of statistics

Last updated

The philosophy of statistics is the study of the mathematical, conceptual, and philosophical foundations and analyses of statistics and statistical inference. For example, Dennis Lindely argues for the more general analysis of statistics as the study of uncertainty. [1] The subject involves the meaning, justification, utility, use and abuse of statistics and its methodology, and ethical and epistemological issues involved in the consideration of choice and interpretation of data and methods of statistics. [2]

Contents

Topics of interest

Notes

  1. Lindley, Dennis (September 2000). "The philosophy of statistics". Royal Statistical Society. 49 (3): 293–337. doi:10.1111/1467-9884.00238 . Retrieved June 21, 2024.
  2. 1 2 Romijn, Jan-Willem (2014). "Philosophy of statistics". Stanford Encyclopedia of Philosophy.
  3. Hacking 2006.
  4. Breiman 2001.
  5. Porter 1995.

Further reading

Related Research Articles

Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief.

<span class="mw-page-title-main">Frequentist probability</span> Interpretation of probability

Frequentist probability or frequentism is an interpretation of probability; it defines an event's probability as the limit of its relative frequency in many trials . Probabilities can be found by a repeatable objective process. The continued use of frequentist methods in scientific inference, however, has been called into question.

The word probability has been used in a variety of ways since it was first applied to the mathematical study of games of chance. Does probability measure the real, physical, tendency of something to occur, or is it a measure of how strongly one believes it will occur, or does it draw on both these elements? In answering such questions, mathematicians interpret the probability values of probability theory.

<span class="mw-page-title-main">Statistical inference</span> Process of using data analysis

Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

The following outline is provided as an overview of and topical guide to statistics:

<span class="mw-page-title-main">Statistical hypothesis test</span> Method of statistical inference

A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently support a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. Then a decision is made, either by comparing the test statistic to a critical value or equivalently by evaluating a p-value computed from the test statistic. Roughly 100 specialized statistical tests have been defined.

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Fundamentally, Bayesian inference uses prior knowledge, in the form of a prior distribution in order to estimate posterior probabilities. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".

In statistics, interval estimation is the use of sample data to estimate an interval of possible values of a parameter of interest. This is in contrast to point estimation, which gives a single value.

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.

<span class="mw-page-title-main">Credible interval</span> Concept in Bayesian statistics

In Bayesian statistics, a credible interval is an interval used to characterize a probability distribution. It is defined such that an unobserved parameter value has a particular probability to fall within it. For example, in an experiment that determines the distribution of possible values of the parameter , if the probability that lies between 35 and 45 is 0.95, then is a 95% credible interval.

Model selection is the task of selecting a model from among various candidates on the basis of performance criterion to choose the best one. In the context of machine learning and more generally statistical analysis, this may be the selection of a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected is well-suited to the problem of model selection. Given candidate models of similar predictive or explanatory power, the simplest model is most likely to be the best choice.

Fiducial inference is one of a number of different types of statistical inference. These are rules, intended for general application, by which conclusions can be drawn from samples of data. In modern statistical practice, attempts to work with fiducial inference have fallen out of fashion in favour of frequentist inference, Bayesian inference and decision theory. However, fiducial inference is important in the history of statistics since its development led to the parallel development of concepts and tools in theoretical statistics that are widely used. Some current research in statistical methodology is either explicitly linked to fiducial inference or is closely connected to it.

Statistics, in the modern sense of the word, began evolving in the 18th century in response to the novel needs of industrializing sovereign states.

The foundations of statistics consist of the mathematical and philosophical basis for arguments and inferences made using statistics. This includes the justification for the methods of statistical inference, estimation and hypothesis testing, the quantification of uncertainty in the conclusions of statistical arguments, and the interpretation of those conclusions in probabilistic terms. A valid foundation can be used to explain statistical paradoxes such as Simpson's paradox, provide a precise description of observed statistical laws, and guide the application of statistical conclusions in social and scientific applications.

Frequentist inference is a type of statistical inference based in frequentist probability, which treats “probability” in equivalent terms to “frequency” and draws conclusions from sample-data by means of emphasizing the frequency or proportion of findings in the data. Frequentist inference underlies frequentist statistics, in which the well-established methodologies of statistical hypothesis testing and confidence intervals are founded.

In statistical inference, the concept of a confidence distribution (CD) has often been loosely referred to as a distribution function on the parameter space that can represent confidence intervals of all levels for a parameter of interest. Historically, it has typically been constructed by inverting the upper limits of lower sided confidence intervals of all levels, and it was also commonly associated with a fiducial interpretation, although it is a purely frequentist concept. A confidence distribution is NOT a probability distribution function of the parameter of interest, but may still be a function useful for making inferences.

<span class="mw-page-title-main">Bayesian inference in marketing</span>

In marketing, Bayesian inference allows for decision making and market research evaluation under uncertainty and with limited data. The communication between marketer and market can be seen as a form of Bayesian persuasion.

Likelihoodist statistics or likelihoodism is an approach to statistics that exclusively or primarily uses the likelihood function. Likelihoodist statistics is a more minor school than the main approaches of Bayesian statistics and frequentist statistics, but has some adherents and applications. The central idea of likelihoodism is the likelihood principle: data are interpreted as evidence, and the strength of the evidence is measured by the likelihood function. Beyond this, there are significant differences within likelihood approaches: "orthodox" likelihoodists consider data only as evidence, and do not use it as the basis of statistical inference, while others make inferences based on likelihood, but without using Bayesian inference or frequentist inference. Likelihoodism is thus criticized for either not providing a basis for belief or action, or not satisfying the requirements of these other schools.

In Bayesian statistics, the probability of direction (pd) is a measure of effect existence representing the certainty with which an effect is positive or negative. This index is numerically similar to the frequentist p-value.