Statistical proof

Statistical proof is the rational demonstration of the degree of certainty for a proposition, hypothesis, or theory that is used to convince others, subsequent to a statistical test of the supporting evidence and of the types of inferences that can be drawn from the test scores. Statistical methods are used to increase understanding of the facts, and the proof demonstrates the validity and logic of the inference with explicit reference to a hypothesis, the experimental data, the facts, the test, and the odds. Proof has two essential aims: the first is to convince, and the second is to explain the proposition through peer and public review. [1]

The burden of proof rests on the demonstrable application of the statistical method, the disclosure of the assumptions, and the relevance that the test has with respect to a genuine understanding of the data relative to the external world. There are adherents to several different statistical philosophies of inference, such as Bayes' theorem versus the likelihood function, or positivism versus critical rationalism. These methods of reasoning have direct bearing on statistical proof and its interpretations in the broader philosophy of science. [1] [2]

A common demarcation between science and non-science is the hypothetico-deductive proof of falsification developed by Karl Popper, which is a well-established practice in the tradition of statistics. Other modes of inference, however, may include the inductive and abductive modes of proof. [3] Scientists do not use statistical proof as a means to attain certainty, but to falsify claims and explain theory. Science cannot achieve absolute certainty, nor is it a continuous march toward an objective truth, as the vernacular (as opposed to the scientific) meaning of the term "proof" might imply. Statistical proof offers a kind of proof of a theory's falsity and the means to learn heuristically through repeated statistical trials and experimental error. [2] Statistical proof also has applications in legal matters, with implications for the legal burden of proof. [4]

Axioms

There are two kinds of axioms: 1) conventions that are taken as true and that should be avoided because they cannot be tested, and 2) hypotheses. [5] Proof in the theory of probability was built on four axioms developed in the late 17th century:

  1. The probability of a hypothesis is a non-negative real number: Pr(h) ≥ 0;
  2. The probability of necessary truth equals one: Pr(t) = 1;
  3. If two hypotheses h1 and h2 are mutually exclusive, then the sum of their probabilities is equal to the probability of their disjunction: Pr(h1 ∨ h2) = Pr(h1) + Pr(h2);
  4. The conditional probability of h1 given h2 is equal to the unconditional probability of the conjunction h1 and h2, divided by the unconditional probability of h2 where that probability is positive: Pr(h1 | h2) = Pr(h1 ∧ h2) / Pr(h2), where Pr(h2) > 0.
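
The four axioms can be checked numerically. The following is a minimal sketch, using a hypothetical fair six-sided die (not an example from the source), with each hypothesis expressed as a set of outcomes:

```python
# Hypothetical example: a fair six-sided die, with hypotheses expressed
# as sets of outcomes so the four axioms can be verified directly.
outcomes = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event (a set of outcomes) under the uniform die model."""
    return len(event & outcomes) / len(outcomes)

h1 = {2, 4, 6}        # "the roll is even"
h2 = {1}              # "the roll is a one" -- mutually exclusive with h1
h3 = {1, 2}           # "the roll is a one or a two", Pr(h3) > 0
necessary = outcomes  # the necessary truth: some face comes up

assert pr(h1) >= 0                                        # Axiom 1: non-negativity
assert pr(necessary) == 1                                 # Axiom 2: necessary truth
assert abs(pr(h1 | h2) - (pr(h1) + pr(h2))) < 1e-12       # Axiom 3: disjunction
assert abs(pr(h1 & h3) / pr(h3) - 0.5) < 1e-12            # Axiom 4: Pr(h1 | h3) = 1/2
```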

The preceding axioms provide the statistical proof and basis for the laws of randomness, or objective chance, from which modern statistical theory has advanced. Experimental data, however, can never prove that a hypothesis (h) is true; the approach instead relies on an inductive inference, measuring the probability of the hypothesis relative to the empirical data. The proof lies in the rational demonstration of using the logic of inference, math, testing, and deductive reasoning of significance. [1] [2] [6]

Test and proof

The term proof descended from its Latin root probare, meaning to test (compare provable, probable). [7] [8] Hence, proof is a form of inference by means of a statistical test. Statistical tests are formulated on models that generate probability distributions. Examples of probability distributions might include the binomial, normal, or Poisson distribution, which give exact descriptions of variables that behave according to natural laws of random chance. When a statistical test is applied to samples of a population, the test determines if the sample statistics are significantly different from the assumed null-model. True values of a population, which are unknowable in practice, are called parameters of the population. Researchers sample from populations to obtain estimates of the parameters, such as the mean or standard deviation. If the entire population is sampled, then the sample statistic mean and distribution will converge with the parametric distribution. [9]
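
The distinction between parameters and their sample estimates can be illustrated with a short simulation. This is a sketch under stated assumptions: the population, its size, and the distribution (Gaussian with mean 100 and standard deviation 15) are hypothetical choices, not values from the source:

```python
import random

random.seed(0)

# Hypothetical population whose true parameters are, in practice, unknowable.
population = [random.gauss(100, 15) for _ in range(100_000)]
parametric_mean = sum(population) / len(population)

def sample_mean(n):
    """Mean of a random sample of size n -- an estimate of the parameter."""
    s = random.sample(population, n)
    return sum(s) / len(s)

# Larger samples tend to give estimates closer to the parametric value;
# sampling the entire population recovers the parameter (up to float rounding).
estimates = {n: sample_mean(n) for n in (10, 1_000, 100_000)}
assert abs(estimates[100_000] - parametric_mean) < 1e-6
```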

Using the scientific method of falsification, the probability value at which the sample statistic is judged sufficiently different from the null-model to be explained by something other than chance alone is given prior to the test. Most statisticians set this prior probability value at 0.05 or 0.1, which means that if the sample statistics diverge from the parametric model more than 5 (or 10) times out of 100, then the discrepancy is unlikely to be explained by chance alone and the null-hypothesis is rejected. Statistical models provide exact outcomes for the parametric values and estimates for the sample statistics. Hence, the burden of proof rests in the sample statistics that provide estimates of a statistical model. Statistical models contain the mathematical proof of the parametric values and their probability distributions. [10] [11]
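
The procedure above can be sketched with an exact binomial test. The scenario (61 heads in 100 tosses of a coin assumed fair under the null-model) and the helper names are hypothetical illustrations, not taken from the source:

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n trials under the null-model."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(k, n, p=0.5):
    """Sum the probabilities of every outcome at least as unlikely as k."""
    observed = binom_pmf(k, n, p)
    return sum(binom_pmf(i, n, p) for i in range(n + 1)
               if binom_pmf(i, n, p) <= observed + 1e-15)

# Hypothetical data: 61 heads in 100 tosses; null-hypothesis: a fair coin.
p_val = two_sided_p_value(61, 100)

# Compare against the conventional 0.05 threshold set prior to the test.
reject_null = p_val < 0.05
```

The threshold is fixed before looking at the data; the test then reports how often a discrepancy at least this large would arise from chance alone under the null-model.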

Bayes theorem

Bayesian statistics are based on a different philosophical approach for proof of inference. The mathematical formula for Bayes's theorem is:

Pr [Parameter | Data] = Pr [Data | Parameter] × Pr [Parameter] / Pr [Data]

The formula is read as the probability of the parameter (or hypothesis =h, as used in the notation on axioms) "given" the data (or empirical observation), where the horizontal bar refers to "given". The right hand side of the formula combines the prior probability of a statistical model (Pr [Parameter]) with the likelihood (Pr [Data | Parameter]) to produce a posterior probability distribution of the parameter (Pr [Parameter | Data]). The posterior probability is the likelihood that the parameter is correct given the observed data or sample statistics. [12] Hypotheses can be compared using Bayesian inference by means of the Bayes factor, which is the ratio of the posterior odds to the prior odds. It provides a measure of whether the data have increased or decreased the likelihood of one hypothesis relative to another. [13]
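
The update and the Bayes factor can be sketched numerically. The two hypotheses, the equal prior odds, and the observed data (14 heads in 20 tosses) below are all hypothetical assumptions chosen for illustration:

```python
from math import comb

def likelihood(heads, tosses, p):
    """Pr [Data | Parameter]: binomial likelihood of the observed tosses."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

# Two hypothetical simple hypotheses about a coin's bias, with equal priors.
param = {"fair": 0.5, "biased": 0.7}
prior = {"fair": 0.5, "biased": 0.5}

heads, tosses = 14, 20  # hypothetical observed data

# Bayes' theorem: posterior is proportional to likelihood times prior,
# normalised by the evidence Pr [Data].
unnorm = {h: likelihood(heads, tosses, p) * prior[h] for h, p in param.items()}
evidence = sum(unnorm.values())
posterior = {h: v / evidence for h, v in unnorm.items()}

# Bayes factor: ratio of posterior odds to prior odds; with equal priors
# this reduces to the likelihood ratio of the two hypotheses.
bayes_factor = ((posterior["biased"] / posterior["fair"])
                / (prior["biased"] / prior["fair"]))
```

A Bayes factor above one indicates that the data have increased the likelihood of the "biased" hypothesis relative to the "fair" one.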

The statistical proof is the Bayesian demonstration that one hypothesis has a higher (weak, strong, positive) likelihood. [13] There is considerable debate over whether the Bayesian method aligns with Karl Popper's method of proof of falsification, where some have suggested that "...there is no such thing as "accepting" hypotheses at all. All that one does in science is assign degrees of belief..." [14] :180 According to Popper, hypotheses that have withstood testing and have yet to be falsified are not verified but corroborated. Some researchers have suggested that Popper's quest to define corroboration on the premise of probability puts his philosophy in line with the Bayesian approach. In this context, the likelihood of one hypothesis relative to another may be an index of corroboration, not confirmation, and thus statistically proven through rigorous objective standing. [6] [15]

In legal proceedings

"Where gross statistical disparities can be shown, they alone may in a proper case constitute prima facie proof of a pattern or practice of discrimination." [nb 1] :271

Statistical proof in a legal proceeding can be sorted into three categories of evidence:

  1. The occurrence of an event, act, or type of conduct;
  2. The identity of the individual(s) responsible; and
  3. The intent or psychological responsibility. [16]

Statistical proof was not regularly applied in decisions concerning United States legal proceedings until the mid-1970s, following a landmark jury discrimination case, Castaneda v. Partida. The US Supreme Court ruled that gross statistical disparities constitute "prima facie proof" of discrimination, resulting in a shift of the burden of proof from plaintiff to defendant. Since that ruling, statistical proof has been used in many other cases on inequality, discrimination, and DNA evidence. [4] [17] [18] However, there is not a one-to-one correspondence between statistical proof and the legal burden of proof. "The Supreme Court has stated that the degrees of rigor required in the fact finding processes of law and science do not necessarily correspond." [18] :1533

In an example of a death row sentence (McCleskey v. Kemp [nb 2] ) concerning racial discrimination, the petitioner, a black man named McCleskey, was charged with the murder of a white police officer during a robbery. Expert testimony for McCleskey introduced a statistical proof showing that "defendants charged with killing white victims were 4.3 times as likely to receive a death sentence as defendants charged with killing blacks". [19] :595 Nonetheless, the statistics were insufficient "to prove that the decisionmakers in his case acted with discriminatory purpose". [19] :596 It was further argued that there were "inherent limitations of the statistical proof", [19] :596 because it did not refer to the specifics of the individual. Despite the statistical demonstration of an increased probability of discrimination, the legal burden of proof (it was argued) had to be examined on a case-by-case basis. [19]

References

  1. Gold, B.; Simons, R. A. (2008). Proof and Other Dilemmas: Mathematics and Philosophy. Mathematical Association of America. ISBN 978-0-88385-567-6.
  2. Gattei, S. (2008). Thomas Kuhn's "Linguistic Turn" and the Legacy of Logical Empiricism: Incommensurability, Rationality and the Search for Truth. Ashgate Pub Co. p. 277. ISBN 978-0-7546-6160-3.
  3. Pedemonte, B. (2007). "How can the relationship between argumentation and proof be analysed?". Educational Studies in Mathematics. 66 (1): 23–41. doi:10.1007/s10649-006-9057-x. S2CID 121547580.
  4. Meier, P. (1986). "Damned Liars and Expert Witnesses". Journal of the American Statistical Association. 81 (394): 269–276. doi:10.1080/01621459.1986.10478270.
  5. Wiley, E. O. (1975). "Karl R. Popper, Systematics, and Classification: A Reply to Walter Bock and Other Evolutionary Taxonomists". Systematic Zoology. 24 (2): 233–43. doi:10.2307/2412764. ISSN 0039-7989. JSTOR 2412764.
  6. Howson, Colin; Urbach, Peter (1991). "Bayesian reasoning in science". Nature. 350 (6317): 371–4. Bibcode:1991Natur.350..371H. doi:10.1038/350371a0. ISSN 1476-4687. S2CID 5419177.
  7. Sundholm, G. (1994). "Proof-Theoretical Semantics and Fregean Identity Criteria for Propositions". The Monist. 77 (3): 294–314. doi:10.5840/monist199477315. hdl:1887/11990.
  8. Bissell, D. (1996). "Statisticians have a Word for it". Teaching Statistics. 18 (3): 87–89. CiteSeerX 10.1.1.385.5823. doi:10.1111/j.1467-9639.1996.tb00300.x.
  9. Sokal, R. R.; Rohlf, F. J. (1995). Biometry (3rd ed.). W.H. Freeman & Company. p. 887. ISBN 978-0-7167-2411-7.
  10. Heath, David (1995). An Introduction to Experimental Design and Statistics for Biology. CRC Press. ISBN 978-1-85728-132-3.
  11. Hald, Anders (2006). A History of Parametric Statistical Inference from Bernoulli to Fisher, 1713–1935. Springer. p. 260. ISBN 978-0-387-46408-4.
  12. Huelsenbeck, J. P.; Ronquist, F.; Bollback, J. P. (2001). "Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology". Science. 294 (5550): 2310–2314. Bibcode:2001Sci...294.2310H. doi:10.1126/science.1065889. PMID 11743192. S2CID 2138288.
  13. Wade, P. R. (2000). "Bayesian methods in conservation biology". Conservation Biology. 14 (5): 1308–1316. doi:10.1046/j.1523-1739.2000.99415.x. S2CID 55853118.
  14. Sober, E. (1991). Reconstructing the Past: Parsimony, Evolution, and Inference. A Bradford Book. p. 284. ISBN 978-0-262-69144-4.
  15. Helfenbein, K. G.; DeSalle, R. (2005). "Falsifications and corroborations: Karl Popper's influence on systematics". Molecular Phylogenetics and Evolution. 35 (1): 271–280. doi:10.1016/j.ympev.2005.01.003. PMID 15737596.
  16. Fienberg, S. E.; Kadane, J. B. (1983). "The presentation of Bayesian statistical analyses in legal proceedings". Journal of the Royal Statistical Society, Series D. 32 (1/2): 88–98. doi:10.2307/2987595. JSTOR 2987595.
  17. Garaud, M. C. (1990). "Legal Standards and Statistical Proof in Title VII Litigation: In Search of a Coherent Disparate Impact Model". University of Pennsylvania Law Review. 139 (2): 455–503. doi:10.2307/3312286. JSTOR 3312286.
  18. The Harvard Law Review Association (1995). "Developments in the Law: Confronting the New Challenges of Scientific Evidence". Harvard Law Review. 108 (7): 1481–1605. doi:10.2307/1341808. JSTOR 1341808.
  19. Faigman, D. L. (1991). "'Normative Constitutional Fact-Finding': Exploring the Empirical Component of Constitutional Interpretation". University of Pennsylvania Law Review. 139 (3): 541–613. doi:10.2307/3312337. JSTOR 3312337.

Notes