Almost sure hypothesis testing

In statistics, almost sure hypothesis testing or a.s. hypothesis testing utilizes almost sure convergence in order to determine the validity of a statistical hypothesis with probability one. That is, whenever the null hypothesis is true, an a.s. hypothesis test will fail to reject the null hypothesis with probability one for all sufficiently large samples; whenever the alternative hypothesis is true, it will reject the null hypothesis with probability one for all sufficiently large samples. Along similar lines, an a.s. confidence interval eventually contains the parameter of interest with probability one. Dembo and Peres (1994) proved the existence of almost sure hypothesis tests.

Description

For simplicity, assume we have a sequence of independent and identically distributed normal random variables, $x_i \sim N(\mu, 1)$, with mean $\mu$ and unit variance. Suppose that nature or simulation has chosen the true mean to be $\mu_0$; then the probability distribution function of the mean, $\mu$, is given by

$$F(\mu) = [\mu \ge \mu_0],$$

where an Iverson bracket has been used: $[P]$ equals 1 when the proposition $P$ is true and 0 otherwise. A naïve approach to estimating this distribution function would be to replace the true mean on the right-hand side with an estimate such as the sample mean, $\bar{x}$, but

$$\operatorname{E}\big[[\mu_0 \ge \bar{x}]\big] = 0.5,$$

which means the approximation to the true distribution function will be off by 0.5 at the true mean. However, $[\mu \ge \bar{x}]$ is nothing more than a one-sided 50% confidence interval; more generally, let $z_\alpha$ be the critical value used in a one-sided $1 - \alpha$ confidence interval. Then

$$\operatorname{E}\big[[\mu_0 \ge \bar{x} - z_\alpha/\sqrt{n}]\big] = 1 - \alpha.$$

If we set $\alpha = 0.05$, then the error of the approximation at the true mean is reduced from 0.5 to 0.05, a factor of 10. Of course, if we let $\alpha \to 0$, then

$$\operatorname{E}\big[[\mu_0 \ge \bar{x} - z_\alpha/\sqrt{n}]\big] \to 1.$$

However, this only shows that the expectation is close to the limiting value. Naaman (2016) showed that setting the significance level at $\alpha_n = n^{-p}$, with $p > 1/2$, results in a finite number of type I and type II errors w.p. 1 under fairly mild regularity conditions. This means that for each $\mu$ there exists an $N$ such that for all $n > N$,

$$[\mu \ge \bar{x} - z_{\alpha_n}/\sqrt{n}] = [\mu \ge \mu_0],$$

where the equality holds w.p. 1. So the indicator function of a one-sided a.s. confidence interval is a good approximation to the true distribution function.
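
To make this concrete, the following minimal simulation sketch (not taken from Naaman 2016) draws i.i.d. $N(\mu_0, 1)$ observations, forms the one-sided a.s. confidence bound with $\alpha_n = n^{-p}$, and records the last sample size at which the indicator $[\mu \ge \bar{x} - z_{\alpha_n}/\sqrt{n}]$ disagrees with the true step function $[\mu \ge \mu_0]$ on a small grid of values of $\mu$. The particular choices $\mu_0 = 0.3$, $p = 0.6$, the grid, and the 20,000-observation horizon are illustrative assumptions.

```python
# Minimal simulation sketch (not from the source): approximate the step
# distribution function F(mu) = [mu >= mu0] by the indicator of a one-sided
# a.s. confidence interval with shrinking significance level alpha_n = n**(-p).
import random
from statistics import NormalDist

random.seed(1)
std_normal = NormalDist()
mu0 = 0.3                 # true mean chosen by "nature" (illustrative value)
p = 0.6                   # any p > 1/2 is allowed by the a.s. guarantee
mu_grid = [mu0 + d for d in (-0.5, -0.1, 0.0, 0.1, 0.5)]   # evaluation points
last_disagreement = {mu: 0 for mu in mu_grid}

running_sum = random.gauss(mu0, 1.0)    # n = 1 handled separately (alpha_1 = 1)
for n in range(2, 20_001):
    running_sum += random.gauss(mu0, 1.0)
    xbar = running_sum / n
    z_alpha = std_normal.inv_cdf(1.0 - n ** (-p))   # one-sided critical value
    lower = xbar - z_alpha / n ** 0.5               # a.s. lower confidence bound
    for mu in mu_grid:
        if (mu >= lower) != (mu >= mu0):            # indicator vs. true step function
            last_disagreement[mu] = n

for mu in mu_grid:
    print(f"mu = {mu:+.2f}: last n with a disagreement = {last_disagreement[mu]}")
```

In a typical run the disagreements die out at every grid point, illustrating that the indicator of the a.s. confidence interval eventually matches the true step function.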

Applications

Optional stopping

For example, suppose a researcher performed an experiment with a sample size of 10 and found no statistically significant result. Then suppose she decided to add one more observation and retest, continuing this process until a significant result was found. Under this scenario, given that the initial batch of 10 observations resulted in an insignificant result, the probability that the experiment will be stopped at some finite sample size, $N$, can be bounded using Boole's inequality:

$$P(N < \infty) \le \sum_{k=11}^{\infty} \alpha_k,$$

where $\alpha_k$ is the significance level used with a sample of size $k$. This compares favorably with fixed significance level testing, which has a finite stopping time with probability one; however, this bound will not be meaningful for all sequences of significance levels, as the above sum can be greater than one (setting $\alpha_k = k^{-6/5}$ would be one example). But even using that bandwidth, if the testing was done in batches of 10, then

$$P(N < \infty) \le \sum_{k=2}^{\infty} (10k)^{-6/5} \approx 0.29,$$

which results in a relatively large probability that the process will never end.
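
As a quick numerical check of the two bounds above (assuming the $\alpha_k = k^{-6/5}$ sequence), the sketch below approximates each infinite sum by a partial sum plus an integral bound for the truncated tail.

```python
# Numerical check of the Boole-inequality bounds above, using the assumed
# significance-level sequence alpha_k = k**(-6/5).

def tail_sum(start: int, p: float, cutoff: int = 100_000) -> float:
    """Approximate sum_{k >= start} k**(-p) by a partial sum plus the
    integral of x**(-p) over the truncated tail (error ~ cutoff**(-p))."""
    partial = sum(k ** (-p) for k in range(start, cutoff))
    tail = cutoff ** (1.0 - p) / (p - 1.0)   # integral_{cutoff}^{inf} x**(-p) dx
    return partial + tail

p = 6 / 5

# Adding one observation at a time and retesting from n = 11 onwards:
# the bound exceeds one, so it says nothing about the stopping probability.
print(f"one observation at a time: {tail_sum(11, p):.2f}")   # about 3.12

# Retesting in batches of 10 (n = 20, 30, ...), given the first batch failed:
# sum_{k>=2} (10k)**(-p) = 10**(-p) * sum_{k>=2} k**(-p), roughly 0.29.
print(f"batches of 10:             {10 ** (-p) * tail_sum(2, p):.2f}")
```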

Publication bias

As another example of the power of this approach, if an academic journal only accepts papers with p-values less than 0.05, then roughly 1 in 20 independent studies of the same effect would find a significant result when there was none. However, if the journal required a minimum sample size of 100 and a maximum significance level of $\alpha_n = n^{-6/5}$, then one would expect roughly 1 in 250 studies to find an effect when there was none (with a minimum sample size of 30, it would still be 1 in 60). If the maximum significance level were instead $\alpha_n = n^{-2}$ (which has better small-sample performance with regard to type I error when multiple comparisons are a concern), one would expect roughly 1 in 10,000 studies to find an effect when there was none (with a minimum sample size of 30, it would be 1 in 900). Additionally, a.s. hypothesis testing is robust to multiple comparisons.
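
The rates quoted above follow directly from evaluating the two assumed level rules at the minimum sample sizes; the short loop below reproduces the arithmetic.

```python
# Quick arithmetic check of the publication-bias rates quoted above under the
# two maximum-significance-level rules alpha_n = n**(-6/5) and alpha_n = n**(-2).
for p in (6 / 5, 2.0):
    for n in (30, 100):
        alpha = n ** (-p)
        print(f"alpha_n = n^(-{p:g}), n = {n:3d}: "
              f"alpha = {alpha:.2e}  (about 1 in {round(1 / alpha)})")
```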

Jeffreys–Lindley paradox

Lindley's paradox occurs when

  1. The result is "significant" by a frequentist test, at, for example, the 5% level, indicating sufficient evidence to reject the null hypothesis, and
  2. The posterior probability of the null hypothesis is high, indicating strong evidence that the null hypothesis is in better agreement with the data than is the alternative hypothesis.

However, the paradox does not apply to a.s. hypothesis tests. The Bayesian and the frequentist will eventually reach the same conclusion.
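
To see how this plays out numerically, the following hypothetical calculation (not taken from the source) uses made-up values: $n = 10^6$ unit-variance observations with an observed z-statistic of 2.5, a point null $H_0\colon \mu = 0$, a standard normal prior on $\mu$ under the alternative, and the shrinking level $\alpha_n = n^{-6/5}$. The fixed 5% test rejects, the Bayes factor strongly favors the null, and the a.s. test agrees with the Bayes factor.

```python
# Hypothetical numerical illustration (not from the source) of the
# Jeffreys–Lindley paradox and of why it disappears under a.s. testing.
# Assumed setup: unit-variance normal data, point null H0: mu = 0, a standard
# normal prior on mu under H1, and made-up values of n and the z-statistic.
from math import exp, sqrt
from statistics import NormalDist

std_normal = NormalDist()
n, z, tau2 = 10**6, 2.5, 1.0   # sample size, observed z-statistic, prior variance

# Classical two-sided test at a fixed 5% level: rejects H0.
p_value = 2 * (1 - std_normal.cdf(z))

# Bayes factor for H0 against the N(0, tau2) prior on mu: strongly favors H0.
bf01 = sqrt(1 + n * tau2) * exp(-0.5 * z**2 * n * tau2 / (n * tau2 + 1))

# a.s. test with the shrinking level alpha_n = n**(-6/5): does not reject H0.
alpha_n = n ** (-1.2)
z_crit = std_normal.inv_cdf(1 - alpha_n / 2)

print(f"two-sided p-value           = {p_value:.4f} (significant at the 5% level)")
print(f"Bayes factor in favor of H0 = {bf01:.1f} (strong support for H0)")
print(f"a.s. critical value at n = {n}: {z_crit:.2f} > {z}, so H0 is not rejected")
```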

See also

- Bayes factor
- Lindley's paradox

References

Dembo, Amir; Peres, Yuval (1994). "A topological criterion for hypothesis testing". The Annals of Statistics, vol. 22.

Naaman, Michael (2016). "Almost sure hypothesis testing and a resolution of the Jeffreys–Lindley paradox". Electronic Journal of Statistics, vol. 10.