Binomial proportion confidence interval

In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments (Bernoulli trials). In other words, a binomial proportion confidence interval is an interval estimate of a success probability $p$ when only the number of experiments $n$ and the number of successes $n_S$ are known.

There are several formulas for a binomial confidence interval, but all of them rely on the assumption of a binomial distribution. In general, a binomial distribution applies when an experiment is repeated a fixed number of times, each trial of the experiment has two possible outcomes (success and failure), the probability of success is the same for each trial, and the trials are statistically independent. Because the binomial distribution is a discrete probability distribution (i.e., not continuous) and difficult to calculate for large numbers of trials, a variety of approximations are used to calculate this confidence interval, all with their own tradeoffs in accuracy and computational intensity.

A simple example of a binomial distribution is the set of various possible outcomes, and their probabilities, for the number of heads observed when a coin is flipped ten times. The observed binomial proportion is the fraction of the flips that turn out to be heads. Given this observed proportion, the confidence interval for the true probability of the coin landing on heads is a range of possible proportions, which may or may not contain the true proportion. A 95% confidence interval for the proportion, for instance, will contain the true proportion 95% of the times that the procedure for constructing the confidence interval is employed. [1]

Normal approximation interval

Figure: Plotting the normal approximation interval on a logistic curve reveals problems of overshoot and zero-width intervals.

A commonly used formula for a binomial confidence interval relies on approximating the distribution of error about a binomially distributed observation, $\hat p$, with a normal distribution. [3] This approximation is based on the central limit theorem and is unreliable when the sample size is small or the success probability is close to 0 or 1. [4]

Using the normal approximation, the success probability $p$ is estimated as

$$\hat p \pm z \sqrt{\frac{\hat p\,(1 - \hat p)}{n}},$$

or the equivalent

$$\frac{n_S}{n} \pm \frac{z}{n}\sqrt{\frac{n_S\, n_F}{n}},$$

where $\hat p = n_S / n$ is the proportion of successes in a Bernoulli trial process, measured with $n$ trials yielding $n_S$ successes and $n_F = n - n_S$ failures, and $z$ is the $1 - \tfrac{\alpha}{2}$ quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate $\alpha$. For a 95% confidence level, the error $\alpha = 0.05$, so $1 - \tfrac{\alpha}{2} = 0.975$ and $z = 1.96$.
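As a concrete illustration, here is a minimal sketch of this interval in Python, assuming SciPy is available; `wald_interval` is an illustrative name, not a library API:

```python
# Minimal sketch of the normal-approximation (Wald) interval; assumes SciPy.
from scipy.stats import norm

def wald_interval(n_s: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p_hat = n_s / n
    z = norm.ppf(1 - (1 - conf) / 2)  # e.g. 1.96 for a 95% interval
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

print(wald_interval(7, 10))  # roughly (0.416, 0.984)
```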

An important theoretical derivation of this confidence interval involves the inversion of a hypothesis test. Under this formulation, the confidence interval represents those values of the population parameter that would have large p-values if they were tested as a hypothesized population proportion. The collection of values, $\theta$, for which the normal approximation is valid can be represented as

$$\left\{ \theta \;\middle|\; -z \le \frac{\hat p - \theta}{\sqrt{\tfrac{1}{n}\hat p\,(1 - \hat p)}} \le z \right\},$$

where $z$ is the $1 - \tfrac{\alpha}{2}$ quantile of a standard normal distribution. Since the test in the middle of the inequality is a Wald test, the normal approximation interval is sometimes called the Wald interval, but it was first described by Pierre-Simon Laplace in 1812. [5]

Standard error of a proportion estimation when using weighted data

Let there be a simple random sample $X_1, \ldots, X_n$ where each $X_i$ is i.i.d. from a Bernoulli($p$) distribution and weight $w_i$ is the weight for each observation. Standardize the (positive) weights so they sum to 1. The weighted sample proportion is $\hat p = \sum_{i=1}^n w_i X_i$. Since the $X_i$ are independent and each one has variance $\operatorname{Var}(X_i) = p\,(1 - p)$, the sampling variance of the proportion therefore is: [6]

$$\operatorname{Var}(\hat p) = \sum_{i=1}^n \operatorname{Var}(w_i X_i) = p\,(1 - p)\sum_{i=1}^n w_i^2.$$

The standard error of $\hat p$ is the square root of this quantity. Because we do not know $p\,(1 - p)$, we have to estimate it. Although there are many possible estimators, a conventional one is to use $\hat p$, the sample mean, and plug this into the formula. That gives:

$$\operatorname{SE}(\hat p) = \sqrt{\hat p\,(1 - \hat p)\sum_{i=1}^n w_i^2}$$

For unweighted data, $w_i = 1/n$, giving $\sum_{i=1}^n w_i^2 = 1/n$. The SE becomes $\sqrt{\hat p\,(1 - \hat p)/n}$, leading to the familiar formulas, showing that the calculation for weighted data is a direct generalization of them.
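A short sketch of this computation, assuming NumPy; the function name and example data are illustrative:

```python
# Weighted sample proportion and its estimated standard error; assumes NumPy.
import numpy as np

def weighted_proportion_se(x: np.ndarray, w: np.ndarray) -> tuple[float, float]:
    """Return the weighted proportion and its plug-in standard error."""
    w = w / w.sum()                      # standardize weights to sum to 1
    p_hat = np.sum(w * x)                # weighted sample proportion
    se = np.sqrt(p_hat * (1 - p_hat) * np.sum(w ** 2))
    return p_hat, se

x = np.array([1, 0, 1, 1, 0])            # Bernoulli observations
w = np.array([2.0, 1.0, 1.0, 0.5, 0.5])  # survey weights (illustrative)
print(weighted_proportion_se(x, w))
```

With equal weights, `np.sum(w ** 2)` reduces to $1/n$ and the function reproduces the familiar unweighted standard error.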

Wilson score interval

Figure: Wilson score intervals plotted on a logistic curve, revealing asymmetry and good performance for small n and where p is at or near 0 or 1.

The Wilson score interval is an improvement over the normal approximation interval in multiple respects. It was developed by Edwin Bidwell Wilson (1927). [7] Unlike the symmetric normal approximation interval (above), the Wilson score interval is asymmetric. It does not suffer from problems of overshoot and zero-width intervals that afflict the normal interval, and it may be safely employed with small samples and skewed observations. [3] The observed coverage probability is consistently closer to the nominal value, $1 - \alpha$. [2]

Like the normal interval, but unlike the Clopper–Pearson interval, the interval can be computed directly from a formula.

Wilson started with the normal approximation to the binomial:

$$z \approx \frac{p - \hat p}{\sigma_n},$$

with the analytic formula for the sample standard deviation given by

$$\sigma_n = \sqrt{\frac{p\,(1 - p)}{n}}.$$

Combining the two, and squaring out the radical, gives an equation that is quadratic in $p$:

$$(\hat p - p)^2 = z^2 \cdot \frac{p\,(1 - p)}{n}.$$

Transforming the relation into a standard-form quadratic equation for $p$, treating $\hat p$ and $n$ as known values from the sample (see prior section), and using the value of $z$ that corresponds to the desired confidence for the estimate of $p$ gives this:

$$\left(1 + \frac{z^2}{n}\right)p^2 + \left(-2\hat p - \frac{z^2}{n}\right)p + \hat p^2 = 0,$$

where all of the values in parentheses are known quantities. The solution for $p$ estimates the upper and lower limits of the confidence interval for $p$. Hence the probability of success $p$ is estimated by

$$\frac{1}{1 + \frac{z^2}{n}}\left(\hat p + \frac{z^2}{2n}\right) \pm \frac{z}{1 + \frac{z^2}{n}}\sqrt{\frac{\hat p\,(1 - \hat p)}{n} + \frac{z^2}{4n^2}}$$

or the equivalent

$$\frac{1}{n + z^2}\left(n_S + \frac{z^2}{2}\right) \pm \frac{z}{n + z^2}\sqrt{\frac{n_S\, n_F}{n} + \frac{z^2}{4}}.$$
The practical observation from using this interval is that it has good properties even for a small number of trials and / or an extreme probability.
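A minimal sketch of this interval in Python, again assuming SciPy; `wilson_interval` is an illustrative name:

```python
# Wilson score interval, following the closed form above; assumes SciPy.
from scipy.stats import norm

def wilson_interval(n_s: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p_hat = n_s / n
    z = norm.ppf(1 - (1 - conf) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * (p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) ** 0.5
    return center - half_width, center + half_width

print(wilson_interval(7, 10))  # roughly (0.397, 0.892); compare the Wald interval
```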

Intuitively, the center value of this interval is the weighted average of $\hat p$ and $\tfrac{1}{2}$, with $\hat p$ receiving greater weight as the sample size increases. Formally, the center value corresponds to using a pseudocount of $\tfrac{1}{2}z^2$, half the square of the number of standard deviations of the confidence interval: add this number to both the count of successes and of failures to yield the estimate of the ratio. For the common two standard deviations in each direction interval (approximately 95% coverage, which itself is approximately 1.96 standard deviations), this yields the estimate $(n_S + 2)/(n + 4)$, which is known as the "plus four rule".

Although the quadratic can be solved explicitly, in most cases Wilson's equations can also be solved numerically using the fixed-point iteration

$$p_{k+1} = \hat p \pm z\sqrt{\frac{p_k\,(1 - p_k)}{n}}$$

with $p_0 = \hat p$.

The Wilson interval can also be derived from the single-sample z-test or Pearson's chi-squared test with two categories. The resulting interval,

$$\left\{ \theta \;\middle|\; -z \le \frac{\hat p - \theta}{\sqrt{\tfrac{1}{n}\theta\,(1 - \theta)}} \le z \right\},$$

can then be solved for $\theta$ to produce the Wilson score interval. The test in the middle of the inequality is a score test.

The interval equality principle

Figure: The probability density function for the Wilson score interval, plus pdfs at interval bounds. Tail areas are equal.

Since the interval is derived by inverting the normal approximation to the binomial, the Wilson score interval has the property of being guaranteed to obtain the same result as the equivalent z-test or chi-squared test.

This property can be visualised by plotting the probability density function for the Wilson score interval (see Wallis 2021: 297-313) [8] and then plotting a normal pdf at each bound. The tail areas of the resulting Wilson and normal distributions, representing the chance of a significant result in that direction, must be equal.

The continuity-corrected Wilson score interval and the Clopper–Pearson interval are also compliant with this property. The practical import is that these intervals may be employed as significance tests, with identical results to the source test, and new tests may be derived by geometry. [8]

Wilson score interval with continuity correction

The Wilson interval may be modified by employing a continuity correction, in order to align the minimum coverage probability, rather than the average coverage probability, with the nominal value, $1 - \alpha$.

Just as the Wilson interval mirrors Pearson's chi-squared test, the Wilson interval with continuity correction mirrors the equivalent Yates' chi-squared test.

The following formulae for the lower and upper bounds of the Wilson score interval with continuity correction, $(w^-, w^+)$, are derived from Newcombe (1998): [2]

$$w^- = \max\left\{0,\; \frac{2n\hat p + z^2 - \left[z\sqrt{z^2 - \frac{1}{n} + 4n\hat p\,(1 - \hat p) + (4\hat p - 2)} + 1\right]}{2(n + z^2)}\right\}$$

$$w^+ = \min\left\{1,\; \frac{2n\hat p + z^2 + \left[z\sqrt{z^2 - \frac{1}{n} + 4n\hat p\,(1 - \hat p) - (4\hat p - 2)} + 1\right]}{2(n + z^2)}\right\}$$

However, if $\hat p = 0$, $w^-$ must be taken as 0; if $\hat p = 1$, $w^+$ is then 1.

Wallis (2021) [8] identifies a simpler method for computing continuity-corrected Wilson intervals that employs the ordinary (uncorrected) Wilson score interval functions. For the lower bound, let $\hat p' = \max(\hat p - \tfrac{1}{2n},\, 0)$, where $\alpha$ is the selected error level for the underlying significance test. Then the continuity-corrected lower bound is the ordinary Wilson lower bound evaluated at $\hat p'$; the upper bound is obtained analogously at $\min(\hat p + \tfrac{1}{2n},\, 1)$. This method has the advantage of being further decomposable.
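For concreteness, a sketch of the Newcombe bounds above in Python, assuming SciPy; `wilson_cc_interval` is an illustrative name, and the boundary handling follows the note above:

```python
# Continuity-corrected Wilson interval per the Newcombe closed form; assumes SciPy.
from scipy.stats import norm

def wilson_cc_interval(n_s: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    p = n_s / n
    z = norm.ppf(1 - (1 - conf) / 2)
    denom = 2 * (n + z**2)
    core = 2 * n * p + z**2
    lo = (core - (z * (z**2 - 1/n + 4*n*p*(1 - p) + (4*p - 2))**0.5 + 1)) / denom
    hi = (core + (z * (z**2 - 1/n + 4*n*p*(1 - p) - (4*p - 2))**0.5 + 1)) / denom
    # Clamp to [0, 1] and handle the boundary cases p = 0 and p = 1 noted above.
    lower = 0.0 if n_s == 0 else max(0.0, lo)
    upper = 1.0 if n_s == n else min(1.0, hi)
    return lower, upper

print(wilson_cc_interval(7, 10))  # slightly wider than the uncorrected interval
```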

Jeffreys interval

The Jeffreys interval has a Bayesian derivation, but it has good frequentist properties. In particular, it has coverage properties that are similar to those of the Wilson interval, but it is one of the few intervals with the advantage of being equal-tailed (e.g., for a 95% confidence interval, the probabilities of the interval lying above or below the true value are both close to 2.5%). In contrast, the Wilson interval has a systematic bias such that it is centred too close to p = 0.5. [9]

The Jeffreys interval is the Bayesian credible interval obtained when using the non-informative Jeffreys prior for the binomial proportion $p$. The Jeffreys prior for this problem is a Beta distribution with parameters (1/2, 1/2), which is a conjugate prior. After observing $x$ successes in $n$ trials, the posterior distribution for $p$ is a Beta distribution with parameters $(x + 1/2,\; n - x + 1/2)$.

When $x \neq 0$ and $x \neq n$, the Jeffreys interval is taken to be the $100(1 - \alpha)\%$ equal-tailed posterior probability interval, i.e., the $\alpha/2$ and $1 - \alpha/2$ quantiles of a Beta distribution with parameters $(x + 1/2,\; n - x + 1/2)$. These quantiles need to be computed numerically, although this is reasonably simple with modern statistical software.

In order to avoid the coverage probability tending to zero when $p \to 0$ or $p \to 1$, when $x = 0$ the upper limit is calculated as before but the lower limit is set to 0, and when $x = n$ the lower limit is calculated as before but the upper limit is set to 1. [4]
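A sketch of the Jeffreys interval using SciPy's beta quantile function, with the boundary conventions just described; `jeffreys_interval` is an illustrative name:

```python
# Jeffreys interval from the Beta(x + 1/2, n - x + 1/2) posterior; assumes SciPy.
from scipy.stats import beta

def jeffreys_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    a = 1 - conf
    lower = 0.0 if x == 0 else beta.ppf(a / 2, x + 0.5, n - x + 0.5)
    upper = 1.0 if x == n else beta.ppf(1 - a / 2, x + 0.5, n - x + 0.5)
    return lower, upper

print(jeffreys_interval(7, 10))
```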

Clopper–Pearson interval

The Clopper–Pearson interval is an early and very common method for calculating binomial confidence intervals. [10] This is often called an 'exact' method, because it is based on the cumulative probabilities of the binomial distribution (i.e., exactly the correct distribution rather than an approximation). However, in cases where we know the population size, the intervals may not be the smallest possible. For instance, for a population of size 20 with true proportion of 50%, Clopper–Pearson gives [0.272, 0.728], which has width 0.456 (and where bounds are 0.0280 away from the "next achievable values" of 6/20 and 14/20); whereas Wilson's gives [0.299, 0.701], which has width 0.401 (and is 0.0007 away from the next achievable values).

The Clopper–Pearson interval can be written as

$$S_{\le} \cap S_{\ge}$$

or equivalently,

$$\left(\inf S_{\ge},\; \sup S_{\le}\right)$$

with

$$S_{\le} := \left\{ \theta \,\middle|\, P\!\left[\operatorname{Bin}(n; \theta) \le x\right] > \frac{\alpha}{2} \right\} \quad \text{and} \quad S_{\ge} := \left\{ \theta \,\middle|\, P\!\left[\operatorname{Bin}(n; \theta) \ge x\right] > \frac{\alpha}{2} \right\},$$

where $0 \le x \le n$ is the number of successes observed in the sample and $\operatorname{Bin}(n; \theta)$ is a binomial random variable with $n$ trials and probability of success $\theta$.

Equivalently we can say that the Clopper–Pearson interval is $\left(\frac{x}{n} - \varepsilon_1,\; \frac{x}{n} + \varepsilon_2\right)$ with confidence level $1 - \alpha$ if $\varepsilon_i$ is the infimum of those values such that the following tests of hypothesis succeed with significance $\frac{\alpha}{2}$:

  1. H0: $\theta = \frac{x}{n} - \varepsilon_1$ with HA: $\theta > \frac{x}{n} - \varepsilon_1$
  2. H0: $\theta = \frac{x}{n} + \varepsilon_2$ with HA: $\theta < \frac{x}{n} + \varepsilon_2$.

Because of a relationship between the binomial distribution and the beta distribution, the Clopper–Pearson interval is sometimes presented in an alternate format that uses quantiles from the beta distribution:

$$B\!\left(\frac{\alpha}{2};\, x,\, n - x + 1\right) < \theta < B\!\left(1 - \frac{\alpha}{2};\, x + 1,\, n - x\right),$$

where $x$ is the number of successes, $n$ is the number of trials, and $B(p;\, v, w)$ is the $p$th quantile from a beta distribution with shape parameters $v$ and $w$.

Thus, $\theta_{lo} < \theta < \theta_{hi}$, where:

$$\theta_{lo} = B\!\left(\frac{\alpha}{2};\, x,\, n - x + 1\right), \qquad \theta_{hi} = B\!\left(1 - \frac{\alpha}{2};\, x + 1,\, n - x\right).$$

The binomial proportion confidence interval is then $(\theta_{lo},\, \theta_{hi})$, as follows from the relation between the binomial distribution cumulative distribution function and the regularized incomplete beta function.

When $x$ is either $0$ or $n$, closed-form expressions for the interval bounds are available: when $x = 0$ the interval is $\left(0,\; 1 - \left(\frac{\alpha}{2}\right)^{1/n}\right)$ and when $x = n$ it is $\left(\left(\frac{\alpha}{2}\right)^{1/n},\; 1\right)$. [11]
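A sketch of the Clopper–Pearson interval via the beta-quantile formulation above, assuming SciPy; `clopper_pearson_interval` is an illustrative name:

```python
# Clopper-Pearson interval via beta quantiles; assumes SciPy.
from scipy.stats import beta

def clopper_pearson_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    a = 1 - conf
    lower = 0.0 if x == 0 else beta.ppf(a / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta.ppf(1 - a / 2, x + 1, n - x)
    return lower, upper

print(clopper_pearson_interval(7, 10))  # roughly (0.348, 0.933)
```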

The beta distribution is, in turn, related to the F-distribution, so a third formulation of the Clopper–Pearson interval can be written using F quantiles:

$$\left(1 + \frac{n - x + 1}{x\, F\!\left[\frac{\alpha}{2};\, 2x,\, 2(n - x + 1)\right]}\right)^{-1} < \theta < \left(1 + \frac{n - x}{(x + 1)\, F\!\left[1 - \frac{\alpha}{2};\, 2(x + 1),\, 2(n - x)\right]}\right)^{-1},$$

where $x$ is the number of successes, $n$ is the number of trials, and $F(c;\, d_1, d_2)$ is the $c$ quantile from an F-distribution with $d_1$ and $d_2$ degrees of freedom. [12]

The Clopper–Pearson interval is an exact interval since it is based directly on the binomial distribution rather than any approximation to the binomial distribution. This interval never has less than the nominal coverage for any population proportion, but that means that it is usually conservative. For example, the true coverage rate of a 95% Clopper–Pearson interval may be well above 95%, depending on $n$ and $\theta$. [4] Thus the interval may be wider than it needs to be to achieve 95% confidence. In contrast, other confidence bounds may cover less than their nominal confidence level: the normal approximation (or "standard") interval, Wilson interval, [7] Agresti–Coull interval, [12] etc., with a nominal coverage of 95% may in fact cover less than 95%. [4]

The definition of the Clopper–Pearson interval can also be modified to obtain exact confidence intervals for different distributions. For instance, it can also be applied to the case where the samples are drawn without replacement from a population of a known size, instead of repeated draws of a binomial distribution. In this case, the underlying distribution would be the hypergeometric distribution.

Agresti–Coull interval

The Agresti–Coull interval is another approximate binomial confidence interval. [12]

Given $x$ successes in $n$ trials, define

$$\tilde n = n + z^2$$

and

$$\tilde p = \frac{1}{\tilde n}\left(x + \frac{z^2}{2}\right).$$

Then, a confidence interval for $p$ is given by

$$\tilde p \pm z\sqrt{\frac{\tilde p\,(1 - \tilde p)}{\tilde n}},$$

where $z$ is the $1 - \tfrac{\alpha}{2}$ quantile of a standard normal distribution, as before (for example, a 95% confidence interval requires $\alpha = 0.05$, thereby producing $z = 1.96$). According to Brown, Cai, and DasGupta, [4] taking $z = 2$ instead of 1.96 produces the "add 2 successes and 2 failures" interval previously described by Agresti and Coull. [12]

This interval can be summarised as employing the centre-point adjustment, $\tilde p$, of the Wilson score interval, and then applying the normal approximation to this point. [3] [4]
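A sketch of the Agresti–Coull computation as defined above, assuming SciPy; `agresti_coull_interval` is an illustrative name:

```python
# Agresti-Coull interval: Wilson centre-point plus a Wald-style width; assumes SciPy.
from scipy.stats import norm

def agresti_coull_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    z = norm.ppf(1 - (1 - conf) / 2)
    n_tilde = n + z**2
    p_tilde = (x + z**2 / 2) / n_tilde
    half_width = z * (p_tilde * (1 - p_tilde) / n_tilde) ** 0.5
    return p_tilde - half_width, p_tilde + half_width

print(agresti_coull_interval(7, 10))
```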

Arcsine transformation

The arcsine transformation has the effect of pulling out the ends of the distribution. [13] While it can stabilize the variance (and thus confidence intervals) of proportion data, its use has been criticized in several contexts. [14]

Let $X$ be the number of successes in $n$ trials and let $p = X/n$. The variance of $p$ is

$$\operatorname{var}(p) = \frac{p\,(1 - p)}{n}.$$

Using the arcsine transform, the variance of the arcsine of $p^{1/2}$ is [15]

$$\operatorname{var}\!\left(\arcsin\sqrt{p}\right) \approx \frac{1}{4n}.$$

So, the confidence interval itself has the following form:

$$\sin^2\!\left(\arcsin\sqrt{p} - \frac{z}{2\sqrt{n}}\right) < \theta < \sin^2\!\left(\arcsin\sqrt{p} + \frac{z}{2\sqrt{n}}\right),$$

where $z$ is the $1 - \tfrac{\alpha}{2}$ quantile of a standard normal distribution.

This method may be used to estimate the variance of p but its use is problematic when p is close to 0 or 1.
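A sketch of the arcsine interval above using only the standard library; `arcsine_interval` is an illustrative name:

```python
# Arcsine (variance-stabilizing) interval, following the closed form above.
import math

def arcsine_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    t = math.asin(math.sqrt(x / n))       # transformed proportion
    half_width = z / (2 * math.sqrt(n))   # constant width on the transformed scale
    return math.sin(t - half_width) ** 2, math.sin(t + half_width) ** 2

print(arcsine_interval(7, 10))
```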

t_a transform

Let $p$ be the proportion of successes. For $0 \le a \le 2$,

$$t_a = \log\!\left(\frac{p^a}{(1 - p)^{2-a}}\right) = a \log p - (2 - a)\log(1 - p).$$

This family is a generalisation of the logit transform, which is a special case with $a = 1$, and can be used to transform a proportional data distribution to an approximately normal distribution. The parameter $a$ has to be estimated for the data set.

Rule of three - for when no successes are observed

The rule of three is used to provide a simple way of stating an approximate 95% confidence interval for $p$, in the special case that no successes ($x = 0$) have been observed. [16] The interval is $(0,\; 3/n)$. The rule follows from the exact upper bound: solving $(1 - p)^n = 0.05$ gives $p = 1 - 0.05^{1/n} \approx 3/n$, since $-\ln 0.05 \approx 3$.

By symmetry, in the case of only successes ($x = n$), the interval is $(1 - 3/n,\; 1)$.
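A tiny sketch covering both boundary cases; `rule_of_three` is an illustrative name:

```python
# Rule of three: approximate 95% interval when all trials share one outcome.
def rule_of_three(x: int, n: int) -> tuple[float, float]:
    if x == 0:
        return 0.0, 3 / n
    if x == n:
        return 1 - 3 / n, 1.0
    raise ValueError("rule of three applies only when x == 0 or x == n")

print(rule_of_three(0, 50))  # (0.0, 0.06)
```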

Comparison of different intervals

There are several research papers that compare these and other confidence intervals for the binomial proportion. [3] [2] [17] [18] Both Agresti and Coull (1998) [12] and Ross (2003) [19] point out that exact methods such as the Clopper–Pearson interval may not work as well as certain approximations. The normal approximation interval and its presentation in textbooks has been heavily criticised, with many statisticians advocating that it be not used. [4] The principal problems are overshoot (bounds exceed $[0, 1]$), zero-width intervals at $\hat p = 0$ or $1$ (falsely implying certainty), [2] and overall inconsistency with significance testing. [3]

Of the approximations listed above, Wilson score interval methods (with or without continuity correction) have been shown to be the most accurate and the most robust, [3] [4] [2] though some prefer the Agresti–Coull approach for larger sample sizes. [4] Wilson and Clopper–Pearson methods obtain consistent results with source significance tests, [8] and this property is decisive for many researchers.

Many of these intervals can be calculated in R using packages like "binom", or in Python using package "ebcic" (Exact Binomial Confidence Interval Calculator).
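For instance, in Python the statsmodels package (if installed) exposes several of these methods through its proportion_confint function; a quick comparison might look like:

```python
# Comparing several of the intervals above; assumes statsmodels is installed.
from statsmodels.stats.proportion import proportion_confint

for method in ("normal", "wilson", "jeffreys", "beta", "agresti_coull"):
    lo, hi = proportion_confint(count=7, nobs=10, alpha=0.05, method=method)
    print(f"{method:14s} ({lo:.3f}, {hi:.3f})")  # 'beta' is Clopper-Pearson
```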


References

  1. Sullivan, Lisa (2017-10-27). "Confidence Intervals". Boston University School of Public Health.
  2. Newcombe, R. G. (1998). "Two-sided confidence intervals for the single proportion: comparison of seven methods". Statistics in Medicine. 17 (8): 857–872. doi:10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E. PMID 9595616.
  3. Wallis, Sean A. (2013). "Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods" (PDF). Journal of Quantitative Linguistics. 20 (3): 178–208. doi:10.1080/09296174.2013.799918. S2CID 16741749.
  4. Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001). "Interval Estimation for a Binomial Proportion". Statistical Science. 16 (2): 101–133. CiteSeerX 10.1.1.50.3025. doi:10.1214/ss/1009213286. MR 1861069. Zbl 1059.62533.
  5. Laplace, Pierre Simon (1812). Théorie analytique des probabilités (in French). Ve. Courcier. p. 283.
  6. "How to calculate the standard error of a proportion using weighted data?"
  7. Wilson, E. B. (1927). "Probable inference, the law of succession, and statistical inference". Journal of the American Statistical Association. 22 (158): 209–212. doi:10.1080/01621459.1927.10502953. JSTOR 2276774.
  8. Wallis, Sean A. (2021). Statistics in Corpus Linguistics - a new approach. New York: Routledge. ISBN 9781138589384.
  9. Cai, T. T. (2005). "One-sided confidence intervals in discrete distributions". Journal of Statistical Planning and Inference. 131 (1): 63–88. doi:10.1016/j.jspi.2004.01.005.
  10. Clopper, C.; Pearson, E. S. (1934). "The use of confidence or fiducial limits illustrated in the case of the binomial". Biometrika. 26 (4): 404–413. doi:10.1093/biomet/26.4.404.
  11. Thulin, Måns (2014-01-01). "The cost of using exact confidence intervals for a binomial proportion". Electronic Journal of Statistics. 8 (1): 817–840. arXiv:1303.1288. doi:10.1214/14-EJS909. ISSN 1935-7524. S2CID 88519382.
  12. Agresti, Alan; Coull, Brent A. (1998). "Approximate is better than 'exact' for interval estimation of binomial proportions". The American Statistician. 52 (2): 119–126. doi:10.2307/2685469. JSTOR 2685469. MR 1628435.
  13. Holland, Steven. "Transformations of proportions and percentages". strata.uga.edu. Retrieved 2020-09-08.
  14. Warton, David I.; Hui, Francis K. C. (January 2011). "The arcsine is asinine: the analysis of proportions in ecology". Ecology. 92 (1): 3–10. doi:10.1890/10-0340.1. hdl:1885/152287. ISSN 0012-9658.
  15. Shao, J. (1998). Mathematical Statistics. New York: Springer.
  16. Simon, Steve (2010). "Confidence interval with zero events". The Children's Mercy Hospital, Kansas City, MO. ("Ask Professor Mean" stats topics; archived October 15, 2011, at the Wayback Machine.)
  17. Reiczigel, J. (2003). "Confidence intervals for the binomial parameter: some new considerations" (PDF). Statistics in Medicine. 22 (4): 611–621. doi:10.1002/sim.1320. PMID 12590417.
  18. Sauro, J.; Lewis, J. R. (2005). "Comparison of Wald, Adj-Wald, Exact and Wilson intervals Calculator". Proceedings of the Human Factors and Ergonomics Society, 49th Annual Meeting (HFES 2005), Orlando, FL, pp. 2100–2104. Archived 2012-06-18 at the Wayback Machine.
  19. Ross, T. D. (2003). "Accurate confidence intervals for binomial proportion and Poisson rate estimation". Computers in Biology and Medicine. 33 (6): 509–531. doi:10.1016/S0010-4825(03)00019-2. PMID 12878234.