1.96

[Figure: 95% of the area under the normal distribution lies within 1.96 standard deviations of the mean.]

In probability and statistics, 1.96 is the approximate value of the 97.5 percentile point of the standard normal distribution. 95% of the area under a normal curve lies within roughly 1.96 standard deviations of the mean, and because of the central limit theorem, this number is used in the construction of approximate 95% confidence intervals. Its ubiquity is due to the arbitrary but common convention of using confidence intervals with 95% coverage rather than other coverages (such as 90% or 99%). [1] [2] [3] [4] This convention seems particularly common in medical statistics, [5] [6] [7] but is also common in other areas of application, such as earth sciences, [8] social sciences and business research. [9]
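
As an illustration of how the multiplier enters an approximate 95% confidence interval for a mean, here is a minimal Python sketch (not taken from any of the cited sources; the sample data are invented):

```python
import math

def approx_95_ci(sample):
    """Approximate 95% confidence interval for the mean: mean ± 1.96 · s/√n."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample standard deviation with n - 1 in the denominator.
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

print(approx_95_ci([4.9, 5.1, 5.0, 4.8, 5.2, 5.3, 4.7, 5.0]))
```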

There is no single accepted name for this number; it is also commonly referred to as the "standard normal deviate", "normal score" or "Z score" for the 97.5 percentile point, or .975 point.

If X has a standard normal distribution, i.e. X ~ N(0,1), then

P(X > 1.96) ≈ 0.025 and P(X < 1.96) ≈ 0.975,

and as the normal distribution is symmetric,

P(−1.96 < X < 1.96) ≈ 0.95.

One notation for this number is z.975. [10] From the probability density function of the standard normal distribution, the exact value of z.975 is determined by

∫_{−∞}^{z.975} (1/√(2π)) e^{−x²/2} dx = 0.975.
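
These quantities are easy to check numerically, for example with SciPy (assumed to be available; it also appears in the software table below):

```python
from scipy.stats import norm

z = norm.ppf(0.975)                            # the 0.975 quantile, about 1.959964
coverage = norm.cdf(1.96) - norm.cdf(-1.96)    # probability within ±1.96, about 0.9500
print(round(z, 6), round(coverage, 4))
```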

History

[Image: Ronald Fisher]

The use of this number in applied statistics can be traced to the influence of Ronald Fisher's classic textbook, Statistical Methods for Research Workers, first published in 1925:

"The value for which P = .05, or 1 in 20, is 1.96 or nearly 2 ; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not." [11]

In Table 1 of the same work, he gave the more precise value 1.959964. [12] In 1970, the value truncated to 20 decimal places was calculated to be

1.95996 39845 40054 23552... [13] [14]

The commonly used approximate value of 1.96 is therefore accurate to better than one part in 50,000, which is more than adequate for applied work.
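
Readers who want to reproduce the high-precision value can use the identity z_p = √2 · erfinv(2p − 1); a sketch assuming the mpmath arbitrary-precision package is available:

```python
from mpmath import mp, sqrt, erfinv

mp.dps = 30  # work with 30 significant digits
z = sqrt(2) * erfinv(2 * mp.mpf("0.975") - 1)
print(z)     # 1.95996398454005423552...
```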

Some people even use the value of 2 in the place of 1.96, reporting a 95.4% confidence interval as a 95% confidence interval. This is not recommended but is occasionally seen. [15]
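
The actual coverage obtained with the multiplier 2 is easy to verify (again assuming SciPy):

```python
from scipy.stats import norm

print(round(norm.cdf(2) - norm.cdf(-2), 4))   # 0.9545, i.e. about 95.4%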

Software functions

The inverse of the standard normal CDF can be used to compute the value. The following is a table of function calls that return 1.96 in some commonly used applications:

Application                        Function call
Excel                              NORM.S.INV(0.975)
MATLAB                             norminv(0.975)
R                                  qnorm(0.975)
Python (SciPy)                     scipy.stats.norm.ppf(0.975)
SAS                                probit(0.975);
SPSS                               COMPUTE x = IDF.NORMAL(0.975,0,1).
Stata                              invnormal(0.975)
Wolfram Language (Mathematica)     InverseCDF[NormalDistribution[0, 1], 0.975] [16] [17]

Related Research Articles

Quantile Statistical method of dividing data into equal-sized intervals for analysis

In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. Common quantiles have special names, such as quartiles, deciles, and percentiles. The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points.

Student's t-distribution Probability distribution

In probability and statistics, Student's t-distribution is any member of a family of continuous probability distributions that arise when estimating the mean of a normally distributed population in situations where the sample size is small and the population's standard deviation is unknown. It was developed by English statistician William Sealy Gosset under the pseudonym "Student".

In medicine and health-related fields, a reference range or reference interval is the range or the interval of values that is deemed normal for a physiological measurement in healthy persons. It is a basis for comparison for a physician or other health professional to interpret a set of test results for a particular patient. Some important reference ranges in medicine are reference ranges for blood tests and reference ranges for urine tests.

Margin of error Statistic expressing the amount of random sampling error in a survey's results

The margin of error is a statistic expressing the amount of random sampling error in the results of a survey. The larger the margin of error, the less confidence one should have that a poll result would reflect the result of a survey of the entire population. The margin of error will be positive whenever a population is incompletely sampled and the outcome measure has positive variance, which is to say, the measure varies.
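
For a polled proportion, a 95% margin of error is commonly computed with the 1.96 multiplier and the normal approximation; a small illustrative sketch (the poll numbers are invented):

```python
import math

def margin_of_error_95(p_hat, n):
    # Normal-approximation margin of error at the 95% level.
    return 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

# 52% support in a poll of 1,000 respondents -> roughly ±3.1 percentage points
print(round(100 * margin_of_error_95(0.52, 1000), 1))
```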

Confidence interval Range of estimates for an unknown parameter

In frequentist statistics, a confidence interval (CI) is a range of estimates for an unknown parameter. A confidence interval is computed at a designated confidence level; the 95% confidence level is most common, but other levels, such as 90% or 99%, are sometimes used. The confidence level represents the long-run proportion of corresponding CIs that end up containing the true value of the parameter. For example, out of all intervals computed at the 95% level, 95% of them should contain the parameter's true value.

Z-test Statistical test

A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Z-tests test the mean of a distribution. For each significance level, the Z-test has a single critical value (for example, 1.96 for a two-tailed test at the 5% level), which makes it more convenient than the Student's t-test, whose critical values also depend on the sample size. Both the Z-test and Student's t-test help determine the significance of a set of data. However, the Z-test is rarely used in practice because the population standard deviation is difficult to determine.
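
A minimal sketch of a two-sided one-sample z-test at the 5% level, where 1.96 is the critical value (the numbers are invented and the population standard deviation is assumed known):

```python
import math

def z_test(sample_mean, mu0, sigma, n, critical=1.96):
    # Reject the null hypothesis H0: mu = mu0 if |z| exceeds the critical value.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return z, abs(z) > critical

print(z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100))   # (2.0, True)
```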

In statistics, a k-th percentile is a score below which a given percentage k of scores in its frequency distribution falls or a score at or below which a given percentage falls. For example, the 50th percentile is the score below which (exclusive) or at or below which (inclusive) 50% of the scores in the distribution may be found. Percentiles are expressed in the same unit of measurement as the input scores; for example, if the scores refer to human weight, the corresponding percentiles will be expressed in kilograms or pounds.

In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.

Standard error Statistical property

The standard error (SE) of a statistic is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the statistic is the sample mean, it is called the standard error of the mean (SEM).

In statistics, propagation of uncertainty is the effect of variables' uncertainties on the uncertainty of a function based on them. When the variables are the values of experimental measurements they have uncertainties due to measurement limitations which propagate due to the combination of variables in the function.

A tolerance interval is a statistical interval within which, with some confidence level, a specified proportion of a sampled population falls. "More specifically, a 100×p%/100×(1−α) tolerance interval provides limits within which at least a certain proportion (p) of the population falls with a given level of confidence (1−α)." "A tolerance interval (TI) based on a sample is constructed so that it would include at least a proportion p of the sampled population with confidence 1−α; such a TI is usually referred to as p-content − (1−α) coverage TI." "An upper tolerance limit (TL) is simply a 1−α upper confidence limit for the 100p percentile of the population."

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there would be different sizes for each stratum. In a census, data is sought for an entire population, hence the intended sample size is equal to the population. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.
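
When the goal is a 95% confidence interval of prescribed half-width E for a mean, the usual back-of-the-envelope sample-size formula uses 1.96; a hedged sketch (sigma is assumed known and the numbers are illustrative):

```python
import math

def sample_size_for_mean(sigma, E, z=1.96):
    # Smallest n such that z * sigma / sqrt(n) <= E.
    return math.ceil((z * sigma / E) ** 2)

print(sample_size_for_mean(sigma=15.0, E=2.0))   # 217
```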

This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics.

In Bayesian statistics, a credible interval is an interval within which an unobserved parameter value falls with a particular probability. It is an interval in the domain of a posterior probability distribution or a predictive distribution. The generalisation to multivariate problems is the credible region. Credible intervals are analogous to confidence intervals in frequentist statistics, although they differ on a philosophical basis: Bayesian intervals treat their bounds as fixed and the estimated parameter as a random variable, whereas frequentist confidence intervals treat their bounds as random variables and the parameter as a fixed value. Also, Bayesian credible intervals use knowledge of the situation-specific prior distribution, while the frequentist confidence intervals do not.

In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments. In other words, a binomial proportion confidence interval is an interval estimate of a success probability p when only the number of experiments n and the number of successes n_S are known.

Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.
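
A small sketch of a bootstrap percentile interval (standard-library Python only, invented data): the 2.5th and 97.5th percentiles of the resampled statistics play the role that ±1.96 standard errors play in the normal-approximation interval.

```python
import random
import statistics

def bootstrap_ci_mean(sample, reps=10_000, seed=0):
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(reps)
    )
    # Percentile (2.5%, 97.5%) interval from the bootstrap distribution.
    return means[int(0.025 * reps)], means[int(0.975 * reps)]

print(bootstrap_ci_mean([4.9, 5.1, 5.0, 4.8, 5.2, 5.3, 4.7, 5.0]))
```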

68–95–99.7 rule Shorthand used in statistics

In statistics, the 68–95–99.7 rule, also known as the empirical rule, is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.

In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the interquartile range (IQR) and the median absolute deviation (MAD). These are contrasted with conventional or non-robust measures of scale, such as sample variance or standard deviation, which are greatly influenced by outliers.

Poisson distribution Discrete probability distribution

In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. It is named after French mathematician Siméon Denis Poisson. The Poisson distribution can also be used for the number of events in other specified interval types such as distance, area or volume.

In statistics, a population proportion, generally denoted by P or the Greek letter π, is a parameter that describes a percentage value associated with a population. For example, the 2010 United States Census showed that 83.7% of the American population was identified as not being Hispanic or Latino; the value of .837 is a population proportion. In general, the population proportion and other population parameters are unknown. A census can be conducted in order to determine the actual value of a population parameter, but often a census is not practical due to its costs and time consumption.

References

  1. Rees, DG (1987), Foundations of Statistics, CRC Press, p. 246, ISBN   0-412-28560-6, Why 95% confidence? Why not some other confidence level? The use of 95% is partly convention, but levels such as 90%, 98% and sometimes 99.9% are also used.
  2. "Engineering Statistics Handbook: Confidence Limits for the Mean". National Institute of Standards and Technology. Archived from the original on 5 February 2008. Retrieved 4 February 2008. Although the choice of confidence coefficient is somewhat arbitrary, in practice 90%, 95%, and 99% intervals are often used, with 95% being the most commonly used.
  3. Olson, Eric T; Olson, Tammy Perry (2000), Real-Life Math: Statistics, Walch Publishing, p.  66, ISBN   0-8251-3863-9, While other stricter, or looser, limits may be chosen, the 95 percent interval is very often preferred by statisticians.
  4. Swift, MB. "Comparison of Confidence Intervals for a Poisson Mean - Further Considerations". Communications in Statistics - Theory and Methods. Vol. 38, no. 5. pp. 748–759. doi:10.1080/03610920802255856. In modern applied practice, almost all confidence intervals are stated at the 95% level.
  5. Simon, Steve (2002), Why 95% confidence limits?, archived from the original on 28 January 2008, retrieved 1 February 2008
  6. Moher, D; Schulz, KF; Altman, DG (2001), "The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials.", Lancet, 357 (9263): 1191–1194, doi:10.1016/S0140-6736(00)04337-3, PMID   11323066, S2CID   52871971 , retrieved 4 February 2008
  7. "Resources for Authors: Research". BMJ Publishing Group Ltd. Archived from the original on 18 July 2009. Retrieved 2008-02-04. For standard original research articles please provide the following headings and information: [...] results - main results with (for quantitative studies) 95% confidence intervals and, where appropriate, the exact level of statistical significance and the number need to treat/harm
  8. Borradaile, Graham J. (2003), Statistics of Earth Science Data, Springer, p. 79, ISBN   3-540-43603-0, For simplicity, we adopt the common earth sciences convention of a 95% confidence interval.
  9. Cook, Sarah (2004), Measuring Customer Service Effectiveness, Gower Publishing, p. 24, ISBN   0-566-08538-0, Most researchers use a 95 per cent confidence interval
  10. Gosling, J. (1995), Introductory Statistics, Pascal Press, pp. 78–9, ISBN   1-86441-015-9
  11. Fisher, Ronald (1925), Statistical Methods for Research Workers , Edinburgh: Oliver and Boyd, p.  47, ISBN   0-05-002170-2
  12. Fisher, Ronald (1925), Statistical Methods for Research Workers , Edinburgh: Oliver and Boyd, ISBN   0-05-002170-2 , Table 1
  13. White, John S. (June 1970), "Tables of Normal Percentile Points", Journal of the American Statistical Association, American Statistical Association, 65 (330): 635–638, doi:10.2307/2284575, JSTOR   2284575
  14. Sloane, N. J. A. (ed.). "Sequence A220510". The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.
  15. "Estimating the Population Mean Using Intervals". stat.wmich.edu. Statistical Computation Lab. Archived from the original on 4 July 2018. Retrieved 7 August 2018.
  16. InverseCDF, Wolfram Language Documentation Center.
  17. NormalDistribution, Wolfram Language Documentation Center.

Further reading