**Sample size determination** is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there would be different sizes for each stratum. In a census, data is sought for an entire population, hence the intended sample size is equal to the population. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.

- Introduction
- Estimation
- Estimation of a proportion
- Estimation of a mean
- Required sample sizes for hypothesis tests
- Tables
- Mead's resource equation
- Cumulative distribution function
- Stratified sample size
- Qualitative research
- See also
- Notes
- References
- Further reading
- External links

Sample sizes may be chosen in several ways:

- using experience – small samples, though sometimes unavoidable, can result in wide confidence intervals and risk of errors in statistical hypothesis testing.
- using a target variance for an estimate to be derived from the sample eventually obtained, i.e. if a high precision is required (narrow confidence interval) this translates to a low target variance of the estimator.
- using a target for the power of a statistical test to be applied once the sample is collected.
- using a confidence level, i.e. the larger the required confidence level, the larger the sample size (given a constant precision requirement).

Larger sample sizes generally lead to increased precision when estimating unknown parameters. For example, if we wish to know the proportion of a certain species of fish that is infected with a pathogen, we would generally have a more precise estimate of this proportion if we sampled and examined 200 rather than 100 fish. Several fundamental facts of mathematical statistics describe this phenomenon, including the law of large numbers and the central limit theorem.
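As a rough illustration of the fish example, the sketch below compares the standard error of an estimated proportion at the two sample sizes. The infection rate of 0.5 is an assumed placeholder (the worst case for the variance), and the function name is illustrative only.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion from n independent observations."""
    return math.sqrt(p * (1.0 - p) / n)


# Assumed infection rate; 0.5 maximises the variance p * (1 - p).
assumed_p = 0.5
se_100 = proportion_standard_error(assumed_p, 100)
se_200 = proportion_standard_error(assumed_p, 200)

print(f"n = 100: standard error = {se_100:.4f}")
print(f"n = 200: standard error = {se_200:.4f}")
print(f"ratio = {se_100 / se_200:.3f}")
```

Under independence, doubling the sample size shrinks the standard error by a factor of √2 rather than 2, which is why precision gains slow as samples grow.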

In some situations, the increase in precision for larger sample sizes is minimal, or even non-existent. This can result from the presence of systematic errors or strong dependence in the data, or if the data follows a heavy-tailed distribution.

Sample sizes may be evaluated by the quality of the resulting estimates. For example, if a proportion is being estimated, one may wish to have the 95% confidence interval be less than 0.06 units wide. Alternatively, sample size may be assessed based on the power of a hypothesis test. For example, if we are comparing the support for a certain political candidate among women with the support for that candidate among men, we may wish to have 80% power to detect a difference in the support levels of 0.04 units.

A relatively simple situation is estimation of a proportion. For example, we may wish to estimate the proportion of residents in a community who are at least 65 years old.

The estimator of a proportion is $\hat{p} = X/n$, where *X* is the number of 'positive' observations (e.g. the number of people out of the *n* sampled people who are at least 65 years old). When the observations are independent, this estimator has a (scaled) binomial distribution (and is also the sample mean of data from a Bernoulli distribution). Its variance is $p(1-p)/n$; the per-observation variance $p(1-p)$ is at most 0.25, which occurs when the true parameter is *p* = 0.5. In practice, since *p* is unknown, the maximum variance is often used for sample size assessments. If a reasonable estimate for *p* is known, the quantity $p(1-p)$ may be used in place of 0.25.

For sufficiently large *n*, the distribution of $\hat{p}$ will be closely approximated by a normal distribution.^{ [1] } Using this approximation and the Wald method for the binomial distribution yields a confidence interval of the form

$$\left(\hat{p} - Z\sqrt{\frac{0.25}{n}},\ \hat{p} + Z\sqrt{\frac{0.25}{n}}\right),$$

where *Z* is a standard Z-score for the desired level of confidence (1.96 for a 95% confidence interval).

If we wish to have a confidence interval that is *W* units total in width (*W*/2 on each side of the sample proportion), we would solve

$$Z\sqrt{\frac{0.25}{n}} = \frac{W}{2}$$

for *n*, yielding the sample size

$$n = \frac{Z^2}{W^2},$$

in the case of using 0.5 as the most conservative estimate of the proportion. *(Note: W/2 = margin of error.)*

In the figure below one can observe how sample sizes for binomial proportions change given different confidence levels and margins of error.

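Otherwise, when an estimate of *p* other than 0.5 is used, the formula would be $Z\sqrt{\frac{p(1-p)}{n}} = \frac{W}{2}$, which yields $n = \frac{4Z^2 p(1-p)}{W^2}$.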

For example, if we are interested in estimating the proportion of the US population who support a particular presidential candidate, and we want the width of the 95% confidence interval to be at most 2 percentage points (0.02), then we would need a sample size of (1.96^{2})/(0.02^{2}) = 9604. It is reasonable to use the 0.5 estimate for *p* in this case because presidential races are often close to 50/50, and it is also prudent to use a conservative estimate. The margin of error in this case is 1 percentage point (half of 0.02).
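The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: it assumes the Wald interval, so that the required size is *n* = 4*Z*²*p*(1 − *p*)/*W*², which reduces to *Z*²/*W*² at *p* = 0.5. The function name and the alternative value *p* = 0.3 are hypothetical.

```python
import math


def sample_size_for_proportion(width: float, z: float = 1.96, p: float = 0.5) -> int:
    """Smallest n for which the Wald interval is at most `width` wide in total."""
    n_exact = 4.0 * z**2 * p * (1.0 - p) / width**2
    # Subtract a tiny tolerance so floating-point noise (e.g. 9604.000000000002)
    # does not push an exactly integer answer up by one.
    return math.ceil(n_exact - 1e-9)


# Conservative choice p = 0.5, total width of 2 percentage points.
print(sample_size_for_proportion(0.02))

# Hypothetical prior estimate p = 0.3: a smaller sample suffices.
print(sample_size_for_proportion(0.02, p=0.3))
```

Rounding up with `math.ceil` reflects that the computed value is a minimum; any prior estimate of *p* away from 0.5 lowers the requirement.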

The foregoing is commonly simplified by rounding *Z* = 1.96 up to 2, so that

$$\left(\hat{p} - 2\sqrt{\frac{0.25}{n}},\ \hat{p} + 2\sqrt{\frac{0.25}{n}}\right)$$

will form an approximate 95% confidence interval for the true proportion. If this interval needs to be no more than *W* units wide, the equation

$$4\sqrt{\frac{0.25}{n}} = W$$

can be solved for *n*, yielding^{ [2] }^{ [3] } *n* = 4/*W*^{2} = 1/*B*^{2}, where *B* is the error bound on the estimate, i.e., the estimate is usually given as *within ± B*. So, for *B* = 10% one requires *n* = 100, for *B* = 5% one needs *n* = 400, for *B* = 3% the requirement approximates to *n* = 1000, while for *B* = 1% a sample size of *n* = 10000 is required. These numbers are quoted often in news reports of opinion polls and other sample surveys. However, the reported figures may not be exact, because sample sizes are usually rounded up. Since *n* is the minimum number of sample points needed to achieve the desired result, the number of respondents must be at or above this minimum.
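The rule of thumb *n* = 1/*B*² is easy to tabulate. The sketch below is illustrative only (the function name is hypothetical) and rounds each result up to the next integer.

```python
import math


def poll_sample_size(error_bound: float) -> int:
    """Rule-of-thumb n = 1 / B^2 for a roughly 95% interval of +/- B."""
    # The small tolerance guards against floating-point noise around integers.
    return math.ceil(1.0 / error_bound**2 - 1e-9)


for bound in (0.10, 0.05, 0.03, 0.01):
    print(f"B = {bound:.0%}: n >= {poll_sample_size(bound)}")
```

For *B* = 3% the exact value of 1/*B*² is about 1111, so the commonly quoted figure of 1000 is itself a rounded approximation.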

A proportion is a special case of a mean. When estimating the population mean using an independent and identically distributed (iid) sample of size *n*, where each data value has variance *σ*^{2}, the standard error of the sample mean is:

$$\frac{\sigma}{\sqrt{n}}.$$

This expression describes quantitatively how the estimate becomes more precise as the sample size increases. Using the central limit theorem to justify approximating the sample mean with a normal distribution yields a confidence interval of the form

$$\left(\bar{x} - \frac{Z\sigma}{\sqrt{n}},\ \bar{x} + \frac{Z\sigma}{\sqrt{n}}\right),$$

where *Z* is a standard Z-score for the desired level of confidence (1.96 for a 95% confidence interval).

If we wish to have a confidence interval that is *W* units total in width (*W*/2 on each side of the sample mean), we would solve

$$\frac{Z\sigma}{\sqrt{n}} = \frac{W}{2}$$

for *n*, yielding the sample size

$$n = \frac{4Z^2\sigma^2}{W^2}.$$

*(Note: W/2 = margin of error.)*

For example, if we are interested in estimating the amount by which a drug lowers a subject's blood pressure with a 95% confidence interval that is six units wide, and we know that the standard deviation of blood pressure in the population is 15, then the required sample size is $\frac{4 \times 1.96^2 \times 15^2}{6^2} = 96.04$, which would be rounded up to 97, because the obtained value is the *minimum* sample size, and sample sizes must be integers at or above the calculated minimum.
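The blood pressure example can be reproduced with a short sketch, assuming the normal-approximation formula *n* = 4*Z*²*σ*²/*W*² with a known standard deviation. The function name is illustrative only.

```python
import math


def sample_size_for_mean(width: float, sigma: float, z: float = 1.96) -> int:
    """Smallest n giving a confidence interval for the mean at most `width` wide."""
    n_exact = 4.0 * z**2 * sigma**2 / width**2
    # Tolerance avoids rounding an exactly integer result up by one.
    return math.ceil(n_exact - 1e-9)


# Interval six units wide, population standard deviation of 15.
print(sample_size_for_mean(width=6, sigma=15))
```

Because the width enters as a square, halving the desired width roughly quadruples the required sample.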

A common problem faced by statisticians is calculating the sample size required to yield a certain power for a test, given a predetermined Type I error rate α. As described below, this can be estimated from pre-determined tables for certain values, by Mead's resource equation, or, more generally, by the cumulative distribution function:

^{ [4] }

| Power | Cohen's d = 0.2 | Cohen's d = 0.5 | Cohen's d = 0.8 |
|---|---|---|---|
| 0.25 | 84 | 14 | 6 |
| 0.50 | 193 | 32 | 13 |
| 0.60 | 246 | 40 | 16 |
| 0.70 | 310 | 50 | 20 |
| 0.80 | 393 | 64 | 26 |
| 0.90 | 526 | 85 | 34 |
| 0.95 | 651 | 105 | 42 |
| 0.99 | 920 | 148 | 58 |

The table above can be used for a two-sample t-test to estimate the sample size of an experimental group and a control group of equal size; the total number of individuals in the trial is therefore twice the number given. The desired significance level is 0.05.^{ [4] } The parameters used are:

- The desired statistical power of the trial, shown in the leftmost column.
- Cohen's d (= effect size), which is the expected difference between the means of the target values between the experimental group and the control group, divided by the expected standard deviation.
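For combinations not covered by the table, a per-group size can be approximated in code. The sketch below is not the method used to build the table: it assumes a two-sided test and replaces the t distribution with the normal approximation *n* ≈ 2(*z*_{1−α/2} + *z*_{power})²/*d*², so its results tend to land at or slightly below the tabulated values. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def per_group_size_two_sample(effect_size: float, power: float, alpha: float = 0.05) -> int:
    """Normal-approximation sample size per group for comparing two means.

    Assumes a two-sided test, equal group sizes and a common standard deviation.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)              # quantile matching the power
    n_exact = 2.0 * (z_alpha + z_power) ** 2 / effect_size**2
    return math.ceil(n_exact)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {per_group_size_two_sample(d, power=0.80)} per group")
```

At 80% power the approximation agrees with the table for *d* = 0.2 and falls one short for *d* = 0.5 and *d* = 0.8, where the small-sample t correction matters more.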

Mead's resource equation is often used for estimating sample sizes of laboratory animals, as well as in many other laboratory experiments. It may not be as accurate as using other methods in estimating sample size, but gives a hint of what is the appropriate sample size where parameters such as expected standard deviations or expected differences in values between groups are unknown or very hard to estimate.^{ [5] }

All the parameters in the equation are degrees of freedom, so each underlying count is reduced by 1 before insertion into the equation.

The equation is:^{ [5] }

$$E = N - B - T,$$

where:

- *N* is the total number of individuals or units in the study (minus 1)
- *B* is the *blocking component*, representing environmental effects allowed for in the design (minus 1)
- *T* is the *treatment component*, corresponding to the number of treatment groups (including control group) being used, or the number of questions being asked (minus 1)
- *E* is the degrees of freedom of the *error component*, and should be somewhere between 10 and 20.

For example, if a study using laboratory animals is planned with four treatment groups (*T*=3), with eight animals per group, making 32 animals total (*N*=31), without any further stratification (*B*=0), then *E* would equal 28, which is above the cutoff of 20, indicating that sample size may be a bit too large, and six animals per group might be more appropriate.^{ [6] }
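The worked example can be checked with a small sketch of the resource equation, *E* = *N* − *B* − *T*. The function takes raw counts and subtracts 1 from each to obtain degrees of freedom; its name and the convention that a design without blocking has one block are assumptions made for illustration.

```python
def mead_error_df(total_units: int, treatment_groups: int, blocks: int = 1) -> int:
    """Error degrees of freedom E = N - B - T from Mead's resource equation.

    Each argument is a raw count; 1 is subtracted to convert it to degrees of
    freedom. A design without blocking is treated as a single block (B = 0).
    """
    n_df = total_units - 1
    t_df = treatment_groups - 1
    b_df = blocks - 1
    return n_df - b_df - t_df


# Four treatment groups of eight animals, no further stratification.
print(mead_error_df(total_units=32, treatment_groups=4))

# Six animals per group instead.
print(mead_error_df(total_units=24, treatment_groups=4))
```

With six animals per group the error degrees of freedom come to 20, the top of the suggested range.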

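Let $X_i$, $i = 1, 2, \ldots, n$, be independent observations taken from a normal distribution with unknown mean *μ* and known variance *σ*^{2}. Consider two hypotheses, a null hypothesis:

$$H_0 : \mu = 0$$

and an alternative hypothesis:

$$H_a : \mu = \mu^*$$

for some 'smallest significant difference' *μ*^{*} > 0. This is the smallest value for which we care about observing a difference. Now, if we wish to (1) reject *H*_{0} with a probability of at least 1 − *β* when *H*_{a} is true (i.e. a power of 1 − *β*), and (2) reject *H*_{0} with probability α when *H*_{0} is true, then we need the following:

If *z*_{α} is the upper α percentage point of the standard normal distribution, then

$$\Pr\left(\bar{x} > \frac{z_\alpha \sigma}{\sqrt{n}} \,\middle|\, H_0\right) = \alpha$$

and so

- 'Reject *H*_{0} if our sample average ($\bar{x}$) is more than $z_\alpha \sigma / \sqrt{n}$'

is a decision rule which satisfies (2). (This is a 1-tailed test.)

Now we wish for this to happen with a probability at least 1 − *β* when *H*_{a} is true. In this case, our sample average will come from a normal distribution with mean *μ*^{*}. Therefore, we require

$$\Pr\left(\bar{x} > \frac{z_\alpha \sigma}{\sqrt{n}} \,\middle|\, H_a\right) \geq 1 - \beta.$$

Through careful manipulation, this can be shown (see Statistical power#Example) to happen when

$$n \geq \left(\frac{z_\alpha + \Phi^{-1}(1 - \beta)}{\mu^* / \sigma}\right)^2,$$

where $\Phi$ is the standard normal cumulative distribution function.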
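A minimal numerical sketch of this result follows. It assumes the standard closed form for a one-sided z-test with known standard deviation *σ* and a null mean of zero, namely *n* ≥ ((*z*_{α} + Φ⁻¹(1 − *β*)) *σ*/*μ*^{*})². The function name and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def one_sided_z_test_sample_size(
    smallest_difference: float,
    sigma: float,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Smallest n for a one-sided z-test to reach the requested power.

    Assumes normally distributed data with known standard deviation `sigma`
    and a null hypothesis that the mean is zero.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # upper alpha point
    z_power = std_normal.inv_cdf(power)        # inverse CDF at 1 - beta
    n_exact = ((z_alpha + z_power) * sigma / smallest_difference) ** 2
    return math.ceil(n_exact)


# Detect a shift of half a standard deviation with 80% power at alpha = 0.05.
print(one_sided_z_test_sample_size(smallest_difference=0.5, sigma=1.0))
```

The required size scales with the inverse square of the standardised difference *μ*^{*}/*σ*, so halving the smallest difference of interest roughly quadruples *n*.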

With more complicated sampling techniques, such as stratified sampling, the sample can often be split up into sub-samples. Typically, if there are *H* such sub-samples (from *H* different strata) then each of them will have a sample size *n*_{h}, *h* = 1, 2, ..., *H*, and these sub-sample sizes sum to the total sample size *n*.

There are many reasons to use stratified sampling:^{ [7] } to decrease variances of sample estimates, to use partly non-random methods, or to study strata individually. A useful, partly non-random method would be to sample individuals where easily accessible, but, where not, sample clusters to save travel costs.^{ [8] }

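In general, for *H* strata, a weighted sample mean is

$$\bar{x}_w = \sum_{h=1}^{H} W_h \bar{x}_h,$$

with

$$\operatorname{Var}(\bar{x}_w) = \sum_{h=1}^{H} W_h^2 \operatorname{Var}(\bar{x}_h).$$^{ [9] }

The weights, $W_h$, frequently, but not always, represent the proportions of the population elements in the strata, and $W_h = N_h / N$. For a fixed sample size, that is $n = \sum_h n_h$,

$$\operatorname{Var}(\bar{x}_w) = \sum_{h=1}^{H} W_h^2 S_h^2 \left(\frac{1}{n_h} - \frac{1}{N_h}\right),$$^{ [10] }

which can be made a minimum if the sampling rate within each stratum is made proportional to the standard deviation within each stratum: $n_h / N_h = k S_h$, where $S_h$ is the standard deviation of the elements within stratum $h$ and $k$ is a constant such that $\sum_h n_h = n$.

An "optimum allocation" is reached when the sampling rates within the strata are made directly proportional to the standard deviations within the strata and inversely proportional to the square root of the sampling cost per element within the strata, $C_h$:

$$\frac{n_h}{N_h} = \frac{K S_h}{\sqrt{C_h}},$$^{ [11] }

where $K$ is a constant such that $\sum_h n_h = n$, or, more generally, when

$$n_h = \frac{K' W_h S_h}{\sqrt{C_h}}.$$^{ [12] }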
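The allocation rule described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: each stratum receives a share of the total sample proportional to its population size times its standard deviation, divided by the square root of its per-element cost. The function name, the symbols `stratum_sizes`, `stratum_sds` and `unit_costs`, and the example figures are all hypothetical.

```python
import math
from typing import List, Optional, Sequence


def optimum_allocation(
    total_n: float,
    stratum_sizes: Sequence[float],
    stratum_sds: Sequence[float],
    unit_costs: Optional[Sequence[float]] = None,
) -> List[float]:
    """Split total_n across strata in proportion to N_h * S_h / sqrt(C_h).

    With equal unit costs this reduces to allocation proportional to N_h * S_h.
    Results are fractional; rounding to whole units is left to the caller.
    """
    if unit_costs is None:
        unit_costs = [1.0] * len(stratum_sizes)
    scores = [
        size * sd / math.sqrt(cost)
        for size, sd, cost in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    total_score = sum(scores)
    return [total_n * score / total_score for score in scores]


# Hypothetical population: three strata with increasing variability.
allocation = optimum_allocation(
    total_n=600,
    stratum_sizes=[5000, 3000, 2000],
    stratum_sds=[10.0, 20.0, 40.0],
)
print([round(n_h, 1) for n_h in allocation])
```

In this example the smallest stratum receives the largest share because its standard deviation is four times that of the largest stratum; when all standard deviations and costs are equal, the rule collapses to proportional allocation.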

Sample size determination in qualitative studies takes a different approach. It is generally a subjective judgment, taken as the research proceeds.^{ [13] } One approach is to continue to include further participants or material until saturation is reached.^{ [14] } The number needed to reach saturation has been investigated empirically.^{ [15] }^{ [16] }^{ [17] }^{ [18] }

There is a paucity of reliable guidance on estimating sample sizes before starting the research, with a range of suggestions given.^{ [16] }^{ [19] }^{ [20] }^{ [21] } A tool akin to a quantitative power calculation, based on the negative binomial distribution, has been suggested for thematic analysis.^{ [22] }^{ [21] }

- Design of experiments
- Engineering response surface example under Stepwise regression
- Cohen's h

1. NIST/SEMATECH, "7.2.4.2. Sample sizes required", *e-Handbook of Statistical Methods*.
2. "Inference for Regression". *utdallas.edu*.
3. "Confidence Interval for a Proportion". Archived 2011-08-23 at the Wayback Machine.
4. Chapter 13, page 215, in: Kenny, David A. (1987). *Statistics for the social and behavioral sciences*. Boston: Little, Brown. ISBN 978-0-316-48915-7.
5. Kirkwood, James; Robert Hubrecht (2010). *The UFAW Handbook on the Care and Management of Laboratory and Other Research Animals*. Wiley-Blackwell. p. 29. ISBN 978-1-4051-7523-4.
6. Isogenic.info > Resource equation by Michael FW Festing. Updated Sept. 2006.
7. Kish (1965), Section 3.1.
8. Kish (1965), p. 148.
9. Kish (1965), p. 78.
10. Kish (1965), p. 81.
11. Kish (1965), p. 93.
12. Kish (1965), p. 94.
13. Sandelowski, M. (1995). "Sample size in qualitative research". *Research in Nursing & Health*. **18**: 179–183.
14. Glaser, B. (1965). "The constant comparative method of qualitative analysis". *Social Problems*. **12**: 436–445.
15. Francis, Jill J.; Johnston, Marie; Robertson, Clare; Glidewell, Liz; Entwistle, Vikki; Eccles, Martin P.; Grimshaw, Jeremy M. (2010). "What is an adequate sample size? Operationalising data saturation for theory-based interview studies". *Psychology & Health*. **25** (10): 1229–1245. doi:10.1080/08870440903194015. PMID 20204937. S2CID 28152749.
16. Guest, Greg; Bunce, Arwen; Johnson, Laura (2006). "How Many Interviews Are Enough?". *Field Methods*. **18**: 59–82. doi:10.1177/1525822X05279903. S2CID 62237589.
17. Wright, Adam; Maloney, Francine L.; Feblowitz, Joshua C. (2011). "Clinician attitudes toward and use of electronic problem lists: A thematic analysis". *BMC Medical Informatics and Decision Making*. **11**: 36. doi:10.1186/1472-6947-11-36. PMC 3120635. PMID 21612639.
18. Mason, Mark (2010). "Sample Size and Saturation in PhD Studies Using Qualitative Interviews". *Forum Qualitative Sozialforschung*. **11** (3): 8.
19. Emmel, N. (2013). *Sampling and choosing cases in qualitative research: A realist approach*. London: Sage.
20. Onwuegbuzie, Anthony J.; Leech, Nancy L. (2007). "A Call for Qualitative Power Analyses". *Quality & Quantity*. **41**: 105–121. doi:10.1007/s11135-005-1098-1. S2CID 62179911.
21. Fugard AJB; Potts HWW (10 February 2015). "Supporting thinking on sample sizes for thematic analyses: A quantitative tool". *International Journal of Social Research Methodology*. **18** (6): 669–684. doi:10.1080/13645579.2015.1005453. S2CID 59047474.
22. Galvin R (2015). "How many interviews are enough? Do qualitative interviews in building energy consumption research produce reliable knowledge?". *Journal of Building Engineering*. **1**: 2–12.

- Bartlett, J. E., II; Kotrlik, J. W.; Higgins, C. (2001). "Organizational research: Determining appropriate sample size for survey research". *Information Technology, Learning, and Performance Journal*. **19** (1): 43–50.
- Kish, L. (1965). *Survey Sampling*. Wiley. ISBN 978-0-471-48900-9.
- Smith, Scott (8 April 2013). "Determining Sample Size: How to Ensure You Get the Correct Sample Size". *Qualtrics*. Retrieved 19 September 2018.
- Israel, Glenn D. (1992). "Determining Sample Size". *University of Florida, PEOD-6*. Retrieved 29 June 2019.
- Rens van de Schoot, Milica Miočević (eds.) (2020). *Small Sample Size Solutions (Open Access): A Guide for Applied Researchers and Practitioners*. Routledge.

- NIST: Selecting Sample Sizes
- ASTM E122-07: Standard Practice for Calculating Sample Size to Estimate, With Specified Precision, the Average for a Characteristic of a Lot or Process

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
