In statistics, a **sampling distribution** or **finite-sample distribution** is the probability distribution of a given random-sample-based statistic. If an arbitrarily large number of samples, each involving multiple observations (data points), were separately used in order to compute one value of a statistic (such as, for example, the sample mean or sample variance) for each sample, then the sampling distribution is the probability distribution of the values that the statistic takes on. In many contexts, only one sample is observed, but the sampling distribution can be found theoretically.

Sampling distributions are important in statistics because they provide a major simplification en route to statistical inference. More specifically, they allow analytical considerations to be based on the probability distribution of a statistic, rather than on the joint probability distribution of all the individual sample values.

The **sampling distribution** of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size $n$. It may be considered as the distribution of the statistic for *all possible samples from the same population* of a given sample size. The sampling distribution depends on the underlying distribution of the population, the statistic being considered, the sampling procedure employed, and the sample size used. There is often considerable interest in whether the sampling distribution can be approximated by an asymptotic distribution, which corresponds to the limiting case either as the number of random samples of finite size, taken from an infinite population and used to produce the distribution, tends to infinity, or when a single "sample" of infinite size is taken from that same population.

For example, consider a normal population with mean $\mu$ and variance $\sigma^2$. Assume we repeatedly take samples of a given size from this population and calculate the arithmetic mean $\bar{x}$ for each sample – this statistic is called the sample mean. The distribution of these means, or averages, is called the "sampling distribution of the sample mean". This distribution is normal, $\mathcal{N}(\mu, \sigma^2/n)$ (where $n$ is the sample size), since the underlying population is normal, although sampling distributions are often close to normal even when the population distribution is not (see central limit theorem). An alternative to the sample mean is the sample median. When calculated from the same population, it has a different sampling distribution from that of the mean and is generally not normal (but it may be close to normal for large sample sizes).
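The repeated-sampling idea in this example can be illustrated with a small simulation. The following is a minimal sketch, not a definitive implementation: the function name `simulate_sampling_distribution`, the parameter values (`mu = 10`, `sigma = 2`, `n = 25`) and the number of repetitions are assumptions chosen for illustration. It draws many samples from a normal population, computes the mean and the median of each, and compares the spread of the two resulting collections of values.

```python
import random
import statistics


def simulate_sampling_distribution(statistic, mu=0.0, sigma=1.0, n=25,
                                   num_samples=5000, seed=0):
    """Draw num_samples samples of size n from a normal population with
    mean mu and standard deviation sigma, and return the value of
    `statistic` computed on each sample."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append(statistic(sample))
    return values


# Illustrative (assumed) population parameters and sample size.
mu, sigma, n = 10.0, 2.0, 25

means = simulate_sampling_distribution(statistics.fmean, mu, sigma, n)
medians = simulate_sampling_distribution(statistics.median, mu, sigma, n)

# Theory for the sample mean: centred at mu with standard deviation
# sigma / sqrt(n) = 2 / 5 = 0.4. The median is more variable.
print("mean of sample means:  ", statistics.fmean(means))
print("std of sample means:   ", statistics.stdev(means))
print("std of sample medians: ", statistics.stdev(medians))
```

With these settings the simulated sample means should cluster around 10 with a standard deviation near 0.4, while the sample medians show a somewhat larger spread.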

The mean of a sample from a population having a normal distribution is an example of a simple statistic taken from one of the simplest statistical populations. For other statistics and other populations the formulas are more complicated, and often they do not exist in closed form. In such cases the sampling distributions may be approximated through Monte Carlo simulations,[1] bootstrap methods, or asymptotic distribution theory.
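As an illustration of the bootstrap approach mentioned above, the sketch below resamples a single observed sample with replacement to approximate the sampling distribution of its median. The helper name `bootstrap_distribution`, the number of resamples and the synthetic "observed" data (drawn here from an exponential distribution purely so the example has something to work on) are all assumptions, not part of the original text.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, num_resamples=2000, seed=0):
    """Approximate the sampling distribution of `statistic` by
    recomputing it on resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(num_resamples)]


# Hypothetical observed sample: 50 synthetic values, generated only
# for demonstration purposes.
data_rng = random.Random(1)
observed = [data_rng.expovariate(1.0) for _ in range(50)]

boot_medians = bootstrap_distribution(observed, statistics.median)

# The standard deviation of the bootstrap values estimates the
# standard error of the sample median.
se_median = statistics.stdev(boot_medians)
print("observed median:          ", statistics.median(observed))
print("bootstrap SE of the median:", se_median)
```

The same helper works for any statistic that accepts a list of numbers, which is the main appeal of the method when no closed-form sampling distribution is available.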

The standard deviation of the sampling distribution of a statistic is referred to as the standard error of that quantity. For the case where the statistic is the sample mean, and samples are uncorrelated, the standard error is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the standard deviation of the population distribution of that quantity and $n$ is the sample size (number of items in the sample).

An important implication of this formula is that the sample size must be quadrupled (multiplied by 4) to halve the standard error, because the standard error shrinks only with the square root of the sample size. When designing statistical studies where cost is a factor, this may have a role in understanding cost–benefit tradeoffs.
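The quadrupling rule can be checked numerically. The short sketch below assumes the standard error of the mean is the population standard deviation divided by the square root of the sample size; the function name `standard_error_of_mean` and the value `sigma = 12.0` are hypothetical choices for illustration.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean for uncorrelated observations:
    population standard deviation divided by sqrt(sample size)."""
    return sigma / math.sqrt(n)


sigma = 12.0  # hypothetical population standard deviation

# Each step multiplies the sample size by 4 and halves the standard error.
for n in (100, 400, 1600):
    print(n, standard_error_of_mean(sigma, n))
# prints:
# 100 1.2
# 400 0.6
# 1600 0.3
```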

For the case where the statistic is the sample total, and samples are uncorrelated, the standard error is:

$$\sigma_{\Sigma x} = \sigma\sqrt{n}$$

where, again, $\sigma$ is the standard deviation of the population distribution of that quantity and $n$ is the sample size (number of items in the sample).
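Both standard errors follow from the same variance calculation. As a sketch, write the observations as $X_1, \dots, X_n$ (notation assumed here for illustration), uncorrelated and each with variance $\sigma^2$, and let $\sigma_{\Sigma x}$ and $\sigma_{\bar{x}}$ denote the standard errors of the sample total and the sample mean:

$$
\begin{aligned}
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) &= \sum_{i=1}^{n} \operatorname{Var}(X_i) = n\sigma^2
&&\Longrightarrow\quad \sigma_{\Sigma x} = \sigma\sqrt{n}, \\
\operatorname{Var}\!\left(\bar{X}\right) &= \frac{1}{n^2}\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \frac{\sigma^2}{n}
&&\Longrightarrow\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
$$

The first equality uses the fact that the variance of a sum of uncorrelated variables is the sum of their variances; dividing the total by $n$ then scales the variance by $1/n^2$.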

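| Population | Statistic | Sampling distribution |
|---|---|---|
| Normal: $\mathcal{N}(\mu, \sigma^2)$ | Sample mean $\bar{X}$ from samples of size $n$ | $\bar{X} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$. If the standard deviation $\sigma$ is not known, one can consider $T = \left(\bar{X} - \mu\right)\frac{\sqrt{n}}{S}$, which follows the Student's t-distribution with $\nu = n - 1$ degrees of freedom. Here $S^2$ is the sample variance, and $T$ is a pivotal quantity, whose distribution does not depend on $\sigma$. |
| Bernoulli: $\operatorname{Bernoulli}(p)$ | Sample proportion of "successful trials" $\bar{X}$ | $n\bar{X} \sim \operatorname{Binomial}(n, p)$ |
| Two independent normal populations: $\mathcal{N}(\mu_1, \sigma_1^2)$ and $\mathcal{N}(\mu_2, \sigma_2^2)$ | Difference between sample means, $\bar{X}_1 - \bar{X}_2$ | $\bar{X}_1 - \bar{X}_2 \sim \mathcal{N}\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$ |
| Any absolutely continuous distribution $F$ with density $f$ | Median $X_{(k)}$ from a sample of size $n = 2k - 1$, where the sample is ordered $X_{(1)}$ to $X_{(n)}$ | $f_{X_{(k)}}(x) = \frac{(2k-1)!}{\left((k-1)!\right)^2} f(x) \bigl(F(x)(1 - F(x))\bigr)^{k-1}$ |
| Any distribution with distribution function $F$ | Maximum $M = \max_k X_k$ from a random sample of size $n$ | $F_M(x) = P(M \le x) = \prod_{k=1}^{n} P(X_k \le x) = \left(F(x)\right)^n$ |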
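One of these results is easy to check by simulation: the maximum of $n$ independent draws from a distribution with distribution function $F$ has distribution function $F(x)^n$. The sketch below assumes a Uniform(0, 1) population, for which $F(x) = x$ on $[0, 1]$; the function name `empirical_cdf_of_max` and the values of `n`, `x` and the number of repetitions are illustrative assumptions.

```python
import random


def empirical_cdf_of_max(n, x, num_samples=20000, seed=0):
    """Estimate P(M <= x), where M is the maximum of n independent
    Uniform(0, 1) draws, as the fraction of simulated samples whose
    maximum does not exceed x."""
    rng = random.Random(seed)
    hits = sum(
        max(rng.random() for _ in range(n)) <= x
        for _ in range(num_samples)
    )
    return hits / num_samples


n, x = 5, 0.8
estimate = empirical_cdf_of_max(n, x)
exact = x ** n  # F(x)^n with F(x) = x for the uniform distribution

print("simulated P(M <= x):", estimate)
print("exact     F(x)^n:   ", exact)
```

The simulated proportion should agree with the exact value (about 0.328 for these settings) up to Monte Carlo error.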


- [1] Mooney, Christopher Z. (1999). *Monte Carlo simulation*. Thousand Oaks, Calif.: Sage. p. 2. ISBN 9780803959439.

- Merberg, A. and S.J. Miller (2008). "The Sample Distribution of the Median". *Course Notes for Math 162: Mathematical Statistics*, pp. 1–9.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
