# Effective sample size

In statistics, effective sample size is a notion defined for a sample from a distribution when the observations in the sample are correlated or weighted. In 1965, Leslie Kish defined it as the original sample size divided by the design effect, to reflect the variance under the actual sampling design compared with what it would be if the sample were a simple random sample (Kish 1965, pp. 162, 259).

## Correlated observations

Suppose a sample of $n$ independent, identically distributed observations $Y_{1},\dots ,Y_{n}$ is drawn from a distribution with mean $\mu$ and standard deviation $\sigma$. Then the mean of this distribution is estimated by the mean of the sample:

${\hat {\mu }}={\frac {1}{n}}\sum _{i=1}^{n}Y_{i}.$

In that case, the variance of ${\hat {\mu }}$ is given by

$\operatorname {Var} ({\hat {\mu }})={\frac {\sigma ^{2}}{n}}.$

However, if the observations in the sample are correlated (in the intraclass-correlation sense), then $\operatorname {Var} ({\hat {\mu }})$ is somewhat higher. For instance, if all observations in the sample are perfectly correlated ($\rho _{(i,j)}=1$ for all $i\neq j$), then $\operatorname {Var} ({\hat {\mu }})=\sigma ^{2}$ regardless of $n$.
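These two limiting cases can be checked numerically. The following is a minimal Python simulation sketch (the setup, sample sizes, and variable names are ours): under independence the empirical variance of the sample mean is close to $\sigma^2/n$, while under perfect correlation every $Y_i$ is the same draw, so the variance of the mean stays near $\sigma^2$.

```python
import random

random.seed(0)
n, sigma, trials = 50, 2.0, 20000

# Independent observations: Var(mean) should be close to sigma^2 / n.
means_indep = [
    sum(random.gauss(0, sigma) for _ in range(n)) / n for _ in range(trials)
]

# Perfectly correlated observations (rho = 1): all n observations are the
# same draw, so the sample mean equals that single draw.
means_corr = [random.gauss(0, sigma) for _ in range(trials)]

def var(xs):
    """Plain (population-style) empirical variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(var(means_indep))  # close to sigma^2 / n = 0.08
print(var(means_corr))   # close to sigma^2 = 4.0
```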

The effective sample size $n_{\text{eff}}$ is the unique value (not necessarily an integer) such that

$\operatorname {Var} ({\hat {\mu }})={\frac {\sigma ^{2}}{n_{\text{eff}}}}.$

$n_{\text{eff}}$ is a function of the correlation between observations in the sample.

Suppose that all the (non-trivial) correlations are the same and greater than $-1/(n-1)$ , i.e. if $i\neq j$ , then $\rho _{(i,j)}=\rho >-1/(n-1)$ . Then

${\begin{aligned}\operatorname {Var} ({\hat {\mu }})&=\operatorname {Var} \left({\frac {1}{n}}Y_{1}+{\frac {1}{n}}Y_{2}+\cdots +{\frac {1}{n}}Y_{n}\right)\\[5pt]&=\sum _{i=1}^{n}{\frac {1}{n^{2}}}\operatorname {Var} (Y_{i})+\sum _{i=1}^{n}\sum _{j=1,j\neq i}^{n}{\frac {1}{n^{2}}}\operatorname {Cov} (Y_{i},Y_{j})\\[5pt]&=n{\frac {\sigma ^{2}}{n^{2}}}+n(n-1){\frac {\sigma ^{2}\rho }{n^{2}}}\\[5pt]&=\sigma ^{2}{\frac {1+(n-1)\rho }{n}}.\end{aligned}}$

Therefore

$n_{\text{eff}}={\frac {n}{1+(n-1)\rho }}.$

If $\rho =0$, then $n_{\text{eff}}=n$. Similarly, if $\rho =1$, then $n_{\text{eff}}=1$. And if $-1/(n-1)<\rho <0$, then $n_{\text{eff}}>n$.

The case where the correlations are not uniform is somewhat more complicated. Note that if the correlation is negative, the effective sample size may be larger than the actual sample size. If we allow the more general form ${\hat {\mu }}=\sum _{i=1}^{n}a_{i}y_{i}$ (where $\sum _{i=1}^{n}a_{i}=1$ ) then it is possible to construct correlation matrices that have an $n_{\text{eff}}>n$ even when all correlations are positive. Intuitively, the maximal value of $n_{\text{eff}}$ over all choices of the coefficients $a_{i}$ may be thought of as the information content of the observed data.
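The equicorrelated formula above is easy to tabulate. Here is a minimal Python sketch (the function name `n_eff_equicorrelated` is ours) that recovers the limiting cases just discussed:

```python
def n_eff_equicorrelated(n, rho):
    """Effective sample size of n equally correlated observations,
    n_eff = n / (1 + (n - 1) * rho), valid for rho in (-1/(n-1), 1]."""
    if not (-1.0 / (n - 1) < rho <= 1.0):
        raise ValueError("rho must lie in (-1/(n-1), 1]")
    return n / (1 + (n - 1) * rho)

print(n_eff_equicorrelated(100, 0.0))     # independent: n_eff = n = 100
print(n_eff_equicorrelated(100, 1.0))     # perfectly correlated: n_eff = 1
print(n_eff_equicorrelated(100, 0.1))     # mild correlation shrinks n_eff to ~9.2
print(n_eff_equicorrelated(100, -0.005))  # negative correlation: n_eff > n
```

Note how even a modest $\rho = 0.1$ collapses 100 observations to fewer than 10 effective ones, while a small negative correlation pushes $n_{\text{eff}}$ above $n$.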

## Weighted samples

If the data have been weighted (the weights need not be normalized, i.e., sum to 1, to $n$, or to any other constant), then the observations composing the sample can be viewed as draws that are effectively 100% correlated with some previous draw. In this case, the effective sample size is known as Kish's effective sample size (Kish 1965, pp. 162, 259):

$n_{\text{eff}}={\frac {n}{D_{\text{eff}}}}={\frac {n}{\frac {\overline {w^{2}}}{{\overline {w}}^{2}}}}={\frac {n}{\frac {{\frac {1}{n}}\sum _{i=1}^{n}w_{i}^{2}}{\left({\frac {1}{n}}\sum _{i=1}^{n}w_{i}\right)^{2}}}}={\frac {n}{\frac {n\sum _{i=1}^{n}w_{i}^{2}}{(\sum _{i=1}^{n}w_{i})^{2}}}}={\frac {(\sum _{i=1}^{n}w_{i})^{2}}{\sum _{i=1}^{n}w_{i}^{2}}}$
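The final expression $(\sum w_i)^2 / \sum w_i^2$ is straightforward to compute from a list of weights. A minimal Python sketch (the helper name `kish_ess` is ours):

```python
def kish_ess(weights):
    """Kish's effective sample size, (sum w)^2 / (sum w^2).

    Invariant under rescaling all weights by a constant, and equal to n
    exactly when all weights are equal."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2

print(kish_ess([1, 1, 1, 1]))        # equal weights: ESS = n = 4
print(kish_ess([2, 2, 2, 2]))        # rescaling changes nothing: still 4
print(kish_ess([1, 0.1, 0.1, 0.1]))  # one dominant weight: ESS ~ 1.64
```

The last call illustrates the point of the measure: four nominal observations with one dominant weight carry roughly the information of 1.6 equally weighted ones.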

1. Leinster, Tom (December 18, 2014). "Effective Sample Size".
2. Kish, Leslie (1965). Survey Sampling. New York: John Wiley & Sons. ISBN 0-471-10949-5.