# Inverse-variance weighting

In statistics, inverse-variance weighting is a method of aggregating two or more random variables to minimize the variance of the weighted average. Each random variable is weighted in inverse proportion to its variance, i.e. proportional to its precision.

Given a sequence of independent observations ${\displaystyle y_{i}}$ with variances ${\displaystyle \sigma _{i}^{2}}$, the inverse-variance weighted average is given by [1]

${\displaystyle {\hat {y}}={\frac {\sum _{i}y_{i}/\sigma _{i}^{2}}{\sum _{i}1/\sigma _{i}^{2}}}.}$

Among all weighted averages, the inverse-variance weighted average has the least variance, which is

${\displaystyle Var({\hat {y}})={\frac {1}{\sum _{i}1/\sigma _{i}^{2}}}.}$

If the variances of the measurements are all equal, then the inverse-variance weighted average becomes the simple average.
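The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `inverse_variance_mean` and the sample numbers are assumptions made for the example, and the variances are taken to be known and strictly positive.

```python
def inverse_variance_mean(values, variances):
    """Return the inverse-variance weighted mean and its variance.

    Assumes independent observations with known, strictly positive variances.
    """
    if len(values) != len(variances) or not values:
        raise ValueError("values and variances must be non-empty and equally long")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")
    precisions = [1.0 / v for v in variances]   # 1 / sigma_i^2
    total_precision = sum(precisions)           # sum_i 1 / sigma_i^2
    mean = sum(p * y for p, y in zip(precisions, values)) / total_precision
    return mean, 1.0 / total_precision


# Hypothetical data: three observations with variances 1, 4 and 4.
mean, variance = inverse_variance_mean([10.0, 12.0, 14.0], [1.0, 4.0, 4.0])

# With equal variances the result coincides with the simple average.
equal_mean, equal_variance = inverse_variance_mean([10.0, 12.0, 14.0], [2.0, 2.0, 2.0])
```

In the first call the most precise observation dominates and pulls the estimate towards 10; in the second call all weights are equal and the estimate is the ordinary mean of the three values.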

Inverse-variance weighting is typically used in statistical meta-analysis or sensor fusion to combine the results from independent measurements.

## Context

Suppose an experimenter wishes to measure the value of a quantity, say the acceleration due to gravity of Earth, whose true value happens to be ${\displaystyle \mu }$. A careful experimenter makes multiple measurements, which we denote with ${\displaystyle n}$ random variables ${\displaystyle X_{1},X_{2},...,X_{n}}$. If they are all noisy but unbiased, i.e., the measuring device does not systematically overestimate or underestimate the true value and the errors are scattered symmetrically, then the expectation value is ${\displaystyle E[X_{i}]=\mu }$ for all ${\displaystyle i}$. The scatter in the measurement is then characterised by the variance of the random variables ${\displaystyle Var(X_{i}):=\sigma _{i}^{2}}$, and if the measurements are performed under identical scenarios, then all the ${\displaystyle \sigma _{i}}$ are the same, a common value we denote by ${\displaystyle \sigma }$. Given the ${\displaystyle n}$ measurements, a typical estimator for ${\displaystyle \mu }$, denoted as ${\displaystyle {\hat {\mu }}}$, is given by the simple average ${\displaystyle {\overline {X}}={\frac {1}{n}}\sum _{i}X_{i}}$. Note that this empirical average is itself a random variable: its expectation value ${\displaystyle E[{\overline {X}}]}$ is ${\displaystyle \mu }$, but it also has a scatter. If the individual measurements are uncorrelated, the square of the error in the estimate is given by ${\displaystyle Var({\overline {X}})={\frac {1}{n^{2}}}\sum _{i}\sigma _{i}^{2}}$, which reduces to ${\displaystyle \left({\frac {\sigma }{\sqrt {n}}}\right)^{2}}$ when all the ${\displaystyle \sigma _{i}}$ are equal. Hence, in that case the error in the estimate decreases with increasing ${\displaystyle n}$ as ${\displaystyle 1/{\sqrt {n}}}$, so making more observations is preferable.
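The ${\displaystyle 1/{\sqrt {n}}}$ behaviour of the simple average can be checked with a small simulation. The sketch below rests on assumptions that are not part of the text: the measurement errors are drawn from a normal distribution, the true value is set to 0, and the function name, trial count and seed are illustrative choices.

```python
import random
import statistics


def scatter_of_simple_average(n, sigma=1.0, trials=10000, seed=0):
    """Empirical standard deviation of the mean of n unbiased measurements."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    averages = [
        statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.pstdev(averages)


for n in (1, 4, 16):
    # The empirical scatter should be close to sigma / sqrt(n).
    print(n, scatter_of_simple_average(n), 1.0 / n ** 0.5)
```

Each printed row shows the simulated scatter next to the predicted value; quadrupling the number of measurements should roughly halve the scatter.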

Instead of ${\displaystyle n}$ repeated measurements with one instrument, if the experimenter makes ${\displaystyle n}$ measurements of the same quantity with ${\displaystyle n}$ different instruments of varying quality, then there is no reason to expect the different ${\displaystyle \sigma _{i}}$ to be the same. Some instruments could be noisier than others. In the example of measuring the acceleration due to gravity, the different "instruments" could be measuring ${\displaystyle g}$ from a simple pendulum, from analysing a projectile motion, etc. The simple average is no longer an optimal estimator, since the error in ${\displaystyle {\overline {X}}}$ might actually exceed the error in the least noisy measurement if different measurements have very different errors. Instead of discarding the noisy measurements that increase the final error, the experimenter can combine all the measurements with appropriate weights so as to give more importance to the less noisy measurements and less to the noisier ones. Given the knowledge of ${\displaystyle \sigma _{1}^{2},\sigma _{2}^{2},...,\sigma _{n}^{2}}$, an optimal estimator to measure ${\displaystyle \mu }$ would be a weighted mean of the measurements ${\displaystyle {\hat {\mu }}={\frac {\sum _{i}w_{i}X_{i}}{\sum _{i}w_{i}}}}$, for the particular choice of the weights ${\displaystyle w_{i}=1/\sigma _{i}^{2}}$. The variance of the estimator is ${\displaystyle Var({\hat {\mu }})={\frac {\sum _{i}w_{i}^{2}\sigma _{i}^{2}}{\left(\sum _{i}w_{i}\right)^{2}}}}$, which for the optimal choice of the weights becomes ${\displaystyle Var({\hat {\mu }}_{\text{opt}})=\left(\sum _{i}\sigma _{i}^{-2}\right)^{-1}.}$

Note that ${\displaystyle Var({\hat {\mu }}_{\text{opt}})<\min _{j}\sigma _{j}^{2}}$ whenever there are at least two measurements, so the estimator has a scatter smaller than the scatter in any individual measurement. Furthermore, the scatter in ${\displaystyle {\hat {\mu }}_{\text{opt}}}$ decreases as more measurements are added, however noisy those measurements may be.
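A small numerical comparison may make this concrete. The sketch below evaluates the general variance formula for a weighted mean with two hypothetical instruments, one precise and one noisy; the helper name `weighted_mean_variance` and the chosen variances are assumptions made for illustration.

```python
def weighted_mean_variance(weights, variances):
    """Variance of sum(w_i X_i) / sum(w_i) for uncorrelated X_i."""
    total = sum(weights)
    return sum(w * w * s2 for w, s2 in zip(weights, variances)) / (total * total)


variances = [1.0, 100.0]  # hypothetical: one precise and one noisy instrument

# Simple average: equal weights.
var_simple = weighted_mean_variance([1.0, 1.0], variances)

# Inverse-variance weights w_i = 1 / sigma_i^2.
var_optimal = weighted_mean_variance([1.0 / s2 for s2 in variances], variances)

print(var_simple)   # 25.25, worse than the precise instrument alone (1.0)
print(var_optimal)  # roughly 0.990, slightly better than the precise instrument alone
```

With these numbers the simple average is far worse than using the precise instrument on its own, while the inverse-variance weighting still extracts a small gain from the noisy one.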

## Derivation

Consider a generic weighted sum ${\displaystyle Y=\sum _{i}w_{i}X_{i}}$, where the weights ${\displaystyle w_{i}}$ are normalised such that ${\displaystyle \sum _{i}w_{i}=1}$. If the ${\displaystyle X_{i}}$ are all independent, the variance of ${\displaystyle Y}$ is given by

${\displaystyle Var(Y)=\sum _{i}w_{i}^{2}\sigma _{i}^{2}.}$

For optimality, we wish to minimise ${\displaystyle Var(Y)}$, which can be done by setting the gradient of ${\displaystyle Var(Y)}$ with respect to the weights to zero while maintaining the constraint that ${\displaystyle \sum _{i}w_{i}=1}$. Using a Lagrange multiplier ${\displaystyle w_{0}}$ to enforce the constraint, we form the Lagrangian

${\displaystyle {\mathcal {L}}=\sum _{i}w_{i}^{2}\sigma _{i}^{2}-w_{0}\left(\sum _{i}w_{i}-1\right).}$

For ${\displaystyle k>0}$,

${\displaystyle 0={\frac {\partial {\mathcal {L}}}{\partial w_{k}}}=2w_{k}\sigma _{k}^{2}-w_{0},}$

which implies that

${\displaystyle w_{k}={\frac {w_{0}/2}{\sigma _{k}^{2}}}.}$

The main takeaway here is that ${\displaystyle w_{k}\propto 1/\sigma _{k}^{2}}$. Since ${\displaystyle \sum _{i}w_{i}=1}$,

${\displaystyle {\frac {2}{w_{0}}}=\sum _{i}{\frac {1}{\sigma _{i}^{2}}}:={\frac {1}{\sigma _{0}^{2}}}.}$

The individual normalised weights are

${\displaystyle w_{k}={\frac {1}{\sigma _{k}^{2}}}\left(\sum _{i}{\frac {1}{\sigma _{i}^{2}}}\right)^{-1}.}$

Because the variance is a quadratic function of the weights with positive coefficients ${\displaystyle \sigma _{i}^{2}}$, the second partial derivative test confirms that this extremum is a minimum. The minimum variance of the estimator is then given by

${\displaystyle Var(Y)=\sum _{i}{\frac {\sigma _{0}^{4}}{\sigma _{i}^{4}}}\sigma _{i}^{2}=\sigma _{0}^{4}\sum _{i}{\frac {1}{\sigma _{i}^{2}}}=\sigma _{0}^{4}{\frac {1}{\sigma _{0}^{2}}}=\sigma _{0}^{2}={\frac {1}{\sum _{i}1/\sigma _{i}^{2}}}.}$
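The result of the derivation can be checked numerically: any other set of normalised weights should give a variance at least as large as the inverse-variance weights do. The following sketch perturbs the optimal weights at random while keeping their sum equal to one. The function names, the example variances and the perturbation size are illustrative assumptions.

```python
import random


def variance_of_weighted_sum(weights, variances):
    """Var(Y) = sum_i w_i^2 sigma_i^2 for independent X_i."""
    return sum(w * w * s2 for w, s2 in zip(weights, variances))


def optimal_weights(variances):
    """Normalised weights w_k = (1 / sigma_k^2) / sum_i (1 / sigma_i^2)."""
    precisions = [1.0 / s2 for s2 in variances]
    total = sum(precisions)
    return [p / total for p in precisions]


def count_violations(variances, trials=1000, seed=0):
    """Count random normalised weightings that beat the optimal variance."""
    rng = random.Random(seed)
    w_opt = optimal_weights(variances)
    v_opt = variance_of_weighted_sum(w_opt, variances)
    violations = 0
    for _ in range(trials):
        delta = [rng.uniform(-0.1, 0.1) for _ in variances]
        shift = sum(delta) / len(delta)
        # Subtract the mean so the perturbed weights still sum to one.
        perturbed = [w + d - shift for w, d in zip(w_opt, delta)]
        if variance_of_weighted_sum(perturbed, variances) < v_opt - 1e-12:
            violations += 1
    return violations


variances = [0.5, 2.0, 8.0]  # hypothetical variances
print(count_violations(variances))  # expected to be 0
```

A count of zero is consistent with the extremum being a minimum; it is a sanity check rather than a proof.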

### Normal Distributions

For normally distributed random variables, inverse-variance weighted averages can also be derived as the maximum likelihood estimate for the true value. Furthermore, from a Bayesian perspective, the posterior distribution for the true value given normally distributed observations ${\displaystyle y_{i}}$ and a flat prior is a normal distribution with the inverse-variance weighted average as its mean and the minimum variance ${\displaystyle Var(Y)}$ derived above as its variance.
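The maximum likelihood argument can be sketched as follows, assuming independent observations ${\displaystyle y_{i}}$ drawn from normal distributions with a common mean ${\displaystyle \mu }$ and known variances ${\displaystyle \sigma _{i}^{2}}$. The symbol ${\displaystyle \ell }$ for the log-likelihood is introduced here for illustration. Up to an additive constant that does not depend on ${\displaystyle \mu }$,

${\displaystyle \ell (\mu )=-{\frac {1}{2}}\sum _{i}{\frac {(y_{i}-\mu )^{2}}{\sigma _{i}^{2}}}.}$

Setting the derivative with respect to ${\displaystyle \mu }$ to zero gives

${\displaystyle 0={\frac {d\ell }{d\mu }}=\sum _{i}{\frac {y_{i}-\mu }{\sigma _{i}^{2}}},}$

and solving for ${\displaystyle \mu }$ yields

${\displaystyle {\hat {\mu }}={\frac {\sum _{i}y_{i}/\sigma _{i}^{2}}{\sum _{i}1/\sigma _{i}^{2}}},}$

which is the inverse-variance weighted average.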

## Multivariate Case

For multivariate distributions an equivalent argument leads to an optimal weighting based on the covariance matrices ${\displaystyle \Sigma _{i}}$ of the individual estimates ${\displaystyle x_{i}}$:

${\displaystyle {\hat {x}}=\left(\sum _{i}\Sigma _{i}^{-1}\right)^{-1}\sum _{i}\Sigma _{i}^{-1}x_{i}}$

${\displaystyle Var({\hat {x}})=\left(\sum _{i}\Sigma _{i}^{-1}\right)^{-1}}$

For multivariate distributions the term "precision-weighted" average is more commonly used.
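As a rough illustration of the multivariate formulas, the sketch below combines two-dimensional estimates using only the Python standard library. It is restricted to 2×2 covariance matrices so that the inverse can be written out by hand; the helper names and the example numbers are assumptions made for this example, and a real application would more likely use a linear algebra library.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec2(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]


def precision_weighted_mean(estimates, covariances):
    """Combine 2D estimates x_i with covariance matrices Sigma_i."""
    total_precision = [[0.0, 0.0], [0.0, 0.0]]  # sum_i Sigma_i^{-1}
    weighted_sum = [0.0, 0.0]                   # sum_i Sigma_i^{-1} x_i
    for x, cov in zip(estimates, covariances):
        precision = inv2(cov)
        for i in range(2):
            for j in range(2):
                total_precision[i][j] += precision[i][j]
        px = matvec2(precision, x)
        weighted_sum = [weighted_sum[0] + px[0], weighted_sum[1] + px[1]]
    combined_cov = inv2(total_precision)
    return matvec2(combined_cov, weighted_sum), combined_cov


# Hypothetical estimates: each one is precise along a different axis.
x_hat, cov_hat = precision_weighted_mean(
    [[1.0, 0.0], [3.0, 2.0]],
    [[[1.0, 0.0], [0.0, 4.0]], [[4.0, 0.0], [0.0, 1.0]]],
)
```

In this example each component of the combined estimate lies closer to the estimate that is more precise along that axis, and the combined variance of each component is 0.8, smaller than either input variance.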

## References

1. Joachim Hartung; Guido Knapp; Bimal K. Sinha (2008). John Wiley & Sons. ISBN 978-0-470-29089-7.