Control variates

The control variates method is a variance reduction technique used in Monte Carlo methods. It exploits information about the errors in estimates of known quantities to reduce the error of an estimate of an unknown quantity. [1] [2] [3]

Underlying principle

Let the unknown parameter of interest be $\mu$, and assume we have a statistic $m$ such that the expected value of $m$ is $\mu$: $\mathbb{E}[m] = \mu$, i.e. $m$ is an unbiased estimator for $\mu$. Suppose we calculate another statistic $t$ such that $\mathbb{E}[t] = \tau$ is a known value. Then

$$m^{\star} = m + c\,(t - \tau)$$

is also an unbiased estimator for $\mu$ for any choice of the coefficient $c$. The variance of the resulting estimator is

$$\operatorname{Var}(m^{\star}) = \operatorname{Var}(m) + c^{2}\,\operatorname{Var}(t) + 2c\,\operatorname{Cov}(m, t).$$

By differentiating the above expression with respect to $c$, it can be shown that choosing the optimal coefficient

$$c^{\star} = -\frac{\operatorname{Cov}(m, t)}{\operatorname{Var}(t)}$$

minimizes the variance of $m^{\star}$. (Note that this coefficient is the same as the coefficient obtained from a linear regression.) With this choice,

$$\operatorname{Var}(m^{\star}) = \left(1 - \rho_{m,t}^{2}\right)\operatorname{Var}(m),$$

where

$$\rho_{m,t} = \operatorname{Corr}(m, t)$$

is the correlation coefficient of $m$ and $t$. The greater the value of $|\rho_{m,t}|$, the greater the variance reduction achieved.

In the case that $\operatorname{Cov}(m, t)$, $\operatorname{Var}(t)$, and/or $\rho_{m,t}$ are unknown, they can be estimated across the Monte Carlo replicates. This is equivalent to solving a certain least squares system; therefore this technique is also known as regression sampling.
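
As a concrete sketch of regression sampling (a minimal example assuming NumPy is available; the function name and arguments are illustrative, not taken from the cited sources), the coefficient $c^{\star}$ can be estimated from the same replicates used to form the estimate:

```python
import numpy as np

def control_variate_mean(m_samples, t_samples, tau):
    """Estimate E[m] using replicates of t as a control variate with known mean tau.

    The optimal coefficient c* = -Cov(m, t) / Var(t) is itself estimated
    from the replicates, which is the regression-sampling view.
    """
    m_samples = np.asarray(m_samples, dtype=float)
    t_samples = np.asarray(t_samples, dtype=float)
    cov = np.cov(m_samples, t_samples)   # 2x2 sample covariance matrix
    c_star = -cov[0, 1] / cov[1, 1]      # estimated optimal coefficient
    return m_samples.mean() + c_star * (t_samples.mean() - tau)
```

Because $c^{\star}$ is estimated from the same samples it is then applied to, the combined estimator acquires a small bias of order $1/n$, which is typically negligible next to the variance reduction.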

When the expectation of the control variable, $\mathbb{E}[t] = \tau$, is not known analytically, it is still possible to increase the precision in estimating $\mu$ (for a given fixed simulation budget), provided that two conditions are met: 1) evaluating $t$ is significantly cheaper than computing $m$; 2) the magnitude of the correlation coefficient $|\rho_{m,t}|$ is close to unity. [3]

Example

We would like to estimate

$$I = \int_{0}^{1} \frac{1}{1+x}\,dx$$

using Monte Carlo integration. This integral is the expected value of $f(U)$, where

$$f(x) = \frac{1}{1+x}$$

and $U$ follows a uniform distribution on $[0, 1]$. Using a sample of size $n$, denote the points in the sample as $u_1, \ldots, u_n$. Then the estimate is given by

$$I \approx \frac{1}{n} \sum_{i=1}^{n} f(u_i).$$

Now we introduce $g(x) = 1 + x$ as a control variate with a known expected value $\mathbb{E}[g(U)] = \int_{0}^{1} (1+x)\,dx = \tfrac{3}{2}$ and combine the two into a new estimate

$$I \approx \frac{1}{n} \sum_{i=1}^{n} f(u_i) + c \left( \frac{1}{n} \sum_{i=1}^{n} g(u_i) - \tfrac{3}{2} \right).$$

Using $n = 1500$ realizations and an estimated optimal coefficient $c^{\star} \approx 0.4773$, we obtain the following results:

                     Estimate    Variance
Classical estimate   0.69475     0.01947
Control variates     0.69295     0.00060

The variance was significantly reduced after using the control variates technique. (The exact result is $\ln 2 \approx 0.69314718$.)
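
For concreteness, this example can be reproduced with a short simulation (a sketch assuming NumPy; the seed is arbitrary, so the printed figures will differ slightly from the table above, which depends on the particular random draws):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1500
u = rng.uniform(0.0, 1.0, size=n)

f = 1.0 / (1.0 + u)   # integrand samples; E[f(U)] = ln 2
g = 1.0 + u           # control variate samples; E[g(U)] = 3/2

# Estimated optimal coefficient c* = -Cov(f, g) / Var(g)
cov = np.cov(f, g)
c_star = -cov[0, 1] / cov[1, 1]

classical = f.mean()
controlled = f.mean() + c_star * (g.mean() - 1.5)

# Per-sample variances, comparable to the table above
var_classical = f.var(ddof=1)
var_controlled = (f + c_star * (g - 1.5)).var(ddof=1)

print(f"classical:        {classical:.5f}  (variance {var_classical:.5f})")
print(f"control variates: {controlled:.5f}  (variance {var_controlled:.5f})")
print(f"exact:            {np.log(2.0):.5f}")
```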

Notes

  1. Lemieux, C. (2017). "Control Variates". Wiley StatsRef: Statistics Reference Online: 1–8. doi:10.1002/9781118445112.stat07947. ISBN 9781118445112.
  2. Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering. New York: Springer. ISBN 0-387-00451-3. p. 185.
  3. Botev, Z.; Ridder, A. (2017). "Variance Reduction". Wiley StatsRef: Statistics Reference Online: 1–6. doi:10.1002/9781118445112.stat07975. ISBN 9781118445112.
