Variance reduction

Last updated December 27, 2024

Figure (StratifiedPoints.gif): The variance of randomly generated points within a unit square can be reduced through a stratification process.

In mathematics, more specifically in the theory of Monte Carlo methods, variance reduction is a procedure used to increase the precision of the estimates obtained for a given simulation or computational effort.^[1] Every output random variable from the simulation is associated with a variance which limits the precision of the simulation results. To make a simulation statistically efficient, i.e., to obtain greater precision and smaller confidence intervals for the output random variable of interest, variance reduction techniques can be used. The main variance reduction methods include common random numbers, antithetic variates, control variates, importance sampling and stratified sampling.

For simulation with black-box models, subset simulation and line sampling can also be used. Under these headings are a variety of specialized techniques; for example, particle transport simulations make extensive use of "weight windows" and "splitting/Russian roulette" techniques, which are a form of importance sampling.

Crude Monte Carlo simulation

Suppose one wants to compute $z:=E(Z)$ with the random variable $Z$ defined on the probability space $(\Omega ,{\mathcal {F}},P)$. Monte Carlo does this by sampling i.i.d. copies $Z_{1},\ldots ,Z_{n}$ of $Z$ and then estimating $z$ via the sample-mean estimator

$${\overline {z}}={\frac {1}{n}}\sum _{i=1}^{n}Z_{i}.$$

Under further mild conditions such as $\operatorname {Var} (Z)<\infty$, a central limit theorem applies: for large $n$, the distribution of ${\overline {z}}$ is approximately normal with mean $z$ and standard error $\sigma /{\sqrt {n}}$, where $\sigma ^{2}=\operatorname {Var} (Z)$. Because the standard error converges towards $0$ only at the rate $1/{\sqrt {n}}$, one needs to increase the number of simulations ($n$) by a factor of $4$ to halve the standard error of ${\overline {z}}$. Variance reduction methods are therefore often useful for obtaining more precise estimates of $z$ without needing very large numbers of simulations.
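The square-root rate can be checked numerically. The following is a minimal sketch, not a reference implementation: the target $Z=U^{2}$ with $U$ uniform on $(0,1)$ (so that $E(Z)=1/3$) and the names `crude_monte_carlo` and `sample_z` are assumptions chosen purely for illustration.

```python
import math
import random
import statistics


def crude_monte_carlo(sample_z, n, seed=0):
    """Return (sample mean, estimated standard error) from n i.i.d. draws."""
    rng = random.Random(seed)
    draws = [sample_z(rng) for _ in range(n)]
    mean = statistics.fmean(draws)
    # Estimated standard error: sample standard deviation divided by sqrt(n).
    std_err = statistics.stdev(draws) / math.sqrt(n)
    return mean, std_err


def sample_z(rng):
    # Hypothetical target: Z = U^2 with U ~ Uniform(0, 1), so E(Z) = 1/3.
    return rng.random() ** 2


if __name__ == "__main__":
    # Quadrupling n should roughly halve the estimated standard error.
    for n in (10_000, 40_000):
        mean, std_err = crude_monte_carlo(sample_z, n)
        print(f"n={n:6d}  estimate={mean:.4f}  std_err={std_err:.5f}")
```

Running the script prints the estimate and its standard error for both sample sizes; the second standard error should be close to half of the first.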

Common Random Numbers (CRN)

Common random numbers is a popular and useful variance reduction technique that applies when we are comparing two or more alternative configurations (of a system) instead of investigating a single configuration. CRN has also been called correlated sampling, matched streams or matched pairs.

CRN requires synchronization of the random number streams, which ensures that in addition to using the same random numbers to simulate all configurations, a specific random number used for a specific purpose in one configuration is used for exactly the same purpose in all other configurations. For example, in queueing theory, if we are comparing two different configurations of tellers in a bank, we would want the (random) time of arrival of the N-th customer to be generated using the same draw from a random number stream for both configurations.
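One way to obtain this synchronization is to give each source of randomness its own stream. The sketch below assumes a hypothetical bank with a single first-come, first-served queue, exponential interarrival times (rate 1.0) and exponential service times (rate 1.25); the function name `simulate_bank` and all parameter values are illustrative and not taken from the source.

```python
import heapq
import random


def simulate_bank(num_tellers, num_customers, seed):
    """Return (mean waiting time, arrival times) for a FIFO bank queue."""
    # One dedicated stream per purpose: the N-th arrival time and the N-th
    # service time come from the same draws in every configuration.
    arrival_rng = random.Random(2 * seed)
    service_rng = random.Random(2 * seed + 1)

    free_at = [0.0] * num_tellers  # time at which each teller becomes free
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    arrivals = []
    for _ in range(num_customers):
        clock += arrival_rng.expovariate(1.0)        # interarrival time
        service = service_rng.expovariate(1.25)      # service time
        start = max(clock, heapq.heappop(free_at))   # earliest free teller
        total_wait += start - clock
        heapq.heappush(free_at, start + service)
        arrivals.append(clock)
    return total_wait / num_customers, arrivals


if __name__ == "__main__":
    wait_1, arrivals_1 = simulate_bank(1, 500, seed=3)
    wait_2, arrivals_2 = simulate_bank(2, 500, seed=3)
    print("same arrival times:", arrivals_1 == arrivals_2)
    print("mean wait, 1 teller :", round(wait_1, 3))
    print("mean wait, 2 tellers:", round(wait_2, 3))
```

Because the service time is drawn when the customer arrives rather than when service begins, the number of tellers does not change which draw is used for which customer.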

Underlying principle of the CRN technique

Suppose $X_{1j}$ and $X_{2j}$ are the observations from the first and second configurations on the j-th independent replication.

We want to estimate

$$\xi =E(X_{1j})-E(X_{2j})=\mu _{1}-\mu _{2}.$$

If we perform n replications of each configuration and let

$$Z_{j}=X_{1j}-X_{2j}\quad {\text{for }}j=1,2,\ldots ,n,$$

then $E(Z_{j})=\xi$ and $Z(n)={\frac {1}{n}}\sum _{j=1}^{n}Z_{j}$ is an unbiased estimator of $\xi$.

Since the $Z_{j}$ are independent and identically distributed random variables,

$$\operatorname {Var} [Z(n)]={\frac {\operatorname {Var} (Z_{j})}{n}}={\frac {\operatorname {Var} [X_{1j}]+\operatorname {Var} [X_{2j}]-2\operatorname {Cov} [X_{1j},X_{2j}]}{n}}.$$

In the case of independent sampling, i.e., when no common random numbers are used, $\operatorname {Cov} [X_{1j},X_{2j}]=0$. But if we succeed in inducing a positive correlation between $X_{1j}$ and $X_{2j}$, so that $\operatorname {Cov} [X_{1j},X_{2j}]>0$, the equation above shows that the variance is reduced.

Conversely, if CRN induces a negative correlation, i.e., $\operatorname {Cov} [X_{1j},X_{2j}]<0$, the technique backfires: the variance is increased rather than decreased.^[2]
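The three covariance cases can be compared numerically. The following is a rough sketch under assumed, purely illustrative choices: each configuration's response is the hypothetical monotone function `scale * u**2` of a uniform draw `u`, and the pairing labelled `"opposed"` (feeding `1 - u` to the second configuration) is introduced here only to force a negative covariance.

```python
import random
import statistics


def response(scale, u):
    """Hypothetical system response, increasing in the uniform draw u."""
    return scale * u ** 2


def difference_variance(pairing, n=20_000, seed=1):
    """Sample variance of Z_j = X_1j - X_2j for a given pairing of draws."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        u1 = rng.random()
        if pairing == "common":         # same draw: positive covariance
            u2 = u1
        elif pairing == "independent":  # separate draw: zero covariance
            u2 = rng.random()
        elif pairing == "opposed":      # mirrored draw: negative covariance
            u2 = 1.0 - u1
        else:
            raise ValueError(f"unknown pairing: {pairing}")
        diffs.append(response(1.0, u1) - response(0.8, u2))
    return statistics.variance(diffs)


if __name__ == "__main__":
    for pairing in ("common", "independent", "opposed"):
        print(f"{pairing:12s} Var(Z_j) ~ {difference_variance(pairing):.4f}")
```

In this toy model the common pairing should give the smallest sample variance and the opposed pairing the largest, in line with the sign of the covariance term in the equation above.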

References

[1] Botev, Z.; Ridder, A. (2017). "Variance Reduction". Wiley StatsRef: Statistics Reference Online: 1–6. doi:10.1002/9781118445112.stat07975. ISBN 9781118445112.
[2] Hamrick, Jeff. "The Method of Common Random Numbers: An Example". Wolfram Demonstrations Project. Retrieved 29 March 2016.

Hammersley, J. M.; Handscomb, D. C. (1964). Monte Carlo Methods. London: Methuen. ISBN 0-416-52340-4.
Kahn, H.; Marshall, A. W. (1953). "Methods of Reducing Sample Size in Monte Carlo Computations". Journal of the Operations Research Society of America. 1 (5): 263–271. doi:10.1287/opre.1.5.263.
MCNP — A General Monte Carlo N-Particle Transport Code, Version 5 Los Alamos Report LA-UR-03-1987

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
