Monte Carlo integration

Last updated
An illustration of Monte Carlo integration. In this example, the domain D is the inner circle and the domain E is the square. Because the square's area (4) can be easily calculated, the area of the circle (p*1.0 ) can be estimated by the ratio (0.8) of the points inside the circle (40) to the total number of points (50), yielding an approximation for the circle's area of 4*0.8 = 3.2 [?] p. MonteCarloIntegrationCircle.svg
An illustration of Monte Carlo integration. In this example, the domain D is the inner circle and the domain E is the square. Because the square's area (4) can be easily calculated, the area of the circle (π*1.0 ) can be estimated by the ratio (0.8) of the points inside the circle (40) to the total number of points (50), yielding an approximation for the circle's area of 4*0.8 = 3.2 ≈ π.

In mathematics, Monte Carlo integration is a technique for numerical integration using random numbers. It is a particular Monte Carlo method that numerically computes a definite integral. While other algorithms usually evaluate the integrand at a regular grid, [1] Monte Carlo randomly chooses points at which the integrand is evaluated. [2] This method is particularly useful for higher-dimensional integrals. [3]

Contents

There are different methods to perform a Monte Carlo integration, such as uniform sampling, stratified sampling, importance sampling, sequential Monte Carlo (also known as a particle filter), and mean-field particle methods.

Overview

In numerical integration, methods such as the trapezoidal rule use a deterministic approach. Monte Carlo integration, on the other hand, employs a non-deterministic approach: each realization provides a different outcome. In Monte Carlo, the final outcome is an approximation of the correct value with respective error bars, and the correct value is likely to be within those error bars.

The problem Monte Carlo integration addresses is the computation of a multidimensional definite integral where Ω, a subset of , has volume

The naive Monte Carlo approach is to sample points uniformly on Ω: [4] given N uniform samples,

I can be approximated by

This is because the law of large numbers ensures that

Given the estimation of I from QN, the error bars of QN can be estimated by the sample variance using the unbiased estimate of the variance.

which leads to

Since the sequence is bounded due to being identically equal to Var(f), as long as this is assumed finite, this variance decreases asymptotically to zero as 1/N. The estimation of the error of QN is thus which decreases as . This is standard error of the mean multiplied with . This result does not depend on the number of dimensions of the integral, which is the promised advantage of Monte Carlo integration against most deterministic methods that depend exponentially on the dimension. [5] It is important to notice that, unlike in deterministic methods, the estimate of the error is not a strict error bound; random sampling may not uncover all the important features of the integrand that can result in an underestimate of the error.

While the naive Monte Carlo works for simple examples, an improvement over deterministic algorithms can only be accomplished with algorithms that use problem-specific sampling distributions. With an appropriate sample distribution it is possible to exploit the fact that almost all higher-dimensional integrands are very localized and only small subspace notably contributes to the integral. [6] A large part of the Monte Carlo literature is dedicated in developing strategies to improve the error estimates. In particular, stratified sampling—dividing the region in sub-domains—and importance sampling—sampling from non-uniform distributions—are two examples of such techniques.

Example

Relative error as a function of the number of samples, showing the scaling
1
N
{\displaystyle {\tfrac {1}{\sqrt {N}}}} Relative error of a Monte Carlo integration to calculate pi.svg
Relative error as a function of the number of samples, showing the scaling

A paradigmatic example of a Monte Carlo integration is the estimation of π. Consider the function and the set Ω = [−1,1] × [−1,1] with V = 4. Notice that

Thus, a crude way of calculating the value of π with Monte Carlo integration is to pick N random numbers on Ω and compute

In the figure on the right, the relative error is measured as a function of N, confirming the .

C/C++ example

Keep in mind that a true random number generator should be used.

#include<stdio.h>#include<stdlib.h>#include<time.h>intmain(){// Initialize the number of counts to 0, setting the total number to be 100000 in the loop.intthrows=99999,insideCircle=0;doublerandX,randY,pi;srand(time(NULL));// Checks for each random pair of x and y if they are inside circle of radius 1.for(inti=0;i<throws;i++){randX=rand()/(double)RAND_MAX;randY=rand()/(double)RAND_MAX;if(randX*randX+randY*randY<1){insideCircle++;}}// Calculating pi and printing.pi=4.0*insideCircle/throws;printf("%lf\n",pi);}

Python example

Made in Python.

fromnumpyimportrandomthrows=2000inside_circle=0i=0radius=1whilei<throws:# Choose random X and Y centered around 0,0x=random.uniform(-radius,radius)y=random.uniform(-radius,radius)# If the point is inside circle, increase variableifx**2+y**2<=radius**2:inside_circle+=1i+=1# Calculate area and print; should be closer to Pi with increasing number of throwsarea=(((2*radius)**2)*inside_circle)/throwsprint(area)

Wolfram Mathematica example

The code below describes a process of integrating the function from using the Monte-Carlo method in Mathematica:

func[x_]:=1/(1+Sinh[2*x]*(Log[x])^2);(*Sample from truncated normal distribution to speed up convergence*)Distrib[x_,average_,var_]:=PDF[NormalDistribution[average,var],1.1*x-0.1];n=10;RV=RandomVariate[TruncatedDistribution[{0.8,3},NormalDistribution[1,0.399]],n];Int=1/nTotal[func[RV]/Distrib[RV,1,0.399]]*Integrate[Distrib[x,1,0.399],{x,0.8,3}]NIntegrate[func[x],{x,0.8,3}](*Compare with real answer*)

Recursive stratified sampling

An illustration of Recursive Stratified Sampling. In this example, the function:
f
(
x
,
y
)
=
{
1
x
2
+
y
2
<
1
0
x
2
+
y
2
>=
1
{\displaystyle f(x,y)={\begin{cases}1&x^{2}+y^{2}<1\\0&x^{2}+y^{2}\geq 1\end{cases}}}

from the above illustration was integrated within a unit square using the suggested algorithm. The sampled points were recorded and plotted. Clearly stratified sampling algorithm concentrates the points in the regions where the variation of the function is largest. Strata.png
An illustration of Recursive Stratified Sampling. In this example, the function:
from the above illustration was integrated within a unit square using the suggested algorithm. The sampled points were recorded and plotted. Clearly stratified sampling algorithm concentrates the points in the regions where the variation of the function is largest.

Recursive stratified sampling is a generalization of one-dimensional adaptive quadratures to multi-dimensional integrals. On each recursion step the integral and the error are estimated using a plain Monte Carlo algorithm. If the error estimate is larger than the required accuracy the integration volume is divided into sub-volumes and the procedure is recursively applied to sub-volumes.

The ordinary 'dividing by two' strategy does not work for multi-dimensions as the number of sub-volumes grows far too quickly to keep track. Instead one estimates along which dimension a subdivision should bring the most dividends and only subdivides the volume along this dimension.

The stratified sampling algorithm concentrates the sampling points in the regions where the variance of the function is largest thus reducing the grand variance and making the sampling more effective, as shown on the illustration.

The popular MISER routine implements a similar algorithm.

MISER Monte Carlo

The MISER algorithm is based on recursive stratified sampling. This technique aims to reduce the overall integration error by concentrating integration points in the regions of highest variance. [7]

The idea of stratified sampling begins with the observation that for two disjoint regions a and b with Monte Carlo estimates of the integral and and variances and , the variance Var(f) of the combined estimate is given by,

It can be shown that this variance is minimized by distributing the points such that,

Hence the smallest error estimate is obtained by allocating sample points in proportion to the standard deviation of the function in each sub-region.

The MISER algorithm proceeds by bisecting the integration region along one coordinate axis to give two sub-regions at each step. The direction is chosen by examining all d possible bisections and selecting the one which will minimize the combined variance of the two sub-regions. The variance in the sub-regions is estimated by sampling with a fraction of the total number of points available to the current step. The same procedure is then repeated recursively for each of the two half-spaces from the best bisection. The remaining sample points are allocated to the sub-regions using the formula for Na and Nb. This recursive allocation of integration points continues down to a user-specified depth where each sub-region is integrated using a plain Monte Carlo estimate. These individual values and their error estimates are then combined upwards to give an overall result and an estimate of its error.

Importance sampling

There are a variety of importance sampling algorithms, such as

Importance sampling algorithm

Importance sampling provides a very important tool to perform Monte-Carlo integration. [3] [8] The main result of importance sampling to this method is that the uniform sampling of is a particular case of a more generic choice, on which the samples are drawn from any distribution . The idea is that can be chosen to decrease the variance of the measurement QN.

Consider the following example where one would like to numerically integrate a gaussian function, centered at 0, with σ = 1, from −1000 to 1000. Naturally, if the samples are drawn uniformly on the interval [−1000, 1000], only a very small part of them would be significant to the integral. This can be improved by choosing a different distribution from where the samples are chosen, for instance by sampling according to a gaussian distribution centered at 0, with σ = 1. Of course the "right" choice strongly depends on the integrand.

Formally, given a set of samples chosen from a distribution the estimator for I is given by [3]

Intuitively, this says that if we pick a particular sample twice as much as other samples, we weight it half as much as the other samples. This estimator is naturally valid for uniform sampling, the case where is constant.

The Metropolis–Hastings algorithm is one of the most used algorithms to generate from , [3] thus providing an efficient way of computing integrals.

VEGAS Monte Carlo

The VEGAS algorithm approximates the exact distribution by making a number of passes over the integration region which creates the histogram of the function f. Each histogram is used to define a sampling distribution for the next pass. Asymptotically this procedure converges to the desired distribution. [9] In order to avoid the number of histogram bins growing like Kd, the probability distribution is approximated by a separable function: so that the number of bins required is only Kd. This is equivalent to locating the peaks of the function from the projections of the integrand onto the coordinate axes. The efficiency of VEGAS depends on the validity of this assumption. It is most efficient when the peaks of the integrand are well-localized. If an integrand can be rewritten in a form which is approximately separable this will increase the efficiency of integration with VEGAS. VEGAS incorporates a number of additional features, and combines both stratified sampling and importance sampling. [9]

See also

Notes

  1. Press et al. 2007 , Chap. 4
  2. Press et al. 2007 , Chap. 7
  3. 1 2 3 4 Newman & Barkema 1999 , Chap. 2
  4. Newman & Barkema 1999 , Chap. 1
  5. Press et al. 2007
  6. MacKay 2003 , pp. 284–292
  7. Press & Farrar 1990 , pp. 190–195
  8. Kroese, Taimre & Botev 2011
  9. 1 2 Lepage 1978

Related Research Articles

<span class="mw-page-title-main">Autocorrelation</span> Correlation of a signal with a time-shifted copy of itself, as a function of shift

Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

<span class="mw-page-title-main">Normal distribution</span> Probability distribution

In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is The parameter is the mean or expectation of the distribution, while the parameter is the variance. The standard deviation of the distribution is (sigma). A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate.

<span class="mw-page-title-main">Standard deviation</span> In statistics, a measure of variation

In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its mean. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. The standard deviation is commonly used in the determination of what constitutes an outlier and what does not.

<span class="mw-page-title-main">Variance</span> Statistical measure of how far values spread from their average

In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by , , , , or .

The weighted arithmetic mean is similar to an ordinary arithmetic mean, except that instead of each of the data points contributing equally to the final average, some data points contribute more than others. The notion of weighted mean plays a role in descriptive statistics and also occurs in a more general form in several other areas of mathematics.

<span class="mw-page-title-main">Multivariate normal distribution</span> Generalization of the one-dimensional normal distribution to higher dimensions

In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables, each of which clusters around a mean value.

<span class="mw-page-title-main">Law of large numbers</span> Averages of repeated trials converge to the expected value

In probability theory, the law of large numbers (LLN) is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the LLN states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.

Covariance in probability theory and statistics is a measure of the joint variability of two random variables.

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive is because of randomness or because the estimator does not account for information that could produce a more accurate estimate. In machine learning, specifically empirical risk minimization, MSE may refer to the empirical risk, as an estimate of the true MSE.

The VEGAS algorithm, due to G. Peter Lepage, is a method for reducing error in Monte Carlo simulations by using a known or approximate probability distribution function to concentrate the search in those areas of the integrand that make the greatest contribution to the final integral.

In statistics, propagation of uncertainty is the effect of variables' uncertainties on the uncertainty of a function based on them. When the variables are the values of experimental measurements they have uncertainties due to measurement limitations which propagate due to the combination of variables in the function.

Importance sampling is a Monte Carlo method for evaluating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. Its introduction in statistics is generally attributed to a paper by Teun Kloek and Herman K. van Dijk in 1978, but its precursors can be found in statistical physics as early as 1949. Importance sampling is also related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both.

In statistics, sometimes the covariance matrix of a multivariate random variable is not known but has to be estimated. Estimation of covariance matrices then deals with the question of how to approximate the actual covariance matrix on the basis of a sample from the multivariate distribution. Simple cases, where observations are complete, can be dealt with by using the sample covariance matrix. The sample covariance matrix (SCM) is an unbiased and efficient estimator of the covariance matrix if the space of covariance matrices is viewed as an extrinsic convex cone in Rp×p; however, measured using the intrinsic geometry of positive-definite matrices, the SCM is a biased and inefficient estimator. In addition, if the random variable has a normal distribution, the sample covariance matrix has a Wishart distribution and a slightly differently scaled version of it is the maximum likelihood estimate. Cases involving missing data, heteroscedasticity, or autocorrelated residuals require deeper considerations. Another issue is the robustness to outliers, to which sample covariance matrices are highly sensitive.

Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements. In estimation theory, two approaches are generally considered:

<span class="mw-page-title-main">Ordinary least squares</span> Method for estimating the unknown parameters in a linear regression model

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable. Some sources consider OLS to be linear regression.

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model. It is used when there is a non-zero amount of correlation between the residuals in the regression model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.

<span class="mw-page-title-main">Variance reduction</span>

In mathematics, more specifically in the theory of Monte Carlo methods, variance reduction is a procedure used to increase the precision of the estimates obtained for a given simulation or computational effort. Every output random variable from the simulation is associated with a variance which limits the precision of the simulation results. In order to make a simulation statistically efficient, i.e., to obtain a greater precision and smaller confidence intervals for the output random variable of interest, variance reduction techniques can be used. The main variance reduction methods are

The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective.

The ensemble Kalman filter (EnKF) is a recursive filter suitable for problems with a large number of variables, such as discretizations of partial differential equations in geophysical models. The EnKF originated as a version of the Kalman filter for large problems, and it is now an important data assimilation component of ensemble forecasting. EnKF is related to the particle filter but the EnKF makes the assumption that all probability distributions involved are Gaussian; when it is applicable, it is much more efficient than the particle filter.

Multiple-try Metropolis (MTM) is a sampling method that is a modified form of the Metropolis–Hastings method, first presented by Liu, Liang, and Wong in 2000. It is designed to help the sampling trajectory converge faster, by increasing both the step size and the acceptance rate.

References