Laplace's method


In mathematics, Laplace's method, named after Pierre-Simon Laplace, is a technique used to approximate integrals of the form

$$\int_a^b e^{M f(x)}\,dx,$$

where $f$ is a twice-differentiable function, $M$ is a large number, and the endpoints $a$ and $b$ could possibly be infinite. This technique was originally presented in Laplace (1774).

In Bayesian statistics, Laplace's approximation can refer to either approximating the posterior normalizing constant with Laplace's method[1] or approximating the posterior distribution with a Gaussian centered at the maximum a posteriori estimate.[2] Laplace approximations are used in the integrated nested Laplace approximations method for fast approximate Bayesian inference.
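
As a concrete illustration, the following Python sketch applies the Laplace approximation to a posterior normalizing constant; the Beta-shaped unnormalized posterior and all numerical choices here are illustrative assumptions, not taken from the references above.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Hypothetical unnormalized log-posterior with a Beta(5, 3) shape on (0, 1):
# log p*(x) = 4*log(x) + 2*log(1 - x).
def log_post(x):
    return 4 * np.log(x) + 2 * np.log(1 - x)

# Maximum a posteriori (MAP) estimate: maximize the log-posterior.
res = minimize_scalar(lambda x: -log_post(x), bounds=(1e-9, 1 - 1e-9), method="bounded")
x_map = res.x  # analytically 2/3 here

# Second derivative of the log-posterior at the MAP (known in closed form here).
d2 = -4 / x_map**2 - 2 / (1 - x_map) ** 2

# Laplace approximation of the normalizing constant Z = ∫ p*(x) dx:
# Z ≈ p*(x_map) * sqrt(2*pi / (-d2)).
z_laplace = np.exp(log_post(x_map)) * np.sqrt(2 * np.pi / -d2)
z_exact, _ = quad(lambda x: np.exp(log_post(x)), 0, 1)  # Beta(5,3) norm = 1/105
print(z_laplace, z_exact)  # the two agree to within roughly 10% here
```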

Concept

Figure: $f(x) = \tfrac{\sin(x)}{x}$ has a global maximum at $x = 0$. $e^{M f(x)}$ is shown on top for $M = 0.5$, and at the bottom for $M = 3$ (both in blue). As $M$ grows, the approximation of this function by a Gaussian function (shown in red) improves. This observation underlies Laplace's method.

Suppose the function $f(x)$ has a unique global maximum at $x_0$. Let $M > 0$ be a constant and consider the following two functions:

$$g(x) = M f(x), \qquad h(x) = e^{M f(x)}.$$

Note that $x_0$ will be the global maximum of $g$ and $h$ as well. Now observe:

$$\frac{g(x_0)}{g(x)} = \frac{M f(x_0)}{M f(x)} = \frac{f(x_0)}{f(x)}, \qquad \frac{h(x_0)}{h(x)} = \frac{e^{M f(x_0)}}{e^{M f(x)}} = e^{M\left(f(x_0) - f(x)\right)}.$$

As $M$ increases, the ratio for $h$ will grow exponentially, while the ratio for $g$ does not change. Thus, significant contributions to the integral of this function will come only from points $x$ in a neighbourhood of $x_0$, which can then be estimated.
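
A small numerical sketch of this concentration effect; the choice $f(x) = \sin(x)/x$ (the function in the figure above) and the interval $[-4, 4]$ are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

# As M grows, the integral of exp(M*f(x)) is dominated by a neighbourhood
# of the maximizer x0 = 0. Here f(x) = sin(x)/x, via np.sinc(t) = sin(pi*t)/(pi*t).
f = lambda x: np.sinc(x / np.pi)

for M in (1, 10, 100):
    total, _ = quad(lambda x: np.exp(M * f(x)), -4, 4, points=[0])
    near, _ = quad(lambda x: np.exp(M * f(x)), -0.5, 0.5)
    print(M, near / total)  # fraction carried by |x| < 0.5 approaches 1
```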

General theory

To state and motivate the method, one must make several assumptions. It is assumed that $x_0$ is not an endpoint of the interval of integration and that the values $f(x)$ cannot be very close to $f(x_0)$ unless $x$ is close to $x_0$.

$f(x)$ can be expanded around $x_0$ by Taylor's theorem,

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + R,$$

where $R = O\!\left((x - x_0)^3\right)$ (see: big O notation).

Since $f$ has a global maximum at $x_0$, and since $x_0$ is not an endpoint, it is a stationary point, i.e. $f'(x_0) = 0$. Therefore, the second-order Taylor polynomial approximating $f(x)$ is

$$f(x) \approx f(x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2.$$

Then just one more step is needed to get a Gaussian distribution. Since $x_0$ is a global maximum of the function $f$, it can be stated, by definition of the second derivative, that $f''(x_0) \le 0$, thus giving the relation

$$f(x) \approx f(x_0) - \frac{1}{2} \left|f''(x_0)\right| (x - x_0)^2$$

for $x$ close to $x_0$. The integral can then be approximated with:

$$\int_a^b e^{M f(x)}\,dx \approx e^{M f(x_0)} \int_a^b e^{-\frac{1}{2} M \left|f''(x_0)\right| (x - x_0)^2}\,dx$$

(see the figure above). This latter integral is a Gaussian integral if the limits of integration go from −∞ to +∞ (which can be assumed because the exponential decays very fast away from $x_0$), and thus it can be calculated. This gives

$$\int_a^b e^{M f(x)}\,dx \approx \sqrt{\frac{2\pi}{M \left|f''(x_0)\right|}}\, e^{M f(x_0)} \quad \text{as } M \to \infty.$$
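
The leading-order formula can be checked numerically; in the sketch below, the choice $f(x) = \sin(x)/x$ (so $x_0 = 0$, $f(x_0) = 1$, $f''(x_0) = -1/3$) and the interval $[-4, 4]$ are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

# Compare the integral of exp(M f(x)) with sqrt(2*pi/(M |f''(x0)|)) * exp(M f(x0))
# for f(x) = sin(x)/x, whose maximum is f(0) = 1 with f''(0) = -1/3.
f = lambda x: np.sinc(x / np.pi)
f_x0, fpp_x0 = 1.0, -1.0 / 3.0

for M in (3, 10, 30, 100):
    exact, _ = quad(lambda x: np.exp(M * f(x)), -4, 4, points=[0])
    approx = np.sqrt(2 * np.pi / (M * abs(fpp_x0))) * np.exp(M * f_x0)
    print(M, exact / approx)  # ratio tends to 1 as M grows
```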

A generalization of this method and extension to arbitrary precision is provided by Fog (2008).

Formal statement and proof

Suppose $f$ is a twice continuously differentiable function on $[a, b]$, and there exists a unique point $x_0 \in (a, b)$ such that:

$$f(x_0) = \max_{x \in [a, b]} f(x) \quad \text{and} \quad f''(x_0) < 0.$$

Then:

$$\lim_{M \to \infty} \frac{\int_a^b e^{M f(x)}\,dx}{e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0)\right)}}} = 1.$$

Proof

Lower bound: Let $\varepsilon > 0$. Since $f''$ is continuous there exists $\delta > 0$ such that if $|x_0 - x| < \delta$ then $f''(x) \ge f''(x_0) - \varepsilon$. By Taylor's theorem, for any $x \in (x_0 - \delta, x_0 + \delta)$,

$$f(x) \ge f(x_0) + \frac{1}{2}\left(f''(x_0) - \varepsilon\right)(x - x_0)^2.$$

Then we have the following lower bound:

$$\begin{aligned} \int_a^b e^{M f(x)}\,dx &\ge \int_{x_0 - \delta}^{x_0 + \delta} e^{M f(x)}\,dx \\ &\ge e^{M f(x_0)} \int_{x_0 - \delta}^{x_0 + \delta} e^{\frac{M}{2}\left(f''(x_0) - \varepsilon\right)(x - x_0)^2}\,dx \\ &= e^{M f(x_0)} \sqrt{\frac{1}{M\left(-f''(x_0) + \varepsilon\right)}} \int_{-\delta \sqrt{M\left(-f''(x_0) + \varepsilon\right)}}^{\delta \sqrt{M\left(-f''(x_0) + \varepsilon\right)}} e^{-\frac{1}{2} y^2}\,dy \end{aligned}$$

where the last equality was obtained by a change of variables

$$y = \sqrt{M\left(-f''(x_0) + \varepsilon\right)}\,(x - x_0).$$

Remember $f''(x_0) < 0$, so we can take the square root of its negation.

If we divide both sides of the above inequality by

$$e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0)\right)}}$$

and take the limit we get:

$$\lim_{M \to \infty} \frac{\int_a^b e^{M f(x)}\,dx}{e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0)\right)}}} \ge \lim_{M \to \infty} \frac{1}{\sqrt{2\pi}} \int_{-\delta \sqrt{M\left(-f''(x_0) + \varepsilon\right)}}^{\delta \sqrt{M\left(-f''(x_0) + \varepsilon\right)}} e^{-\frac{1}{2} y^2}\,dy \; \sqrt{\frac{-f''(x_0)}{-f''(x_0) + \varepsilon}} = \sqrt{\frac{-f''(x_0)}{-f''(x_0) + \varepsilon}};$$

since this is true for arbitrary $\varepsilon$ we get the lower bound:

$$\liminf_{M \to \infty} \frac{\int_a^b e^{M f(x)}\,dx}{e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0)\right)}}} \ge 1.$$

Note that this proof works also when $a = -\infty$ or $b = \infty$ (or both).

Upper bound: The proof is similar to that of the lower bound but there are a few inconveniences. Again we start by picking an $\varepsilon > 0$, but in order for the proof to work we need $\varepsilon$ small enough so that $f''(x_0) + \varepsilon < 0$. Then, as above, by continuity of $f''$ and Taylor's theorem we can find $\delta > 0$ so that if $|x - x_0| < \delta$, then

$$f(x) \le f(x_0) + \frac{1}{2}\left(f''(x_0) + \varepsilon\right)(x - x_0)^2.$$

Lastly, by our assumptions (assuming $a, b$ are finite) there exists an $\eta > 0$ such that if $|x - x_0| \ge \delta$, then $f(x) \le f(x_0) - \eta$.

Then we can calculate the following upper bound:

$$\begin{aligned} \int_a^b e^{M f(x)}\,dx &\le (b - a)\, e^{M\left(f(x_0) - \eta\right)} + \int_{x_0 - \delta}^{x_0 + \delta} e^{M f(x)}\,dx \\ &\le (b - a)\, e^{M\left(f(x_0) - \eta\right)} + e^{M f(x_0)} \int_{x_0 - \delta}^{x_0 + \delta} e^{\frac{M}{2}\left(f''(x_0) + \varepsilon\right)(x - x_0)^2}\,dx \\ &\le (b - a)\, e^{M\left(f(x_0) - \eta\right)} + e^{M f(x_0)} \int_{-\infty}^{\infty} e^{\frac{M}{2}\left(f''(x_0) + \varepsilon\right)(x - x_0)^2}\,dx \\ &= (b - a)\, e^{M\left(f(x_0) - \eta\right)} + e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0) - \varepsilon\right)}}. \end{aligned}$$

If we divide both sides of the above inequality by

$$e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0)\right)}}$$

and take the limit we get:

$$\lim_{M \to \infty} \frac{\int_a^b e^{M f(x)}\,dx}{e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0)\right)}}} \le \lim_{M \to \infty} \left[ (b - a)\, e^{-M \eta} \sqrt{\frac{M\left(-f''(x_0)\right)}{2\pi}} + \sqrt{\frac{-f''(x_0)}{-f''(x_0) - \varepsilon}} \right] = \sqrt{\frac{-f''(x_0)}{-f''(x_0) - \varepsilon}}.$$

Since $\varepsilon$ is arbitrary we get the upper bound:

$$\limsup_{M \to \infty} \frac{\int_a^b e^{M f(x)}\,dx}{e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0)\right)}}} \le 1.$$

And combining this with the lower bound gives the result.

Note that the above proof obviously fails when $a = -\infty$ or $b = \infty$ (or both). To deal with these cases, we need some extra assumptions. A sufficient (not necessary) assumption is that for $M = 1$,

$$\int_a^b e^{f(x)}\,dx < \infty,$$

and that the number $\eta$ as above exists (note that this must be an assumption in the case when the interval $[a, b]$ is infinite). The proof proceeds otherwise as above, but with a slightly different approximation of integrals:

$$\int_{|x - x_0| \ge \delta} e^{M f(x)}\,dx \le \int_a^b e^{f(x)}\,dx \; e^{(M - 1)\left(f(x_0) - \eta\right)}.$$

When we divide by

$$e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0)\right)}},$$

we get for this term

$$\frac{\int_a^b e^{f(x)}\,dx \; e^{(M - 1)\left(f(x_0) - \eta\right)}}{e^{M f(x_0)} \sqrt{\frac{2\pi}{M\left(-f''(x_0)\right)}}} = \int_a^b e^{f(x)}\,dx \; e^{-(M - 1)\eta}\, e^{-f(x_0)} \sqrt{\frac{M\left(-f''(x_0)\right)}{2\pi}},$$

whose limit as $M \to \infty$ is $0$. The rest of the proof (the analysis of the interesting term) proceeds as above.

The given condition in the infinite interval case is, as said above, sufficient but not necessary. However, the condition is fulfilled in many, if not in most, applications: the condition simply says that the integral we are studying must be well-defined (not infinite) and that the maximum of the function at $x_0$ must be a "true" maximum (the number $\eta > 0$ must exist). There is no need to demand that the integral is finite for $M = 1$, but it is enough to demand that the integral is finite for some $M = N$.

This method relies on four basic concepts:

Concepts
1. Relative error

The “approximation” in this method is related to the relative error and not the absolute error. Therefore, if we set

$$s = \sqrt{\frac{2\pi}{M\left|f''(x_0)\right|}},$$

the integral can be written as

$$\int_a^b e^{M f(x)}\,dx = s\, e^{M f(x_0)} \frac{1}{s} \int_a^b e^{M\left(f(x) - f(x_0)\right)}\,dx = s\, e^{M f(x_0)} \int_{(a - x_0)/s}^{(b - x_0)/s} e^{M\left(f(sy + x_0) - f(x_0)\right)}\,dy,$$

where $s$ is a small number when $M$ is a large number, obviously, and the relative error will be

$$\left| \int_{(a - x_0)/s}^{(b - x_0)/s} e^{M\left(f(sy + x_0) - f(x_0)\right)}\,dy - 1 \right|.$$

Now, let us separate this integral into two parts: the region $y \in [-D_y, D_y]$ and the rest.

2. $e^{M\left[f(sy + x_0) - f(x_0)\right]}$ is close to $e^{-\pi y^2}$ around the stationary point when $M$ is large enough

Let’s look at the Taylor expansion of $M\left[f(sy + x_0) - f(x_0)\right]$ around $x_0$ and translate $x$ to $y$, because we do the comparison in $y$-space; we will get

$$M\left[f(sy + x_0) - f(x_0)\right] = \frac{M f''(x_0)}{2} s^2 y^2 + \frac{M f'''(x_0)}{6} s^3 y^3 + \cdots = -\pi y^2 + O\!\left(\frac{1}{\sqrt{M}}\right).$$

Note that $f'(x_0) = 0$ because $x_0$ is a stationary point. From this equation you will find that the terms higher than the second derivative in this Taylor expansion are suppressed as the order of $\tfrac{1}{\sqrt{M}}$, so that $\exp\left(M\left[f(sy + x_0) - f(x_0)\right]\right)$ will get closer to the Gaussian function, as shown in the figure. Besides,

$$\int_{-\infty}^{\infty} e^{-\pi y^2}\,dy = 1.$$

Figure: $e^{M\left[f(sy + x_0) - f(x_0)\right]}$ for $M$ equal to 1, 2 and 3; the red line is the curve of the function $e^{-\pi y^2}$.
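
Concept 2 can also be checked numerically; the sketch below reuses the illustrative $f(x) = \sin(x)/x$ (with $x_0 = 0$ and $|f''(x_0)| = 1/3$) and measures how far $e^{M[f(sy + x_0) - f(x_0)]}$ is from $e^{-\pi y^2}$ on a grid of $y$ values.

```python
import numpy as np

# In y-space, exp(M*[f(s*y + x0) - f(x0)]) approaches the fixed Gaussian
# exp(-pi*y^2) as M grows. Here f(x) = sin(x)/x, x0 = 0, |f''(x0)| = 1/3.
f = lambda x: np.sinc(x / np.pi)

y = np.linspace(-2, 2, 9)
gauss = np.exp(-np.pi * y**2)
for M in (1, 10, 100):
    s = np.sqrt(2 * np.pi / (M / 3.0))  # s = sqrt(2*pi / (M*|f''(x0)|))
    scaled = np.exp(M * (f(s * y) - 1.0))
    print(M, np.max(np.abs(scaled - gauss)))  # maximum deviation shrinks with M
```
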
3. The larger $M$ is, the smaller the related range of $x$

Because we do the comparison in $y$-space, $y$ is fixed in $y \in [-D_y, D_y]$, which will cause $x \in [x_0 - sD_y, x_0 + sD_y]$; since $s$ is inversely proportional to $\sqrt{M}$, the chosen region of $x$ will be smaller when $M$ is increased.

4. If the integral in Laplace's method converges, the contribution of the region which is not around the stationary point to its relative error will tend to zero as $M$ grows

Relying on the 3rd concept, even if we choose a very large $D_y$, $sD_y$ will finally be a very small number when $M$ is increased to a huge number. Then, how can we guarantee the integral of the rest will tend to zero when $M$ is large enough?

The basic idea is to find a function $m(x)$ such that $m(x) \ge f(x)$ on the rest region and the integral of $e^{M m(x)}$ will tend to zero when $M$ grows. Because the exponential function of $M m(x)$ will always be larger than zero as long as $m(x)$ is a real number, and this exponential function is proportional to $e^{m(x)}$, the integral of $e^{M f(x)}$ over the rest region will tend to zero. For simplicity, choose $m(x)$ as a tangent through the point $f(x)$ at $x = sD_y + x_0$, as shown in the figure:

Figure: $m(x)$ is denoted by the two tangent lines passing through $x = \pm sD_y + x_0$. When $sD_y$ gets smaller, the cover region will be larger.

If the interval of integration of this method is finite, we will find that whether or not $f(x)$ is continuous in the rest region, it will always be smaller than $m(x)$ shown above when $M$ is large enough. By the way, it will be proved later that the integral of $e^{M m(x)}$ will tend to zero when $M$ is large enough.

If the interval of integration of this method is infinite, $m(x)$ and $f(x)$ might always cross each other. If so, we cannot guarantee that the integral of $e^{M f(x)}$ will tend to zero finally. For example, in the case of $f(x) = \tfrac{\sin(x)}{x}$, the integral $\int_0^\infty e^{M f(x)}\,dx$ will always diverge. Therefore, we need to require that $\int_d^\infty e^{M f(x)}\,dx$ can converge for the infinite interval case. If so, this integral will tend to zero when $d$ is large enough, and we can choose this $d$ as the cross of $m(x)$ and $f(x)$.

You might ask why not choose $\int_d^\infty e^{f(x)}\,dx$ as a convergent integral? Let me use an example to show you the reason. Suppose the rest part of $f(x)$ is $-\ln x$; then $e^{f(x)} = \tfrac{1}{x}$ and its integral will diverge. However, when $M = 2$, the integral of $e^{M f(x)} = \tfrac{1}{x^2}$ converges. So, the integral of some functions will diverge when $M$ is not a large number, but they will converge when $M$ is large enough.
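
The tail behaviour in this example is easy to verify numerically; the cutoff $d = 1$ in the sketch below is an arbitrary illustrative choice.

```python
from scipy.integrate import quad

# On the tail region take f(x) = -log(x), so exp(M*f(x)) = x**(-M).
# For M = 1 the integral over [1, inf) diverges, but for M >= 2 it converges.
for M in (2.0, 3.0):
    val, _ = quad(lambda x, M=M: x**-M, 1.0, float("inf"))
    print(M, val)  # 1.0 and 0.5; the divergent M = 1 case is omitted
```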

Based on these four concepts, we can derive the relative error of this method.

Other formulations

Laplace's approximation is sometimes written as

$$\int_a^b h(x)\, e^{M g(x)}\,dx \approx \sqrt{\frac{2\pi}{M\left|g''(x_0)\right|}}\; h(x_0)\, e^{M g(x_0)} \quad \text{as } M \to \infty,$$

where $h$ is positive.

Importantly, the accuracy of the approximation depends on the variable of integration, that is, on what stays in $g(x)$ and what goes into $h(x)$.[3]
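
A sketch of this formulation on a made-up example; the split $h(x) = 1/(1 + x^2)$, $g(x) = -x^2$ (so $x_0 = 0$ and $g''(x_0) = -2$) is an illustrative assumption.

```python
import numpy as np
from scipy.integrate import quad

# Check the integral of h(x)*exp(M*g(x)) against
# sqrt(2*pi/(M*|g''(x0)|)) * h(x0) * exp(M*g(x0)).
h = lambda x: 1.0 / (1.0 + x**2)
g = lambda x: -(x**2)

for M in (5, 20, 80):
    exact, _ = quad(lambda x: h(x) * np.exp(M * g(x)), -3, 3, points=[0])
    approx = np.sqrt(2 * np.pi / (M * 2.0)) * h(0.0) * np.exp(M * g(0.0))
    print(M, exact / approx)  # ratio tends to 1 as M grows
```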

The derivation of its relative error

First, use $x_0 = 0$ to denote the global maximum, which will simplify this derivation. We are interested in the relative error, written as $|R|$,

$$\int_a^b e^{M f(x)}\,dx = s\, e^{M f(0)} (1 + R),$$

where

$$1 + R = \int_{a/s}^{b/s} e^{M\left[f(sy) - f(0)\right]}\,dy, \qquad s \equiv \sqrt{\frac{2\pi}{M\left|f''(0)\right|}}.$$

So, if we let

$$A \equiv \int_{-D_y}^{D_y} e^{M\left[f(sy) - f(0)\right]}\,dy$$

and $A_0 \equiv \int_{-D_y}^{D_y} e^{-\pi y^2}\,dy$, we can get

$$R = \left(A - A_0\right) + \left(A_0 - 1\right) + \int_{D_y}^{b/s} e^{M\left[f(sy) - f(0)\right]}\,dy + \int_{a/s}^{-D_y} e^{M\left[f(sy) - f(0)\right]}\,dy$$

since $\int_{-\infty}^{\infty} e^{-\pi y^2}\,dy = 1$.

For the upper bound, note that $|A + B| \le |A| + |B|$; thus we can separate this integration into 5 parts with 3 different types (a), (b) and (c), respectively. Therefore,

$$\left|R\right| < \underbrace{\int_{D_y}^{\infty} e^{-\pi y^2}\,dy}_{(a_1)} + \underbrace{\int_{-\infty}^{-D_y} e^{-\pi y^2}\,dy}_{(a_2)} + \underbrace{\left|\int_{D_y}^{b/s} e^{M\left[f(sy) - f(0)\right]}\,dy\right|}_{(b_1)} + \underbrace{\left|\int_{a/s}^{-D_y} e^{M\left[f(sy) - f(0)\right]}\,dy\right|}_{(b_2)} + \underbrace{\left|A - A_0\right|}_{(c)}$$

where $(a_1)$ and $(a_2)$ are similar, let us just calculate $(a_1)$, and $(b_1)$ and $(b_2)$ are similar, too; I’ll just calculate $(b_1)$.

For $(a_1)$, after the translation of $z \equiv \pi y^2$, we can get

$$(a_1) = \int_{\pi D_y^2}^{\infty} \frac{e^{-z}}{2\sqrt{\pi z}}\,dz < \frac{e^{-\pi D_y^2}}{2\pi D_y}.$$

This means that as long as $D_y$ is large enough, it will tend to zero.

For $(b_1)$, we can get

$$\left(b_1\right) \le \left|\int_{D_y}^{b/s} e^{M\left[m(sy) - f(0)\right]}\,dy\right|,$$

where

$$m(x) \ge f(x) \quad \text{as } x \in [sD_y, b],$$

and $m(x) - f(0)$ should have the same sign as $f(x) - f(0)$ during this region. Let us choose $m(x)$ as the tangent across the point $f(x)$ at $x = sD_y$, i.e.

$$m(sy) = f(sD_y) + f'(sD_y)\left(sy - sD_y\right),$$

which is shown in the figure

Figure: $m(x)$ is the tangent line across the point $f(x)$ at $x = sD_y$.

From this figure you can find that when $s$ or $D_y$ gets smaller, the region satisfying the above inequality will get larger. Therefore, if we want to find a suitable $m(x)$ to cover the whole $f(x)$ during the interval of $(b_1)$, $D_y$ will have an upper limit. Besides, because the integration of $e^{M m(x)}$ is simple, let me use it to estimate the relative error contributed by this $(b_1)$.

Based on Taylor expansion, we can get

$$M\left[f(sD_y) - f(0)\right] = M\left[\frac{f''(0)}{2} s^2 D_y^2 + \frac{f'''(\xi)}{6} s^3 D_y^3\right] = -\pi D_y^2 + \frac{(2\pi)^{3/2} f'''(\xi)\, D_y^3}{6\sqrt{M}\left|f''(0)\right|^{3/2}}, \qquad \xi \in [0, sD_y],$$

and

$$M s f'(sD_y) = M s\left[f''(0)\, s D_y + \frac{f'''(\zeta)}{2} s^2 D_y^2\right] = -2\pi D_y + \sqrt{\frac{2}{M}}\, \frac{\pi^{3/2} f'''(\zeta)\, D_y^2}{\left|f''(0)\right|^{3/2}}, \qquad \zeta \in [0, sD_y],$$

and then substitute them back into the calculation of $(b_1)$; however, you can find that the remainders of these two expansions are both inversely proportional to the square root of $M$; let me drop them out to beautify the calculation. Keeping them is better, but it will make the formula uglier.

$$\left(b_1\right) \le \left| e^{-\pi D_y^2} \int_{D_y}^{b/s} e^{-2\pi D_y (y - D_y)}\,dy \right| \le \frac{e^{-\pi D_y^2}}{2\pi D_y}.$$

Therefore, it will tend to zero when $D_y$ gets larger, but don't forget that the upper bound of $D_y$ should be considered during this calculation.

About the integration near the stationary point, i.e. the term $(c)$, we can also use Taylor's theorem to calculate it. Using the expansion of concept 2,

$$\left(c\right) = \left|\int_{-D_y}^{D_y}\left(e^{M\left[f(sy) - f(0)\right]} - e^{-\pi y^2}\right)dy\right| \le \int_{-D_y}^{D_y} e^{-\pi y^2}\left|e^{O\left(\frac{D_y^3}{\sqrt{M}}\right)} - 1\right|dy = O\!\left(\frac{1}{\sqrt{M}}\right),$$

and you can find that it is inversely proportional to the square root of $M$. In fact, $(c)$ will have the same behaviour when $D_y$ is a constant.

Conclusively, the integral near the stationary point will get smaller as $M$ gets larger, and the rest parts will tend to zero as long as $D_y$ is large enough; however, we need to remember that $D_y$ has an upper limit which is decided by whether the function $m(x)$ is always larger than $f(x)$ in the rest region. However, as long as we can find one $m(x)$ satisfying this condition, the upper bound of $D_y$ can be chosen as directly proportional to $\sqrt{M}$, since $m(x)$ is a tangent across the point of $f(x)$ at $x = sD_y$. So, the bigger $M$ is, the bigger $D_y$ can be.

In the multivariate case, where $\mathbf{x}$ is a $d$-dimensional vector and $f(\mathbf{x})$ is a scalar function of $\mathbf{x}$, Laplace's approximation is usually written as:

$$\int e^{M f(\mathbf{x})}\,d\mathbf{x} \approx \left(\frac{2\pi}{M}\right)^{d/2} \left|-H(f)(\mathbf{x}_0)\right|^{-1/2}\, e^{M f(\mathbf{x}_0)} \quad \text{as } M \to \infty,$$

where $H(f)(\mathbf{x}_0)$ is the Hessian matrix of $f$ evaluated at $\mathbf{x}_0$ and where $|\cdot|$ denotes matrix determinant. Analogously to the univariate case, the Hessian is required to be negative definite.[4]

By the way, although $\mathbf{x}$ denotes a $d$-dimensional vector, the term $d\mathbf{x}$ denotes an infinitesimal volume here, i.e. $d\mathbf{x} := dx_1\,dx_2 \cdots dx_d$.
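
A minimal sketch of the multivariate formula in $d = 2$, assuming an illustrative quadratic $f$ (for which the approximation is exact, up to truncation of the integration domain):

```python
import numpy as np
from scipy.integrate import dblquad

# f(x, y) = -(x^2 + x*y + y^2) is maximized at the origin, with Hessian
# H = [[-2, -1], [-1, -2]] (negative definite) and f(x0) = 0.
H = np.array([[-2.0, -1.0], [-1.0, -2.0]])

for M in (2, 10, 50):
    exact, _ = dblquad(
        lambda y, x: np.exp(M * -(x**2 + x * y + y**2)),
        -3, 3, lambda x: -3, lambda x: 3,
    )
    # (2*pi/M)^(d/2) * |det(-H)|^(-1/2) * exp(M*f(x0)), with d = 2 here
    approx = (2 * np.pi / M) * np.linalg.det(-H) ** -0.5
    print(M, exact / approx)  # ratio is 1 up to the domain truncation
```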

Steepest descent extension

In extensions of Laplace's method, complex analysis, and in particular Cauchy's integral formula, is used to find a contour of steepest descent for an (asymptotically with large M) equivalent integral, expressed as a line integral. In particular, if no point x0 where the derivative of $f$ vanishes exists on the real line, it may be necessary to deform the integration contour to an optimal one, where the above analysis will be possible. Again the main idea is to reduce, at least asymptotically, the calculation of the given integral to that of a simpler integral that can be explicitly evaluated. See the book of Erdelyi (1956) for a simple discussion (where the method is termed steepest descents).

The appropriate formulation for the complex z-plane is

$$\int_a^b e^{M f(z)}\,dz \approx \sqrt{\frac{2\pi}{-M f''(z_0)}}\, e^{M f(z_0)} \quad \text{as } M \to \infty$$

for a path passing through the saddle point at z0. Note the explicit appearance of a minus sign to indicate the direction of the second derivative: one must not take the modulus. Also note that if the integrand is meromorphic, one may have to add residues corresponding to poles traversed while deforming the contour (see for example section 3 of Okounkov's paper Symmetric functions and random partitions).

Further generalizations

An extension of the steepest descent method is the so-called nonlinear stationary phase/steepest descent method. Here, instead of integrals, one needs to evaluate asymptotically solutions of Riemann–Hilbert factorization problems.

Given a contour C in the complex sphere, a function $f$ defined on that contour and a special point, say infinity, one seeks a function M holomorphic away from the contour C, with prescribed jump across C, and with a given normalization at infinity. If $f$ and hence M are matrices rather than scalars this is a problem that in general does not admit an explicit solution.

An asymptotic evaluation is then possible along the lines of the linear stationary phase/steepest descent method. The idea is to reduce asymptotically the solution of the given Riemann–Hilbert problem to that of a simpler, explicitly solvable, Riemann–Hilbert problem. Cauchy's theorem is used to justify deformations of the jump contour.

The nonlinear stationary phase was introduced by Deift and Zhou in 1993, based on earlier work of Its. A (properly speaking) nonlinear steepest descent method was introduced by Kamvissis, K. McLaughlin and P. Miller in 2003, based on previous work of Lax, Levermore, Deift, Venakides and Zhou. As in the linear case, "steepest descent contours" solve a min-max problem. In the nonlinear case they turn out to be "S-curves" (defined in a different context back in the 80s by Stahl, Gonchar and Rakhmanov).

The nonlinear stationary phase/steepest descent method has applications to the theory of soliton equations and integrable models, random matrices and combinatorics.

Median-point approximation generalization

In the generalization, evaluation of the integral is considered equivalent to finding the norm of the distribution with density

$$e^{M f(x)}.$$

Denoting the cumulative distribution $F(x)$, if there is a diffeomorphic Gaussian distribution with density

$$e^{g}\, e^{-\frac{(y - y_0)^2}{2\sigma^2}},$$

the norm is given by

$$N = \sqrt{2\pi\sigma^2}\, e^{g},$$

and the corresponding diffeomorphism is

$$y(x) = y_0 + \sigma\, \Phi^{-1}\!\left(\frac{F(x)}{N}\right),$$

where $\Phi$ denotes the cumulative standard normal distribution function.

In general, any distribution diffeomorphic to the Gaussian distribution has density

$$e^{g}\, e^{-\frac{\left(y(x) - y_0\right)^2}{2\sigma^2}}\, y'(x)$$

and the median-point is mapped to the median of the Gaussian distribution. Matching the logarithm of the density functions and their derivatives at the median point up to a given order yields a system of equations that determine the approximate values of $\sigma$ and $g$.

The approximation was introduced in 2019 by D. Makogon and C. Morais Smith primarily in the context of partition function evaluation for a system of interacting fermions. [5]

Complex integrals

For complex integrals in the form:

$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} g(s)\, e^{st}\,ds$$

with $t \gg 1$, we make the substitution $t = iu$ and the change of variable $s = c + ix$ to get the bilateral Laplace transform:

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} g(c + ix)\, e^{-ux}\, e^{icu}\,dx.$$

We then split $g(c + ix)$ into its real and complex parts, after which we recover $u = t/i$. This is useful for inverse Laplace transforms, the Perron formula and complex integration.
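
For instance, the sketch below evaluates the Bromwich integral numerically for $g(s) = 1/(s - 1)$, whose inverse Laplace transform is $e^t$; the abscissa $c = 2$ and the finite truncation of the infinite range are illustrative numerical choices.

```python
import numpy as np
from scipy.integrate import quad

# Evaluate (1/(2*pi*i)) ∫ g(s) e^{st} ds along Re(s) = c for g(s) = 1/(s - 1).
# With s = c + i*x, ds = i*dx, the prefactor becomes 1/(2*pi); by symmetry
# the imaginary parts cancel, so integrating the real part suffices.
c, t = 2.0, 1.0

def integrand(x):
    s = c + 1j * x
    return np.real(np.exp(s * t) / (s - 1.0)) / (2 * np.pi)

val, _ = quad(integrand, -200.0, 200.0, limit=1000)
print(val, np.exp(t))  # both approximately 2.718
```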

Example: Stirling's approximation

Laplace's method can be used to derive Stirling's approximation

$$N! \approx \sqrt{2\pi N}\, N^N e^{-N}$$

for a large integer $N$.

From the definition of the Gamma function, we have

$$N! = \Gamma(N + 1) = \int_0^\infty e^{-x} x^N\,dx.$$

Now we change variables, letting $x = Nz$ so that $dx = N\,dz$. Plug these values back in to obtain

$$N! = \int_0^\infty e^{-Nz} (Nz)^N N\,dz = N^{N+1} \int_0^\infty e^{N(\ln z - z)}\,dz.$$

This integral has the form necessary for Laplace's method with

$$f(z) = \ln z - z,$$

which is twice-differentiable:

$$f'(z) = \frac{1}{z} - 1, \qquad f''(z) = -\frac{1}{z^2}.$$

The maximum of $f$ lies at $z_0 = 1$, and the second derivative of $f$ has the value $-1$ at this point. Therefore, we obtain

$$N! \approx N^{N+1} \sqrt{\frac{2\pi}{N}}\, e^{-N} = \sqrt{2\pi N}\, N^N e^{-N}.$$
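
A quick numerical sketch of this result, using only the standard library (the sample values of $N$ are arbitrary):

```python
import math

# Compare N! with the Laplace-method result sqrt(2*pi*N) * N**N * exp(-N).
for n in (5, 10, 20, 50):
    exact = math.factorial(n)
    stirling = math.sqrt(2 * math.pi * n) * n**n * math.exp(-n)
    print(n, exact / stirling)  # ratio tends to 1 (roughly 1 + 1/(12*n))
```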


Notes

  1. Tierney, Luke; Kadane, Joseph B. (1986). "Accurate Approximations for Posterior Moments and Marginal Densities". J. Amer. Statist. Assoc. 81 (393): 82–86. doi:10.1080/01621459.1986.10478240.
  2. Amaral Turkman, M. Antónia; Paulino, Carlos Daniel; Müller, Peter (2019). "Methods Based on Analytic Approximations". Computational Bayesian Statistics: An Introduction. Cambridge University Press. pp. 150–171. ISBN 978-1-108-70374-1.
  3. Butler, Ronald W. (2007). Saddlepoint Approximations and Applications. Cambridge University Press. ISBN 978-0-521-87250-8.
  4. MacKay, David J. C. (September 2003). Information Theory, Inference and Learning Algorithms. Cambridge: Cambridge University Press. ISBN 9780521642989.
  5. Makogon, D.; Morais Smith, C. (2022-05-03). "Median-point approximation and its application for the study of fermionic systems". Physical Review B. 105 (17): 174505. Bibcode:2022PhRvB.105q4505M. doi:10.1103/PhysRevB.105.174505. hdl:1874/423769. S2CID 203591796.


References

This article incorporates material from saddle point approximation on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.