Exponential tilting

Exponential Tilting (ET), Exponential Twisting, or Exponential Change of Measure (ECM) is a distribution-shifting technique used in many parts of mathematics. The different exponential tiltings of a random variable $X$ are known as the natural exponential family of $X$.

Exponential tilting is used in Monte Carlo estimation for rare-event simulation, and in rejection and importance sampling in particular. In mathematical finance, [1] exponential tilting is also known as Esscher tilting (or the Esscher transform); it is often combined with indirect Edgeworth approximation and used in such contexts as insurance futures pricing. [2]

The earliest formalization of exponential tilting is often attributed to Esscher, [3] with its use in importance sampling attributed to David Siegmund. [4]

Overview

Given a random variable $X$ with probability distribution $P$, density $f$, and moment generating function (MGF) $M_X(\theta) = \mathrm{E}[e^{\theta X}] < \infty$, the exponentially tilted measure $P_\theta$ is defined as follows:

$$P_\theta(X \in dx) = \frac{e^{\theta x}\,P(X \in dx)}{M_X(\theta)} = e^{\theta x - \kappa(\theta)}\,P(X \in dx),$$

where $\kappa(\theta)$ is the cumulant generating function (CGF) defined as

$$\kappa(\theta) = \log \mathrm{E}[e^{\theta X}] = \log M_X(\theta).$$

We call

$$f_\theta(x) = e^{\theta x - \kappa(\theta)} f(x)$$

the $\theta$-tilted density of $X$. It satisfies $f_\theta(x) \propto e^{\theta x} f(x)$.

The exponential tilting of a random vector $X$ has an analogous definition:

$$P_\theta(X \in dx) = e^{\theta^\top x - \kappa(\theta)}\,P(X \in dx),$$

where $\kappa(\theta) = \log \mathrm{E}[e^{\theta^\top X}]$.

Example

The exponentially tilted measure in many cases has the same parametric form as that of $X$. One-dimensional examples include the normal distribution, the exponential distribution, the binomial distribution and the Poisson distribution.

For example, in the case of the normal distribution $N(\mu, \sigma^2)$, the tilted density $f_\theta(x)$ is the $N(\mu + \theta\sigma^2, \sigma^2)$ density. The table below provides more examples of tilted densities.

Original distribution [5] [6] | $\theta$-tilted distribution
$N(\mu, \sigma^2)$ | $N(\mu + \theta\sigma^2, \sigma^2)$
$\mathrm{Exp}(\lambda)$ | $\mathrm{Exp}(\lambda - \theta)$
$\mathrm{Poisson}(\lambda)$ | $\mathrm{Poisson}(\lambda e^{\theta})$
$\mathrm{Binomial}(n, p)$ | $\mathrm{Binomial}\!\left(n, \frac{p e^{\theta}}{1 - p + p e^{\theta}}\right)$

For some distributions, however, the exponentially tilted distribution does not belong to the same parametric family as $f$. An example of this is the Pareto distribution with $f(x) = \alpha / x^{\alpha + 1}$, $x > 1$, for which $f_\theta(x)$ is well defined for $\theta < 0$ but is not a standard distribution. In such examples, random variable generation may not always be straightforward. [7]

In statistical mechanics, the energy of a system in equilibrium with a heat bath has the Boltzmann distribution: $P(E \in dE) \propto e^{-\beta E}\,dE$, where $\beta$ is the inverse temperature. Exponential tilting then corresponds to changing the temperature: $P_\theta(E \in dE) \propto e^{(\theta - \beta)E}\,dE$, i.e. the inverse temperature becomes $\beta - \theta$.

Similarly, the energy and particle number of a system in equilibrium with a heat and particle bath has the grand canonical distribution: $P(E \in dE, N = n) \propto e^{-\beta(E - \mu n)}$, where $\mu$ is the chemical potential. Exponential tilting then corresponds to changing both the temperature and the chemical potential.

Advantages

In many cases, the tilted distribution belongs to the same parametric family as the original. This is particularly true when the original density belongs to the exponential family of distributions. This simplifies random variable generation during Monte Carlo simulations. Exponential tilting may still be useful if this is not the case, though normalization must be possible and additional sampling algorithms may be needed.
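As an illustration, here is a minimal Python sketch (the function names and parameter choices are ours) that draws from two tilted families using the closed forms in the table above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tilted_normal(mu, sigma2, theta, size):
    """Tilting N(mu, sigma2) by theta gives N(mu + theta*sigma2, sigma2)."""
    return rng.normal(mu + theta * sigma2, np.sqrt(sigma2), size)

def sample_tilted_exponential(lam, theta, size):
    """Tilting Exp(lam) by theta < lam gives Exp(lam - theta)."""
    assert theta < lam, "the MGF of Exp(lam) only exists for theta < lam"
    return rng.exponential(1.0 / (lam - theta), size)

# The tilted samples concentrate around the tilted mean kappa'(theta).
x = sample_tilted_normal(mu=0.0, sigma2=1.0, theta=2.0, size=10_000)
print(x.mean())  # approximately 2.0 = mu + theta*sigma2
```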

In addition, there exists a simple relationship between the original and tilted CGF,

$$\kappa_\theta(\eta) = \kappa(\theta + \eta) - \kappa(\theta).$$

We can see this by observing that

$$F_\theta(x) = \int_{-\infty}^{x} e^{\theta t - \kappa(\theta)} f(t)\,dt.$$

Thus,

$$\kappa_\theta(\eta) = \log \int e^{\eta x}\,dF_\theta(x) = \log \int e^{(\eta + \theta)x - \kappa(\theta)} f(x)\,dx = \log\!\left(e^{-\kappa(\theta)} M_X(\eta + \theta)\right) = \kappa(\eta + \theta) - \kappa(\theta).$$

Clearly, this relationship allows for easy calculation of the CGF of the tilted distribution, and thus the distribution's moments. Moreover, it results in a simple form of the likelihood ratio. Specifically,

$$\ell = \frac{dP}{dP_\theta} = \frac{f(x)}{f_\theta(x)} = e^{-\theta x + \kappa(\theta)}.$$

Properties

If $\kappa(\eta)$ is the CGF of $X$, then the CGF of the $\theta$-tilted $X$ is $\kappa_\theta(\eta) = \kappa(\theta + \eta) - \kappa(\theta)$. This means that the $i$-th cumulant of the tilted $X$ is $\kappa^{(i)}(\theta)$. In particular, the expectation of the tilted distribution is
$\mathrm{E}_\theta[X] = \tfrac{d}{d\eta}\kappa_\theta(\eta)\big|_{\eta=0} = \kappa'(\theta)$.
The variance of the tilted distribution is
$\mathrm{Var}_\theta[X] = \tfrac{d^2}{d\eta^2}\kappa_\theta(\eta)\big|_{\eta=0} = \kappa''(\theta)$.
The Kullback–Leibler divergence
$D(P_\theta \parallel P) = \mathrm{E}_\theta\!\left[\log \tfrac{f_\theta(X)}{f(X)}\right] = \theta\,\kappa'(\theta) - \kappa(\theta)$
between the tilted distribution $P_\theta$ and the original distribution $P$ of $X$ is likewise expressed through the CGF.
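The first two properties are easy to check numerically. A minimal sketch, assuming $X \sim \mathrm{Exp}(1)$, so that $\kappa(\theta) = -\log(1 - \theta)$ and the tilted law is $\mathrm{Exp}(1 - \theta)$:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, theta = 1.0, 0.5          # X ~ Exp(1), tilt parameter theta < lam

# CGF of Exp(lam): kappa(theta) = log(lam / (lam - theta)), so
# kappa'(theta) = 1/(lam - theta) and kappa''(theta) = 1/(lam - theta)^2.
x_tilted = rng.exponential(1.0 / (lam - theta), 100_000)  # Exp(lam - theta)

print(x_tilted.mean(), 1.0 / (lam - theta))        # both ~ 2.0 = kappa'(theta)
print(x_tilted.var(), 1.0 / (lam - theta) ** 2)    # both ~ 4.0 = kappa''(theta)
```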

Applications

Rare-event simulation

The exponential tilting of $X$, assuming it exists, supplies a family of distributions that can be used as proposal distributions for acceptance-rejection sampling, or as importance distributions for importance sampling. One common application is sampling from a distribution conditional on a sub-region of the domain, i.e. $X \mid X \in A$. With an appropriate choice of $\theta$, sampling from $P_\theta$ can meaningfully reduce the required amount of sampling or the variance of an estimator.
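For instance, to sample a standard normal conditioned on $X > a$, one may propose from the tilted density $N(a, 1)$ (i.e. $\theta = a$) and reject; the acceptance probability works out to $e^{-a(y - a)}$ for $y > a$. A minimal sketch (the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_tail_sample(a, size):
    """Sample X ~ N(0,1) conditioned on X > a using the theta = a tilted
    proposal N(a, 1); accept y > a with probability exp(-a*(y - a))."""
    out = []
    while len(out) < size:
        y = rng.normal(a, 1.0)
        if y > a and rng.uniform() < np.exp(-a * (y - a)):
            out.append(y)
    return np.array(out)

samples = normal_tail_sample(a=4.0, size=1_000)
print(samples.min())  # every sample exceeds 4, far into the Gaussian tail
```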

Saddlepoint approximation

The saddlepoint approximation method is a density approximation methodology often used for the distribution of sums and averages of independent, identically distributed random variables. It employs Edgeworth series, but generally performs better at extreme values. From the definition of the natural exponential family, it follows that

$$f_\theta(\bar{x}) = f(\bar{x})\exp\{n(\theta\bar{x} - \kappa(\theta))\}.$$

Applying the Edgeworth expansion for $f_\theta(\bar{x})$, we have

$$f_\theta(\bar{x}) = \psi(z)\,\left(\mathrm{Var}_\theta[\bar{X}]\right)^{-1/2}\left\{1 + \frac{\rho_3(\theta)h_3(z)}{6\sqrt{n}} + \frac{\rho_4(\theta)h_4(z)}{24n} + \cdots\right\},$$

where $\psi(z)$ is the standard normal density of

$$z = \frac{\bar{x} - \kappa'(\theta)}{\sqrt{\kappa''(\theta)/n}},$$

$$\rho_n(\theta) = \frac{\kappa^{(n)}(\theta)}{\{\kappa''(\theta)\}^{n/2}},$$

and $h_n$ are the Hermite polynomials.

When considering values of $\bar{x}$ progressively farther from the center of the distribution, $|z| \to \infty$ and the $h_n(z)$ terms become unbounded. However, for each value of $\bar{x}$ we can choose $\theta$ such that

$$\kappa'(\theta) = \bar{x}.$$

This value of $\theta$ is referred to as the saddle-point, and the above expansion is always evaluated at the expectation of the tilted distribution. This choice of $\theta$ leads to the final representation of the approximation, given by

$$f(\bar{x}) \approx \left(\frac{n}{2\pi\kappa''(\theta)}\right)^{1/2} \exp\{n(\kappa(\theta) - \theta\bar{x})\}. \ \text{[8] [9]}$$
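The final formula is straightforward to implement. A minimal Python sketch for the mean of $n$ $\mathrm{Exp}(1)$ variables, where $\kappa(\theta) = -\log(1 - \theta)$ and the exact density (a scaled gamma) is available for comparison (function names ours):

```python
import numpy as np
from math import lgamma

def saddlepoint_density_exp_mean(xbar, n):
    """Saddlepoint approximation to the density of the mean of n Exp(1)
    variables.  kappa(t) = -log(1 - t), so the saddle-point equation
    kappa'(t) = 1/(1 - t) = xbar gives t = 1 - 1/xbar."""
    t = 1.0 - 1.0 / xbar
    kappa, kappa2 = -np.log(1.0 - t), 1.0 / (1.0 - t) ** 2
    return np.sqrt(n / (2 * np.pi * kappa2)) * np.exp(n * (kappa - t * xbar))

def exact_density_exp_mean(xbar, n):
    """The mean of n Exp(1) variables is Gamma(n, rate n)."""
    return np.exp(n * np.log(n) + (n - 1) * np.log(xbar) - n * xbar - lgamma(n))

for xbar in (0.5, 1.0, 3.0):
    print(xbar, saddlepoint_density_exp_mean(xbar, 10), exact_density_exp_mean(xbar, 10))
```

Even for $n = 10$ the two densities agree to within about one percent, including far out in the tail, which is the regime where Edgeworth expansions alone break down.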

Rejection sampling

Using the tilted distribution $P_\theta$ as the proposal, the rejection sampling algorithm prescribes sampling from $f_\theta(x)$ and accepting with probability

$$\frac{1}{c}e^{-\theta x + \kappa(\theta)},$$

where

$$c = \sup_{x} \frac{dP}{dP_\theta}(x).$$

That is, a uniformly distributed random variable $U \sim \mathrm{Unif}(0, 1)$ is generated, and the sample from $f_\theta(x)$ is accepted if

$$U \leq \frac{1}{c}e^{-\theta x + \kappa(\theta)}.$$
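A minimal sketch for $X \sim \mathrm{Exp}(1)$ with tilted proposal $\mathrm{Exp}(1 - \theta)$, where $c = e^{\kappa(\theta)} = 1/(1 - \theta)$ and the acceptance probability reduces to $e^{-\theta x}$ (purely illustrative, since $\mathrm{Exp}(1)$ is trivial to sample directly):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, theta = 1.0, 0.5   # target Exp(1); the tilted proposal is Exp(lam - theta)

# For theta in (0, 1) the ratio dP/dP_theta = exp(-theta*x)/(1 - theta) is
# maximised at x = 0, so c = 1/(1 - theta) and the acceptance probability
# (1/c) * exp(-theta*x + kappa(theta)) simplifies to exp(-theta*x).
def sample_target(size):
    out = []
    while len(out) < size:
        x = rng.exponential(1.0 / (lam - theta))
        if rng.uniform() < np.exp(-theta * x):
            out.append(x)
    return np.array(out)

x = sample_target(100_000)
print(x.mean())  # ~1.0, the mean of the Exp(1) target
```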

Importance sampling

Applying the exponentially tilted distribution as the importance distribution yields the equation

$$\mathrm{E}[h(X)] = \mathrm{E}_\theta[\ell(X)h(X)],$$

where

$$\ell(X) = \frac{dP}{dP_\theta}(X) = e^{-\theta X + \kappa(\theta)}$$

is the likelihood ratio. So, one samples from $f_\theta$ to estimate the expectation under the importance distribution $P_\theta$ and then multiplies by the likelihood ratio. Moreover, the variance of the estimator is given by

$$\mathrm{Var}_\theta[\ell(X)h(X)] = \mathrm{E}_\theta\!\left[\ell^2(X)h^2(X)\right] - \left(\mathrm{E}[h(X)]\right)^2.$$

Example

Assume $X_1, \ldots, X_n$ independent and identically distributed such that $\kappa(\theta) < \infty$. In order to estimate $P(X_1 + \cdots + X_n > c)$, we can employ importance sampling by taking

$$h(X) = \mathbb{1}\{X_1 + \cdots + X_n > c\}.$$

The constant $c$ can be rewritten as $na$ for some other constant $a$. Then,

$$P(X_1 + \cdots + X_n > na) = \mathrm{E}_{\theta_a}\!\left[\exp\left\{-\theta_a \sum_{i=1}^{n} X_i + n\kappa(\theta_a)\right\}\mathbb{1}\left\{\sum_{i=1}^{n} X_i > na\right\}\right],$$

where $\theta_a$ denotes the $\theta$ defined by the saddle-point equation

$$\kappa'(\theta_a) = a.$$
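A minimal sketch for standard normal increments, where $\kappa(\theta) = \theta^2/2$ and the saddle-point equation gives $\theta_a = a$; the exact tail probability $\bar{\Phi}(a\sqrt{n})$ is available for comparison:

```python
import numpy as np

rng = np.random.default_rng(4)
n, a = 100, 0.5              # estimate P(X_1 + ... + X_n > n*a), X_i ~ N(0,1)
theta = a                    # saddle-point equation kappa'(theta) = theta = a
kappa = theta ** 2 / 2       # CGF of N(0,1): kappa(theta) = theta^2 / 2

m = 20_000
s = rng.normal(theta, 1.0, (m, n)).sum(axis=1)   # sums under the tilted measure
weights = np.exp(-theta * s + n * kappa) * (s > n * a)
print(weights.mean())
# Exact value: P(N(0, n) > n*a) = Phi_bar(a*sqrt(n)) = Phi_bar(5) ~ 2.87e-7;
# naive Monte Carlo with m = 20,000 samples would almost surely return 0.
```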

Stochastic processes

Given the tilting of a normal random variable, it is intuitive that the exponential tilting of $X_t$, a Brownian motion with drift $\mu$ and variance $\sigma^2$, is a Brownian motion with drift $\mu + \theta\sigma^2$ and variance $\sigma^2$. Thus, any Brownian motion with drift $\mu$ under $P$ can be thought of as a Brownian motion without drift under $P_{\theta^*}$ with $\theta^* = -\mu/\sigma^2$. To observe this, consider the process $X_t = B_t + \mu t$ with $\sigma^2 = 1$. Then $f(X_t) = f_{\theta^*}(X_t)\,\frac{dP}{dP_{\theta^*}} = f_{\theta^*}(X_t)\exp\{\mu B_T + \tfrac{1}{2}\mu^2 T\}$. The likelihood ratio term, $M_T = \exp\{-\mu B_T - \tfrac{1}{2}\mu^2 T\}$, is a martingale and commonly denoted $M_T$. Thus, a Brownian motion with drift (as well as many other continuous processes adapted to the Brownian filtration) is a $P_{\theta^*}$-martingale. [10] [11]

Stochastic Differential Equations

The above leads to an alternate representation of the stochastic differential equation $dX(t) = \mu(t)\,dt + \sigma(t)\,dB(t)$: namely $dX_\theta(t) = \mu_\theta(t)\,dt + \sigma(t)\,dB(t)$, where $\mu_\theta(t) = \mu(t) + \theta\sigma^2(t)$. Girsanov's formula states that the likelihood ratio is

$$\frac{dP}{dP_\theta} = \exp\left\{-\int_0^t \frac{\mu_\theta(s) - \mu(s)}{\sigma^2(s)}\,dX(s) + \frac{1}{2}\int_0^t \frac{\mu_\theta(s)^2 - \mu(s)^2}{\sigma^2(s)}\,ds\right\}.$$

Therefore, Girsanov's formula can be used to implement importance sampling for certain SDEs.

Tilting can also be useful for simulating a process $X(t)$ via rejection sampling of the SDE $dX(t) = \mu(X(t))\,dt + dB(t)$. We may focus on the SDE since we know that $X(t)$ can be written $X(0) + \int_0^t dX(s)$. As previously stated, a Brownian motion with drift can be tilted to a Brownian motion without drift, so we choose the driftless process as the proposal. The likelihood ratio is

$$\frac{dP}{dP_{\theta^*}}\big(dX(s) : 0 \le s \le t\big) = \exp\left\{\int_0^t \mu(X(s))\,dX(s) - \frac{1}{2}\int_0^t \mu(X(s))^2\,ds\right\}.$$

This likelihood ratio will be denoted $M(t)$. To ensure this is a true likelihood ratio, it must be shown that $\mathrm{E}[M(t)] = 1$. Assuming this condition holds, it can be shown that

$$f_{X(t)}(y) = f_{B(t)}(y)\,\mathrm{E}[M(t) \mid B(t) = y].$$

So, rejection sampling prescribes that one samples from a standard Brownian motion and accepts with probability $M(t)/c$, where $c$ is a bound on $M(t)$, when such a bound exists.
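As a concrete (if simplified) illustration of the weighting counterpart of this scheme, the following Euler–Maruyama sketch estimates $\mathrm{E}[g(X_T)]$ for $dX = \mu(X)\,dt + dB$ by simulating driftless Brownian paths and applying a discretized $M(T)$; the function name, the Ornstein–Uhlenbeck test case, and the step counts are our choices, and the time discretization introduces a small bias:

```python
import numpy as np

rng = np.random.default_rng(5)

def girsanov_estimate(mu, g, x0, T, n_steps, n_paths):
    """Estimate E[g(X_T)] for dX = mu(X) dt + dB by simulating *driftless*
    Brownian paths and reweighting with a discretised Girsanov likelihood
    ratio M(T) = exp( int mu(X) dX - 0.5 int mu(X)^2 dt )."""
    dt = T / n_steps
    x = np.full(n_paths, x0)
    log_m = np.zeros(n_paths)
    for _ in range(n_steps):
        db = rng.normal(0.0, np.sqrt(dt), n_paths)   # driftless increments
        log_m += mu(x) * db - 0.5 * mu(x) ** 2 * dt  # Ito pre-point evaluation
        x += db
    return np.mean(g(x) * np.exp(log_m))

# Ornstein-Uhlenbeck drift mu(x) = -x; compare E[X_T^2] with the exact
# value (1 - exp(-2T))/2 for X_0 = 0.
est = girsanov_estimate(mu=lambda x: -x, g=lambda x: x ** 2,
                        x0=0.0, T=1.0, n_steps=200, n_paths=50_000)
print(est, (1 - np.exp(-2.0)) / 2)   # both ~0.432
```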

Choice of tilting parameter

Siegmund's algorithm

Assume i.i.d. $X_i$'s with a light-tailed distribution and $\mathrm{E}[X] < 0$. In order to estimate $\psi(c) = P(\tau(c) < \infty)$, where $\tau(c) = \inf\{t : \sum_{i=1}^{t} X_i > c\}$, when $c$ is large and hence $\psi(c)$ small, the algorithm uses exponential tilting to derive the importance distribution. The algorithm is used in many contexts, such as sequential tests, [12] G/G/1 queue waiting times, and the probability of ultimate ruin in ruin theory. In this context, it is logical to ensure that $P_\theta(\tau(c) < \infty) = 1$. The criterion $\theta > \theta_0$, where $\theta_0$ is such that $\kappa'(\theta_0) = 0$, achieves this. Siegmund's algorithm uses $\theta = \theta^*$, if it exists, where $\theta^*$ is defined in the following way: $\kappa(\theta^*) = 0$. It has been shown that $\theta^*$ is the only tilting parameter producing bounded relative error ($\limsup_{c \to \infty} \mathrm{Var}\,\hat{\psi}(c)/\psi(c)^2 < \infty$). [13]
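A minimal sketch of Siegmund's algorithm for $N(-\mu, 1)$ increments, where $\kappa(\theta) = -\mu\theta + \theta^2/2$, so $\theta^* = 2\mu$ and the tilted increments are $N(\mu, 1)$ (function name ours):

```python
import numpy as np

rng = np.random.default_rng(6)

def siegmund_ruin_prob(mu, c, n_paths):
    """Estimate psi(c) = P(sup_n S_n > c) for a random walk with N(-mu, 1)
    increments (mu > 0).  kappa(theta) = -mu*theta + theta^2/2, so the root
    of kappa(theta*) = 0 is theta* = 2*mu, and the tilted increments are
    N(+mu, 1), under which level c is crossed with probability one."""
    theta = 2.0 * mu
    est = np.empty(n_paths)
    for i in range(n_paths):
        s = 0.0
        while s <= c:
            s += rng.normal(mu, 1.0)       # simulate under the tilted measure
        est[i] = np.exp(-theta * s)        # likelihood ratio at the crossing time
    return est.mean()

print(siegmund_ruin_prob(mu=0.5, c=10.0, n_paths=20_000))
# The Lundberg bound gives psi(c) <= exp(-theta* c) = exp(-10) ~ 4.5e-5.
```

Because $\kappa(\theta^*) = 0$, the likelihood ratio at the stopping time reduces to $e^{-\theta^* S_\tau}$, which is bounded by $e^{-\theta^* c}$; this is what yields the bounded relative error.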

Black-Box algorithms

We can only see the input and output of a black box, without knowing its structure. The algorithm is to use only minimal information on its structure. When we generate random numbers, the output may not be within the same common parametric class, such as normal or exponential distributions. An automated way may be used to perform ECM. Let $X_1, X_2, \ldots$ be i.i.d. r.v.'s with distribution $P$; for simplicity we assume $X \geq 0$. Define $\mathcal{F}_n = \sigma(X_1, \ldots, X_n, U_1, \ldots, U_n)$, where $U_1, U_2, \ldots$ are independent (0, 1) uniforms. A randomized stopping time for $X_1, X_2, \ldots$ is then a stopping time w.r.t. the filtration $\{\mathcal{F}_n\}$. Let further $\mathcal{P}$ be a class of distributions $P$ on $[0, \infty)$ with $\kappa_P = \log \mathrm{E}_P[e^{\theta X}] < \infty$ and define $P_\theta$ by $\frac{dP_\theta}{dP}(x) = e^{\theta x - \kappa_P}$. We define a black-box algorithm for ECM for the given $\theta$ and the given class $\mathcal{P}$ of distributions as a pair of a randomized stopping time $\tau$ and an $\mathcal{F}_\tau$-measurable r.v. $Z$ such that $Z$ is distributed according to $P_\theta$ for any $P \in \mathcal{P}$. Formally, we write this as $P(Z \in dx) = P_\theta(dx)$ for all $P \in \mathcal{P}$. In other words, the rules of the game are that the algorithm may use simulated values from $P$ and additional uniforms to produce an r.v. from $P_\theta$. [14]


References

  1. Gerber, H.U. & Shiu, E.S.W. (1994). "Option pricing by Esscher transforms". Transactions of the Society of Actuaries. 46: 99–191.
  2. Cruz, Marcelo (2015). Fundamental Aspects of Operational Risk and Insurance Analytics. Wiley. pp. 784–796. ISBN 978-1-118-11839-9.
  3. Butler, Ronald (2007). Saddlepoint Approximations with Applications. Cambridge University Press. p. 156. ISBN 9780521872508.
  4. Siegmund, D. (1976). "Importance Sampling in the Monte Carlo Study of Sequential Tests". The Annals of Statistics. 4 (4): 673–684. doi:10.1214/aos/1176343541.
  5. Asmussen, Søren & Glynn, Peter (2007). Stochastic Simulation. Springer. p. 130. ISBN 978-0-387-30679-7.
  6. Fuh, Cheng-Der; Teng, Huei-Wen; Wang, Ren-Her (2013). "Efficient Importance Sampling for Rare Event Simulation with Applications".
  7. Asmussen, Søren & Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 164–167. ISBN 978-0-387-30679-7.
  8. Butler, Ronald (2007). Saddlepoint Approximations with Applications. Cambridge University Press. pp. 156–157. ISBN 9780521872508.
  9. Seeber, G.U.H. (1992). Advances in GLIM and Statistical Modelling. Springer. pp. 195–200. ISBN 978-0-387-97873-4.
  10. Asmussen, Søren & Glynn, Peter (2007). Stochastic Simulation. Springer. p. 407. ISBN 978-0-387-30679-7.
  11. Steele, J. Michael (2001). Stochastic Calculus and Financial Applications. Springer. pp. 213–229. ISBN 978-1-4419-2862-7.
  12. Siegmund, D. (1985). Sequential Analysis. Springer-Verlag.
  13. Asmussen, Søren & Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 164–167. ISBN 978-0-387-30679-7.
  14. Asmussen, Søren & Glynn, Peter (2007). Stochastic Simulation. Springer. pp. 416–420. ISBN 978-0-387-30679-7.