Coupling from the past

Among Markov chain Monte Carlo (MCMC) algorithms, coupling from the past is a method for sampling from the stationary distribution of a Markov chain. Unlike many MCMC algorithms, coupling from the past yields, in principle, a perfect sample from the stationary distribution rather than an approximate one. It was invented by James Propp and David Wilson in 1996.

The basic idea

Consider a finite-state irreducible aperiodic Markov chain $M$ with state space $S$ and (unique) stationary distribution $\pi$ ($\pi$ is a probability vector). Suppose that we come up with a probability distribution $\mu$ on the set of maps $f : S \to S$ with the property that for every fixed $s \in S$, its image $f(s)$ is distributed according to the transition probability of $M$ from state $s$. An example of such a probability distribution is the one where $f(s)$ is independent from $f(s')$ whenever $s \neq s'$, but it is often worthwhile to consider other distributions. Now let $f_j$ for $j \in \{-1, -2, \dots\}$ be independent samples from $\mu$.

Suppose that $x$ is chosen randomly according to $\pi$ and is independent from the sequence $(f_j)$. (We do not worry for now where this $x$ is coming from.) Then $f_{-1}(x)$ is also distributed according to $\pi$, because $\pi$ is $M$-stationary and by our assumption on the law of $f_{-1}$. Define

$$F_n := f_{-1} \circ f_{-2} \circ \cdots \circ f_{-n}.$$

Then it follows by induction that $F_n(x)$ is also distributed according to $\pi$ for every $n \in \mathbb{N}$. However, it may happen that for some $n$ the image of the map $F_n$ is a single element of $S$; in other words, $F_n(x) = F_n(y)$ for every $y \in S$. Therefore, we do not need to have access to $x$ in order to compute $F_n(x)$. The algorithm then involves finding some $n \in \mathbb{N}$ such that the image of $F_n$ is a singleton, and outputting the element of that singleton. The design of a good distribution $\mu$ for which the task of finding such an $n$ and computing $F_n$ is not too costly is not always obvious, but it has been accomplished successfully in several important instances.[1]
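To make the scheme concrete, here is a minimal Python sketch of this general algorithm; it is illustrative rather than canonical. The three-state transition matrix `P` and the helper `sample_map` are hypothetical choices, and `sample_map` implements the simple distribution $\mu$ under which $f(s)$ is drawn independently for each state $s$.

```python
import random

def cftp(states, sample_map, rng=random.Random(0)):  # seeded for reproducibility
    """Return a perfect sample from the stationary distribution.

    sample_map(rng) must return a dict f with f[s] distributed
    according to the transition probability of the chain from s.
    """
    maps = []   # maps[j] holds f_{-(j+1)}; old maps are reused, never resampled
    n = 1
    while True:
        while len(maps) < n:
            maps.append(sample_map(rng))
        # Compute the image of F_n = f_{-1} o f_{-2} o ... o f_{-n} by
        # starting every state at time -n and running forward to time 0.
        image = set(states)
        for f in reversed(maps[:n]):   # f_{-n} is applied first
            image = {f[s] for s in image}
        if len(image) == 1:            # the image of F_n is a singleton
            return image.pop()
        n *= 2                         # look further back into the past

# Hypothetical example: a three-state chain given by its transition matrix.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

def sample_map(rng):
    # f(s) is drawn independently for each s, according to row P[s].
    return {s: rng.choices(range(3), weights=P[s])[0] for s in range(3)}

print(cftp(range(3), sample_map))
```

Tracking the image of $F_n$ over the whole state space is only feasible when $S$ is small; the monotone case below shows how this cost can be avoided.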

The monotone case

There is a special class of Markov chains in which there are particularly good choices for $\mu$ and a tool for determining if $|F_n(S)| = 1$. (Here $|\cdot|$ denotes cardinality.) Suppose that $S$ is a partially ordered set with order $\leq$, which has a unique minimal element $s_0$ and a unique maximal element $s_1$; that is, every $s \in S$ satisfies $s_0 \leq s \leq s_1$. Also, suppose that $\mu$ may be chosen to be supported on the set of monotone maps $f : S \to S$. Then it is easy to see that $|F_n(S)| = 1$ if and only if $F_n(s_0) = F_n(s_1)$, since $F_n$ is monotone. Thus, checking this becomes rather easy. The algorithm can proceed by choosing $n = n_0$ for some constant $n_0$, sampling the maps $f_{-1}, \dots, f_{-n_0}$, and outputting $F_n(s_0)$ if $F_n(s_0) = F_n(s_1)$. If $F_n(s_0) \neq F_n(s_1)$, the algorithm proceeds by doubling $n$ and repeating as necessary until an output is obtained. (But the algorithm does not resample the maps which were already sampled; it uses the previously sampled maps when needed.)
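Below is a minimal Python sketch of monotone coupling from the past under these assumptions. Only the two extremal trajectories are simulated, since monotonicity makes them sandwich all others. The update rule `update(s, u)`, a random walk on $\{0, \dots, 4\}$ with clamped boundaries driven by a uniform variate $u$ shared by all states at a given time step, is a hypothetical example of a monotone update.

```python
import random

def monotone_cftp(s_min, s_max, update, rng=random.Random(1)):
    """Monotone CFTP: the trajectories from the minimal and maximal
    states sandwich every other trajectory, so coalescence of the two
    extremes implies |F_n(S)| = 1."""
    seeds = []          # seeds[j] drives the monotone map f_{-(j+1)}
    n = 1
    while True:
        while len(seeds) < n:
            seeds.append(rng.random())
        lo, hi = s_min, s_max
        for u in reversed(seeds[:n]):      # run from time -n up to time 0
            lo, hi = update(lo, u), update(hi, u)
        if lo == hi:                       # F_n(s_0) == F_n(s_1): coalesced
            return lo
        n *= 2                             # double n, reusing old seeds

# Hypothetical example: random walk on {0, ..., 4} with clamped boundaries.
# For a fixed u the map s -> update(s, u) is monotone in s.
def update(s, u):
    return min(s + 1, 4) if u < 0.5 else max(s - 1, 0)

print(monotone_cftp(0, 4, update))
```

Note that when $n$ is doubled, the previously drawn seeds are reused, so the maps sampled earlier are never resampled, exactly as the doubling scheme above requires.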


References

  1. "Web Site for Perfectly Random Sampling with Markov Chains".