Reversible-jump Markov chain Monte Carlo

Last updated

In computational statistics, reversible-jump Markov chain Monte Carlo is an extension to standard Markov chain Monte Carlo (MCMC) methodology, introduced by Peter Green, which allows simulation (the creation of samples) of the posterior distribution on spaces of varying dimensions. [1] Thus, the simulation is possible even if the number of parameters in the model is not known. The "jump" refers to the switching from one parameter space to another during the running of the chain. RJMCMC is useful to compare models of different dimension to see which one fits the data best. It is also useful for predictions of new data points, because we do not need to choose and fix a model, RJMCMC can directly predict the new values for all the models at the same time. Models that suit the data best will be chosen more frequently then the poorer ones.

Contents

Details on the RJMCMC process

Let be a model indicator and the parameter space whose number of dimensions depends on the model . The model indication need not be finite. The stationary distribution is the joint posterior distribution of that takes the values .

The proposal can be constructed with a mapping of and , where is drawn from a random component with density on . The move to state can thus be formulated as

The function

must be one to one and differentiable, and have a non-zero support:

so that there exists an inverse function

that is differentiable. Therefore, the and must be of equal dimension, which is the case if the dimension criterion

is met where is the dimension of . This is known as dimension matching.

If then the dimensional matching condition can be reduced to

with

The acceptance probability will be given by

where denotes the absolute value and is the joint posterior probability

where is the normalising constant.

Software packages

There is an experimental RJ-MCMC tool available for the open source BUGs package.

The Gen probabilistic programming system automates the acceptance probability computation for user-defined reversible jump MCMC kernels as part of its Involution MCMC feature.

Related Research Articles

<span class="mw-page-title-main">Metropolis–Hastings algorithm</span> Monte Carlo algorithm

In statistics and statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult. This sequence can be used to approximate the distribution or to compute an integral. Metropolis–Hastings and other MCMC algorithms are generally used for sampling from multi-dimensional distributions, especially when the number of dimensions is high. For single-dimensional distributions, there are usually other methods that can directly return independent samples from the distribution, and these are free from the problem of autocorrelated samples that is inherent in MCMC methods.

<span class="mw-page-title-main">Markov chain</span> Random process independent of past history

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

<span class="mw-page-title-main">Random walk</span> Mathematical formalization of a path that consists of a succession of random steps

In mathematics, a random walk, sometimes known as a drunkard's walk, is a random process that describes a path that consists of a succession of random steps on some mathematical space.

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by recording states from the chain. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Various algorithms exist for constructing chains, including the Metropolis–Hastings algorithm.

In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for sampling from a specified multivariate probability distribution when direct sampling from the joint distribution is difficult, but sampling from the conditional distribution is more practical. This sequence can be used to approximate the joint distribution ; to approximate the marginal distribution of one of the variables, or some subset of the variables ; or to compute an integral. Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.

A continuous-time Markov chain (CTMC) is a continuous stochastic process in which, for each state, the process will change state according to an exponential random variable and then move to a different state as specified by the probabilities of a stochastic matrix. An equivalent formulation describes the process as changing state according to the least value of a set of exponential random variables, one for each possible state it can move to, with the parameters determined by the current state.

<span class="mw-page-title-main">Discrete-time Markov chain</span> Probability concept

In probability, a discrete-time Markov chain (DTMC) is a sequence of random variables, known as a stochastic process, in which the value of the next variable depends only on the value of the current variable, and not any variables in the past. For instance, a machine may have two states, A and E. When it is in state A, there is a 40% chance of it moving to state E and a 60% chance of it remaining in state A. When it is in state E, there is a 70% chance of it moving to A and a 30% chance of it staying in E. The sequence of states of the machine is a Markov chain. If we denote the chain by then is the state which the machine starts in and is the random variable describing its state after 10 transitions. The process continues forever, indexed by the natural numbers.

Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li in University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001, and is now one of the most popular methods in molecular phylogenetics.

In the mathematical field of dynamical systems, a random dynamical system is a dynamical system in which the equations of motion have an element of randomness to them. Random dynamical systems are characterized by a state space S, a set of maps from S into itself that can be thought of as the set of all possible equations of motion, and a probability distribution Q on the set that represents the random choice of map. Motion in a random dynamical system can be informally thought of as a state evolving according to a succession of maps randomly chosen according to the distribution Q.

An -superprocess, , within mathematics probability theory is a stochastic process on that is usually constructed as a special limit of near-critical branching diffusions.

Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters.

In applied mathematics, topological data analysis (TDA) is an approach to the analysis of datasets using techniques from topology. Extraction of information from datasets that are high-dimensional, incomplete and noisy is generally challenging. TDA provides a general framework to analyze such data in a manner that is insensitive to the particular metric chosen and provides dimensionality reduction and robustness to noise. Beyond this, it inherits functoriality, a fundamental concept of modern mathematics, from its topological nature, which allows it to adapt to new mathematical tools.

In probability theory, a Markov kernel is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space.

In probability theory, a balance equation is an equation that describes the probability flux associated with a Markov chain in and out of states or set of states.

The Hamiltonian Monte Carlo algorithm is a Markov chain Monte Carlo method for obtaining a sequence of random samples which converge to being distributed according to a target probability distribution for which direct sampling is difficult. This sequence can be used to estimate integrals with respect to the target distribution.

In financial mathematics and stochastic optimization, the concept of risk measure is used to quantify the risk involved in a random outcome or risk position. Many risk measures have hitherto been proposed, each having certain characteristics. The entropic value at risk (EVaR) is a coherent risk measure introduced by Ahmadi-Javid, which is an upper bound for the value at risk (VaR) and the conditional value at risk (CVaR), obtained from the Chernoff inequality. The EVaR can also be represented by using the concept of relative entropy. Because of its connection with the VaR and the relative entropy, this risk measure is called "entropic value at risk". The EVaR was developed to tackle some computational inefficiencies of the CVaR. Getting inspiration from the dual representation of the EVaR, Ahmadi-Javid developed a wide class of coherent risk measures, called g-entropic risk measures. Both the CVaR and the EVaR are members of this class.

A Markov chain on a measurable state space is a discrete-time-homogeneous Markov chain with a measurable space as state space.

In computational statistics, the preconditioned Crank–Nicolson algorithm (pCN) is a Markov chain Monte Carlo (MCMC) method for obtaining random samples – sequences of random observations – from a target probability distribution for which direct sampling is difficult.

A Stein discrepancy is a statistical divergence between two probability measures that is rooted in Stein's method. It was first formulated as a tool to assess the quality of Markov chain Monte Carlo samplers, but has since been used in diverse settings in statistics, machine learning and computer science.

References

  1. Green, P.J. (1995). "Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination". Biometrika . 82 (4): 711–732. CiteSeerX   10.1.1.407.8942 . doi:10.1093/biomet/82.4.711. JSTOR   2337340. MR   1380810.