Markov chain mixing time

In probability theory, the mixing time of a Markov chain is the time until the Markov chain is "close" to its steady state distribution.

More precisely, a fundamental result about Markov chains is that a finite-state irreducible aperiodic chain has a unique stationary distribution $\pi$ and, regardless of the initial state, the time-$t$ distribution of the chain converges to $\pi$ as $t$ tends to infinity. Mixing time refers to any of several variant formalizations of the idea: how large must $t$ be until the time-$t$ distribution is approximately $\pi$? One variant, total variation distance mixing time, is defined as the smallest $t$ such that the total variation distance of probability measures is small:

$$
t_{\mathrm{mix}}(\varepsilon) \;=\; \min\left\{ t \ge 0 \;:\; \max_{x \in S} \left\lVert P^t(x, \cdot) - \pi \right\rVert_{\mathrm{TV}} \le \varepsilon \right\}
$$

Choosing a different $\varepsilon$, as long as $\varepsilon < 1/2$, can only change the mixing time up to a constant factor (depending on $\varepsilon$), and so one often fixes $\varepsilon = 1/4$ and simply writes $t_{\mathrm{mix}}$.
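
As a concrete illustration, here is a minimal Python sketch (the chain, the state labels, and the choice $\varepsilon = 1/4$ are illustrative assumptions, not part of the definition) that computes the total variation mixing time of a small chain by powering its transition matrix:

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

def mixing_time(P, eps=0.25, t_max=10_000):
    """Smallest t with max_x ||P^t(x, .) - pi||_TV <= eps, found by brute force."""
    n = P.shape[0]
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    Pt = np.eye(n)
    for t in range(1, t_max + 1):
        Pt = Pt @ P  # Pt[x] is the time-t distribution started from state x
        if max(total_variation(Pt[x], pi) for x in range(n)) <= eps:
            return t
    raise RuntimeError("chain did not mix within t_max steps")

# Example: lazy simple random walk on a cycle of 6 states.
n = 6
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i - 1) % n] += 0.25
    P[i, (i + 1) % n] += 0.25
print(mixing_time(P))
```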

This is the sense in which Dave Bayer and Persi Diaconis (1992) proved that the number of riffle shuffles needed to mix an ordinary 52-card deck is 7. Mathematical theory focuses on how mixing times change as a function of the size of the structure underlying the chain. For an $n$-card deck, the number of riffle shuffles needed grows as $\tfrac{3}{2}\log_2 n$. The most developed theory concerns randomized algorithms for #P-complete algorithmic counting problems such as the number of graph colorings of a given $n$-vertex graph. Such problems can, for a sufficiently large number of colors, be answered using the Markov chain Monte Carlo method and showing that the mixing time grows only as $n \log n$ (Jerrum 1995). This example and the shuffling example possess the rapid mixing property, that the mixing time grows at most polynomially fast in $\log$(number of states of the chain). Tools for proving rapid mixing include arguments based on conductance and the method of coupling. In broader uses of the Markov chain Monte Carlo method, rigorous justification of simulation results would require a theoretical bound on mixing time, and many interesting practical cases have resisted such theoretical analysis.
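
As a sketch of the counting application, the following hypothetical Python snippet runs a single-site recoloring (Glauber-type) chain for proper colorings of a small graph; the graph, the number of colors, and the number of steps are illustrative choices and do not reflect Jerrum's precise conditions or bounds:

```python
import random

def recolor_step(coloring, adj, num_colors):
    """One single-site update: pick a uniform vertex and recolor it
    uniformly among the colors not used by its neighbors."""
    v = random.randrange(len(coloring))
    forbidden = {coloring[u] for u in adj[v]}
    allowed = [c for c in range(num_colors) if c not in forbidden]
    coloring[v] = random.choice(allowed)
    return coloring

# Illustrative example: a 4-cycle with 5 colors, started from a proper coloring.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
coloring = [0, 1, 0, 1]
for _ in range(1000):  # run well past the (small) mixing time of this toy chain
    coloring = recolor_step(coloring, adj, 5)
print(coloring)  # an approximately uniform proper 5-coloring of the cycle
```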

See also

Related Research Articles

In graph theory, an expander graph is a sparse graph that has strong connectivity properties, quantified using vertex, edge or spectral expansion. Expander constructions have spawned research in pure and applied mathematics, with several applications to complexity theory, design of robust computer networks, and the theory of error-correcting codes.
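
For concreteness, one standard expansion parameter, the edge expansion (Cheeger constant) of an $n$-vertex graph $G$, can be written as follows, where $\partial S$ denotes the set of edges with exactly one endpoint in $S$ (a standard definition, included here only for illustration):

$$
h(G) \;=\; \min_{0 < |S| \le n/2} \frac{|\partial S|}{|S|}
$$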

Shuffling is a procedure used to randomize a deck of playing cards to provide an element of chance in card games. Shuffling is often followed by a cut, to help ensure that the shuffler has not manipulated the outcome.

In statistics and statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult. New samples are added to the sequence in two steps: first a new sample is proposed based on the previous sample, then the proposed sample is either added to the sequence or rejected depending on the value of the probability distribution at that point. The resulting sequence can be used to approximate the distribution or to compute an integral.
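
A minimal Python sketch of this two-step structure, assuming a symmetric Gaussian random-walk proposal and a user-supplied unnormalized log-density (all names and parameter values here are illustrative):

```python
import math
import random

def metropolis_hastings(log_density, x0, num_samples, step=1.0):
    """Random-walk Metropolis: propose x' ~ N(x, step^2), then accept with
    probability min(1, p(x') / p(x)); a rejection keeps the current state."""
    x, samples = x0, []
    for _ in range(num_samples):
        proposal = x + random.gauss(0.0, step)
        log_accept = log_density(proposal) - log_density(x)
        if log_accept >= 0 or random.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples

# Example: sample from a standard normal via its unnormalized log-density.
draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, num_samples=10_000)
print(sum(draws) / len(draws))  # sample mean should be near 0
```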

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). Markov processes are named in honor of the Russian mathematician Andrey Markov.
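
As a small illustration of the Markov property, here is a hypothetical two-state weather chain in Python; each step is sampled using only the current state:

```python
import random

# Transition probabilities: outer key is the current state, inner keys the next state.
P = {"sunny": {"sunny": 0.9, "rainy": 0.1},
     "rainy": {"sunny": 0.5, "rainy": 0.5}}

def step(state):
    """Sample the next state from the row of P for the current state."""
    r, acc = random.random(), 0.0
    for nxt, prob in P[state].items():
        acc += prob
        if r < acc:
            return nxt
    return nxt  # guard against floating-point rounding

state, path = "sunny", ["sunny"]
for _ in range(10):
    state = step(state)
    path.append(state)
print(path)
```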

In probability theory, the law of large numbers (LLN) is a mathematical theorem that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the LLN states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.
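
In symbols, for independent and identically distributed $X_1, X_2, \ldots$ with mean $\mu$, the statement reads (a standard formulation, given here only for concreteness):

$$
\bar{X}_n \;=\; \frac{1}{n}\sum_{i=1}^{n} X_i \;\longrightarrow\; \mu \quad \text{as } n \to \infty
$$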

In probability theory, de Finetti's theorem states that exchangeable observations are conditionally independent relative to some latent variable. An epistemic probability distribution could then be assigned to this variable. It is named in honor of Bruno de Finetti.

Additive white Gaussian noise (AWGN) is a basic noise model used in information theory to mimic the effect of many random processes that occur in nature. The modifiers denote specific characteristics: additive, because it is added to any noise that might be intrinsic to the information system; white, because its power is uniform across the frequency band; and Gaussian, because its amplitude has a normal distribution in the time domain with a mean value of zero.

In mathematics, a random walk, sometimes known as a drunkard's walk, is a random process that describes a path that consists of a succession of random steps on some mathematical space.
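
A minimal Python sketch of the simplest case, a symmetric random walk on the integers (the step distribution and walk length are illustrative choices):

```python
import random

def simple_random_walk(num_steps):
    """Symmetric random walk on the integers: move +1 or -1 at each step."""
    position, path = 0, [0]
    for _ in range(num_steps):
        position += random.choice((-1, 1))
        path.append(position)
    return path

print(simple_random_walk(20))
```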

In probability theory, a Chernoff bound is an exponentially decreasing upper bound on the tail of a random variable based on its moment generating function. The minimum of all such exponential bounds forms the Chernoff or Chernoff-Cramér bound, which may decay faster than exponential. It is especially useful for sums of independent random variables, such as sums of Bernoulli random variables.
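
In symbols, the generic form of the bound for a random variable $X$, obtained by applying Markov's inequality to $e^{tX}$ and optimizing over $t$, is (a standard statement, included for concreteness):

$$
\Pr(X \ge a) \;\le\; \inf_{t > 0} \frac{\mathbb{E}\left[e^{tX}\right]}{e^{ta}}
$$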

In mathematics, loop-erased random walk is a model for a random simple path with important applications in combinatorics, physics and quantum field theory. It is intimately connected to the uniform spanning tree, a model for a random tree. See also random walk for more general treatment of this topic.

In mathematics, the Cheeger bound is a bound of the second largest eigenvalue of the transition matrix of a finite-state, discrete-time, reversible stationary Markov chain. It can be seen as a special case of Cheeger inequalities in expander graphs.
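
Stated concretely, with $\lambda_2$ the second largest eigenvalue of the transition matrix and $\Phi$ the conductance of the chain, a standard form of the bound is:

$$
1 - 2\Phi \;\le\; \lambda_2 \;\le\; 1 - \frac{\Phi^2}{2}
$$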

In theoretical computer science, graph theory, and mathematics, the conductance is a parameter of a Markov chain that is closely tied to its mixing time, that is, how rapidly the chain converges to its stationary distribution, should it exist. Equivalently, the conductance can be viewed as a parameter of a directed graph, in which case it can be used to analyze how quickly random walks in the graph converge.
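
For a chain with stationary distribution $\pi$ and transition matrix $P$, one standard way to write the conductance (included here for concreteness) is:

$$
\Phi \;=\; \min_{S \,:\, \pi(S) \le 1/2} \frac{\sum_{x \in S,\, y \notin S} \pi(x) P(x, y)}{\pi(S)}
$$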

In the mathematical study of stochastic processes, a Harris chain is a Markov chain where the chain returns to a particular part of the state space an unbounded number of times. Harris chains are regenerative processes and are named after Theodore Harris. The theory of Harris chains and Harris recurrence is useful for treating Markov chains on general state spaces.

Property testing is a field of theoretical computer science, concerned with the design of super-fast algorithms for approximate decision making, where the decision refers to properties or parameters of huge objects.

The exponential mechanism is a technique for designing differentially private algorithms. It was developed by Frank McSherry and Kunal Talwar in 2007. Their work was recognized as a co-winner of the 2009 PET Award for Outstanding Research in Privacy Enhancing Technologies.
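
In a standard presentation (not a quotation from the original paper), given a utility function $u(x, r)$ with sensitivity $\Delta u$ and privacy parameter $\varepsilon$, the mechanism $\mathcal{M}$ selects output $r$ with probability

$$
\Pr[\mathcal{M}(x) = r] \;\propto\; \exp\!\left(\frac{\varepsilon\, u(x, r)}{2\, \Delta u}\right)
$$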

In the mathematics of permutations and the study of shuffling playing cards, a riffle shuffle permutation is one of the permutations of a set of $n$ items that can be obtained by a single riffle shuffle, in which a sorted deck of cards is cut into two packets and then the two packets are interleaved. Mathematically, beginning with an ordered set, a riffle shuffle is defined as a permutation of this set containing 1 or 2 rising sequences. The permutations with 1 rising sequence are the identity permutations.

In the mathematics of shuffling playing cards, the Gilbert–Shannon–Reeds model is a probability distribution on riffle shuffle permutations. It forms the basis for a recommendation that a deck of cards should be riffled seven times in order to thoroughly randomize it. It is named after the work of Edgar Gilbert, Claude Shannon, and J. Reeds, reported in a 1955 technical report by Gilbert and in a 1981 unpublished manuscript of Reeds.
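
A hypothetical Python sketch of one GSR riffle, following the usual description of the model: cut the deck at a Binomial(n, 1/2) position, then interleave by dropping cards from each packet with probability proportional to its current size:

```python
import random

def gsr_riffle(deck):
    """One Gilbert-Shannon-Reeds riffle shuffle of the given list."""
    n = len(deck)
    cut = sum(random.random() < 0.5 for _ in range(n))  # Binomial(n, 1/2) cut point
    left, right = deck[:cut], deck[cut:]
    shuffled = []
    while left or right:
        # Drop the next card from a packet with probability proportional to its size.
        if random.random() < len(left) / (len(left) + len(right)):
            shuffled.append(left.pop(0))
        else:
            shuffled.append(right.pop(0))
    return shuffled

deck = list(range(1, 53))
for _ in range(7):  # seven riffles of a 52-card deck
    deck = gsr_riffle(deck)
print(deck[:10])
```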

In mathematics and theoretical computer science, analysis of Boolean functions is the study of real-valued functions on $\{0,1\}^n$ or $\{-1,1\}^n$ from a spectral perspective. The functions studied are often, but not always, Boolean-valued, making them Boolean functions. The area has found many applications in combinatorics, social choice theory, random graphs, and theoretical computer science, especially in hardness of approximation, property testing, and PAC learning.

Stochastic gradient Langevin dynamics (SGLD) is an optimization and sampling technique composed of characteristics from Stochastic gradient descent, a Robbins–Monro optimization algorithm, and Langevin dynamics, a mathematical extension of molecular dynamics models. Like stochastic gradient descent, SGLD is an iterative optimization algorithm which uses minibatching to create a stochastic gradient estimator, as used in SGD to optimize a differentiable objective function. Unlike traditional SGD, SGLD can be used for Bayesian learning as a sampling method. SGLD may be viewed as Langevin dynamics applied to posterior distributions, but the key difference is that the likelihood gradient terms are minibatched, like in SGD. SGLD, like Langevin dynamics, produces samples from a posterior distribution of parameters based on available data. First described by Welling and Teh in 2011, the method has applications in many contexts which require optimization, and is most notably applied in machine learning problems.
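
For concreteness, the parameter update proposed by Welling and Teh can be written as follows, where $\varepsilon_t$ is a step size, the sum runs over a minibatch of $n$ of the $N$ data points, and $\eta_t$ is injected Gaussian noise (the notation is a standard presentation, not a quotation):

$$
\Delta\theta_t \;=\; \frac{\varepsilon_t}{2}\left(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_{t_i} \mid \theta_t)\right) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \varepsilon_t)
$$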

Dynamic discrete choice (DDC) models, also known as discrete choice models of dynamic programming, model an agent's choices over discrete options that have future implications. Rather than assuming observed choices are the result of static utility maximization, observed choices in DDC models are assumed to result from an agent's maximization of the present value of utility, generalizing the utility theory upon which discrete choice models are based.

References