Kolmogorov's criterion

Last updated June 22, 2024

In probability theory, Kolmogorov's criterion, named after Andrey Kolmogorov, is a theorem giving a necessary and sufficient condition for a Markov chain or continuous-time Markov chain to be stochastically identical to its time-reversed version.

Discrete-time Markov chains

The theorem states that an irreducible, positive recurrent, aperiodic Markov chain with transition matrix P is reversible if and only if its stationary Markov chain satisfies^[1]

p_{j_{1}j_{2}}p_{j_{2}j_{3}}\cdots p_{j_{n-1}j_{n}}p_{j_{n}j_{1}}=p_{j_{1}j_{n}}p_{j_{n}j_{n-1}}\cdots p_{j_{3}j_{2}}p_{j_{2}j_{1}}

for all finite sequences of states

j_{1},j_{2},\ldots ,j_{n}\in S.

Here p_ij are components of the transition matrix P, and S is the state space of the chain.

That is, the chain-multiplication along any cycle is the same forwards and backwards.

Example

Consider this figure depicting a section of a Markov chain with states i, j, k and l and the corresponding transition probabilities. Here Kolmogorov's criterion implies that the product of probabilities when traversing through any closed loop must be equal, so the product around the loop i to j to l to k returning to i must be equal to the loop the other way round,

p_{ij}p_{jl}p_{lk}p_{ki}=p_{ik}p_{kl}p_{lj}p_{ji}.

Proof

Let $X$ be the Markov chain and denote by $\pi$ its stationary distribution (such exists since the chain is positive recurrent).

If the chain is reversible, the equality follows from the relation $p_{ji}={\frac {\pi _{i}p_{ij}}{\pi _{j}}}$ .

Now assume that the equality is fulfilled. Fix states $s$ and $t$ . Then

{\text{P}}(X_{n+1}=t,X_{n}=i_{n},\ldots ,X_{0}=s|X_{0}=s)

=p_{si_{1}}p_{i_{1}i_{2}}\cdots p_{i_{n}t}

={\frac {p_{st}}{p_{ts}}}p_{ti_{n}}p_{i_{n}i_{n-1}}\cdots p_{i_{1}s}

={\frac {p_{st}}{p_{ts}}}{\text{P}}(X_{n+1}

=s,X_{n}

=i_{1},\ldots ,X_{0}=t|X_{0}=t)

.

Now sum both sides of the last equality for all possible ordered choices of $n$ states $i_{1},i_{2},\ldots ,i_{n}$ . Thus we obtain $p_{st}^{(n)}={\frac {p_{st}}{p_{ts}}}p_{ts}^{(n)}$ so ${\frac {p_{st}^{(n)}}{p_{ts}^{(n)}}}={\frac {p_{st}}{p_{ts}}}$ . Send $n$ to $\infty$ on the left side of the last. From the properties of the chain follows that $\lim _{n\to \infty }p_{ij}^{(n)}=\pi _{j}$ , hence ${\frac {\pi _{t}}{\pi _{s}}}={\frac {p_{st}}{p_{ts}}}$ which shows that the chain is reversible.

Continuous-time Markov chains

The theorem states that a continuous-time Markov chain with transition rate matrix Q is, under any invariant probability vector, reversible if and only if its transition probabilities satisfy^[1]

q_{j_{1}j_{2}}q_{j_{2}j_{3}}\cdots q_{j_{n-1}j_{n}}q_{j_{n}j_{1}}=q_{j_{1}j_{n}}q_{j_{n}j_{n-1}}\cdots q_{j_{3}j_{2}}q_{j_{2}j_{1}}

for all finite sequences of states

j_{1},j_{2},\ldots ,j_{n}\in S.

The proof for continuous-time Markov chains follows in the same way as the proof for discrete-time Markov chains.

Related Research Articles

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). Markov processes are named in honor of the Russian mathematician Andrey Markov.

In mathematics, a stochastic matrix is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability. It is also called a probability matrix, transition matrix, substitution matrix, or Markov matrix. The stochastic matrix was first developed by Andrey Markov at the beginning of the 20th century, and has found use throughout a wide variety of scientific fields, including probability theory, statistics, mathematical finance and linear algebra, as well as computer science and population genetics. There are several different definitions and types of stochastic matrices:

In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for sampling from a specified multivariate probability distribution when direct sampling from the joint distribution is difficult, but sampling from the conditional distribution is more practical. This sequence can be used to approximate the joint distribution ; to approximate the marginal distribution of one of the variables, or some subset of the variables ; or to compute an integral. Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.

In electrical engineering, statistical computing and bioinformatics, the Baum–Welch algorithm is a special case of the expectation–maximization algorithm used to find the unknown parameters of a hidden Markov model (HMM). It makes use of the forward-backward algorithm to compute the statistics for the expectation step.

A continuous-time Markov chain (CTMC) is a continuous stochastic process in which, for each state, the process will change state according to an exponential random variable and then move to a different state as specified by the probabilities of a stochastic matrix. An equivalent formulation describes the process as changing state according to the least value of a set of exponential random variables, one for each possible state it can move to, with the parameters determined by the current state.

A mathematical or physical process is time-reversible if the dynamics of the process remain well-defined when the sequence of time-states is reversed.

In mathematics, subshifts of finite type are used to model dynamical systems, and in particular are the objects of study in symbolic dynamics and ergodic theory. They also describe the set of all possible sequences executed by a finite state machine. The most widely studied shift spaces are the subshifts of finite type.

The principle of detailed balance can be used in kinetic systems which are decomposed into elementary processes. It states that at equilibrium, each elementary process is in equilibrium with its reverse process.

In queueing theory, a discipline within the mathematical theory of probability, a Jackson network is a class of queueing network where the equilibrium distribution is particularly simple to compute as the network has a product-form solution. It was the first significant development in the theory of networks of queues, and generalising and applying the ideas of the theorem to search for similar product-form solutions in other networks has been the subject of much research, including ideas used in the development of the Internet. The networks were first identified by James R. Jackson and his paper was re-printed in the journal Management Science’s ‘Ten Most Influential Titles of Management Sciences First Fifty Years.’

In probability, a discrete-time Markov chain (DTMC) is a sequence of random variables, known as a stochastic process, in which the value of the next variable depends only on the value of the current variable, and not any variables in the past. For instance, a machine may have two states, A and E. When it is in state A, there is a 40% chance of it moving to state E and a 60% chance of it remaining in state A. When it is in state E, there is a 70% chance of it moving to A and a 30% chance of it staying in E. The sequence of states of the machine is a Markov chain. If we denote the chain by $then is the state which the machine starts in and is the random variable describing its state after 10 transitions. The process continues forever, indexed by the natural numbers.$

In mathematics, the Cheeger bound is a bound of the second largest eigenvalue of the transition matrix of a finite-state, discrete-time, reversible stationary Markov chain. It can be seen as a special case of Cheeger inequalities in expander graphs.

In mathematics, a holomorphic vector bundle is a complex vector bundle over a complex manifold $X$ such that the total space $E$ is a complex manifold and the projection map $π : E \to X$ is holomorphic. Fundamental examples are the holomorphic tangent bundle of a complex manifold, and its dual, the holomorphic cotangent bundle. A holomorphic line bundle is a rank one holomorphic vector bundle.

A number of different Markov models of DNA sequence evolution have been proposed. These substitution models differ in terms of the parameters used to describe the rates at which one nucleotide replaces another during evolution. These models are frequently used in molecular phylogenetic analyses. In particular, they are used during the calculation of likelihood of a tree and they are used to estimate the evolutionary distance between sequences from the observed differences between the sequences.

In theoretical computer science, graph theory, and mathematics, the conductance is a parameter of a Markov chain that is closely tied to its mixing time, that is, how rapidly the chain converges to its stationary distribution, should it exist. Equivalently, the conductance can be viewed as a parameter of a directed graph, in which case it can be used to analyze how quickly random walks in the graph converge.

Multiple-try Metropolis (MTM) is a sampling method that is a modified form of the Metropolis–Hastings method, first presented by Liu, Liang, and Wong in 2000. It is designed to help the sampling trajectory converge faster, by increasing both the step size and the acceptance rate.

In probability theory, a balance equation is an equation that describes the probability flux associated with a Markov chain in and out of states or set of states.

In probability theory, uniformization method, is a method to compute transient solutions of finite state continuous-time Markov chains, by approximating the process by a discrete-time Markov chain. The original chain is scaled by the fastest transition rate γ, so that transitions occur at the same rate in every state, hence the name. The method is simple to program and efficiently calculates an approximation to the transient distribution at a single point in time. The method was first introduced by Winfried Grassmann in 1977.

In mathematics, specifically in the theory of Markovian stochastic processes in probability theory, the Chapman–Kolmogorov equation (CKE) is an identity relating the joint probability distributions of different sets of coordinates on a stochastic process. The equation was derived independently by both the British mathematician Sydney Chapman and the Russian mathematician Andrey Kolmogorov. The CKE is prominently used in recent Variational Bayesian methods.

In probability theory, Kelly's lemma states that for a stationary continuous-time Markov chain, a process defined as the time-reversed process has the same stationary distribution as the forward-time process. The theorem is named after Frank Kelly.

Bayesian programming is a formalism and a methodology for having a technique to specify probabilistic models and solve problems when less than the necessary information is available.

References

1 2 Kelly, Frank P. (1979). Reversibility and Stochastic Networks (PDF). Wiley, Chichester. pp. 21–25.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[kelly-1] 1 2 Kelly, Frank P. (1979). Reversibility and Stochastic Networks (PDF). Wiley, Chichester. pp. 21–25.

[1]