Markov chain tree theorem

Last updated

In the mathematical theory of Markov chains, the Markov chain tree theorem is an expression for the stationary distribution of a Markov chain with finitely many states. It sums up terms for the rooted spanning trees of the Markov chain, with a positive combination for each tree. The Markov chain tree theorem is closely related to Kirchhoff's theorem on counting the spanning trees of a graph, from which it can be derived. [1] It was first stated by Hill (1966), for certain Markov chains arising in thermodynamics, [1] [2] and proved in full generality by Leighton & Rivest (1986), motivated by an application in limited-memory estimation of the probability of a biased coin. [1] [3]

A finite Markov chain consists of a finite set of states, and a transition probability for changing from state to state , such that for each state the outgoing transition probabilities sum to one. From an initial choice of state (which turns out to be irrelevant to this problem), each successive state is chosen at random according to the transition probabilities from the previous state. A Markov chain is said to be irreducible when every state can reach every other state through some sequence of transitions, and aperiodic if, for every state, the possible numbers of steps in sequences that start and end in that state have greatest common divisor one. An irreducible and aperiodic Markov chain necessarily has a stationary distribution, a probability distribution on its states that describes the probability of being on a given state after many steps, regardless of the initial choice of state. [1]

The Markov chain tree theorem considers spanning trees for the states of the Markov chain, defined to be trees, directed toward a designated root, in which all directed edges are valid transitions of the given Markov chain. If a transition from state to state has transition probability , then a tree with edge set is defined to have weight equal to the product of its transition probabilities:

Let denote the set of all spanning trees having state at their root. Then, according to the Markov chain tree theorem, the stationary probability for state is proportional to the sum of the weights of the trees rooted at . That is,

where the normalizing constant is the sum of over all spanning trees. [1]

Related Research Articles

Markov chain Random process independent of past history

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov.

In mathematics, a stochastic matrix is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability. It is also called a probability matrix, transition matrix, substitution matrix, or Markov matrix. The stochastic matrix was first developed by Andrey Markov at the beginning of the 20th century, and has found use throughout a wide variety of scientific fields, including probability theory, statistics, mathematical finance and linear algebra, as well as computer science and population genetics. There are several different definitions and types of stochastic matrices:

A continuous-time Markov chain (CTMC) is a continuous stochastic process in which, for each state, the process will change state according to an exponential random variable and then move to a different state as specified by the probabilities of a stochastic matrix. An equivalent formulation describes the process as changing state according to the least value of a set of exponential random variables, one for each possible state it can move to, with the parameters determined by the current state.

In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automatic control, economics and manufacturing. The name of MDPs comes from the Russian mathematician Andrey Markov as they are an extension of Markov chains.

In mathematics, the Gibbs measure, named after Josiah Willard Gibbs, is a probability measure frequently seen in many problems of probability theory and statistical mechanics. It is a generalization of the canonical ensemble to infinite systems. The canonical ensemble gives the probability of the system X being in state x as

Discrete-time Markov chain Probability concept

In probability, a discrete-time Markov chain (DTMC) is a sequence of random variables, known as a stochastic process, in which the value of the next variable depends only on the value of the current variable, and not any variables in the past. For instance, a machine may have two states, A and E. When it is in state A, there is a 40% chance of it moving to state E and a 60% chance of it remaining in state A. When it is in state E, there is a 70% chance of it moving to A and a 30% chance of it staying in E. The sequence of states of the machine is a Markov chain. If we denote the chain by then is the state which the machine starts in and is the random variable describing its state after 10 transitions. The process continues forever, indexed by the natural numbers.

In mathematics, the Cheeger bound is a bound of the second largest eigenvalue of the transition matrix of a finite-state, discrete-time, reversible stationary Markov chain. It can be seen as a special case of Cheeger inequalities in expander graphs.

In mathematics, ergodicity expresses the idea that a point of a moving system, either a dynamical system or a stochastic process, will eventually visit all parts of the space that the system moves in, in a uniform and random sense. This implies that the average behavior of the system can be deduced from the trajectory of a "typical" point. Equivalently, a sufficiently large collection of random samples from a process can represent the average statistical properties of the entire process. Ergodicity is a property of the system; it is a statement that the system cannot be reduced or factored into smaller components. Ergodic theory is the study of systems possessing ergodicity.

In probability theory, the mixing time of a Markov chain is the time until the Markov chain is "close" to its steady state distribution.

A number of different Markov models of DNA sequence evolution have been proposed. These substitution models differ in terms of the parameters used to describe the rates at which one nucleotide replaces another during evolution. These models are frequently used in molecular phylogenetic analyses. In particular, they are used during the calculation of likelihood of a tree and they are used to estimate the evolutionary distance between sequences from the observed differences between the sequences.

Among Markov chain Monte Carlo (MCMC) algorithms, coupling from the past is a method for sampling from the stationary distribution of a Markov chain. Contrary to many MCMC algorithms, coupling from the past gives in principle a perfect sample from the stationary distribution. It was invented by James Propp and David Wilson in 1996.

In probability theory, Kolmogorov's criterion, named after Andrey Kolmogorov, is a theorem giving a necessary and sufficient condition for a Markov chain or continuous-time Markov chain to be stochastically identical to its time-reversed version.

In the mathematical study of stochastic processes, a Harris chain is a Markov chain where the chain returns to a particular part of the state space an unbounded number of times. Harris chains are regenerative processes and are named after Theodore Harris. The theory of Harris chains and Harris recurrence is useful for treating Markov chains on general state spaces.

In probability theory, Foster's theorem, named after Gordon Foster, is used to draw conclusions about the positive recurrence of Markov chains with countable state spaces. It uses the fact that positive recurrent Markov chains exhibit a notion of "Lyapunov stability" in terms of returning to any state while starting from it within a finite time interval.

In probability theory, a balance equation is an equation that describes the probability flux associated with a Markov chain in and out of states or set of states.

In probability theory, uniformization method, is a method to compute transient solutions of finite state continuous-time Markov chains, by approximating the process by a discrete-time Markov chain. The original chain is scaled by the fastest transition rate γ, so that transitions occur at the same rate in every state, hence the name. The method is simple to program and efficiently calculates an approximation to the transient distribution at a single point in time. The method was first introduced by Winfried Grassmann in 1977.

In probability theory, the matrix analytic method is a technique to compute the stationary probability distribution of a Markov chain which has a repeating structure and a state space which grows unboundedly in no more than one dimension. Such models are often described as M/G/1 type Markov chains because they can describe transitions in an M/G/1 queue. The method is a more complicated version of the matrix geometric method and is the classical solution method for M/G/1 chains.

In probability theory, Kelly's lemma states that for a stationary continuous time Markov chain, a process defined as the time-reversed process has the same stationary distribution as the forward-time process. The theorem is named after Frank Kelly.

In probability theory, Kemeny’s constant is the expected number of time steps required for a Markov chain to transition from a starting state i to a random destination state sampled from the Markov chain's stationary distribution. Surprisingly, this quantity does not depend on which starting state i is chosen. It is in that sense a constant, although it is different for different Markov chains. When first published by John Kemeny in 1960 a prize was offered for an intuitive explanation as to why the quantity was constant.

In the mathematical theory of random processes, the Markov chain central limit theorem has a conclusion somewhat similar in form to that of the classic central limit theorem (CLT) of probability theory, but the quantity in the role taken by the variance in the classic CLT has a more complicated definition. See also the general form of Bienaymé's identity.

References

  1. 1 2 3 4 5 Williams, Lauren K. (May 2022), "The combinatorics of hopping particles and positivity in Markov chains", London Mathematical Society Newsletter (500): 50–59, arXiv: 2202.00214
  2. Hill, Terrell L. (April 1966), "Studies in irreversible thermodynamics IV: diagrammatic representation of steady state fluxes for unimolecular systems", Journal of Theoretical Biology, 10 (3): 442–459, doi:10.1016/0022-5193(66)90137-8
  3. Leighton, Frank Thomson; Rivest, Ronald L. (1986), "Estimating a probability using finite memory", IEEE Transactions on Information Theory, 32 (6): 733–742, doi:10.1109/TIT.1986.1057250