In probability, a discrete-time Markov chain (DTMC) is a sequence of random variables, known as a stochastic process, in which the value of the next variable depends only on the value of the current variable, and not on any variables in the past. For instance, a machine may have two states, A and E. When it is in state A, there is a 40% chance of it moving to state E and a 60% chance of it remaining in state A. When it is in state E, there is a 70% chance of it moving to A and a 30% chance of it staying in E. The sequence of states of the machine is a Markov chain. If we denote the chain by $X = (X_0, X_1, X_2, \dots)$, then $X_0$ is the state which the machine starts in and $X_{10}$ is the random variable describing its state after 10 transitions. The process continues forever, indexed by the natural numbers.
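The two-state machine above can be simulated directly. The following Python sketch (the state labels and probabilities are taken from the example; the helper names are illustrative) draws each next state using only the current state, which is exactly the Markov property.

```python
import random

# Transition probabilities of the two-state machine from the example above
# (outer key: current state, inner keys: possible next states).
P = {
    "A": {"A": 0.6, "E": 0.4},
    "E": {"A": 0.7, "E": 0.3},
}

def step(state):
    """Draw the next state given only the current one."""
    return "A" if random.random() < P[state]["A"] else "E"

def simulate(start, n_steps):
    """Return the sequence X_0, X_1, ..., X_n of visited states."""
    chain = [start]
    for _ in range(n_steps):
        chain.append(step(chain[-1]))
    return chain

print(simulate("A", 10))   # e.g. ['A', 'A', 'E', 'A', ...]
```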
An example of a stochastic process which is not a Markov chain is the model of a machine which has states A and E and moves to A from either state with 50% chance if it has ever visited A before, and 20% chance if it has never visited A before (leaving a 50% or 80% chance that the machine moves to E). This is because the behavior of the machine depends on the whole history—if the machine is in E, it may have a 50% or 20% chance of moving to A, depending on its past values. Hence, it does not have the Markov property.
A Markov chain can be described by a stochastic matrix, which lists the probabilities of moving to each state from any individual state. From this matrix, the probability of being in a particular state n steps in the future can be calculated. A Markov chain's state space can be partitioned into communicating classes that describe which states are reachable from each other (in one transition or in many). Each state can be described as transient or recurrent, depending on the probability of the chain ever returning to that state. Markov chains can have properties including periodicity, reversibility and stationarity. A continuous-time Markov chain is like a discrete-time Markov chain, but it moves through states continuously in time rather than in discrete time steps. Other stochastic processes can satisfy the Markov property, the property that the future behavior of the process depends only on its present state and not on its past.
A discrete-time Markov chain is a sequence of random variables $X_0, X_1, X_2, \dots$ with the Markov property, namely that the probability of moving to the next state depends only on the present state and not on the previous states:
$$\Pr(X_{n+1} = x \mid X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n) = \Pr(X_{n+1} = x \mid X_n = x_n),$$
if both conditional probabilities are well defined, that is, if $\Pr(X_0 = x_0, \ldots, X_n = x_n) > 0.$
The possible values of Xi form a countable set S called the state space of the chain. [1]
Markov chains are often described by a sequence of directed graphs, where the edges of graph n are labeled by the probabilities of going from one state at time n to the other states at time n + 1. The same information is represented by the transition matrix from time n to time n + 1. However, Markov chains are frequently assumed to be time-homogeneous (see variations below), in which case the graph and matrix are independent of n and are thus not presented as sequences.
These descriptions highlight the structure of the Markov chain that is independent of the initial distribution $\Pr(X_0 = x_0)$. When time-homogeneous, the chain can be interpreted as a state machine assigning a probability of hopping from each vertex or state to an adjacent one. The probability $\Pr(X_n = x \mid X_0 = x_0)$ of the machine's state can be analyzed as the statistical behavior of the machine with an element $x_0$ of the state space as input, or as the behavior of the machine with the initial distribution $\Pr(X_0 = y) = [x_0 = y]$ of states as input, where $[P]$ is the Iverson bracket.[citation needed]
The probability of going from state i to state j in n time steps is
$$p_{ij}^{(n)} = \Pr(X_n = j \mid X_0 = i)$$
and the single-step transition is
$$p_{ij} = \Pr(X_1 = j \mid X_0 = i).$$
For a time-homogeneous Markov chain:
$$p_{ij}^{(n)} = \Pr(X_{k+n} = j \mid X_k = i)$$
and
$$p_{ij} = \Pr(X_{k+1} = j \mid X_k = i).$$
The n-step transition probabilities satisfy the Chapman–Kolmogorov equation, that for any k such that 0 < k < n,
$$p_{ij}^{(n)} = \sum_{r \in S} p_{ir}^{(k)} \, p_{rj}^{(n-k)}$$
where S is the state space of the Markov chain. [1]
The marginal distribution Pr(Xn = x) is the distribution over states at time n. The initial distribution is Pr(X0 = x). The evolution of the process through one time step is described by
$$\Pr(X_{n+1} = j) = \sum_{r \in S} p_{rj} \Pr(X_n = r) = \sum_{r \in S} \Pr(X_{n+1} = j \mid X_n = r) \Pr(X_n = r).$$
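As a concrete check of these formulas, the following NumPy sketch (reusing the two-state machine from the introduction, with states ordered A, E, purely as an assumed example) computes n-step transition probabilities as matrix powers, verifies the Chapman–Kolmogorov identity numerically, and evolves a marginal distribution.

```python
import numpy as np

# Transition matrix of the two-state machine (row/column order: A, E).
P = np.array([[0.6, 0.4],
              [0.7, 0.3]])

# n-step transition probabilities: the (i, j) entry of P^n is p_ij^(n).
n, k = 5, 2
P_n = np.linalg.matrix_power(P, n)

# Chapman-Kolmogorov: P^n = P^k P^(n-k) for any 0 < k < n.
assert np.allclose(P_n, np.linalg.matrix_power(P, k) @ np.linalg.matrix_power(P, n - k))

# Marginal distribution: if mu_0 = Pr(X_0 = .), then mu_{m+1} = mu_m P and mu_n = mu_0 P^n.
mu_0 = np.array([1.0, 0.0])        # start in state A with probability 1
print(mu_0 @ P_n)                  # distribution over {A, E} after 5 steps
```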
A state j is said to be accessible from a state i (written i → j) if a system started in state i has a non-zero probability of transitioning into state j at some point. Formally, state j is accessible from state i if there exists an integer nij ≥ 0 such that
$$\Pr(X_{n_{ij}} = j \mid X_0 = i) = p_{ij}^{(n_{ij})} > 0.$$
A state i is said to communicate with state j (written i ↔ j) if both i → j and j → i. A communicating class is a maximal set of states C such that every pair of states in C communicates with each other. Communication is an equivalence relation, and communicating classes are the equivalence classes of this relation. [1]
A communicating class is closed if the probability of leaving the class is zero, namely if i is in C but j is not, then j is not accessible from i. [1] The set of communicating classes forms a directed, acyclic graph by inheriting the arrows from the original state space. A communicating class is closed if and only if it has no outgoing arrows in this graph.
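Communicating classes can be computed directly from a transition matrix by testing mutual reachability. The helper functions below are an illustrative sketch (hypothetical names, not a standard library routine): two states communicate exactly when each is reachable from the other with positive probability.

```python
def reachable(P, start):
    """Set of states j with start -> j (n = 0 transitions allowed, so start is included)."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def communicating_classes(P):
    """Partition the states of transition matrix P into communicating classes."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = {j for j in range(n) if j in reach[i] and i in reach[j]}  # i <-> j
        classes.append(cls)
        assigned |= cls
    return classes

# Example: states 0 and 1 communicate; state 2 forms its own (closed) class.
P = [[0.5, 0.4, 0.1],
     [0.3, 0.6, 0.1],
     [0.0, 0.0, 1.0]]
print(communicating_classes(P))   # [{0, 1}, {2}]
```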
A state i is said to be essential or final if for all j such that i → j it is also true that j → i. A state i is inessential if it is not essential. [2] A state is final if and only if its communicating class is closed.
A Markov chain is said to be irreducible if its state space is a single communicating class; in other words, if it is possible to get to any state from any state. [1] [3] : 20
A state i has period k if any return to state i must occur in multiples of k time steps. Formally, the period of state i is defined as
$$k = \gcd\{ n > 0 : \Pr(X_n = i \mid X_0 = i) > 0 \}$$
(where gcd is the greatest common divisor) provided that this set is not empty. Otherwise the period is not defined. [1] Note that even though a state has period k, it may not be possible to reach the state in k steps. For example, suppose it is possible to return to the state in {6, 8, 10, 12, ...} time steps; k would be 2, even though 2 does not appear in this list.
If k = 1, then the state is said to be aperiodic. Otherwise (k > 1), the state is said to be periodic with period k. Periodicity is a class property: if a state has period k, then every state in its communicating class has period k. [1]
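The period of a state can be computed numerically by collecting the step counts at which a return has positive probability and taking their greatest common divisor. The helper below is a rough sketch: it truncates the search at an assumed horizon max_n, so it is only reliable when returns occur within that horizon.

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, state, max_n=50):
    """gcd of all n <= max_n with p_ii^(n) > 0 (approximation via a finite horizon)."""
    returns = []
    Pn = np.eye(len(P))
    for n in range(1, max_n + 1):
        Pn = Pn @ P
        if Pn[state, state] > 0:
            returns.append(n)
    return reduce(gcd, returns) if returns else None   # None: period undefined

# A deterministic 3-cycle: every return to a state takes a multiple of 3 steps.
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
print(period(P, 0))   # 3
```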
A state i is said to be transient if, given that we start in state i, there is a non-zero probability that we will never return to i. Formally, let the random variable Ti be the first return time to state i (the "hitting time"):
$$T_i = \inf\{ n \ge 1 : X_n = i \mid X_0 = i \}.$$
The number
$$f_{ii}^{(n)} = \Pr(T_i = n)$$
is the probability that we return to state i for the first time after n steps. Therefore, state i is transient if
$$\Pr(T_i < \infty) = \sum_{n=1}^{\infty} f_{ii}^{(n)} < 1.$$
State i is recurrent (or persistent) if it is not transient. Recurrence and transience are class properties, that is, they either hold or do not hold equally for all members of a communicating class. [1]
A state i is recurrent if and only if the expected number of visits to i is infinite: [1]
$$\sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty.$$
Even if the hitting time is finite with probability 1, it need not have a finite expectation. The mean recurrence time at state i is the expected return time Mi:
$$M_i = E[T_i] = \sum_{n=1}^{\infty} n \cdot f_{ii}^{(n)}.$$
State i is positive recurrent (or non-null persistent) if Mi is finite; otherwise, state i is null recurrent (or null persistent). Positive and null recurrence are class properties. [1]
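For a positive recurrent state the mean recurrence time can also be estimated by simulation. The sketch below is a rough Monte Carlo estimate (the trial counts, helper name, and the reuse of the two-state machine from the introduction are assumptions for illustration); for that chain the exact value at state A is 11/7, since the stationary probability of A works out to 7/11 (see the section on stationary distributions below).

```python
import random

def estimate_mean_recurrence(P, state, n_trials=10_000, max_steps=10_000):
    """Monte Carlo estimate of M_i = E[T_i], the mean first return time to `state`."""
    states = list(range(len(P)))
    total = 0
    for _ in range(n_trials):
        current = state
        for t in range(1, max_steps + 1):
            current = random.choices(states, weights=P[current])[0]
            if current == state:
                total += t
                break
        else:
            raise RuntimeError("no return observed; state may be transient or null recurrent")
    return total / n_trials

# Two-state machine from the introduction (states 0 = A, 1 = E).
P = [[0.6, 0.4],
     [0.7, 0.3]]
print(estimate_mean_recurrence(P, 0))   # close to 11/7 ≈ 1.571
```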
A state i is called absorbing if it is impossible to leave this state. Therefore, the state i is absorbing if and only if
$$p_{ii} = 1 \quad \text{and} \quad p_{ij} = 0 \ \text{ for } i \ne j.$$
If every state can reach an absorbing state, then the Markov chain is an absorbing Markov chain. [4] [5]
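Both conditions can be checked mechanically from a transition matrix. The helpers below are an illustrative sketch (hypothetical names), reusing the three-state example from the communicating-class sketch above.

```python
def absorbing_states(P):
    """States i with p_ii = 1, i.e. impossible to leave."""
    return {i for i in range(len(P)) if P[i][i] == 1.0}

def is_absorbing_chain(P):
    """True if from every state some absorbing state is reachable."""
    targets = absorbing_states(P)
    for start in range(len(P)):
        seen, stack = {start}, [start]
        while stack:                      # breadth of states reachable from `start`
            i = stack.pop()
            for j, p in enumerate(P[i]):
                if p > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if not (seen & targets):
            return False
    return True

# State 2 is absorbing and reachable from everywhere, so this is an absorbing Markov chain.
P = [[0.5, 0.4, 0.1],
     [0.3, 0.6, 0.1],
     [0.0, 0.0, 1.0]]
print(absorbing_states(P), is_absorbing_chain(P))   # {2} True
```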
A Markov chain is said to be reversible if there is a probability distribution π over its states such that
$$\pi_i \Pr(X_{n+1} = j \mid X_n = i) = \pi_j \Pr(X_{n+1} = i \mid X_n = j)$$
for all times n and all states i and j. This condition is known as the detailed balance condition (or local balance equation).
Considering a fixed arbitrary time n and using the shorthand
$$p_{ij} = \Pr(X_{n+1} = j \mid X_n = i),$$
the detailed balance equation can be written more compactly as
$$\pi_i p_{ij} = \pi_j p_{ji}.$$
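The compact form is easy to test numerically. The sketch below is illustrative (the example chain and candidate π are assumptions, not taken from the article): it checks whether π_i p_ij = π_j p_ji holds for every pair of states.

```python
import numpy as np

def satisfies_detailed_balance(P, pi, tol=1e-12):
    """Check pi_i * p_ij == pi_j * p_ji for all pairs of states."""
    P, pi = np.asarray(P), np.asarray(pi)
    flux = pi[:, None] * P          # entry (i, j) is pi_i * p_ij
    return np.allclose(flux, flux.T, atol=tol)

# A small birth-death-style chain on {0, 1, 2}; such chains satisfy detailed balance.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
print(satisfies_detailed_balance(P, pi))   # True
```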
The single time-step from n to n + 1 can be thought of as each person i having πi dollars initially and paying each person j a fraction pij of it. The detailed balance condition states that upon each payment, the other person pays exactly the same amount of money back. [6] Clearly the total amount of money π each person has remains the same after the time-step, since every dollar spent is balanced by a corresponding dollar received. This can be shown more formally by the equality
$$\sum_i \pi_i p_{ij} = \sum_i \pi_j p_{ji} = \pi_j \sum_i p_{ji} = \pi_j,$$
which essentially states that the total amount of money person j receives (including from himself) during the time-step equals the amount of money he pays others, which equals all the money he initially had because it was assumed that all money is spent (that is, pji sums to 1 over i). The assumption is a technical one, because the money not really used is simply thought of as being paid from person j to himself (that is, pjj is not necessarily zero).
As n was arbitrary, this reasoning holds for any n, and therefore for reversible Markov chains π is always a steady-state distribution of Pr(Xn+1 = j | Xn = i) for every n.
If the Markov chain begins in the steady-state distribution, that is, if $\Pr(X_0 = i) = \pi_i$, then $\Pr(X_n = i) = \pi_i$ for all n and the detailed balance equation can be written as
$$\Pr(X_n = i, X_{n+1} = j) = \Pr(X_{n+1} = i, X_n = j).$$
The left- and right-hand sides of this last equation are identical except for a reversing of the time indices n and n + 1.
Kolmogorov's criterion gives a necessary and sufficient condition for a Markov chain to be reversible directly from the transition matrix probabilities. The criterion requires that the products of probabilities around every closed loop are the same in both directions around the loop.
Reversible Markov chains are common in Markov chain Monte Carlo (MCMC) approaches because the detailed balance equation for a desired distribution π necessarily implies that the Markov chain has been constructed so that π is a steady-state distribution. Even with time-inhomogeneous Markov chains, where multiple transition matrices are used, if each such transition matrix exhibits detailed balance with the desired π distribution, this necessarily implies that π is a steady-state distribution of the Markov chain.
For any time-homogeneous Markov chain given by a transition matrix $P \in \mathbb{R}^{n \times n}$, any norm $\|\cdot\|$ on $\mathbb{R}^{n \times n}$ which is induced by a scalar product, and any probability vector $\pi$, there exists a unique transition matrix $P^*$ which is reversible according to $\pi$ and which is closest to $P$ according to the norm $\|\cdot\|$. The matrix $P^*$ can be computed by solving a quadratic-convex optimization problem. [7]
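A hedged sketch of this computation, using a generic constrained optimizer rather than the specialized method of [7] (the function name and solver choice are assumptions for illustration): the squared Frobenius distance to P is minimized over row-stochastic matrices that satisfy detailed balance with respect to the given π.

```python
import numpy as np
from scipy.optimize import minimize

def closest_reversible(P, pi):
    """Nearest (Frobenius norm) row-stochastic matrix reversible w.r.t. pi (sketch)."""
    n = P.shape[0]
    unpack = lambda x: x.reshape(n, n)

    objective = lambda x: np.sum((unpack(x) - P) ** 2)   # squared Frobenius distance

    constraints = [
        # each row of the candidate matrix must sum to one
        {"type": "eq", "fun": lambda x: unpack(x).sum(axis=1) - 1.0},
        # detailed balance: pi_i q_ij - pi_j q_ji = 0 (upper triangle suffices)
        {"type": "eq",
         "fun": lambda x: np.array([pi[i] * unpack(x)[i, j] - pi[j] * unpack(x)[j, i]
                                    for i in range(n) for j in range(i + 1, n)])},
    ]
    bounds = [(0.0, 1.0)] * (n * n)          # entries are probabilities
    res = minimize(objective, P.flatten(), method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return unpack(res.x)
```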
For example, consider the following Markov chain:
This Markov chain is not reversible. According to the Frobenius norm, the closest reversible Markov chain with respect to $\pi$ can be computed as
If we instead choose the probability vector $\pi$ randomly, then the closest reversible Markov chain according to the Frobenius norm is approximately given by
A distribution $\pi$ is a stationary distribution of the Markov chain with stochastic matrix $P$ if and only if $\pi P = \pi$. This can be written as: [1]
$$\pi_j = \sum_{i \in S} \pi_i \, p_{ij}.$$
This condition implies that $\pi P^n = \pi$ for every n, and hence if the Markov chain has initial distribution $\pi$ then $X_n$ has distribution $\pi$ for any n. [1]
If a Markov chain is irreducible then it has a stationary distribution if and only if it is positive recurrent, [8] in which case the unique such distribution is given by $\pi_i = \frac{1}{M_i}$, where $M_i$ is the mean recurrence time of i. [1]
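As a minimal sketch (assuming a finite, irreducible chain so that the stationary distribution exists and is unique), the defining equations πP = π and Σ_i π_i = 1 can be solved directly as a linear system; the two-state machine from the introduction is reused as an assumed example.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi together with sum(pi) = 1 as a (least-squares) linear system."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])   # (P^T - I) pi = 0 plus normalization
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Two-state machine from the introduction: pi = (7/11, 4/11),
# so the mean recurrence times are M_A = 11/7 and M_E = 11/4.
P = np.array([[0.6, 0.4],
              [0.7, 0.3]])
print(stationary_distribution(P))   # approximately [0.6364, 0.3636]
```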
If a chain has more than one closed communicating class, its stationary distributions will not be unique (consider any closed communicating class $C_i$ in the chain; each one will have its own unique stationary distribution $\pi_i$. Extending these distributions to the overall chain, setting all values to zero outside the communicating class, yields that the set of invariant measures of the original chain is the set of all convex combinations of the $\pi_i$'s). However, if a state j is aperiodic, then
$$\lim_{n \to \infty} p_{jj}^{(n)} = \frac{1}{M_j}$$
and for any other state i, letting $f_{ij}$ be the probability that the chain ever visits state j if it starts at i,
$$\lim_{n \to \infty} p_{ij}^{(n)} = \frac{f_{ij}}{M_j}.$$
If a state i is periodic with period k > 1 then the limit $\lim_{n \to \infty} p_{ii}^{(n)}$ does not exist, although the limit $\lim_{n \to \infty} p_{ii}^{(kn+r)}$ does exist for every integer r.
A Markov chain need not necessarily be time-homogeneous to have an equilibrium distribution. If there is a probability distribution π over states such that
$$\pi_j = \sum_{i \in S} \pi_i \Pr(X_{n+1} = j \mid X_n = i)$$
for every state j and every time n, then π is an equilibrium distribution of the Markov chain. Such a distribution can occur in Markov chain Monte Carlo (MCMC) methods in situations where a number of different transition matrices are used, because each is efficient for a particular kind of mixing, but each matrix respects a shared equilibrium distribution.
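A small illustration of this idea, with made-up two-state kernels assumed only for demonstration: as long as each transition matrix individually preserves the target distribution, any schedule that alternates among them preserves it as well.

```python
import numpy as np

pi = np.array([0.5, 0.5])            # shared equilibrium distribution
P1 = np.array([[0.9, 0.1],           # a "slow" kernel
               [0.1, 0.9]])
P2 = np.array([[0.2, 0.8],           # a "fast" kernel
               [0.8, 0.2]])

# pi is preserved by each kernel individually ...
assert np.allclose(pi @ P1, pi) and np.allclose(pi @ P2, pi)
# ... and therefore also by any schedule that alternates between them.
assert np.allclose(pi @ P1 @ P2 @ P1, pi)
```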
The hitting time is the time, starting in a given set of states, until the chain arrives in a given state or set of states. The distribution of such a time period has a phase-type distribution. The simplest such distribution is that of a single exponentially distributed transition.[citation needed]
For a subset of states A ⊆ S, the vector kA of hitting times (where element $k_i^A$ represents the expected value, starting in state i, of the time until the chain enters one of the states in the set A) is the minimal non-negative solution to [9]
$$k_i^A = 0 \ \text{ for } i \in A,$$
$$k_i^A = 1 + \sum_{j \in S} p_{ij} \, k_j^A \ \text{ for } i \notin A.$$
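A minimal sketch of solving this system, assuming a finite chain in which A is reachable from every state so that all expected hitting times are finite and the linear system has a unique solution (the function name is illustrative):

```python
import numpy as np

def expected_hitting_times(P, A):
    """Expected number of steps to reach the set A from each state.

    Solves k_i = 0 for i in A and k_i = 1 + sum_j p_ij k_j for i not in A."""
    n = P.shape[0]
    others = [i for i in range(n) if i not in A]
    # Restrict to states outside A; terms with j in A vanish because k_j = 0 there.
    Q = P[np.ix_(others, others)]
    k_others = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    k = np.zeros(n)
    k[others] = k_others
    return k

# Two-state machine from the introduction: expected time to reach E (state 1) from A.
P = np.array([[0.6, 0.4],
              [0.7, 0.3]])
print(expected_hitting_times(P, A={1}))   # [2.5, 0.0]: geometric waiting time with p = 0.4
```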
An instance of ergodic theory, the ergodic theorem states that for an irreducible aperiodic Markov chain, for any two states i and j, [1]
$$p_{ij}^{(n)} \to \frac{1}{M_j} \quad \text{as } n \to \infty.$$
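A quick numerical illustration, reusing the two-state machine from the introduction as an assumed example: raising the transition matrix to a large power makes every row approach the same limiting distribution, independently of the starting state.

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.7, 0.3]])
P_100 = np.linalg.matrix_power(P, 100)
print(P_100)
# Every row is approximately (7/11, 4/11): p_ij^(n) -> 1/M_j regardless of the
# starting state i, in agreement with the ergodic theorem above.
```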
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). Markov processes are named in honor of the Russian mathematician Andrey Markov.
In mathematics, a stochastic matrix is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability. It is also called a probability matrix, transition matrix, substitution matrix, or Markov matrix. The stochastic matrix was first developed by Andrey Markov at the beginning of the 20th century, and has found use throughout a wide variety of scientific fields, including probability theory, statistics, mathematical finance and linear algebra, as well as computer science and population genetics. There are several different definitions and types of stochastic matrices:
In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of typical set used in theories of data compression.
In electrical engineering, statistical computing and bioinformatics, the Baum–Welch algorithm is a special case of the expectation–maximization algorithm used to find the unknown parameters of a hidden Markov model (HMM). It makes use of the forward-backward algorithm to compute the statistics for the expectation step.
In probability theory, a Lévy process, named after the French mathematician Paul Lévy, is a stochastic process with independent, stationary increments: it represents the motion of a point whose successive displacements are random, in which displacements in pairwise disjoint time intervals are independent, and displacements in different time intervals of the same length have identical probability distributions. A Lévy process may thus be viewed as the continuous-time analog of a random walk.
A continuous-time Markov chain (CTMC) is a continuous stochastic process in which, for each state, the process will change state according to an exponential random variable and then move to a different state as specified by the probabilities of a stochastic matrix. An equivalent formulation describes the process as changing state according to the least value of a set of exponential random variables, one for each possible state it can move to, with the parameters determined by the current state.
In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automatic control, economics and manufacturing. The name of MDPs comes from the Russian mathematician Andrey Markov as they are an extension of Markov chains.
In mathematics, subshifts of finite type are used to model dynamical systems, and in particular are the objects of study in symbolic dynamics and ergodic theory. They also describe the set of all possible sequences executed by a finite state machine. The most widely studied shift spaces are the subshifts of finite type.
In queueing theory, a discipline within the mathematical theory of probability, a Jackson network is a class of queueing network where the equilibrium distribution is particularly simple to compute as the network has a product-form solution. It was the first significant development in the theory of networks of queues, and generalising and applying the ideas of the theorem to search for similar product-form solutions in other networks has been the subject of much research, including ideas used in the development of the Internet. The networks were first identified by James R. Jackson and his paper was re-printed in the journal Management Science's "Ten Most Influential Titles of Management Science's First Fifty Years."
In the mathematical discipline of graph theory, the expander walk sampling theorem intuitively states that sampling vertices in an expander graph by doing a relatively short random walk can simulate sampling the vertices independently from a uniform distribution. The earliest version of this theorem is due to Ajtai, Komlós & Szemerédi (1987), and the more general version is typically attributed to Gillman (1998).
In the mathematical theory of probability, the entropy rate or source information rate is a function assigning an entropy to a stochastic process.
In probability theory, Kolmogorov's criterion, named after Andrey Kolmogorov, is a theorem giving a necessary and sufficient condition for a Markov chain or continuous-time Markov chain to be stochastically identical to its time-reversed version.
In the mathematical study of stochastic processes, a Harris chain is a Markov chain where the chain returns to a particular part of the state space an unbounded number of times. Harris chains are regenerative processes and are named after Theodore Harris. The theory of Harris chains and Harris recurrence is useful for treating Markov chains on general state spaces.
In probability theory, a balance equation is an equation that describes the probability flux associated with a Markov chain in and out of states or sets of states.
In probability theory, the uniformization method is a method to compute transient solutions of finite state continuous-time Markov chains, by approximating the process by a discrete-time Markov chain. The original chain is scaled by the fastest transition rate γ, so that transitions occur at the same rate in every state, hence the name. The method is simple to program and efficiently calculates an approximation to the transient distribution at a single point in time. The method was first introduced by Winfried Grassmann in 1977.
The system size expansion, also known as van Kampen's expansion or the Ω-expansion, is a technique pioneered by Nico van Kampen used in the analysis of stochastic processes. Specifically, it allows one to find an approximation to the solution of a master equation with nonlinear transition rates. The leading order term of the expansion is given by the linear noise approximation, in which the master equation is approximated by a Fokker–Planck equation with linear coefficients determined by the transition rates and stoichiometry of the system.
In probability theory, the matrix analytic method is a technique to compute the stationary probability distribution of a Markov chain which has a repeating structure and a state space which grows unboundedly in no more than one dimension. Such models are often described as M/G/1 type Markov chains because they can describe transitions in an M/G/1 queue. The method is a more complicated version of the matrix geometric method and is the classical solution method for M/G/1 chains.
In probability theory, Kelly's lemma states that for a stationary continuous-time Markov chain, a process defined as the time-reversed process has the same stationary distribution as the forward-time process. The theorem is named after Frank Kelly.
Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen according to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While a standard random walk chooses, for every vertex, a uniform probability distribution among its outgoing edges, locally maximizing the entropy rate, MERW maximizes it globally by assuming a uniform probability distribution among all paths in a given graph.
In the mathematical theory of random processes, the Markov chain central limit theorem has a conclusion somewhat similar in form to that of the classic central limit theorem (CLT) of probability theory, but the quantity in the role taken by the variance in the classic CLT has a more complicated definition. See also the general form of Bienaymé's identity.