Markov chains on a measurable state space

A Markov chain on a measurable state space is a discrete-time, time-homogeneous Markov chain whose state space is a general measurable space.

History

The definition of Markov chains has evolved during the 20th century. In 1953 the term Markov chain was used for stochastic processes with a discrete or continuous index set, living on a countable or finite state space; see Doob [1] or Chung. [2] Since the late 20th century it has become more common to consider a Markov chain as a stochastic process with a discrete index set, living on a measurable state space. [3] [4] [5]

Definition

Denote with $(E, \Sigma)$ a measurable space and with $p$ a Markov kernel with source and target $(E, \Sigma)$. A stochastic process $(X_n)_{n \in \mathbb{N}}$ on $(\Omega, \mathcal{F}, \mathbb{P})$ is called a time-homogeneous Markov chain with Markov kernel $p$ and start distribution $\mu$ if

$$\mathbb{P}[X_0 \in A_0,\, X_1 \in A_1,\, \dots,\, X_n \in A_n] = \int_{A_0} \cdots \int_{A_{n-1}} p(y_{n-1}, A_n)\, p(y_{n-2}, \mathrm{d}y_{n-1}) \cdots p(y_0, \mathrm{d}y_1)\, \mu(\mathrm{d}y_0)$$

is satisfied for any $n \in \mathbb{N}$ and $A_0, \dots, A_n \in \Sigma$. One can construct for any Markov kernel $p$ and any probability measure $\mu$ an associated Markov chain. [4]
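As a concrete illustration (not part of the formal definition), the constructed chain can be simulated: draw $X_0$ from the start distribution $\mu$, then repeatedly draw $X_{n+1}$ from the kernel $p(X_n, \cdot)$. The Gaussian kernel below is an arbitrary example choice, not taken from the article.

```python
import random

def simulate_chain(sample_start, sample_kernel, n_steps, rng):
    """Simulate a time-homogeneous Markov chain: draw X_0 from the
    start distribution, then repeatedly apply the kernel sampler."""
    x = sample_start(rng)
    path = [x]
    for _ in range(n_steps):
        x = sample_kernel(x, rng)   # one draw from p(x, .)
        path.append(x)
    return path

# Illustrative kernel on E = R: p(x, .) = Normal(0.5 * x, 1)
rng = random.Random(0)
path = simulate_chain(
    sample_start=lambda r: r.gauss(0.0, 1.0),        # mu = N(0, 1)
    sample_kernel=lambda x, r: r.gauss(0.5 * x, 1.0),
    n_steps=10,
    rng=rng,
)
print(len(path))  # 11 states: X_0, ..., X_10
```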

Remark about Markov kernel integration

For any measure $\mu \colon \Sigma \to [0, \infty]$ we denote the Lebesgue integral of a $\mu$-integrable function $f \colon E \to \mathbb{R} \cup \{\infty, -\infty\}$ as $\int_E f(x)\, \mu(\mathrm{d}x)$. For the measure $\nu_x \colon \Sigma \to [0, \infty]$ defined by $\nu_x(A) := p(x, A)$ we use the following notation:

$$\int_E f(y)\, p(x, \mathrm{d}y) := \int_E f(y)\, \nu_x(\mathrm{d}y).$$
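When the state space is finite, this kernel integral is just a weighted sum, $\int_E f(y)\, p(x, \mathrm{d}y) = \sum_y f(y)\, p(x, \{y\})$, which a short sketch (our own illustration, with an arbitrary three-state kernel) makes explicit:

```python
# Finite E makes the kernel integral an explicit sum:
# integral of f(y) p(x, dy)  =  sum over y of f(y) * p(x, {y}).
E = [0, 1, 2]
p = {0: [0.5, 0.5, 0.0],
     1: [0.25, 0.5, 0.25],
     2: [0.0, 0.5, 0.5]}   # p[x][y] = p(x, {y})
f = lambda y: y * y

def kernel_integral(f, p_row):
    """Integrate f against one row of the kernel, i.e. against nu_x."""
    return sum(f(y) * w for y, w in zip(E, p_row))

print(kernel_integral(f, p[1]))  # 0.25*0 + 0.5*1 + 0.25*4 = 1.5
```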

Basic properties

Starting in a single point

If $\mu$ is a Dirac measure in $x$, we denote for a Markov kernel $p$ with starting distribution $\mu$ the associated Markov chain as $(X_n)_{n \in \mathbb{N}}$ on $(\Omega, \mathcal{F}, \mathbb{P}_x)$ and the expectation value

$$\mathbb{E}_x[X] = \int_\Omega X(\omega)\, \mathbb{P}_x(\mathrm{d}\omega)$$

for a $\mathbb{P}_x$-integrable function $X$. By definition, we then have $\mathbb{P}_x[X_0 = x] = 1$.

We have for any measurable function $f \colon E \to [0, \infty]$ the following relation: [4]

$$\int_E f(y)\, p(x, \mathrm{d}y) = \mathbb{E}_x[f(X_1)].$$
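This relation can be checked numerically on a finite state space: averaging $f$ over many draws of $X_1 \sim p(x, \cdot)$ approximates the kernel integral. The three-state kernel below is our own arbitrary example.

```python
import random

rng = random.Random(42)
E = [0, 1, 2]
row = [0.25, 0.5, 0.25]        # p(1, {y}) for y in E, an example kernel row
f = lambda y: y * y

# Exact kernel integral: sum of f(y) * p(1, {y}).
exact = sum(f(y) * w for y, w in zip(E, row))            # 1.5

# Monte Carlo estimate of E_1[f(X_1)] from draws X_1 ~ p(1, .).
draws = rng.choices(E, weights=row, k=100_000)
estimate = sum(f(y) for y in draws) / len(draws)

print(abs(estimate - exact) < 0.05)  # the two sides agree up to sampling error
```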

Family of Markov kernels

For a Markov kernel $p$ with starting distribution $\mu$ one can introduce a family of Markov kernels $(p_n)_{n \in \mathbb{N}}$ by

$$p_{n+1}(x, A) := \int_E p_n(y, A)\, p(x, \mathrm{d}y)$$

for $n \in \mathbb{N}$, $n \geq 1$, and $p_1 := p$. For the associated Markov chain $(X_n)_{n \in \mathbb{N}}$ according to $p$ and $\mu$ one obtains

$$\mathbb{P}[X_0 \in A,\, X_n \in B] = \int_A p_n(x, B)\, \mu(\mathrm{d}x).$$
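On a finite state space the recursion for $(p_n)_{n \in \mathbb{N}}$ is exactly matrix multiplication, so $p_n$ is the $n$-th power of the transition matrix. A sketch with an arbitrary three-state kernel of our own choosing:

```python
def compose(p, q, states):
    """Kernel composition on a finite space:
    (p q)(x, {z}) = sum over y of p(x, {y}) * q(y, {z})."""
    return [[sum(p[x][y] * q[y][z] for y in states) for z in states]
            for x in states]

states = range(3)
p = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

p_n = p                      # p_1 = p
for _ in range(9):           # build p_10 via p_{n+1} = composition with p
    p_n = compose(p_n, p, states)

# Rows of p_n approach the stationary weights [0.25, 0.5, 0.25].
print([round(w, 3) for w in p_n[0]])
```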

Stationary measure

A probability measure $\mu$ is called a stationary measure of a Markov kernel $p$ if

$$\int_A \mu(\mathrm{d}x) = \int_E p(x, A)\, \mu(\mathrm{d}x)$$

holds for any $A \in \Sigma$. If $(X_n)_{n \in \mathbb{N}}$ on $(\Omega, \mathcal{F}, \mathbb{P})$ denotes the Markov chain according to a Markov kernel $p$ with stationary measure $\mu$, and the distribution of $X_0$ is $\mu$, then all $X_n$ have the same probability distribution, namely

$$\mathbb{P}[X_n \in A] = \mu(A)$$

for any $A \in \Sigma$.
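On a finite state space the stationarity condition reduces to the linear equation $\mu P = \mu$ for the transition matrix $P$, which is easy to verify directly (the matrix and candidate measure below are our own example):

```python
p = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
mu = [0.25, 0.5, 0.25]       # candidate stationary measure

# Stationarity on a finite space: mu({y}) = sum over x of mu({x}) p(x, {y}),
# i.e. mu P = mu as a row-vector equation.
mu_p = [sum(mu[x] * p[x][y] for x in range(3)) for y in range(3)]

print(all(abs(a - b) < 1e-12 for a, b in zip(mu_p, mu)))  # True
```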

Reversibility

A Markov kernel $p$ is called reversible according to a probability measure $\mu$ if

$$\int_A p(x, B)\, \mu(\mathrm{d}x) = \int_B p(x, A)\, \mu(\mathrm{d}x)$$

holds for any $A, B \in \Sigma$. Setting $B = E$ shows that if $p$ is reversible according to $\mu$, then $\mu$ must be a stationary measure of $p$.
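On a finite state space, reversibility is the detailed-balance condition $\mu(\{x\})\, p(x, \{y\}) = \mu(\{y\})\, p(y, \{x\})$ for all states $x, y$. A check with the same illustrative kernel used above (our own example, not from the article):

```python
p = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
mu = [0.25, 0.5, 0.25]

# Detailed balance: mu({x}) p(x, {y}) == mu({y}) p(y, {x}) for all x, y.
reversible = all(
    abs(mu[x] * p[x][y] - mu[y] * p[y][x]) < 1e-12
    for x in range(3) for y in range(3)
)

print(reversible)  # True
```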


References

  1. Joseph L. Doob: Stochastic Processes. New York: John Wiley & Sons, 1953.
  2. Kai L. Chung: Markov Chains with Stationary Transition Probabilities. Second edition. Berlin: Springer-Verlag, 1974.
  3. Sean Meyn and Richard L. Tweedie: Markov Chains and Stochastic Stability. Second edition. Cambridge: Cambridge University Press, 2009.
  4. Daniel Revuz: Markov Chains. Second edition, 1984.
  5. Rick Durrett: Probability: Theory and Examples. Fourth edition, 2005.