Markov kernel

In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space. [1]

Formal definition

Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A Markov kernel with source $(X, \mathcal{A})$ and target $(Y, \mathcal{B})$, sometimes written as $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$, is a function $\kappa : \mathcal{B} \times X \to [0, 1]$ with the following properties:

  1. For every (fixed) $B_0 \in \mathcal{B}$, the map $x \mapsto \kappa(B_0, x)$ is $\mathcal{A}$-measurable
  2. For every (fixed) $x_0 \in X$, the map $B \mapsto \kappa(B, x_0)$ is a probability measure on $(Y, \mathcal{B})$

In other words it associates to each point $x \in X$ a probability measure $\kappa(\mathrm{d}y \mid x) : B \mapsto \kappa(B, x)$ on $(Y, \mathcal{B})$ such that, for every measurable set $B \in \mathcal{B}$, the map $x \mapsto \kappa(B, x)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$. [2]
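As an illustration, here is a minimal Python sketch for the finite case, where every map is automatically measurable and a kernel reduces to a map from points to probability distributions (the names `Kernel` and `is_markov_kernel` are illustrative, not standard):

```python
# A minimal sketch: on a finite state space every function is measurable,
# so a Markov kernel reduces to a map x -> probability distribution on Y.
from typing import Dict, Hashable

Kernel = Dict[Hashable, Dict[Hashable, float]]  # kernel[x][y] = kappa({y} | x)

def is_markov_kernel(kappa: Kernel, tol: float = 1e-12) -> bool:
    """Check property 2: for each fixed x, kappa(. | x) is a probability measure."""
    return all(
        abs(sum(row.values()) - 1.0) < tol and min(row.values()) >= 0.0
        for row in kappa.values()
    )

# Example: each of two states gets its own distribution over {"a", "b"}.
kappa = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.9, "b": 0.1}}
assert is_markov_kernel(kappa)
```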

Examples

Simple random walk on the integers

Take $X = Y = \mathbb{Z}$ and $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{Z})$ (the power set of $\mathbb{Z}$). Then a Markov kernel $\kappa$ is fully determined by the probability it assigns to singletons $\{m\}$, $m \in \mathbb{Z}$, for each $n \in \mathbb{Z}$:

$$\kappa(B \mid n) = \sum_{m \in B} \kappa(\{m\} \mid n), \qquad \forall n \in \mathbb{Z},\ \forall B \in \mathcal{B}.$$

Now the random walk $\kappa$ that goes to the right with probability $p$ and to the left with probability $1 - p$ is defined by

$$\kappa(\{m\} \mid n) = p\,\delta_{m, n+1} + (1 - p)\,\delta_{m, n-1}, \qquad \forall m, n \in \mathbb{Z},$$

where $\delta$ is the Kronecker delta. The transition probabilities for the random walk are equivalent to the Markov kernel.
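A sketch of this kernel in Python (function names are illustrative; measurable sets are restricted to finite subsets of $\mathbb{Z}$ so the singleton sum is finite):

```python
# The simple random walk kernel: kappa({m} | n) equals p if m == n + 1,
# 1 - p if m == n - 1, and 0 otherwise.

def kappa_singleton(m: int, n: int, p: float = 0.5) -> float:
    """Probability that the walk moves from n to m in one step."""
    if m == n + 1:
        return p
    if m == n - 1:
        return 1.0 - p
    return 0.0

def kappa(B, n: int, p: float = 0.5) -> float:
    """kappa(B | n): sum the singleton masses over a finite set B."""
    return sum(kappa_singleton(m, n, p) for m in B)

assert kappa({4, 6}, 5) == 1.0  # from 5 the walk surely lands in {4, 6}
```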

General Markov processes with countable state space

More generally, take $X$ and $Y$ both countable and $\mathcal{A} = \mathcal{P}(X)$, $\mathcal{B} = \mathcal{P}(Y)$. Again a Markov kernel is defined by the probability it assigns to singleton sets for each $i \in X$:

$$\kappa(B \mid i) = \sum_{j \in B} \kappa(\{j\} \mid i), \qquad \forall i \in X,\ \forall B \in \mathcal{B}.$$

We define a Markov process by defining a transition probability $P(j \mid i) = K_{ji}$, where the numbers $K_{ji}$ define a (countable) stochastic matrix $(K_{ji})$, i.e.

$$K_{ji} \geq 0 \quad \text{for all } (j, i) \in Y \times X, \qquad \sum_{j \in Y} K_{ji} = 1 \quad \text{for all } i \in X.$$

We then define

$$\kappa(\{j\} \mid i) = K_{ji} = P(j \mid i), \qquad \forall i \in X,\ \forall j \in Y.$$

Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
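For a finite state space this equivalence is easy to exhibit in code. A sketch, with columns of the matrix indexed by the source state $i$ and rows by the target state $j$, matching the convention $K_{ji}$ above (the data are made up for illustration):

```python
import numpy as np

# A 2-state stochastic matrix: each column is a probability measure on Y.
K = np.array([[0.9, 0.2],
              [0.1, 0.8]])
assert np.allclose(K.sum(axis=0), 1.0)

def kappa(B, i):
    """kappa(B | i) = sum over j in B of K[j, i]."""
    return sum(K[j, i] for j in B)

assert abs(kappa({0, 1}, 0) - 1.0) < 1e-12  # total mass is 1
```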

Markov kernel defined by a kernel function and a measure

Let $\nu$ be a measure on $(Y, \mathcal{B})$, and let $k : Y \times X \to [0, \infty]$ be a measurable function with respect to the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$ such that

$$\int_Y k(y, x)\, \nu(\mathrm{d}y) = 1, \qquad \forall x \in X;$$

then $\kappa(\mathrm{d}y \mid x) = k(y, x)\, \nu(\mathrm{d}y)$, i.e. the mapping

$$\kappa(B \mid x) = \int_B k(y, x)\, \nu(\mathrm{d}y), \qquad \forall B \in \mathcal{B},\ \forall x \in X,$$

defines a Markov kernel. [3] This example generalises the countable Markov process example, where $\nu$ was the counting measure. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on $X = Y = \mathbb{R}$ with $\nu(\mathrm{d}x)$ the standard Lebesgue measure and

$$k_t(y, x) = \frac{1}{\sqrt{2\pi t}}\, e^{-(y - x)^2 / (2t)}.$$
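The normalisation condition $\int_Y k_t(y, x)\,\nu(\mathrm{d}y) = 1$ can be checked numerically. A sketch for one fixed $x$, truncating Lebesgue measure to a finite grid (an approximation, hence the loose tolerance):

```python
import math

def k_t(y: float, x: float, t: float) -> float:
    """Gaussian kernel with variance t, centred at x."""
    return math.exp(-((y - x) ** 2) / (2 * t)) / math.sqrt(2 * math.pi * t)

x, t, dy = 0.3, 1.5, 1e-3
# Riemann sum over the grid [-10, 10), which carries almost all the mass.
total = sum(k_t(-10.0 + i * dy, x, t) * dy for i in range(20000))
assert abs(total - 1.0) < 1e-4
```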

Measurable functions

Take $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ arbitrary measurable spaces, and let $f : X \to Y$ be a measurable function. Now define $\kappa(\mathrm{d}y \mid x) = \delta_{f(x)}(\mathrm{d}y)$, i.e.

$$\kappa(B \mid x) = \mathbf{1}_B(f(x)) = \mathbf{1}_{f^{-1}(B)}(x) = \begin{cases} 1 & \text{if } f(x) \in B \\ 0 & \text{otherwise} \end{cases} \qquad \text{for all } B \in \mathcal{B},\ x \in X.$$

Note that the indicator function $\mathbf{1}_{f^{-1}(B)}$ is $\mathcal{A}$-measurable for all $B \in \mathcal{B}$ if and only if $f$ is measurable.

This example allows us to think of a Markov kernel as a generalised function whose value is, in general, random rather than certain. That is, it is a multivalued function where the values are not equally weighted.
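A sketch of this deterministic kernel in Python (names are illustrative):

```python
# The kernel induced by a function f: all mass sits on the single point f(x).
def deterministic_kernel(f):
    def kappa(B, x):
        return 1.0 if f(x) in B else 0.0
    return kappa

kappa = deterministic_kernel(lambda x: x * x)
assert kappa({4}, 2) == 1.0 and kappa({5}, 2) == 0.0
```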

Galton–Watson process

As a less obvious example, take $X = \mathbb{N}$, $\mathcal{A} = \mathcal{P}(\mathbb{N})$, and $Y = \mathbb{R}$ the real numbers with the standard $\sigma$-algebra of Borel sets. Then

$$\kappa(B \mid x) = \begin{cases} \mathbf{1}_B(0) & x = 0 \\ \Pr(\xi_1 + \cdots + \xi_x \in B) & x \neq 0 \end{cases}$$

where $x$ is the number of elements at the current state, the $\xi_i$ are i.i.d. random variables (usually with mean 0), and where $\mathbf{1}_B$ is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
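A Monte Carlo sketch of this kernel, assuming fair $\pm 1$ coin flips for the $\xi_i$ (the Galton-board case; the sampling scheme and names are illustrative):

```python
import random

def kappa_estimate(B, x: int, n_samples: int = 100_000) -> float:
    """Estimate kappa(B | x) = Pr(xi_1 + ... + xi_x in B) by simulation."""
    if x == 0:
        return 1.0 if 0 in B else 0.0
    hits = 0
    for _ in range(n_samples):
        s = sum(random.choice((-1, 1)) for _ in range(x))  # sum of x flips
        hits += s in B
    return hits / n_samples

print(kappa_estimate({0}, 2))  # roughly 0.5: two fair flips cancel half the time
```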

Composition of Markov kernels

Given measurable spaces $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, we consider a Markov kernel $\kappa : \mathcal{B} \times X \to [0, 1]$ as a morphism $\kappa : X \to Y$. Intuitively, rather than assigning to each $x \in X$ a sharply defined point $y \in Y$, the kernel assigns a "fuzzy" point in $Y$ which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space $(Z, \mathcal{C})$, and probability kernels $\kappa : X \to Y$ and $\lambda : Y \to Z$, we can define a composition $\lambda \circ \kappa : X \to Z$ by the Chapman–Kolmogorov equation

$$(\lambda \circ \kappa)(C \mid x) = \int_Y \lambda(C \mid y)\, \kappa(\mathrm{d}y \mid x), \qquad \forall C \in \mathcal{C},\ \forall x \in X.$$

The composition is associative by the monotone convergence theorem, and the identity function considered as a Markov kernel (i.e. the delta measure $\kappa_{\mathrm{id}}(\mathrm{d}x' \mid x) = \delta_x(\mathrm{d}x')$) is the unit for this composition.
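On finite spaces the Chapman–Kolmogorov integral reduces to a matrix product, which makes associativity and the unit law concrete. A sketch, using the column convention from the countable example above (the matrices are made up for illustration):

```python
import numpy as np

K = np.array([[0.7, 0.4],   # kernel kappa : X -> Y
              [0.3, 0.6]])
L = np.array([[0.5, 0.1],   # kernel lambda : Y -> Z
              [0.5, 0.9]])

LK = L @ K  # (lambda o kappa)(z | x) = sum over y of L[z, y] * K[y, x]
assert np.allclose(LK.sum(axis=0), 1.0)  # the composite is again Markov

I = np.eye(2)  # the identity kernel: a delta measure at each state
assert np.allclose(L @ I, L) and np.allclose(I @ L, L)
```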

This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere, [4] the category of Markov kernels.

Probability space defined by a probability distribution and a Markov kernel

A composition of a probability space $(X, \mathcal{A}, P_X)$ and a probability kernel $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$ defines a probability space $(Y, \mathcal{B}, P_Y = \kappa \circ P_X)$, where the probability measure is given by

$$P_Y(B) = \int_X \kappa(B \mid x)\, P_X(\mathrm{d}x), \qquad \forall B \in \mathcal{B}.$$
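In the finite case this pushforward is again a matrix-vector product. A sketch (data made up for illustration):

```python
import numpy as np

K = np.array([[0.7, 0.4],     # kappa({y} | x), columns indexed by x
              [0.3, 0.6]])
P_X = np.array([0.25, 0.75])  # a probability measure on X

P_Y = K @ P_X                 # P_Y(B) = sum over x of kappa(B | x) * P_X({x})
assert np.isclose(P_Y.sum(), 1.0)  # P_Y is again a probability measure
```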

Properties

Semidirect product

Let $(X, \mathcal{A}, P)$ be a probability space and $\kappa$ a Markov kernel from $(X, \mathcal{A})$ to some $(Y, \mathcal{B})$. Then there exists a unique measure $Q$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that

$$Q(A \times B) = \int_A \kappa(B \mid x)\, P(\mathrm{d}x), \qquad \forall A \in \mathcal{A},\ \forall B \in \mathcal{B}.$$
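On finite spaces $Q$ is simply the joint table obtained by weighting each column of the kernel by $P$. A sketch (data made up for illustration):

```python
import numpy as np

K = np.array([[0.7, 0.4],   # kappa({y} | x)
              [0.3, 0.6]])
P = np.array([0.25, 0.75])  # probability measure on X

Q = K * P                   # broadcasts: Q[y, x] = kappa({y} | x) * P({x})
assert np.isclose(Q.sum(), 1.0)       # Q is a probability measure on X x Y
assert np.allclose(Q.sum(axis=0), P)  # its first marginal is P
```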

Regular conditional distribution

Let $(S, \mathcal{Y})$ be a Borel space, $X$ an $(S, \mathcal{Y})$-valued random variable on the measure space $(\Omega, \mathcal{F}, P)$, and $\mathcal{G} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra. Then there exists a Markov kernel $\kappa$ from $(\Omega, \mathcal{G})$ to $(S, \mathcal{Y})$ such that $\kappa(B, \cdot)$ is a version of the conditional expectation $\mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$ for every $B \in \mathcal{Y}$, i.e.

$$P(X \in B \mid \mathcal{G}) = \mathbb{E}\left[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}\right] = \kappa(B, \cdot) \qquad P\text{-almost surely, for all } B \in \mathcal{Y}.$$

It is called the regular conditional distribution of $X$ given $\mathcal{G}$ and is not uniquely defined.
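In the elementary case of a finite space and a sub-$\sigma$-algebra generated by a finite partition, the regular conditional distribution is just the familiar conditional probability table. A sketch (the joint table is made up for illustration):

```python
import numpy as np

joint = np.array([[0.10, 0.30],  # joint[x, g] = P(X = x, G = g)
                  [0.20, 0.40]])

P_G = joint.sum(axis=0)  # marginal law of the conditioning variable
kappa = joint / P_G      # kappa[x, g] = P(X = x | G = g)
assert np.allclose(kappa.sum(axis=0), 1.0)  # each column is a prob. measure
```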

Generalizations

Transition kernels generalize Markov kernels in the sense that for all $x \in X$, the map

$$B \mapsto \kappa(B \mid x)$$

can be any type of (non-negative) measure, not necessarily a probability measure.

References

  1. Reiss, R. D. (1993). A Course on Point Processes. Springer Series in Statistics. doi:10.1007/978-1-4613-9308-5. ISBN 978-1-4613-9310-8.
  2. Klenke, Achim (2014). Probability Theory: A Comprehensive Course. Universitext (2nd ed.). Springer. p. 180. doi:10.1007/978-1-4471-5361-0. ISBN 978-1-4471-5360-3.
  3. Çınlar, Erhan (2011). Probability and Stochastics. New York: Springer. pp. 37–38. ISBN 978-0-387-87858-4.
  4. Lawvere, F. W. (1962). "The Category of Probabilistic Mappings" (PDF).