Markov kernel

In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space. [1]

Formal definition

Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A Markov kernel with source $(X, \mathcal{A})$ and target $(Y, \mathcal{B})$, sometimes written as $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$, is a function $\kappa : \mathcal{B} \times X \to [0, 1]$ with the following properties:

  1. For every (fixed) $B_0 \in \mathcal{B}$, the map $x \mapsto \kappa(B_0, x)$ is $\mathcal{A}$-measurable
  2. For every (fixed) $x_0 \in X$, the map $B \mapsto \kappa(B, x_0)$ is a probability measure on $(Y, \mathcal{B})$

In other words it associates to each point $x \in X$ a probability measure $\kappa(\mathrm{d}y \mid x) : B \mapsto \kappa(B, x)$ on $(Y, \mathcal{B})$ such that, for every measurable set $B \in \mathcal{B}$, the map $x \mapsto \kappa(B, x)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$. [2]
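As an illustration, here is a minimal Python sketch for the finite case, where every map is automatically measurable and a kernel reduces to a map from points to probability distributions (the names `Kernel` and `is_markov_kernel` are illustrative, not standard):

```python
# A minimal sketch: on a finite state space every function is measurable,
# so a Markov kernel reduces to a map x -> probability distribution on Y.
from typing import Dict, Hashable

Kernel = Dict[Hashable, Dict[Hashable, float]]  # kernel[x][y] = kappa({y} | x)

def is_markov_kernel(kappa: Kernel, tol: float = 1e-12) -> bool:
    """Check property 2: for each fixed x, kappa(. | x) is a probability measure."""
    return all(
        abs(sum(row.values()) - 1.0) < tol and min(row.values()) >= 0.0
        for row in kappa.values()
    )

# Example: each of two states gets its own distribution over {"a", "b"}.
kappa = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.9, "b": 0.1}}
assert is_markov_kernel(kappa)
```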

Examples

Simple random walk on the integers

Take $X = Y = \mathbb{Z}$ and $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{Z})$ (the power set of $\mathbb{Z}$). Then a Markov kernel $\kappa$ is fully determined by the probability it assigns to singletons $\{m\}$, $m \in \mathbb{Z}$, for each $n \in \mathbb{Z}$:

$$\kappa(B \mid n) = \sum_{m \in B} \kappa(\{m\} \mid n), \qquad \forall n \in \mathbb{Z},\ \forall B \in \mathcal{B}.$$

Now the random walk $\kappa$ that goes to the right with probability $p$ and to the left with probability $1 - p$ is defined by

$$\kappa(\{m\} \mid n) = p\,\delta_{m, n+1} + (1 - p)\,\delta_{m, n-1}, \qquad \forall m, n \in \mathbb{Z},$$

where $\delta$ is the Kronecker delta. The transition probabilities for the random walk are equivalent to the Markov kernel.
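A sketch of this kernel in Python (function names are illustrative; measurable sets are restricted to finite subsets of $\mathbb{Z}$ so the singleton sum is finite):

```python
# The simple random walk kernel: kappa({m} | n) equals p if m == n + 1,
# 1 - p if m == n - 1, and 0 otherwise.

def kappa_singleton(m: int, n: int, p: float = 0.5) -> float:
    """Probability that the walk moves from n to m in one step."""
    if m == n + 1:
        return p
    if m == n - 1:
        return 1.0 - p
    return 0.0

def kappa(B, n: int, p: float = 0.5) -> float:
    """kappa(B | n): sum the singleton masses over a finite set B."""
    return sum(kappa_singleton(m, n, p) for m in B)

assert kappa({4, 6}, 5) == 1.0  # from 5 the walk surely lands in {4, 6}
```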

General Markov processes with countable state space

More generally, take $X$ and $Y$ both countable and $\mathcal{A} = \mathcal{P}(X)$, $\mathcal{B} = \mathcal{P}(Y)$. Again a Markov kernel is defined by the probability it assigns to singleton sets for each $i \in X$:

$$\kappa(B \mid i) = \sum_{j \in B} \kappa(\{j\} \mid i), \qquad \forall i \in X,\ \forall B \in \mathcal{B}.$$

We define a Markov process by defining a transition probability $P(j \mid i) = K_{ji}$, where the numbers $K_{ji}$ define a (countable) stochastic matrix $(K_{ji})$, i.e.

$$K_{ji} \geq 0 \quad \text{for all } (j, i) \in Y \times X, \qquad \sum_{j \in Y} K_{ji} = 1 \quad \text{for all } i \in X.$$

We then define

$$\kappa(\{j\} \mid i) = K_{ji} = P(j \mid i), \qquad \forall i \in X,\ \forall j \in Y.$$

Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
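For a finite state space this equivalence is easy to exhibit in code. A sketch, with columns of the matrix indexed by the source state $i$ and rows by the target state $j$, matching the convention $K_{ji}$ above (the data are made up for illustration):

```python
import numpy as np

# A 2-state stochastic matrix: each column is a probability measure on Y.
K = np.array([[0.9, 0.2],
              [0.1, 0.8]])
assert np.allclose(K.sum(axis=0), 1.0)

def kappa(B, i):
    """kappa(B | i) = sum over j in B of K[j, i]."""
    return sum(K[j, i] for j in B)

assert abs(kappa({0, 1}, 0) - 1.0) < 1e-12  # total mass is 1
```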

Markov kernel defined by a kernel function and a measure

Let $\nu$ be a measure on $(Y, \mathcal{B})$, and let $k : Y \times X \to [0, \infty]$ be a measurable function with respect to the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$ such that

$$\int_Y k(y, x)\, \nu(\mathrm{d}y) = 1, \qquad \forall x \in X;$$

then $\kappa(\mathrm{d}y \mid x) = k(y, x)\, \nu(\mathrm{d}y)$, i.e. the mapping

$$\kappa(B \mid x) = \int_B k(y, x)\, \nu(\mathrm{d}y), \qquad \forall B \in \mathcal{B},\ \forall x \in X,$$

defines a Markov kernel. [3] This example generalises the countable Markov process example, where $\nu$ was the counting measure. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on $X = Y = \mathbb{R}$ with $\nu(\mathrm{d}x)$ the standard Lebesgue measure and

$$k_t(y, x) = \frac{1}{\sqrt{2\pi t}}\, e^{-(y - x)^2 / (2t)}.$$
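The normalisation condition $\int_Y k_t(y, x)\,\nu(\mathrm{d}y) = 1$ can be checked numerically. A sketch for one fixed $x$, truncating Lebesgue measure to a finite grid (an approximation, hence the loose tolerance):

```python
import math

def k_t(y: float, x: float, t: float) -> float:
    """Gaussian kernel with variance t, centred at x."""
    return math.exp(-((y - x) ** 2) / (2 * t)) / math.sqrt(2 * math.pi * t)

x, t, dy = 0.3, 1.5, 1e-3
# Riemann sum over the grid [-10, 10), which carries almost all the mass.
total = sum(k_t(-10.0 + i * dy, x, t) * dy for i in range(20000))
assert abs(total - 1.0) < 1e-4
```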

Measurable functions

Take $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ arbitrary measurable spaces, and let $f : X \to Y$ be a measurable function. Now define $\kappa(\mathrm{d}y \mid x) = \delta_{f(x)}(\mathrm{d}y)$, i.e.

$$\kappa(B \mid x) = \mathbf{1}_B(f(x)) = \mathbf{1}_{f^{-1}(B)}(x) = \begin{cases} 1 & \text{if } f(x) \in B \\ 0 & \text{otherwise} \end{cases} \qquad \text{for all } B \in \mathcal{B},\ x \in X.$$

Note that the indicator function $\mathbf{1}_{f^{-1}(B)}$ is $\mathcal{A}$-measurable for all $B \in \mathcal{B}$ if and only if $f$ is measurable.

This example allows us to think of a Markov kernel as a generalised function whose value is, in general, random rather than certain. That is, it is a multivalued function where the values are not equally weighted.
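A sketch of this deterministic kernel in Python (names are illustrative):

```python
# The kernel induced by a function f: all mass sits on the single point f(x).
def deterministic_kernel(f):
    def kappa(B, x):
        return 1.0 if f(x) in B else 0.0
    return kappa

kappa = deterministic_kernel(lambda x: x * x)
assert kappa({4}, 2) == 1.0 and kappa({5}, 2) == 0.0
```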

Galton–Watson process

As a less obvious example, take $X = \mathbb{N}$, $\mathcal{A} = \mathcal{P}(\mathbb{N})$, and $Y = \mathbb{R}$ the real numbers with the standard $\sigma$-algebra of Borel sets. Then

$$\kappa(B \mid x) = \begin{cases} \mathbf{1}_B(0) & x = 0 \\ \Pr(\xi_1 + \cdots + \xi_x \in B) & x \neq 0 \end{cases}$$

where $x$ is the number of elements at the current state, the $\xi_i$ are i.i.d. random variables (usually with mean 0), and where $\mathbf{1}_B$ is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
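A Monte Carlo sketch of this kernel, assuming fair $\pm 1$ coin flips for the $\xi_i$ (the Galton-board case; the sampling scheme and names are illustrative):

```python
import random

def kappa_estimate(B, x: int, n_samples: int = 100_000) -> float:
    """Estimate kappa(B | x) = Pr(xi_1 + ... + xi_x in B) by simulation."""
    if x == 0:
        return 1.0 if 0 in B else 0.0
    hits = 0
    for _ in range(n_samples):
        s = sum(random.choice((-1, 1)) for _ in range(x))  # sum of x flips
        hits += s in B
    return hits / n_samples

print(kappa_estimate({0}, 2))  # roughly 0.5: two fair flips cancel half the time
```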

Composition of Markov kernels

Given measurable spaces $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, we consider a Markov kernel $\kappa : \mathcal{B} \times X \to [0, 1]$ as a morphism $\kappa : X \to Y$. Intuitively, rather than assigning to each $x \in X$ a sharply defined point $y \in Y$, the kernel assigns a "fuzzy" point in $Y$ which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space $(Z, \mathcal{C})$, and probability kernels $\kappa : X \to Y$ and $\lambda : Y \to Z$, we can define a composition $\lambda \circ \kappa : X \to Z$ by the Chapman–Kolmogorov equation

$$(\lambda \circ \kappa)(C \mid x) = \int_Y \lambda(C \mid y)\, \kappa(\mathrm{d}y \mid x), \qquad \forall C \in \mathcal{C},\ \forall x \in X.$$

The composition is associative by the monotone convergence theorem, and the identity function considered as a Markov kernel (i.e. the delta measure $\kappa_{\mathrm{id}}(\mathrm{d}x' \mid x) = \delta_x(\mathrm{d}x')$) is the unit for this composition.
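On finite spaces the Chapman–Kolmogorov integral reduces to a matrix product, which makes associativity and the unit law concrete. A sketch, using the column convention from the countable example above (the matrices are made up for illustration):

```python
import numpy as np

K = np.array([[0.7, 0.4],   # kernel kappa : X -> Y
              [0.3, 0.6]])
L = np.array([[0.5, 0.1],   # kernel lambda : Y -> Z
              [0.5, 0.9]])

LK = L @ K  # (lambda o kappa)(z | x) = sum over y of L[z, y] * K[y, x]
assert np.allclose(LK.sum(axis=0), 1.0)  # the composite is again Markov

I = np.eye(2)  # the identity kernel: a delta measure at each state
assert np.allclose(L @ I, L) and np.allclose(I @ L, L)
```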

This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere, [4] the category of Markov kernels.

Probability space defined by a probability distribution and a Markov kernel

A composition of a probability space $(X, \mathcal{A}, P_X)$ and a probability kernel $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$ defines a probability space $(Y, \mathcal{B}, P_Y = \kappa \circ P_X)$, where the probability measure is given by

$$P_Y(B) = \int_X \kappa(B \mid x)\, P_X(\mathrm{d}x), \qquad \forall B \in \mathcal{B}.$$
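In the finite case this pushforward is again a matrix-vector product. A sketch (data made up for illustration):

```python
import numpy as np

K = np.array([[0.7, 0.4],     # kappa({y} | x), columns indexed by x
              [0.3, 0.6]])
P_X = np.array([0.25, 0.75])  # a probability measure on X

P_Y = K @ P_X                 # P_Y(B) = sum over x of kappa(B | x) * P_X({x})
assert np.isclose(P_Y.sum(), 1.0)  # P_Y is again a probability measure
```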

Properties

Semidirect product

Let $(X, \mathcal{A}, P)$ be a probability space and $\kappa$ a Markov kernel from $(X, \mathcal{A})$ to some $(Y, \mathcal{B})$. Then there exists a unique measure $Q$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that

$$Q(A \times B) = \int_A \kappa(B \mid x)\, P(\mathrm{d}x), \qquad \forall A \in \mathcal{A},\ \forall B \in \mathcal{B}.$$
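On finite spaces $Q$ is simply the joint table obtained by weighting each column of the kernel by $P$. A sketch (data made up for illustration):

```python
import numpy as np

K = np.array([[0.7, 0.4],   # kappa({y} | x)
              [0.3, 0.6]])
P = np.array([0.25, 0.75])  # probability measure on X

Q = K * P                   # broadcasts: Q[y, x] = kappa({y} | x) * P({x})
assert np.isclose(Q.sum(), 1.0)       # Q is a probability measure on X x Y
assert np.allclose(Q.sum(axis=0), P)  # its first marginal is P
```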

Regular conditional distribution

Let $(S, \mathcal{Y})$ be a Borel space, $X$ an $(S, \mathcal{Y})$-valued random variable on the measure space $(\Omega, \mathcal{F}, P)$, and $\mathcal{G} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra. Then there exists a Markov kernel $\kappa$ from $(\Omega, \mathcal{G})$ to $(S, \mathcal{Y})$ such that $\kappa(B, \cdot)$ is a version of the conditional expectation $\mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$ for every $B \in \mathcal{Y}$, i.e.

$$P(X \in B \mid \mathcal{G}) = \mathbb{E}\left[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}\right] = \kappa(B, \cdot) \qquad P\text{-almost surely, for all } B \in \mathcal{Y}.$$

It is called the regular conditional distribution of $X$ given $\mathcal{G}$ and is not uniquely defined.
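In the elementary case of a finite space and a sub-$\sigma$-algebra generated by a finite partition, the regular conditional distribution is just the familiar conditional probability table. A sketch (the joint table is made up for illustration):

```python
import numpy as np

joint = np.array([[0.10, 0.30],  # joint[x, g] = P(X = x, G = g)
                  [0.20, 0.40]])

P_G = joint.sum(axis=0)  # marginal law of the conditioning variable
kappa = joint / P_G      # kappa[x, g] = P(X = x | G = g)
assert np.allclose(kappa.sum(axis=0), 1.0)  # each column is a prob. measure
```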

Generalizations

Transition kernels generalize Markov kernels in the sense that for all $x \in X$, the map

$$B \mapsto \kappa(B \mid x)$$

can be any type of (non-negative) measure, not necessarily a probability measure.

References

  1. Reiss, R. D. (1993). A Course on Point Processes. Springer Series in Statistics. doi:10.1007/978-1-4613-9308-5. ISBN 978-1-4613-9310-8.
  2. Klenke, Achim (2014). Probability Theory: A Comprehensive Course. Universitext (2nd ed.). Springer. p. 180. doi:10.1007/978-1-4471-5361-0. ISBN 978-1-4471-5360-3.
  3. Çınlar, Erhan (2011). Probability and Stochastics. New York: Springer. pp. 37–38. ISBN 978-0-387-87858-4.
  4. Lawvere, F. W. (1962). "The Category of Probabilistic Mappings" (PDF).