Giry monad

Last updated July 28, 2024

In mathematics, the Giry monad is a construction that assigns to a measurable space a space of probability measures over it, equipped with a canonical sigma-algebra.^[1]^[2]^[3]^[4]^[5] It is one of the main examples of a probability monad.

Construction

The Giry monad, like every monad, consists of three structures:^[6]^[7]^[8]

A functorial assignment, which in this case assigns to a measurable space $X$ a space of probability measures $PX$ over it;
A natural map $\delta :X\to PX$ called the unit, which in this case assigns to each element of a space the Dirac measure concentrated at that element;
A natural map ${\mathcal {E}}:PPX\to PX$ called the multiplication, which in this case assigns to each probability measure over probability measures its expected value.

The space of probability measures

Let $(X,{\mathcal {F}})$ be a measurable space. Denote by $PX$ the set of probability measures over $(X,{\mathcal {F}})$. We equip the set $PX$ with a sigma-algebra as follows. First, for every measurable set $A\in {\mathcal {F}}$, define the evaluation map $\varepsilon _{A}:PX\to \mathbb {R}$ by $p\longmapsto p(A)$. We then define the sigma-algebra ${\mathcal {PF}}$ on $PX$ to be the smallest sigma-algebra that makes the maps $\varepsilon _{A}$ measurable for all $A\in {\mathcal {F}}$ (where $\mathbb {R}$ carries its Borel sigma-algebra).^[6]

Equivalently, ${\mathcal {PF}}$ can be defined as the smallest sigma-algebra on $PX$ which makes the maps

p\longmapsto \int _{X}f\,dp

measurable for all bounded measurable $f:X\to \mathbb {R}$.^[9]
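When a probability measure has finite support (a simplifying assumption; the construction above covers arbitrary measurable spaces), both families of generating maps reduce to finite sums. In that setting $\varepsilon _{A}$ is exactly the integral map for the indicator function $1_{A}$. The Python sketch below represents such a measure as a dict `{outcome: probability}` and a measurable set as a Python `set`. The helper names `evaluate`, `integrate` and `indicator` are illustrative. The sketch shows only the maps themselves, not the measurability statement.

```python
from fractions import Fraction

# Assumption: a finitely supported probability measure p on X is a dict
# {outcome: probability}, and a measurable set A is a Python set of outcomes.


def evaluate(A, p):
    """The evaluation map eps_A : p -> p(A)."""
    return sum((w for x, w in p.items() if x in A), Fraction(0))


def integrate(f, p):
    """The map p -> integral of f dp, which for finite p is sum_x f(x) * p({x})."""
    return sum((f(x) * w for x, w in p.items()), Fraction(0))


def indicator(A):
    """The indicator function 1_A, a bounded measurable function X -> R."""
    return lambda x: 1 if x in A else 0


die = {k: Fraction(1, 6) for k in range(1, 7)}
evens = {2, 4, 6}
# eps_A is the special case f = 1_A of the integral maps.
assert evaluate(evens, die) == integrate(indicator(evens), die) == Fraction(1, 2)
```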

The assignment $(X,{\mathcal {F}})\mapsto (PX,{\mathcal {PF}})$ is part of an endofunctor on the category of measurable spaces, usually denoted again by $P$. Its action on morphisms, i.e. on measurable maps, is via the pushforward of measures. Namely, given a measurable map $f:(X,{\mathcal {F}})\to (Y,{\mathcal {G}})$, one assigns to $f$ the map $f_{*}:(PX,{\mathcal {PF}})\to (PY,{\mathcal {PG}})$ defined by

f_{*}p\,(B)=p(f^{-1}(B))

for all $p\in PX$ and all measurable sets $B\in {\mathcal {G}}$.^[6]
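For finitely supported measures, the pushforward simply moves the mass at each point $x$ onto $f(x)$. The following is a minimal Python sketch under that assumption. The dict representation and the helper names `pushforward`, `measure`, `parity` and `name` are illustrative. It checks the defining identity $f_{*}p(B)=p(f^{-1}(B))$ and the functoriality law $(g\circ f)_{*}=g_{*}\circ f_{*}$ on a fair die.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(f, p):
    """f_* p for a finitely supported p (dict {x: probability}): the mass at x moves to f(x)."""
    q = defaultdict(Fraction)  # Fraction() == 0, so missing keys start at zero mass
    for x, w in p.items():
        q[f(x)] += w
    return dict(q)


def measure(p, B):
    """p(B) for a finitely supported p."""
    return sum((w for y, w in p.items() if y in B), Fraction(0))


def parity(k):
    """A measurable map {1, ..., 6} -> {0, 1}."""
    return k % 2


def name(b):
    """A measurable map {0, 1} -> {"even", "odd"}."""
    return "odd" if b else "even"


die = {k: Fraction(1, 6) for k in range(1, 7)}
q = pushforward(parity, die)

B = {0}
preimage = {x for x in die if parity(x) in B}  # f^{-1}(B)
assert measure(q, B) == measure(die, preimage)

# Functoriality: pushing forward along name . parity equals pushing forward twice.
assert pushforward(lambda k: name(parity(k)), die) == pushforward(name, q)
```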

The Dirac delta map

Given a measurable space $(X,{\mathcal {F}})$, the map $\delta :X\to PX$ maps an element $x\in X$ to the Dirac measure $\delta _{x}\in PX$, defined on measurable subsets $A\in {\mathcal {F}}$ by^[6]

\delta _{x}(A)=1_{A}(x)={\begin{cases}1&{\text{if }}x\in A,\\0&{\text{if }}x\notin A.\end{cases}}
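This sketch assumes finitely supported measures, an assumption made only for illustration. In that setting $\delta _{x}$ is the dict that puts all of its mass on $x$. The sketch checks $\delta _{x}(A)=1_{A}(x)$. It also checks the naturality of $\delta$, namely $f_{*}\delta _{x}=\delta _{f(x)}$ for a measurable map $f$, using squaring as the example map. The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def dirac(x):
    """delta_x: the probability measure putting all of its mass on x."""
    return {x: Fraction(1)}


def measure(p, A):
    """p(A) for a finitely supported p (dict {outcome: probability})."""
    return sum((w for y, w in p.items() if y in A), Fraction(0))


def pushforward(f, p):
    """f_* p: the mass at each outcome x moves to f(x)."""
    q = defaultdict(Fraction)
    for x, w in p.items():
        q[f(x)] += w
    return dict(q)


A = {0, 1, 2}
for x in range(-3, 4):
    # delta_x(A) is the indicator 1_A(x).
    assert measure(dirac(x), A) == (1 if x in A else 0)
    # Naturality of the unit: f_* delta_x = delta_{f(x)}, here with f = squaring.
    assert pushforward(lambda t: t * t, dirac(x)) == dirac(x * x)
```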

The expectation map

Let $\mu \in PPX$, that is, a probability measure on the space of probability measures over $(X,{\mathcal {F}})$. We define the probability measure ${\mathcal {E}}\mu \in PX$ by

{\mathcal {E}}\mu (A)=\int _{PX}p(A)\,\mu (dp)

for all measurable $A\in {\mathcal {F}}$. This gives a measurable, natural map ${\mathcal {E}}:(PPX,{\mathcal {PPF}})\to (PX,{\mathcal {PF}})$.^[6]
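This Python sketch covers the finite case of ${\mathcal {E}}$, assuming $\mu$ has finite support. In that case ${\mathcal {E}}\mu (A)$ becomes the $\mu$-weighted average of the numbers $p(A)$. Storing $\mu$ as a list of `(p, weight)` pairs sidesteps the fact that Python dicts cannot be dictionary keys. That representation and the names `expectation` and `measure` are illustrative choices. In the example, a fair coin is chosen with probability 1/3 and a biased coin otherwise.

```python
from collections import defaultdict
from fractions import Fraction

# Assumptions: p in PX is a dict {outcome: probability}; mu in PPX is finitely
# supported and given as a list of (p, weight) pairs with weights summing to 1.


def expectation(mu):
    """E(mu), using (E mu)({x}) = sum over p of mu({p}) * p({x})."""
    out = defaultdict(Fraction)
    for p, w in mu:
        for x, px in p.items():
            out[x] += w * px
    return dict(out)


def measure(p, A):
    """p(A) for a finitely supported p."""
    return sum((w for x, w in p.items() if x in A), Fraction(0))


fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
biased = {"H": Fraction(9, 10), "T": Fraction(1, 10)}
mu = [(fair, Fraction(1, 3)), (biased, Fraction(2, 3))]  # pick a coin, then toss it

E_mu = expectation(mu)
assert sum(E_mu.values()) == 1  # E(mu) is again a probability measure
for A in [set(), {"H"}, {"T"}, {"H", "T"}]:
    # E(mu)(A) is the mu-average of p(A), the finite form of the integral above.
    assert measure(E_mu, A) == sum(w * measure(p, A) for p, w in mu)
```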

Example: mixture distributions

A mixture distribution, or more generally a compound distribution, can be seen as an application of the map ${\mathcal {E}}$. Consider the case of a finite mixture. Let $p_{1},\dots ,p_{n}$ be probability measures on $(X,{\mathcal {F}})$, and consider the probability measure $q$ given by the mixture

q(A)=\sum _{i=1}^{n}w_{i}\,p_{i}(A)

for all measurable $A\in {\mathcal {F}}$, for some weights $w_{i}\geq 0$ satisfying $w_{1}+\dots +w_{n}=1$. We can view the mixture $q$ as the average $q={\mathcal {E}}\mu$, where the measure on measures $\mu \in PPX$, which in this case is discrete, is given by

\mu =\sum _{i=1}^{n}w_{i}\,\delta _{p_{i}}.

More generally, the map ${\mathcal {E}}:PPX\to PX$ can be seen as the most general, non-parametric way to form arbitrary mixture or compound distributions.
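Here is a numerical instance of the finite mixture, given as a hedged sketch with illustrative data. In it, $p_{1}$ is a fair four-sided die and $p_{2}$ a fair six-sided die, mixed with equal weights (pick a die at random, then roll it). The measure $\mu =\sum _{i}w_{i}\,\delta _{p_{i}}$ is stored as the list of pairs $(p_{i},w_{i})$, an assumed representation. ${\mathcal {E}}\mu$ is then compared with the mixture formula for $q$ on every subset of the outcome space.

```python
from fractions import Fraction
from itertools import chain, combinations


def measure(p, A):
    """p(A) for a finitely supported p (dict {outcome: probability})."""
    return sum((w for x, w in p.items() if x in A), Fraction(0))


def expectation(mu):
    """E(mu) for mu given as a finite list of (p, weight) pairs."""
    support = {x for p, _ in mu for x in p}
    return {x: sum(w * p.get(x, Fraction(0)) for p, w in mu) for x in support}


d4 = {k: Fraction(1, 4) for k in range(1, 5)}
d6 = {k: Fraction(1, 6) for k in range(1, 7)}
mu = [(d4, Fraction(1, 2)), (d6, Fraction(1, 2))]  # mu = w_1 delta_{p_1} + w_2 delta_{p_2}
q = expectation(mu)

outcomes = range(1, 7)
events = chain.from_iterable(combinations(outcomes, r) for r in range(len(outcomes) + 1))
for A in map(set, events):
    # q(A) = sum_i w_i p_i(A) on every measurable set A (here: every subset).
    assert measure(q, A) == sum(w * measure(p, A) for p, w in mu)
```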

The triple $(P,\delta ,{\mathcal {E}})$ is called the Giry monad.^[1]^[2]^[3]^[4]^[5]
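The monad laws satisfied by $(P,\delta ,{\mathcal {E}})$ can be checked mechanically in the discrete case. The sketch below uses the monad of finitely supported distributions, a simpler relative of the Giry monad and not the Giry monad itself. Every level ($PX$, $PPX$, $PPPX$) is encoded as a list of `(value, weight)` pairs, an assumed representation. The Haskell-style names `unit`, `fmap`, `join` and `canon` are hypothetical. The three asserts are the unit laws and the associativity law.

```python
from collections import defaultdict
from fractions import Fraction

# Assumption: a finitely supported distribution at any level (PX, PPX, PPPX)
# is a list of (value, weight) pairs, so values may themselves be distributions.


def unit(x):
    """delta: X -> PX."""
    return [(x, Fraction(1))]


def fmap(f, p):
    """The functor P on a map f: the pushforward f_* p."""
    return [(f(x), w) for x, w in p]


def join(mu):
    """The multiplication E: PPX -> PX (multiply the weights along each path)."""
    return [(x, w * v) for p, w in mu for x, v in p]


def canon(p):
    """Merge repeated outcomes so distributions on hashable outcomes compare by value."""
    out = defaultdict(Fraction)
    for x, w in p:
        out[x] += w
    return {x: w for x, w in out.items() if w != 0}


p = [("a", Fraction(1, 2)), ("b", Fraction(1, 2))]
q = [("a", Fraction(1, 5)), ("c", Fraction(4, 5))]
mu = [(p, Fraction(1, 3)), (q, Fraction(2, 3))]
nu = [(mu, Fraction(1, 4)), (unit(q), Fraction(3, 4))]  # an element of PPPX

assert canon(join(unit(p))) == canon(p)                      # E . delta_{PX} = id
assert canon(join(fmap(unit, p))) == canon(p)                # E . P(delta_X) = id
assert canon(join(join(nu))) == canon(join(fmap(join, nu)))  # E . E_{PX} = E . P(E_X)
```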

Relationship with Markov kernels

One of the properties of the sigma-algebra ${\mathcal {PF}}$ is that, given measurable spaces $(X,{\mathcal {F}})$ and $(Y,{\mathcal {G}})$, there is a bijective correspondence between measurable functions $(X,{\mathcal {F}})\to (PY,{\mathcal {PG}})$ and Markov kernels $(X,{\mathcal {F}})\to (Y,{\mathcal {G}})$. This allows one to view a Markov kernel, equivalently, as a measurably parametrized probability measure.^[10]

In more detail, given a measurable function $f:(X,{\mathcal {F}})\to (PY,{\mathcal {PG}})$, one can obtain the Markov kernel $f^{\flat }:(X,{\mathcal {F}})\to (Y,{\mathcal {G}})$ as follows:

f^{\flat }(B|x)=f(x)(B)

for every $x\in X$ and every measurable $B\in {\mathcal {G}}$ (note that $f(x)\in PY$ is a probability measure). Conversely, given a Markov kernel $k:(X,{\mathcal {F}})\to (Y,{\mathcal {G}})$, one can form the measurable function $k^{\sharp }:(X,{\mathcal {F}})\to (PY,{\mathcal {PG}})$ mapping $x\in X$ to the probability measure $k^{\sharp }(x)\in PY$ defined by

k^{\sharp }(x)(B)=k(B|x)

for every measurable $B\in {\mathcal {G}}$. The two assignments are mutually inverse.
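Assume, for illustration only, that $X$ and $Y$ are finite and every subset is measurable. Then the two assignments amount to currying and uncurrying, together with the fact that a probability measure on a finite set is determined by its values on singletons. The sketch encodes a hypothetical kernel describing how someone commutes depending on the weather, modelled as a function `k(B, x)` returning $k(B|x)$. It then checks that $\flat$ and $\sharp$ undo each other. All names are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumptions: X and Y are finite with every subset measurable; an element of PY
# is a dict {y: probability}; a Markov kernel k is a function with k(B, x) = k(B|x).
X = ["sunny", "rainy"]
Y = ["walk", "bus"]


def commute(x):
    """A hypothetical measurable function X -> PY."""
    if x == "sunny":
        return {"walk": Fraction(4, 5), "bus": Fraction(1, 5)}
    return {"walk": Fraction(1, 10), "bus": Fraction(9, 10)}


def flat(f):
    """Turn f: X -> PY into the kernel f^flat(B|x) = f(x)(B)."""
    return lambda B, x: sum((w for y, w in f(x).items() if y in B), Fraction(0))


def sharp(k, codomain):
    """Turn a kernel k into x -> k^sharp(x); on a finite codomain, singletons determine the measure."""
    return lambda x: {y: k({y}, x) for y in codomain}


def subsets(s):
    """All subsets of a finite collection, i.e. every measurable set."""
    return [set(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


k = flat(commute)
# sharp undoes flat: k^sharp(x) is the original probability measure commute(x).
assert all(sharp(k, Y)(x) == commute(x) for x in X)
# flat undoes sharp: both kernels agree on every measurable B and every x.
assert all(flat(sharp(k, Y))(B, x) == k(B, x) for x in X for B in subsets(Y))
```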

From the point of view of category theory, we can interpret this correspondence as an adjunction

\mathrm {Hom} _{\mathrm {Meas} }(X,PY)\cong \mathrm {Hom} _{\mathrm {Stoch} }(X,Y)

between the category of measurable spaces and the category of Markov kernels. In particular, the category of Markov kernels can be seen as the Kleisli category of the Giry monad.^[3]^[4]^[5]
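Composition in the Kleisli category can be made concrete. The composite of kernels $f:X\to PY$ and $g:Y\to PZ$ sends $x$ to ${\mathcal {E}}(g_{*}f(x))$. For finite spaces this is the Chapman–Kolmogorov sum $\sum _{y}f(x)(\{y\})\,g(y)$, which amounts to a product of stochastic matrices. The sketch assumes finitely supported, dict-valued kernels. `kleisli_compose`, `step` and the transition matrix `M` are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def kleisli_compose(g, f):
    """Kleisli composite of f: X -> PY and g: Y -> PZ, i.e. x -> E(g_* f(x))."""
    def h(x):
        out = defaultdict(Fraction)
        for y, w in f(x).items():
            for z, v in g(y).items():
                out[z] += w * v
        return dict(out)
    return h


def dirac(x):
    """The unit delta, which is the identity morphism of the Kleisli category."""
    return {x: Fraction(1)}


# A hypothetical two-state Markov chain with transition matrix M.
M = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 2)]]


def step(i):
    """The Markov kernel given by row i of M."""
    return {j: M[i][j] for j in range(2)}


two_steps = kleisli_compose(step, step)
M2 = [[sum(M[i][k] * M[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
for i in range(2):
    assert two_steps(i) == {j: M2[i][j] for j in range(2)}  # matches the matrix product M M
    assert kleisli_compose(step, dirac)(i) == step(i)       # delta is a two-sided identity
    assert kleisli_compose(dirac, step)(i) == step(i)
```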

Product distributions

Given measurable spaces $(X,{\mathcal {F}})$ and $(Y,{\mathcal {G}})$, one can form the measurable space $(X,{\mathcal {F}})\times (Y,{\mathcal {G}})=(X\times Y,{\mathcal {F}}\times {\mathcal {G}})$ with the product sigma-algebra, which is the product in the category of measurable spaces. Given probability measures $p\in PX$ and $q\in PY$, one can form the product measure $p\otimes q$ on $(X\times Y,{\mathcal {F}}\times {\mathcal {G}})$. This gives a natural, measurable map

(PX,{\mathcal {PF}})\times (PY,{\mathcal {PG}})\to {\big (}P(X\times Y),{\mathcal {P(F\times G)}}{\big )}

usually denoted by $\nabla$ or by $\otimes$.^[4]

The map $\nabla :PX\times PY\to P(X\times Y)$ is in general not an isomorphism, since there are probability measures on $X\times Y$ that are not product distributions, for example the joint distribution of two correlated random variables. However, the maps $\nabla :PX\times PY\to P(X\times Y)$ and the isomorphism $1\cong P1$ make the Giry monad a monoidal monad, and so in particular a commutative strong monad.^[4]
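With finitely supported measures (an illustrative assumption), $\nabla$ multiplies point masses. The marginals of $p\otimes q$ are $p$ and $q$. A joint distribution $r$ therefore lies in the image of $\nabla$ exactly when it equals the product of its own marginals. The sketch uses this test to show that two perfectly correlated fair coins do not form a product distribution. The helper names `nabla` and `marginals` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def nabla(p, q):
    """The product measure p (x) q on X x Y, for dicts {outcome: probability}."""
    return {(x, y): px * qy for (x, px), (y, qy) in product(p.items(), q.items())}


def marginals(r):
    """Pushforwards of a joint distribution r along the two projections."""
    left, right = {}, {}
    for (x, y), w in r.items():
        left[x] = left.get(x, Fraction(0)) + w
        right[y] = right.get(y, Fraction(0)) + w
    return left, right


coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die3 = {k: Fraction(1, 3) for k in range(3)}
assert marginals(nabla(coin, die3)) == (coin, die3)  # the marginals of p (x) q are p and q

# Two perfectly correlated fair coins: they always show the same face.
correlated = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
left, right = marginals(correlated)
assert (left, right) == (coin, coin)
assert nabla(left, right) != correlated  # so correlated is not of the form p (x) q
```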

Further properties

If a measurable space $(X,{\mathcal {F}})$ is standard Borel, so is $(PX,{\mathcal {PF}})$. Therefore the Giry monad restricts to the full subcategory of standard Borel spaces.^[1]^[4]

The algebras for the Giry monad include compact convex subsets of Euclidean spaces, as well as the extended positive real line $[0,\infty ]$, with the algebra structure map given by taking expected values.^[11] For example, for $[0,\infty ]$, the structure map $e:P[0,\infty ]\to [0,\infty ]$ is given by

p\longmapsto \int _{[0,\infty )}x\,p(dx)

whenever $p$ is supported on $[0,\infty )$ and has finite expected value, and $e(p)=\infty$ otherwise.
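In the finitely supported case (an illustrative simplification), a measure supported on $[0,\infty )$ always has a finite mean. So $e(p)=\infty$ exactly when $p$ puts positive mass on $\infty$. The Python sketch below uses hypothetical helper names and `math.inf` to stand in for $\infty$. It computes $e$ and checks the two algebra laws $e(\delta _{x})=x$ and $e\circ {\mathcal {E}}=e\circ Pe$ on small examples.

```python
import math
from collections import defaultdict
from fractions import Fraction

# Assumption: p in P[0, inf] is finitely supported, given as a dict
# {value: probability}, where a value is a nonnegative Fraction/int or math.inf.


def e(p):
    """Structure map P[0, inf] -> [0, inf]. A finitely supported p on [0, inf)
    always has a finite mean, so the result is inf exactly when inf has positive mass."""
    if any(x == math.inf and w > 0 for x, w in p.items()):
        return math.inf
    return sum((x * w for x, w in p.items()), Fraction(0))


def expectation(mu):
    """E: PP[0, inf] -> P[0, inf], with mu a list of (p, weight) pairs."""
    out = defaultdict(Fraction)
    for p, w in mu:
        for x, px in p.items():
            out[x] += w * px
    return dict(out)


def push_e(mu):
    """P(e): PP[0, inf] -> P[0, inf], the pushforward of mu along e."""
    out = defaultdict(Fraction)
    for p, w in mu:
        out[e(p)] += w
    return dict(out)


p1 = {1: Fraction(1, 2), 3: Fraction(1, 2)}           # mean 2
p2 = {0: Fraction(1, 4), 4: Fraction(3, 4)}           # mean 3
p3 = {math.inf: Fraction(1, 10), 1: Fraction(9, 10)}  # positive mass at infinity
for x in [0, 3, math.inf]:
    assert e({x: Fraction(1)}) == x  # unit law: e(delta_x) = x
for mu in ([(p1, Fraction(1, 3)), (p2, Fraction(2, 3))],
           [(p3, Fraction(1, 2)), (p1, Fraction(1, 2))]):
    assert e(expectation(mu)) == e(push_e(mu))  # multiplication law: e . E = e . P(e)
```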

Citations

[1] Giry (1982)
[2] Avery (2016), pp. 1231–1234
[3] Jacobs (2018), pp. 205–206
[4] Fritz (2020), pp. 19–23
[5] Moss & Perrone (2022), pp. 3–4
[6] Giry (1982), p. 69
[7] Riehl (2016)
[8] Perrone (2024)
[9] Perrone (2024), p. 238
[10] Giry (1982), p. 71
[11] Doberkat (2006), pp. 1772–1776


References

Giry, Michèle (1982). "A categorical approach to probability theory". Categorical Aspects of Topology and Analysis. Lecture Notes in Mathematics. Vol. 915. Springer. pp. 68–85. doi:10.1007/BFb0092872. ISBN 978-3-540-11211-2.

Doberkat, Ernst-Erich (2006). "Eilenberg-Moore algebras for stochastic relations". Information and Computation. 204 (12): 1756–1781. doi:10.1016/j.ic.2006.09.001.

Avery, Tom (2016). "Codensity and the Giry monad". Journal of Pure and Applied Algebra. 220 (3): 1229–1251. arXiv:1410.4432. doi:10.1016/j.jpaa.2015.08.017. S2CID 119695729.

Jacobs, Bart (2018). "From probability monads to commutative effectuses". Journal of Logical and Algebraic Methods in Programming. 94: 200–237. doi:10.1016/j.jlamp.2016.11.006.

Fritz, Tobias (2020). "A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics". Advances in Mathematics. 370. arXiv:1908.07021. doi:10.1016/j.aim.2020.107239. S2CID 201103837.

Moss, Sean; Perrone, Paolo (2022). "Probability monads with submonads of deterministic states". LICS '22: Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science. arXiv:2204.07003. doi:10.1145/3531130.3533355.

Riehl, Emily (2016). "Chapter 5. Monads and their Algebras". Category Theory in Context. Dover. ISBN 978-0486809038.

Perrone, Paolo (2024). "Chapter 5. Monads and Comonads". Starting Category Theory. World Scientific. doi:10.1142/9789811286018_0005. ISBN 978-981-12-8600-1.

External links

What is a probability monad?, video tutorial.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
