Statistical manifold


In mathematics, a statistical manifold is a Riemannian manifold, each of whose points is a probability distribution. Statistical manifolds provide a setting for the field of information geometry. The Fisher information metric provides a metric on these manifolds. Following this definition, the log-likelihood function is a differentiable map and the score is an inclusion. [1]
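For a smooth family of densities p(x; θ) with parameter θ = (θ¹, …, θⁿ), the Fisher information metric takes the standard coordinate form

g_{ij}(\theta) = \mathbb{E}_{\theta}\!\left[ \frac{\partial \log p(x;\theta)}{\partial \theta^{i}} \, \frac{\partial \log p(x;\theta)}{\partial \theta^{j}} \right],

which identifies the score functions ∂ log p / ∂θ^i with tangent directions on the manifold.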


Examples

The family of all normal distributions can be thought of as a 2-dimensional parametric space parametrized by the expected value μ and the variance σ² > 0. Equipped with the Riemannian metric given by the Fisher information matrix, it is a statistical manifold with a geometry modeled on hyperbolic space. One way of picturing the manifold is to infer the parametric equations via the Fisher information rather than starting from the likelihood function.
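As an illustration, the Fisher information matrix of the normal family can be computed symbolically; the following minimal sketch uses the sympy library (the symbolic-integration approach and the variable names are choices made here, not taken from the source) and recovers the diagonal metric diag(1/σ², 2/σ²) behind the hyperbolic geometry mentioned above.

import sympy as sp

# Sample variable x and the two coordinates of the manifold: mean mu, standard deviation sigma.
x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# Log-density of the normal distribution N(mu, sigma^2).
log_p = -sp.Rational(1, 2) * sp.log(2 * sp.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
p = sp.exp(log_p)

# Fisher information matrix: g_ij = E[ (d log p / d theta_i) * (d log p / d theta_j) ].
params = (mu, sigma)
g = sp.zeros(2, 2)
for i, ti in enumerate(params):
    for j, tj in enumerate(params):
        integrand = sp.diff(log_p, ti) * sp.diff(log_p, tj) * p
        g[i, j] = sp.simplify(sp.integrate(integrand, (x, -sp.oo, sp.oo)))

print(g)  # Expected output: Matrix([[1/sigma**2, 0], [0, 2/sigma**2]])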

A simple example of a statistical manifold, taken from physics, would be the canonical ensemble: it is a one-dimensional manifold, with the temperature T serving as the coordinate on the manifold. For any fixed temperature T, one has a probability space: so, for a gas of atoms, it would be the probability distribution of the velocities of the atoms. As one varies the temperature T, the probability distribution varies.
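As a rough numerical sketch of this one-parameter family, the speed distribution of an ideal gas at temperature T is a Maxwell distribution with scale √(kT/m); the particle mass and the temperatures below are illustrative values chosen here, not part of the source.

import numpy as np
from scipy.stats import maxwell
from scipy.constants import k  # Boltzmann constant

m = 6.6e-26  # approximate mass of an argon atom in kg (illustrative)

def speed_pdf(v, T):
    # One point of the manifold: the Maxwell-Boltzmann speed density at temperature T.
    return maxwell.pdf(v, scale=np.sqrt(k * T / m))

speeds = np.linspace(0.0, 1500.0, 5)
for T in (100.0, 300.0, 900.0):     # moving along the temperature coordinate
    print(T, speed_pdf(speeds, T))  # each temperature selects a different distribution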

Another simple example, taken from medicine, would be the probability distribution of patient outcomes, in response to the quantity of medicine administered. That is, for a fixed dose, some patients improve, and some do not: this is the base probability space. If the dosage is varied, then the probability of outcomes changes. Thus, the dosage is the coordinate on the manifold. To be a smooth manifold, one would have to measure outcomes in response to arbitrarily small changes in dosage; this is not a practically realizable example, unless one has a pre-existing mathematical model of dose-response where the dose can be arbitrarily varied.
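To make this concrete, one could posit a hypothetical parametric dose-response model; the logistic curve and the parameters d0 and s below are invented for illustration only.

import numpy as np

def improvement_probability(dose, d0=10.0, s=2.0):
    # Hypothetical logistic dose-response curve: P(patient improves | dose).
    return 1.0 / (1.0 + np.exp(-(dose - d0) / s))

# Each dose is a coordinate on a one-dimensional statistical manifold whose points
# are Bernoulli distributions over the outcomes {improves, does not improve}.
for dose in (5.0, 10.0, 15.0):
    p = improvement_probability(dose)
    print(dose, {"improves": p, "does not improve": 1.0 - p})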

Definition

Let X be an orientable manifold, and let μ be a measure on X. Equivalently, let (Ω, F, P) be a probability space on Ω = X, with sigma-algebra F and probability P.

The statistical manifold S(X) of X is defined as the space of all measures on X (with the sigma-algebra held fixed). Note that this space is infinite-dimensional; it is commonly taken to be a Fréchet space. The points of S(X) are measures.

Rather than dealing with the infinite-dimensional space S(X), it is common to work with a finite-dimensional submanifold, defined by considering a set of probability distributions parameterized by some smooth, continuously varying parameter θ. That is, one considers only those measures that are selected by the parameter. If the parameter θ is n-dimensional, then, in general, the submanifold will be as well. All finite-dimensional statistical manifolds can be understood in this way.


Related Research Articles

In statistics, a location parameter of a probability distribution is a scalar- or vector-valued parameter x₀ which determines the "location" or shift of the distribution. In the literature on location parameter estimation, probability distributions with such a parameter are formally defined in one of several equivalent ways.

In the mathematical field of differential geometry, the Riemann curvature tensor or Riemann–Christoffel tensor is the most common way used to express the curvature of Riemannian manifolds. It assigns a tensor to each point of a Riemannian manifold. It is a local invariant of Riemannian metrics which measures the failure of the second covariant derivatives to commute. A Riemannian manifold has zero curvature if and only if it is flat, i.e. locally isometric to the Euclidean space. The curvature tensor can also be defined for any pseudo-Riemannian manifold, or indeed any manifold equipped with an affine connection.
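In terms of an affine connection ∇, the curvature operator takes the standard form

R(X, Y)Z = \nabla_{X}\nabla_{Y}Z - \nabla_{Y}\nabla_{X}Z - \nabla_{[X,Y]}Z,

which makes the failure of second covariant derivatives to commute explicit.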

In information geometry, the Fisher information metric is a particular Riemannian metric which can be defined on a smooth statistical manifold, i.e., a smooth manifold whose points are probability measures defined on a common probability space. It can be used to calculate the informational difference between measurements.

In mathematical statistics, the Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X. Formally, it is the variance of the score, or the expected value of the observed information.
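In symbols, for a model with density p(x; θ) this is the standard expression

I(\theta) = \operatorname{Var}_{\theta}\!\left[ \frac{\partial}{\partial\theta} \log p(X;\theta) \right] = -\,\mathbb{E}_{\theta}\!\left[ \frac{\partial^{2}}{\partial\theta^{2}} \log p(X;\theta) \right],

where the second equality holds under the usual regularity conditions.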

In probability and statistics, a mixture distribution is the probability distribution of a random variable that is derived from a collection of other random variables as follows: first, a random variable is selected by chance from the collection according to given probabilities of selection, and then the value of the selected random variable is realized. The underlying random variables may be random real numbers, or they may be random vectors, in which case the mixture distribution is a multivariate distribution.
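A minimal sketch of this two-stage sampling scheme, with arbitrary illustrative weights and component parameters:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-component Gaussian mixture; weights and parameters are made up.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([1.0, 0.5])

def sample_mixture(n):
    # Step 1: select a component at random according to the selection probabilities.
    components = rng.choice(len(weights), size=n, p=weights)
    # Step 2: realize a value from the selected component's distribution.
    return rng.normal(means[components], stds[components])

print(sample_mixture(5))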

In mathematics, a Killing vector field, named after Wilhelm Killing, is a vector field on a Riemannian manifold that preserves the metric. Killing fields are the infinitesimal generators of isometries; that is, flows generated by Killing fields are continuous isometries of the manifold. More simply, the flow generates a symmetry, in the sense that moving each point of an object the same distance in the direction of the Killing vector will not distort distances on the object.
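Equivalently, a vector field X is a Killing field exactly when the Lie derivative of the metric along X vanishes,

\mathcal{L}_{X} g = 0, \qquad \text{or in components} \qquad \nabla_{\mu} X_{\nu} + \nabla_{\nu} X_{\mu} = 0.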

In mathematical statistics, the Kullback–Leibler divergence, denoted D_KL(P ∥ Q), is a type of statistical distance: a measure of how one probability distribution P is different from a second, reference probability distribution Q. A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P. While it is a measure of how different two distributions are, and in some sense is thus a "distance", it is not actually a metric, which is the most familiar and formal type of distance. In particular, it is not symmetric in the two distributions, and does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalization of squared distance, and for certain classes of distributions, it satisfies a generalized Pythagorean theorem.
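For discrete distributions it takes the familiar form (the continuous case replaces the sum by an integral against densities)

D_{\mathrm{KL}}(P \parallel Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}.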


Directional statistics is the subdiscipline of statistics that deals with directions, axes or rotations in R^n. More generally, directional statistics deals with observations on compact Riemannian manifolds including the Stiefel manifold.

In quantum field theory, a nonlinear σ model describes a scalar field Σ which takes on values in a nonlinear manifold called the target manifold T. The non-linear σ-model was introduced by Gell-Mann & Lévy, who named it after a field corresponding to a spinless meson called σ in their model. This article deals primarily with the quantization of the non-linear sigma model; please refer to the base article on the sigma model for general definitions and classical (non-quantum) formulations and results.

In Bayesian probability, the Jeffreys prior, named after Sir Harold Jeffreys, is a non-informative prior distribution for a parameter space; its density function is proportional to the square root of the determinant of the Fisher information matrix, p(θ) ∝ √(det I(θ)).
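For example, for a Bernoulli model with success probability θ the Fisher information is I(θ) = 1/(θ(1 − θ)), so the Jeffreys prior is proportional to θ^(−1/2)(1 − θ)^(−1/2), i.e. a Beta(1/2, 1/2) distribution.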

In statistics, a parametric model or parametric family or finite-dimensional model is a particular class of statistical models. Specifically, a parametric model is a family of probability distributions that has a finite number of parameters.

In mathematics, a π-system on a set Ω is a non-empty collection P of subsets of Ω that is closed under finite intersections: if A and B belong to P, then so does A ∩ B.

In mathematics, the disintegration theorem is a result in measure theory and probability theory. It rigorously defines the idea of a non-trivial "restriction" of a measure to a measure zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.

In differential geometry, normal coordinates at a point p in a differentiable manifold equipped with a symmetric affine connection are a local coordinate system in a neighborhood of p obtained by applying the exponential map to the tangent space at p. In a normal coordinate system, the Christoffel symbols of the connection vanish at the point p, thus often simplifying local calculations. In normal coordinates associated to the Levi-Civita connection of a Riemannian manifold, one can additionally arrange that the metric tensor is the Kronecker delta at the point p, and that the first partial derivatives of the metric at p vanish.

Algorithmic inference gathers new developments in the statistical inference methods made feasible by the powerful computing devices widely available to any data analyst. Cornerstones in this field are computational learning theory, granular computing, bioinformatics, and, long ago, structural probability. The main focus is on the algorithms which compute statistics rooting the study of a random phenomenon, along with the amount of data they must feed on to produce reliable results. This shifts the interest of mathematicians from the study of the distribution laws to the functional properties of the statistics, and the interest of computer scientists from the algorithms for processing data to the information they process.

In statistics, identifiability is a property which a model must satisfy for precise inference to be possible. A model is identifiable if it is theoretically possible to learn the true values of this model's underlying parameters after obtaining an infinite number of observations from it. Mathematically, this is equivalent to saying that different values of the parameters must generate different probability distributions of the observable variables. Usually the model is identifiable only under certain technical restrictions, in which case the set of these requirements is called the identification conditions.

In information geometry, a divergence is a kind of statistical distance: a binary function which establishes the separation from one probability distribution to another on a statistical manifold.

In statistics, efficiency is a measure of quality of an estimator, of an experimental design, or of a hypothesis testing procedure. Essentially, a more efficient estimator needs fewer input data or observations than a less efficient one to achieve the Cramér–Rao bound. An efficient estimator is characterized by having the smallest possible variance, indicating that there is a small deviance between the estimated value and the "true" value in the L2 norm sense.
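The bound in question is the Cramér–Rao inequality: for an unbiased estimator \hat{\theta} of a scalar parameter θ with Fisher information I(θ),

\operatorname{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)}.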

In probability theory, a McKean–Vlasov process is a stochastic process described by a stochastic differential equation where the coefficients of the diffusion depend on the distribution of the solution itself. The equations are a model for the Vlasov equation and were first studied by Henry McKean in 1966. It is an example of propagation of chaos, in that it can be obtained as a limit of a mean-field system of interacting particles: as the number of particles tends to infinity, the interactions between any single particle and the rest of the pool will only depend on the particle itself.

Lagrangian field theory is a formalism in classical field theory. It is the field-theoretic analogue of Lagrangian mechanics. Lagrangian mechanics is used to analyze the motion of a system of discrete particles each with a finite number of degrees of freedom. Lagrangian field theory applies to continua and fields, which have an infinite number of degrees of freedom.

References

1. Murray, Michael K.; Rice, John W. (1993). "The definition of a statistical manifold". Differential Geometry and Statistics. Chapman & Hall. pp. 76–77. ISBN 0-412-39860-5.