In mathematics and statistics, a probability vector or stochastic vector is a vector with non-negative entries that add up to one.
Underlying every probability vector is an experiment that can produce an outcome. To connect this experiment to mathematics, one introduces a discrete random variable, which is a function that assigns a numerical value to each possible outcome. For example, if the experiment consists of rolling a single die, the possible values of this random variable are the integers 1,2,…,6. The associated probability vector has six components, each representing the probability of obtaining the corresponding outcome. More generally, a probability vector of length n represents the distribution of probabilities across the n possible numerical outcomes of a random variable. [1]
The vector gives us the probability mass function of that random variable, which is the standard way of characterizing a discrete probability distribution. [2]
Here are some examples of probability vectors. The vectors can be either columns or rows. [3]
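As a minimal sketch of the definition, the following Python check tests the two defining conditions: non-negative entries and a sum of one. The function name `is_probability_vector`, the tolerance, and the sample vectors are illustrative choices made for this sketch, not values taken from the source.

```python
import math

def is_probability_vector(v, tol=1e-9):
    """Return True if all entries are non-negative and they sum to 1 (within tol)."""
    return all(x >= 0 for x in v) and math.isclose(sum(v), 1.0, abs_tol=tol)

# Illustrative vectors chosen for this sketch (hypothetical examples).
candidates = {
    "fair_die": [1 / 6] * 6,        # six equally likely outcomes
    "biased_coin": [0.7, 0.3],      # two outcomes, unequal probabilities
    "not_normalized": [0.5, 0.6],   # entries sum to 1.1
    "has_negative": [1.2, -0.2],    # sums to 1 but has a negative entry
}

for name, v in candidates.items():
    print(name, is_probability_vector(v))
```

Whether the entries are stored as a row or a column does not affect the check, since only the values and their sum matter.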
The bounds on variance show that as the number of possible outcomes n increases, the variance necessarily decreases toward zero. As a result, the uncertainty associated with any single outcome increases because the components of the probability vector become more nearly equal. In empirical work, this often motivates binning the outcomes to reduce n; although this discards some information contained in the original outcomes, it allows the coarser-grained structure of the distribution to be revealed. The decrease in variance with increasing n reflects the same tendency toward uniformity that underlies entropy in information theory and statistical mechanics. [7]
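The shrinking variance and the effect of binning can be sketched numerically. This example assumes that "variance" means the population variance of the vector's n components, whose mean is always 1/n; under that assumption the largest possible value is (n − 1)/n², reached when one outcome is certain. The function names and the sample vector are hypothetical.

```python
def component_variance(p):
    """Population variance of the entries of p (assumed definition); their mean is 1/n."""
    n = len(p)
    mean = 1.0 / n
    return sum((x - mean) ** 2 for x in p) / n

def max_component_variance(n):
    """Upper bound (n - 1) / n**2, attained by a vector with a single entry equal to 1."""
    return (n - 1) / n ** 2

def bin_adjacent(p, width):
    """Merge each run of `width` adjacent outcomes into one bin by summing probabilities."""
    return [sum(p[i:i + width]) for i in range(0, len(p), width)]

# The upper bound shrinks toward zero as n grows.
for n in (2, 6, 100, 10_000):
    print(n, max_component_variance(n))

# Binning a 6-outcome vector into 3 bins reduces n and keeps the total at 1.
p = [0.05, 0.15, 0.30, 0.25, 0.15, 0.10]
binned = bin_adjacent(p, 2)
print(binned, sum(binned))
```

Binning by summing adjacent entries keeps the result a probability vector, at the cost of no longer distinguishing the outcomes inside each bin.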
A simplex is the simplest geometric object that fully occupies the region of a given dimension defined by its vertices. It is constructed as the convex hull of n affinely independent points: for n = 2 it is a line segment, for n = 3 a triangle, for n = 4 a tetrahedron, and so on.
The probability simplex (or standard simplex) is the canonical example of a simplex. It is obtained by taking the n standard basis vectors of ℝ^n as vertices and forming their convex hull:

Δ^(n−1) = { (p_1, …, p_n) ∈ ℝ^n : p_i ≥ 0 for all i, p_1 + … + p_n = 1 }
This is an (n − 1)-dimensional simplex lying on the affine hyperplane p_1 + p_2 + … + p_n = 1. The distribution of a random variable with n possible outcomes naturally lives in this (n − 1)-simplex rather than an n-simplex, because the requirement that all probabilities sum to 1 removes one degree of freedom.
The components p_1, …, p_n of a probability vector serve as barycentric coordinates, giving this simplex an immediate interpretation in probability theory: each vertex corresponds to one outcome occurring with certainty, and each interior point represents a mixture or distribution over the n outcomes. Every possible discrete probability distribution on n outcomes corresponds to exactly one point in this simplex, and conversely each point of the simplex defines a unique distribution. Moving toward a vertex along barycentric coordinates corresponds to increasing certainty about the outcome, while movement toward the center represents increasing uncertainty resulting from a more uniform distribution.
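The barycentric reading can be illustrated with a short sketch: weighting the standard basis vectors by the components of a probability vector reproduces the vector itself, and mixing a distribution with a vertex moves it toward certainty while keeping it in the simplex. The helper names and the sample vector below are hypothetical.

```python
def standard_basis(n):
    """The n vertices of the probability simplex: e_1, ..., e_n."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def from_barycentric(weights, vertices):
    """Point obtained as the weighted average of the vertices."""
    dim = len(vertices[0])
    return [sum(w * v[k] for w, v in zip(weights, vertices)) for k in range(dim)]

def mix(p, q, t):
    """Convex combination (1 - t) * p + t * q, for 0 <= t <= 1."""
    return [(1 - t) * a + t * b for a, b in zip(p, q)]

p = [0.5, 0.3, 0.2]
vertices = standard_basis(3)

# With the standard basis as vertices, the barycentric coordinates are the point itself.
print(from_barycentric(p, vertices))

# Moving halfway toward the first vertex raises the first probability.
toward_vertex = mix(p, vertices[0], 0.5)
print(toward_vertex, sum(toward_vertex))

# Moving halfway toward the centroid makes the distribution more uniform.
toward_center = mix(p, [1 / 3] * 3, 0.5)
print(toward_center, sum(toward_center))
```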
The probability simplex serves as the canonical simplex in ℝ^n, since any other (n − 1)-simplex can be obtained from it by an affine transformation, making it the standard reference for geometric and probabilistic analyses. [8] [9]
Every probability vector of dimension n lies within an (n − 1)-dimensional simplex. This simplex is not a smooth, gradually curving surface in ℝ^n; instead, it has sharp vertices, straight edges, and flat faces.
Assigning a zero probability to an outcome corresponds to moving onto a lower-dimensional face of the simplex, since that outcome is no longer possible.
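A small sketch makes the link between zero entries and faces concrete: the outcomes with non-zero probability form the support, and a vector with k possible outcomes lies on a face of dimension k − 1. The function names are hypothetical, and the exact comparison with zero is a simplifying assumption (real data may need a tolerance).

```python
def support(p, tol=0.0):
    """Indices of the outcomes whose probability exceeds tol."""
    return [i for i, x in enumerate(p) if x > tol]

def face_dimension(p, tol=0.0):
    """Dimension of the smallest face of the simplex that contains p."""
    return len(support(p, tol)) - 1

print(face_dimension([0.2, 0.3, 0.5]))  # 2: interior of the triangle
print(face_dimension([0.5, 0.5, 0.0]))  # 1: on an edge
print(face_dimension([1.0, 0.0, 0.0]))  # 0: at a vertex
```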
Adding one new possible outcome to the random variable increases n by one and introduces a new orthogonal dimension. A new vertex appears in that dimension, and each face of the previous simplex combines with this vertex to form a new facet of one higher dimension. For example, when a triangle (2-simplex) gains a new vertex, connecting it to each of its three edges produces three new triangular facets, forming a tetrahedron. In the next step, adding another vertex would produce a 4-simplex, whose facets are tetrahedra.
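The counting in this construction can be checked with a short sketch. It relies on the standard fact that a k-dimensional face of a simplex is determined by choosing k + 1 of its vertices; the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def count_faces(num_vertices, k):
    """Number of k-dimensional faces of a simplex with num_vertices vertices."""
    return comb(num_vertices, k + 1)

def facets(num_vertices):
    """Facets as vertex-index tuples: each one leaves out exactly one vertex."""
    return list(combinations(range(num_vertices), num_vertices - 1))

print(count_faces(3, 1))  # 3 edges of a triangle
print(count_faces(4, 2))  # 4 triangular facets of a tetrahedron
print(count_faces(5, 3))  # 5 tetrahedral facets of a 4-simplex

# Going from a triangle (vertices 0, 1, 2) to a tetrahedron adds vertex 3.
tetra_facets = facets(4)
new_facets = [f for f in tetra_facets if 3 in f]
print(tetra_facets)  # [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(new_facets)    # the three facets that contain the new vertex
```

The one facet that does not contain the new vertex is the original triangle, so the tetrahedron has four facets in total.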
The probability simplex lies on the affine hyperplane p_1 + p_2 + … + p_n = 1 in ℝ^n. Its normal vector is (1, 1, …, 1), with norm √n. Thus, all points of the simplex lie in the non-negative orthant at the same perpendicular distance, 1/√n, from the origin. This is because the projection of any point on the hyperplane onto the normal vector is constant, which is the defining property of a hyperplane. The Euclidean distance from the origin to individual points on the hyperplane varies, but the length of their perpendicular projection (the component along the normal) remains fixed. Affine independence means that the defining points in hyperspace are located relative to one another, not with respect to the origin as in the case of linear independence. This allows affinely independent objects to “float” relative to the origin, since their defining equation includes a constant term that specifies their offset along the normal direction. Changing this constant translates the entire object parallel to itself, preserving its internal relationships while changing its position in space.
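The difference between the fixed perpendicular projection and the varying Euclidean distance can be sketched numerically. The sampling method below (normalizing positive random draws) is only one convenient way to produce points on the simplex, and all function names are hypothetical.

```python
import math
import random

def random_probability_vector(n, rng):
    """A random point on the simplex, built by normalizing n positive draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def projection_on_normal(p):
    """Component of p along the unit normal (1, ..., 1) / sqrt(n)."""
    return sum(p) / math.sqrt(len(p))

def euclidean_norm(p):
    """Euclidean distance from the origin to p."""
    return math.sqrt(sum(x * x for x in p))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 4
for _ in range(5):
    p = random_probability_vector(n, rng)
    # The first value stays at 1/sqrt(n) up to rounding; the second changes per point.
    print(projection_on_normal(p), euclidean_norm(p))

print(1 / math.sqrt(n))  # the constant perpendicular distance
```

Because the entries of every probability vector sum to 1, the projection reduces to 1/√n regardless of how the probability is distributed, while the Euclidean norm ranges from 1/√n at the centroid up to 1 at a vertex.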
The centroid, corresponding to the uniform distribution, is (1/n, 1/n, …, 1/n). It lies at both a Euclidean and a perpendicular distance of 1/√n from the origin, since the line from the origin to the centroid coincides with the simplex’s normal vector. Each vertex is at an equal Euclidean distance from the centroid.
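These distance statements can be verified with a short sketch. The closed form √((n − 1)/n) for the vertex-to-centroid distance is not stated in the text; it follows from a direct calculation and is included here as a check. The function names are hypothetical.

```python
import math

def centroid(n):
    """Uniform distribution on n outcomes."""
    return [1.0 / n] * n

def vertex(n, i):
    """The i-th standard basis vector: outcome i is certain."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def distance(a, b):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

n = 5
c = centroid(n)
origin = [0.0] * n

# Distance from the origin to the centroid, compared with 1/sqrt(n).
print(distance(c, origin), 1 / math.sqrt(n))

# Every vertex is equally far from the centroid, at sqrt((n - 1) / n).
print([distance(vertex(n, i), c) for i in range(n)])
print(math.sqrt((n - 1) / n))
```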