Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector . Then, we have the following bound on the probability that the empirical measure of the samples falls within the set A:
In words, the probability of drawing an atypical distribution is bounded by a function of the KL divergence from the true distribution to the atypical one; in the case that we consider a set of possible atypical distributions, there is a dominant atypical distribution, given by the information projection.
Furthermore, if A is a closed set, then
Technical statement
Define:
is a finite set with size . Understood as “alphabet”.
is the simplex spanned by the alphabet. It is a subset of .
is a random variable taking values in . Take samples from the distribution , then is the frequency probability vector for the sample.
is the space of values that can take. In other words, it is
Sanov, I. N. (1957) "On the probability of large deviations of random variables". Mat. Sbornik 42(84), No. 1, 11–44.
Санов, И. Н. (1957) "О вероятности больших отклонений случайных величин". МАТЕМАТИЧЕСКИЙ СБОРНИК' 42(84), No. 1, 11–44.
This page is based on this Wikipedia article Text is available under the CC BY-SA 4.0 license; additional terms may apply. Images, videos and audio are available under their respective licenses.