In statistics, the Chapman–Robbins bound or Hammersley–Chapman–Robbins bound is a lower bound on the variance of estimators of a deterministic parameter. It is a generalization of the Cramér–Rao bound; compared to the Cramér–Rao bound, it is both tighter and applicable to a wider range of problems. However, it is usually more difficult to compute.
The bound was independently discovered by John Hammersley in 1950,[1] and by Douglas Chapman and Herbert Robbins in 1951.[2]
Let $\Theta$ be the set of parameters for a family of probability distributions $\{\mu_\theta : \theta \in \Theta\}$ on a sample space $\Omega$.
For any two $\theta, \theta' \in \Theta$, let $\chi^2(\mu_{\theta'}; \mu_\theta)$ denote the $\chi^2$-divergence of $\mu_{\theta'}$ from $\mu_\theta$. Then:
Theorem — Given any scalar random variable $\hat g : \Omega \to \mathbb{R}$ on the model (for example, an estimator of a scalar function of the parameter), and any two $\theta, \theta' \in \Theta$, we have
$$\operatorname{Var}_\theta[\hat g] \;\ge\; \sup_{\theta' \ne \theta} \frac{\left(\operatorname{E}_{\theta'}[\hat g] - \operatorname{E}_\theta[\hat g]\right)^2}{\chi^2(\mu_{\theta'}; \mu_\theta)}.$$
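As a simple illustration (a worked example added here for concreteness, not drawn from the cited sources), take a single observation $X \sim \operatorname{Bernoulli}(\theta)$ and the estimator $\hat g(X) = X$, so that $\operatorname{E}_\theta[\hat g] = \theta$. Then
$$\chi^2(\mu_{\theta'}; \mu_\theta) = \frac{(\theta' - \theta)^2}{\theta} + \frac{(\theta' - \theta)^2}{1 - \theta} = \frac{(\theta' - \theta)^2}{\theta(1 - \theta)},$$
so the expression inside the supremum equals $\theta(1 - \theta)$ for every $\theta' \ne \theta$. The bound reads $\operatorname{Var}_\theta[\hat g] \ge \theta(1 - \theta)$, which is attained with equality and coincides with the Cramér–Rao bound $1/I(\theta) = \theta(1 - \theta)$.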
A generalization to the multivariable case is:[3]
Theorem — Given any multivariate random variable $\hat g : \Omega \to \mathbb{R}^m$ on the model, and any $\theta, \theta' \in \Theta$,
$$\chi^2(\mu_{\theta'}; \mu_\theta) \;\ge\; \left(\operatorname{E}_{\theta'}[\hat g] - \operatorname{E}_\theta[\hat g]\right)^T \operatorname{Cov}_\theta[\hat g]^{-1} \left(\operatorname{E}_{\theta'}[\hat g] - \operatorname{E}_\theta[\hat g]\right).$$
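As a concrete check (a worked example added for illustration, not taken from the cited sources), let $\hat g = (X_1, X_2)$ for two i.i.d. observations $X_i \sim \operatorname{Bernoulli}(\theta)$. Then $\operatorname{E}_{\theta'}[\hat g] - \operatorname{E}_\theta[\hat g] = (\theta' - \theta)(1, 1)^T$ and $\operatorname{Cov}_\theta[\hat g] = \theta(1 - \theta) I_2$, so the right-hand side equals $2c$ with $c = (\theta' - \theta)^2 / (\theta(1 - \theta))$, while the left-hand side is $\chi^2(\mu_{\theta'}^{\otimes 2}; \mu_\theta^{\otimes 2}) = (1 + c)^2 - 1 = 2c + c^2 \ge 2c$, consistent with the theorem.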
Proof: By the variational representation of the chi-squared divergence,[3]
$$\chi^2(P; Q) = \sup_{g} \frac{\left(\operatorname{E}_P[g] - \operatorname{E}_Q[g]\right)^2}{\operatorname{Var}_Q[g]},$$
where the supremum is taken over random variables $g$ with $0 < \operatorname{Var}_Q[g] < \infty$.
Plug in $g = \hat g$, $P = \mu_{\theta'}$, $Q = \mu_\theta$ to obtain
$$\chi^2(\mu_{\theta'}; \mu_\theta) \ge \frac{\left(\operatorname{E}_{\theta'}[\hat g] - \operatorname{E}_\theta[\hat g]\right)^2}{\operatorname{Var}_\theta[\hat g]}.$$
Switch the denominator and the left side and take the supremum over $\theta'$ to obtain the single-variate case. For the multivariate case, define $h = \sum_{i=1}^m v_i \hat g_i$ for any $v \ne 0 \in \mathbb{R}^m$. Then plug in $g = h$ in the variational representation to obtain
$$\chi^2(\mu_{\theta'}; \mu_\theta) \ge \frac{\left(\operatorname{E}_{\theta'}[h] - \operatorname{E}_\theta[h]\right)^2}{\operatorname{Var}_\theta[h]} = \frac{\left\langle v, \operatorname{E}_{\theta'}[\hat g] - \operatorname{E}_\theta[\hat g] \right\rangle^2}{v^T \operatorname{Cov}_\theta[\hat g]\, v}.$$
Taking the supremum over $v \ne 0 \in \mathbb{R}^m$ and using the linear-algebra fact that $\sup_{v \ne 0} \frac{(v^T w)^2}{v^T M v} = w^T M^{-1} w$ for positive-definite $M$, we obtain the multivariate case.
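The linear-algebra fact invoked above is a standard consequence of the Cauchy–Schwarz inequality (an elementary verification, added here for completeness): for positive-definite $M$ and any $v \ne 0$,
$$(v^T w)^2 = \left((M^{1/2} v)^T (M^{-1/2} w)\right)^2 \le (v^T M v)(w^T M^{-1} w),$$
with equality when $v = M^{-1} w$, so the supremum of $(v^T w)^2 / (v^T M v)$ over $v \ne 0$ equals $w^T M^{-1} w$.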
The expression inside the supremum in the Chapman–Robbins bound converges to the Cramér–Rao bound as $\theta' \to \theta$, assuming the regularity conditions of the Cramér–Rao bound hold. This implies that, when both bounds exist, the Chapman–Robbins version is always at least as tight as the Cramér–Rao bound; in many cases, it is substantially tighter.
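Heuristically (a sketch of the limit under standard smoothness assumptions, not a full argument): writing $\theta' = \theta + \delta$, a Taylor expansion gives $\operatorname{E}_{\theta'}[\hat g] - \operatorname{E}_\theta[\hat g] \approx \delta \, \partial_\theta \operatorname{E}_\theta[\hat g]$ and $\chi^2(\mu_{\theta + \delta}; \mu_\theta) \approx \delta^2 I(\theta)$, where $I(\theta)$ is the Fisher information, so
$$\frac{\left(\operatorname{E}_{\theta'}[\hat g] - \operatorname{E}_\theta[\hat g]\right)^2}{\chi^2(\mu_{\theta'}; \mu_\theta)} \;\longrightarrow\; \frac{\left(\partial_\theta \operatorname{E}_\theta[\hat g]\right)^2}{I(\theta)} \quad \text{as } \delta \to 0,$$
which is the Cramér–Rao lower bound on $\operatorname{Var}_\theta[\hat g]$ (equal to $1/I(\theta)$ when $\hat g$ is an unbiased estimator of $\theta$).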
The Chapman–Robbins bound also holds under much weaker regularity conditions. For example, no assumption is made regarding differentiability of the probability density function p(x; θ). When p(x; θ) is non-differentiable, the Fisher information is not defined, and hence the Cramér–Rao bound does not exist.
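In practice, the supremum in the bound can also be evaluated numerically. The following sketch is illustrative only: the Bernoulli family, the estimator $\hat g(x) = x$, and the grid-based supremum are choices made here, not taken from the cited sources.

```python
# Numerical sketch (illustrative assumptions: Bernoulli family, estimator
# g_hat(x) = x, grid-based supremum): approximate the Chapman–Robbins bound
# for a single Bernoulli(theta) observation and compare with Cramér–Rao.
import numpy as np

def chi2_bernoulli(theta_prime, theta):
    """Chi-squared divergence chi^2(Ber(theta_prime); Ber(theta))."""
    d = theta_prime - theta
    return d**2 / theta + d**2 / (1.0 - theta)

def chapman_robbins_bound(theta, grid_size=10_000):
    """Grid approximation of sup over theta' != theta of
    (E_{theta'}[g_hat] - E_theta[g_hat])^2 / chi^2, with E_theta[g_hat] = theta."""
    grid = np.linspace(1e-6, 1.0 - 1e-6, grid_size)
    grid = grid[np.abs(grid - theta) > 1e-9]   # exclude theta' = theta
    ratios = (grid - theta) ** 2 / chi2_bernoulli(grid, theta)
    return ratios.max()

theta = 0.3
print("Chapman–Robbins bound ≈", chapman_robbins_bound(theta))
print("Cramér–Rao bound (theta*(1-theta)) =", theta * (1 - theta))
# For this family the ratio equals theta*(1 - theta) for every theta',
# so both bounds agree (≈ 0.21 for theta = 0.3).
```

The same grid-supremum approach still applies to families whose density is not differentiable in the parameter (for example, a support that depends on the parameter), where the Cramér–Rao comparison is unavailable.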