Invariant estimator

In statistics, the concept of being an invariant estimator is a criterion that can be used to compare the properties of different estimators for the same quantity. It is a way of formalising the idea that an estimator should have certain intuitively appealing qualities. Strictly speaking, "invariant" would mean that the estimates themselves are unchanged when both the measurements and the parameters are transformed in a compatible way, but the meaning has been extended to allow the estimates to change in appropriate ways with such transformations.[1] The term equivariant estimator is used in formal mathematical contexts that include a precise description of the relation of the way the estimator changes in response to changes to the dataset and parameterisation: this corresponds to the use of "equivariance" in more general mathematics.

General setting

Background

In statistical inference, there are several approaches to estimation theory that can be used to decide immediately what estimators should be used according to those approaches. For example, ideas from Bayesian inference would lead directly to Bayesian estimators. Similarly, the theory of classical statistical inference can sometimes lead to strong conclusions about what estimator should be used. However, the usefulness of these theories depends on having a fully prescribed statistical model and may also depend on having a relevant loss function to determine the estimator. Thus a Bayesian analysis might be undertaken, leading to a posterior distribution for relevant parameters, but the use of a specific utility or loss function may be unclear. Ideas of invariance can then be applied to the task of summarising the posterior distribution. In other cases, statistical analyses are undertaken without a fully defined statistical model, or the classical theory of statistical inference cannot be readily applied because the family of models being considered is not amenable to such treatment. In addition to these cases where general theory does not prescribe an estimator, the concept of invariance of an estimator can be applied when seeking estimators of alternative forms, either for the sake of simplicity of application of the estimator or so that the estimator is robust.

The concept of invariance is sometimes used on its own as a way of choosing between estimators, but this is not necessarily definitive. For example, a requirement of invariance may be incompatible with the requirement that the estimator be mean-unbiased; on the other hand, the criterion of median-unbiasedness is defined in terms of the estimator's sampling distribution and so is invariant under many transformations, in particular under monotonic one-to-one transformations.

One use of the concept of invariance is where a class or family of estimators is proposed and a particular formulation must be selected amongst these. One procedure is to impose relevant invariance properties and then to find the formulation within this class that has the best properties, leading to what is called the optimal invariant estimator.

Some classes of invariant estimators

There are several types of transformations that are usefully considered when dealing with invariant estimators, including shifts of location, rescalings of the data, one-to-one reparameterisations $\varphi = h(\theta)$ (under which transformation-invariant estimates should be related by $\hat{\varphi} = h(\hat{\theta})$, a property possessed by maximum likelihood estimators), and permutations of the observations. Each gives rise to a class of estimators which are invariant to those particular types of transformation.

The combination of permutation invariance and location invariance for estimating a location parameter from an independent and identically distributed dataset using a weighted average implies that the weights should be identical and sum to one. Of course, estimators other than a weighted average may be preferable.
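As a concrete illustration, the following sketch (a minimal example with illustrative data and weights; NumPy is assumed) checks numerically that any weighted average whose weights sum to one is shift-equivariant, while only identical weights additionally give permutation invariance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=6)  # illustrative i.i.d. sample

def weighted_avg(x, w):
    """Weighted average with weights w (assumed to sum to one)."""
    return np.dot(w, x)

equal_w = np.full(len(x), 1.0 / len(x))               # identical weights summing to one
unequal_w = np.array([0.5, 0.1, 0.1, 0.1, 0.1, 0.1])  # still sum to one

c = 3.7                           # an arbitrary shift
perm = rng.permutation(len(x))    # an arbitrary reordering

# Location equivariance: shifting all data shifts the estimate by the same amount.
# Any weights summing to one satisfy this.
print(np.isclose(weighted_avg(x + c, equal_w), weighted_avg(x, equal_w) + c))      # True
print(np.isclose(weighted_avg(x + c, unequal_w), weighted_avg(x, unequal_w) + c))  # True

# Permutation invariance: reordering the data must not change the estimate.
# Only identical weights satisfy this in general.
print(np.isclose(weighted_avg(x[perm], equal_w), weighted_avg(x, equal_w)))      # True
print(np.isclose(weighted_avg(x[perm], unequal_w), weighted_avg(x, unequal_w)))  # generally False
```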

Optimal invariant estimators

Under this setting, we are given a set of measurements $x$ which contains information about an unknown parameter $\theta$. The measurements $x$ are modelled as a vector random variable having a probability density function $f(x \mid \theta)$ which depends on a parameter vector $\theta$.

The problem is to estimate $\theta$ given $x$. The estimate, denoted by $a$, is a function of the measurements and belongs to a set $A$. The quality of the result is defined by a loss function $L = L(a, \theta)$ which determines a risk function $R = R(a, \theta) = E[L(a, \theta) \mid \theta]$. The sets of possible values of $x$, $\theta$, and $a$ are denoted by $X$, $\Theta$, and $A$, respectively.
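The risk function is usually not available in closed form, but it can be approximated by simulation. The sketch below is a minimal illustration; the model $x \sim N(\theta, 1)^n$, the squared-error loss, and the function name risk_mc are assumptions made for the example:

```python
import numpy as np

def risk_mc(estimator, theta, n=5, loss=lambda a, t: (a - t) ** 2,
            n_rep=100_000, seed=0):
    """Monte Carlo approximation of the risk R = E[L(a, theta) | theta]
    for data x ~ N(theta, 1)^n (an illustrative choice of model)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=theta, scale=1.0, size=(n_rep, n))
    a = estimator(x)                # one estimate per simulated dataset
    return loss(a, theta).mean()

sample_mean = lambda x: x.mean(axis=1)
# Under squared-error loss the risk of the sample mean is Var(mean) = 1/n,
# whatever the true theta is:
print(risk_mc(sample_mean, theta=0.0))    # ~0.2 for n = 5
print(risk_mc(sample_mean, theta=10.0))   # ~0.2 again: the risk is constant
```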

In classification

In statistical classification, the rule which assigns a class to a new data-item can be considered to be a special type of estimator. A number of invariance-type considerations can be brought to bear in formulating prior knowledge for pattern recognition.

Mathematical setting

Definition

An invariant estimator is an estimator which obeys the following two rules:

  1. Principle of Rational Invariance: the action taken in a decision problem should not depend on the transformation applied to the measurements used.
  2. Invariance Principle: if two decision problems have the same formal structure (in terms of $X$, $\Theta$, $f(x \mid \theta)$ and $L$), then the same decision rule should be used in each problem.

To define an invariant or equivariant estimator formally, some definitions related to groups of transformations are needed first. Let $X$ denote the set of possible data-samples. A group of transformations of $X$, to be denoted by $G$, is a set of (measurable) 1:1 and onto transformations of $X$ into itself, which satisfies the following conditions:

  1. If $g_1 \in G$ and $g_2 \in G$ then $g_1 g_2 \in G$.
  2. If $g \in G$ then $g^{-1} \in G$, where $g^{-1}(g(x)) = x$. (That is, each transformation has an inverse within the group.)
  3. $e \in G$ (i.e. there is an identity transformation $e(x) = x$).

Datasets $x_1$ and $x_2$ in $X$ are equivalent if $x_1 = g(x_2)$ for some $g \in G$. All the equivalent points form an equivalence class. Such an equivalence class is called an orbit (in $X$). The $x_0$ orbit, $X(x_0)$, is the set $X(x_0) = \{g(x_0) : g \in G\}$. If $X$ consists of a single orbit then $G$ is said to be transitive.
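For example (an illustrative case), if $G$ is the group of common shifts $g_c(x) = (x_1 + c, \dots, x_n + c)$ acting on $X = \mathbb{R}^n$, the orbit of a point $x_0$ is the line $\{x_0 + c\mathbf{1} : c \in \mathbb{R}\}$; for $n = 1$ the whole of $X$ is a single orbit, so the group is transitive.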

A family of densities $F$ is said to be invariant under the group $G$ if, for every $g \in G$ and $\theta \in \Theta$, there exists a unique $\theta^* \in \Theta$ such that $Y = g(x)$ has density $f(y \mid \theta^*)$. $\theta^*$ will be denoted $\bar{g}(\theta)$.

If $F$ is invariant under the group $G$ then the loss function $L(\theta, a)$ is said to be invariant under $G$ if for every $g \in G$ and $a \in A$ there exists an $a^* \in A$ such that $L(\theta, a) = L(\bar{g}(\theta), a^*)$ for all $\theta \in \Theta$. The transformed value $a^*$ will be denoted by $\tilde{g}(a)$.

In the above, $\bar{G} = \{\bar{g} : g \in G\}$ is a group of transformations from $\Theta$ to itself and $\tilde{G} = \{\tilde{g} : g \in G\}$ is a group of transformations from $A$ to itself.

An estimation problem is invariant (equivariant) under $G$ if there exist three groups $G$, $\bar{G}$, $\tilde{G}$ as defined above.

For an estimation problem that is invariant under $G$, an estimator $\delta(x)$ is an invariant estimator under $G$ if, for all $x \in X$ and $g \in G$,

$$\delta(g(x)) = \tilde{g}(\delta(x)).$$
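As a concrete check of this definition, the following sketch (the shift group and the sample median are illustrative choices, not part of the formal development above) verifies that the median satisfies $\delta(g_c(x)) = \tilde{g}_c(\delta(x))$ for the additive group $g_c(x) = x + c$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=7)

def g(x, c):        # group action on the data: g_c(x) = x + c
    return x + c

def g_tilde(a, c):  # induced action on the estimate: g~_c(a) = a + c
    return a + c

delta = np.median   # the sample median is equivariant under shifts

for c in (-2.0, 0.5, 10.0):
    # delta(g(x)) == g~(delta(x)) for every group element g_c tested
    assert np.isclose(delta(g(x, c)), g_tilde(delta(x), c))
print("median is shift-equivariant")
```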

Properties

  1. The risk function of an invariant estimator, $\delta$, is constant on orbits of $\Theta$. Equivalently $R(\theta, \delta) = R(\bar{g}(\theta), \delta)$ for all $\theta \in \Theta$ and $\bar{g} \in \bar{G}$.
  2. The risk function of an invariant estimator with transitive $\bar{G}$ is constant.

For a given problem, the invariant estimator with the lowest risk is termed the "best invariant estimator". A best invariant estimator cannot always be achieved. A special case for which it can be achieved is the case when $\bar{G}$ is transitive.

Example: Location parameter

Suppose $\theta$ is a location parameter, meaning that the density of $X$ has the form $f(x - \theta)$. For $\Theta = A = \mathbb{R}^1$ and $L = L(a - \theta)$, the problem is invariant under $g = \bar{g} = \tilde{g} = \{g_c : g_c(x) = x + c,\ c \in \mathbb{R}\}$. The invariant estimator in this case must satisfy

$$\delta(x + c) = \delta(x) + c, \quad \text{for all } c \in \mathbb{R},$$

thus it is of the form $\delta(x) = x + K$ ($K \in \mathbb{R}$). $\bar{G}$ is transitive on $\Theta$, so the risk does not vary with $\theta$: that is, $R(\theta, \delta) = R(0, \delta) = E[L(X + K) \mid \theta = 0]$. The best invariant estimator is the one that brings the risk $R(\theta, \delta)$ to a minimum.

In the case that $L$ is the squared error, $\delta(x) = x - E[X \mid \theta = 0]$.
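For instance (an illustrative case), with a single observation $X \sim N(\theta, 1)$ one has $E[X \mid \theta = 0] = 0$, so the best invariant estimator under squared error is simply $\delta(x) = x$; for the shifted exponential density $f(x - \theta) = e^{-(x - \theta)}$ on $x \geq \theta$, $E[X \mid \theta = 0] = 1$ and the best invariant estimator is $\delta(x) = x - 1$.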

Pitman estimator

The estimation problem is that $X = (X_1, \dots, X_n)$ has density $f(x_1 - \theta, \dots, x_n - \theta)$, where $\theta$ is a parameter to be estimated, and where the loss function is $L(a - \theta)$. This problem is invariant with the following (additive) transformation groups:

$$G = \{g_c : g_c(x) = (x_1 + c, \dots, x_n + c),\ c \in \mathbb{R}^1\}$$
$$\bar{G} = \{g_c : g_c(\theta) = \theta + c,\ c \in \mathbb{R}^1\}$$
$$\tilde{G} = \{g_c : g_c(a) = a + c,\ c \in \mathbb{R}^1\}$$

The best invariant estimator $\delta(x)$ is the one that minimizes

$$\frac{\int_{-\infty}^{\infty} L(\delta(x) - \theta)\, f(x_1 - \theta, \dots, x_n - \theta)\, d\theta}{\int_{-\infty}^{\infty} f(x_1 - \theta, \dots, x_n - \theta)\, d\theta},$$

and this is Pitman's estimator (1939).[2]

For the squared error loss case, the result is

$$\delta(x) = \frac{\int_{-\infty}^{\infty} \theta\, f(x_1 - \theta, \dots, x_n - \theta)\, d\theta}{\int_{-\infty}^{\infty} f(x_1 - \theta, \dots, x_n - \theta)\, d\theta}.$$

If $x \sim N(\theta \mathbf{1}_n, I)$ (i.e. a multivariate normal distribution with independent, unit-variance components) then

$$\delta_{\text{Pitman}} = \delta_{\text{ML}} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

If $x \sim C(\theta \mathbf{1}_n, I\sigma)$ (independent components having a Cauchy distribution with scale parameter $\sigma$) then $\delta_{\text{Pitman}} \neq \delta_{\text{ML}}$. However, the result is

$$\delta_{\text{Pitman}} = \sum_{k=1}^{n} x_k \left[ \frac{\operatorname{Re}\{w_k\}}{\sum_{m=1}^{n} \operatorname{Re}\{w_m\}} \right], \quad n > 1,$$

with

$$w_k = \prod_{j \neq k} \frac{1}{(x_k - x_j)^2 + 4\sigma^2} \left[ 1 - \frac{2\sigma}{x_k - x_j}\, i \right].$$
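The ratio-of-integrals form lends itself to direct numerical evaluation. Below is a minimal sketch (the function name pitman_estimate and the grid-based quadrature are illustrative choices, not a prescribed algorithm) that computes the squared-error Pitman estimator for an arbitrary standardised density:

```python
import numpy as np

def pitman_estimate(x, f, half_width=50.0, n_grid=200_001):
    """Squared-error Pitman estimator: the ratio of
    integral(theta * prod_i f(x_i - theta) dtheta) to
    integral(prod_i f(x_i - theta) dtheta), evaluated on a grid."""
    x = np.asarray(x)
    center = np.median(x)
    theta = np.linspace(center - half_width, center + half_width, n_grid)
    # log-likelihood of each grid value of theta under the location family
    log_lik = np.sum(np.log(f(x[:, None] - theta[None, :])), axis=0)
    w = np.exp(log_lik - log_lik.max())   # rescale to avoid underflow
    return np.trapz(theta * w, theta) / np.trapz(w, theta)

normal = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
cauchy = lambda u: 1.0 / (np.pi * (1 + u**2))

x = np.array([-1.2, 0.4, 0.9, 3.1, 0.2])
print(pitman_estimate(x, normal))   # matches the sample mean (~0.68)
print(pitman_estimate(x, cauchy))   # differs from the sample mean
print(x.mean())
```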

References

  1. See section 5.2.1 in Gouriéroux, C. and Monfort, A. (1995). Statistics and Econometric Models, Volume 1. Cambridge University Press.
  2. Gouriéroux and Monfort (1995)