Variance-stabilizing transformation

In applied statistics, a variance-stabilizing transformation is a data transformation that is specifically chosen either to simplify considerations in graphical exploratory data analysis or to allow the application of simple regression-based or analysis of variance techniques. [1]

Overview

The aim behind the choice of a variance-stabilizing transformation is to find a simple function ƒ to apply to values x in a data set to create new values y = ƒ(x) such that the variability of the values y is not related to their mean value. For example, suppose that the values x are realizations from different Poisson distributions: i.e. the distributions each have different mean values μ. Then, because for the Poisson distribution the variance is identical to the mean, the variance varies with the mean. However, if the simple variance-stabilizing transformation

y = √x

is applied, the sampling variance associated with each observation will be nearly constant: see Anscombe transform for details and some alternative transformations.
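
As a minimal sketch (assuming NumPy is available), simulating Poisson samples with several different means shows the effect: the variance of x grows with μ, while the variance of √x is nearly constant, approaching 1/4 as μ grows.

    import numpy as np

    rng = np.random.default_rng(0)
    for mu in [2, 10, 50, 200]:
        x = rng.poisson(mu, size=100_000)
        # Var(x) is roughly mu, while Var(sqrt(x)) settles near 0.25.
        print(mu, round(x.var(), 2), round(np.sqrt(x).var(), 3))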

While variance-stabilizing transformations are well known for certain parametric families of distributions, such as the Poisson and the binomial distribution, some types of data analysis proceed more empirically: for example, by searching among power transformations to find a suitable fixed transformation. Alternatively, if data analysis suggests a functional form for the relation between variance and mean, this can be used to deduce a variance-stabilizing transformation. [2] Thus if, for a mean μ, the variance is given by

var(X) = h(μ),

a suitable basis for a variance-stabilizing transformation would be

y ∝ ∫^x du / √h(u),

where the arbitrary constant of integration and an arbitrary scaling factor can be chosen for convenience.
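
When only the variance function h is known, for example from an empirical fit, the integral above can be evaluated numerically. A rough sketch, assuming SciPy and a hypothetical variance function h(μ) = μ:

    import numpy as np
    from scipy.integrate import quad

    def h(mu):
        # Hypothetical mean-variance relation; here Var = mu, as for the Poisson.
        return mu

    def vst(x, a=1e-9):
        # y(x) = integral from a to x of du / sqrt(h(u));
        # the lower limit a plays the role of the arbitrary constant of integration.
        value, _ = quad(lambda u: 1.0 / np.sqrt(h(u)), a, x)
        return value

    print(vst(4.0))   # about 2*sqrt(4) = 4, i.e. the square-root transformation up to scale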

Example: relative variance

If X is a positive random variable and, for some constant s, the variance is given as h(μ) = s²μ², then the standard deviation is proportional to the mean, which is called fixed relative error. In this case, the variance-stabilizing transformation is

y ∝ ∫^x du / √h(u) = ∫^x du / (s·u) = ln(x) / s ∝ log(x).

That is, the variance-stabilizing transformation is the logarithmic transformation.
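
A short simulation sketch (assuming NumPy), with the standard deviation set proportional to the mean, illustrates this: the variance of log(x) stays near s² across widely different means.

    import numpy as np

    rng = np.random.default_rng(0)
    s = 0.1
    for mu in [1.0, 10.0, 100.0]:
        x = rng.normal(mu, s * mu, size=100_000)   # sd proportional to the mean
        # Var(x) grows like (s*mu)^2, while Var(log x) stays near s^2 = 0.01.
        print(mu, round(x.var(), 3), round(np.log(x).var(), 4))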

Example: absolute plus relative variance

If the variance is given as h(μ) = σ² + s²μ², then the variance is dominated by the fixed variance σ² when |μ| is small enough and is dominated by the relative variance s²μ² when |μ| is large enough. In this case, the variance-stabilizing transformation is

y ∝ ∫^x du / √h(u) = ∫^x du / √(σ² + s²u²) = (1/s)·asinh(s·x / σ) = (1/s)·asinh(x / λ).

That is, the variance-stabilizing transformation is the inverse hyperbolic sine of the scaled value x / λ for λ = σ / s.
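
A corresponding sketch (assuming NumPy), with variance σ² + s²μ², shows that asinh(x/λ) has approximately constant variance s², whichever term dominates:

    import numpy as np

    rng = np.random.default_rng(0)
    sigma, s = 2.0, 0.1
    lam = sigma / s
    for mu in [0.0, 5.0, 50.0, 500.0]:
        x = rng.normal(mu, np.sqrt(sigma**2 + (s * mu)**2), size=100_000)
        y = np.arcsinh(x / lam)
        # Var(y) stays near s^2 = 0.01 whether the fixed or the relative term dominates.
        print(mu, round(x.var(), 2), round(y.var(), 4))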


Example: Pearson correlation

The Fisher transformation, z = artanh(r), is a variance-stabilizing transformation for the Pearson correlation coefficient.
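
The sampling variance of r from a bivariate normal sample depends strongly on the true correlation ρ (roughly (1 − ρ²)²/n), whereas the variance of z = artanh(r) is approximately 1/(n − 3) regardless of ρ. A small Monte Carlo sketch, assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    for rho in [0.0, 0.5, 0.9]:
        cov = [[1.0, rho], [rho, 1.0]]
        r = np.array([
            np.corrcoef(rng.multivariate_normal([0.0, 0.0], cov, size=n).T)[0, 1]
            for _ in range(10_000)
        ])
        # Var(r) shrinks as |rho| grows; Var(artanh(r)) stays near 1/(n - 3).
        print(rho, round(r.var(), 4), round(np.arctanh(r).var(), 4), round(1 / (n - 3), 4))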

Relationship to the delta method

Here, the delta method is presented in a rough way, but it is enough to see the relation with variance-stabilizing transformations. For a more formal approach, see delta method.

Let X be a random variable, with E[X] = μ and Var[X] = σ². Define Y = g(X), where g is a regular function. A first-order Taylor approximation for Y = g(X) is:

Y = g(X) ≈ g(μ) + g′(μ)(X − μ)

From the equation above, we obtain:

E[Y] ≈ g(μ)

and

Var[Y] ≈ σ²·g′(μ)²

This approximation method is called the delta method.
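
A quick numerical check of these approximations (assuming NumPy), taking g = log so that g′(μ) = 1/μ:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 20.0, 2.0
    x = rng.normal(mu, sigma, size=1_000_000)
    y = np.log(x)

    print(y.mean(), np.log(mu))        # E[Y] is close to g(mu)
    print(y.var(), (sigma / mu)**2)    # Var[Y] is close to sigma^2 * g'(mu)^2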

Consider now a random variable X such that E[X] = μ and Var[X] = h(μ). Notice the relation between the variance and the mean, which implies, for example, heteroscedasticity in a linear model. Therefore, the goal is to find a function g such that Y = g(X) has a variance independent (at least approximately) of its expectation.

Imposing the condition Var[Y] ≈ h(μ)·g′(μ)² = constant, this equality implies the differential equation:

g′(μ) = C / √h(μ)

This ordinary differential equation has, by separation of variables, the following solution:

g(μ) = C ∫ dμ / √h(μ)
This last expression appeared for the first time in a 1947 paper by M. S. Bartlett. [3]
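
As a check, solving this differential equation symbolically (a sketch assuming SymPy) with the Poisson-like variance function h(μ) = μ recovers the square-root transformation, up to the arbitrary constant and scale:

    import sympy as sp

    mu, C = sp.symbols('mu C', positive=True)
    g = sp.Function('g')
    ode = sp.Eq(g(mu).diff(mu), C / sp.sqrt(mu))   # g'(mu) = C / sqrt(h(mu)) with h(mu) = mu
    print(sp.dsolve(ode, g(mu)))                   # g(mu) = C1 + 2*C*sqrt(mu)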


References

  1. Everitt, B. S. (2002). The Cambridge Dictionary of Statistics (2nd ed.). CUP. ISBN 0-521-81099-X.
  2. Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9.
  3. Bartlett, M. S. (1947). "The Use of Transformations". Biometrics. 3: 39–52. doi:10.2307/3001536.