Fisher transformation

A graph of the transformation (in orange). The untransformed sample correlation coefficient is plotted on the horizontal axis, and the transformed coefficient is plotted on the vertical axis. The identity function (gray) is also shown for comparison.

In statistics, the Fisher transformation (or Fisher z-transformation) of a Pearson correlation coefficient is its inverse hyperbolic tangent (artanh). When the sample correlation coefficient r is near 1 or −1, its distribution is highly skewed, which makes it difficult to estimate confidence intervals and apply tests of significance for the population correlation coefficient ρ. [1] [2] [3] The Fisher transformation solves this problem by yielding a variable that is approximately normally distributed, with a variance that is stable over different values of r.


Definition

Given a set of N bivariate sample pairs (Xi, Yi), i = 1, ..., N, the sample correlation coefficient r is given by

r = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}.

Here cov(X, Y) stands for the covariance between the variables X and Y, and σ stands for the standard deviation of the respective variable. Fisher's z-transformation of r is defined as

z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) = \operatorname{artanh}(r),

where "ln" is the natural logarithm function and "artanh" is the inverse hyperbolic tangent function.

If (X, Y) has a bivariate normal distribution with correlation ρ and the pairs (Xi, Yi) are independent and identically distributed, then z is approximately normally distributed with mean

\operatorname{artanh}(\rho) = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)

and standard deviation

\frac{1}{\sqrt{N-3}},

where N is the sample size, and ρ is the true correlation coefficient.

This transformation, and its inverse

r = \frac{\exp(2z)-1}{\exp(2z)+1} = \tanh(z),

can be used to construct a large-sample confidence interval for r using standard normal theory and derivations. See also application to partial correlation.
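
For example, a large-sample confidence interval for ρ can be formed on the z scale and mapped back with tanh. The sketch below assumes bivariate normal data summarized by r and N; the helper name fisher_ci is hypothetical:

```python
import numpy as np
from scipy import stats

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for rho, using z = artanh(r) with
    standard deviation 1/sqrt(n - 3)."""
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3)
    zcrit = stats.norm.ppf(0.5 + level / 2)
    lo, hi = z - zcrit * se, z + zcrit * se
    return np.tanh(lo), np.tanh(hi)   # back-transform to the correlation scale

print(fisher_ci(r=0.5, n=30))  # roughly (0.17, 0.73)
```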

Derivation

Fisher transformation with ρ = 0.9 and N = 30. Illustrated is the exact probability density function of r (in black), together with the probability density functions of the usual Fisher transformation (blue) and that obtained by including extra terms that depend on N (red). The latter approximation is visually indistinguishable from the exact answer (its maximum error is 0.3%, compared to 3.4% for basic Fisher).

Hotelling gives a concise derivation of the Fisher transformation. [4]

To derive the Fisher transformation, one starts by considering an arbitrary increasing, twice-differentiable function of r, say G(r). Finding the first term in the large-N expansion of the corresponding skewness κ₃ results [5] in

\kappa_3 = \frac{-6\rho + 3(1-\rho^2)\,G''(\rho)/G'(\rho)}{\sqrt{N}} + O\!\left(N^{-3/2}\right).

Setting κ₃ = 0 and solving the corresponding differential equation

\frac{G''(\rho)}{G'(\rho)} = \frac{2\rho}{1-\rho^2}

for G yields the inverse hyperbolic tangent function, G(ρ) = artanh(ρ).
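
The last step can be checked symbolically. The following SymPy sketch integrates the differential equation above once (so G'(ρ) ∝ 1/(1 − ρ²)) and verifies that this is the derivative of artanh:

```python
import sympy as sp

rho = sp.symbols('rho', real=True)

# Setting the leading skewness term to zero gives G''(rho)/G'(rho) = 2*rho/(1 - rho**2).
# Integrating once: ln G'(rho) = -ln(1 - rho**2) + const, i.e. G'(rho) is proportional
# to 1/(1 - rho**2), which is exactly the derivative of artanh(rho).
Gprime = 1 / (1 - rho**2)
check = sp.simplify(sp.diff(sp.atanh(rho), rho) - Gprime)
print(check)  # 0, so G(rho) = artanh(rho) up to an affine transformation
```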

Similarly expanding the mean m and variance v of artanh(r), one gets

m = \operatorname{artanh}(\rho) + \frac{\rho}{2(N-1)} + O\!\left(N^{-2}\right)

and

v = \frac{1}{N-1} + \frac{4-\rho^2}{2(N-1)^2} + O\!\left(N^{-3}\right)

respectively.

The extra terms are not part of the usual Fisher transformation. For large values of ρ and small values of N they represent a large improvement of accuracy at minimal cost, although they greatly complicate the computation of the inverse – a closed-form expression is not available. The near-constant variance of the transformation is the result of removing its skewness – the actual improvement is achieved by the latter, not by the extra terms. Including the extra terms, i.e., computing (z − m)/v^{1/2}, yields

\frac{\operatorname{artanh}(r) - \operatorname{artanh}(\rho) - \dfrac{\rho}{2(N-1)}}{\sqrt{\dfrac{1}{N-1} + \dfrac{4-\rho^2}{2(N-1)^2}}},

which has, to an excellent approximation, a standard normal distribution. [6]
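
As a rough numerical check, the sketch below compares the basic pivot √(N − 3)(artanh(r) − artanh(ρ)) with the refined standardization (z − m)/√v, using the leading-order expressions for m and v quoted above (so it is only as accurate as that expansion):

```python
import numpy as np
from scipy import stats

# Monte Carlo comparison of the basic and refined (extra-terms) Fisher pivots
rng = np.random.default_rng(1)
rho, N, reps = 0.9, 30, 20_000
cov = [[1.0, rho], [rho, 1.0]]

xy = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, N))
r = np.array([np.corrcoef(s[:, 0], s[:, 1])[0, 1] for s in xy])
z = np.arctanh(r)

basic = (z - np.arctanh(rho)) * np.sqrt(N - 3)
m = np.arctanh(rho) + rho / (2 * (N - 1))                     # leading-order mean
v = 1 / (N - 1) + (4 - rho**2) / (2 * (N - 1) ** 2)           # leading-order variance
refined = (z - m) / np.sqrt(v)

# Both should be close to N(0, 1); the refined pivot typically fits slightly better.
print(stats.kstest(basic, 'norm'))
print(stats.kstest(refined, 'norm'))
```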

Calculator for the confidence belt of r-squared values (or coefficient of determination/explanation or goodness of fit).

Application

The application of Fisher's transformation can be enhanced using a software calculator as shown in the figure. Assuming that the observed r-squared value is 0.80, that the sample contains 30 data points, and accepting a 90% confidence interval, the r-squared value in another random sample from the same population may range from 0.588 to 0.921. When an observed r-squared value falls outside this range, the population is considered to be different.
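
One way such a belt can be approximated, though not necessarily the method used by the cited calculator, is to treat the difference of the Fisher z values of two independent samples of size N as approximately normal with variance 2/(N − 3). A hedged sketch (the helper rsq_belt is hypothetical):

```python
import numpy as np
from scipy import stats

def rsq_belt(r_squared, n, level=0.90):
    """Hypothetical helper: approximate range of r-squared in another sample of
    size n, assuming artanh(r1) - artanh(r2) ~ N(0, 2/(n - 3)) for independent samples."""
    z = np.arctanh(np.sqrt(r_squared))
    se = np.sqrt(2.0 / (n - 3))
    zcrit = stats.norm.ppf(0.5 + level / 2)
    lo, hi = np.tanh(z - zcrit * se), np.tanh(z + zcrit * se)
    return lo**2, hi**2

print(rsq_belt(0.80, 30))  # roughly (0.58, 0.91) under these assumptions
```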

Discussion

The Fisher transformation is an approximate variance-stabilizing transformation for r when X and Y follow a bivariate normal distribution. This means that the variance of z is approximately constant for all values of the population correlation coefficient ρ. Without the Fisher transformation, the variance of r grows smaller as |ρ| gets closer to 1. Since the Fisher transformation is approximately the identity function when |r| < 1/2, it is sometimes useful to remember that the variance of r is well approximated by 1/N as long as |ρ| is not too large and N is not too small. This is related to the fact that the asymptotic variance of r is 1 for bivariate normal data.
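
This variance-stabilizing behaviour is easy to verify by simulation; the sketch below estimates the sampling variance of r and of z = artanh(r) for a few values of ρ:

```python
import numpy as np

# Simulation: var(r) shrinks as |rho| approaches 1, while var(artanh(r)) stays
# close to 1/(N - 3) across the whole range of rho.
rng = np.random.default_rng(2)
N, reps = 50, 10_000
for rho in (0.0, 0.5, 0.9):
    cov = [[1.0, rho], [rho, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, N))
    r = np.array([np.corrcoef(s[:, 0], s[:, 1])[0, 1] for s in xy])
    print(f"rho={rho}: var(r)={r.var():.4f}  var(z)={np.arctanh(r).var():.4f}  "
          f"1/(N-3)={1/(N - 3):.4f}")
```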

The behavior of this transform has been extensively studied since Fisher introduced it in 1915. Fisher himself found the exact distribution of z for data from a bivariate normal distribution in 1921; Gayen in 1951 [8] determined the exact distribution of z for data from a bivariate Type A Edgeworth distribution. Hotelling in 1953 calculated the Taylor series expressions for the moments of z and several related statistics [9] and Hawkins in 1989 discovered the asymptotic distribution of z for data from a distribution with bounded fourth moments. [10]

An alternative to the Fisher transformation is to use the exact confidence distribution density for ρ given by [11] [12]

\pi(\rho \mid r) = \frac{\nu(\nu-1)\,\Gamma(\nu-1)}{\sqrt{2\pi}\,\Gamma\!\left(\nu+\frac{1}{2}\right)} \left(1-r^2\right)^{\frac{\nu-1}{2}} \left(1-\rho^2\right)^{\frac{\nu-2}{2}} \left(1-r\rho\right)^{\frac{1-2\nu}{2}} F\!\left(\frac{3}{2},-\frac{1}{2};\nu+\frac{1}{2};\frac{1+r\rho}{2}\right),

where F is the Gaussian hypergeometric function and ν = N − 1 > 1.
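
Assuming the density quoted above, it can be evaluated numerically with SciPy's gamma and Gaussian hypergeometric functions; the sketch below also checks that it integrates to approximately one:

```python
import numpy as np
from scipy.special import gamma, hyp2f1
from scipy.integrate import quad

def confidence_density(rho, r, n):
    """Exact confidence density for the correlation, as quoted above, with nu = n - 1."""
    nu = n - 1
    c = nu * (nu - 1) * gamma(nu - 1) / (np.sqrt(2 * np.pi) * gamma(nu + 0.5))
    return (c * (1 - r**2) ** ((nu - 1) / 2) * (1 - rho**2) ** ((nu - 2) / 2)
            * (1 - r * rho) ** ((1 - 2 * nu) / 2)
            * hyp2f1(1.5, -0.5, nu + 0.5, (1 + r * rho) / 2))

total, _ = quad(confidence_density, -1, 1, args=(0.5, 30))
print(total)  # should be close to 1 if the density is properly normalized
```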

Other uses

While the Fisher transformation is mainly associated with the Pearson product-moment correlation coefficient for bivariate normal observations, it can also be applied to Spearman's rank correlation coefficient in more general cases. [13] A similar result for the asymptotic distribution applies, but with a minor adjustment factor: see the cited article for details.

See also


References

  1. Fisher, R. A. (1915). "Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population". Biometrika. 10 (4): 507–521. doi:10.2307/2331838. hdl:2440/15166. JSTOR 2331838.
  2. Fisher, R. A. (1921). "On the 'probable error' of a coefficient of correlation deduced from a small sample" (PDF). Metron. 1: 3–32.
  3. Rick Wicklin. "Fisher's transformation of the correlation coefficient". September 20, 2017. https://blogs.sas.com/content/iml/2017/09/20/fishers-transformation-correlation.html. Accessed February 15, 2022.
  4. Hotelling, Harold (1953). "New Light on the Correlation Coefficient and its Transforms". Journal of the Royal Statistical Society, Series B (Methodological). 15 (2): 193–225. doi:10.1111/j.2517-6161.1953.tb00135.x. ISSN 0035-9246.
  5. Winterbottom, Alan (1979). "A Note on the Derivation of Fisher's Transformation of the Correlation Coefficient". The American Statistician. 33 (3): 142–143. doi:10.2307/2683819. ISSN 0003-1305. JSTOR 2683819.
  6. Vrbik, Jan (December 2005). "Population moments of sampling distributions". Computational Statistics. 20 (4): 611–621. doi:10.1007/BF02741318. S2CID 120592303.
  7. r-squared calculator
  8. Gayen, A. K. (1951). "The Frequency Distribution of the Product-Moment Correlation Coefficient in Random Samples of Any Size Drawn from Non-Normal Universes". Biometrika. 38 (1/2): 219–247. doi:10.1093/biomet/38.1-2.219. JSTOR 2332329.
  9. Hotelling, H. (1953). "New light on the correlation coefficient and its transforms". Journal of the Royal Statistical Society, Series B. 15 (2): 193–225. JSTOR 2983768.
  10. Hawkins, D. L. (1989). "Using U statistics to derive the asymptotic distribution of Fisher's Z statistic". The American Statistician. 43 (4): 235–237. doi:10.2307/2685369. JSTOR 2685369.
  11. Taraldsen, Gunnar (2021). "The Confidence Density for Correlation". Sankhya A. doi:10.1007/s13171-021-00267-y. ISSN 0976-8378. S2CID 244594067.
  12. Taraldsen, Gunnar (2020). "Confidence in Correlation". doi:10.13140/RG.2.2.23673.49769.
  13. Zar, Jerrold H. (2005). "Spearman Rank Correlation: Overview". Encyclopedia of Biostatistics. doi:10.1002/9781118445112.stat05964. ISBN 9781118445112.