Power transform

In statistics, a power transform is a family of functions applied to create a monotonic transformation of data using power functions. It is a data transformation technique used to stabilize variance, make the data more closely resemble a normal distribution, improve the validity of measures of association (such as the Pearson correlation between variables), and for other data stabilization procedures.

Power transforms are used in multiple fields, including multi-resolution and wavelet analysis, [1] statistical data analysis, medical research, modeling of physical processes, [2] geochemical data analysis, [3] epidemiology [4] and many other clinical, environmental and social research areas.

Definition

The power transformation is defined as a continuous function of the power parameter λ, typically given in piece-wise form so that it is continuous at the point of singularity (λ = 0). For data vectors (y_1, ..., y_n) in which each y_i > 0, the power transform is

$$ y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda}-1}{\lambda\,\operatorname{GM}(y)^{\lambda-1}}, & \text{if } \lambda \neq 0, \\[2ex] \operatorname{GM}(y)\ln y_i, & \text{if } \lambda = 0, \end{cases} $$

where

$$ \operatorname{GM}(y) = \left(\prod_{i=1}^{n} y_i\right)^{1/n} = \sqrt[n]{y_1 y_2 \cdots y_n} $$

is the geometric mean of the observations y_1, ..., y_n. The case λ = 0 is the limit as λ approaches 0. To see this, note that

$$ y_i^{\lambda} = \exp(\lambda \ln y_i) = 1 + \lambda \ln y_i + \frac{\lambda^2 (\ln y_i)^2}{2!} + \cdots $$

using the Taylor series of the exponential. Then

$$ \frac{y_i^{\lambda}-1}{\lambda} = \ln y_i + \frac{\lambda (\ln y_i)^2}{2!} + \cdots, $$

and every term but \(\ln y_i\) becomes negligible for λ sufficiently small.

The inclusion of the (λ − 1)th power of the geometric mean in the denominator simplifies the scientific interpretation of any equation involving \(y_i^{(\lambda)}\), because the units of measurement do not change as λ changes.
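The definition above translates directly into code. The following is a minimal sketch (the function name is illustrative, not from the literature) of the geometric-mean-rescaled transform, including the λ = 0 limiting case:

```python
import numpy as np

def rescaled_power_transform(y, lam):
    """Power transform rescaled by the geometric mean, as defined above.

    Assumes every element of y is strictly positive.
    """
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))      # geometric mean of the sample
    if lam == 0:
        return gm * np.log(y)            # limiting case as lambda -> 0
    return (y**lam - 1.0) / (lam * gm**(lam - 1.0))
```

Because of the rescaling, the output keeps the units of y for every λ; evaluating at λ very close to 0 reproduces the λ = 0 branch, illustrating the continuity argument above.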

Box and Cox (1964) introduced the geometric mean into this transformation by first including the Jacobian of the rescaled power transformation

$$ \frac{y^{\lambda}-1}{\lambda} $$

with the likelihood. This Jacobian is as follows:

$$ J(\lambda; y_1, \ldots, y_n) = \prod_{i=1}^{n} \left| \frac{d}{dy_i}\frac{y_i^{\lambda}-1}{\lambda} \right| = \prod_{i=1}^{n} y_i^{\lambda-1} = \operatorname{GM}(y)^{n(\lambda-1)} $$

This allows the normal log likelihood at its maximum to be written as follows:

$$ \log \mathcal{L}(\hat\mu,\hat\sigma) = -\frac{n}{2}\left(\log 2\pi\hat\sigma^{2} + 1\right) + n(\lambda-1)\log \operatorname{GM}(y). $$

From here, absorbing \(\operatorname{GM}(y)^{\lambda-1}\) into the expression for \(\hat\sigma\) produces an expression that establishes that minimizing the sum of squares of residuals from \(y_i^{(\lambda)}\) is equivalent to maximizing the sum of the normal log likelihood of deviations from \((y^{\lambda}-1)/\lambda\) and the log of the Jacobian of the transformation.

The value at Y = 1 for any λ is 0, and the derivative with respect to Y there is 1 for any λ. Sometimes Y is a version of some other variable scaled to give Y = 1 at some sort of average value.

The transformation is a power transformation, but done in such a way as to make it continuous with the parameter λ at λ = 0. It has proved popular in regression analysis, including econometrics.

Box and Cox also proposed a more general form of the transformation that incorporates a shift parameter α:

$$ \tau(y_i; \lambda, \alpha) = \begin{cases} \dfrac{(y_i+\alpha)^{\lambda}-1}{\lambda}, & \text{if } \lambda \neq 0, \\[1ex] \ln(y_i+\alpha), & \text{if } \lambda = 0, \end{cases} $$

which holds if y_i + α > 0 for all i. If τ(Y, λ, α) follows a truncated normal distribution, then Y is said to follow a Box–Cox distribution.

Bickel and Doksum eliminated the need to use a truncated distribution by extending the range of the transformation to all y, as follows:

$$ \tau(y_i; \lambda) = \frac{\operatorname{sgn}(y_i)\,|y_i|^{\lambda} - 1}{\lambda}, \qquad \lambda > 0, $$

where sgn(·) is the sign function. Since the two definitions coincide for positive y, this change in definition has little practical import in most applications. [5]

Bickel and Doksum also proved that the parameter estimates are consistent and asymptotically normal under appropriate regularity conditions, though the standard Cramér–Rao lower bound can substantially underestimate the variance when parameter values are small relative to the noise variance. [5] However, this problem of underestimating the variance may not be a substantive problem in many applications. [6] [7]
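The Bickel–Doksum extension is equally direct to implement; a minimal sketch (function name illustrative):

```python
import numpy as np

def bickel_doksum(y, lam):
    """Bickel-Doksum transform: extends Box-Cox to all real y (requires lam > 0)."""
    y = np.asarray(y, dtype=float)
    return (np.sign(y) * np.abs(y)**lam - 1.0) / lam
```

For positive y this reduces to the ordinary one-parameter Box–Cox transform, since sgn(y)|y|^λ = y^λ there.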

Box–Cox transformation

The one-parameter Box–Cox transformations are defined as

$$ y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda}-1}{\lambda}, & \text{if } \lambda \neq 0, \\[1ex] \ln y_i, & \text{if } \lambda = 0, \end{cases} $$

and the two-parameter Box–Cox transformations as

$$ y_i^{(\boldsymbol{\lambda})} = \begin{cases} \dfrac{(y_i+\lambda_2)^{\lambda_1}-1}{\lambda_1}, & \text{if } \lambda_1 \neq 0, \\[1ex] \ln(y_i+\lambda_2), & \text{if } \lambda_1 = 0, \end{cases} $$

as described in the original article. [8] [9] Moreover, the first transformation holds for \(y_i > 0\), and the second for \(y_i > -\lambda_2\). [8]
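As an illustration (not part of the original article), SciPy's `scipy.stats.boxcox` implements the one-parameter transform and can estimate λ by maximum likelihood:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.5, size=500)   # positive, right-skewed data

# With lmbda=None (the default), scipy estimates lambda by maximum likelihood.
y_bc, lam_hat = stats.boxcox(y)

# Passing a fixed lmbda applies the one-parameter formula directly.
lam = 0.5
y_manual = (y**lam - 1.0) / lam
```

For log-normal data the estimate of λ should come out near 0, since the log transform is exactly what normalizes such data.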

The parameter λ is estimated using the profile likelihood function and goodness-of-fit tests. [10]

Confidence interval

A confidence interval for the Box–Cox transformation parameter can be constructed asymptotically using Wilks's theorem on the profile likelihood function, by finding all values of λ that fulfill the following restriction: [11]

$$ \ln L(\hat\lambda) - \ln L(\lambda) \le \tfrac{1}{2}\chi^2_{1,\,1-\alpha}. $$
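A sketch of this construction (assuming SciPy is available; `scipy.stats.boxcox_llf` computes the Box–Cox log-likelihood at a fixed λ): evaluate the profile log-likelihood on a grid and keep every λ within χ²₁(0.95)/2 of the maximum:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.gamma(shape=2.0, scale=3.0, size=200)          # positive sample

# Profile log-likelihood of lambda on a grid.
lams = np.linspace(-1.0, 2.0, 301)
llf = np.array([stats.boxcox_llf(l, y) for l in lams])

# Wilks: retain every lambda whose log-likelihood is within chi2_1(0.95)/2 of the max.
cutoff = llf.max() - stats.chi2.ppf(0.95, df=1) / 2.0
inside = lams[llf >= cutoff]
ci_low, ci_high = inside.min(), inside.max()
```

`stats.boxcox(y, alpha=0.05)` returns this kind of interval directly, alongside the transformed data and the maximum-likelihood λ.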

Example

The BUPA liver data set [12] contains data on liver enzymes ALT and γGT. Suppose we are interested in using log(γGT) to predict ALT. A plot of the data appears in panel (a) of the figure. There appears to be non-constant variance, and a Box–Cox transformation might help.

[Figure: Box–Cox analysis of the BUPA liver data — (a) scatterplot of the raw data; (b) profile log-likelihood of the power parameter λ; (c) log-likelihood for the shift parameter; (d) transformed data with superimposed regression line.]

The log-likelihood of the power parameter appears in panel (b). The horizontal reference line is at a distance of \(\chi^2_{1,\,0.95}/2\) from the maximum and can be used to read off an approximate 95% confidence interval for λ. It appears as though a value close to zero would be good, so we take logs.

Possibly, the transformation could be improved by adding a shift parameter to the log transformation. Panel (c) of the figure shows the log-likelihood. In this case, the maximum of the likelihood is close to zero suggesting that a shift parameter is not needed. The final panel shows the transformed data with a superimposed regression line.

Note that although Box–Cox transformations can make big improvements in model fit, there are some issues that the transformation cannot help with. In the current example, the data are rather heavy-tailed so that the assumption of normality is not realistic and a robust regression approach leads to a more precise model.

Econometric application

Economists often characterize production relationships by some variant of the Box–Cox transformation. [13]

Consider a common representation of production Q as dependent on services provided by a capital stock K and by labor hours N:

$$ \tau(Q) = a\,\tau(K) + (1-a)\,\tau(N), $$

where τ denotes the one-parameter Box–Cox transformation with parameter λ. Solving for Q by inverting the Box–Cox transformation we find

$$ Q = \bigl(a K^{\lambda} + (1-a) N^{\lambda}\bigr)^{1/\lambda}, $$

which is known as the constant elasticity of substitution (CES) production function.

The CES production function is a homogeneous function of degree one.

When λ = 1, this produces the linear production function:

$$ Q = a K + (1-a) N. $$

When λ → 0, this produces the famous Cobb–Douglas production function:

$$ Q = K^{a} N^{1-a}. $$
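The two limiting cases can be checked numerically; a small sketch with illustrative values of a, K, and N:

```python
import numpy as np

def ces(K, N, a, lam):
    """CES production function Q = (a*K**lam + (1-a)*N**lam)**(1/lam)."""
    return (a * K**lam + (1 - a) * N**lam) ** (1.0 / lam)

K, N, a = 4.0, 9.0, 0.3
linear = ces(K, N, a, 1.0)            # lambda = 1: equals a*K + (1-a)*N
cobb_douglas = ces(K, N, a, 1e-9)     # lambda -> 0: approaches K**a * N**(1-a)
```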

Activities and demonstrations

The SOCR resource pages contain a number of hands-on interactive activities [14] demonstrating the Box–Cox (power) transformation using Java applets and charts. These directly illustrate the effects of this transform on Q–Q plots, X–Y scatterplots, time-series plots and histograms.

Yeo–Johnson transformation

The Yeo–Johnson transformation [15] also allows for zero and negative values of y. The parameter λ can be any real number, where λ = 1 produces the identity transformation. The transformation law reads:

$$ y^{(\lambda)} = \begin{cases} \dfrac{(y+1)^{\lambda}-1}{\lambda}, & \text{if } \lambda \neq 0,\ y \ge 0, \\[1ex] \ln(y+1), & \text{if } \lambda = 0,\ y \ge 0, \\[1ex] -\dfrac{(-y+1)^{2-\lambda}-1}{2-\lambda}, & \text{if } \lambda \neq 2,\ y < 0, \\[1ex] -\ln(-y+1), & \text{if } \lambda = 2,\ y < 0. \end{cases} $$
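A direct transcription of the four cases (function name illustrative); for comparison, SciPy provides the same transform as `scipy.stats.yeojohnson`:

```python
import numpy as np
from scipy import stats

def yeo_johnson(y, lam):
    """Yeo-Johnson transform, transcribing the four-case law above."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    if lam != 0:
        out[pos] = ((y[pos] + 1.0)**lam - 1.0) / lam
    else:
        out[pos] = np.log1p(y[pos])
    if lam != 2:
        out[~pos] = -(((-y[~pos] + 1.0)**(2.0 - lam) - 1.0) / (2.0 - lam))
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

y = np.array([-3.0, -0.5, 0.0, 1.5, 4.0])
identity = yeo_johnson(y, 1.0)       # lambda = 1 is the identity transformation
```

Unlike Box–Cox, no positivity constraint on the data is needed, which is the transform's main practical advantage.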

Notes

  1. Gao, Peisheng; Wu, Weilin (2006). "Power Quality Disturbances Classification using Wavelet and Support Vector Machines". Sixth International Conference on Intelligent Systems Design and Applications. ISDA '06. Vol. 1. Washington, DC, USA: IEEE Computer Society. pp. 201–206. doi:10.1109/ISDA.2006.217. ISBN   9780769525280. S2CID   2444503.
  2. Gluzman, S.; Yukalov, V. I. (2006-01-01). "Self-similar power transforms in extrapolation problems". Journal of Mathematical Chemistry. 39 (1): 47–56. arXiv: cond-mat/0606104 . Bibcode:2006cond.mat..6104G. doi:10.1007/s10910-005-9003-7. ISSN   1572-8897. S2CID   118965098.
  3. Howarth, R. J.; Earle, S. A. M. (1979-02-01). "Application of a generalized power transformation to geochemical data". Journal of the International Association for Mathematical Geology. 11 (1): 45–62. doi:10.1007/BF01043245. ISSN   1573-8868. S2CID   121582755.
  4. Peters, J. L.; Rushton, L.; Sutton, A. J.; Jones, D. R.; Abrams, K. R.; Mugglestone, M. A. (2005). "Bayesian methods for the cross-design synthesis of epidemiological and toxicological evidence". Journal of the Royal Statistical Society, Series C. 54: 159–172. doi:10.1111/j.1467-9876.2005.00476.x. S2CID   121909404.
  5. Bickel, Peter J.; Doksum, Kjell A. (June 1981). "An analysis of transformations revisited". Journal of the American Statistical Association. 76 (374): 296–311. doi:10.1080/01621459.1981.10477649.
  6. Sakia, R. M. (1992), "The Box–Cox transformation technique: a review", The Statistician, 41 (2): 169–178, CiteSeerX   10.1.1.469.7176 , doi:10.2307/2348250, JSTOR   2348250
  7. Li, Fengfei (April 11, 2005), Box–Cox Transformations: An Overview (PDF) (slide presentation), Sao Paulo, Brazil: University of Sao Paulo, Brazil, retrieved 2014-11-02
  8. Box, George E. P.; Cox, D. R. (1964). "An analysis of transformations". Journal of the Royal Statistical Society, Series B. 26 (2): 211–252. JSTOR 2984418. MR 0192611.
  9. Johnston, J. (1984). Econometric Methods (Third ed.). New York: McGraw-Hill. pp. 61–74. ISBN   978-0-07-032685-9.
  10. Asar, O.; Ilk, O.; Dag, O. (2017). "Estimating Box-Cox power transformation parameter via goodness-of-fit tests". Communications in Statistics - Simulation and Computation. 46 (1): 91–105. arXiv: 1401.3812 . doi:10.1080/03610918.2014.957839. S2CID   41501327.
  11. Abramovich, Felix; Ritov, Ya'acov (2013). Statistical Theory: A Concise Introduction. CRC Press. pp. 121–122. ISBN   978-1-4398-5184-5.
  12. BUPA liver disorder dataset
  13. Zarembka, P. (1974). "Transformation of Variables in Econometrics". Frontiers in Econometrics. New York: Academic Press. pp. 81–104. ISBN   0-12-776150-0.
  14. Power Transform Family Graphs, SOCR webpages
  15. Yeo, In-Kwon; Johnson, Richard A. (2000). "A New Family of Power Transformations to Improve Normality or Symmetry". Biometrika . 87 (4): 954–959. doi:10.1093/biomet/87.4.954. JSTOR   2673623.

