Exponentially modified Gaussian distribution

[Figure: probability density function of the EMG distribution]
[Figure: cumulative distribution function of the EMG distribution]
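Parameters: μ ∈ ℝ (mean of Gaussian component)
σ² > 0 (variance of Gaussian component)
λ > 0 (rate of exponential component)
Support: x ∈ ℝ
PDF: $\frac{\lambda}{2} \exp\left[\frac{\lambda}{2}\left(2\mu + \lambda\sigma^2 - 2x\right)\right] \operatorname{erfc}\left(\frac{\mu + \lambda\sigma^2 - x}{\sqrt{2}\,\sigma}\right)$
CDF: $\Phi(x; \mu, \sigma) - \frac{1}{2} \exp\left[\frac{\lambda}{2}\left(2\mu + \lambda\sigma^2 - 2x\right)\right] \operatorname{erfc}\left(\frac{\mu + \lambda\sigma^2 - x}{\sqrt{2}\,\sigma}\right)$, where $\Phi(x; \mu, \sigma)$ is the CDF of a Gaussian distribution
Mean: $\mu + 1/\lambda$
Mode: $\mu + \sigma^2\lambda - \sqrt{2}\,\sigma \operatorname{erfcxinv}\left(\frac{1}{\lambda\sigma}\sqrt{\frac{2}{\pi}}\right)$
Variance: $\sigma^2 + 1/\lambda^2$
Skewness: $\frac{2}{\sigma^3\lambda^3}\left(1 + \frac{1}{\sigma^2\lambda^2}\right)^{-3/2}$
Excess kurtosis: $\frac{6}{\sigma^4\lambda^4}\left(1 + \frac{1}{\sigma^2\lambda^2}\right)^{-2}$
MGF: $\left(1 - \frac{t}{\lambda}\right)^{-1} \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)$
CF: $\left(1 - \frac{it}{\lambda}\right)^{-1} \exp\left(i\mu t - \frac{1}{2}\sigma^2 t^2\right)$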

In probability theory, an exponentially modified Gaussian distribution (EMG, also known as exGaussian distribution) describes the sum of independent normal and exponential random variables. An exGaussian random variable Z may be expressed as Z = X + Y, where X and Y are independent, X is Gaussian with mean μ and variance σ², and Y is exponential with rate λ. It has a characteristic positive skew from the exponential component.

It may also be regarded as a weighted function of a shifted exponential with the weight being a function of the normal distribution.

Definition

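The probability density function (pdf) of the exponentially modified Gaussian distribution is [1]

$$f(x; \mu, \sigma, \lambda) = \frac{\lambda}{2} \exp\left[\frac{\lambda}{2}\left(2\mu + \lambda\sigma^2 - 2x\right)\right] \operatorname{erfc}\left(\frac{\mu + \lambda\sigma^2 - x}{\sqrt{2}\,\sigma}\right),$$

where erfc is the complementary error function defined as

$$\operatorname{erfc}(x) = 1 - \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^\infty e^{-t^2}\,dt.$$

This density function is derived via convolution of the normal and exponential probability density functions.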
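As an illustrative sketch (not taken from the cited sources), the density can be evaluated with the complementary error function from Python's standard library. The names emg_pdf and integrate and the parameter values are assumptions made for this example, and the direct form is only suitable for moderate parameter values because the exponential factor can overflow.

```python
import math

def emg_pdf(x, mu, sigma, lam):
    """EMG density: (lam/2) * exp[(lam/2)(2*mu + lam*sigma^2 - 2x)] * erfc(arg)."""
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)
    arg = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(exponent) * math.erfc(arg)

def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule, used here only as a sanity check."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step

# Hypothetical parameter values: mu = 0, sigma = 1, lambda = 0.5.
area = integrate(lambda x: emg_pdf(x, 0.0, 1.0, 0.5), -10.0, 60.0)
mean = integrate(lambda x: x * emg_pdf(x, 0.0, 1.0, 0.5), -10.0, 60.0)
print(area, mean)  # area should be close to 1, mean close to mu + 1/lambda = 2
```

The numerical integral is a quick check that the expression is a normalized density whose mean is μ + 1/λ.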

Alternative forms for computation

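An alternative but equivalent form of the EMG distribution is used to describe peak shape in chromatography. [2] It reads

$$f(x; h, \mu, \sigma, \tau) = \frac{h\sigma}{\tau}\sqrt{\frac{\pi}{2}} \exp\left(\frac{1}{2}\left(\frac{\sigma}{\tau}\right)^2 - \frac{x-\mu}{\tau}\right) \operatorname{erfc}\left(\frac{1}{\sqrt{2}}\left(\frac{\sigma}{\tau} - \frac{x-\mu}{\sigma}\right)\right),$$

where

h is the amplitude of the Gaussian,
τ = 1/λ is the exponent relaxation time, and τ² is the variance of the exponential probability density function.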

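This function cannot be calculated for some values of the parameters (for example, τ = 0) because of arithmetic overflow. An alternative but equivalent way of writing the function was proposed by Delley: [3]

$$f(x; h, \mu, \sigma, \tau) = h \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) \frac{\sigma}{\tau}\sqrt{\frac{\pi}{2}} \operatorname{erfcx}\left(\frac{1}{\sqrt{2}}\left(\frac{\sigma}{\tau} - \frac{x-\mu}{\sigma}\right)\right),$$

where $\operatorname{erfcx} t = \exp(t^2) \operatorname{erfc} t$ is the scaled complementary error function.

Arithmetic overflow is also possible with this formula, but its region of overflow differs from that of the first formula, except for very small τ.

For small τ it is reasonable to use the asymptotic form of the second formula:

$$f(x; h, \mu, \sigma, \tau) = \frac{h \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)}{1 + \frac{(\mu - x)\tau}{\sigma^2}}.$$

The choice of formula is made on the basis of the parameter $z = \frac{1}{\sqrt{2}}\left(\frac{\sigma}{\tau} - \frac{x-\mu}{\sigma}\right)$:

for z < 0 the computation should be made [2] according to the first formula,
for 0 ≤ z ≤ 6.71·10⁷ (in the case of double-precision floating-point format) according to the second formula,
and for z > 6.71·10⁷ according to the third formula.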
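The selection rule can be sketched in Python as follows. This is a minimal illustration rather than the implementation from the cited papers: emg_peak is a hypothetical name, τ is assumed to be strictly positive, and erfcx is a rough hand-written stand-in (a direct product for small arguments, a truncated asymptotic series for large ones) for a library routine such as the erfcx provided by SciPy.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT_HALF_PI = math.sqrt(math.pi / 2.0)

def erfcx(z):
    """Approximate exp(z*z) * erfc(z) for z >= 0 (illustrative stand-in)."""
    if z < 25.0:
        return math.exp(z * z) * math.erfc(z)
    # Truncated asymptotic series: (1 - w + 3w^2 - 15w^3) / (z*sqrt(pi)), w = 1/(2z^2)
    w = 1.0 / (2.0 * z * z)
    return (1.0 - w + 3.0 * w * w - 15.0 * w ** 3) / (z * math.sqrt(math.pi))

def emg_peak(x, h, mu, sigma, tau):
    """EMG peak shape, choosing among the three formulas by the value of z."""
    u = (x - mu) / sigma
    z = (sigma / tau - u) / SQRT2
    gauss = h * math.exp(-0.5 * u * u)
    if z < 0.0:
        # First formula: the exponent is negative here, so exp() cannot overflow.
        expo = 0.5 * (sigma / tau) ** 2 - (x - mu) / tau
        return (h * sigma / tau) * SQRT_HALF_PI * math.exp(expo) * math.erfc(z)
    if z <= 6.71e7:
        # Second formula (Delley): Gaussian times scaled complementary error function.
        return gauss * (sigma / tau) * SQRT_HALF_PI * erfcx(z)
    # Third formula: asymptotic form for very small tau.
    return gauss / (1.0 + (mu - x) * tau / sigma ** 2)

# Hypothetical values: unit-height Gaussian at 0 with sigma = 1.
print(emg_peak(1.0, 1.0, 0.0, 1.0, 2.0))   # z < 0, first formula
print(emg_peak(0.0, 1.0, 0.0, 1.0, 2.0))   # 0 <= z, second formula
print(emg_peak(0.0, 1.0, 0.0, 1.0, 1e-9))  # z > 6.71e7, third formula
```

As τ shrinks the peak approaches the unmodified Gaussian, which is why the last call returns a value close to h.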

The mode (position of the apex, the most probable value) is calculated [2] using the derivative of formula 2; the inverse of the scaled complementary error function, erfcxinv(), is used for the calculation. Approximate values are also proposed by Kalambet et al. [2] Although the mode lies at a higher value than that of the original Gaussian, the apex is always located on the original (unmodified) Gaussian.
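The statement that the apex lies on the unmodified Gaussian can be checked numerically. The sketch below does not use erfcxinv; instead it assumes a plain ternary search over the unimodal density, which is a substitute technique chosen for simplicity. The function names and parameter values are hypothetical.

```python
import math

def emg_pdf(x, mu, sigma, lam):
    """Normalized EMG density in the erfc form (moderate parameters only)."""
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x)
    arg = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(exponent) * math.erfc(arg)

def gauss_pdf(x, mu, sigma):
    """Density of the original (unmodified) Gaussian component."""
    u = (x - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))

def emg_mode(mu, sigma, lam, tol=1e-9):
    """Locate the maximum of the EMG density by ternary search."""
    lo, hi = mu - 5.0 * sigma, mu + 5.0 * sigma + 5.0 / lam
    while hi - lo > tol:
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if emg_pdf(a, mu, sigma, lam) < emg_pdf(b, mu, sigma, lam):
            lo = a  # the maximum lies to the right of a
        else:
            hi = b  # the maximum lies to the left of b
    return 0.5 * (lo + hi)

# Hypothetical values: mu = 0, sigma = 1, lambda = 1.
x_m = emg_mode(0.0, 1.0, 1.0)
print(x_m, emg_pdf(x_m, 0.0, 1.0, 1.0), gauss_pdf(x_m, 0.0, 1.0))
```

The last two printed values should agree closely, and x_m should exceed μ.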

Parameter estimation

There are three parameters: the mean of the normal distribution (μ), the standard deviation of the normal distribution (σ) and the exponential decay parameter (τ = 1 / λ). The shape K = τ / σ is also sometimes used to characterise the distribution. Depending on the values of the parameters, the distribution may vary in shape from almost normal to almost exponential.

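The parameters of the distribution can be estimated from the sample data with the method of moments as follows: [4] [5]

$$m = \mu + \tau, \qquad s^2 = \sigma^2 + \tau^2, \qquad \gamma_1 = \frac{2\tau^3}{(\sigma^2 + \tau^2)^{3/2}},$$

where m is the sample mean, s is the sample standard deviation, and γ₁ is the skewness.

Solving these for the parameters gives:

$$\hat{\mu} = m - s\left(\frac{\gamma_1}{2}\right)^{1/3}, \qquad \hat{\sigma}^2 = s^2\left[1 - \left(\frac{\gamma_1}{2}\right)^{2/3}\right], \qquad \hat{\tau} = s\left(\frac{\gamma_1}{2}\right)^{1/3}.$$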
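A minimal sketch of these moment estimators in Python is given below. The function name emg_moment_estimates is hypothetical, the moments use the population (divide by n) convention, and the synthetic sample with μ = σ = τ = 1 is an assumption made only to exercise the code.

```python
import math
import random

def emg_moment_estimates(sample):
    """Method-of-moments estimates (mu, sigma, tau) from sample moments."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    s = math.sqrt(m2)
    g1 = m3 / s ** 3  # sample skewness
    if not 0.0 < g1 < 2.0:
        raise ValueError("skewness must lie in (0, 2) for these estimators")
    ratio = (g1 / 2.0) ** (1.0 / 3.0)
    tau = s * ratio                          # tau = s * (g1/2)^(1/3)
    mu = m - tau                             # mu = m - tau
    sigma = s * math.sqrt(1.0 - ratio ** 2)  # sigma^2 = s^2 * (1 - (g1/2)^(2/3))
    return mu, sigma, tau

# Synthetic data: Gaussian(1, 1) plus exponential with rate 1 (tau = 1).
rng = random.Random(0)
sample = [rng.gauss(1.0, 1.0) + rng.expovariate(1.0) for _ in range(100000)]
mu_hat, sigma_hat, tau_hat = emg_moment_estimates(sample)
print(mu_hat, sigma_hat, tau_hat)
```

The guard on the skewness reflects that the estimators are undefined when the sample skewness is non-positive or at least 2, which can happen with small or noisy samples.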

Recommendations

Ratcliff has suggested that there be at least 100 data points in the sample before the parameter estimates are regarded as reliable. [6] Vincent averaging may be used with smaller samples, as this procedure only modestly distorts the shape of the distribution. [7] These point estimates may be used as initial values that can be refined with more powerful methods, including least-squares optimization, which has been shown to work for the Multimodal Exponentially Modified Gaussian (MEMG) case. [8] A code implementation with analytical MEMG derivatives and an optional oscillation term for sound processing has been released as part of an open-source project. [9]

Confidence intervals

There are currently no published tables available for significance testing with this distribution. The distribution can be simulated by forming the sum of two random variables, one drawn from a normal distribution and the other from an exponential distribution.
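The simulation described above might look like the following sketch, which uses only Python's standard library. The name sample_emg, the seed, and the parameter values are assumptions made for illustration.

```python
import random
import statistics

def sample_emg(n, mu, sigma, lam, seed=None):
    """Draw n EMG variates as a normal draw plus an independent exponential draw."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) + rng.expovariate(lam) for _ in range(n)]

# Hypothetical values: mu = 0, sigma = 1, lambda = 0.5.
draws = sample_emg(50000, mu=0.0, sigma=1.0, lam=0.5, seed=1)
sim_mean = statistics.mean(draws)
sim_var = statistics.pvariance(draws)
print(sim_mean)  # expected to be near mu + 1/lambda = 2
print(sim_var)   # expected to be near sigma^2 + 1/lambda^2 = 5
```

Repeating such draws many times gives simulated sampling distributions of any statistic of interest, which can stand in for the missing tables.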

Skew

The value of the nonparametric skew

$$\frac{\text{mean} - \text{median}}{\text{standard deviation}}$$

of this distribution lies between 0 and 0.31. [10] [11] The lower limit is approached when the normal component dominates, and the upper when the exponential component dominates.
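The two limits can be checked directly from the definition of the nonparametric skew, (mean − median) / standard deviation. As a short worked example: a pure exponential distribution with rate λ has mean 1/λ, median (ln 2)/λ and standard deviation 1/λ, so

$$\frac{1/\lambda - (\ln 2)/\lambda}{1/\lambda} = 1 - \ln 2 \approx 0.307,$$

which rounds to the quoted upper limit of 0.31. For a pure Gaussian the mean and median coincide, so the ratio is 0.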

Occurrence

The distribution is used as a theoretical model for the shape of chromatographic peaks. [1] [2] [12] It has been proposed as a statistical model of intermitotic time in dividing cells. [13] [14] It is also used in modelling cluster ion beams. [15] It is commonly used in psychology and other brain sciences in the study of response times. [16] [17] [18] In a slight variant where the mean of the normal component is set to zero, it is also used in stochastic frontier analysis, as one of the distributional specifications for the composed error term that models inefficiency. [19] In signal processing, EMGs have been extended to the multimodal case with an optional oscillation term to represent digitized sound signals. [8]

This family of distributions is a special or limiting case of the normal-exponential-gamma distribution. It can also be seen as a three-parameter generalization of a normal distribution that adds skew; another such distribution is the skew normal distribution, which has thinner tails. The distribution is a compound probability distribution in which the mean of a normal distribution varies randomly as a shifted exponential distribution. [citation needed]

A Gaussian minus exponential distribution has been suggested for modelling option prices. [20] If such a random variable Y has parameters μ, σ, λ, then its negative −Y has an exponentially modified Gaussian distribution with parameters −μ, σ, λ, and thus Y has mean μ − 1/λ and variance σ² + 1/λ².

References

  1. Grushka, Eli (1972). "Characterization of Exponentially Modified Gaussian Peaks in Chromatography". Analytical Chemistry. 44 (11): 1733–1738. doi:10.1021/ac60319a011. PMID 22324584.
  2. Kalambet, Y.; Kozmin, Y.; Mikhailova, K.; Nagaev, I.; Tikhonov, P. (2011). "Reconstruction of chromatographic peaks using the exponentially modified Gaussian function". Journal of Chemometrics. 25 (7): 352. doi:10.1002/cem.1343. S2CID 121781856.
  3. Delley, R (1985). "Series for the Exponentially Modified Gaussian Peak Shape". Anal. Chem. 57: 388. doi:10.1021/ac00279a094.
  4. Dyson, N. A. (1998). Chromatographic Integration Methods. Royal Society of Chemistry, Information Services. p. 27. ISBN 9780854045105. Retrieved 2015-05-15.
  5. Olivier, J.; Norberg, M. M. (2010). "Positively skewed data: Revisiting the Box–Cox power transformation". Int. J. Psych. Res. 3 (1): 68–75.
  6. Ratcliff, R (1979). "Group reaction time distributions and an analysis of distribution statistics". Psychol. Bull. 86 (3): 446–461. CiteSeerX 10.1.1.409.9863. doi:10.1037/0033-2909.86.3.446. PMID 451109.
  7. Vincent, S. B. (1912). "The functions of the vibrissae in the behaviour of the white rat". Animal Behaviour Monographs. 1 (5): 7–81.
  8. Hahne, C. (2022). "Multimodal Exponentially Modified Gaussian Oscillators". IEEE International Ultrasonic Symposium 2022 (IUS): 1–4. arXiv:2209.12202.
  9. "MEMG on GitHub". GitHub.
  10. Heathcote, A (1996). "RTSYS: A DOS application for the analysis of reaction time data". Behavior Research Methods, Instruments, & Computers. 28 (3): 427–445. doi:10.3758/bf03200523. hdl:1959.13/28044.
  11. Ulrich, R.; Miller, J. (1994). "Effects of outlier exclusion on reaction time analysis". J. Exp. Psych.: General. 123 (1): 34–80. doi:10.1037/0096-3445.123.1.34. PMID 8138779.
  12. Gladney, HM; Dowden, BF; Swalen, JD (1969). "Computer-Assisted Gas-Liquid Chromatography". Anal. Chem. 41 (7): 883–888. doi:10.1021/ac60276a013.
  13. Golubev, A. (2010). "Exponentially modified Gaussian (EMG) relevance to distributions related to cell proliferation and differentiation". Journal of Theoretical Biology. 262 (2): 257–266. Bibcode:2010JThBi.262..257G. doi:10.1016/j.jtbi.2009.10.005. PMID 19825376.
  14. Tyson, D. R.; Garbett, S. P.; Frick, P. L.; Quaranta, V. (2012). "Fractional proliferation: A method to deconvolve cell population dynamics from single-cell data". Nature Methods. 9 (9): 923–928. doi:10.1038/nmeth.2138. PMC 3459330. PMID 22886092.
  15. Nicolaescu, D.; Takaoka, G. H.; Ishikawa, J. (2006). "Multiparameter characterization of cluster ion beams". Journal of Vacuum Science & Technology B: Microelectronics and Nanometer Structures. 24 (5): 2236. Bibcode:2006JVSTB..24.2236N. doi:10.1116/1.2335433.
  16. Palmer, EM; Horowitz Todd, S; Torralba, A; Wolfe, JM (2011). "What are the shapes of response time distributions in visual search?". J Exp Psychol. 37 (1): 58–71. doi:10.1037/a0020747. PMC 3062635. PMID 21090905.
  17. Rohrer, D; Wixted, JT (1994). "An analysis of latency and interresponse time in free recall". Memory & Cognition. 22 (5): 511–524. doi:10.3758/BF03198390. PMID 7968547.
  18. Soltanifar, M; Escobar, M; Dupuis, A; Schachar, R (2021). "A Bayesian Mixture Modelling of Stop Signal Reaction Time Distributions: The Second Contextual Solution for the Problem of Aftereffects of Inhibition on SSRT Estimations". Brain Sciences. 11 (9): 1–26. doi:10.3390/brainsci11081102. PMC 8391500. PMID 34439721.
  19. Lovell, Knox CA; S.C. Kumbhakar (2000). Stochastic Frontier Analysis. Cambridge University Press. pp. 80–82. ISBN 0-521-48184-8.
  20. Carr, Peter; Madan, Dilip B. (Fall 2009). "Saddlepoint Methods for Option Pricing". The Journal of Computational Finance. 13 (1): 49–61.