Unimodality

In mathematics, unimodality means possessing a unique mode. More generally, unimodality means there is only a single highest value, somehow defined, of some mathematical object. [1]

Unimodal probability distribution

Figure 1. Probability density function of normal distributions, an example of a unimodal distribution.
Figure 2. A simple bimodal distribution.
Figure 3. A bimodal distribution. Note that only the largest peak would correspond to a mode in the strict sense of the definition of mode.

In statistics, a unimodal probability distribution or unimodal distribution is a probability distribution which has a single peak. The term "mode" in this context refers to any peak of the distribution, not just to the strict definition of mode which is usual in statistics.

If there is a single mode, the distribution function is called "unimodal". If it has more modes it is "bimodal" (2), "trimodal" (3), etc., or in general, "multimodal". [2] Figure 1 illustrates normal distributions, which are unimodal. Other examples of unimodal distributions include the Cauchy distribution, Student's t-distribution, the chi-squared distribution and the exponential distribution. Among discrete distributions, the binomial distribution and the Poisson distribution can be seen as unimodal, though for some parameters they can have two adjacent values with the same probability.

Figure 2 and Figure 3 illustrate bimodal distributions.

Other definitions

Other definitions of unimodality in distribution functions also exist.

In continuous distributions, unimodality can be defined through the behavior of the cumulative distribution function (cdf). [3] If the cdf is convex for x < m and concave for x > m, then the distribution is unimodal, m being the mode. Note that under this definition the uniform distribution is unimodal, [4] as is any other distribution in which the maximum value is achieved for a whole range of values, e.g. the trapezoidal distribution. This definition also allows for a discontinuity at the mode: in a continuous distribution the probability of any single value is zero, whereas this definition permits a non-zero probability, an "atom of probability", at the mode.
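As a quick check of this definition, consider the uniform distribution on [a, b], whose cdf is piecewise linear:

    F(x) = 0 for x ≤ a,    F(x) = (x − a)/(b − a) for a ≤ x ≤ b,    F(x) = 1 for x ≥ b.

For any m in [a, b], F is convex on (−∞, m) (it is the maximum of the linear functions 0 and (x − a)/(b − a) there) and concave on (m, ∞) (the minimum of (x − a)/(b − a) and 1), so the distribution is unimodal, with every point of [a, b] qualifying as a mode.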

Criteria for unimodality can also be defined through the characteristic function of the distribution [3] or through its Laplace–Stieltjes transform. [5]

Another way to define a unimodal discrete distribution is by the occurrence of sign changes in the sequence of differences of the probabilities. [6] A discrete distribution with a probability mass function {p_n : n = ..., −1, 0, 1, ...} is called unimodal if the sequence of differences ..., p_0 − p_{−1}, p_1 − p_0, p_2 − p_1, ... has exactly one sign change (when zeroes don't count).
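This criterion is easy to check numerically. The following sketch (the helper name is ours) pads the pmf with zeroes on both sides, so that the rise into the support and the fall out of it are both counted, and declares the distribution unimodal when the nonzero differences change sign exactly once:

    import numpy as np
    from scipy.stats import poisson

    def is_unimodal_pmf(p):
        """Sign-change test: pad with zeroes, take differences, drop exact
        zeroes ("zeroes don't count"), and count sign changes."""
        diffs = np.diff(np.concatenate(([0.0], p, [0.0])))
        signs = np.sign(diffs)
        signs = signs[signs != 0]
        return int(np.count_nonzero(signs[1:] != signs[:-1])) == 1

    print(is_unimodal_pmf(poisson.pmf(np.arange(40), mu=4.2)))   # True: one peak near 4
    print(is_unimodal_pmf(np.array([0.3, 0.1, 0.3, 0.2, 0.1])))  # False: two peaks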

Uses and results

One reason for the importance of distribution unimodality is that it allows for several important results. Several inequalities are given below which are only valid for unimodal distributions. Thus, it is important to assess whether or not a given data set comes from a unimodal distribution. Several tests for unimodality are given in the article on multimodal distribution.

Inequalities

Gauss's inequality

A first important result is Gauss's inequality, [7] which gives an upper bound on the probability that a value lies more than any given distance from its mode. This inequality depends on unimodality.
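Explicitly, if X is unimodal with mode m and τ² = E[(X − m)²] is the second moment about the mode, then for any k > 0

    P(|X − m| > k) ≤ 4τ²/(9k²)    for k ≥ 2τ/3^(1/2),
    P(|X − m| > k) ≤ 1 − k/(τ·3^(1/2))    for 0 ≤ k ≤ 2τ/3^(1/2).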

Vysochanskiï–Petunin inequality

A second is the Vysochanskiï–Petunin inequality, [8] a refinement of the Chebyshev inequality. The Chebyshev inequality guarantees that in any probability distribution, "nearly all" the values are "close to" the mean value. The Vysochanskiï–Petunin inequality sharpens this bound, provided that the distribution function is continuous and unimodal. Further results were shown by Sellke and Sellke. [9]
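In explicit form: if X is unimodal with mean μ and finite variance σ², then for any λ ≥ (8/3)^(1/2) ≈ 1.63,

    P(|X − μ| ≥ λσ) ≤ 4/(9λ²).

Taking λ = 3 gives a bound of 4/81 ≈ 4.9%, the basis of the common "3σ rule"; Chebyshev's inequality alone gives only 1/9 ≈ 11%.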

Mode, median and mean

Gauss also showed in 1823 that for a unimodal distribution [10]

    σ ≤ ω ≤ 2σ

and

    |ν − μ| ≤ (3/4)^(1/2) ω,

where the median is ν, the mean is μ and ω is the root mean square deviation from the mode.

It can be shown for a unimodal distribution that the median ν and the mean μ lie within (3/5)^(1/2) ≈ 0.7746 standard deviations of each other. [11] In symbols,

    |ν − μ| / σ ≤ (3/5)^(1/2),

where | . | is the absolute value.
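For example, for the exponential distribution with rate 1, the mean is μ = 1, the median is ν = ln 2 ≈ 0.6931 and the standard deviation is σ = 1, so

    |ν − μ| / σ = 1 − ln 2 ≈ 0.3069 ≤ (3/5)^(1/2) ≈ 0.7746,

consistent with the bound.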

In 2020, Bernard, Kazzi, and Vanduffel generalized the previous inequality by deriving the maximum distance between the symmetric quantile average (q_α + q_{1−α})/2 and the mean μ, as a function of the quantile level α. [12]

The maximum distance is minimized at α = 0.5 (i.e., when the symmetric quantile average equals the median ν), which indeed motivates the common choice of the median as a robust estimator for the mean. Moreover, when α = 0.5, the bound is equal to (3/5)^(1/2) ≈ 0.7746, which is the maximum distance between the median and the mean of a unimodal distribution.

A similar relation holds between the median and the mode θ: they lie within 3^(1/2) ≈ 1.732 standard deviations of each other:

    |ν − θ| / σ ≤ 3^(1/2).

It can also be shown that the mean and the mode lie within 3^(1/2) standard deviations of each other:

    |μ − θ| / σ ≤ 3^(1/2).

Skewness and kurtosis

Rohatgi and Székely claimed that the skewness and kurtosis of a unimodal distribution are related by the inequality [13]

    γ² − κ ≤ 6/5,

where κ is the excess kurtosis and γ is the skewness. Klaassen, Mokveld, and van Es showed that this only applies in certain settings, such as the set of unimodal distributions where the mode and mean coincide. [14]

They derived a weaker inequality which applies to all unimodal distributions: [14]

    γ² − κ ≤ 186/125.

This bound is sharp: it is attained by the equal-weights mixture of the uniform distribution on [0, 1] and the discrete distribution concentrated at {0}.
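The sharpness claim can be verified with exact arithmetic. The following sketch (using the standard central-moment expansions in terms of raw moments) computes the squared skewness minus the excess kurtosis of the equal-weights mixture above:

    from fractions import Fraction as F

    # Equal-weights mixture of a point mass at 0 and Uniform[0, 1]:
    # raw moments are E[X^n] = (1/2)*0 + (1/2)*1/(n + 1).
    m1, m2, m3, m4 = (F(1, 2) * F(1, n + 1) for n in range(1, 5))

    mu2 = m2 - m1**2                              # variance
    mu3 = m3 - 3*m1*m2 + 2*m1**3                  # third central moment
    mu4 = m4 - 4*m1*m3 + 6*m1**2*m2 - 3*m1**4     # fourth central moment

    skew_sq = mu3**2 / mu2**3                     # gamma^2
    excess_kurt = mu4 / mu2**2 - 3                # kappa

    print(skew_sq - excess_kurt)                  # 186/125: the bound is attained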

Unimodal function

As the term "modal" applies to data sets and probability distributions, and not in general to functions, the definitions above do not apply. The definition of "unimodal" has been extended to functions of real numbers as well.

A common definition is as follows: a function f(x) is a unimodal function if for some value m, it is monotonically increasing for x ≤ m and monotonically decreasing for x ≥ m. In that case, the maximum value of f(x) is f(m) and there are no other local maxima.

Proving unimodality is often hard. One way consists in using the definition of the property directly, but this turns out to be suitable only for simple functions. A general method based on derivatives exists, [15] but despite its simplicity it does not succeed for every function.
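As a simple illustration of the derivative method, consider f(x) = x·e^(−x) on [0, ∞). Its derivative is

    f′(x) = e^(−x)(1 − x),

which is positive for x < 1 and negative for x > 1, so f increases up to m = 1 and decreases afterwards; hence f is unimodal with maximum f(1) = 1/e.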

Examples of unimodal functions include quadratic polynomial functions with a negative quadratic coefficient, tent map functions, and more.

The above is sometimes referred to as strong unimodality, from the fact that the monotonicity implied is strong monotonicity. A function f(x) is a weakly unimodal function if there exists a value m for which it is weakly monotonically increasing for x ≤ m and weakly monotonically decreasing for x ≥ m. In that case, the maximum value f(m) can be reached for a continuous range of values of x. An example of a weakly unimodal function which is not strongly unimodal is every other row in Pascal's triangle.
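For instance, the row 1, 3, 3, 1 of Pascal's triangle, viewed as a function of the position k, attains its maximum at two adjacent entries, so it is weakly but not strongly unimodal; rows such as 1, 4, 6, 4, 1 have a strict single maximum.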

Depending on context, unimodal function may also refer to a function that has only one local minimum, rather than maximum. [16] For example, local unimodal sampling, a method for doing numerical optimization, is often demonstrated with such a function. It can be said that a unimodal function under this extension is a function with a single local extremum.

One important property of unimodal functions is that the extremum can be found using search algorithms such as golden section search, ternary search or successive parabolic interpolation. [17]
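As a minimal sketch (the function name is ours, and f is assumed strictly unimodal on [lo, hi]), ternary search discards one third of the interval at every step:

    def ternary_search_max(f, lo, hi, tol=1e-9):
        """Locate the maximizer of a unimodal function f on [lo, hi]."""
        while hi - lo > tol:
            m1 = lo + (hi - lo) / 3
            m2 = hi - (hi - lo) / 3
            if f(m1) < f(m2):
                lo = m1    # f rises then falls, so the peak cannot lie in [lo, m1]
            else:
                hi = m2    # symmetrically, the peak cannot lie in [m2, hi]
        return (lo + hi) / 2

    # Example: -(x - 2)^2 is unimodal with maximum at x = 2.
    print(ternary_search_max(lambda x: -(x - 2)**2, -10.0, 10.0))   # ~2.0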

Other extensions

A function f(x) is "S-unimodal" (often referred to as "S-unimodal map") if its Schwarzian derivative is negative for all x ≠ c, where c is the critical point. [18]
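Here the Schwarzian derivative is Sf = f‴/f′ − (3/2)(f″/f′)². As a worked example, for the quadratic family f(x) = μx(1 − x), with critical point c = 1/2, one has f′ = μ(1 − 2x), f″ = −2μ and f‴ = 0, so

    Sf(x) = −(3/2)·(−2μ/(μ(1 − 2x)))² = −6/(1 − 2x)² < 0 for all x ≠ 1/2,

and every such quadratic map is S-unimodal.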

In computational geometry, if a function is unimodal, this permits the design of efficient algorithms for finding its extrema. [19]

A more general definition, applicable to a function f(X) of a vector variable X, is that f is unimodal if there is a one-to-one differentiable mapping X = G(Z) such that f(G(Z)) is convex. Usually one would want G(Z) to be continuously differentiable with a nonsingular Jacobian matrix.

Quasiconvex functions and quasiconcave functions extend the concept of unimodality to functions whose arguments belong to higher-dimensional Euclidean spaces.


References

  1. Weisstein, Eric W. "Unimodal". MathWorld.
  2. Weisstein, Eric W. "Mode". MathWorld.
  3. Khinchin, A.Ya. (1938). "On unimodal distributions". Trans. Res. Inst. Math. Mech. (in Russian). 2 (2). University of Tomsk: 1–7.
  4. Ushakov, N.G. (2001) [1994], "Unimodal distribution", Encyclopedia of Mathematics, EMS Press.
  5. Gnedenko, Boris V.; Korolev, Victor Yu. (1996). Random Summation: Limit Theorems and Applications. CRC Press. ISBN 0-8493-2875-6. p. 31.
  6. Medgyessy, P. (March 1972). "On the unimodality of discrete distributions". Periodica Mathematica Hungarica. 2 (1–4): 245–257. doi:10.1007/bf02018665. S2CID 119817256.
  7. Gauss, C. F. (1823). "Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Pars Prior". Commentationes Societatis Regiae Scientiarum Gottingensis Recentiores. 5.
  8. Vysochanskij, D. F.; Petunin, Y. I. (1980). "Justification of the 3σ rule for unimodal distributions". Theory of Probability and Mathematical Statistics. 21: 25–36.
  9. Sellke, T.M.; Sellke, S.H. (1997). "Chebyshev inequalities for unimodal distributions". American Statistician. 51 (1). American Statistical Association: 34–40. doi:10.2307/2684690. JSTOR 2684690.
  10. Gauss, C.F. (1995). Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Pars Prior. Pars Posterior. Supplementum. [Theory of the Combination of Observations Least Subject to Errors. Part One. Part Two. Supplement.] Translated by G.W. Stewart. Classics in Applied Mathematics Series, Society for Industrial and Applied Mathematics, Philadelphia.
  11. Basu, S.; Dasgupta, A. (1997). "The Mean, Median, and Mode of Unimodal Distributions: A Characterization". Theory of Probability & Its Applications. 41 (2): 210–223. doi:10.1137/S0040585X97975447.
  12. Bernard, Carole; Kazzi, Rodrigue; Vanduffel, Steven (2020). "Range Value-at-Risk bounds for unimodal distributions under partial information". Insurance: Mathematics and Economics. 94: 9–24. doi:10.1016/j.insmatheco.2020.05.013.
  13. Rohatgi, Vijay K.; Székely, Gábor J. (1989). "Sharp inequalities between skewness and kurtosis". Statistics & Probability Letters. 8 (4): 297–299. doi:10.1016/0167-7152(89)90035-7.
  14. Klaassen, Chris A.J.; Mokveld, Philip J.; Van Es, Bert (2000). "Squared skewness minus kurtosis bounded by 186/125 for unimodal distributions". Statistics & Probability Letters. 50 (2): 131–135. doi:10.1016/S0167-7152(00)00090-0.
  15. "On the unimodality of METRIC Approximation subject to normally distributed demands" (PDF). Method in appendix D, Example in theorem 2 page 5. Retrieved 2013-08-28.
  16. "Mathematical Programming Glossary". Retrieved 2020-03-29.
  17. Demaine, Erik D.; Langerman, Stefan (2005). "Optimizing a 2D Function Satisfying Unimodality Properties". In Brodal, Gerth Stølting; Leonardi, Stefano (eds.). Algorithms – ESA 2005. Lecture Notes in Computer Science. Vol. 3669. Berlin, Heidelberg: Springer. pp. 887–898. doi:10.1007/11561071_78. ISBN 978-3-540-31951-1.
  18. See e.g. Guckenheimer, John; Johnson, Stewart (July 1990). "Distortion of S-Unimodal Maps". Annals of Mathematics. Second Series. 132 (1): 71–130. doi:10.2307/1971501. JSTOR 1971501.
  19. Toussaint, Godfried T. (June 1984). "Complexity, convexity, and unimodality". International Journal of Computer and Information Sciences. 13 (3): 197–217. doi:10.1007/bf00979872. S2CID 11577312.