Contraharmonic mean

In mathematics, a contraharmonic mean is a function complementary to the harmonic mean. The contraharmonic mean is a special case of the Lehmer mean Lp, with p = 2.

Definition

The contraharmonic mean of a set of positive numbers is defined as the arithmetic mean of the squares of the numbers divided by the arithmetic mean of the numbers:

C(x1, x2, ..., xn) = (x1² + x2² + ⋯ + xn²) / (x1 + x2 + ⋯ + xn)
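As a concrete illustration of the definition and of the Lehmer-mean connection noted above, here is a minimal Python sketch; the function names are ours and not part of any standard library:

```python
def lehmer_mean(values, p):
    """Lehmer mean L_p: sum of p-th powers divided by sum of (p-1)-th powers."""
    return sum(v ** p for v in values) / sum(v ** (p - 1) for v in values)

def contraharmonic_mean(values):
    """Mean of the squares divided by the mean of the values."""
    return sum(v * v for v in values) / sum(values)

x = [1, 2, 3, 4]
print(contraharmonic_mean(x))   # 30 / 10 = 3.0
print(lehmer_mean(x, 2))        # same value: C is the Lehmer mean with p = 2
```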

Properties

It is easy to show that this satisfies the characteristic properties of a mean of some list of values x:

min(x1, ..., xn) ≤ C(x1, ..., xn) ≤ max(x1, ..., xn)

C(t·x1, t·x2, ..., t·xn) = t·C(x1, x2, ..., xn) for all t > 0

The first property implies the fixed point property, that for all k > 0,

C(k, k, ..., k) = k

The contraharmonic mean is higher in value than the arithmetic mean and also higher than the root mean square:

H(x) ≤ G(x) ≤ L(x) ≤ A(x) ≤ R(x) ≤ C(x)

where x is a list of values, H is the harmonic mean, G is geometric mean, L is the logarithmic mean, A is the arithmetic mean, R is the root mean square and C is the contraharmonic mean. Unless all values of x are the same, the ≤ signs above can be replaced by <.
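The ordering can be checked numerically. The sketch below, for an arbitrary sample list, computes the harmonic, geometric, arithmetic, root-mean-square and contraharmonic means and verifies the chain of inequalities (the logarithmic mean is defined for pairs and is omitted here):

```python
import math

x = [2.0, 3.0, 5.0, 7.0]
n = len(x)

H = n / sum(1 / v for v in x)               # harmonic mean
G = math.prod(x) ** (1 / n)                 # geometric mean
A = sum(x) / n                              # arithmetic mean
R = math.sqrt(sum(v * v for v in x) / n)    # root mean square
C = sum(v * v for v in x) / sum(x)          # contraharmonic mean

assert H <= G <= A <= R <= C
print(H, G, A, R, C)
```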

The name contraharmonic may be due to the fact that when taking the mean of only two variables, the contraharmonic mean is as high above the arithmetic mean as the arithmetic mean is above the harmonic mean (i.e., the arithmetic mean of the two variables is equal to the arithmetic mean of their harmonic and contraharmonic means).

Two-variable formulae

From the formulas for the arithmetic mean and harmonic mean of two variables we have:

C(a, b) = 2·A(a, b) − H(a, b) = (a + b) − 2ab / (a + b) = (a² + b²) / (a + b)

Notice that for two variables the average of the harmonic and contraharmonic means is exactly equal to the arithmetic mean:

A(H(a, b), C(a, b)) = A(a, b)

As a approaches 0, H(a, b) also approaches 0: the harmonic mean is very sensitive to low values. The contraharmonic mean, by contrast, is sensitive to larger values, so as a approaches 0, C(a, b) approaches b (and their average remains A(a, b)).

There are two other notable relationships between 2-variable means. First, the geometric mean of the arithmetic and harmonic means is equal to the geometric mean of the two values:

G(A(a, b), H(a, b)) = √(A(a, b)·H(a, b)) = √(((a + b)/2)·(2ab/(a + b))) = √(ab) = G(a, b)

The second relationship is that the geometric mean of the arithmetic and contraharmonic means is the root mean square:

G(A(a, b), C(a, b)) = √(A(a, b)·C(a, b)) = √(((a + b)/2)·((a² + b²)/(a + b))) = √((a² + b²)/2) = R(a, b)
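A short numerical check of the three two-variable identities above, using arbitrary values for a and b:

```python
import math

a, b = 3.0, 7.0

A = (a + b) / 2
H = 2 * a * b / (a + b)
G = math.sqrt(a * b)
R = math.sqrt((a * a + b * b) / 2)
C = (a * a + b * b) / (a + b)

assert math.isclose((H + C) / 2, A)        # A(H, C) = A(a, b)
assert math.isclose(math.sqrt(A * H), G)   # G(A, H) = G(a, b)
assert math.isclose(math.sqrt(A * C), R)   # G(A, C) = R(a, b)
```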

The contraharmonic mean of two variables can be constructed geometrically using a trapezoid.

Additional constructions

The contraharmonic mean can be constructed on a circle similar to the way the Pythagorean means of two variables are constructed. The contraharmonic mean is the remainder of the diameter on which the harmonic mean lies.

Statistical properties

The contraharmonic mean of a random variable is equal to the sum of the arithmetic mean and the variance divided by the arithmetic mean: [1]

C = (μ² + σ²)/μ = μ + σ²/μ

where μ is the mean and σ² the variance of the random variable. Since the variance is always ≥ 0, the contraharmonic mean is always greater than or equal to the arithmetic mean.
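A quick numerical confirmation of this identity for a sample list, under the convention that the variance is the population variance (dividing by n rather than n − 1):

```python
x = [1.0, 2.0, 2.0, 5.0, 10.0]
n = len(x)

mean = sum(x) / n
var = sum((v - mean) ** 2 for v in x) / n   # population variance
C = sum(v * v for v in x) / sum(x)          # contraharmonic mean

assert abs(C - (mean + var / mean)) < 1e-12
print(C, mean + var / mean)
```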

The ratio of the variance to the mean was proposed as a test statistic by Clapham. [2] This statistic is the contraharmonic mean less the arithmetic mean.

It is also related to Katz's statistic [3]

Jn = √(n/2)·(s²/m − 1)

where m is the mean, s² the variance and n is the sample size.

Jn is asymptotically normally distributed with a mean of zero and variance of 1.
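Assuming the formula above, a small sketch of Katz's statistic for a list of observations; whether the variance is computed with an n or n − 1 divisor is an assumption made here, not something fixed by the source:

```python
import math

def katz_statistic(x):
    """Katz's Jn = sqrt(n/2) * (s^2/m - 1), using the population (n-divisor) variance."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / n
    return math.sqrt(n / 2) * (s2 / m - 1)

print(katz_statistic([2, 3, 1, 4, 2, 3, 2, 1, 5, 2]))
```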

Uses in statistics

The problem of size-biased sampling was discussed by Cox in 1969 in the context of sampling fibres. The expectation of a size-biased sample is equal to the contraharmonic mean of the original distribution. [4]

The probability of a fibre being sampled is proportional to its length. Because of this the usual sample mean (arithmetic mean) is a biased estimator of the true mean. To see this, consider

g(x) = x·f(x)/m

where f(x) is the true population distribution, g(x) is the length-weighted distribution and m is the population mean. Taking the expectation of the sample mean under this length-weighted distribution gives the contraharmonic mean of f rather than its arithmetic mean. This problem can be overcome by taking instead the expectation of the harmonic mean (1/x). The expectation of 1/x under the length-weighted distribution is

E(1/x) = 1/m

and its variance under the length-weighted distribution is

Var(1/x) = E_f(1/x)/m − 1/m²

where E_f denotes expectation with respect to the true distribution f. Asymptotically, the sample mean of 1/x is normally distributed.
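A Monte Carlo sketch of the bias described above, using an exponential population purely as an example (for a rate-1 exponential population the true mean is about 1 and the contraharmonic mean about 2): the plain average of a length-biased sample approaches the contraharmonic mean of the population, while the reciprocal of the mean of 1/x recovers the true mean.

```python
import random

random.seed(0)
population = [random.expovariate(1.0) for _ in range(200_000)]

# Length-biased sample: each value is drawn with probability proportional to its size.
sample = random.choices(population, weights=population, k=100_000)

true_mean = sum(population) / len(population)
contraharmonic = sum(v * v for v in population) / sum(population)

biased_mean = sum(sample) / len(sample)                      # estimates the contraharmonic mean
harmonic_estimate = len(sample) / sum(1 / v for v in sample) # estimates the true mean

print(f"true mean             {true_mean:.3f}")
print(f"contraharmonic mean   {contraharmonic:.3f}")
print(f"mean of biased sample {biased_mean:.3f}")
print(f"1 / mean(1/x)         {harmonic_estimate:.3f}")
```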

The asymptotic efficiency of length-biased sampling, compared with random sampling, depends on the underlying distribution. If f(x) is log-normal the efficiency is 1, while if the population is gamma distributed with index b, the efficiency is b/(b − 1).

This size-biased distribution has been used in several areas. [5] [6]

The contraharmonic mean has also been used in image analysis. [7]

History

The contraharmonic mean was discovered by the Greek mathematician Eudoxus in the 4th century BCE.

References

  1. Kingsley MCS (1989) The distribution of hauled out ringed seals: an interpretation of Taylor's law. Oecologia 79: 106-110
  2. Clapham AR (1936) Overdispersion in grassland communities and the use of statistical methods in plant ecology. J Ecol 14: 232
  3. Katz L (1965) Unified treatment of a broad class of discrete probability distributions. In: Proceedings of the International Symposium on Discrete Distributions, Montreal
  4. Zelen M (1972) Length-biased sampling and biomedical problems. In Biometric Society Meeting, Dallas, Texas
  5. Keillor BD, D'Amico M & Horton V (2001) Global Consumer Tendencies. Psychology & Marketing 18(1) 1-19
  6. Sudman (1980) Quota sampling techniques and weighting procedures to correct for frequency bias
  7. Pathak M, Singh S (2014) Comparative analysis of image denoising techniques. International Journal of Computer Science & Engineering Technology 5 (2) 160-167