In mathematics, the harmonic mean is a kind of average, one of the Pythagorean means.
It is the most appropriate average for ratios and rates such as speeds, [1] [2] and is normally only used for positive arguments. [3]
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the numbers, that is, the generalized f-mean with . For example, the harmonic mean of 1, 4, and 4 is
The harmonic mean H of the positive real numbers is [4]
It is the reciprocal of the arithmetic mean of the reciprocals, and vice versa:
where the arithmetic mean is
The harmonic mean is a Schur-concave function, and is greater than or equal to the minimum of its arguments: for positive arguments, . Thus, the harmonic mean cannot be made arbitrarily large by changing some values to bigger ones (while having at least one value unchanged). [ citation needed ]
The harmonic mean is also concave for positive arguments, an even stronger property than Schur-concavity.[ citation needed ]
For all positive data sets containing at least one pair of nonequal values, the harmonic mean is always the least of the three Pythagorean means, [5] while the arithmetic mean is always the greatest of the three and the geometric mean is always in between. (If all values in a nonempty data set are equal, the three means are always equal.)
It is the special case M−1 of the power mean:
Since the harmonic mean of a list of numbers tends strongly toward the least elements of the list, it tends (compared to the arithmetic mean) to mitigate the impact of large outliers and aggravate the impact of small ones.
The arithmetic mean is often mistakenly used in places calling for the harmonic mean. [6] In the speed example below for instance, the arithmetic mean of 40 is incorrect, and too big.
The harmonic mean is related to the other Pythagorean means, as seen in the equation below. This can be seen by interpreting the denominator to be the arithmetic mean of the product of numbers n times but each time omitting the j-th term. That is, for the first term, we multiply all n numbers except the first; for the second, we multiply all n numbers except the second; and so on. The numerator, excluding the n, which goes with the arithmetic mean, is the geometric mean to the power n. Thus the n-th harmonic mean is related to the n-th geometric and arithmetic means. The general formula is
If a set of non-identical numbers is subjected to a mean-preserving spread — that is, two or more elements of the set are "spread apart" from each other while leaving the arithmetic mean unchanged — then the harmonic mean always decreases. [7]
For the special case of just two numbers, and , the harmonic mean can be written as: [4]
(Note that the harmonic mean is undefined if , i.e. .)
In this special case, the harmonic mean is related to the arithmetic mean and the geometric mean by [4]
Since by the inequality of arithmetic and geometric means, this shows for the n = 2 case that H≤G (a property that in fact holds for all n). It also follows that , meaning the two numbers' geometric mean equals the geometric mean of their arithmetic and harmonic means.
For the special case of three numbers, , and , the harmonic mean can be written as: [4]
Three positive numbers H, G, and A are respectively the harmonic, geometric, and arithmetic means of three positive numbers if and only if [8] : p.74, #1834 the following inequality holds
If a set of weights , ..., is associated to the data set , ..., , the weighted harmonic mean is defined by [9]
The unweighted harmonic mean can be regarded as the special case where all of the weights are equal.
The prime number theorem states that the number of primes less than or equal to is asymptotically equal to the harmonic mean of the first natural numbers. [10]
In many situations involving rates and ratios, the harmonic mean provides the correct average. For instance, if a vehicle travels a certain distance d outbound at a speed x (e.g. 60 km/h) and returns the same distance at a speed y (e.g. 20 km/h), then its average speed is the harmonic mean of x and y (30 km/h), not the arithmetic mean (40 km/h). The total travel time is the same as if it had traveled the whole distance at that average speed. This can be proven as follows: [11]
Average speed for the entire journey =Total distance traveled/Sum of time for each segment=2d/d/x + d/y = 2xy/x + y
However, if the vehicle travels for a certain amount of time at a speed x and then the same amount of time at a speed y, then its average speed is the arithmetic mean of x and y, which in the above example is 40 km/h.
Average speed for the entire journey =Total distance traveled/Sum of time for each segment=xt + yt/2t=x + y/2
The same principle applies to more than two segments: given a series of sub-trips at different speeds, if each sub-trip covers the same distance, then the average speed is the harmonic mean of all the sub-trip speeds; and if each sub-trip takes the same amount of time, then the average speed is the arithmetic mean of all the sub-trip speeds. (If neither is the case, then a weighted harmonic mean or weighted arithmetic mean is needed. For the arithmetic mean, the speed of each portion of the trip is weighted by the duration of that portion, while for the harmonic mean, the corresponding weight is the distance. In both cases, the resulting formula reduces to dividing the total distance by the total time.)
However, one may avoid the use of the harmonic mean for the case of "weighting by distance". Pose the problem as finding "slowness" of the trip where "slowness" (in hours per kilometre) is the inverse of speed. When trip slowness is found, invert it so as to find the "true" average trip speed. For each trip segment i, the slowness si = 1/speedi. Then take the weighted arithmetic mean of the si's weighted by their respective distances (optionally with the weights normalized so they sum to 1 by dividing them by trip length). This gives the true average slowness (in time per kilometre). It turns out that this procedure, which can be done with no knowledge of the harmonic mean, amounts to the same mathematical operations as one would use in solving this problem by using the harmonic mean. Thus it illustrates why the harmonic mean works in this case.
Similarly, if one wishes to estimate the density of an alloy given the densities of its constituent elements and their mass fractions (or, equivalently, percentages by mass), then the predicted density of the alloy (exclusive of typically minor volume changes due to atom packing effects) is the weighted harmonic mean of the individual densities, weighted by mass, rather than the weighted arithmetic mean as one might at first expect. To use the weighted arithmetic mean, the densities would have to be weighted by volume. Applying dimensional analysis to the problem while labeling the mass units by element and making sure that only like element-masses cancel makes this clear.
If one connects two electrical resistors in parallel, one having resistance x (e.g., 60 Ω) and one having resistance y (e.g., 40 Ω), then the effect is the same as if one had used two resistors with the same resistance, both equal to the harmonic mean of x and y (48 Ω): the equivalent resistance, in either case, is 24 Ω (one-half of the harmonic mean). This same principle applies to capacitors in series or to inductors in parallel.
However, if one connects the resistors in series, then the average resistance is the arithmetic mean of x and y (50 Ω), with total resistance equal to twice this, the sum of x and y (100 Ω). This principle applies to capacitors in parallel or to inductors in series.
As with the previous example, the same principle applies when more than two resistors, capacitors or inductors are connected, provided that all are in parallel or all are in series.
The "conductivity effective mass" of a semiconductor is also defined as the harmonic mean of the effective masses along the three crystallographic directions. [12]
As for other optic equations, the thin lens equation 1/f = 1/u + 1/v can be rewritten such that the focal length f is one-half of the harmonic mean of the distances of the subject u and object v from the lens. [13]
Two thin lenses of focal length f1 and f2 in series is equivalent to two thin lenses of focal length fhm, their harmonic mean, in series. Expressed as optical power, two thin lenses of optical powers P1 and P2 in series is equivalent to two thin lenses of optical power Pam, their arithmetic mean, in series.
The weighted harmonic mean is the preferable method for averaging multiples, such as the price–earnings ratio (P/E). If these ratios are averaged using a weighted arithmetic mean, high data points are given greater weights than low data points. The weighted harmonic mean, on the other hand, correctly weights each data point. [14] The simple weighted arithmetic mean when applied to non-price normalized ratios such as the P/E is biased upwards and cannot be numerically justified, since it is based on equalized earnings; just as vehicles speeds cannot be averaged for a roundtrip journey (see above). [15]
In any triangle, the radius of the incircle is one-third of the harmonic mean of the altitudes.
For any point P on the minor arc BC of the circumcircle of an equilateral triangle ABC, with distances q and t from B and C respectively, and with the intersection of PA and BC being at a distance y from point P, we have that y is half the harmonic mean of q and t. [16]
In a right triangle with legs a and b and altitude h from the hypotenuse to the right angle, h2 is half the harmonic mean of a2 and b2. [17] [18]
Let t and s (t > s) be the sides of the two inscribed squares in a right triangle with hypotenuse c. Then s2 equals half the harmonic mean of c2 and t2.
Let a trapezoid have vertices A, B, C, and D in sequence and have parallel sides AB and CD. Let E be the intersection of the diagonals, and let F be on side DA and G be on side BC such that FEG is parallel to AB and CD. Then FG is the harmonic mean of AB and DC. (This is provable using similar triangles.)
One application of this trapezoid result is in the crossed ladders problem, where two ladders lie oppositely across an alley, each with feet at the base of one sidewall, with one leaning against a wall at height A and the other leaning against the opposite wall at height B, as shown. The ladders cross at a height of h above the alley floor. Then h is half the harmonic mean of A and B. This result still holds if the walls are slanted but still parallel and the "heights" A, B, and h are measured as distances from the floor along lines parallel to the walls. This can be proved easily using the area formula of a trapezoid and area addition formula.
In an ellipse, the semi-latus rectum (the distance from a focus to the ellipse along a line parallel to the minor axis) is the harmonic mean of the maximum and minimum distances of the ellipse from a focus.
In computer science, specifically information retrieval and machine learning, the harmonic mean of the precision (true positives per predicted positive) and the recall (true positives per real positive) is often used as an aggregated performance score for the evaluation of algorithms and systems: the F-score (or F-measure). This is used in information retrieval because only the positive class is of relevance, while number of negatives, in general, is large and unknown. [19] It is thus a trade-off as to whether the correct positive predictions should be measured in relation to the number of predicted positives or the number of real positives, so it is measured versus a putative number of positives that is an arithmetic mean of the two possible denominators.
A consequence arises from basic algebra in problems where people or systems work together. As an example, if a gas-powered pump can drain a pool in 4 hours and a battery-powered pump can drain the same pool in 6 hours, then it will take both pumps 6·4/6 + 4, which is equal to 2.4 hours, to drain the pool together. This is one-half of the harmonic mean of 6 and 4: 2·6·4/6 + 4 = 4.8. That is, the appropriate average for the two types of pump is the harmonic mean, and with one pair of pumps (two pumps), it takes half this harmonic mean time, while with two pairs of pumps (four pumps) it would take a quarter of this harmonic mean time.
In hydrology, the harmonic mean is similarly used to average hydraulic conductivity values for a flow that is perpendicular to layers (e.g., geologic or soil) - flow parallel to layers uses the arithmetic mean. This apparent difference in averaging is explained by the fact that hydrology uses conductivity, which is the inverse of resistivity.
In sabermetrics, a baseball player's Power–speed number is the harmonic mean of their home run and stolen base totals.
In population genetics, the harmonic mean is used when calculating the effects of fluctuations in the census population size on the effective population size. The harmonic mean takes into account the fact that events such as population bottleneck increase the rate genetic drift and reduce the amount of genetic variation in the population. This is a result of the fact that following a bottleneck very few individuals contribute to the gene pool limiting the genetic variation present in the population for many generations to come.
When considering fuel economy in automobiles two measures are commonly used – miles per gallon (mpg), and litres per 100 km. As the dimensions of these quantities are the inverse of each other (one is distance per volume, the other volume per distance) when taking the mean value of the fuel economy of a range of cars one measure will produce the harmonic mean of the other – i.e., converting the mean value of fuel economy expressed in litres per 100 km to miles per gallon will produce the harmonic mean of the fuel economy expressed in miles per gallon. For calculating the average fuel consumption of a fleet of vehicles from the individual fuel consumptions, the harmonic mean should be used if the fleet uses miles per gallon, whereas the arithmetic mean should be used if the fleet uses litres per 100 km. In the USA the CAFE standards (the federal automobile fuel consumption standards) make use of the harmonic mean.
In chemistry and nuclear physics the average mass per particle of a mixture consisting of different species (e.g., molecules or isotopes) is given by the harmonic mean of the individual species' masses weighted by their respective mass fraction.
The harmonic mean of a beta distribution with shape parameters α and β is:
The harmonic mean with α < 1 is undefined because its defining expression is not bounded in [0, 1].
Letting α = β
showing that for α = β the harmonic mean ranges from 0 for α = β = 1, to 1/2 for α = β → ∞.
The following are the limits with one parameter finite (non-zero) and the other parameter approaching these limits:
With the geometric mean the harmonic mean may be useful in maximum likelihood estimation in the four parameter case.
A second harmonic mean (H1 − X) also exists for this distribution
This harmonic mean with β < 1 is undefined because its defining expression is not bounded in [ 0, 1 ].
Letting α = β in the above expression
showing that for α = β the harmonic mean ranges from 0, for α = β = 1, to 1/2, for α = β → ∞.
The following are the limits with one parameter finite (non zero) and the other approaching these limits:
Although both harmonic means are asymmetric, when α = β the two means are equal.
The harmonic mean ( H ) of the lognormal distribution of a random variable X is [20]
where μ and σ2 are the parameters of the distribution, i.e. the mean and variance of the distribution of the natural logarithm of X.
The harmonic and arithmetic means of the distribution are related by
where Cv and μ* are the coefficient of variation and the mean of the distribution respectively..
The geometric (G), arithmetic and harmonic means of the distribution are related by [21]
The harmonic mean of type 1 Pareto distribution is [22]
where k is the scale parameter and α is the shape parameter.
For a random sample, the harmonic mean is calculated as above. Both the mean and the variance may be infinite (if it includes at least one term of the form 1/0).
The mean of the sample m is asymptotically distributed normally with variance s2.
The variance of the mean itself is [23]
where m is the arithmetic mean of the reciprocals, x are the variates, n is the population size and E is the expectation operator.
Assuming that the variance is not infinite and that the central limit theorem applies to the sample then using the delta method, the variance is
where H is the harmonic mean, m is the arithmetic mean of the reciprocals
s2 is the variance of the reciprocals of the data
and n is the number of data points in the sample.
A jackknife method of estimating the variance is possible if the mean is known. [24] This method is the usual 'delete 1' rather than the 'delete m' version.
This method first requires the computation of the mean of the sample (m)
where x are the sample values.
A series of value wi is then computed where
The mean (h) of the wi is then taken:
The variance of the mean is
Significance testing and confidence intervals for the mean can then be estimated with the t test.
Assume a random variate has a distribution f( x ). Assume also that the likelihood of a variate being chosen is proportional to its value. This is known as length based or size biased sampling.
Let μ be the mean of the population. Then the probability density function f*( x ) of the size biased population is
The expectation of this length biased distribution E*( x ) is [23]
where σ2 is the variance.
The expectation of the harmonic mean is the same as the non-length biased version E( x )
The problem of length biased sampling arises in a number of areas including textile manufacture [25] pedigree analysis [26] and survival analysis [27]
Akman et al. have developed a test for the detection of length based bias in samples. [28]
If X is a positive random variable and q > 0 then for all ε > 0 [29]
Assuming that X and E(X) are > 0 then [29]
This follows from Jensen's inequality.
Gurland has shown that [30] for a distribution that takes only positive values, for any n > 0
Under some conditions [31]
where ~ means approximately equal to.
Assuming that the variates (x) are drawn from a lognormal distribution there are several possible estimators for H:
where
Of these H3 is probably the best estimator for samples of 25 or more. [32]
A first order approximation to the bias and variance of H1 are [33]
where Cv is the coefficient of variation.
Similarly a first order approximation to the bias and variance of H3 are [33]
In numerical experiments H3 is generally a superior estimator of the harmonic mean than H1. [33] H2 produces estimates that are largely similar to H1.
The Environmental Protection Agency recommends the use of the harmonic mean in setting maximum toxin levels in water. [34]
In geophysical reservoir engineering studies, the harmonic mean is widely used. [35]
A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means in mathematics, especially in statistics. Each attempts to summarize or typify a given group of data, illustrating the magnitude and sign of the data set. Which of these measures is most illuminating depends on what is being measured, and on context and purpose.
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by , , , , or .
The weighted arithmetic mean is similar to an ordinary arithmetic mean, except that instead of each of the data points contributing equally to the final average, some data points contribute more than others. The notion of weighted mean plays a role in descriptive statistics and also occurs in a more general form in several other areas of mathematics.
In probability theory, the central limit theorem (CLT) states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed. There are several versions of the CLT, each applying in the context of different conditions.
In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate; the distance parameter could be any meaningful mono-dimensional measure of the process, such as time between production errors, or length along a roll of fabric in the weaving manufacturing process. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.
In probability theory and statistics, the chi-squared distribution with degrees of freedom is the distribution of a sum of the squares of independent standard normal random variables.
In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] or in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.
In mathematics, the inequality of arithmetic and geometric means, or more briefly the AM–GM inequality, states that the arithmetic mean of a list of non-negative real numbers is greater than or equal to the geometric mean of the same list; and further, that the two means are equal if and only if every number in the list is the same.
In mathematical analysis, Cesàro summation assigns values to some infinite sums that are not necessarily convergent in the usual sense. The Cesàro sum is defined as the limit, as n tends to infinity, of the sequence of arithmetic means of the first n partial sums of the series.
Weighted least squares (WLS), also known as weighted linear regression, is a generalization of ordinary least squares and linear regression in which knowledge of the unequal variance of observations (heteroscedasticity) is incorporated into the regression. WLS is also a specialization of generalized least squares, when all the off-diagonal entries of the covariance matrix of the errors, are null.
In statistics, simple linear regression (SLR) is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable and finds a linear function that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.
In statistics, Poisson regression is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable Y has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables.
In statistics, the delta method is a method of deriving the asymptotic distribution of a random variable. It is applicable when the random variable being considered can be defined as a differentiable function of a random variable which is asymptotically Gaussian.
In mathematics, the three classical Pythagorean means are the arithmetic mean (AM), the geometric mean (GM), and the harmonic mean (HM). These means were studied with proportions by Pythagoreans and later generations of Greek mathematicians because of their importance in geometry and music.
In mathematics, a contraharmonic mean is a function complementary to the harmonic mean. The contraharmonic mean is a special case of the Lehmer mean, , where p = 2.
In mathematics and statistics, a circular mean or angular mean is a mean designed for angles and similar cyclic quantities, such as times of day, and fractional parts of real numbers.
In probability theory and statistics, the normal-gamma distribution is a bivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a normal distribution with unknown mean and precision.
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known constant mean rate and independently of the time since the last event. It can also be used for the number of events in other types of intervals than time, and in dimension greater than 1.
In mathematics and statistics, the Fréchet mean is a generalization of centroids to metric spaces, giving a single representative point or central tendency for a cluster of points. It is named after Maurice Fréchet. Karcher mean is the renaming of the Riemannian Center of Mass construction developed by Karsten Grove and Hermann Karcher. On the real numbers, the arithmetic mean, median, geometric mean, and harmonic mean can all be interpreted as Fréchet means for different distance functions.
{{cite web}}
: CS1 maint: archived copy as title (link).{{cite web}}
: CS1 maint: archived copy as title (link)