Quasi-arithmetic mean

In mathematics and statistics, the quasi-arithmetic mean, generalised f-mean, or Kolmogorov–Nagumo–de Finetti mean [1] is a generalisation of the more familiar means, such as the arithmetic mean and the geometric mean, using a function f. It is also called the Kolmogorov mean after the Soviet mathematician Andrey Kolmogorov. It is a broader generalization than the regular generalized mean.

Definition

If f is a function which maps an interval I of the real line to the real numbers, and is both continuous and injective, the f-mean of the numbers x_1, ..., x_n in I is defined as

    M_f(x_1, ..., x_n) = f^{-1}( (f(x_1) + ... + f(x_n)) / n ),

which can also be written

    M_f(x) = f^{-1}( (1/n) Σ_{k=1}^{n} f(x_k) ).

We require f to be injective in order for the inverse function f^{-1} to exist. Since f is defined over an interval, (f(x_1) + ... + f(x_n)) / n lies within the domain of f^{-1}.

Since f is injective and continuous, it is strictly monotonic, and therefore the f-mean is neither larger than the largest number of the tuple x nor smaller than the smallest number in x.

Examples

If f(x) = x, the f-mean is the arithmetic mean; if f(x) = log x, it is the geometric mean; if f(x) = 1/x, it is the harmonic mean; and if f(x) = x^p with p ≠ 0, it is the power mean with exponent p.
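These classical special cases can be checked numerically with a direct implementation of the definition; the function names below are ours, not standard terminology:

```python
import math
import statistics

def f_mean(f, f_inv, xs):
    # Quasi-arithmetic mean: push each value through f,
    # take the ordinary arithmetic mean, then pull back through f^{-1}.
    return f_inv(sum(f(x) for x in xs) / len(xs))

xs = [2.0, 8.0, 9.0]

arith = f_mean(lambda x: x, lambda y: y, xs)          # f(x) = x
geom  = f_mean(math.log, math.exp, xs)                # f(x) = log x
harm  = f_mean(lambda x: 1.0 / x, lambda y: 1.0 / y, xs)  # f(x) = 1/x

assert math.isclose(arith, statistics.fmean(xs))
assert math.isclose(geom, statistics.geometric_mean(xs))
assert math.isclose(harm, statistics.harmonic_mean(xs))
```

Each identity reduces to the corresponding closed form; for instance, with f = log, f^{-1} of the averaged logs is exactly the n-th root of the product.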

Properties

The following properties hold for M_f for any single function f:

Symmetry: The value of M_f is unchanged if its arguments are permuted.

Idempotency: for all x, M_f(x, ..., x) = x.

Monotonicity: M_f is monotonic in each of its arguments (since f is monotonic).

Continuity: M_f is continuous in each of its arguments (since f is continuous).

Replacement: Subsets of elements can be averaged a priori, without altering the mean, given that the multiplicity of elements is maintained. With m = M_f(x_1, ..., x_k) it holds: M_f(x_1, ..., x_k, x_{k+1}, ..., x_n) = M_f(m, ..., m, x_{k+1}, ..., x_n), where m appears k times.

Partitioning: The computation of the mean can be split into computations of equal-sized sub-blocks: M_f(x_1, ..., x_{n·k}) = M_f( M_f(x_1, ..., x_k), M_f(x_{k+1}, ..., x_{2k}), ..., M_f(x_{(n-1)·k+1}, ..., x_{n·k}) ).

Self-distributivity: For any quasi-arithmetic mean M of two variables: M(x, M(y, z)) = M(M(x, y), M(x, z)).

Mediality: For any quasi-arithmetic mean M of two variables: M(M(x, y), M(z, w)) = M(M(x, z), M(y, w)).

Balancing: For any quasi-arithmetic mean M of two variables: M(M(x, M(x, y)), M(y, M(x, y))) = M(x, y).

Central limit theorem: Under regularity conditions, for a sufficiently large i.i.d. sample X_1, ..., X_n, the quantity √n · (M_f(X_1, ..., X_n) − f^{-1}(E[f(X_1)])) is approximately normal. [2] A similar result is available for Bajraktarević means and deviation means, which are generalizations of quasi-arithmetic means. [3] [4]

Scale-invariance: The quasi-arithmetic mean is invariant with respect to offsets and scaling of f: for all a and all b ≠ 0, if g(t) = a + b·f(t), then M_g = M_f.
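The algebraic identities above are easy to verify numerically for a concrete f. The sketch below uses f(x) = log x, i.e. the geometric mean; the helper names are ours:

```python
import math

def m(*xs):
    # f-mean with f = log: exp of the averaged logs (the geometric mean).
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

x, y, z, w = 2.0, 3.0, 5.0, 7.0

# Self-distributivity: M(x, M(y, z)) = M(M(x, y), M(x, z))
assert math.isclose(m(x, m(y, z)), m(m(x, y), m(x, z)))

# Mediality: M(M(x, y), M(z, w)) = M(M(x, z), M(y, w))
assert math.isclose(m(m(x, y), m(z, w)), m(m(x, z), m(y, w)))

# Partitioning: the mean of four values equals the mean of two block means.
assert math.isclose(m(x, y, z, w), m(m(x, y), m(z, w)))

# Scale-invariance: g(t) = a + b*log(t) defines the same mean as log.
a, b = 4.0, -2.5
g = lambda t: a + b * math.log(t)
g_inv = lambda s: math.exp((s - a) / b)
assert math.isclose(g_inv((g(x) + g(y)) / 2), m(x, y))
```

Replacing log/exp with any other continuous injective pair leaves every assertion valid, which is exactly the content of the properties.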

Characterization

There are several different sets of properties that characterize the quasi-arithmetic mean (i.e., each mean satisfying these properties is an f-mean for some function f).

Homogeneity

Means are usually homogeneous, but for most functions f, the f-mean is not. Indeed, the only homogeneous quasi-arithmetic means are the power means (including the geometric mean); see Hardy–Littlewood–Pólya, page 68.

The homogeneity property can be achieved by normalizing the input values by some (homogeneous) mean C:

    M_{f,C}(x_1, ..., x_n) = C(x_1, ..., x_n) · f^{-1}( (f(x_1 / C(x_1, ..., x_n)) + ... + f(x_n / C(x_1, ..., x_n))) / n ).

However, this modification may violate monotonicity and the partitioning property of the mean.
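A minimal sketch of this normalization, taking f(x) = exp(x) (whose plain f-mean is not homogeneous) and the geometric mean as the homogeneous normalizer C; all function names are ours:

```python
import math

def geometric_mean(xs):
    # A homogeneous mean used as the normalizer C.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def normalized_f_mean(f, f_inv, xs):
    # M_{f,C}: rescale the inputs by C(xs), take the f-mean of the
    # ratios, then scale the result back up by C(xs).
    c = geometric_mean(xs)
    return c * f_inv(sum(f(x / c) for x in xs) / len(xs))

xs = [1.0, 4.0, 10.0]
m1 = normalized_f_mean(math.exp, math.log, xs)
m2 = normalized_f_mean(math.exp, math.log, [7.0 * x for x in xs])
assert math.isclose(m2, 7.0 * m1)  # homogeneous of degree 1
```

Scaling every input by a constant scales C by the same constant, leaves the ratios x_i / C unchanged, and therefore scales the result linearly, which is the claimed homogeneity.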

Generalizations

Consider a Legendre-type strictly convex function F. Then the gradient map ∇F is globally invertible, and the weighted multivariate quasi-arithmetic mean [9] is defined by

    M_{∇F}(θ_1, ..., θ_n; w) = (∇F)^{-1}( Σ_{i=1}^{n} w_i ∇F(θ_i) ),

where w is a normalized weight vector (w_i = 1/n by default for a balanced average). From the convex duality, we get a dual quasi-arithmetic mean M_{∇F*} associated to the quasi-arithmetic mean M_{∇F}, where F* denotes the convex conjugate of F. For example, take F(X) = −log det(X) for X a symmetric positive-definite matrix, so that ∇F(X) = −X^{-1}. The pair of matrix quasi-arithmetic means yields the matrix harmonic mean:

    M_{∇F}(θ_1, θ_2) = 2 (θ_1^{-1} + θ_2^{-1})^{-1}.
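This matrix example can be sketched directly with NumPy, under the stated choice F(X) = −log det(X), whose gradient map is X ↦ −X^{-1}; the function name is ours:

```python
import numpy as np

def matrix_qa_mean(thetas, weights=None):
    # Quasi-arithmetic mean induced by grad F(X) = -X^{-1}:
    # average the (weighted) gradients, then invert the gradient map.
    n = len(thetas)
    w = weights if weights is not None else [1.0 / n] * n
    avg_grad = sum(wi * (-np.linalg.inv(t)) for wi, t in zip(w, thetas))
    return -np.linalg.inv(avg_grad)

a = np.array([[2.0, 0.3], [0.3, 1.0]])
b = np.array([[3.0, -0.2], [-0.2, 2.0]])

m = matrix_qa_mean([a, b])
# For two equally weighted SPD matrices this is the matrix harmonic mean:
expected = 2.0 * np.linalg.inv(np.linalg.inv(a) + np.linalg.inv(b))
assert np.allclose(m, expected)
```

With equal weights the averaged gradient is −(θ_1^{-1} + θ_2^{-1})/2, and inverting the gradient map gives exactly 2(θ_1^{-1} + θ_2^{-1})^{-1}.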


References

  1. Nielsen, Frank; Nock, Richard (June 2017). "Generalizing skew Jensen divergences and Bregman divergences with comparative convexity". IEEE Signal Processing Letters. 24 (8). arXiv:1702.04877. Bibcode:2017ISPL...24.1123N. doi:10.1109/LSP.2017.2712195. S2CID 31899023.
  2. de Carvalho, Miguel (2016). "Mean, what do you Mean?". The American Statistician. 70 (3): 764–776. doi:10.1080/00031305.2016.1148632. hdl:20.500.11820/fd7a8991-69a4-4fe5-876f-abcd2957a88c. S2CID 219595024.
  3. Barczy, Mátyás; Burai, Pál (2022-04-01). "Limit theorems for Bajraktarević and Cauchy quotient means of independent identically distributed random variables". Aequationes Mathematicae. 96 (2): 279–305. doi:10.1007/s00010-021-00813-x. ISSN   1420-8903.
  4. Barczy, Mátyás; Páles, Zsolt (2023-09-01). "Limit Theorems for Deviation Means of Independent and Identically Distributed Random Variables". Journal of Theoretical Probability. 36 (3): 1626–1666. doi:10.1007/s10959-022-01225-6. ISSN   1572-9230.
  5. Aczél, J.; Dhombres, J. G. (1989). Functional equations in several variables. With applications to mathematics, information theory and to the natural and social sciences. Encyclopedia of Mathematics and its Applications, 31. Cambridge: Cambridge Univ. Press.
  6. Grudkin, Anton (2019). "Characterization of the quasi-arithmetic mean". Math stackexchange.
  7. Aumann, Georg (1937). "Vollkommene Funktionalmittel und gewisse Kegelschnitteigenschaften". Journal für die reine und angewandte Mathematik . 1937 (176): 49–55. doi:10.1515/crll.1937.176.49. S2CID   115392661.
  8. Aumann, Georg (1934). "Grundlegung der Theorie der analytischen Mittelwerte". Sitzungsberichte der Bayerischen Akademie der Wissenschaften: 45–81.
  9. Nielsen, Frank (2023). "Beyond scalar quasi-arithmetic means: Quasi-arithmetic averages and quasi-arithmetic mixtures in information geometry". arXiv: 2301.10980 [cs.IT].

  10. Burai, Pál; Kiss, Gergely; Szokol, Patricia (2021). "Characterization of quasi-arithmetic means without regularity condition". Acta Mathematica Hungarica. 165 (2): 474–485. MR 4355191.
  11. Burai, Pál; Kiss, Gergely; Szokol, Patricia (2023). "A dichotomy result for strictly increasing bisymmetric maps". Journal of Mathematical Analysis and Applications. 526 (2): Paper No. 127269, 9 pp. MR 4574540.