In mathematics and statistics, the quasi-arithmetic mean or generalised f-mean or Kolmogorov–Nagumo–de Finetti mean [1] is one generalisation of the more familiar means such as the arithmetic mean and the geometric mean, using a function $f$. It is also called the Kolmogorov mean after Soviet mathematician Andrey Kolmogorov. It is a broader generalization than the regular generalized mean.
If $f$ is a function which maps an interval $I$ of the real line to the real numbers, and is both continuous and injective, the $f$-mean of $n$ numbers $x_1, \dots, x_n \in I$ is defined as $M_f(x_1, \dots, x_n) = f^{-1}\!\left(\frac{f(x_1) + \cdots + f(x_n)}{n}\right)$, which can also be written $M_f(\bar{x}) = f^{-1}\!\left(\frac{1}{n} \sum_{k=1}^{n} f(x_k)\right)$.
We require $f$ to be injective in order for the inverse function $f^{-1}$ to exist. Since $f$ is defined over an interval, $\frac{f(x_1) + \cdots + f(x_n)}{n}$ lies within the domain of $f^{-1}$.
Since $f$ is injective and continuous, it follows that $f$ is a strictly monotonic function, and therefore that the $f$-mean is neither larger than the largest number of the tuple $x$ nor smaller than the smallest number in $x$.
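As a concrete illustration, the following Python sketch implements the definition above; the helper name `f_mean` and the use of NumPy are illustrative choices, not part of any standard API.

```python
import numpy as np

def f_mean(x, f, f_inv):
    """Quasi-arithmetic mean M_f: apply f, average, then invert."""
    x = np.asarray(x, dtype=float)
    return f_inv(np.mean(f(x)))

data = [1.0, 2.0, 4.0]
# f(t) = t      recovers the arithmetic mean
print(f_mean(data, lambda t: t, lambda t: t))       # 2.333...
# f(t) = log t  recovers the geometric mean
print(f_mean(data, np.log, np.exp))                 # 2.0
# f(t) = 1/t    recovers the harmonic mean
print(f_mean(data, lambda t: 1/t, lambda t: 1/t))   # 1.714...
```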
The following properties hold for $M_f$ for any single function $f$:
Symmetry: The value of $M_f$ is unchanged if its arguments are permuted.
Idempotency: for all x, $M_f(x, \dots, x) = x$.
Monotonicity: $M_f$ is monotonic in each of its arguments (since $f$ is monotonic).
Continuity: $M_f$ is continuous in each of its arguments (since $f$ is continuous).
Replacement: Subsets of elements can be averaged a priori, without altering the mean, given that the multiplicity of elements is maintained. With $m = M_f(x_1, \dots, x_k)$ it holds: $M_f(x_1, \dots, x_k, x_{k+1}, \dots, x_n) = M_f(\underbrace{m, \dots, m}_{k \text{ times}}, x_{k+1}, \dots, x_n)$.
Partitioning: The computation of the mean can be split into computations of equal-sized sub-blocks: $M_f(x_1, \dots, x_{nk}) = M_f\big(M_f(x_1, \dots, x_k), M_f(x_{k+1}, \dots, x_{2k}), \dots, M_f(x_{(n-1)k+1}, \dots, x_{nk})\big)$ (a numerical check of this and the scale-invariance property follows this list).
Self-distributivity: For any quasi-arithmetic mean of two variables: $M_f(x, M_f(y, z)) = M_f(M_f(x, y), M_f(x, z))$.
Mediality: For any quasi-arithmetic mean of two variables: $M_f(M_f(x, y), M_f(z, w)) = M_f(M_f(x, z), M_f(y, w))$.
Balancing: For any quasi-arithmetic mean of two variables: $M_f\big(M_f(x, M_f(x, y)), M_f(y, M_f(x, y))\big) = M_f(x, y)$.
Central limit theorem: Under regularity conditions, for a sufficiently large sample, $M_f(X_1, \dots, X_n)$ is approximately normal. [2] A similar result is available for Bajraktarević means and deviation means, which are generalizations of quasi-arithmetic means. [3][4]
Scale-invariance: The quasi-arithmetic mean is invariant with respect to offsets and scaling of $f$: $\forall a\ \forall b \neq 0\ \big((\forall t\ g(t) = a + b \cdot f(t)) \Rightarrow \forall x\ M_f(x) = M_g(x)\big)$.
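The partitioning and scale-invariance properties can be checked numerically; the sketch below (reusing the illustrative `f_mean` helper from above) does so for the geometric mean.

```python
import numpy as np

def f_mean(x, f, f_inv):
    return f_inv(np.mean(f(np.asarray(x, dtype=float))))

x = [1.0, 2.0, 4.0, 8.0]
f, f_inv = np.log, np.exp  # geometric mean

# Partitioning: the mean of equal-sized block means equals the overall mean
whole  = f_mean(x, f, f_inv)
blocks = f_mean([f_mean(x[:2], f, f_inv), f_mean(x[2:], f, f_inv)], f, f_inv)
print(np.isclose(whole, blocks))  # True

# Scale-invariance: g(t) = a + b*f(t) yields the same mean
a, b = 3.0, -2.0
g, g_inv = (lambda t: a + b * f(t)), (lambda t: f_inv((t - a) / b))
print(np.isclose(whole, f_mean(x, g, g_inv)))  # True
```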
There are several different sets of properties that characterize the quasi-arithmetic mean (i.e., each function that satisfies these properties is an f-mean for some function f).
Means are usually homogeneous, but for most functions $f$, the $f$-mean is not. Indeed, the only homogeneous quasi-arithmetic means are the power means (including the geometric mean); see Hardy–Littlewood–Pólya, page 68.
The homogeneity property can be achieved by normalizing the input values by some (homogeneous) mean $C$: $M_{f,C}(x_1, \dots, x_n) = C \cdot f^{-1}\!\left(\frac{f(x_1/C) + \cdots + f(x_n/C)}{n}\right)$, where $C = C(x_1, \dots, x_n)$.
However, this modification may violate the monotonicity and partitioning properties of the mean.
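A minimal sketch of this normalization, assuming the arithmetic mean as the homogenizing mean $C$ (the function name `homogenized_f_mean` is illustrative):

```python
import numpy as np

def homogenized_f_mean(x, f, f_inv, C=np.mean):
    """M_{f,C}(x) = C(x) * f^{-1}( average of f(x_i / C(x)) )."""
    x = np.asarray(x, dtype=float)
    c = C(x)
    return c * f_inv(np.mean(f(x / c)))

x = np.array([1.0, 2.0, 4.0])
# The exponential mean (f = exp) is not homogeneous, but its normalized
# version scales linearly with the input:
print(homogenized_f_mean(3 * x, np.exp, np.log)
      / homogenized_f_mean(x, np.exp, np.log))  # 3.0
```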
Consider a Legendre-type strictly convex function $F$. Then the gradient map $\nabla F$ is globally invertible and the weighted multivariate quasi-arithmetic mean [9] is defined by $M_{\nabla F}(\theta_1, \dots, \theta_n; w) = (\nabla F)^{-1}\!\left(\sum_{i=1}^{n} w_i \nabla F(\theta_i)\right)$, where $w$ is a normalized weight vector ($w_i = \frac{1}{n}$ by default for a balanced average). From the convex duality, we get a dual quasi-arithmetic mean $M_{\nabla F^*}$ associated to the quasi-arithmetic mean $M_{\nabla F}$. For example, take $F(X) = -\log \det(X)$ for $X$ a symmetric positive-definite matrix. The pair of matrix quasi-arithmetic means yields the matrix harmonic mean: $M_{\nabla F}(\theta_1, \dots, \theta_n) = \left(\sum_{i=1}^{n} w_i \theta_i^{-1}\right)^{-1}$.
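For concreteness, a small NumPy sketch of the matrix harmonic mean above; the function name and test matrices are illustrative only.

```python
import numpy as np

def matrix_harmonic_mean(thetas, weights=None):
    """M_{grad F} for F(X) = -log det(X): invert, average, invert back."""
    n = len(thetas)
    weights = np.full(n, 1.0 / n) if weights is None else np.asarray(weights)
    avg_inv = sum(w * np.linalg.inv(t) for w, t in zip(weights, thetas))
    return np.linalg.inv(avg_inv)

A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[4.0, 1.0], [1.0, 3.0]])
print(matrix_harmonic_mean([A, B]))  # symmetric positive-definite result
```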
In mathematics, the arithmetic–geometric mean of two positive real numbers x and y is the mutual limit of a sequence of arithmetic means and a sequence of geometric means. The arithmetic–geometric mean is used in fast algorithms for exponential and trigonometric functions and other special functions, as well as for some mathematical constants, in particular in computing π.
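A minimal sketch of the AGM iteration (the tolerance and names are illustrative):

```python
def agm(x, y, tol=1e-15):
    """Arithmetic-geometric mean: iterate the two means to their common limit."""
    while abs(x - y) > tol * max(x, y):
        x, y = (x + y) / 2, (x * y) ** 0.5  # arithmetic and geometric steps
    return x

print(agm(1.0, 2.0))  # ~1.456791..., reached in only a handful of iterations
```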
In vector calculus, the gradient of a scalar-valued differentiable function $f$ of several variables is the vector field $\nabla f$ whose value at a point gives the direction and the rate of fastest increase. The gradient transforms like a vector under change of basis of the space of variables of $f$. If the gradient of a function is non-zero at a point $p$, the direction of the gradient is the direction in which the function increases most quickly from $p$, and the magnitude of the gradient is the rate of increase in that direction, the greatest absolute directional derivative. Further, a point where the gradient is the zero vector is known as a stationary point. The gradient thus plays a fundamental role in optimization theory, where it is used to minimize a function by gradient descent. In coordinate-free terms, the gradient of a function $f(\mathbf{r})$ may be defined by $df = \nabla f \cdot d\mathbf{r}$, where $df$ is the total infinitesimal change in $f$ for an infinitesimal displacement $d\mathbf{r}$.
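Since the paragraph mentions gradient descent, here is a minimal sketch for a simple quadratic; the step size, iteration count, and test function are arbitrary illustrative choices.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x, y) = (x - 1)^2 + 2*(y + 2)^2, whose gradient is given below.
grad = lambda v: np.array([2 * (v[0] - 1), 4 * (v[1] + 2)])
print(gradient_descent(grad, [0.0, 0.0]))  # ~[1, -2], the minimizer
```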
In mathematics and physics, Laplace's equation is a second-order partial differential equation named after Pierre-Simon Laplace, who first studied its properties. This is often written as $\nabla^2 f = 0$ or $\Delta f = 0$, where $\Delta = \nabla \cdot \nabla = \nabla^2$ is the Laplace operator, $\nabla \cdot$ is the divergence operator, $\nabla$ is the gradient operator, and $f$ is a twice-differentiable real-valued function. The Laplace operator therefore maps a scalar function to another scalar function.
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
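As an illustration, the following sketch finds the maximum likelihood estimate of an exponential scale parameter numerically and compares it with the closed-form answer (the sample mean); the data and names are synthetic.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)  # true scale = 2

# Negative log-likelihood of an exponential model with scale parameter s
nll = lambda s: len(data) * np.log(s) + data.sum() / s

fit = minimize_scalar(nll, bounds=(0.01, 100.0), method="bounded")
print(fit.x, data.mean())  # the numeric MLE matches the sample mean
```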
In probability theory, the law of large numbers (LLN) is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the LLN states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.
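A quick simulation illustrating the LLN (the seed and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.uniform(0.0, 1.0, size=1_000_000)  # true mean is 0.5
for n in (10, 1_000, 1_000_000):
    print(n, draws[:n].mean())  # the running sample mean approaches 0.5
```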
In mathematics, the Laplace operator or Laplacian is a differential operator given by the divergence of the gradient of a scalar function on Euclidean space. It is usually denoted by the symbols $\nabla \cdot \nabla$, $\nabla^2$ (where $\nabla$ is the nabla operator), or $\Delta$. In a Cartesian coordinate system, the Laplacian is given by the sum of second partial derivatives of the function with respect to each independent variable. In other coordinate systems, such as cylindrical and spherical coordinates, the Laplacian also has a useful form. Informally, the Laplacian $\Delta f(p)$ of a function $f$ at a point $p$ measures by how much the average value of $f$ over small spheres or balls centered at $p$ deviates from $f(p)$.
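The informal "average over small spheres" description is visible in the standard five-point finite-difference stencil; the following sketch (names illustrative) applies it to a function with a known Laplacian.

```python
import numpy as np

def laplacian_5pt(f, x, y, h=1e-4):
    """Five-point stencil: 4 * (average of 4 neighbours - centre) / h^2."""
    avg = (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)) / 4.0
    return 4.0 * (avg - f(x, y)) / h**2

f = lambda x, y: x**2 + 3 * y**2    # exact Laplacian: 2 + 6 = 8
print(laplacian_5pt(f, 0.7, -1.2))  # ~8
```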
In mathematics and physical science, spherical harmonics are special functions defined on the surface of a sphere. They are often employed in solving partial differential equations in many scientific fields. The table of spherical harmonics contains a list of common spherical harmonics.
In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience: useful algebraic properties enable the user to calculate expectations and covariances by differentiation. It is also chosen for generality, as exponential families are in a sense very natural sets of distributions to consider. The term exponential class is sometimes used in place of "exponential family", as is the older term Koopman–Darmois family. Sometimes loosely referred to as "the" exponential family, this class of distributions is distinct because they all possess a variety of desirable properties, most importantly the existence of a sufficient statistic.
In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type of exact simulation method. The method works for any distribution in $\mathbb{R}^m$ with a density.
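A minimal sketch of the accept-reject step, here targeting a Beta(2, 2) density from a uniform proposal; the bound M = 1.5 is the exact maximum of the density ratio in this example, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def rejection_sample(target_pdf, proposal_draw, proposal_pdf, M, size):
    """Accept a proposal x with probability target(x) / (M * proposal(x))."""
    out = []
    while len(out) < size:
        x = proposal_draw()
        if rng.uniform() < target_pdf(x) / (M * proposal_pdf(x)):
            out.append(x)
    return np.array(out)

# Target: Beta(2, 2) density 6x(1-x) on [0, 1]; proposal: Uniform(0, 1).
target = lambda x: 6.0 * x * (1.0 - x)
samples = rejection_sample(target, lambda: rng.uniform(),
                           lambda x: 1.0, M=1.5, size=10_000)
print(samples.mean())  # ~0.5, the mean of Beta(2, 2)
```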
In multivariable calculus, the implicit function theorem is a tool that allows relations to be converted to functions of several real variables. It does so by representing the relation as the graph of a function. There may not be a single function whose graph can represent the entire relation, but there may be such a function on a restriction of the domain of the relation. The implicit function theorem gives a sufficient condition to ensure that there is such a function.
In mathematical statistics, the Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X. Formally, it is the variance of the score, or the expected value of the observed information.
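A numerical illustration: for a Bernoulli(p) observation the Fisher information is $1/(p(1-p))$, which the variance of the simulated score reproduces (the seed and sample size are arbitrary).

```python
import numpy as np

p = 0.3
rng = np.random.default_rng(3)
x = rng.binomial(1, p, size=1_000_000)

# Score: derivative of the Bernoulli log-likelihood, x/p - (1 - x)/(1 - p)
score = x / p - (1 - x) / (1 - p)
print(score.var(), 1 / (p * (1 - p)))  # both ~4.76
```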
In statistics, Poisson regression is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable Y has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables.
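A sketch of fitting a Poisson regression by Newton-Raphson on synthetic data; this is one standard fitting approach, not the only one, and all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + covariate
beta_true = np.array([0.5, -0.8])
y = rng.poisson(np.exp(X @ beta_true))  # counts with log-linear mean

# Newton-Raphson for the Poisson log-likelihood with a log link
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)                               # fitted means
    beta += np.linalg.solve((X.T * mu) @ X, X.T @ (y - mu))
print(beta)  # ~[0.5, -0.8]
```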
In statistics, the delta method is a method of deriving the asymptotic distribution of a random variable. It is applicable when the random variable being considered can be defined as a differentiable function of a random variable which is asymptotically Gaussian.
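A simulation illustrating the delta method for $g(x) = e^x$ applied to a sample mean; the empirical spread of the transformed means matches the first-order prediction $|g'(\mu)|\,\sigma/\sqrt{n}$. Parameters and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n = 2.0, 1.0, 400
g, g_prime = np.exp, np.exp  # transform the sample mean by g(x) = e^x

# Simulate many sample means and transform them
means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
print(g(means).std())                         # empirical spread of g(mean)
print(abs(g_prime(mu)) * sigma / np.sqrt(n))  # delta-method prediction
```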
Covariance matrix adaptation evolution strategy (CMA-ES) is a particular kind of strategy for numerical optimization. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation and selection: in each generation (iteration) new individuals are generated by variation of the current parental individuals, usually in a stochastic way. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value $f(x)$. In this way, individuals with better and better $f$-values are generated over the generation sequence.
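The variation-selection loop can be illustrated with a deliberately simplified $(\mu, \lambda)$ evolution strategy; this sketch omits CMA-ES's covariance and step-size adaptation entirely and uses a crude step-size decay instead, so it is illustrative only.

```python
import numpy as np

def simple_es(f, mean, sigma=0.5, lam=20, mu=5, generations=100):
    """(mu, lambda) evolution strategy: Gaussian variation + truncation selection."""
    rng = np.random.default_rng(6)
    for _ in range(generations):
        offspring = mean + sigma * rng.normal(size=(lam, mean.size))  # variation
        best = offspring[np.argsort([f(x) for x in offspring])[:mu]]  # selection
        mean = best.mean(axis=0)  # recombine the selected parents
        sigma *= 0.98             # crude decay (not CMA-ES's adaptation)
    return mean

sphere = lambda x: float(np.sum(x**2))
print(simple_es(sphere, np.array([3.0, -2.0])))  # converges near [0, 0]
```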
In complex analysis, a branch of mathematics, an amoeba is a set associated with a polynomial in one or more complex variables. Amoebas have applications in algebraic geometry, especially tropical geometry.
In mathematical physics, the Berezin integral, named after Felix Berezin, is a way to define integration for functions of Grassmann variables. It is not an integral in the Lebesgue sense; the word "integral" is used because the Berezin integral has properties analogous to the Lebesgue integral and because it extends the path integral in physics, where it is used as a sum over histories for fermions.
Natural evolution strategies (NES) are a family of numerical optimization algorithms for black box problems. Similar in spirit to evolution strategies, they iteratively update the (continuous) parameters of a search distribution by following the natural gradient towards higher expected fitness.
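A rough sketch in the spirit of separable NES, minimizing the sphere function; it uses plain normalized fitness rather than the rank-based utilities and exact natural gradient of published NES variants, so it is illustrative only.

```python
import numpy as np

def nes_minimize(f, mu, sigma=1.0, lam=50, eta_mu=1.0, eta_sigma=0.1, iters=200):
    """NES-style sketch: follow a search gradient on (mu, sigma)."""
    rng = np.random.default_rng(7)
    for _ in range(iters):
        s = rng.normal(size=(lam, mu.size))            # standardized samples
        fit = np.array([f(mu + sigma * x) for x in s])
        u = (fit - fit.mean()) / (fit.std() + 1e-12)   # normalized fitness
        grad_mu = (u[:, None] * s).mean(axis=0)        # search gradient wrt mu
        grad_sig = (u * (s**2 - 1).sum(axis=1)).mean() # and wrt log sigma
        mu -= eta_mu * sigma * grad_mu                 # descend (minimization)
        sigma *= np.exp(-eta_sigma * grad_sig / 2)
    return mu

sphere = lambda x: float(np.sum(x**2))
print(nes_minimize(sphere, np.array([3.0, -2.0])))  # should land near [0, 0]
```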
In statistics, the variance function is a smooth function that depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statistical modelling. It is a main ingredient in the generalized linear model framework and a tool used in non-parametric regression, semiparametric regression and functional data analysis. In parametric modeling, variance functions take on a parametric form and explicitly describe the relationship between the variance and the mean of a random quantity. In a non-parametric setting, the variance function is assumed to be a smooth function.
In mathematics, calculus on Euclidean space is a generalization of calculus of functions in one or several variables to calculus of functions on Euclidean space $\mathbb{R}^n$ as well as on a finite-dimensional real vector space. This calculus is also known as advanced calculus, especially in the United States. It is similar to multivariable calculus but is somewhat more sophisticated in that it uses linear algebra more extensively and covers some concepts from differential geometry such as differential forms and Stokes' formula in terms of differential forms. This extensive use of linear algebra also allows a natural generalization of multivariable calculus to calculus on Banach spaces or topological vector spaces.
A Stein discrepancy is a statistical divergence between two probability measures that is rooted in Stein's method. It was first formulated as a tool to assess the quality of Markov chain Monte Carlo samplers, but has since been used in diverse settings in statistics, machine learning and computer science.
[10] Burai, P.; Kiss, G.; Szokol, P. "Characterization of quasi-arithmetic means without regularity condition." Acta Math. Hungar. 165 (2021), no. 2, 474–485. MR4355191.
[11] Burai, P.; Kiss, G.; Szokol, P. "A dichotomy result for strictly increasing bisymmetric maps." J. Math. Anal. Appl. 526 (2023), no. 2, Paper No. 127269, 9 pp. MR4574540.