Jensen's inequality

Last updated March 08, 2024

Visualizing convexity and Jensen's inequality

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,^[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.^[2] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations.^[3]

Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.

Finite form

For a real convex function $\varphi$ , numbers $x_{1},x_{2},\ldots ,x_{n}$ in its domain, and positive weights $a_{i}$ , Jensen's inequality can be stated as:

\varphi \left({\frac {\sum a_{i}x_{i}}{\sum a_{i}}}\right)\leq {\frac {\sum a_{i}\varphi (x_{i})}{\sum a_{i}}}

(1)

and the inequality is reversed if $\varphi$ is concave, which is

\varphi \left({\frac {\sum a_{i}x_{i}}{\sum a_{i}}}\right)\geq {\frac {\sum a_{i}\varphi (x_{i})}{\sum a_{i}}}.

(2)

Equality holds if and only if $x_{1}=x_{2}=\cdots =x_{n}$ or $\varphi$ is linear on a domain containing $x_{1},x_{2},\cdots ,x_{n}$ .
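The weighted form (1) and its equality case can be checked numerically. The following is a minimal sketch, assuming illustrative values for $x_{i}$ and $a_{i}$ and taking $\varphi =\exp$ as the convex function; none of these choices come from the text above.

```python
import math

def weighted_mean(values, weights):
    """Return sum(a_i * v_i) / sum(a_i) for positive weights a_i."""
    total = sum(weights)
    return sum(a * v for a, v in zip(weights, values)) / total

# Illustrative data (assumed, not taken from the article).
xs = [0.5, 1.0, 2.0, 4.0]
weights = [1.0, 2.0, 3.0, 4.0]

phi = math.exp  # a convex function on the real line

lhs = phi(weighted_mean(xs, weights))                # phi of the weighted mean
rhs = weighted_mean([phi(x) for x in xs], weights)   # weighted mean of phi(x_i)
print(f"phi(mean) = {lhs:.4f} <= mean(phi) = {rhs:.4f}")

# Equality case: when all x_i coincide, both sides agree up to rounding.
same = [1.5] * 4
gap_equal = weighted_mean([phi(x) for x in same], weights) - phi(weighted_mean(same, weights))
print(f"gap with equal points: {gap_equal:.2e}")
```

Replacing `phi` with a concave function such as `math.log` (on positive inputs) should reverse the comparison, as in (2).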

As a particular case, if the weights $a_{i}$ are all equal, then (1) and (2) become

\varphi \left({\frac {\sum x_{i}}{n}}\right)\leq {\frac {\sum \varphi (x_{i})}{n}}

(3)

\varphi \left({\frac {\sum x_{i}}{n}}\right)\geq {\frac {\sum \varphi (x_{i})}{n}}

(4)

For instance, the function $\log(x)$ is concave, so substituting $\varphi (x)=\log(x)$ in the previous formula (4) establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

\log \!\left({\frac {\sum _{i=1}^{n}x_{i}}{n}}\right)\geq {\frac {\sum _{i=1}^{n}\log \!\left(x_{i}\right)}{n}}

\exp \!\left(\log \!\left({\frac {\sum _{i=1}^{n}x_{i}}{n}}\right)\right)\geq \exp \!\left({\frac {\sum _{i=1}^{n}\log \!\left(x_{i}\right)}{n}}\right)

{\frac {x_{1}+x_{2}+\cdots +x_{n}}{n}}\geq {\sqrt[{n}]{x_{1}\cdot x_{2}\cdots x_{n}}}
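The derivation can be mirrored step by step in code. This sketch assumes a small list of positive numbers chosen only for illustration, and computes the geometric mean as the exponential of the mean logarithm, exactly as in the chain of inequalities above.

```python
import math

xs = [1.0, 2.0, 4.0, 8.0]  # illustrative positive numbers (assumed)
n = len(xs)

arithmetic_mean = sum(xs) / n
# exp(mean(log x_i)) equals the n-th root of the product x_1 * ... * x_n.
geometric_mean = math.exp(sum(math.log(x) for x in xs) / n)

print(f"arithmetic mean = {arithmetic_mean:.4f}, geometric mean = {geometric_mean:.4f}")
```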

A common application has $x$ as a function of another variable (or set of variables) $t$, that is, $x_{i}=g(t_{i})$. All of this carries directly over to the general continuous case: the weights $a_{i}$ are replaced by a non-negative integrable function $f(x)$, such as a probability distribution, and the summations are replaced by integrals.

Measure-theoretic form

Let $(\Omega ,A,\mu )$ be a probability space. Let $f:\Omega \to \mathbb {R}$ be a $\mu$-measurable function and $\varphi$ be convex. Then:^[5]

\varphi \left(\int _{\Omega }f\,\mathrm {d} \mu \right)\leq \int _{\Omega }\varphi \circ f\,\mathrm {d} \mu

In real analysis, we may require an estimate on

\varphi \left(\int _{a}^{b}f(x)\,dx\right)

where $a,b\in \mathbb {R}$ , and $f\colon [a,b]\to \mathbb {R}$ is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of $[a,b]$ need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get^[6]

\varphi \left({\frac {1}{b-a}}\int _{a}^{b}f(x)\,dx\right)\leq {\frac {1}{b-a}}\int _{a}^{b}\varphi (f(x))\,dx.
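The rescaled inequality on $[a,b]$ can be illustrated with a simple quadrature. The sketch below assumes the interval $[0,3]$, the function $f(x)=x^{2}$ and $\varphi =\exp$, and uses a hypothetical helper `midpoint_integral`; because the midpoint rule is an equally weighted average, the discretized comparison is itself an instance of the finite form.

```python
import math

def midpoint_integral(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))

a, b = 0.0, 3.0   # interval of length 3, so its Lebesgue measure is not 1

def f(x):
    """A non-negative integrable function (illustrative choice)."""
    return x * x

phi = math.exp    # convex

# Left side: phi applied to the average of f over [a, b].
lhs = phi(midpoint_integral(f, a, b) / (b - a))
# Right side: average of phi(f(x)) over [a, b].
rhs = midpoint_integral(lambda x: phi(f(x)), a, b) / (b - a)

print(f"phi(average of f) = {lhs:.4f} <= average of phi(f) = {rhs:.4f}")
```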

Probabilistic form

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let $(\Omega ,{\mathfrak {F}},\operatorname {P} )$ be a probability space, X an integrable real-valued random variable and $φ$ a convex function. Then:

\varphi \left(\operatorname {E} [X]\right)\leq \operatorname {E} \left[\varphi (X)\right].

^[7]

In this probability setting, the measure $μ$ is intended as a probability $\operatorname {P}$ , the integral with respect to $μ$ as an expected value $\operatorname {E}$ , and the function $f$ as a random variable X.

Note that the equality holds if and only if $φ$ is a linear function on some convex set $A$ such that $\mathrm {P} (X\in A)=1$ (which follows by inspecting the measure-theoretical proof below).
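A sample-based check of the probabilistic form is sketched below, assuming standard normal draws and $\varphi (x)=x^{2}$; both choices are illustrative. Since the empirical distribution of a sample is itself a probability measure, the inequality holds for the sample averages and not only in the limit.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Draws from a standard normal; the distribution is an arbitrary choice.
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

def mean(values):
    """Arithmetic mean, standing in for the expectation E[.]."""
    return sum(values) / len(values)

def phi(x):
    """A convex function."""
    return x * x

phi_of_mean = phi(mean(samples))               # phi(E[X])
mean_of_phi = mean([phi(x) for x in samples])  # E[phi(X)]

print(f"phi(E[X]) = {phi_of_mean:.6f} <= E[phi(X)] = {mean_of_phi:.6f}")
```

For this choice of $\varphi$ the difference between the two sides is the (biased) sample variance, which is why it can never be negative.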

General inequality in a probabilistic setting

More generally, let T be a real topological vector space, and X a T-valued integrable random variable. In this general setting, integrable means that there exists an element $\operatorname {E} [X]$ in T, such that for any element z in the dual space of T: $\operatorname {E} |\langle z,X\rangle |<\infty$ , and $\langle z,\operatorname {E} [X]\rangle =\operatorname {E} [\langle z,X\rangle ]$ . Then, for any measurable convex function $φ$ and any sub-σ-algebra ${\mathfrak {G}}$ of ${\mathfrak {F}}$ :

\varphi \left(\operatorname {E} \left[X\mid {\mathfrak {G}}\right]\right)\leq \operatorname {E} \left[\varphi (X)\mid {\mathfrak {G}}\right].

Here $\operatorname {E} [\cdot \mid {\mathfrak {G}}]$ stands for the expectation conditioned on the σ-algebra ${\mathfrak {G}}$. This general statement reduces to the previous ones when the topological vector space $T$ is the real axis, and ${\mathfrak {G}}$ is the trivial σ-algebra $\{\emptyset ,\Omega \}$ (where $\emptyset$ is the empty set, and $\Omega$ is the sample space).^[8]

A sharpened and generalized form

Let X be a one-dimensional random variable with mean $\mu$ and variance $\sigma ^{2}\geq 0$ . Let $\varphi (x)$ be a twice differentiable function, and define the function

h(x)\triangleq {\frac {\varphi \left(x\right)-\varphi \left(\mu \right)}{\left(x-\mu \right)^{2}}}-{\frac {\varphi '\left(\mu \right)}{x-\mu }}.

Then^[9]

\sigma ^{2}\inf {\frac {\varphi ''(x)}{2}}\leq \sigma ^{2}\inf h(x)\leq E\left[\varphi \left(X\right)\right]-\varphi \left(E[X]\right)\leq \sigma ^{2}\sup h(x)\leq \sigma ^{2}\sup {\frac {\varphi ''(x)}{2}}.

In particular, when $\varphi (x)$ is convex, then $\varphi ''(x)\geq 0$ , and the standard form of Jensen's inequality immediately follows for the case where $\varphi (x)$ is additionally assumed to be twice differentiable.
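The chain of bounds can be evaluated for a concrete case. The sketch below assumes a hypothetical four-point distribution on $[0,2]$ and $\varphi =\exp$, and takes the infima and suprema over a grid on that interval; the text above does not say over which set they are taken, so restricting to an interval containing the support of $X$ is an assumption of this example.

```python
import math

# Hypothetical discrete distribution on [0, 2] (values and probabilities assumed).
support = [0.0, 0.5, 1.0, 2.0]
probs = [0.1, 0.4, 0.3, 0.2]

phi = math.exp   # for phi = exp, phi' and phi'' are exp as well
mu = sum(p * x for p, x in zip(probs, support))
var = sum(p * (x - mu) ** 2 for p, x in zip(probs, support))

def h(x):
    """h(x) as defined above, for x != mu (phi'(mu) = exp(mu) here)."""
    return (phi(x) - phi(mu)) / (x - mu) ** 2 - phi(mu) / (x - mu)

# Grid on [0, 2]; the point x = mu is skipped because h is undefined there.
grid = [k / 100 for k in range(201)]
h_values = [h(x) for x in grid if abs(x - mu) > 1e-9]
half_second = [phi(x) / 2 for x in grid]   # phi''(x) / 2 on the grid

# Jensen gap E[phi(X)] - phi(E[X]).
gap = sum(p * phi(x) for p, x in zip(probs, support)) - phi(mu)

bounds = [
    var * min(half_second),
    var * min(h_values),
    gap,
    var * max(h_values),
    var * max(half_second),
]
print([round(value, 4) for value in bounds])
```

The printed list should be non-decreasing, with the Jensen gap in the middle position.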

Proofs

Intuitive graphical proof

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where $X$ is a real-valued random variable (see figure). Assuming a hypothetical distribution of $X$ values, one can immediately identify the position of $\operatorname {E} [X]$ and its image $\varphi (\operatorname {E} [X])$ in the graph. Noticing that for convex mappings $Y=\varphi (x)$ of some $x$ values the corresponding distribution of $Y$ values is increasingly "stretched up" for increasing values of $X$, it is easy to see that the distribution of $Y$ is broader in the interval corresponding to $X>X_{0}$ and narrower in $X<X_{0}$ for any $X_{0}$; in particular, this is also true for $X_{0}=\operatorname {E} [X]$. Consequently, in this picture the expectation of $Y$ will always shift upwards with respect to the position of $\varphi (\operatorname {E} [X])$. A similar reasoning holds if the distribution of $X$ covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.

\varphi (\operatorname {E} [X])\leq \operatorname {E} [\varphi (X)]=\operatorname {E} [Y],

with equality when $φ (X)$ is not strictly convex, e.g. when it is a straight line, or when $X$ follows a degenerate distribution (i.e. is a constant).

The proofs below formalize this intuitive notion.

Proof 1 (finite form)

If $\lambda _{1}$ and $\lambda _{2}$ are two arbitrary nonnegative real numbers such that $\lambda _{1}+\lambda _{2}=1$, then convexity of $\varphi$ implies

\forall x_{1},x_{2}:\qquad \varphi \left(\lambda _{1}x_{1}+\lambda _{2}x_{2}\right)\leq \lambda _{1}\,\varphi (x_{1})+\lambda _{2}\,\varphi (x_{2}).

This can be generalized: if $\lambda _{1},\ldots ,\lambda _{n}$ are nonnegative real numbers such that $\lambda _{1}+\cdots +\lambda _{n}=1$, then

\varphi (\lambda _{1}x_{1}+\lambda _{2}x_{2}+\cdots +\lambda _{n}x_{n})\leq \lambda _{1}\,\varphi (x_{1})+\lambda _{2}\,\varphi (x_{2})+\cdots +\lambda _{n}\,\varphi (x_{n}),

for any $x_{1},\ldots ,x_{n}$.

The finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for $n=2$. Suppose the statement is true for some $n$, so

\varphi \left(\sum _{i=1}^{n}\lambda _{i}x_{i}\right)\leq \sum _{i=1}^{n}\lambda _{i}\varphi \left(x_{i}\right)

for any $\lambda _{1},\ldots ,\lambda _{n}$ such that $\lambda _{1}+\cdots +\lambda _{n}=1$.

One needs to prove it for $n+1$. At least one of the $\lambda _{i}$ is strictly smaller than $1$, say $\lambda _{n+1}$; therefore by the convexity inequality:

{\begin{aligned}\varphi \left(\sum _{i=1}^{n+1}\lambda _{i}x_{i}\right)&=\varphi \left((1-\lambda _{n+1})\sum _{i=1}^{n}{\frac {\lambda _{i}}{1-\lambda _{n+1}}}x_{i}+\lambda _{n+1}x_{n+1}\right)\\&\leq (1-\lambda _{n+1})\varphi \left(\sum _{i=1}^{n}{\frac {\lambda _{i}}{1-\lambda _{n+1}}}x_{i}\right)+\lambda _{n+1}\,\varphi (x_{n+1}).\end{aligned}}

Since $\lambda _{1}+\cdots +\lambda _{n}+\lambda _{n+1}=1$,

\sum _{i=1}^{n}{\frac {\lambda _{i}}{1-\lambda _{n+1}}}=1,

applying the inductive hypothesis gives

\varphi \left(\sum _{i=1}^{n}{\frac {\lambda _{i}}{1-\lambda _{n+1}}}x_{i}\right)\leq \sum _{i=1}^{n}{\frac {\lambda _{i}}{1-\lambda _{n+1}}}\varphi (x_{i})

therefore

{\begin{aligned}\varphi \left(\sum _{i=1}^{n+1}\lambda _{i}x_{i}\right)&\leq (1-\lambda _{n+1})\sum _{i=1}^{n}{\frac {\lambda _{i}}{1-\lambda _{n+1}}}\varphi (x_{i})+\lambda _{n+1}\,\varphi (x_{n+1})=\sum _{i=1}^{n+1}\lambda _{i}\varphi (x_{i})\end{aligned}}

We deduce that the inequality is true for $n+1$; by induction, it follows that the result holds for all integers $n\geq 2$.
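The inductive step translates directly into a recursion. The sketch below uses a hypothetical helper `jensen_bound` that peels off the last point, renormalizes the remaining weights by $1-\lambda _{n+1}$ and recurses; the weights, points and the choice $\varphi (x)=x^{2}$ are illustrative assumptions.

```python
def jensen_bound(phi, lams, xs):
    """Upper bound on phi(sum(lam_i * x_i)) built by the inductive step.

    Splits off the last point with weight lam_{n+1}, rescales the other
    weights by 1 - lam_{n+1} so they sum to 1, and recurses on them.
    """
    if len(xs) == 1:
        return phi(xs[0])
    last_lam, last_x = lams[-1], xs[-1]
    if last_lam == 1.0:
        # All of the weight sits on the last point.
        return phi(last_x)
    rest = [lam / (1.0 - last_lam) for lam in lams[:-1]]
    return (1.0 - last_lam) * jensen_bound(phi, rest, xs[:-1]) + last_lam * phi(last_x)

def phi(x):
    """A convex function (illustrative choice)."""
    return x * x

lams = [0.1, 0.2, 0.3, 0.4]   # nonnegative weights summing to 1 (assumed)
xs = [-1.0, 0.0, 1.0, 3.0]    # arbitrary points (assumed)

lhs = phi(sum(lam * x for lam, x in zip(lams, xs)))
bound = jensen_bound(phi, lams, xs)
direct = sum(lam * phi(x) for lam, x in zip(lams, xs))

print(f"phi(sum) = {lhs:.4f}, recursive bound = {bound:.4f}, direct sum = {direct:.4f}")
```

The recursive bound agrees with the direct sum $\sum \lambda _{i}\varphi (x_{i})$ up to rounding, which is the final equality in the displayed chain.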

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

\varphi \left(\int x\,d\mu _{n}(x)\right)\leq \int \varphi (x)\,d\mu _{n}(x),

where μ_n is a measure given by an arbitrary convex combination of Dirac deltas:

\mu _{n}=\sum _{i=1}^{n}\lambda _{i}\delta _{x_{i}}.

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.

Proof 2 (measure-theoretic form)

Let $g$ be a real-valued $\mu$ -integrable function on a probability space $\Omega$ , and let $\varphi$ be a convex function on the real numbers. Since $\varphi$ is convex, at each real number $x$ we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of $\varphi$ at $x$ , but which are below the graph of $\varphi$ at all points (support lines of the graph).

Now, if we define

x_{0}:=\int _{\Omega }g\,d\mu ,

because of the existence of subderivatives for convex functions, we may choose $a$ and $b$ such that

ax+b\leq \varphi (x),

for all real $x$ and

ax_{0}+b=\varphi (x_{0}).

But then we have that

\varphi \circ g(\omega )\geq ag(\omega )+b

for almost all $\omega \in \Omega$ . Since we have a probability measure, the integral is monotone with $\mu (\Omega )=1$ so that

\int _{\Omega }\varphi \circ g\,d\mu \geq \int _{\Omega }(ag+b)\,d\mu =a\int _{\Omega }g\,d\mu +b\int _{\Omega }d\mu =ax_{0}+b=\varphi (x_{0})=\varphi \left(\int _{\Omega }g\,d\mu \right),

as desired.
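The support-line argument can be traced on a finite probability space. This sketch assumes three points with illustrative masses and values of $g$, and takes $\varphi =\exp$, for which the support line at $x_{0}$ is simply the tangent; for a non-differentiable $\varphi$ any subderivative at $x_{0}$ would serve as the slope instead.

```python
import math

# Hypothetical finite probability space: points omega_i with masses mu_i.
masses = [0.2, 0.5, 0.3]
g_values = [-1.0, 0.5, 2.0]   # the values g(omega_i)

phi = math.exp
x0 = sum(m * g for m, g in zip(masses, g_values))   # x0 = integral of g d(mu)

# Tangent line at x0: slope a = phi'(x0) = exp(x0), intercept b = phi(x0) - a * x0.
a = math.exp(x0)
b = phi(x0) - a * x0

# The line a*x + b lies on or below the graph of phi at every value of g.
line_below = all(a * g + b <= phi(g) + 1e-12 for g in g_values)

integral_phi_g = sum(m * phi(g) for m, g in zip(masses, g_values))
# Integrating the line gives a*x0 + b, which equals phi(x0).
integral_line = sum(m * (a * g + b) for m, g in zip(masses, g_values))

print(line_below, integral_line, integral_phi_g)
```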

Proof 3 (general inequality in a probabilistic setting)

Let X be an integrable random variable that takes values in a real topological vector space T. Since $\varphi :T\to \mathbb {R}$ is convex, for any $x,y\in T$ , the quantity

{\frac {\varphi (x+\theta \,y)-\varphi (x)}{\theta }},

is decreasing as $θ$ approaches 0⁺. In particular, the subdifferential of $\varphi$ evaluated at $x$ in the direction $y$ is well-defined by

(D\varphi )(x)\cdot y:=\lim _{\theta \downarrow 0}{\frac {\varphi (x+\theta \,y)-\varphi (x)}{\theta }}=\inf _{\theta >0}{\frac {\varphi (x+\theta \,y)-\varphi (x)}{\theta }}.

The argument treats the subdifferential as linear in $y$; this claim is uncited and is not true in general, and proving what is needed requires the Hahn–Banach theorem. Since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for $\theta =1$, one gets

\varphi (x)\leq \varphi (x+y)-(D\varphi )(x)\cdot y.

In particular, for an arbitrary sub- $σ$ -algebra ${\mathfrak {G}}$ we can evaluate the last inequality when $x=\operatorname {E} [X\mid {\mathfrak {G}}],\,y=X-\operatorname {E} [X\mid {\mathfrak {G}}]$ to obtain

\varphi (\operatorname {E} [X\mid {\mathfrak {G}}])\leq \varphi (X)-(D\varphi )(\operatorname {E} [X\mid {\mathfrak {G}}])\cdot (X-\operatorname {E} [X\mid {\mathfrak {G}}]).

Now, if we take the expectation conditioned on ${\mathfrak {G}}$ on both sides of the previous expression, we get the result since:

\operatorname {E} \left[\left[(D\varphi )(\operatorname {E} [X\mid {\mathfrak {G}}])\cdot (X-\operatorname {E} [X\mid {\mathfrak {G}}])\right]\mid {\mathfrak {G}}\right]=(D\varphi )(\operatorname {E} [X\mid {\mathfrak {G}}])\cdot \operatorname {E} [\left(X-\operatorname {E} [X\mid {\mathfrak {G}}]\right)\mid {\mathfrak {G}}]=0,

by the linearity of the subdifferential in the y variable, and the following well-known property of the conditional expectation:

\operatorname {E} \left[\left(\operatorname {E} [X\mid {\mathfrak {G}}]\right)\mid {\mathfrak {G}}\right]=\operatorname {E} [X\mid {\mathfrak {G}}].

Applications and special cases

Form involving a probability density function

Suppose $Ω$ is a measurable subset of the real line and f(x) is a non-negative function such that

\int _{-\infty }^{\infty }f(x)\,dx=1.

In probabilistic language, f is a probability density function.

Then Jensen's inequality becomes the following statement about convex integrals:

If g is any real-valued measurable function and ${\textstyle \varphi }$ is convex over the range of g, then

\varphi \left(\int _{-\infty }^{\infty }g(x)f(x)\,dx\right)\leq \int _{-\infty }^{\infty }\varphi (g(x))f(x)\,dx.

If g(x) = x, then this form of the inequality reduces to a commonly used special case:

\varphi \left(\int _{-\infty }^{\infty }x\,f(x)\,dx\right)\leq \int _{-\infty }^{\infty }\varphi (x)\,f(x)\,dx.

This is applied in variational Bayesian methods.

Example: even moments of a random variable

If g(x) = x²ⁿ, and X is a random variable, then g is convex as

{\frac {d^{2}g}{dx^{2}}}(x)=2n(2n-1)x^{2n-2}\geq 0\quad \forall \ x\in \mathbb {R}

and so

g(\operatorname {E} [X])=(\operatorname {E} [X])^{2n}\leq \operatorname {E} [X^{2n}].

In particular, if some even moment 2n of X is finite, X has a finite mean. An extension of this argument shows X has finite moments of every order $l\in \mathbb {N}$ dividing n.
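The moment inequality is easy to tabulate for a concrete distribution. The sketch below assumes a hypothetical four-point random variable with equal probabilities and compares $(\operatorname {E} [X])^{2n}$ with $\operatorname {E} [X^{2n}]$ for the first few $n$.

```python
# Hypothetical discrete random variable (values and probabilities assumed).
values = [-2.0, 0.0, 1.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]

def moment(order):
    """Return E[X**order] for the distribution above."""
    return sum(p * v ** order for p, v in zip(probs, values))

mean = moment(1)

# Triples (n, (E[X])^(2n), E[X^(2n)]) for n = 1, 2, 3.
checks = [(n, mean ** (2 * n), moment(2 * n)) for n in (1, 2, 3)]
for n, lhs, rhs in checks:
    print(f"n={n}: (E[X])^{2 * n} = {lhs:.4f} <= E[X^{2 * n}] = {rhs:.4f}")
```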

Alternative finite form

Let $\Omega =\{x_{1},\ldots ,x_{n}\}$, and take $\mu$ to be the counting measure on $\Omega$; then the general form reduces to a statement about sums:

\varphi \left(\sum _{i=1}^{n}g(x_{i})\lambda _{i}\right)\leq \sum _{i=1}^{n}\varphi (g(x_{i}))\lambda _{i},

provided that $\lambda _{i}\geq 0$ and

\lambda _{1}+\cdots +\lambda _{n}=1.

There is also an infinite discrete form.

Statistical physics

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

e^{\operatorname {E} [X]}\leq \operatorname {E} \left[e^{X}\right],

where the expected values are with respect to some probability distribution in the random variable $X$ .

Proof: Let $\varphi (x)=e^{x}$ in $\varphi \left(\operatorname {E} [X]\right)\leq \operatorname {E} \left[\varphi (X)\right].$

Information theory

If $p(x)$ is the true probability density for $X$, and $q(x)$ is another density, then applying Jensen's inequality for the random variable $Y(X)=q(X)/p(X)$ and the convex function $\varphi (y)=-\log(y)$ gives

\operatorname {E} [\varphi (Y)]\geq \varphi (\operatorname {E} [Y])

Therefore:

-D(p(x)\|q(x))=\int p(x)\log \left({\frac {q(x)}{p(x)}}\right)\,dx\leq \log \left(\int p(x){\frac {q(x)}{p(x)}}\,dx\right)=\log \left(\int q(x)\,dx\right)=0

a result called Gibbs' inequality.

It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities p rather than any other distribution q. The quantity that is non-negative is called the Kullback–Leibler divergence of q from p, where $D(p(x)\|q(x))=\int p(x)\log \left({\frac {p(x)}{q(x)}}\right)dx$ .

Since $-\log(x)$ is a strictly convex function for $x>0$, it follows that equality holds when $p(x)$ equals $q(x)$ almost everywhere.
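Gibbs' inequality can be checked for discrete distributions, where the integral becomes a sum. The following sketch assumes two illustrative probability vectors on three outcomes and a hypothetical helper `kl_divergence`; the discrete setting is a simplification of the density form used above.

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)) for discrete distributions.

    Terms with p(x) = 0 contribute zero; q(x) must be positive wherever p(x) > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # "true" distribution (illustrative)
q = [0.2, 0.5, 0.3]   # another distribution (illustrative)

d_pq = kl_divergence(p, q)   # strictly positive because p != q
d_pp = kl_divergence(p, p)   # zero, the equality case

print(f"D(p||q) = {d_pq:.4f}, D(p||p) = {d_pp:.4f}")
```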

Rao–Blackwell theorem

If L is a convex function and ${\mathfrak {G}}$ a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

L(\operatorname {E} [\delta (X)\mid {\mathfrak {G}}])\leq \operatorname {E} [L(\delta (X))\mid {\mathfrak {G}}]\quad \Longrightarrow \quad \operatorname {E} [L(\operatorname {E} [\delta (X)\mid {\mathfrak {G}}])]\leq \operatorname {E} [L(\delta (X))].

So if δ(X) is some estimator of an unobserved parameter θ given a vector of observables X, and if T(X) is a sufficient statistic for θ, then an improved estimator, in the sense of having an expected loss L that is no larger, can be obtained by calculating

\delta _{1}(X)=\operatorname {E} _{\theta }[\delta (X')\mid T(X')=T(X)],

the expected value of δ with respect to θ, taken over all possible vectors of observations X compatible with the same value of T(X) as that observed. Further, because T is a sufficient statistic, $\delta _{1}(X)$ does not depend on θ and hence is a statistic.

This result is known as the Rao–Blackwell theorem.
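The improvement can be computed exactly in a small example. The sketch below assumes four independent Bernoulli(θ) observations, the crude estimator δ(X) = X₁, the sufficient statistic T(X) = number of ones, and squared-error loss as the convex function L; all of these choices are illustrative. The conditional expectation is obtained by grouping outcomes with the same value of T.

```python
from itertools import product

theta = 0.3   # parameter value, used only to evaluate the risks (assumed)
n = 4         # number of Bernoulli observations (assumed)

def prob(x):
    """Probability of the 0/1 vector x under independent Bernoulli(theta) draws."""
    ones = sum(x)
    return theta ** ones * (1 - theta) ** (len(x) - ones)

def delta(x):
    """Crude unbiased estimator: the first observation."""
    return x[0]

def T(x):
    """Sufficient statistic: the number of ones."""
    return sum(x)

outcomes = list(product([0, 1], repeat=n))

# delta_1(x) = E[delta(X') | T(X') = T(x)], computed by grouping outcomes on T.
num, den = {}, {}
for x in outcomes:
    t = T(x)
    num[t] = num.get(t, 0.0) + prob(x) * delta(x)
    den[t] = den.get(t, 0.0) + prob(x)

def delta_1(x):
    """Rao-Blackwellized estimator; theta cancels in the ratio."""
    return num[T(x)] / den[T(x)]

def risk(estimator):
    """Expected squared-error loss E[(estimator(X) - theta)^2]."""
    return sum(prob(x) * (estimator(x) - theta) ** 2 for x in outcomes)

risk_delta, risk_delta_1 = risk(delta), risk(delta_1)
print(f"risk of delta = {risk_delta:.4f}, risk of delta_1 = {risk_delta_1:.4f}")
```

In this example `delta_1` reduces to the sample mean T(X)/n, and changing `theta` alters the risks but not the values of `delta_1`, in line with the remark that $\delta _{1}(X)$ does not depend on θ.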

Notes

[1] Jensen, J. L. W. V. (1906). "Sur les fonctions convexes et les inégalités entre les valeurs moyennes". Acta Mathematica. 30 (1): 175–193. doi:10.1007/BF02418571.
[2] Guessab, A.; Schmeisser, G. (2013). "Necessary and sufficient conditions for the validity of Jensen's inequality". Archiv der Mathematik. 100 (6): 561–570. doi:10.1007/s00013-013-0522-3. MR 3069109. S2CID 56372266.
[3] Dekking, F.M.; Kraaikamp, C.; Lopuhaa, H.P.; Meester, L.E. (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer Texts in Statistics. London: Springer. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
[4] Gao, Xiang; Sitharam, Meera; Roitberg, Adrian (2019). "Bounds on the Jensen Gap, and Implications for Mean-Concentrated Distributions" (PDF). The Australian Journal of Mathematical Analysis and Applications. 16 (2). arXiv:1712.05267.
[5] Durrett, Rick (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. p. 25. ISBN 978-1108473682.
[6] Niculescu, Constantin P. "Integral inequalities", p. 12.
[7] Durrett, Rick (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. p. 29. ISBN 978-1108473682.
[8] Attention: In this generality additional assumptions on the convex function and/or the topological vector space are needed; see Example (1.3) on p. 53 in Perlman, Michael D. (1974). "Jensen's Inequality for a Convex Vector-Valued Function on an Infinite-Dimensional Space". Journal of Multivariate Analysis. 4 (1): 52–65. doi:10.1016/0047-259X(74)90005-0. hdl:11299/199167.
[9] Liao, J.; Berg, A (2018). "Sharpening Jensen's Inequality". American Statistician. 73 (3): 278–281. arXiv:1707.08644. doi:10.1080/00031305.2017.1419145. S2CID 88515366.
[10] Bradley, CJ (2006). Introduction to Inequalities. Leeds, United Kingdom: United Kingdom Mathematics Trust. p. 97. ISBN 978-1-906001-11-7.


References

David Chandler (1987). Introduction to Modern Statistical Mechanics. Oxford. ISBN 0-19-504277-8.
Tristan Needham (1993) "A Visual Explanation of Jensen's Inequality", American Mathematical Monthly 100(8):768–71.
Nicola Fusco; Paolo Marcellini; Carlo Sbordone (1996). Analisi Matematica Due. Liguori. ISBN 978-88-207-2675-1.
Walter Rudin (1987). Real and Complex Analysis. McGraw-Hill. ISBN 0-07-054234-1.
Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. p. 430. ISBN 978-1108473682. Retrieved 21 Dec 2020.
Sam Savage (2012) The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty (1st ed.) Wiley. ISBN 978-0471381976

External links

Jensen's Operator Inequality of Hansen and Pedersen.
"Jensen inequality", Encyclopedia of Mathematics , EMS Press, 2001 [1994]
Weisstein, Eric W. "Jensen's inequality". MathWorld.
Arthur Lohwater (1982). "Introduction to Inequalities". Online e-book in PDF format.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

