Normalizing constant

Last updated June 20, 2024

In probability theory, a normalizing constant or normalizing factor is a constant by which a non-negative function is multiplied to turn it into a probability density function (or probability mass function) with total probability one.

For example, a Gaussian function can be normalized into a probability density function, which gives the standard normal distribution. In Bayes' theorem, a normalizing constant is used to ensure that the probabilities of all possible hypotheses sum to 1. Other uses of normalizing constants include making the value of a Legendre polynomial at 1 equal to 1 and making families of orthogonal functions orthonormal.

A similar concept has been used in areas other than probability, such as for polynomials.

Definition

In probability theory, a normalizing constant is a constant by which an everywhere non-negative function must be multiplied so the area under its graph is 1, e.g., to make it a probability density function or a probability mass function.^[1]^[2]

Examples

If we start from the simple Gaussian function

p(x)=e^{-x^{2}/2},\quad x\in (-\infty ,\infty )

we have the corresponding Gaussian integral

\int _{-\infty }^{\infty }p(x)\,dx=\int _{-\infty }^{\infty }e^{-x^{2}/2}\,dx={\sqrt {2\pi \,}}.

If we use the reciprocal of this value as a normalizing constant for $p(x)$, defining a function $\varphi (x)$ as

\varphi (x)={\frac {1}{\sqrt {2\pi \,}}}p(x)={\frac {1}{\sqrt {2\pi \,}}}e^{-x^{2}/2}

so that its integral is unity,

\int _{-\infty }^{\infty }\varphi (x)\,dx=\int _{-\infty }^{\infty }{\frac {1}{\sqrt {2\pi \,}}}e^{-x^{2}/2}\,dx=1

then the function $\varphi (x)$ is a probability density function.^[3] This is the density of the standard normal distribution. (Standard, in this case, means the expected value is 0 and the variance is 1.)

The constant ${\textstyle {\frac {1}{\sqrt {2\pi }}}}$ is therefore the normalizing constant of the function $p(x)$.
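The Gaussian example can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper names `trapezoid` and `phi`, the truncation of the integral to [−10, 10], and the number of subintervals are all assumptions made for illustration.

```python
import math


def p(x):
    """Unnormalized Gaussian function e^{-x^2/2}."""
    return math.exp(-x * x / 2.0)


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Assumption: the tails beyond |x| = 10 are negligible (e^{-50} is tiny).
area = trapezoid(p, -10.0, 10.0, 20000)
normalizing_constant = 1.0 / area


def phi(x):
    """Normalized function: the normalizing constant times p(x)."""
    return normalizing_constant * p(x)


print(area, math.sqrt(2.0 * math.pi))       # the two values should agree closely
print(trapezoid(phi, -10.0, 10.0, 20000))   # should be very close to 1
```

The numerical area should agree with $\sqrt{2\pi}$ to many digits, and integrating `phi` over the same range should give a value very close to 1.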

Similarly,

\sum _{n=0}^{\infty }{\frac {\lambda ^{n}}{n!}}=e^{\lambda },

and consequently

f(n)={\frac {\lambda ^{n}e^{-\lambda }}{n!}}

is a probability mass function on the set of all nonnegative integers.^[4] This is the probability mass function of the Poisson distribution with expected value λ.
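The Poisson case can be checked in the same spirit. This is a small sketch under stated assumptions: the value `lam = 3.5` and the truncation of the infinite sums at 60 terms are arbitrary choices for illustration, and `poisson_pmf` is a hypothetical helper name.

```python
import math


def poisson_pmf(n, lam):
    """f(n) = lam^n e^{-lam} / n!, where e^{-lam} is the normalizing constant."""
    return lam ** n * math.exp(-lam) / math.factorial(n)


lam = 3.5     # arbitrary illustrative rate
terms = 60    # assumption: the remaining tail of the series is negligible

# Sum of the unnormalized weights lam^n / n!, which should approach e^lam.
partial_sum = sum(lam ** n / math.factorial(n) for n in range(terms))

# After multiplying by e^{-lam}, the weights should sum to 1.
total_probability = sum(poisson_pmf(n, lam) for n in range(terms))

# The expected value of the resulting distribution should be lam.
mean = sum(n * poisson_pmf(n, lam) for n in range(terms))

print(partial_sum, math.exp(lam))
print(total_probability, mean)
```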

Note that if the probability density function is a function of various parameters, so too will be its normalizing constant. The parametrised normalizing constant for the Boltzmann distribution plays a central role in statistical mechanics. In that context, the normalizing constant is called the partition function.
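To illustrate how a normalizing constant depends on a parameter, the sketch below computes a partition function for a finite set of states with weights $e^{-\beta E_i}$. The energy levels and the values of the inverse temperature `beta` are hypothetical, chosen only for illustration; note that each weight is divided by the partition function, so the multiplicative normalizing constant is its reciprocal.

```python
import math


def partition_function(energies, beta):
    """Z(beta) = sum over states of exp(-beta * E_i)."""
    return sum(math.exp(-beta * e) for e in energies)


def boltzmann_probabilities(energies, beta):
    """Divide each Boltzmann weight by Z(beta) so the probabilities sum to 1."""
    z = partition_function(energies, beta)
    return [math.exp(-beta * e) / z for e in energies]


energies = [0.0, 1.0, 2.0]  # hypothetical energy levels, arbitrary units

for beta in (0.5, 1.0, 2.0):
    z = partition_function(energies, beta)
    probs = boltzmann_probabilities(energies, beta)
    # Z changes with beta, but the normalized probabilities always sum to 1.
    print(beta, z, probs, sum(probs))
```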

Bayes' theorem

Bayes' theorem says that the posterior probability measure is proportional to the product of the prior probability measure and the likelihood function. Proportionality implies that one must multiply or divide by a normalizing constant to assign measure 1 to the whole space, i.e., to get a probability measure. In a simple discrete case we have

P(H_{0}|D)={\frac {P(D|H_{0})P(H_{0})}{P(D)}}

where P(H₀) is the prior probability that the hypothesis is true; P(D|H₀) is the conditional probability of the data given that the hypothesis is true, which, once the data are known, is read as the likelihood of the hypothesis (or its parameters) given the data; and P(H₀|D) is the posterior probability that the hypothesis is true given the data. P(D) is the probability of producing the data, but it is often difficult to calculate on its own, so an alternative way to describe this relationship is as one of proportionality:

P(H_{0}|D)\propto P(D|H_{0})P(H_{0}).

Since P(H|D) is a probability, the sum over all possible (mutually exclusive) hypotheses should be 1, leading to the conclusion that

P(H_{0}|D)={\frac {P(D|H_{0})P(H_{0})}{\displaystyle \sum _{i}P(D|H_{i})P(H_{i})}}.

In this case, the reciprocal of the value

P(D)=\sum _{i}P(D|H_{i})P(H_{i})

is the normalizing constant.^[5] It can be extended from countably many hypotheses to uncountably many by replacing the sum by an integral.
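The discrete case can be sketched in a few lines. The three hypotheses and all prior and likelihood values below are made-up numbers for illustration only, and `posterior` is a hypothetical helper name.

```python
def posterior(priors, likelihoods):
    """Return (posterior probabilities, P(D)) for mutually exclusive hypotheses."""
    # Unnormalized posterior weights P(D|H_i) * P(H_i).
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    # P(D) is the sum of the weights; its reciprocal is the normalizing constant.
    evidence = sum(unnormalized.values())
    return {h: w / evidence for h, w in unnormalized.items()}, evidence


# Hypothetical priors P(H_i) and likelihoods P(D|H_i).
priors = {"H0": 0.5, "H1": 0.3, "H2": 0.2}
likelihoods = {"H0": 0.10, "H1": 0.40, "H2": 0.70}

post, evidence = posterior(priors, likelihoods)
print(evidence)            # P(D) = 0.05 + 0.12 + 0.14
print(post)                # posterior probabilities
print(sum(post.values()))  # should be 1 up to rounding
```

With these numbers, P(D) is 0.31, so each product P(D|Hᵢ)P(Hᵢ) is divided by 0.31 to obtain the posterior.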

In practice, there are many methods of estimating the normalizing constant. They include bridge sampling, the naive Monte Carlo estimator, the generalized harmonic mean estimator, and importance sampling.^[6]
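As a rough illustration of the simplest of these, the sketch below applies a naive Monte Carlo estimator: it averages the likelihood over draws from the prior. The model is an assumption chosen because the exact answer is known: with a binomial likelihood (k successes in n trials) and a Uniform(0, 1) prior on the success probability, P(D) equals 1/(n + 1). The function names, sample size, and seed are likewise illustrative choices.

```python
import math
import random


def likelihood(theta, k, n):
    """Binomial probability of k successes in n trials with success rate theta."""
    return math.comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)


def naive_monte_carlo_evidence(k, n, num_samples, seed=0):
    """Estimate P(D) as the average likelihood over draws from the prior."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(num_samples):
        theta = rng.random()   # one draw from the Uniform(0, 1) prior
        total += likelihood(theta, k, n)
    return total / num_samples


estimate = naive_monte_carlo_evidence(k=3, n=10, num_samples=100_000)
exact = 1.0 / (10 + 1)  # closed form for this particular model
print(estimate, exact)
```

The estimate should be close to the exact value, with an error that shrinks roughly like one over the square root of the number of samples.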

Non-probabilistic uses

The Legendre polynomials are characterized by orthogonality with respect to the uniform measure on the interval [−1, 1] and the fact that they are normalized so that their value at 1 is 1. The constant by which one multiplies a polynomial so that its value at 1 is 1 is a normalizing constant.
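A small sketch of this normalization follows. It starts from the polynomial 3x² − 1, which is orthogonal to 1 and x on [−1, 1] but takes the value 2 at x = 1, and rescales it so that its value at 1 is 1. The coefficient-list representation and the helper names are assumptions made for illustration.

```python
def evaluate(coeffs, x):
    """Horner evaluation; coeffs[i] is the coefficient of x**i."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def normalize_at_one(coeffs):
    """Return (normalizing constant, rescaled coefficients) so the value at 1 is 1."""
    constant = 1.0 / evaluate(coeffs, 1.0)
    return constant, [constant * c for c in coeffs]


q2 = [-1.0, 0.0, 3.0]              # unnormalized polynomial 3x^2 - 1
constant, p2 = normalize_at_one(q2)

print(constant)                    # normalizing constant
print(p2)                          # coefficients of (3x^2 - 1) / 2
print(evaluate(p2, 1.0))           # value at 1 after normalization
```

Here the normalizing constant is 1/2, which recovers the degree-2 Legendre polynomial (3x² − 1)/2.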

Orthonormal functions are normalized such that

\langle f_{i},\,f_{j}\rangle =\,\delta _{i,j}

with respect to some inner product $\langle f, g\rangle$.
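As a concrete sketch, take the inner product to be the integral of f(x)g(x) over [−1, 1] (an assumption; the text leaves the inner product unspecified) and normalize the orthogonal functions 1 and x. Polynomials are stored as coefficient lists, and the helper names are illustrative.

```python
import math


def inner_product(f, g):
    """Exact integral of f(x) * g(x) over [-1, 1] for coefficient lists."""
    total = 0.0
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            k = i + j
            # The integral of x**k over [-1, 1] is 2 / (k + 1) for even k, 0 for odd k.
            if k % 2 == 0:
                total += a * b * 2.0 / (k + 1)
    return total


def normalize(f):
    """Multiply f by the normalizing constant 1 / sqrt(<f, f>)."""
    constant = 1.0 / math.sqrt(inner_product(f, f))
    return [constant * c for c in f]


# The functions 1 and x are orthogonal on [-1, 1] but not of unit norm.
basis = [normalize([1.0]), normalize([0.0, 1.0])]

# Gram matrix of inner products; it should be the 2x2 identity up to rounding.
gram = [[inner_product(fi, fj) for fj in basis] for fi in basis]
print(gram)
```

For these two functions the normalizing constants are $1/\sqrt{2}$ and $\sqrt{3/2}$ respectively.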

The constant $1/ \sqrt 2$ is used to establish the hyperbolic functions cosh and sinh from the lengths of the adjacent and opposite sides of a hyperbolic triangle.

References

[1] Continuous Distributions at Department of Mathematical Sciences: University of Alabama in Huntsville
[2] Feller 1968, p. 22
[3] Feller 1968, p. 174
[4] Feller 1968, p. 156
[5] Feller 1968, p. 124
[6] Gronau, Quentin (2020). "bridgesampling: An R Package for Estimating Normalizing Constants" (PDF). The Comprehensive R Archive Network. Retrieved September 11, 2021.

Feller, William (1968). An Introduction to Probability Theory and its Applications (volume I). John Wiley & Sons. ISBN 0-471-25708-7.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
