The concept of a **normalizing constant** arises in probability theory and a variety of other areas of mathematics. The normalizing constant is used to rescale a non-negative function with finite total mass into a probability density function with total probability one.

In probability theory, a **normalizing constant** is a constant by which an everywhere non-negative function must be multiplied so the area under its graph is 1, e.g., to make it a probability density function or a probability mass function.^{ [1] }^{ [2] }

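If we start from the simple Gaussian function

$$p(x) = e^{-x^2/2}, \quad x \in (-\infty, \infty),$$

we have the corresponding Gaussian integral

$$\int_{-\infty}^{\infty} p(x)\,dx = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}.$$

Now if we use the latter's reciprocal value as a normalizing constant for the former, defining a function $\varphi(x)$ as

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\,p(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2},$$

so that its integral is one,

$$\int_{-\infty}^{\infty} \varphi(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = 1,$$

then the function $\varphi(x)$ is a probability density function.^{ [3] } This is the density of the standard normal distribution. (*Standard*, in this case, means the expected value is 0 and the variance is 1.)

The constant $\frac{1}{\sqrt{2\pi}}$ is the **normalizing constant** of the function $p(x)$.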
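
The Gaussian case can be checked numerically. The sketch below is illustrative only: it integrates the unnormalized function e^{−x²/2} with a trapezoidal rule and takes the reciprocal of the result as the normalizing constant. The truncation of the real line to [−10, 10], the step count, and all function names are assumptions made for this example.

```python
import math

def unnormalized_gaussian(x: float) -> float:
    """The unnormalized Gaussian function exp(-x^2 / 2)."""
    return math.exp(-x * x / 2.0)

def trapezoid_integral(f, a: float, b: float, steps: int) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Assumption: the tails outside [-10, 10] are negligible for this function.
area = trapezoid_integral(unnormalized_gaussian, -10.0, 10.0, 20_000)

# The normalizing constant is the reciprocal of the area under the graph.
normalizing_constant = 1.0 / area

def standard_normal_pdf(x: float) -> float:
    """Normalized density: constant times the unnormalized function."""
    return normalizing_constant * unnormalized_gaussian(x)

total_probability = trapezoid_integral(standard_normal_pdf, -10.0, 10.0, 20_000)

print(area, math.sqrt(2.0 * math.pi))  # the two values should agree closely
print(total_probability)               # should be very close to 1
```

The computed area should agree with √(2π) to within the accuracy of the quadrature, so its reciprocal approximates 1/√(2π).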

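Similarly,

$$\sum_{n=0}^{\infty} \frac{\lambda^n}{n!} = e^{\lambda},$$

and consequently

$$f(n) = \frac{\lambda^n e^{-\lambda}}{n!}$$

is a probability mass function on the set of all nonnegative integers.^{ [4] } This is the probability mass function of the Poisson distribution with expected value λ.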
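
As a rough numerical illustration of the Poisson case, the sketch below sums the unnormalized weights λⁿ/n! and uses the reciprocal of that sum as the normalizing constant. The value of `lam`, the truncation of the series at 100 terms, and the function name are assumptions chosen for the example.

```python
import math

def poisson_weight(n: int, lam: float) -> float:
    """Unnormalized weight lam^n / n! for a nonnegative integer n."""
    return lam ** n / math.factorial(n)

lam = 3.5    # hypothetical rate parameter
terms = 100  # assumption: the remaining tail of the series is negligible

# The series sums to e^lam, so its reciprocal e^(-lam) normalizes the weights.
series_sum = sum(poisson_weight(n, lam) for n in range(terms))
normalizing_constant = 1.0 / series_sum

pmf = [normalizing_constant * poisson_weight(n, lam) for n in range(terms)]
mean = sum(n * p for n, p in enumerate(pmf))

print(series_sum, math.exp(lam))  # should agree closely
print(sum(pmf), mean)             # close to 1 and to lam, respectively
```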

Note that if the probability density function is a function of various parameters, so too will be its normalizing constant. The parametrised normalizing constant for the Boltzmann distribution plays a central role in statistical mechanics. In that context, the normalizing constant is called the partition function.
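
A minimal sketch of this parameter dependence, assuming a system with a few discrete energy levels: the sum of the Boltzmann factors exp(−E/kT) changes with the temperature parameter, and dividing by it yields probabilities that sum to one. The energy values, the `kT` values, and the function names are hypothetical.

```python
import math

def partition_function(energies, kT: float) -> float:
    """Sum of Boltzmann factors exp(-E / kT) over all energy levels."""
    return sum(math.exp(-e / kT) for e in energies)

def boltzmann_probabilities(energies, kT: float):
    """Boltzmann factors divided by the partition function."""
    z = partition_function(energies, kT)
    return [math.exp(-e / kT) / z for e in energies]

# Hypothetical energy levels, in the same units as kT.
energies = [0.0, 1.0, 2.0]

cold = boltzmann_probabilities(energies, kT=0.5)
hot = boltzmann_probabilities(energies, kT=5.0)

# The normalizing sum depends on the parameter kT.
print(partition_function(energies, 0.5), partition_function(energies, 5.0))
print(cold, hot)  # each list sums to 1; the ground state dominates when cold
```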

Bayes' theorem says that the posterior probability measure is proportional to the product of the prior probability measure and the likelihood function. *Proportional to* implies that one must multiply or divide by a normalizing constant to assign measure 1 to the whole space, i.e., to get a probability measure. In a simple discrete case we have

$$P(H_0|D) = \frac{P(D|H_0)\,P(H_0)}{P(D)}$$

where P(H_{0}) is the prior probability that the hypothesis is true; P(D|H_{0}) is the conditional probability of the data given that the hypothesis is true, which, once the data are known, is read as the likelihood of the hypothesis (or its parameters) given the data; and P(H_{0}|D) is the posterior probability that the hypothesis is true given the data. P(D) should be the probability of producing the data, but on its own it is difficult to calculate, so an alternative way to describe this relationship is as one of proportionality:

$$P(H_0|D) \propto P(D|H_0)\,P(H_0).$$

Since P(H|D) is a probability, the sum over all possible (mutually exclusive) hypotheses should be 1, leading to the conclusion that

$$P(H_0|D) = \frac{P(D|H_0)\,P(H_0)}{\sum_i P(D|H_i)\,P(H_i)}.$$

In this case, the reciprocal of the value

$$P(D) = \sum_i P(D|H_i)\,P(H_i)$$

is the *normalizing constant*.^{ [5] } It can be extended from countably many hypotheses to uncountably many by replacing the sum by an integral.
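
The discrete case can be sketched in a few lines: multiply each prior by its likelihood, sum the products to obtain P(D), and divide by that sum. The three hypotheses and their prior and likelihood values below are made up for illustration, and the function name is an assumption.

```python
def posterior(priors: dict, likelihoods: dict):
    """Return (posterior probabilities, evidence P(D)) for discrete hypotheses."""
    # Unnormalized posterior: prior times likelihood for each hypothesis.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # P(D) is the sum over all mutually exclusive hypotheses.
    evidence = sum(unnormalized.values())
    # Multiplying by 1 / P(D) makes the posterior sum to 1.
    return {h: w / evidence for h, w in unnormalized.items()}, evidence

# Hypothetical priors P(H_i) and likelihoods P(D | H_i).
priors = {"H0": 0.5, "H1": 0.3, "H2": 0.2}
likelihoods = {"H0": 0.1, "H1": 0.4, "H2": 0.7}

post, evidence = posterior(priors, likelihoods)
print(evidence)  # P(D); its reciprocal is the normalizing constant
print(post)      # posterior probabilities, summing to 1
```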

The Legendre polynomials are characterized by orthogonality with respect to the uniform measure on the interval [−1, 1] and the fact that they are **normalized** so that their value at 1 is 1. The constant by which one multiplies a polynomial so its value at 1 is 1 is a normalizing constant.
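
One way to see this normalization at work, offered as a sketch rather than a standard construction: orthogonalize the monomials 1, x, x², … with respect to the uniform measure on [−1, 1] (Gram–Schmidt), then multiply each result by the constant that makes its value at 1 equal to 1. Polynomials are stored as coefficient lists, lowest degree first; this representation and the function names are assumptions of the example.

```python
from fractions import Fraction

def inner(p, q) -> Fraction:
    """Exact integral over [-1, 1] of p(x) * q(x) for coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            k = i + j
            if k % 2 == 0:  # odd powers of x integrate to zero on [-1, 1]
                total += a * b * Fraction(2, k + 1)
    return total

def value_at_one(p) -> Fraction:
    """p(1) is the sum of the coefficients."""
    return sum(p, Fraction(0))

def legendre(max_degree: int):
    """Orthogonalize monomials, then normalize each polynomial so p(1) = 1."""
    basis = []
    for n in range(max_degree + 1):
        p = [Fraction(0)] * n + [Fraction(1)]  # the monomial x^n
        for q in basis:
            coeff = inner(p, q) / inner(q, q)
            for i, c in enumerate(q):
                p[i] -= coeff * c
        scale = 1 / value_at_one(p)  # the normalizing constant
        basis.append([scale * c for c in p])
    return basis

basis = legendre(3)
for p in basis:
    print([str(c) for c in p])
```

With exact rational arithmetic the degree-2 result has coefficients −1/2, 0, 3/2, i.e. (3x² − 1)/2, which is the usual second Legendre polynomial.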

Orthonormal functions are normalized such that

$$\langle f_i, f_j \rangle = \delta_{i,j}$$

with respect to some inner product <*f*, *g*>.
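
As an illustration, assume the inner product is the integral of f(x)g(x) over [0, 2π] and take the functions sin(nx) for n = 1, 2, 3. They are already mutually orthogonal, and dividing each by the square root of its inner product with itself makes the family orthonormal. The choice of inner product, the midpoint-rule quadrature, and the names below are assumptions of this sketch.

```python
import math

def inner_product(f, g, steps: int = 10_000) -> float:
    """Approximate the integral of f(x) * g(x) over [0, 2*pi] (midpoint rule)."""
    h = 2.0 * math.pi / steps
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(steps))

def normalized(f):
    """Scale f by the normalizing constant 1 / sqrt(<f, f>)."""
    scale = 1.0 / math.sqrt(inner_product(f, f))
    return lambda x: scale * f(x)

# sin(n x) for n = 1, 2, 3, each rescaled to unit norm.
family = [normalized(lambda x, n=n: math.sin(n * x)) for n in (1, 2, 3)]

# Gram matrix of pairwise inner products; it should be close to the identity.
gram = [[inner_product(fi, fj) for fj in family] for fi in family]

for row in gram:
    print([round(v, 6) for v in row])
```

Here the inner product of sin(x) with itself is π, so the normalizing constant for each member of this family is 1/√π.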

The constant 1/√2 is used to establish the hyperbolic functions cosh and sinh from the lengths of the adjacent and opposite sides of a hyperbolic triangle.

- Continuous Distributions at Department of Mathematical Sciences: University of Alabama in Huntsville
- Feller, William (1968). *An Introduction to Probability Theory and its Applications (volume I)*. John Wiley & Sons. ISBN 0-471-25708-7.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.