Sum of normally distributed random variables

Last updated August 13, 2023

In probability theory, calculation of the sum of normally distributed random variables is an instance of the arithmetic of random variables.

Independent random variables

If X and Y are independent random variables that are normally distributed (and therefore also jointly so), then their sum is also normally distributed. That is, if

X\sim N(\mu _{X},\sigma _{X}^{2})

Y\sim N(\mu _{Y},\sigma _{Y}^{2})

Z=X+Y,

then

Z\sim N(\mu _{X}+\mu _{Y},\sigma _{X}^{2}+\sigma _{Y}^{2}).

This means that the sum of two independent normally distributed random variables is normal, with its mean being the sum of the two means, and its variance being the sum of the two variances (i.e., the square of the standard deviation is the sum of the squares of the standard deviations).^[1]
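The statement can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`, whose `+` operator applied to two instances treats them as independent. The parameter values, the seed and the sample size are arbitrary choices made for illustration, not values from the source.

```python
import random
import statistics
from statistics import NormalDist

# Illustrative parameters (assumptions, not taken from the article).
X = NormalDist(mu=1.0, sigma=2.0)
Y = NormalDist(mu=-3.0, sigma=1.5)

# Adding two NormalDist instances treats them as independent normals.
Z = X + Y
print(Z.mean, Z.variance)  # mean 1 + (-3) = -2, variance 4 + 2.25 = 6.25

# Monte Carlo check: draw independent samples and add them pairwise.
rng = random.Random(0)
n = 200_000
z_samples = [rng.gauss(X.mean, X.stdev) + rng.gauss(Y.mean, Y.stdev)
             for _ in range(n)]

sample_mean = statistics.fmean(z_samples)
sample_var = statistics.fmean([(z - sample_mean) ** 2 for z in z_samples])
print(sample_mean, sample_var)  # close to -2 and 6.25, up to sampling noise
```

The simulated mean and variance should agree with the closed-form values only up to sampling error, which shrinks as `n` grows.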

For this result to hold, the assumption that X and Y are independent cannot be dropped, although it can be weakened to the assumption that X and Y are jointly, rather than separately, normally distributed.^[2] (See "Normally distributed and uncorrelated does not imply independent" for an example.)

The result about the mean holds in all cases, while the result for the variance requires uncorrelatedness, but not independence.
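A standard counterexample illustrates why normal marginals alone are not enough. In the sketch below (the construction and the names `x`, `w`, `y` are illustrative assumptions, not taken from the cited source), X is standard normal and Y = WX, where W is a random sign independent of X. Then Y is also standard normal and uncorrelated with X, so the variances still add, yet X + Y equals 0 with probability 1/2 and therefore cannot be normally distributed.

```python
import random
import statistics

rng = random.Random(1)
n = 100_000

xs, ys = [], []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    w = rng.choice((-1.0, 1.0))  # random sign, independent of x
    xs.append(x)
    ys.append(w * x)             # marginally N(0, 1) by symmetry

sums = [x + y for x, y in zip(xs, ys)]

# X + Y is exactly 0 whenever w == -1, so about half of the sums are 0.
frac_zero = sum(1 for s in sums if s == 0.0) / n

# Sample covariance of X and Y, and sample variance of X + Y.
cov_xy = (statistics.fmean([x * y for x, y in zip(xs, ys)])
          - statistics.fmean(xs) * statistics.fmean(ys))
mean_sum = statistics.fmean(sums)
var_sum = statistics.fmean([(s - mean_sum) ** 2 for s in sums])

print(frac_zero)  # near 0.5: a point mass at 0, impossible for a normal law
print(cov_xy)     # near 0: X and Y are uncorrelated
print(var_sum)    # near 2 = 1 + 1: the variances still add
```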

Proofs

Proof using characteristic functions

The characteristic function

\varphi _{X+Y}(t)=\operatorname {E} \left(e^{it(X+Y)}\right)

of the sum of two independent random variables X and Y is just the product of the two separate characteristic functions:

\varphi _{X}(t)=\operatorname {E} \left(e^{itX}\right),\qquad \varphi _{Y}(t)=\operatorname {E} \left(e^{itY}\right)

of X and Y.

The characteristic function of the normal distribution with expected value μ and variance σ² is

\varphi (t)=\exp \left(it\mu -{\sigma ^{2}t^{2} \over 2}\right).

So

{\begin{aligned}\varphi _{X+Y}(t)=\varphi _{X}(t)\varphi _{Y}(t)&=\exp \left(it\mu _{X}-{\sigma _{X}^{2}t^{2} \over 2}\right)\exp \left(it\mu _{Y}-{\sigma _{Y}^{2}t^{2} \over 2}\right)\\[6pt]&=\exp \left(it(\mu _{X}+\mu _{Y})-{(\sigma _{X}^{2}+\sigma _{Y}^{2})t^{2} \over 2}\right).\end{aligned}}

This is the characteristic function of the normal distribution with expected value $\mu _{X}+\mu _{Y}$ and variance $\sigma _{X}^{2}+\sigma _{Y}^{2}$.

Finally, recall that no two distinct distributions can both have the same characteristic function, so the distribution of X + Y must be just this normal distribution.
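The algebra in this proof can be spot-checked numerically. The following sketch evaluates the normal characteristic function with `cmath` and compares the product of the two factors with the characteristic function of the claimed sum distribution on a grid of t values. The helper name `normal_cf`, the parameters and the grid are assumptions chosen for illustration.

```python
import cmath

def normal_cf(t, mu, sigma2):
    """Characteristic function exp(i t mu - sigma^2 t^2 / 2) of N(mu, sigma2)."""
    return cmath.exp(1j * t * mu - sigma2 * t * t / 2.0)

# Illustrative parameters (assumptions).
mu_x, var_x = 1.0, 4.0
mu_y, var_y = -3.0, 2.25

max_gap = 0.0
for k in range(-20, 21):
    t = k / 10.0
    product = normal_cf(t, mu_x, var_x) * normal_cf(t, mu_y, var_y)
    target = normal_cf(t, mu_x + mu_y, var_x + var_y)
    max_gap = max(max_gap, abs(product - target))

print(max_gap)  # on the order of floating-point rounding error
```

Agreement on a finite grid is only a sanity check; the proof itself relies on the identity holding for every t and on the uniqueness of characteristic functions.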

Proof using convolutions

For independent random variables X and Y with densities f_X and f_Y, the density f_Z of Z = X + Y equals the convolution of f_X and f_Y:

f_{Z}(z)=\int _{-\infty }^{\infty }f_{Y}(z-x)f_{X}(x)\,dx

Given that f_X and f_Y are normal densities,

{\begin{aligned}f_{X}(x)={\mathcal {N}}(x;\mu _{X},\sigma _{X}^{2})={\frac {1}{{\sqrt {2\pi }}\sigma _{X}}}e^{-(x-\mu _{X})^{2}/(2\sigma _{X}^{2})}\\[5pt]f_{Y}(y)={\mathcal {N}}(y;\mu _{Y},\sigma _{Y}^{2})={\frac {1}{{\sqrt {2\pi }}\sigma _{Y}}}e^{-(y-\mu _{Y})^{2}/(2\sigma _{Y}^{2})}\end{aligned}}

Substituting into the convolution:

{\begin{aligned}f_{Z}(z)&=\int _{-\infty }^{\infty }{\frac {1}{{\sqrt {2\pi }}\sigma _{Y}}}\exp \left[-{(z-x-\mu _{Y})^{2} \over 2\sigma _{Y}^{2}}\right]{\frac {1}{{\sqrt {2\pi }}\sigma _{X}}}\exp \left[-{(x-\mu _{X})^{2} \over 2\sigma _{X}^{2}}\right]\,dx\\[6pt]&=\int _{-\infty }^{\infty }{\frac {1}{{\sqrt {2\pi }}{\sqrt {2\pi }}\sigma _{X}\sigma _{Y}}}\exp \left[-{\frac {\sigma _{X}^{2}(z-x-\mu _{Y})^{2}+\sigma _{Y}^{2}(x-\mu _{X})^{2}}{2\sigma _{X}^{2}\sigma _{Y}^{2}}}\right]\,dx\\[6pt]&=\int _{-\infty }^{\infty }{\frac {1}{{\sqrt {2\pi }}{\sqrt {2\pi }}\sigma _{X}\sigma _{Y}}}\exp \left[-{\frac {\sigma _{X}^{2}(z^{2}+x^{2}+\mu _{Y}^{2}-2xz-2z\mu _{Y}+2x\mu _{Y})+\sigma _{Y}^{2}(x^{2}+\mu _{X}^{2}-2x\mu _{X})}{2\sigma _{Y}^{2}\sigma _{X}^{2}}}\right]\,dx\\[6pt]&=\int _{-\infty }^{\infty }{\frac {1}{{\sqrt {2\pi }}{\sqrt {2\pi }}\sigma _{X}\sigma _{Y}}}\exp \left[-{\frac {x^{2}(\sigma _{X}^{2}+\sigma _{Y}^{2})-2x(\sigma _{X}^{2}(z-\mu _{Y})+\sigma _{Y}^{2}\mu _{X})+\sigma _{X}^{2}(z^{2}+\mu _{Y}^{2}-2z\mu _{Y})+\sigma _{Y}^{2}\mu _{X}^{2}}{2\sigma _{Y}^{2}\sigma _{X}^{2}}}\right]\,dx\\[6pt]\end{aligned}}

Defining $\sigma _{Z}={\sqrt {\sigma _{X}^{2}+\sigma _{Y}^{2}}}$, and completing the square:

{\begin{aligned}f_{Z}(z)&=\int _{-\infty }^{\infty }{\frac {1}{{\sqrt {2\pi }}\sigma _{Z}}}{\frac {1}{{\sqrt {2\pi }}{\frac {\sigma _{X}\sigma _{Y}}{\sigma _{Z}}}}}\exp \left[-{\frac {x^{2}-2x{\frac {\sigma _{X}^{2}(z-\mu _{Y})+\sigma _{Y}^{2}\mu _{X}}{\sigma _{Z}^{2}}}+{\frac {\sigma _{X}^{2}(z^{2}+\mu _{Y}^{2}-2z\mu _{Y})+\sigma _{Y}^{2}\mu _{X}^{2}}{\sigma _{Z}^{2}}}}{2\left({\frac {\sigma _{X}\sigma _{Y}}{\sigma _{Z}}}\right)^{2}}}\right]\,dx\\[6pt]&=\int _{-\infty }^{\infty }{\frac {1}{{\sqrt {2\pi }}\sigma _{Z}}}{\frac {1}{{\sqrt {2\pi }}{\frac {\sigma _{X}\sigma _{Y}}{\sigma _{Z}}}}}\exp \left[-{\frac {\left(x-{\frac {\sigma _{X}^{2}(z-\mu _{Y})+\sigma _{Y}^{2}\mu _{X}}{\sigma _{Z}^{2}}}\right)^{2}-\left({\frac {\sigma _{X}^{2}(z-\mu _{Y})+\sigma _{Y}^{2}\mu _{X}}{\sigma _{Z}^{2}}}\right)^{2}+{\frac {\sigma _{X}^{2}(z-\mu _{Y})^{2}+\sigma _{Y}^{2}\mu _{X}^{2}}{\sigma _{Z}^{2}}}}{2\left({\frac {\sigma _{X}\sigma _{Y}}{\sigma _{Z}}}\right)^{2}}}\right]\,dx\\[6pt]&=\int _{-\infty }^{\infty }{\frac {1}{{\sqrt {2\pi }}\sigma _{Z}}}\exp \left[-{\frac {\sigma _{Z}^{2}\left(\sigma _{X}^{2}(z-\mu _{Y})^{2}+\sigma _{Y}^{2}\mu _{X}^{2}\right)-\left(\sigma _{X}^{2}(z-\mu _{Y})+\sigma _{Y}^{2}\mu _{X}\right)^{2}}{2\sigma _{Z}^{2}\left(\sigma _{X}\sigma _{Y}\right)^{2}}}\right]{\frac {1}{{\sqrt {2\pi }}{\frac {\sigma _{X}\sigma _{Y}}{\sigma _{Z}}}}}\exp \left[-{\frac {\left(x-{\frac {\sigma _{X}^{2}(z-\mu _{Y})+\sigma _{Y}^{2}\mu _{X}}{\sigma _{Z}^{2}}}\right)^{2}}{2\left({\frac {\sigma _{X}\sigma _{Y}}{\sigma _{Z}}}\right)^{2}}}\right]\,dx\\[6pt]&={\frac {1}{{\sqrt {2\pi }}\sigma _{Z}}}\exp \left[-{(z-(\mu _{X}+\mu _{Y}))^{2} \over 2\sigma _{Z}^{2}}\right]\int _{-\infty }^{\infty }{\frac {1}{{\sqrt {2\pi }}{\frac {\sigma _{X}\sigma _{Y}}{\sigma _{Z}}}}}\exp \left[-{\frac {\left(x-{\frac {\sigma _{X}^{2}(z-\mu _{Y})+\sigma _{Y}^{2}\mu _{X}}{\sigma _{Z}^{2}}}\right)^{2}}{2\left({\frac {\sigma _{X}\sigma _{Y}}{\sigma _{Z}}}\right)^{2}}}\right]\,dx\end{aligned}}

The integrand is a normal density in x, and so the integral evaluates to 1. The desired result follows:

f_{Z}(z)={\frac {1}{{\sqrt {2\pi }}\sigma _{Z}}}\exp \left[-{(z-(\mu _{X}+\mu _{Y}))^{2} \over 2\sigma _{Z}^{2}}\right]
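The closed form can be compared with a direct numerical evaluation of the convolution integral. The sketch below approximates the integral with the trapezoidal rule on a wide finite interval. The function names, the integration limits, the step count and the parameter values are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return (math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
            / (math.sqrt(2.0 * math.pi) * sigma))

def convolve_at(z, mu_x, sigma_x, mu_y, sigma_y, lo=-40.0, hi=40.0, steps=8000):
    """Trapezoidal approximation of the integral of f_Y(z - x) f_X(x) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(z - x, mu_y, sigma_y) * normal_pdf(x, mu_x, sigma_x)
    return total * h

# Illustrative parameters (assumptions).
mu_x, sigma_x = 1.0, 2.0
mu_y, sigma_y = -3.0, 1.5
sigma_z = math.hypot(sigma_x, sigma_y)  # sqrt(sigma_x^2 + sigma_y^2)

for z in (-5.0, -2.0, 0.0, 3.0):
    numeric = convolve_at(z, mu_x, sigma_x, mu_y, sigma_y)
    closed_form = normal_pdf(z, mu_x + mu_y, sigma_z)
    print(z, numeric, closed_form)  # the two columns should agree closely
```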

Using the convolution theorem

It can be shown that the Fourier transform of a Gaussian, $f_{X}(x)={\mathcal {N}}(x;\mu _{X},\sigma _{X}^{2})$, is^[3]

{\mathcal {F}}\{f_{X}\}=F_{X}(\omega )=\exp \left[-j\omega \mu _{X}\right]\exp \left[-{\tfrac {\sigma _{X}^{2}\omega ^{2}}{2}}\right]

By the convolution theorem:

{\begin{aligned}f_{Z}(z)&=(f_{X}*f_{Y})(z)\\[5pt]&={\mathcal {F}}^{-1}{\big \{}{\mathcal {F}}\{f_{X}\}\cdot {\mathcal {F}}\{f_{Y}\}{\big \}}\\[5pt]&={\mathcal {F}}^{-1}{\big \{}\exp \left[-j\omega \mu _{X}\right]\exp \left[-{\tfrac {\sigma _{X}^{2}\omega ^{2}}{2}}\right]\exp \left[-j\omega \mu _{Y}\right]\exp \left[-{\tfrac {\sigma _{Y}^{2}\omega ^{2}}{2}}\right]{\big \}}\\[5pt]&={\mathcal {F}}^{-1}{\big \{}\exp \left[-j\omega (\mu _{X}+\mu _{Y})\right]\exp \left[-{\tfrac {(\sigma _{X}^{2}\ +\sigma _{Y}^{2})\omega ^{2}}{2}}\right]{\big \}}\\[5pt]&={\mathcal {N}}(z;\mu _{X}+\mu _{Y},\sigma _{X}^{2}+\sigma _{Y}^{2})\end{aligned}}
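The transform formula used in this argument can be verified by numerical quadrature. The sketch below approximates the integral of f(x) exp(-jωx) with the trapezoidal rule, compares it with the stated closed form, and then checks that the product of the two transforms matches the transform of the claimed sum density. The function names, the truncation at 12 standard deviations and the parameter values are assumptions made for illustration.

```python
import cmath
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return (math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
            / (math.sqrt(2.0 * math.pi) * sigma))

def fourier_transform(omega, mu, sigma, half_width=12.0, steps=6000):
    """Trapezoidal approximation of the integral of f(x) exp(-j omega x) dx."""
    lo = mu - half_width * sigma
    hi = mu + half_width * sigma
    h = (hi - lo) / steps
    total = 0j
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(x, mu, sigma) * cmath.exp(-1j * omega * x)
    return total * h

def closed_form(omega, mu, sigma):
    """exp(-j omega mu) exp(-sigma^2 omega^2 / 2), the formula quoted above."""
    return cmath.exp(-1j * omega * mu) * math.exp(-(sigma ** 2) * omega ** 2 / 2.0)

# Illustrative parameters (assumptions).
mu_x, sigma_x = 1.0, 2.0
mu_y, sigma_y = -3.0, 1.5
sigma_z = math.hypot(sigma_x, sigma_y)

max_err_single = 0.0
max_err_product = 0.0
for omega in (0.0, 0.5, 1.0, 2.0):
    fx = fourier_transform(omega, mu_x, sigma_x)
    fy = fourier_transform(omega, mu_y, sigma_y)
    max_err_single = max(max_err_single, abs(fx - closed_form(omega, mu_x, sigma_x)))
    max_err_product = max(
        max_err_product,
        abs(fx * fy - closed_form(omega, mu_x + mu_y, sigma_z)),
    )

print(max_err_single, max_err_product)  # both should be very small
```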

Geometric proof

First consider the normalized case in which X and Y are independent with X, Y ~ N(0, 1), so that their PDFs are

f(x)={\frac {1}{\sqrt {2\pi \,}}}e^{-x^{2}/2}

and

g(y)={\frac {1}{\sqrt {2\pi \,}}}e^{-y^{2}/2}.

Let Z = X + Y. Then the CDF for Z will be

z\mapsto \int _{x+y\leq z}f(x)g(y)\,dx\,dy.

This integral is over the half-plane which lies under the line x+y = z.

The key observation is that the function

f(x)g(y)={\frac {1}{2\pi }}e^{-(x^{2}+y^{2})/2}\,

is radially symmetric. So we rotate the coordinate plane about the origin, choosing new coordinates $x',y'$ such that the line x+y = z is described by the equation $x'=c$ where $c=c(z)$ is determined geometrically. Because of the radial symmetry, we have $f(x)g(y)=f(x')g(y')$, and the CDF for Z is

\int _{x'\leq c,y'\in \mathbb {R} }f(x')g(y')\,dx'\,dy'.

This is easy to integrate; we find that the CDF for Z is

\int _{-\infty }^{c(z)}f(x')\,dx'=\Phi (c(z)).

To determine the value $c(z)$, note that we rotated the plane so that the line x+y = z now runs vertically with x-intercept equal to c. So c is just the signed distance from the origin to the line x+y = z, measured along the perpendicular dropped from the origin, which meets the line at its nearest point to the origin, in this case $(z/2,z/2)$. The distance is ${\sqrt {(z/2)^{2}+(z/2)^{2}}}=|z|/{\sqrt {2}}$, so with the sign of z taken into account $c=z/{\sqrt {2}}$, and the CDF for Z is $\Phi (z/{\sqrt {2}})$, i.e., $Z=X+Y\sim N(0,2).$

Now, if a, b are any real constants (not both zero) then the probability that $aX+bY\leq z$ is found by the same integral as above, but with the bounding line $ax+by=z$. The same rotation method works, and in this more general case we find that the closest point on the line to the origin is located a (signed) distance

{\frac {z}{\sqrt {a^{2}+b^{2}}}}

away, so that

aX+bY\sim N(0,a^{2}+b^{2}).
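The conclusion of the rotation argument, that the CDF of aX + bY at z equals Φ(z/√(a² + b²)), can be checked by simulation. In the sketch below the coefficients, the evaluation point `z`, the seed and the sample size are arbitrary illustrative choices.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2)
a, b = 3.0, -4.0  # illustrative coefficients; sqrt(a^2 + b^2) = 5
z = 2.5
n = 200_000

# Fraction of independent standard normal pairs (X, Y) with aX + bY <= z.
hits = sum(
    1
    for _ in range(n)
    if a * rng.gauss(0.0, 1.0) + b * rng.gauss(0.0, 1.0) <= z
)
empirical_cdf = hits / n

# Geometric argument: P(aX + bY <= z) = Phi(z / sqrt(a^2 + b^2)).
predicted_cdf = NormalDist().cdf(z / math.hypot(a, b))

print(empirical_cdf, predicted_cdf)  # should agree up to sampling noise
```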

The same argument in higher dimensions shows that if the X_i are independent with

X_{i}\sim N(0,\sigma _{i}^{2}),\qquad i=1,\dots ,n,

then

X_{1}+\cdots +X_{n}\sim N(0,\sigma _{1}^{2}+\cdots +\sigma _{n}^{2}).

Now we are essentially done, because

X\sim N(\mu ,\sigma ^{2})\Leftrightarrow {\frac {1}{\sigma }}(X-\mu )\sim N(0,1).

So in general, if

X_{i}\sim N(\mu _{i},\sigma _{i}^{2}),\qquad i=1,\dots ,n,

then

\sum _{i=1}^{n}a_{i}X_{i}\sim N\left(\sum _{i=1}^{n}a_{i}\mu _{i},\sum _{i=1}^{n}(a_{i}\sigma _{i})^{2}\right).
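The general formula translates directly into a small helper. The sketch below is one possible way to compute the parameters of a linear combination of independent normal variables; the function name `linear_combination_params` and the example values are hypothetical, introduced only for illustration.

```python
import math

def linear_combination_params(coeffs, means, sigmas):
    """Return (mean, variance) of sum(a_i * X_i) for independent X_i ~ N(mu_i, sigma_i^2)."""
    if not (len(coeffs) == len(means) == len(sigmas)):
        raise ValueError("coeffs, means and sigmas must have the same length")
    mean = math.fsum(a * mu for a, mu in zip(coeffs, means))
    variance = math.fsum((a * s) ** 2 for a, s in zip(coeffs, sigmas))
    return mean, variance

# Example: X1 - 2*X2 + 0.5*X3 with the (assumed) parameters below.
mean, variance = linear_combination_params(
    coeffs=[1.0, -2.0, 0.5],
    means=[1.0, 0.0, 4.0],
    sigmas=[1.0, 0.5, 2.0],
)
print(mean, variance)  # 3.0 3.0
```

Here the mean is 1 + 0 + 2 = 3 and the variance is 1² + (−1)² + 1² = 3. Note that the coefficients enter the variance squared, so a negative coefficient still adds variance.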

Correlated random variables

If X and Y are jointly normally distributed, then X + Y is still normally distributed (see Multivariate normal distribution) and its mean is the sum of the means. However, the variances are not additive, because of the correlation. Indeed,

\sigma _{X+Y}={\sqrt {\sigma _{X}^{2}+\sigma _{Y}^{2}+2\rho \sigma _{X}\sigma _{Y}}},

where ρ is the correlation coefficient of X and Y. In particular, whenever ρ < 0, the variance of X + Y is less than the sum of the variances of X and Y.

Extensions of this result can be made for more than two random variables, using the covariance matrix.
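As a sketch of both statements: for two variables the standard deviation of the sum follows from the formula above, and for n variables the variance of the sum equals the sum of all entries of the covariance matrix (the diagonal contributes the variances, the off-diagonal entries the covariances). The function names and the numerical values below are illustrative assumptions.

```python
import math

def sum_std_two(sigma_x, sigma_y, rho):
    """Standard deviation of X + Y for correlation coefficient rho."""
    return math.sqrt(sigma_x ** 2 + sigma_y ** 2 + 2.0 * rho * sigma_x * sigma_y)

def sum_variance(cov):
    """Variance of X_1 + ... + X_n: the sum of all entries of the covariance matrix."""
    return math.fsum(math.fsum(row) for row in cov)

sigma_x, sigma_y, rho = 3.0, 4.0, -0.5  # illustrative values

# Covariance matrix of (X, Y) built from the same parameters.
cov = [
    [sigma_x ** 2, rho * sigma_x * sigma_y],
    [rho * sigma_x * sigma_y, sigma_y ** 2],
]

print(sum_std_two(sigma_x, sigma_y, rho))  # sqrt(9 + 16 - 12) = sqrt(13)
print(math.sqrt(sum_variance(cov)))        # same value via the covariance matrix
print(sum_std_two(sigma_x, sigma_y, 0.0))  # uncorrelated case: sqrt(25) = 5
```

With ρ = −0.5 the variance of the sum is 13, which is less than 9 + 16 = 25, in line with the remark about negative correlation.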

Proof

In this case (with X and Y having zero means), one needs to consider

{\frac {1}{2\pi \sigma _{x}\sigma _{y}{\sqrt {1-\rho ^{2}}}}}\iint _{x\,y}\exp \left[-{\frac {1}{2(1-\rho ^{2})}}\left({\frac {x^{2}}{\sigma _{x}^{2}}}+{\frac {y^{2}}{\sigma _{y}^{2}}}-{\frac {2\rho xy}{\sigma _{x}\sigma _{y}}}\right)\right]\delta (z-(x+y))\,\mathrm {d} x\,\mathrm {d} y.

As above, one makes the substitution $y\rightarrow z-x$.

This integral is more complicated to simplify analytically, but can be done easily using a symbolic mathematics program. The probability density f_Z(z) is given in this case by

f_{Z}(z)={\frac {1}{{\sqrt {2\pi }}\sigma _{+}}}\exp \left(-{\frac {z^{2}}{2\sigma _{+}^{2}}}\right)

where

\sigma _{+}={\sqrt {\sigma _{x}^{2}+\sigma _{y}^{2}+2\rho \sigma _{x}\sigma _{y}}}.

If one considers instead Z = X − Y, then one obtains

f_{Z}(z)={\frac {1}{\sqrt {2\pi (\sigma _{x}^{2}+\sigma _{y}^{2}-2\rho \sigma _{x}\sigma _{y})}}}\exp \left(-{\frac {z^{2}}{2(\sigma _{x}^{2}+\sigma _{y}^{2}-2\rho \sigma _{x}\sigma _{y})}}\right)

which also can be rewritten with

\sigma _{X-Y}={\sqrt {\sigma _{x}^{2}+\sigma _{y}^{2}-2\rho \sigma _{x}\sigma _{y}}}.

The standard deviation of each distribution can be read off by comparison with the general form of the normal density.
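These two standard deviations can also be checked by simulation. The sketch below builds a correlated zero-mean pair from two independent standard normals U and V by setting X = σ_x U and Y = σ_y (ρU + √(1 − ρ²) V), a standard construction that gives correlation ρ; it is used here as an assumption for illustration and is not the method of the proof above. The parameter values, the seed and the helper name `sample_std` are likewise illustrative.

```python
import math
import random
import statistics

def sample_std(values):
    """Population standard deviation computed with floating-point means."""
    m = statistics.fmean(values)
    return math.sqrt(statistics.fmean([(v - m) ** 2 for v in values]))

rng = random.Random(3)
sigma_x, sigma_y, rho = 2.0, 1.0, 0.6  # illustrative values
n = 200_000

sums, diffs = [], []
for _ in range(n):
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    x = sigma_x * u
    y = sigma_y * (rho * u + math.sqrt(1.0 - rho ** 2) * v)  # corr(x, y) = rho
    sums.append(x + y)
    diffs.append(x - y)

sigma_plus = math.sqrt(sigma_x ** 2 + sigma_y ** 2 + 2.0 * rho * sigma_x * sigma_y)
sigma_minus = math.sqrt(sigma_x ** 2 + sigma_y ** 2 - 2.0 * rho * sigma_x * sigma_y)

print(sample_std(sums), sigma_plus)    # both near sqrt(7.4)
print(sample_std(diffs), sigma_minus)  # both near sqrt(2.6)
```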

References

[1] Lemons, Don S. (2002), An Introduction to Stochastic Processes in Physics, The Johns Hopkins University Press, p. 34, ISBN 0-8018-6866-1
[2] Lemons (2002), pp. 35–36
[3] Derpanis, Konstantinos G. (October 20, 2005). "Fourier Transform of the Gaussian" (PDF).