Additive white Gaussian noise (AWGN) is a basic noise model used in information theory to mimic the effect of many random processes that occur in nature. The modifiers denote specific characteristics:

• Additive because it is added to any noise that might be intrinsic to the information system.
• White refers to the idea that it has uniform power across the frequency band for the information system. It is an analogy to the color white, which has uniform emissions at all frequencies in the visible spectrum.
• Gaussian because it has a normal distribution in the time domain with an average time domain value of zero.

Wideband noise comes from many natural noise sources, such as the thermal vibrations of atoms in conductors (referred to as thermal noise or Johnson–Nyquist noise), shot noise, black-body radiation from the earth and other warm objects, and from celestial sources such as the Sun. The central limit theorem of probability theory indicates that the summation of many random processes will tend to have a distribution called Gaussian or normal.

AWGN is often used as a channel model in which the only impairment to communication is a linear addition of wideband or white noise with a constant spectral density (expressed as watts per hertz of bandwidth) and a Gaussian distribution of amplitude. The model does not account for fading, frequency selectivity, interference, nonlinearity or dispersion. However, it produces simple and tractable mathematical models which are useful for gaining insight into the underlying behavior of a system before these other phenomena are considered.

The AWGN channel is a good model for many satellite and deep space communication links. It is not a good model for most terrestrial links because of multipath, terrain blocking, interference, etc. However, for terrestrial path modeling, AWGN is commonly used to simulate background noise of the channel under study, in addition to multipath, terrain blocking, interference, ground clutter and self interference that modern radio systems encounter in terrestrial operation.

## Channel capacity

The AWGN channel is represented by a series of outputs ${\displaystyle Y_{i}}$ at discrete time event index ${\displaystyle i}$. ${\displaystyle Y_{i}}$ is the sum of the input ${\displaystyle X_{i}}$ and noise, ${\displaystyle Z_{i}}$, where ${\displaystyle Z_{i}}$ is independent and identically distributed and drawn from a zero-mean normal distribution with variance ${\displaystyle N}$ (the noise). The ${\displaystyle Z_{i}}$ are further assumed to not be correlated with the ${\displaystyle X_{i}}$.

${\displaystyle Z_{i}\sim {\mathcal {N}}(0,N)\,\!}$
${\displaystyle Y_{i}=X_{i}+Z_{i}.\,\!}$
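The channel model above can be simulated directly. The sketch below (plain Python; `awgn_channel` is a name chosen here for illustration) adds i.i.d. zero-mean Gaussian noise of variance ${\displaystyle N}$ to each input sample:

```python
import math
import random

def awgn_channel(x, noise_variance, rng=random):
    """Discrete-time AWGN channel: each output Y_i = X_i + Z_i
    with Z_i ~ N(0, noise_variance), independent of the input."""
    sigma = math.sqrt(noise_variance)
    return [xi + rng.gauss(0.0, sigma) for xi in x]

random.seed(0)
N = 0.25                                            # noise variance
x = [1.0 if b else -1.0 for b in [1, 0, 1, 1, 0]]   # a toy antipodal signal
y = awgn_channel(x, N)

# With many samples, the empirical noise variance approaches N.
z = [random.gauss(0.0, math.sqrt(N)) for _ in range(100_000)]
var_hat = sum(zi * zi for zi in z) / len(z)
print(var_hat)  # close to 0.25
```

This only exercises the additivity and zero-mean Gaussianity assumptions; whiteness would additionally require the samples to be uncorrelated across time, which `random.gauss` draws satisfy by construction.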

The capacity of the channel is infinite unless the noise ${\displaystyle N}$ is nonzero, and the ${\displaystyle X_{i}}$ are sufficiently constrained. The most common constraint on the input is the so-called "power" constraint, requiring that for a codeword ${\displaystyle (x_{1},x_{2},\dots ,x_{k})}$ transmitted through the channel, we have:

${\displaystyle {\frac {1}{k}}\sum _{i=1}^{k}x_{i}^{2}\leq P,}$

where ${\displaystyle P}$ represents the maximum channel power. Therefore, the channel capacity for the power-constrained channel is given by:

${\displaystyle C=\max _{f(x){\text{ s.t. }}E\left(X^{2}\right)\leq P}I(X;Y)\,\!}$

where ${\displaystyle f(x)}$ is the distribution of ${\displaystyle X}$. Expanding ${\displaystyle I(X;Y)}$ in terms of the differential entropy:

${\displaystyle {\begin{aligned}I(X;Y)&=h(Y)-h(Y|X)\\&=h(Y)-h(X+Z|X)\\&=h(Y)-h(Z|X)\end{aligned}}\,\!}$

But ${\displaystyle X}$ and ${\displaystyle Z}$ are independent, therefore:

${\displaystyle I(X;Y)=h(Y)-h(Z)\,\!}$

Evaluating the differential entropy of a Gaussian gives:

${\displaystyle h(Z)={\frac {1}{2}}\log(2\pi eN)\,\!}$

Because ${\displaystyle X}$ and ${\displaystyle Z}$ are independent and ${\displaystyle Z}$ has zero mean, the power of their sum ${\displaystyle Y}$ satisfies:

${\displaystyle E(Y^{2})=E((X+Z)^{2})=E(X^{2})+2E(X)E(Z)+E(Z^{2})\leq P+N\,\!}$

From this bound, we infer from a property of the differential entropy that

${\displaystyle h(Y)\leq {\frac {1}{2}}\log(2\pi e(P+N))\,\!}$

Therefore, the channel capacity is given by the highest achievable bound on the mutual information:

${\displaystyle I(X;Y)\leq {\frac {1}{2}}\log(2\pi e(P+N))-{\frac {1}{2}}\log(2\pi eN)\,\!}$

This bound on ${\displaystyle I(X;Y)}$ is achieved with equality when:

${\displaystyle X\sim {\mathcal {N}}(0,P)\,\!}$

Thus the channel capacity ${\displaystyle C}$ for the AWGN channel is given by:

${\displaystyle C={\frac {1}{2}}\log \left(1+{\frac {P}{N}}\right)\,\!}$
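Taking the logarithm base 2 gives the capacity in bits per channel use. A minimal sketch (the function name is chosen here for illustration):

```python
import math

def awgn_capacity(P, N):
    """AWGN channel capacity in bits per channel use: C = 1/2 log2(1 + P/N)."""
    return 0.5 * math.log2(1.0 + P / N)

# At a signal-to-noise ratio of P/N = 15 the capacity is exactly
# 2 bits per use, since 1/2 * log2(16) = 2.
print(awgn_capacity(15.0, 1.0))  # -> 2.0
```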

### Channel capacity and sphere packing

Suppose that we are sending messages through the channel with index ranging from ${\displaystyle 1}$ to ${\displaystyle M}$, the number of distinct possible messages. If we encode the ${\displaystyle M}$ messages into codewords of length ${\displaystyle n}$, then we define the rate ${\displaystyle R}$ as:

${\displaystyle R={\frac {\log M}{n}}\,\!}$

A rate is said to be achievable if there is a sequence of codes so that the maximum probability of error tends to zero as ${\displaystyle n}$ approaches infinity. The capacity ${\displaystyle C}$ is the highest achievable rate.

Consider a codeword of length ${\displaystyle n}$ sent through the AWGN channel with noise level ${\displaystyle N}$. When received, each coordinate of the vector has variance ${\displaystyle N}$ about the corresponding coordinate of the codeword sent. The received vector is very likely to be contained in a sphere of radius ${\displaystyle {\sqrt {n(N+\epsilon )}}}$ around the codeword sent. If we decode by mapping every received vector onto the codeword at the center of this sphere, then an error occurs only when the received vector is outside of this sphere, which is very unlikely.
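This concentration of the noise norm around ${\displaystyle {\sqrt {nN}}}$ can be checked numerically. The sketch below (parameters chosen here for illustration) draws one n-dimensional Gaussian noise vector and compares its normalized norm to ${\displaystyle {\sqrt {N}}}$:

```python
import math
import random

# The received vector Y = x + Z concentrates on a thin shell of radius
# about sqrt(n*N) around the sent codeword x, i.e. ||Z||/sqrt(n) -> sqrt(N).
random.seed(2)
n, N = 2000, 0.5
z = [random.gauss(0.0, math.sqrt(N)) for _ in range(n)]
radius = math.sqrt(sum(zi * zi for zi in z))
print(radius / math.sqrt(n))  # close to sqrt(0.5) ~ 0.707
```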

Each codeword vector has an associated sphere of received vectors which are decoded to it, and each such sphere must map uniquely onto a codeword. Because these spheres must not intersect, we are faced with a sphere-packing problem: how many distinct codewords can we pack into the space of length-${\displaystyle n}$ received vectors? The received vectors have a maximum energy of ${\displaystyle n(P+N)}$ and therefore occupy a sphere of radius ${\displaystyle {\sqrt {n(P+N)}}}$. Each codeword sphere has radius ${\displaystyle {\sqrt {nN}}}$. The volume of an n-dimensional sphere is directly proportional to ${\displaystyle r^{n}}$, so the maximum number of uniquely decodable spheres that can be packed into our sphere with transmission power ${\displaystyle P}$ is:

${\displaystyle {\frac {(n(P+N))^{\frac {n}{2}}}{(nN)^{\frac {n}{2}}}}=2^{{\frac {n}{2}}\log(1+P/N)}\,\!}$

By this argument, the rate R can be no more than ${\displaystyle {\frac {1}{2}}\log(1+P/N)}$.
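The algebraic identity behind this count is easy to verify numerically (logs base 2; the values of ${\displaystyle n}$, ${\displaystyle P}$, ${\displaystyle N}$ below are arbitrary):

```python
import math

def sphere_packing_exponent(n, P, N):
    """Base-2 exponent of the sphere-volume ratio: (n/2) * log2(1 + P/N)."""
    return (n / 2.0) * math.log2(1.0 + P / N)

# The volume ratio (n(P+N))^(n/2) / (nN)^(n/2) equals 2^((n/2) log2(1+P/N)).
n, P, N = 8, 3.0, 1.0
ratio = (n * (P + N)) ** (n / 2) / (n * N) ** (n / 2)
print(math.isclose(ratio, 2 ** sphere_packing_exponent(n, P, N)))  # True
```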

### Achievability

In this section, we show achievability of the upper bound on the rate from the last section.

A codebook, known to both encoder and decoder, is generated by selecting codewords of length n, i.i.d. Gaussian with variance ${\displaystyle P-\epsilon }$ and mean zero. For large n, the empirical variance of the codebook will be very close to the variance of its distribution, thereby avoiding violation of the power constraint probabilistically.

Received messages are decoded to a message in the codebook which is uniquely jointly typical. If there is no such message or if the power constraint is violated, a decoding error is declared.

Let ${\displaystyle X^{n}(i)}$ denote the codeword for message ${\displaystyle i}$, while ${\displaystyle Y^{n}}$ is, as before, the received vector. Define the following three events:

1. Event ${\displaystyle U}$: the power of the received message is larger than ${\displaystyle P}$.
2. Event ${\displaystyle V}$: the transmitted and received codewords are not jointly typical.
3. Event ${\displaystyle E_{j}}$ (for ${\displaystyle j\neq i}$): the pair ${\displaystyle (X^{n}(j),Y^{n})}$ is in the typical set ${\displaystyle A_{\epsilon }^{(n)}}$, which is to say that an incorrect codeword is jointly typical with the received vector.

An error therefore occurs if ${\displaystyle U}$, ${\displaystyle V}$ or any of the ${\displaystyle E_{j}}$ occur. By the law of large numbers, ${\displaystyle P(U)}$ goes to zero as n approaches infinity, and by the joint asymptotic equipartition property the same applies to ${\displaystyle P(V)}$. Therefore, for a sufficiently large ${\displaystyle n}$, ${\displaystyle P(U)}$ and ${\displaystyle P(V)}$ are each less than ${\displaystyle \epsilon }$. Since ${\displaystyle X^{n}(i)}$ and ${\displaystyle X^{n}(j)}$ are independent for ${\displaystyle i\neq j}$, we have that ${\displaystyle X^{n}(j)}$ and ${\displaystyle Y^{n}}$ are also independent. Therefore, by the joint AEP, ${\displaystyle P(E_{j})\leq 2^{-n(I(X;Y)-3\epsilon )}}$. This allows us to bound ${\displaystyle P_{e}^{(n)}}$, the probability of error, as follows:

{\displaystyle {\begin{aligned}P_{e}^{(n)}&\leq P(U)+P(V)+\sum _{j\neq i}P(E_{j})\\&\leq \epsilon +\epsilon +\sum _{j\neq i}2^{-n(I(X;Y)-3\epsilon )}\\&\leq 2\epsilon +(2^{nR}-1)2^{-n(I(X;Y)-3\epsilon )}\\&\leq 2\epsilon +(2^{3n\epsilon })2^{-n(I(X;Y)-R)}\\&\leq 3\epsilon \end{aligned}}}

Therefore, as n approaches infinity, ${\displaystyle P_{e}^{(n)}}$ goes to zero, provided that ${\displaystyle R<I(X;Y)-3\epsilon }$. Therefore, there is a code of rate R arbitrarily close to the capacity derived earlier.

### Coding theorem converse

Here we show that rates above the capacity ${\displaystyle C={\frac {1}{2}}\log(1+{\frac {P}{N}})}$ are not achievable.

Suppose that the power constraint is satisfied for a codebook, and further suppose that the messages follow a uniform distribution. Let ${\displaystyle W}$ be the input messages and ${\displaystyle {\hat {W}}}$ the output messages. Thus the information flows as:

${\displaystyle W\longrightarrow X^{(n)}(W)\longrightarrow Y^{(n)}\longrightarrow {\hat {W}}}$

Making use of Fano's inequality gives:

${\displaystyle H(W|{\hat {W}})\leq 1+nRP_{e}^{(n)}=n\epsilon _{n}}$ where ${\displaystyle \epsilon _{n}\rightarrow 0}$ as ${\displaystyle P_{e}^{(n)}\rightarrow 0}$

Let ${\displaystyle X_{i}}$ be the i-th symbol of the transmitted codeword. Then:

{\displaystyle {\begin{aligned}nR&=H(W)\\&=I(W;{\hat {W}})+H(W|{\hat {W}})\\&\leq I(W;{\hat {W}})+n\epsilon _{n}\\&\leq I(X^{(n)};Y^{(n)})+n\epsilon _{n}\\&=h(Y^{(n)})-h(Y^{(n)}|X^{(n)})+n\epsilon _{n}\\&=h(Y^{(n)})-h(Z^{(n)})+n\epsilon _{n}\\&\leq \sum _{i=1}^{n}h(Y_{i})-h(Z^{(n)})+n\epsilon _{n}\\&\leq \sum _{i=1}^{n}I(X_{i};Y_{i})+n\epsilon _{n}\end{aligned}}}

Let ${\displaystyle P_{i}}$ be the average power at symbol position i, taken over the codebook:

${\displaystyle P_{i}={\frac {1}{2^{nR}}}\sum _{w}x_{i}^{2}(w)\,\!}$

where the sum is over all input messages ${\displaystyle w}$. Since ${\displaystyle X_{i}}$ and ${\displaystyle Z_{i}}$ are independent, the expected power of ${\displaystyle Y_{i}}$ is, for noise level ${\displaystyle N}$:

${\displaystyle E(Y_{i}^{2})=P_{i}+N\,\!}$

Because the normal distribution maximizes differential entropy for a given second moment, we have that

${\displaystyle h(Y_{i})\leq {\frac {1}{2}}\log \left(2\pi e(P_{i}+N)\right)\,\!}$

Therefore,

{\displaystyle {\begin{aligned}nR&\leq \sum (h(Y_{i})-h(Z_{i}))+n\epsilon _{n}\\&\leq \sum \left({\frac {1}{2}}\log(2\pi e(P_{i}+N))-{\frac {1}{2}}\log(2\pi eN)\right)+n\epsilon _{n}\\&=\sum {\frac {1}{2}}\log(1+{\frac {P_{i}}{N}})+n\epsilon _{n}\end{aligned}}}

We may apply Jensen's inequality to ${\displaystyle \log(1+x)}$, a concave (downward) function of x, to get:

${\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}{\frac {1}{2}}\log \left(1+{\frac {P_{i}}{N}}\right)\leq {\frac {1}{2}}\log \left(1+{\frac {1}{n}}\sum _{i=1}^{n}{\frac {P_{i}}{N}}\right)\,\!}$
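A quick numeric check of this step, using hypothetical per-position powers ${\displaystyle P_{i}}$ (the values below are arbitrary):

```python
import math

# Concavity of log(1 + x): the average of the logs is at most the log of
# the average, which is exactly the Jensen step used above.
P_list = [0.5, 1.0, 2.0, 4.5]   # hypothetical per-position powers P_i
N = 1.0
lhs = sum(0.5 * math.log2(1 + Pi / N) for Pi in P_list) / len(P_list)
rhs = 0.5 * math.log2(1 + sum(Pi / N for Pi in P_list) / len(P_list))
print(lhs <= rhs)  # True
```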

Because each codeword individually satisfies the power constraint, the average also satisfies the power constraint. Therefore,

${\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}{\frac {P_{i}}{N}}\leq {\frac {P}{N}},\,\!}$

which we may apply to simplify the inequality above and get:

${\displaystyle {\frac {1}{2}}\log \left(1+{\frac {1}{n}}\sum _{i=1}^{n}{\frac {P_{i}}{N}}\right)\leq {\frac {1}{2}}\log \left(1+{\frac {P}{N}}\right)\,\!}$

Therefore, it must be that ${\displaystyle R\leq {\frac {1}{2}}\log \left(1+{\frac {P}{N}}\right)+\epsilon _{n}}$. As ${\displaystyle \epsilon _{n}\rightarrow 0}$, R must be less than a value arbitrarily close to the capacity derived earlier.

## Effects in time domain

In serial data communications, the AWGN mathematical model is used to model the timing error caused by random jitter (RJ).

The graph to the right shows an example of timing errors associated with AWGN. The variable Δt represents the uncertainty in the zero crossing. As the amplitude of the AWGN is increased, the signal-to-noise ratio decreases. This results in increased uncertainty Δt. [1]

When affected by AWGN, the average number of either positive-going or negative-going zero crossings per second at the output of a narrow bandpass filter when the input is a sine wave is

${\displaystyle {\frac {\text{positive zero crossings}}{\text{second}}}={\frac {\text{negative zero crossings}}{\text{second}}}}$
${\displaystyle \quad =f_{0}{\sqrt {\frac {{\text{SNR}}+1+{\frac {B^{2}}{12f_{0}^{2}}}}{{\text{SNR}}+1}}},}$

where

f0 = the center frequency of the filter,
B = the filter bandwidth,
SNR = the signal-to-noise power ratio in linear terms.
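The formula above translates directly into code; the sketch below (function name and parameter values chosen here for illustration) also shows its limiting behavior:

```python
import math

def zero_crossing_rate(f0, B, snr):
    """Positive-going (equivalently negative-going) zero crossings per second
    at the output of a narrow bandpass filter driven by sine plus AWGN."""
    return f0 * math.sqrt((snr + 1 + B**2 / (12 * f0**2)) / (snr + 1))

# With negligible noise (very large SNR) the rate approaches the center
# frequency f0; at low SNR the bandwidth term raises the rate slightly.
f0, B = 1.0e6, 1.0e5
print(round(zero_crossing_rate(f0, B, 1e9)))  # -> 1000000
print(zero_crossing_rate(f0, B, 1.0) > f0)    # True
```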

## Effects in phasor domain

In modern communication systems, bandlimited AWGN cannot be ignored. When modeling bandlimited AWGN in the phasor domain, statistical analysis reveals that the amplitudes of the real and imaginary contributions are independent variables which follow the Gaussian distribution model. When combined, the resultant phasor's magnitude is a Rayleigh-distributed random variable, while the phase is uniformly distributed from 0 to 2π.

The graph to the right shows an example of how bandlimited AWGN can affect a coherent carrier signal. The instantaneous response of the noise vector cannot be precisely predicted, however, its time-averaged response can be statistically predicted. As shown in the graph, we confidently predict that the noise phasor will reside about 38% of the time inside the 1σ circle, about 86% of the time inside the 2σ circle, and about 98% of the time inside the 3σ circle. [1]
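These containment probabilities follow from the Rayleigh distribution of the phasor magnitude, for which ${\displaystyle P(|{\text{noise}}|\leq k\sigma )=1-e^{-k^{2}/2}}$. The Monte Carlo sketch below (sample count and seed chosen here for illustration) reproduces the 1σ, 2σ and 3σ figures:

```python
import math
import random

# Real and imaginary noise components are independent N(0, sigma^2), so the
# magnitude is Rayleigh distributed: P(mag <= k*sigma) = 1 - exp(-k^2/2).
random.seed(1)
sigma, trials = 1.0, 200_000
inside = [0, 0, 0]
for _ in range(trials):
    re = random.gauss(0.0, sigma)   # in-phase (real) noise component
    im = random.gauss(0.0, sigma)   # quadrature (imaginary) component
    mag = math.hypot(re, im)
    for k in (1, 2, 3):
        if mag <= k * sigma:
            inside[k - 1] += 1

fractions = [c / trials for c in inside]
exact = [1 - math.exp(-k * k / 2) for k in (1, 2, 3)]
print(fractions)                     # roughly [0.39, 0.86, 0.99]
print([round(e, 2) for e in exact])  # [0.39, 0.86, 0.99]
```

The exact values 0.3935, 0.8647 and 0.9889 match the "about 38%", "about 86%" and "about 98%" figures quoted above.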
