Binary symmetric channel

Last updated January 02, 2025

A binary symmetric channel (or BSC_p) is a common communications channel model used in coding theory and information theory. In this model, a transmitter wishes to send a bit (a zero or a one), and the receiver will receive a bit. The bit will be "flipped" with a "crossover probability" of p, and otherwise is received correctly. This model can be applied to varied communication channels such as telephone lines or disk drive storage.

The noisy-channel coding theorem applies to BSC_p, saying that information can be transmitted at any rate up to the channel capacity with arbitrarily low error. The channel capacity is $1-\operatorname {H} _{\text{b}}(p)$ bits, where $\operatorname {H} _{\text{b}}$ is the binary entropy function. Codes including Forney's code have been designed to transmit information efficiently across the channel.

Definition

A binary symmetric channel with crossover probability $p$ , denoted by BSC_p, is a channel with binary input and binary output and probability of error $p$ . That is, if $X$ is the transmitted random variable and $Y$ the received variable, then the channel is characterized by the conditional probabilities:^[1]

{\begin{aligned}\operatorname {Pr} [Y=0|X=0]&=1-p\\\operatorname {Pr} [Y=0|X=1]&=p\\\operatorname {Pr} [Y=1|X=0]&=p\\\operatorname {Pr} [Y=1|X=1]&=1-p\end{aligned}}

It is assumed that $0\leq p\leq 1/2$ . If $p>1/2$ , then the receiver can swap the output (interpret 1 when it sees 0, and vice versa) and obtain an equivalent channel with crossover probability $1-p\leq 1/2$ .
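The definition can be illustrated with a short simulation. The following is a minimal sketch, not a reference implementation: the function name `bsc`, the seed, and the sample size are illustrative choices, and the noise is drawn with Python's standard pseudo-random generator.

```python
import random

def bsc(bits, p, rng=random):
    """Pass a sequence of 0/1 bits through a BSC with crossover probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Each noise bit is 1 with probability p, independently of the others.
    noise = [1 if rng.random() < p else 0 for _ in bits]
    # Flipping a bit is addition modulo 2, i.e. XOR with the noise bit.
    return [b ^ e for b, e in zip(bits, noise)]

rng = random.Random(0)  # fixed seed, only so that runs are repeatable
sent = [rng.randint(0, 1) for _ in range(100_000)]
received = bsc(sent, 0.1, rng)
flip_rate = sum(s != r for s, r in zip(sent, received)) / len(sent)
print(f"empirical crossover rate: {flip_rate:.4f}")
```

With a large number of transmitted bits, the empirical fraction of flipped bits should be close to the chosen $p$.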

Capacity

The channel capacity of the binary symmetric channel, in bits, is:^[2]

\ C_{\text{BSC}}=1-\operatorname {H} _{\text{b}}(p),

where $\operatorname {H} _{\text{b}}(p)$ is the binary entropy function, defined by:^[2]

\operatorname {H} _{\text{b}}(x)=x\log _{2}{\frac {1}{x}}+(1-x)\log _{2}{\frac {1}{1-x}}
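As a numerical illustration, the two formulas above can be evaluated directly. This is a small sketch; the function names are illustrative, and the endpoints use the usual convention that $0\log _{2}(1/0)=0$.

```python
import math

def binary_entropy(x):
    """H_b(x) in bits, with the convention 0 * log2(1/0) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))

def bsc_capacity(p):
    """Capacity of BSC_p in bits per channel use."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.11, 0.5):
    print(f"p = {p:<4}  capacity = {bsc_capacity(p):.4f} bits per channel use")
```

The capacity is 1 bit for a noiseless channel ($p=0$) and drops to 0 at $p=1/2$, where the output is independent of the input.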

Proof^[3]

The capacity is defined as the maximum mutual information between input and output over all possible input distributions $p_{X}(x)$:

C=\max _{p_{X}(x)}\left\{\,I(X;Y)\,\right\}

The mutual information can be reformulated as

{\begin{aligned}I(X;Y)&=H(Y)-H(Y|X)\\&=H(Y)-\sum _{x\in \{0,1\}}{p_{X}(x)H(Y|X=x)}\\&=H(Y)-\sum _{x\in \{0,1\}}{p_{X}(x)}\operatorname {H} _{\text{b}}(p)\\&=H(Y)-\operatorname {H} _{\text{b}}(p),\end{aligned}}

where the first and second steps follow from the definitions of mutual information and conditional entropy, respectively. For a fixed input symbol $x$, the entropy at the output, $H(Y|X=x)$, equals the binary entropy function $\operatorname {H} _{\text{b}}(p)$, which gives the third line. The fourth line follows because the input probabilities $p_{X}(x)$ sum to 1.

In the last line, only the first term $H(Y)$ depends on the input distribution $p_{X}(x)$ . The entropy of a binary variable is at most 1 bit, and equality is attained if its probability distribution is uniform. It therefore suffices to exhibit an input distribution that yields a uniform probability distribution for the output $Y$ . For this, note that it is a property of any binary symmetric channel that a uniform probability distribution of the input results in a uniform probability distribution of the output. Hence the value $H(Y)$ will be 1 when we choose a uniform distribution for $p_{X}(x)$ . We conclude that the channel capacity for our binary symmetric channel is $C_{\text{BSC}}=1-\operatorname {H} _{\text{b}}(p)$ .
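The maximization in the proof can also be checked numerically. The sketch below assumes an input distribution with $\Pr[X=1]=q$, evaluates $I(X;Y)=H(Y)-\operatorname {H} _{\text{b}}(p)$ on a grid of values of $q$, and reports the maximizer; the grid resolution and the value $p=0.1$ are arbitrary illustrative choices.

```python
import math

def h(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def mutual_information(q, p):
    """I(X;Y) in bits for a BSC with crossover p and input Pr[X = 1] = q."""
    # Y = 1 happens when X = 1 is kept or X = 0 is flipped.
    pr_y1 = q * (1 - p) + (1 - q) * p
    return h(pr_y1) - h(p)

p = 0.1
grid = [i / 100 for i in range(101)]
best_q = max(grid, key=lambda q: mutual_information(q, p))
print(best_q, mutual_information(best_q, p), 1 - h(p))
```

On this grid the maximum is attained at the uniform input $q=1/2$, and its value matches $1-\operatorname {H} _{\text{b}}(p)$.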

Noisy-channel coding theorem

Shannon's noisy-channel coding theorem gives a result about the rate of information that can be transmitted through a communication channel with arbitrarily low error. We study the particular case of ${\text{BSC}}_{p}$ .

The noise $e$ that characterizes ${\text{BSC}}_{p}$ is a random variable consisting of n independent random bits (n is defined below) where each random bit is a $1$ with probability $p$ and a $0$ with probability $1-p$ . We indicate this by writing " $e\in {\text{BSC}}_{p}$ ".

Theorem — For all $p<{\tfrac {1}{2}},$ all $0<\epsilon <{\tfrac {1}{2}}-p$ , all sufficiently large $n$ (depending on $p$ and $\epsilon$ ), and all $k\leq \lfloor (1-H(p+\epsilon ))n\rfloor$ , there exist a constant $\delta >0$ (depending only on $p$ and $\epsilon$ ) and a pair of encoding and decoding functions $E:\{0,1\}^{k}\to \{0,1\}^{n}$ and $D:\{0,1\}^{n}\to \{0,1\}^{k}$ respectively, such that every message $m\in \{0,1\}^{k}$ has the following property:

\Pr _{e\in {\text{BSC}}_{p}}[D(E(m)+e)\neq m]\leq 2^{-{\delta }n}.

Informally, the theorem says that when a message is picked from $\{0,1\}^{k}$ , encoded with a suitable encoding function $E$ (found in the proof by choosing $E$ at random), and sent across a noisy ${\text{BSC}}_{p}$ , the original message is recovered by decoding with very high probability, provided that $k$ , and hence the rate $k/n$ of the code, is bounded by the quantity stated in the theorem. The decoding error probability is exponentially small in $n$ .
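To make the bound on $k$ concrete, the following sketch computes the largest message length permitted by the theorem for given $n$ , $p$ and $\epsilon$ . The function name and the sample parameters are illustrative assumptions.

```python
import math

def binary_entropy(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def max_message_length(n, p, eps):
    """Largest k allowed by the theorem: floor((1 - H(p + eps)) * n)."""
    if not (0 <= p < 0.5 and 0 < eps < 0.5 - p):
        raise ValueError("need p < 1/2 and 0 < eps < 1/2 - p")
    return math.floor((1 - binary_entropy(p + eps)) * n)

n, p, eps = 1000, 0.10, 0.01
k = max_message_length(n, p, eps)
print(k, k / n)
```

For these sample values, roughly half of the 1000 transmitted bits can carry message bits; the rest is redundancy.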

Proof

The theorem can be proved directly with a probabilistic method. Consider an encoding function $E:\{0,1\}^{k}\to \{0,1\}^{n}$ that is selected at random. This means that for each message $m\in \{0,1\}^{k}$ , the value $E(m)\in \{0,1\}^{n}$ is selected at random (with equal probabilities). For a given encoding function $E$ , the decoding function $D:\{0,1\}^{n}\to \{0,1\}^{k}$ is specified as follows: given any received codeword $y\in \{0,1\}^{n}$ , we find the message $m\in \{0,1\}^{k}$ such that the Hamming distance $\Delta (y,E(m))$ is as small as possible (with ties broken arbitrarily). ( $D$ is called a maximum likelihood decoding function.)
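The random encoding and nearest-codeword decoding described above can be sketched for toy parameters as follows. This is only an illustration under stated assumptions: the values of $k$ , $n$ and $p$ are far too small for the asymptotic statement to apply, the brute-force decoder takes time exponential in $k$ , and all function names are hypothetical.

```python
import itertools
import random

def random_code(k, n, rng):
    """Pick E at random: one independent uniform n-bit word per k-bit message."""
    messages = list(itertools.product((0, 1), repeat=k))
    return {m: tuple(rng.randint(0, 1) for _ in range(n)) for m in messages}

def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def decode(code, y):
    """Return a message whose codeword is closest to y (ties: first one found)."""
    return min(code, key=lambda m: hamming(code[m], y))

rng = random.Random(1)
k, n, p = 3, 15, 0.05
code = random_code(k, n, rng)
m = (1, 0, 1)
# Send E(m) through BSC_p: each bit is flipped independently with probability p.
y = tuple(b ^ (rng.random() < p) for b in code[m])
print("sent", m, "decoded", decode(code, y))
```

For $p<1/2$ , choosing the codeword at minimum Hamming distance from $y$ is the same as choosing the most likely transmitted codeword, which is why $D$ is a maximum likelihood decoder.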

The proof continues by showing that at least one such choice $(E,D)$ satisfies the conclusion of the theorem, by averaging over the random choice of $E$ . Suppose $p$ and $\epsilon$ are fixed. First we show that, for a fixed $m\in \{0,1\}^{k}$ and $E$ chosen randomly, the probability of failure over ${\text{BSC}}_{p}$ noise is exponentially small in $n$ . At this point, the proof works for a fixed message $m$ . Next we extend this result to work for all messages $m$ . We achieve this by eliminating half of the codewords from the code, arguing that the bound on the decoding error probability holds for at least half of the codewords. This step is called expurgation, which gives the whole process the name random coding with expurgation.

Continuation of proof (sketch)

Fix $p$ and $\epsilon$ . Given a fixed message $m\in \{0,1\}^{k}$ , we need to estimate the expected value, over the random choice of $E$ , of the probability that the received word (the transmitted codeword plus the noise) is not decoded back to $m$ . That is to say, we need to estimate:

\mathbb {E} _{E}\left[\Pr _{e\in {\text{BSC}}_{p}}[D(E(m)+e)\neq m]\right].

Let $y$ be the received codeword. In order for the decoded codeword $D(y)$ not to be equal to the message $m$ , one of the following events must occur:

$y$ does not lie within the Hamming ball of radius $(p+\epsilon )n$ centered at $E(m)$ . This condition is mainly used to make the calculations easier.
There is another message $m'\in \{0,1\}^{k}$ such that $\Delta (y,E(m'))\leqslant \Delta (y,E(m))$ . In other words, the errors due to noise take the transmitted codeword closer to another encoded message.

We can apply the Chernoff bound to bound the probability of the first event; we get:

\Pr _{e\in {\text{BSC}}_{p}}[\Delta (y,E(m))>(p+\epsilon )n]\leqslant 2^{-{\epsilon ^{2}}n}.

This is exponentially small for large $n$ (recall that $\epsilon$ is fixed).
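The tail bound above can be compared with the exact probability, since the number of flipped bits follows a binomial distribution with parameters $n$ and $p$ . The sketch below is a numerical sanity check for one arbitrary choice of parameters, not a proof; the function name is illustrative.

```python
import math

def binomial_upper_tail(n, p, threshold):
    """Exact Pr[Binomial(n, p) > threshold]."""
    start = math.floor(threshold) + 1
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(start, n + 1))

n, p, eps = 200, 0.1, 0.1
# Probability that more than (p + eps) * n of the n bits are flipped.
exact = binomial_upper_tail(n, p, (p + eps) * n)
bound = 2 ** (-(eps ** 2) * n)
print(f"exact tail = {exact:.3e}, bound 2^(-eps^2 n) = {bound:.3e}")
```

For these values the exact tail probability is far below the bound, which is typical: the Chernoff bound is convenient rather than tight.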

For the second event, we note that the probability that $E(m')\in B(y,(p+\epsilon )n)$ is ${\text{Vol}}(B(y,(p+\epsilon )n))/2^{n}$ where $B(x,r)$ is the Hamming ball of radius $r$ centered at vector $x$ and ${\text{Vol}}(B(x,r))$ is its volume. Using an approximation to estimate the number of words in the Hamming ball, we have ${\text{Vol}}(B(y,(p+\epsilon )n))\approx 2^{H(p)n}$ . Hence the above probability amounts to $2^{H(p)n}/2^{n}=2^{H(p)n-n}$ . Now using the union bound, we can upper bound the probability that such an $m'\in \{0,1\}^{k}$ exists by $2^{k+H(p)n-n}$ , which is $2^{-\Omega (n)}$ , as desired, by the choice of $k$ .
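The volume estimate rests on the standard fact that a Hamming ball of radius $\rho n$ with $\rho \leq 1/2$ contains at most $2^{H(\rho )n}$ words, and at least $2^{H(\rho )n}/(n+1)$ . For radius $(p+\epsilon )n$ this gives the exponent $H(p+\epsilon )$ that appears in the detailed proof, which is close to $H(p)$ when $\epsilon$ is small. A small sketch comparing the exact volume with the entropy estimate (names and sample values are illustrative):

```python
import math

def binary_entropy(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def ball_volume(n, r):
    """Number of n-bit words within Hamming distance r of a fixed centre."""
    return sum(math.comb(n, i) for i in range(0, min(r, n) + 1))

n = 1000
for frac in (0.1, 0.2, 0.3):
    r = round(frac * n)
    exponent = math.log2(ball_volume(n, r)) / n
    print(f"r/n = {frac}: log2(Vol)/n = {exponent:.4f}, H(r/n) = {binary_entropy(r / n):.4f}")
```

The two printed columns agree up to a term of order $(\log n)/n$ , which vanishes as $n$ grows.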

Continuation of proof (detailed)

From the above analysis, we calculate the probability that decoding the received word (the transmitted codeword plus the channel noise) does not return the original message. We first introduce some notation. Let $p(y|E(m))$ denote the probability of receiving the word $y$ given that codeword $E(m)$ was sent. Let $B_{0}$ denote $B(E(m),(p+\epsilon )n)$ .

{\begin{aligned}\Pr _{e\in {\text{BSC}}_{p}}[D(E(m)+e)\neq m]&=\sum _{y\in \{0,1\}^{n}}p(y|E(m))\cdot 1_{D(y)\neq m}\\&\leqslant \sum _{y\notin B_{0}}p(y|E(m))\cdot 1_{D(y)\neq m}+\sum _{y\in B_{0}}p(y|E(m))\cdot 1_{D(y)\neq m}\\&\leqslant 2^{-{\epsilon ^{2}}n}+\sum _{y\in B_{0}}p(y|E(m))\cdot 1_{D(y)\neq m}\end{aligned}}

We get the last inequality by our analysis using the Chernoff bound above. Now taking expectation on both sides we have,

{\begin{aligned}\mathbb {E} _{E}\left[\Pr _{e\in {\text{BSC}}_{p}}[D(E(m)+e)\neq m]\right]&\leqslant 2^{-{\epsilon ^{2}}n}+\sum _{y\in B_{0}}p(y|E(m))\mathbb {E} [1_{D(y)\neq m}]\\&\leqslant 2^{-{\epsilon ^{2}}n}+\sum _{y\in B_{0}}\mathbb {E} [1_{D(y)\neq m}]&&\sum _{y\in B_{0}}p(y|E(m))\leqslant 1\\&\leqslant 2^{-{\epsilon ^{2}}n}+2^{k+H(p+\epsilon )n-n}&&\mathbb {E} [1_{D(y)\neq m}]\leqslant 2^{k+H(p+\epsilon )n-n}{\text{ (see above)}}\\&\leqslant 2^{-\delta n}\end{aligned}}

by appropriately choosing the value of $\delta$ . Since the above bound holds for each message, we have

\mathbb {E} _{m}\left[\mathbb {E} _{E}\left[\Pr _{e\in {\text{BSC}}_{p}}\left[D(E(m)+e)\neq m\right]\right]\right]\leqslant 2^{-\delta n}.

Now we can change the order of summation in the expectation with respect to the message and the choice of the encoding function $E$ . Hence:

\mathbb {E} _{E}\left[\mathbb {E} _{m}\left[\Pr _{e\in {\text{BSC}}_{p}}\left[D(E(m)+e)\neq m\right]\right]\right]\leqslant 2^{-\delta n}.

Hence, by the probabilistic method, there exist some encoding function $E^{*}$ and a corresponding decoding function $D^{*}$ such that

\mathbb {E} _{m}\left[\Pr _{e\in {\text{BSC}}_{p}}\left[D^{*}(E^{*}(m)+e)\neq m\right]\right]\leqslant 2^{-\delta n}.

At this point, the bound holds only on average over the messages $m$ . But we need to make sure that it holds for all the messages $m$ simultaneously. For that, let us sort the $2^{k}$ messages by their decoding error probabilities. By applying Markov's inequality, we can show that the decoding error probability for the first $2^{k-1}$ messages is at most $2\cdot 2^{-\delta n}$ . Thus, to ensure that the bound holds for every message $m$ , we can simply trim off the last $2^{k-1}$ messages from the sorted order. This gives us another encoding function $E'$ with a corresponding decoding function $D'$ , with a decoding error probability of at most $2^{-\delta n+1}$ and essentially the same rate ( $k-1$ message bits instead of $k$ ). Taking $\delta '$ to be equal to $\delta -{\tfrac {1}{n}}$ , we bound the decoding error probability by $2^{-\delta 'n}$ . This expurgation process completes the proof.
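The expurgation step can be illustrated on a list of per-message error probabilities. In the sketch below the probabilities are made-up numbers drawn at random purely for illustration; the point is only that after discarding the worse half, every remaining message has error probability at most twice the original average.

```python
import random

def expurgate(error_probs):
    """Return the indices of the half of the messages with the smallest error."""
    order = sorted(range(len(error_probs)), key=lambda i: error_probs[i])
    return order[: len(order) // 2]

rng = random.Random(2)
# Hypothetical per-message decoding error probabilities for 2^k = 16 messages.
errs = [rng.random() * 1e-3 for _ in range(16)]
average = sum(errs) / len(errs)
kept = expurgate(errs)
worst_kept = max(errs[i] for i in kept)
print(f"average = {average:.2e}, worst kept = {worst_kept:.2e}, 2 * average = {2 * average:.2e}")
```

If the worst kept message had error above twice the average, then more than half of all messages would exceed twice the average, which Markov's inequality rules out.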

Converse of Shannon's capacity theorem

The converse of the capacity theorem essentially states that $1-H(p)$ is the best rate one can achieve over a binary symmetric channel. Formally the theorem states:

Theorem — If $k\geq \lceil (1-H(p+\epsilon ))n\rceil$ then the following is true for every encoding and decoding function $E:\{0,1\}^{k}\rightarrow \{0,1\}^{n}$ and $D:\{0,1\}^{n}\rightarrow \{0,1\}^{k}$ respectively: $\Pr _{e\in {\text{BSC}}_{p}}[D(E(m)+e)\neq m]\geq {\frac {1}{2}}$ .

The intuition behind the proof is that the number of errors grows rapidly as the rate grows beyond the channel capacity. The idea is that the sender generates messages of dimension $k$ , while the channel ${\text{BSC}}_{p}$ introduces transmission errors. For a code of block length $n$ , the number of error patterns the channel typically introduces is about $2^{H(p+\epsilon )n}$ . The maximum number of messages is $2^{k}$ . The output of the channel, on the other hand, has $2^{n}$ possible values. If there is any confusion between any two messages, it is likely that $2^{k}2^{H(p+\epsilon )n}\geq 2^{n}$ . Hence we would have $k\geq \lceil (1-H(p+\epsilon ))n\rceil$ , a case we would like to avoid to keep the decoding error probability exponentially small.

Codes

Considerable work has been done, and continues to be done, on designing explicit error-correcting codes that achieve the capacities of several standard communication channels. The motivation behind designing such codes is to relate the rate of the code to the fraction of errors that it can correct.

The approach behind the design of codes which meet the channel capacities of ${\text{BSC}}$ or the binary erasure channel ${\text{BEC}}$ has been to correct a smaller number of errors with high probability, and to achieve the highest possible rate. Shannon's theorem gives us the best rate which could be achieved over a ${\text{BSC}}_{p}$ , but it does not give us any explicit codes which achieve that rate. In fact such codes are typically constructed to correct only a small fraction of errors with high probability, but achieve a very good rate. The first such code was due to George D. Forney in 1966. It is a concatenated code, built by concatenating two different kinds of codes.

Forney's code

Forney constructed a concatenated code $C^{*}=C_{\text{out}}\circ C_{\text{in}}$ to achieve the capacity of the noisy-channel coding theorem for ${\text{BSC}}_{p}$ . In his code,

The outer code $C_{\text{out}}$ is a code of block length $N$ and rate $1-{\frac {\epsilon }{2}}$ over the field $F_{2^{k}}$ , and $k=O(\log N)$ . Additionally, we have a decoding algorithm $D_{\text{out}}$ for $C_{\text{out}}$ which can correct up to $\gamma$ fraction of worst case errors and runs in $t_{\text{out}}(N)$ time.
The inner code $C_{\text{in}}$ is a code of block length $n$ , dimension $k$ , and a rate of $1-H(p)-{\frac {\epsilon }{2}}$ . Additionally, we have a decoding algorithm $D_{\text{in}}$ for $C_{\text{in}}$ with a decoding error probability of at most ${\frac {\gamma }{2}}$ over ${\text{BSC}}_{p}$ and runs in $t_{\text{in}}(k)$ time.

For the outer code $C_{\text{out}}$ , a Reed-Solomon code would have been the first code to come to mind. However, we would see that the construction of such a code cannot be done in polynomial time. This is why a binary linear code is used for $C_{\text{out}}$ .

For the inner code $C_{\text{in}}$ we find a linear code by exhaustively searching among the linear codes of block length $n$ and dimension $k$ for one whose rate meets the capacity of ${\text{BSC}}_{p}$ ; such a code exists by the noisy-channel coding theorem.

The rate $R(C^{*})=R(C_{\text{in}})\times R(C_{\text{out}})=(1-{\frac {\epsilon }{2}})(1-H(p)-{\frac {\epsilon }{2}})\geq 1-H(p)-\epsilon$ which almost meets the ${\text{BSC}}_{p}$ capacity. We further note that the encoding and decoding of $C^{*}$ can be done in polynomial time with respect to $N$ . As a matter of fact, encoding $C^{*}$ takes time $O(N^{2})+O(Nk^{2})=O(N^{2})$ . Further, the decoding algorithm described takes time $Nt_{\text{in}}(k)+t_{\text{out}}(N)=N^{O(1)}$ as long as $t_{\text{out}}(N)=N^{O(1)}$ ; and $t_{\text{in}}(k)=2^{O(k)}$ .
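The rate inequality can be checked numerically for sample parameters. This is a sketch with illustrative function names and values; it only evaluates the two sides of the inequality stated above.

```python
import math

def binary_entropy(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def concatenated_rate(p, eps):
    """Rate of C* as the product of the outer and inner rates."""
    r_out = 1 - eps / 2
    r_in = 1 - binary_entropy(p) - eps / 2
    return r_in * r_out

p, eps = 0.1, 0.05
rate = concatenated_rate(p, eps)
target = 1 - binary_entropy(p) - eps
print(f"R(C*) = {rate:.4f} >= 1 - H(p) - eps = {target:.4f}")
```

The gap between the two values shrinks with $\epsilon$ , so the concatenated code approaches capacity as $\epsilon \to 0$ .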

Decoding error probability

A natural decoding algorithm for $C^{*}$ is to:

Compute $y_{i}^{\prime }=D_{\text{in}}(y_{i})$ for each $i\in \{1,\ldots ,N\}$
Execute $D_{\text{out}}$ on $y^{\prime }=(y_{1}^{\prime }\ldots y_{N}^{\prime })$
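The two steps above can be sketched as a generic two-stage decoder that takes the inner and outer decoders as parameters. The component decoders used here are toy stand-ins chosen only to make the example runnable (a length-3 repetition code with majority voting, and an outer "decoder" that corrects nothing); they are not the codes used in Forney's construction.

```python
def decode_concatenated(received_blocks, decode_inner, decode_outer):
    """Step 1: decode every inner block; step 2: run the outer decoder once."""
    outer_symbols = [decode_inner(block) for block in received_blocks]
    return decode_outer(outer_symbols)

# Toy stand-ins (NOT Forney's actual component codes).
def majority_inner(block):
    """Inner decoder for a length-3 repetition code: majority vote."""
    return 1 if sum(block) >= 2 else 0

def identity_outer(symbols):
    """Placeholder outer decoder that corrects nothing."""
    return list(symbols)

received = [(1, 1, 0), (0, 0, 0), (1, 0, 1), (0, 1, 0)]
print(decode_concatenated(received, majority_inner, identity_outer))
```

In the real construction, the outer decoder would then correct the inner blocks that were decoded wrongly, as long as they make up at most a $\gamma$ fraction of the $N$ blocks.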

Note that each block of code for $C_{\text{in}}$ is considered a symbol for $C_{\text{out}}$ . Since the probability of error at any index $i$ for $D_{\text{in}}$ is at most ${\tfrac {\gamma }{2}}$ and the errors in ${\text{BSC}}_{p}$ are independent, the expected number of errors for $D_{\text{in}}$ is at most ${\tfrac {\gamma N}{2}}$ by linearity of expectation. Applying the Chernoff bound, the probability that more than $\gamma N$ errors occur is at most $e^{\frac {-\gamma N}{6}}$ . Since the outer code $C_{\text{out}}$ can correct up to $\gamma N$ errors, this bounds the decoding error probability of $C^{*}$ . Expressed in asymptotic terms, this gives an error probability of $2^{-\Omega (\gamma N)}$ . Thus the achieved decoding error probability of $C^{*}$ is exponentially small, as in the noisy-channel coding theorem.

We have given a general technique to construct $C^{*}$ . For more detailed descriptions of $C_{\text{in}}$ and $C_{\text{out}}$ , see the references below. More recently, a few other codes have also been constructed to achieve the capacity. LDPC codes have been considered for this purpose because of their faster decoding time.^[4]

Applications

The binary symmetric channel can model a disk drive used for memory storage: the channel input represents a bit being written to the disk and the output corresponds to the bit later being read. Error could arise from the magnetization flipping, background noise or the writing head making an error. Other objects which the binary symmetric channel can model include a telephone or radio communication line or cell division, from which the daughter cells contain DNA information from their parent cell.^[5]

This channel is often used by theorists because it is one of the simplest noisy channels to analyze. Many problems in communication theory can be reduced to a BSC. Conversely, being able to transmit effectively over the BSC can give rise to solutions for more complicated channels.

Notes

[1] MacKay (2003), p. 4.
[2] MacKay (2003), p. 15.
[3] Cover & Thomas (1991), p. 187.
[4] Richardson and Urbanke.
[5] MacKay (2003), pp. 3–4.

References

Cover, Thomas M.; Thomas, Joy A. (1991). Elements of Information Theory. Hoboken, New Jersey: Wiley. ISBN 978-0-471-24195-9.
G. David Forney. Concatenated Codes. MIT Press, Cambridge, MA, 1966.
Venkat Guruswami's course on Error-Correcting Codes: Constructions and Algorithms, Autumn 2006.
MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1.
Atri Rudra's course on Error Correcting Codes: Combinatorics, Algorithms, and Applications (Fall 2007), Lectures 9, 10, 29, and 30.
Madhu Sudan's course on Algorithmic Introduction to Coding Theory (Fall 2001), Lecture 1 and 2.
C. E. Shannon. A Mathematical Theory of Communication. ACM SIGMOBILE Mobile Computing and Communications Review.
Tom Richardson and Rudiger Urbanke. Modern Coding Theory. Cambridge University Press.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
