Channel capacity

Last updated

Channel capacity, in electrical engineering, computer science, and information theory, is the theoretical maximum rate at which information can be reliably transmitted over a communication channel.

Contents

Following the terms of the noisy-channel coding theorem, the channel capacity of a given channel is the highest information rate (in units of information per unit time) that can be achieved with arbitrarily small error probability. [1] [2]

Information theory, developed by Claude E. Shannon in 1948, defines the notion of channel capacity and provides a mathematical model by which it may be computed. The key result states that the capacity of the channel, as defined above, is given by the maximum of the mutual information between the input and output of the channel, where the maximization is with respect to the input distribution. [3]

The notion of channel capacity has been central to the development of modern wireline and wireless communication systems, with the advent of novel error correction coding mechanisms that have resulted in achieving performance very close to the limits promised by channel capacity.

Formal definition

The basic mathematical model for a communication system is the following:

where:

Let and be modeled as random variables. Furthermore, let be the conditional probability distribution function of given , which is an inherent fixed property of the communication channel. Then the choice of the marginal distribution completely determines the joint distribution due to the identity

which, in turn, induces a mutual information . The channel capacity is defined as

where the supremum is taken over all possible choices of .

Additivity of channel capacity

Channel capacity is additive over independent channels. [4] It means that using two independent channels in a combined manner provides the same theoretical capacity as using them independently. More formally, let and be two independent channels modelled as above; having an input alphabet and an output alphabet . Idem for . We define the product channel as

This theorem states:

Proof

We first show that .

Let and be two independent random variables. Let be a random variable corresponding to the output of through the channel , and for through .

By definition .

Since and are independent, as well as and , is independent of . We can apply the following property of mutual information:

For now we only need to find a distribution such that . In fact, and , two probability distributions for and achieving and , suffice:

ie.

Now let us show that .

Let be some distribution for the channel defining and the corresponding output . Let be the alphabet of , for , and analogously and .

By definition of mutual information, we have

Let us rewrite the last term of entropy.

By definition of the product channel, . For a given pair , we can rewrite as:

By summing this equality over all , we obtain .

We can now give an upper bound over mutual information:

This relation is preserved at the supremum. Therefore

Combining the two inequalities we proved, we obtain the result of the theorem:

Shannon capacity of a graph

If G is an undirected graph, it can be used to define a communications channel in which the symbols are the graph vertices, and two codewords may be confused with each other if their symbols in each position are equal or adjacent. The computational complexity of finding the Shannon capacity of such a channel remains open, but it can be upper bounded by another important graph invariant, the Lovász number. [5]

Noisy-channel coding theorem

The noisy-channel coding theorem states that for any error probability ε > 0 and for any transmission rate R less than the channel capacity C, there is an encoding and decoding scheme transmitting data at rate R whose error probability is less than ε, for a sufficiently large block length. Also, for any rate greater than the channel capacity, the probability of error at the receiver goes to 0.5 as the block length goes to infinity.

Example application

An application of the channel capacity concept to an additive white Gaussian noise (AWGN) channel with B Hz bandwidth and signal-to-noise ratio S/N is the Shannon–Hartley theorem:

C is measured in bits per second if the logarithm is taken in base 2, or nats per second if the natural logarithm is used, assuming B is in hertz; the signal and noise powers S and N are expressed in a linear power unit (like watts or volts2). Since S/N figures are often cited in dB, a conversion may be needed. For example, a signal-to-noise ratio of 30 dB corresponds to a linear power ratio of .

Channel capacity in wireless communications

This section [6] focuses on the single-antenna, point-to-point scenario. For channel capacity in systems with multiple antennas, see the article on MIMO.

Bandlimited AWGN channel

AWGN channel capacity with the power-limited regime and bandwidth-limited regime indicated. Here,
P
-
N
0
=
1
{\displaystyle {\frac {\bar {P}}{N_{0}}}=1}
; B and C can be scaled proportionally for other values. Channel Capacity with Power- and Bandwidth-Limited Regimes.png
AWGN channel capacity with the power-limited regime and bandwidth-limited regime indicated. Here, ; B and C can be scaled proportionally for other values.

If the average received power is [W], the total bandwidth is in Hertz, and the noise power spectral density is [W/Hz], the AWGN channel capacity is

[bits/s],

where is the received signal-to-noise ratio (SNR). This result is known as the Shannon–Hartley theorem. [7]

When the SNR is large (SNR ≫ 0 dB), the capacity is logarithmic in power and approximately linear in bandwidth. This is called the bandwidth-limited regime.

When the SNR is small (SNR ≪ 0 dB), the capacity is linear in power but insensitive to bandwidth. This is called the power-limited regime.

The bandwidth-limited regime and power-limited regime are illustrated in the figure.

Frequency-selective AWGN channel

The capacity of the frequency-selective channel is given by so-called water filling power allocation,

where and is the gain of subchannel , with chosen to meet the power constraint.

Slow-fading channel

In a slow-fading channel, where the coherence time is greater than the latency requirement, there is no definite capacity as the maximum rate of reliable communications supported by the channel, , depends on the random channel gain , which is unknown to the transmitter. If the transmitter encodes data at rate [bits/s/Hz], there is a non-zero probability that the decoding error probability cannot be made arbitrarily small,

,

in which case the system is said to be in outage. With a non-zero probability that the channel is in deep fade, the capacity of the slow-fading channel in strict sense is zero. However, it is possible to determine the largest value of such that the outage probability is less than . This value is known as the -outage capacity.

Fast-fading channel

In a fast-fading channel, where the latency requirement is greater than the coherence time and the codeword length spans many coherence periods, one can average over many independent channel fades by coding over a large number of coherence time intervals. Thus, it is possible to achieve a reliable rate of communication of [bits/s/Hz] and it is meaningful to speak of this value as the capacity of the fast-fading channel.

Feedback Capacity

Feedback capacity is the greatest rate at which information can be reliably transmitted, per unit time, over a point-to-point communication channel in which the receiver feeds back the channel outputs to the transmitter. Information-theoretic analysis of communication systems that incorporate feedback is more complicated and challenging than without feedback. Possibly, this was the reason C.E. Shannon chose feedback as the subject of the first Shannon Lecture, delivered at the 1973 IEEE International Symposium on Information Theory in Ashkelon, Israel.

The feedback capacity is characterized by the maximum of the directed information between the channel inputs and the channel outputs, where the maximization is with respect to the causal conditioning of the input given the output. The directed information was coined by James Massey [8] in 1990, who showed that its an upper bound on feedback capacity. For memoryless channels, Shannon showed [9] that feedback does not increase the capacity, and the feedback capacity coincides with the channel capacity characterized by the mutual information between the input and the output. The feedback capacity is known as a closed-form expression only for several examples such as: the Trapdoor channel, [10] Ising channel, [11] [12] the binary erasure channel with a no-consecutive-ones input constraint, NOST channels.

The basic mathematical model for a communication system is the following:

Communication with feedback Communication with feedback.png
Communication with feedback

Here is the formal definition of each element (where the only difference with respect to the nonfeedback capacity is the encoder definition):

That is, for each time there exists a feedback of the previous output such that the encoder has access to all previous outputs . An code is a pair of encoding and decoding mappings with , and is uniformly distributed. A rate is said to be achievable if there exists a sequence of codes such that the average probability of error: tends to zero as .

The feedback capacity is denoted by , and is defined as the supremum over all achievable rates.

Main results on feedback capacity

Let and be modeled as random variables. The causal conditioning describes the given channel. The choice of the causally conditional distribution determines the joint distribution due to the chain rule for causal conditioning [13] which, in turn, induces a directed information .

The feedback capacity is given by

,

where the supremum is taken over all possible choices of .

Gaussian feedback capacity

When the Gaussian noise is colored, the channel has memory. Consider for instance the simple case on an autoregressive model noise process where is an i.i.d. process.

Solution techniques

The feedback capacity is difficult to solve in the general case. There are some techniques that are related to control theory and Markov decision processes if the channel is discrete.

See also

Advanced Communication Topics

Related Research Articles

Information theory is the mathematical study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. The field, in applied mathematics, is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering.

<span class="mw-page-title-main">Entropy (information theory)</span> Expected amount of information needed to specify the output of a stochastic data source

In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable , which takes values in the alphabet and is distributed according to :

<span class="mw-page-title-main">Random variable</span> Variable representing a random phenomenon

A random variable is a mathematical formalization of a quantity or object which depends on random events. The term 'random variable' can be misleading as its mathematical definition is not actually random nor a variable, but rather it is a function from possible outcomes in a sample space to a measurable space, often to the real numbers.

<span class="mw-page-title-main">Bernoulli process</span> Random process of binary (boolean) random variables

In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. The component Bernoulli variablesXi are identically distributed and independent. Prosaically, a Bernoulli process is a repeated coin flipping, possibly with an unfair coin. Every variable Xi in the sequence is associated with a Bernoulli trial or experiment. They all have the same Bernoulli distribution. Much of what can be said about the Bernoulli process can also be generalized to more than two outcomes ; this generalization is known as the Bernoulli scheme.

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

Additive white Gaussian noise (AWGN) is a basic noise model used in information theory to mimic the effect of many random processes that occur in nature. The modifiers denote specific characteristics:

In information theory, the typical set is a set of sequences whose probability is close to two raised to the negative power of the entropy of their source distribution. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP) which is a kind of law of large numbers. The notion of typicality is only concerned with the probability of a sequence and not the actual sequence itself.

<span class="mw-page-title-main">Vapnik–Chervonenkis theory</span> Branch of statistical computational learning theory

Vapnik–Chervonenkis theory was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.

<span class="mw-page-title-main">Mutual information</span> Measure of dependence between two variables

In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" obtained about one random variable by observing the other random variable. The concept of mutual information is intimately linked to that of entropy of a random variable, a fundamental notion in information theory that quantifies the expected "amount of information" held in a random variable.

In information theory, the cross-entropy between two probability distributions and over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution , rather than the true distribution .

In information theory, the noisy-channel coding theorem, establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data nearly error-free up to a computable maximum rate through the channel. This result was presented by Claude Shannon in 1948 and was based in part on earlier work and ideas of Harry Nyquist and Ralph Hartley.

In information theory, information dimension is an information measure for random vectors in Euclidean space, based on the normalized entropy of finely quantized versions of the random vectors. This concept was first introduced by Alfréd Rényi in 1959.

<span class="mw-page-title-main">Z-channel (information theory)</span>

In coding theory and information theory, a Z-channel or binary asymmetric channel is a communications channel used to model the behaviour of some data storage systems.

In coding theory, list decoding is an alternative to unique decoding of error-correcting codes for large error rates. The notion was proposed by Elias in the 1950s. The main idea behind list decoding is that the decoding algorithm instead of outputting a single possible message outputs a list of possibilities one of which is correct. This allows for handling a greater number of errors than that allowed by unique decoding.

Zero-forcing precoding is a method of spatial signal processing by which a multiple antenna transmitter can null the multiuser interference in a multi-user MIMO wireless communication system. When the channel state information is perfectly known at the transmitter, the zero-forcing precoder is given by the pseudo-inverse of the channel matrix. Zero-forcing has been used in LTE mobile networks.

In the theory of quantum communication, the quantum capacity is the highest rate at which quantum information can be communicated over many independent uses of a noisy quantum channel from a sender to a receiver. It is also equal to the highest rate at which entanglement can be generated over the channel, and forward classical communication cannot improve it. The quantum capacity theorem is important for the theory of quantum error correction, and more broadly for the theory of quantum computation. The theorem giving a lower bound on the quantum capacity of any channel is colloquially known as the LSD theorem, after the authors Lloyd, Shor, and Devetak who proved it with increasing standards of rigor.

The term Blahut–Arimoto algorithm is often used to refer to a class of algorithms for computing numerically either the information theoretic capacity of a channel, the rate-distortion function of a source or a source encoding. They are iterative algorithms that eventually converge to one of the maxima of the optimization problem that is associated with these information theoretic concepts.

Fuzzy extractors are a method that allows biometric data to be used as inputs to standard cryptographic techniques, to enhance computer security. "Fuzzy", in this context, refers to the fact that the fixed values required for cryptography will be extracted from values close to but not identical to the original key, without compromising the security required. One application is to encrypt and authenticate users records, using the biometric inputs of the user as a key.

In machine learning, the kernel embedding of distributions comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space on which a sensible kernel function may be defined. For example, various kernels have been proposed for learning from data which are: vectors in , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song , Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in.

A Stein discrepancy is a statistical divergence between two probability measures that is rooted in Stein's method. It was first formulated as a tool to assess the quality of Markov chain Monte Carlo samplers, but has since been used in diverse settings in statistics, machine learning and computer science.

References

  1. Saleem Bhatti. "Channel capacity". Lecture notes for M.Sc. Data Communication Networks and Distributed Systems D51 -- Basic Communications and Networks. Archived from the original on 2007-08-21.
  2. Jim Lesurf. "Signals look like noise!". Information and Measurement, 2nd ed.
  3. Thomas M. Cover, Joy A. Thomas (2006). Elements of Information Theory. John Wiley & Sons, New York. ISBN   9781118585771.
  4. Cover, Thomas M.; Thomas, Joy A. (2006). "Chapter 7: Channel Capacity". Elements of Information Theory (Second ed.). Wiley-Interscience. pp. 206–207. ISBN   978-0-471-24195-9.
  5. Lovász, László (1979), "On the Shannon Capacity of a Graph", IEEE Transactions on Information Theory , IT-25 (1): 1–7, doi:10.1109/tit.1979.1055985 .
  6. David Tse, Pramod Viswanath (2005), Fundamentals of Wireless Communication, Cambridge University Press, UK, ISBN   9780521845274
  7. The Handbook of Electrical Engineering. Research & Education Association. 1996. p. D-149. ISBN   9780878919819.
  8. Massey, James (Nov 1990). "Causality, Feedback and Directed Information" (PDF). Proc. 1990 Int. Symp. On Information Theory and Its Applications (ISITA-90), Waikiki, HI.: 303–305.
  9. Shannon, C. (September 1956). "The zero error capacity of a noisy channel". IEEE Transactions on Information Theory. 2 (3): 8–19. doi:10.1109/TIT.1956.1056798.
  10. Permuter, Haim; Cuff, Paul; Van Roy, Benjamin; Weissman, Tsachy (July 2008). "Capacity of the trapdoor channel with feedback" (PDF). IEEE Trans. Inf. Theory. 54 (7): 3150–3165. arXiv: cs/0610047 . doi:10.1109/TIT.2008.924681. S2CID   1265.
  11. Elishco, Ohad; Permuter, Haim (September 2014). "Capacity and Coding for the Ising Channel With Feedback". IEEE Transactions on Information Theory. 60 (9): 5138–5149. arXiv: 1205.4674 . doi:10.1109/TIT.2014.2331951. S2CID   9761759.
  12. Aharoni, Ziv; Sabag, Oron; Permuter, Haim H. (September 2022). "Feedback Capacity of Ising Channels With Large Alphabet via Reinforcement Learning". IEEE Transactions on Information Theory. 68 (9): 5637–5656. doi:10.1109/TIT.2022.3168729. S2CID   248306743.
  13. Permuter, Haim Henry; Weissman, Tsachy; Goldsmith, Andrea J. (February 2009). "Finite State Channels With Time-Invariant Deterministic Feedback". IEEE Transactions on Information Theory. 55 (2): 644–662. arXiv: cs/0608070 . doi:10.1109/TIT.2008.2009849. S2CID   13178.