Arbitrarily varying channel

An arbitrarily varying channel (AVC) is a communication channel model used in coding theory, and was first introduced by Blackwell, Breiman, and Thomasian. This particular channel has unknown parameters that can change over time, and these changes may not have a uniform pattern during the transmission of a codeword. $n$ uses of this channel can be described using a stochastic matrix $W^n : X^n \times S^n \rightarrow Y^n$, where $X$ is the input alphabet, $Y$ is the output alphabet, and $W^n(y|x, s)$ is the probability, over a given set of states $S$, that the transmitted input $x = (x_1, \ldots, x_n)$ leads to the received output $y = (y_1, \ldots, y_n)$. The state $s_i$ in the set $S$ can vary arbitrarily at each time unit $i$. This channel was developed as an alternative to Shannon's Binary Symmetric Channel (BSC), in which the entire nature of the channel is known, in order to be more realistic for actual network channel situations.
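To make the model concrete, the following sketch computes $W^n(y|x, s)$ for a toy binary AVC, assuming the usual memoryless form $W^n(y|x, s) = \prod_{i=1}^{n} W(y_i|x_i, s_i)$. The alphabets, the matrix values, and the function names are illustrative assumptions, not part of the original description.

```python
# Minimal sketch (illustration only): an AVC over finite alphabets, assuming the
# memoryless form W^n(y|x,s) = prod_i W(y_i|x_i, s_i) for a single-letter
# stochastic matrix W indexed as W[s, x, y].
import numpy as np

# Toy binary AVC: the state s selects which crossover probability the channel uses.
W = np.array([
    [[0.9, 0.1], [0.1, 0.9]],   # state 0: BSC with crossover 0.1
    [[0.6, 0.4], [0.4, 0.6]],   # state 1: BSC with crossover 0.4
])

def avc_prob(W, x, s, y):
    """Probability W^n(y|x,s) that input sequence x under state sequence s gives output y."""
    p = 1.0
    for xi, si, yi in zip(x, s, y):
        p *= W[si, xi, yi]
    return p

# The state sequence may change arbitrarily from symbol to symbol.
print(avc_prob(W, x=[0, 1, 1], s=[0, 1, 0], y=[0, 1, 0]))
```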

Capacities and associated proofs

Capacity of deterministic AVCs

An AVC's capacity can vary depending on certain parameters.

$R$ is an achievable rate for a deterministic AVC code if it is larger than $0$, and if for every positive $\varepsilon$ and $\delta$, and very large $n$, length-$n$ block codes exist that satisfy the following equations: $\frac{1}{n}\log N > R - \delta$ and $\displaystyle\max_{s \in S^n} \bar{e}(s) \leq \varepsilon$, where $N$ is the number of codewords and $\bar{e}(s)$ is the average probability of error for a state sequence $s$. The largest achievable rate $R$ represents the capacity of the AVC, denoted by $c$.
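The average probability of error $\bar{e}(s)$ and the worst-case criterion $\max_{s \in S^n} \bar{e}(s)$ can be illustrated with a small brute-force computation. The code, decoder, and channel below are assumptions chosen only to make the definition concrete; they are not taken from the article.

```python
# Hedged sketch: e_bar(s) = (1/N) * sum over messages m of
# P{decoder fails | message m sent, state sequence s}, and the AVC criterion
# looks at the worst state sequence s.
from itertools import product
import numpy as np

W = np.array([
    [[0.9, 0.1], [0.1, 0.9]],   # state 0
    [[0.6, 0.4], [0.4, 0.6]],   # state 1
])

def avc_prob(W, x, s, y):
    p = 1.0
    for xi, si, yi in zip(x, s, y):
        p *= W[si, xi, yi]
    return p

def average_error(W, codewords, decoder, s, num_outputs=2):
    """e_bar(s) for a deterministic decoder mapping output sequences to message indices."""
    n, N = len(s), len(codewords)
    err = 0.0
    for m, x in enumerate(codewords):
        for y in product(range(num_outputs), repeat=n):
            if decoder(y) != m:
                err += avc_prob(W, x, s, y)
    return err / N

# Tiny repetition-style code with a majority-vote decoder (an assumption for illustration).
codewords = [(0, 0, 0), (1, 1, 1)]
decoder = lambda y: int(sum(y) >= 2)
worst = max(average_error(W, codewords, decoder, s)
            for s in product(range(W.shape[0]), repeat=3))
print(worst)   # max over state sequences of the average error probability
```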

The only useful situations are those where the capacity of the AVC is greater than $0$, because only then can the channel transmit a guaranteed amount of data without errors. So we start out with a theorem that shows when $c$ is positive in an AVC; the theorems discussed afterward will narrow down the range of $c$ for different circumstances.

Before stating Theorem 1, a few definitions need to be addressed:

- An AVC is symmetric if $\sum_{s \in S} W(y|x, s)U(s|x') = \sum_{s \in S} W(y|x', s)U(s|x)$ for every $(x, x', y)$, where $x, x' \in X$, $y \in Y$, and $U(s|x)$ is a channel function $U : X \rightarrow S$.
- $X_r$, $S_r$, and $Y_r$ are random variables taking values in the sets $X$, $S$, and $Y$ respectively.
- $P_{X_r}(x)$ is the probability that the random variable $X_r$ is equal to $x$.
- $P_{S_r}(s)$ is the probability that the random variable $S_r$ is equal to $s$.
- $P_{X_rS_rY_r}$ is the joint probability mass function (pmf) determined by $P_{X_r}(x)$, $P_{S_r}(s)$, and $W(y|x, s)$; it is defined formally as $P_{X_rS_rY_r}(x, s, y) = P_{X_r}(x)P_{S_r}(s)W(y|x, s)$.
- $H(X_r)$ is the entropy of $X_r$.
- $H(X_r|Y_r)$ is the conditional entropy of $X_r$ given $Y_r$.
- $I(X_r \wedge Y_r) = H(X_r) - H(X_r|Y_r)$ is the mutual information of $X_r$ and $Y_r$.
- $I(P) = \min_{P_{S_r}} I(X_r \wedge Y_r)$, where the minimum is over all random variables $S_r$ such that $X_r$, $S_r$, and $Y_r$ are distributed in the form of $P_{X_rS_rY_r}$ with $P_{X_r} = P$.

Theorem 1: $c > 0$ if and only if the AVC is not symmetric. If $c > 0$, then $c = \max_P I(P)$.

Proof of the first part (symmetry): If we can prove that $I(P)$ is positive when the AVC is not symmetric, and then prove that $c = \max_P I(P)$, we will be able to prove Theorem 1. Assume $I(P)$ were equal to $0$. From the definition of $I(P)$, this would make $X_r$ and $Y_r$ independent random variables, for some $S_r$, because this would mean that neither random variable's entropy would rely on the other random variable's value. By using the equation for $P_{X_rS_rY_r}$ (and remembering $P_{X_r} = P$), we can get

$P_{Y_r}(y) = \sum_{x \in X} \sum_{s \in S} P(x)P_{S_r}(s)W(y|x, s)$

$= \sum_{x \in X} \sum_{s \in S} P(x)P_{S_r}(s)W'(y|s)$, since $X_r$ and $Y_r$ are independent random variables for some $S_r$, so $W(y|x, s)$ can be replaced by a function $W'(y|s)$ that does not depend on $x$

$= \sum_{s \in S} P_{S_r}(s)W'(y|s)\sum_{x \in X} P(x)$, because $W'(y|s)$ only depends on $s$ now

$= \sum_{s \in S} P_{S_r}(s)W'(y|s)$, because $\sum_{x \in X} P(x) = 1$

So now we have a probability distribution on $Y_r$ that is independent of $X_r$. So now the definition of a symmetric AVC can be rewritten as follows: $\sum_{s \in S} W'(y|s)P_{S_r}(s) = \sum_{s \in S} W'(y|s)P_{S_r}(s)$; since $U(s|x)$ and $W(y|x, s)$ are both functions based on $x$, they have been replaced with functions based on $s$ and $y$ only. As you can see, both sides are now equal to the $P_{Y_r}(y)$ we calculated earlier, so the AVC is indeed symmetric when $I(P)$ is equal to $0$. Therefore, $c$ can only be positive if the AVC is not symmetric.
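The symmetry condition itself is easy to test by brute force on small alphabets. The following sketch is illustrative only; the toy channel and the discretization of $U(s|x)$ are assumptions. It searches for a channel $U$ satisfying the symmetry equation for a binary AVC whose second state simply flips the first.

```python
# Brute-force check (illustration only) of the symmetry condition used in Theorem 1:
# the AVC is symmetric if some channel U(s|x) makes
#   sum_s W(y|x,s) U(s|x') == sum_s W(y|x',s) U(s|x)   for all x, x', y.
import numpy as np
from itertools import product

W = np.array([
    [[0.9, 0.1], [0.1, 0.9]],   # state 0: BSC with crossover 0.1
    [[0.1, 0.9], [0.9, 0.1]],   # state 1: the same channel with output flipped
])

def is_symmetric(W, steps=101, tol=1e-9):
    n_states, n_inputs, n_outputs = W.shape
    grid = np.linspace(0, 1, steps)
    # Binary case only: U[x, s] = probability of state s given input x.
    for u0, u1 in product(grid, repeat=2):
        U = np.array([[1 - u0, u0], [1 - u1, u1]])
        ok = all(
            abs(sum(W[s, x, y] * U[xp, s] for s in range(n_states)) -
                sum(W[s, xp, y] * U[x, s] for s in range(n_states))) < tol
            for x, xp, y in product(range(n_inputs), range(n_inputs), range(n_outputs))
        )
        if ok:
            return True
    return False

print(is_symmetric(W))   # True here: the deterministic U with s = x satisfies the condition
```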

Proof of the second part (capacity): See the paper "The capacity of the arbitrarily varying channel revisited: positivity, constraints," referenced below, for the full proof.
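Although the full proof of the capacity formula is in the referenced paper, the formula $c = \max_P I(P)$ can be explored numerically. The sketch below is a brute-force grid search for a toy binary AVC (an assumption for illustration, not the article's construction): for each input pmf $P$ it minimizes the mutual information over memoryless state pmfs, then maximizes over $P$.

```python
# Numerical sketch of c = max_P I(P), with I(P) = min over state pmfs of I(X_r ; Y_r).
import numpy as np

W = np.array([
    [[0.9, 0.1], [0.1, 0.9]],   # state 0
    [[0.6, 0.4], [0.4, 0.6]],   # state 1
])

def mutual_information(P_x, V):
    """I(X;Y) in bits for input pmf P_x and channel matrix V[x, y]."""
    P_xy = P_x[:, None] * V
    P_y = P_xy.sum(axis=0)
    mask = P_xy > 0
    return float((P_xy[mask] * np.log2(P_xy[mask] / (P_x[:, None] * P_y[None, :])[mask])).sum())

grid = np.linspace(0.0, 1.0, 201)

def I_of_P(P_x):
    # Minimize over the state pmf: the state is chosen as unfavourably as possible.
    best = np.inf
    for q in grid:
        V = q * W[0] + (1 - q) * W[1]    # averaged channel: sum_s P_Sr(s) W(y|x,s)
        best = min(best, mutual_information(P_x, V))
    return best

c = max(I_of_P(np.array([p, 1 - p])) for p in grid)   # c = max over input pmfs of I(P)
print(round(c, 4))
```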

Capacity of AVCs with input and state constraints

The next theorem will deal with the capacity for AVCs with input and/or state constraints. These constraints help to decrease the very large range of possibilities for transmission and error on an AVC, making it a bit easier to see how the AVC behaves.

Before we go on to Theorem 2, we need to state a few definitions and a lemma:

For such AVCs, there exists:

- An input constraint $\Gamma$ based on the equation $g(x) = \frac{1}{n}\sum_{i=1}^{n} g(x_i)$, where $x \in X^n$ and $x = (x_1, \ldots, x_n)$.
- A state constraint $\Lambda$, based on the equation $l(s) = \frac{1}{n}\sum_{i=1}^{n} l(s_i)$, where $s \in S^n$ and $s = (s_1, \ldots, s_n)$.
- $\Lambda_0(P) = \min \sum_{x \in X, s \in S} P(x)U(s|x)l(s)$, where the minimum is over all channels $U(s|x)$ that satisfy the symmetry condition stated before Theorem 1.
- $I(P, \Lambda)$ is very similar to the equation $I(P)$ mentioned previously, but now any state $s$ or random variable $S_r$ in the equation must follow the state restriction $\Lambda$ (that is, $\sum_{s \in S} P_{S_r}(s)l(s) \leq \Lambda$).

Assume $g(x)$ is a given non-negative-valued function on $X$ and $l(s)$ is a given non-negative-valued function on $S$, and that the minimum value of each is $0$. In the literature on this subject, the exact definitions of $g$ and $l$ (for a single variable $x$ or $s$) are never described formally. The usefulness of the input constraint $\Gamma$ and the state constraint $\Lambda$ is based on these equations.

For AVCs with input and/or state constraints, the rate is now limited to codewords $x = (x_1, \ldots, x_n)$ that satisfy $g(x) \leq \Gamma$, and the state is now limited to all state sequences $s$ that satisfy $l(s) \leq \Lambda$. The largest achievable rate is still considered the capacity of the AVC, and is now denoted by $c(\Gamma, \Lambda)$.
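The role of the constraints can be sketched numerically: the input constraint restricts which codewords are allowed, and the state constraint restricts which state pmfs enter the minimization in $I(P, \Lambda)$. The cost functions $g$ and $l$ below are assumptions made only for illustration.

```python
# Illustrative sketch (assumed cost functions, not from the article): checking the
# constraints (1/n) sum g(x_i) <= Gamma and (1/n) sum l(s_i) <= Lambda, and computing
# I(P, Lambda) with the state pmf restricted to expected cost at most Lambda.
import numpy as np

W = np.array([
    [[0.9, 0.1], [0.1, 0.9]],   # state 0: mild noise
    [[0.6, 0.4], [0.4, 0.6]],   # state 1: heavy noise
])
g = np.array([0.0, 1.0])        # assumed input cost: sending a 1 costs 1
l = np.array([0.0, 1.0])        # assumed state cost: the noisy state costs 1

def satisfies_constraints(x, s, Gamma, Lambda):
    """Check (1/n) sum g(x_i) <= Gamma and (1/n) sum l(s_i) <= Lambda."""
    return g[list(x)].mean() <= Gamma and l[list(s)].mean() <= Lambda

def mutual_information(P_x, V):
    P_xy = P_x[:, None] * V
    P_y = P_xy.sum(axis=0)
    mask = P_xy > 0
    return float((P_xy[mask] * np.log2(P_xy[mask] / (P_x[:, None] * P_y[None, :])[mask])).sum())

def I_P_Lambda(P_x, Lambda, grid=np.linspace(0, 1, 201)):
    best = np.inf
    for q in grid:                             # q = P_Sr(state 1), the costly state
        if q * l[1] + (1 - q) * l[0] > Lambda:
            continue                           # this state pmf violates the state constraint
        V = (1 - q) * W[0] + q * W[1]
        best = min(best, mutual_information(P_x, V))
    return best

print(satisfies_constraints(x=(0, 1, 0, 0), s=(1, 0, 0, 0), Gamma=0.5, Lambda=0.25))
P = np.array([0.5, 0.5])
# A smaller Lambda limits the choice of state pmfs, so I(P, Lambda) can only increase.
print(I_P_Lambda(P, Lambda=1.0), I_P_Lambda(P, Lambda=0.2))
```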

Lemma 1: Any codes where $\Lambda$ is greater than $\Lambda_0(P)$ cannot be considered "good" codes, because those kinds of codes have a maximum average probability of error greater than or equal to roughly $\frac{N-1}{2N}$ minus a term proportional to $\frac{l_{max}^2}{n(\Lambda - \Lambda_0(P))^2}$, where $l_{max}$ is the maximum value of $l(s)$. This isn't a good maximum average error probability because it is fairly large ($\frac{N-1}{2N}$ is close to $\frac{1}{2}$), and the other part of the expression is very small, since the $(\Lambda - \Lambda_0(P))$ value is squared and $\Lambda$ is set to be larger than $\Lambda_0(P)$. Therefore, it would be very unlikely to receive a codeword without error. This is why the $\Lambda_0(P) \geq \Lambda + \alpha$ condition is present in Theorem 2.

Theorem 2: Given a positive $\Lambda$ and arbitrarily small $\alpha > 0$, $\beta > 0$, $\delta > 0$, for any block length $n \geq n_0$ and for any type $P$ with conditions $\Lambda_0(P) \geq \Lambda + \alpha$ and $\min_{x \in X} P(x) \geq \beta$, and where $g(P) \leq \Gamma$, there exists a code with codewords $x_1, \ldots, x_N$, each of type $P$, that satisfy the following equations: $\frac{1}{n}\log N > I(P, \Lambda) - \delta$, $\displaystyle\max_{l(s) \leq \Lambda} \bar{e}(s) \leq \exp(-n\gamma)$, and where positive $n_0$ and $\gamma$ depend only on $\alpha$, $\beta$, $\delta$, and the given AVC.

Proof of Theorem 2: See the paper "The capacity of the arbitrarily varying channel revisited: positivity, constraints," referenced below for full proof.

Capacity of randomized AVCs

The next theorem will be for AVCs with randomized codes. For such AVCs, the code is a random variable with values from a family of length-$n$ block codes, and these codes are not allowed to depend on or rely on the actual value of the codeword. These codes have the same maximum and average error probability value for any channel because of their random nature. These types of codes also help to make certain properties of the AVC clearer.

Before we go on to Theorem 3, we need to define a couple important terms first:


- $W_\zeta(y|x) = \sum_{s \in S} W(y|x, s)P_{S_r}(s)$
- $I(P, \zeta)$ is very similar to the equation $I(P)$ mentioned previously, but now the pmf $P_{S_r}(s)$ is added to the equation, making the minimum of $I(P, \zeta)$ based on a new form of $P_{X_rS_rY_r}$, where $W_\zeta(y|x)$ replaces $W(y|x, s)$.

Theorem 3: The capacity for randomized codes of the AVC is $c = \max_P I(P, \zeta)$.
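The max–min structure of this capacity can be illustrated numerically for a toy binary AVC (all values below are assumptions for the example). Because mutual information is concave in the input pmf and convex in the averaged channel $W_\zeta$, a minimax argument lets the maximization and minimization be exchanged, which the grid search below confirms approximately.

```python
# Sketch of the max-min structure behind Theorem 3 for a toy binary AVC: a max over
# input pmfs of a min over state pmfs of the mutual information through W_zeta.
import numpy as np

W = np.array([
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.6, 0.4], [0.4, 0.6]],
])

def mutual_information(P_x, V):
    P_xy = P_x[:, None] * V
    P_y = P_xy.sum(axis=0)
    mask = P_xy > 0
    return float((P_xy[mask] * np.log2(P_xy[mask] / (P_x[:, None] * P_y[None, :])[mask])).sum())

grid = np.linspace(0.0, 1.0, 201)

def I(p, q):
    V = q * W[0] + (1 - q) * W[1]    # W_zeta(y|x) = sum_s W(y|x,s) P_Sr(s), with P_Sr(0) = q
    return mutual_information(np.array([p, 1 - p]), V)

max_min = max(min(I(p, q) for q in grid) for p in grid)
min_max = min(max(I(p, q) for p in grid) for q in grid)
print(round(max_min, 4), round(min_max, 4))   # the two values agree up to grid error
```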

Proof of Theorem 3: See the paper "The Capacities of Certain Channel Classes Under Random Coding," referenced below, for the full proof.

References

- Blackwell, David; Breiman, Leo; Thomasian, A. J. (1960). "The Capacities of Certain Channel Classes Under Random Coding". The Annals of Mathematical Statistics. 31 (3): 558–567.
- Csiszár, I.; Narayan, P. (1988). "The capacity of the arbitrarily varying channel revisited: positivity, constraints". IEEE Transactions on Information Theory. 34 (2): 181–193.