Z-channel (information theory)

[Figure Z-channel.svg: the Z-channel transmits each 0 bit of a message correctly always and each 1 bit correctly with probability 1 − p, due to noise across the transmission medium.]

In coding theory and information theory, a Z-channel or binary asymmetric channel is a communications channel used to model the behaviour of some data storage systems.

Definition

A Z-channel is a channel with binary input and binary output, where each 0 bit is transmitted correctly, but each 1 bit has probability p of being transmitted incorrectly as a 0, and probability 1 − p of being transmitted correctly as a 1. In other words, if X and Y are the random variables describing the probability distributions of the input and the output of the channel, respectively, then the crossovers of the channel are characterized by the conditional probabilities:[1]

$$\Pr[Y=0 \mid X=0] = 1, \qquad \Pr[Y=0 \mid X=1] = p,$$
$$\Pr[Y=1 \mid X=0] = 0, \qquad \Pr[Y=1 \mid X=1] = 1-p.$$
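
This transition behaviour translates into a few lines of Python; a minimal sketch (the function name z_channel is an illustrative choice, not from the source):

    import random

    def z_channel(bits, p):
        """Simulate a Z-channel: a 0 is always delivered intact, while a 1 is
        independently turned into a 0 with probability p."""
        return [0 if b == 1 and random.random() < p else b for b in bits]

    random.seed(0)
    print(z_channel([1, 0, 1, 1, 0, 1], p=0.3))  # only 1s can be corrupted, and only into 0s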

Capacity

The channel capacity of the Z-channel with the crossover 1 → 0 probability p, when the input random variable X is distributed according to the Bernoulli distribution with probability α for the occurrence of 0, is given by the following equation:

$$\mathsf{cap}(\mathbb{Z}) = \log_2\!\left(1 + (1-p)\,p^{p/(1-p)}\right) = \log_2\!\left(1 + 2^{-\mathsf{s}(p)}\right),$$

where $\mathsf{s}(p) = \frac{\mathsf{H}(p)}{1-p}$ for the binary entropy function $\mathsf{H}(\cdot)$.

This capacity is obtained when the input variable X has Bernoulli distribution with probability α of having value 0 and 1 − α of value 1, where:

$$1 - \alpha = \frac{1}{(1-p)\left(1 + 2^{\mathsf{s}(p)}\right)}.$$
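
A quick numeric check of this reconstruction in Python: the closed form should agree with a brute-force maximization of the mutual information I(X;Y) = H(Y) − H(Y|X) over the input distribution, and the stated α should attain it. Here Pr[Y = 1] = (1 − α)(1 − p) and H(Y|X) = (1 − α)H(p).

    import math

    def H(q):
        """Binary entropy function, in bits."""
        return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

    def cap_z(p):
        """Closed-form Z-channel capacity: log2(1 + (1 - p) * p**(p / (1 - p)))."""
        return math.log2(1 + (1 - p) * p ** (p / (1 - p)))

    def mutual_information(alpha, p):
        """I(X;Y) when Pr[X = 0] = alpha."""
        return H((1 - alpha) * (1 - p)) - (1 - alpha) * H(p)

    p = 0.3
    best = max(mutual_information(a / 10000, p) for a in range(1, 10000))
    alpha_star = 1 - 1 / ((1 - p) * (1 + 2 ** (H(p) / (1 - p))))
    assert abs(best - cap_z(p)) < 1e-6
    assert abs(mutual_information(alpha_star, p) - cap_z(p)) < 1e-9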

For small p, the capacity is approximated by

$$\mathsf{cap}(\mathbb{Z}) \approx 1 - \tfrac{1}{2}\mathsf{H}(p)$$

as compared to the capacity 1 − H(p) of the binary symmetric channel with crossover probability p.

For any p, α > 1/2 (i.e. more 0s should be transmitted than 1s) because transmitting a 1 introduces noise. As p → 1, the limiting value of α is 1 − 1/e ≈ 0.632.[2]
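
Both limiting behaviours can be checked numerically; a short, self-contained sketch:

    import math

    def H(q):
        """Binary entropy function, in bits."""
        return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

    # Small p: cap(Z) is roughly 1 - H(p)/2, versus 1 - H(p) for the BSC.
    p = 0.01
    print(math.log2(1 + (1 - p) * p ** (p / (1 - p))), 1 - H(p) / 2, 1 - H(p))

    # As p -> 1, the capacity-achieving alpha = Pr[X = 0] tends to 1 - 1/e.
    for p in (0.9, 0.99, 0.999):
        alpha = 1 - 1 / ((1 - p) * (1 + 2 ** (H(p) / (1 - p))))
        print(p, alpha, "limit:", 1 - 1 / math.e)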

Bounds on the size of an asymmetric-error-correcting code

Define the following distance function d_A(x, y) on the words x, y ∈ {0,1}ⁿ of length n transmitted via a Z-channel:

$$\mathsf{d}_A(\mathbf{x}, \mathbf{y}) = \max\left\{\, \left|\{ i \mid x_i = 0,\ y_i = 1 \}\right|,\ \left|\{ i \mid x_i = 1,\ y_i = 0 \}\right| \,\right\}.$$

Define the sphere V_t(x) of radius t around a word x ∈ {0,1}ⁿ of length n as the set of all the words at distance t or less from x; in other words,

$$V_t(\mathbf{x}) = \{\, \mathbf{y} \in \{0,1\}^n \mid \mathsf{d}_A(\mathbf{x}, \mathbf{y}) \le t \,\}.$$

A code C of length n is said to be t-asymmetric-error-correcting if for any two distinct codewords c, c′ ∈ C, one has V_t(c) ∩ V_t(c′) = ∅. Denote by M(n, t) the maximum number of codewords in a t-asymmetric-error-correcting code of length n.
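
These definitions translate directly into a brute-force check, practical only for small n (the names d_A, sphere and is_t_aec are illustrative):

    from itertools import product

    def d_A(x, y):
        """Asymmetric distance: the larger of #(positions with x=0, y=1) and #(x=1, y=0)."""
        n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
        n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
        return max(n01, n10)

    def sphere(x, t):
        """V_t(x): all words of length len(x) within asymmetric distance t of x."""
        return {y for y in product((0, 1), repeat=len(x)) if d_A(x, y) <= t}

    def is_t_aec(code, t):
        """The defining property: spheres around distinct codewords are pairwise disjoint."""
        return all(sphere(c, t).isdisjoint(sphere(d, t))
                   for i, c in enumerate(code) for d in code[i + 1:])

    # Toy check: the length-3 repetition code corrects one asymmetric error.
    assert is_t_aec([(0, 0, 0), (1, 1, 1)], 1)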

The Varshamov bound. For n ≥ 1 and t ≥ 1,

$$M(n, t) \le \frac{2^{n+1}}{\sum_{j=0}^{t} \left( \binom{\lfloor n/2 \rfloor}{j} + \binom{\lceil n/2 \rceil}{j} \right)}.$$
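
Evaluating the bound is a one-liner; a sketch under the reconstruction above:

    from math import comb

    def varshamov_bound(n, t):
        """Upper bound on M(n, t); note floor(n/2) = n // 2 and ceil(n/2) = (n + 1) // 2."""
        denom = sum(comb(n // 2, j) + comb((n + 1) // 2, j) for j in range(t + 1))
        return 2 ** (n + 1) / denom

    print(varshamov_bound(7, 1))  # 256 / 9, roughly 28.4, so M(7, 1) <= 28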

The constant-weight code bound. For n > 2t ≥ 2, let the sequence B0, B1, ..., B_{n−2t−1} be defined as

$$B_0 = 2, \qquad B_i = \min_{0 \le j < i} \left\{ B_j + A(n + t + i - j - 1,\ 2t + 2,\ t + i) \right\}$$

for i = 1, 2, ..., n − 2t − 1, where A(n, d, w) denotes the maximum size of a binary constant-weight code of length n, minimum Hamming distance d, in which every codeword has weight w. Then

$$M(n, t) \le B_{n-2t-1}.$$
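
A sketch of the recursion in Python. Exact values of A(n, d, w) come from published tables; here a standard Johnson-type upper bound stands in for them, which is an assumption of this sketch. Since each B_i only adds and minimizes over values of A, replacing A by any upper bound still yields a valid (if looser) upper bound on M(n, t).

    def A_upper(n, d, w):
        """Johnson-type upper bound on A(n, d, w) for even d = 2*delta; a stand-in
        for exact tabulated constant-weight code sizes."""
        delta = d // 2
        if w < delta:
            return 1           # two distinct weight-w words are at Hamming distance <= 2w < d
        if w == delta:
            return n // delta  # codewords must have pairwise disjoint supports
        return (n * A_upper(n - 1, d, w - 1)) // w

    def constant_weight_bound(n, t):
        """B_{n-2t-1} from the recursion above, an upper bound on M(n, t)."""
        B = [2]
        for i in range(1, n - 2 * t):
            B.append(min(B[j] + A_upper(n + t + i - j - 1, 2 * t + 2, t + i)
                         for j in range(i)))
        return B[-1]

    print(constant_weight_bound(7, 1))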

Related Research Articles

<span class="mw-page-title-main">BQP</span> Computational complexity class of problems

In computational complexity theory, bounded-error quantum polynomial time (BQP) is the class of decision problems solvable by a quantum computer in polynomial time, with an error probability of at most 1/3 for all instances. It is the quantum analogue to the complexity class BPP.

<span class="mw-page-title-main">Cumulative distribution function</span> Probability that random variable X is less than or equal to x

In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable , or just distribution function of , evaluated at , is the probability that will take a value less than or equal to .

<span class="mw-page-title-main">Pauli matrices</span> Matrices important in quantum mechanics and the study of spin

In mathematical physics and mathematics, the Pauli matrices are a set of three 2 × 2 complex matrices that are Hermitian, involutory and unitary. Usually indicated by the Greek letter sigma, they are occasionally denoted by tau when used in connection with isospin symmetries.

The likelihood function is the joint probability mass of observed data viewed as a function of the parameters of a statistical model. Intuitively, the likelihood function is the probability of observing data assuming is the actual parameter.

In machine learning, support vector machines are supervised max-margin models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories by Vladimir Vapnik with colleagues SVMs are one of the most studied models, being based on statistical learning frameworks of VC theory proposed by Vapnik and Chervonenkis (1974).

In mechanics and geometry, the 3D rotation group, often denoted SO(3), is the group of all rotations about the origin of three-dimensional Euclidean space under the operation of composition.

<span class="mw-page-title-main">Covariance matrix</span> Measure of covariance of components of a random vector

In probability theory and statistics, a covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector.

<span class="mw-page-title-main">Georgi–Glashow model</span> Grand Unified Theory proposed in 1974

In particle physics, the Georgi–Glashow model is a particular Grand Unified Theory (GUT) proposed by Howard Georgi and Sheldon Glashow in 1974. In this model, the Standard Model gauge groups SU(3) × SU(2) × U(1) are combined into a single simple gauge group SU(5). The unified group SU(5) is then thought to be spontaneously broken into the Standard Model subgroup below a very high energy scale called the grand unification scale.

In linear algebra, a rotation matrix is a transformation matrix that is used to perform a rotation in Euclidean space. For example, using the convention below, the matrix

<span class="mw-page-title-main">Dirichlet distribution</span> Probability distribution

In probability and statistics, the Dirichlet distribution (after Peter Gustav Lejeune Dirichlet), often denoted , is a family of continuous multivariate probability distributions parameterized by a vector of positive reals. It is a multivariate generalization of the beta distribution, hence its alternative name of multivariate beta distribution (MBD). Dirichlet distributions are commonly used as prior distributions in Bayesian statistics, and in fact, the Dirichlet distribution is the conjugate prior of the categorical distribution and multinomial distribution.

A continuous-time Markov chain (CTMC) is a continuous stochastic process in which, for each state, the process will change state according to an exponential random variable and then move to a different state as specified by the probabilities of a stochastic matrix. An equivalent formulation describes the process as changing state according to the least value of a set of exponential random variables, one for each possible state it can move to, with the parameters determined by the current state.

<span class="mw-page-title-main">Conjugate gradient method</span> Mathematical optimization algorithm

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-semidefinite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.

In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments. In other words, a binomial proportion confidence interval is an interval estimate of a success probability when only the number of experiments and the number of successes are known.

In computational complexity theory, PostBQP is a complexity class consisting of all of the computational problems solvable in polynomial time on a quantum Turing machine with postselection and bounded error.

<span class="mw-page-title-main">Dual quaternion</span> Eight-dimensional algebra over the real numbers

In mathematics, the dual quaternions are an 8-dimensional real algebra isomorphic to the tensor product of the quaternions and the dual numbers. Thus, they may be constructed in the same way as the quaternions, except using dual numbers instead of real numbers as coefficients. A dual quaternion can be represented in the form A + εB, where A and B are ordinary quaternions and ε is the dual unit, which satisfies ε2 = 0 and commutes with every element of the algebra. Unlike quaternions, the dual quaternions do not form a division algebra.

In probability theory and statistics, the Dirichlet-multinomial distribution is a family of discrete multivariate probability distributions on a finite support of non-negative integers. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution. It is a compound probability distribution, where a probability vector p is drawn from a Dirichlet distribution with parameter vector , and an observation drawn from a multinomial distribution with probability vector p and number of trials n. The Dirichlet parameter vector captures the prior belief about the situation and can be seen as a pseudocount: observations of each outcome that occur before the actual data is collected. The compounding corresponds to a Pólya urn scheme. It is frequently encountered in Bayesian statistics, machine learning, empirical Bayes methods and classical statistics as an overdispersed multinomial distribution.

<span class="mw-page-title-main">Poisson distribution</span> Discrete probability distribution

In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known constant mean rate and independently of the time since the last event. It can also be used for the number of events in other types of intervals than time, and in dimension greater than 1.

IQ imbalance is a performance-limiting issue in the design of a class of radio receivers known as direct conversion receivers. These translate the received radio frequency signal directly from the carrier frequency to baseband using a single mixing stage.

<span class="mw-page-title-main">Hyperbolastic functions</span> Mathematical functions

The hyperbolastic functions, also known as hyperbolastic growth models, are mathematical functions that are used in medical statistical modeling. These models were originally developed to capture the growth dynamics of multicellular tumor spheres, and were introduced in 2005 by Mohammad Tabatabai, David Williams, and Zoran Bursac. The precision of hyperbolastic functions in modeling real world problems is somewhat due to their flexibility in their point of inflection. These functions can be used in a wide variety of modeling problems such as tumor growth, stem cell proliferation, pharma kinetics, cancer growth, sigmoid activation function in neural networks, and epidemiological disease progression or regression.

In machine learning, diffusion models, also known as diffusion probabilistic models or score-based generative models, are a class of latent variable generative models. A diffusion model consists of three major components: the forward process, the reverse process, and the sampling procedure. The goal of diffusion models is to learn a diffusion process that generates a probability distribution for a given dataset from which we can then sample new images. They learn the latent structure of a dataset by modeling the way in which data points diffuse through their latent space.

References