Quantum relative entropy

Last updated December 29, 2022

In quantum information theory, quantum relative entropy is a measure of distinguishability between two quantum states. It is the quantum mechanical analog of relative entropy.

Motivation

For simplicity, it will be assumed that all objects in the article are finite-dimensional.

We first discuss the classical case. Suppose the probabilities of a finite sequence of events are given by the probability distribution P = {p₁...p_n}, but somehow we mistakenly assumed it to be Q = {q₁...q_n}. For instance, we can mistake an unfair coin for a fair one. According to this erroneous assumption, our uncertainty about the j-th event, or equivalently, the amount of information provided after observing the j-th event, is

\;-\log q_{j}.

The (assumed) average uncertainty of all possible events is then

\;-\sum _{j}p_{j}\log q_{j}.

On the other hand, the Shannon entropy of the probability distribution P, defined by

\;-\sum _{j}p_{j}\log p_{j},

is the real amount of uncertainty before observation. Therefore the difference between these two quantities

\;-\sum _{j}p_{j}\log q_{j}-\left(-\sum _{j}p_{j}\log p_{j}\right)=\sum _{j}p_{j}\log p_{j}-\sum _{j}p_{j}\log q_{j}

is a measure of the distinguishability of the two probability distributions P and Q. This is precisely the classical relative entropy, or Kullback–Leibler divergence:

D_{\mathrm {KL} }(P\|Q)=\sum _{j}p_{j}\log {\frac {p_{j}}{q_{j}}}\!.

Note

In the definitions above, the convention that 0·log 0 = 0 is assumed, since $\lim _{x\searrow 0}x\log(x)=0$ . Intuitively, one would expect an event of zero probability to contribute nothing towards entropy.
The relative entropy is not a metric. For example, it is not symmetric. The uncertainty discrepancy in mistaking a fair coin for an unfair one is not the same as in the opposite situation.
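
The asymmetry can be checked numerically. The following is a minimal Python sketch rather than a reference implementation; the function name `kl_divergence` and the coin with bias 0.9 are illustrative assumptions, and natural logarithms are used.

```python
import math

def kl_divergence(p, q):
    """Classical relative entropy D_KL(P||Q) in nats (illustrative helper)."""
    total = 0.0
    for pj, qj in zip(p, q):
        if pj == 0.0:
            continue            # convention 0 * log 0 = 0
        if qj == 0.0:
            return math.inf     # P puts weight where Q puts none
        total += pj * math.log(pj / qj)
    return total

fair = [0.5, 0.5]
biased = [0.9, 0.1]             # assumed example of an unfair coin

d_biased_fair = kl_divergence(biased, fair)   # true coin biased, assumed fair
d_fair_biased = kl_divergence(fair, biased)   # true coin fair, assumed biased

print(d_biased_fair)   # about 0.368
print(d_fair_biased)   # about 0.511
```

The two values differ, so swapping the roles of P and Q changes the result.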

Definition

As with many other objects in quantum information theory, quantum relative entropy is defined by extending the classical definition from probability distributions to density matrices. Let ρ be a density matrix. The von Neumann entropy of ρ, which is the quantum mechanical analog of the Shannon entropy, is given by

S(\rho )=-\operatorname {Tr} \rho \log \rho .

For two density matrices ρ and σ, the quantum relative entropy of ρ with respect to σ is defined by

S(\rho \|\sigma )=-\operatorname {Tr} \rho \log \sigma -S(\rho )=\operatorname {Tr} \rho \log \rho -\operatorname {Tr} \rho \log \sigma =\operatorname {Tr} \rho (\log \rho -\log \sigma ).

We see that, when the states are classically related, i.e. ρσ = σρ, the definition coincides with the classical case, in the sense that if $\rho =UD_{1}U^{*}$ and $\sigma =UD_{2}U^{*}$ for a unitary $U$, with $D_{1}={\text{diag}}(\lambda _{1},\ldots ,\lambda _{n})$ and $D_{2}={\text{diag}}(\mu _{1},\ldots ,\mu _{n})$ (because $\rho$ and $\sigma$ commute, they are simultaneously diagonalizable), then $S(\rho \|\sigma )=\sum _{j=1}^{n}\lambda _{j}\log \left({\frac {\lambda _{j}}{\mu _{j}}}\right)$ is just the ordinary Kullback–Leibler divergence of the probability vector $(\lambda _{1},\ldots ,\lambda _{n})$ with respect to the probability vector $(\mu _{1},\ldots ,\mu _{n})$ .
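
The commuting case can be verified numerically. The sketch below is one possible way to evaluate the definition with NumPy, using eigendecompositions; the helper name `relative_entropy`, the tolerance `tol`, and the example spectra are assumptions made for illustration. A real orthogonal matrix (a special case of a unitary) serves as the common eigenbasis.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """Return S(rho||sigma) = Tr rho (log rho - log sigma) in nats.

    Illustrative helper: returns inf when supp(rho) meets ker(sigma).
    """
    p, v = np.linalg.eigh(rho)      # rho   = sum_i p_i |v_i><v_i|
    q, w = np.linalg.eigh(sigma)    # sigma = sum_j q_j |w_j><w_j|
    overlap = np.abs(v.conj().T @ w) ** 2      # |<v_i|w_j>|^2
    in_supp_rho = p > tol
    in_ker_sigma = q <= tol
    if np.any(overlap[np.ix_(in_supp_rho, in_ker_sigma)] > tol):
        return float(np.inf)
    safe_log_q = np.log(np.where(in_ker_sigma, 1.0, q))  # log 1 = 0 on the kernel
    ps = p[in_supp_rho]
    tr_rho_log_rho = np.sum(ps * np.log(ps))
    tr_rho_log_sigma = np.sum(ps[:, None] * overlap[in_supp_rho] * safe_log_q[None, :])
    return float(tr_rho_log_rho - tr_rho_log_sigma)

rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # shared orthonormal eigenbasis
lam = np.array([0.5, 0.3, 0.2])                    # spectrum of rho (assumed)
mu = np.array([0.1, 0.3, 0.6])                     # spectrum of sigma (assumed)

rho = basis @ np.diag(lam) @ basis.T
sigma = basis @ np.diag(mu) @ basis.T

quantum = relative_entropy(rho, sigma)
classical = float(np.sum(lam * np.log(lam / mu)))
print(quantum, classical)   # the two numbers should agree up to rounding
```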

Non-finite (divergent) relative entropy

In general, the support of a matrix M is the orthogonal complement of its kernel, i.e. ${\text{supp}}(M)={\text{ker}}(M)^{\perp }$ . When considering the quantum relative entropy, we assume the convention that −s · log 0 = ∞ for any s > 0. This leads to the definition that

S(\rho \|\sigma )=\infty

when

{\text{supp}}(\rho )\cap {\text{ker}}(\sigma )\neq \{0\}.

This can be interpreted in the following way. Informally, the quantum relative entropy is a measure of our ability to distinguish two quantum states where larger values indicate states that are more different. Being orthogonal represents the most different quantum states can be. This is reflected by non-finite quantum relative entropy for orthogonal quantum states. Following the argument given in the Motivation section, if we erroneously assume the state $\rho$ has support in ${\text{ker}}(\sigma )$ , this is an error impossible to recover from.

However, one should be careful not to conclude that the divergence of the quantum relative entropy $S(\rho \|\sigma )$ implies that the states $\rho$ and $\sigma$ are orthogonal or even very different by other measures. Specifically, $S(\rho \|\sigma )$ can diverge when $\rho$ and $\sigma$ differ by a vanishingly small amount as measured by some norm. For example, let $\sigma$ have the diagonal representation

$\sigma =\sum _{n}\lambda _{n}|f_{n}\rangle \langle f_{n}|$

with $\lambda _{n}>0$ for $n=0,1,2,\ldots$ and $\lambda _{n}=0$ for $n=-1,-2,\ldots$ where $\{|f_{n}\rangle ,n\in \mathbb {Z} \}$ is an orthonormal set. The kernel of $\sigma$ is the space spanned by the set $\{|f_{n}\rangle ,n=-1,-2,\ldots \}$ . Next let

$\rho =\sigma +\epsilon |f_{-1}\rangle \langle f_{-1}|-\epsilon |f_{1}\rangle \langle f_{1}|$

for a small positive number $\epsilon$ . As $\rho$ has support (namely the state $|f_{-1}\rangle$ ) in the kernel of $\sigma$ , $S(\rho \|\sigma )$ is divergent even though the trace norm of the difference $(\rho -\sigma )$ is $2\epsilon$ . This means that the difference between $\rho$ and $\sigma$ as measured by the trace norm is vanishingly small as $\epsilon \to 0$ even though $S(\rho \|\sigma )$ is divergent (i.e. infinite). This property of the quantum relative entropy represents a serious shortcoming if not treated with care.
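
A finite-dimensional version of this example can be sketched as follows. The three-dimensional truncation (basis ordered as f_0, f_1, f_{-1}), the eigenvalues 1/2, 1/2, 0, the value of `eps`, and the helper `relative_entropy` are all assumptions chosen for illustration.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """S(rho||sigma) in nats; inf if supp(rho) meets ker(sigma) (illustrative)."""
    p, v = np.linalg.eigh(rho)
    q, w = np.linalg.eigh(sigma)
    overlap = np.abs(v.conj().T @ w) ** 2
    in_supp_rho = p > tol
    in_ker_sigma = q <= tol
    if np.any(overlap[np.ix_(in_supp_rho, in_ker_sigma)] > tol):
        return float(np.inf)
    safe_log_q = np.log(np.where(in_ker_sigma, 1.0, q))
    ps = p[in_supp_rho]
    first = np.sum(ps * np.log(ps))
    second = np.sum(ps[:, None] * overlap[in_supp_rho] * safe_log_q[None, :])
    return float(first - second)

eps = 1e-6
# Basis order: |f_0>, |f_1>, |f_{-1}>; sigma has |f_{-1}> in its kernel.
sigma = np.diag([0.5, 0.5, 0.0])
# rho = sigma + eps |f_{-1}><f_{-1}| - eps |f_1><f_1|
rho = np.diag([0.5, 0.5 - eps, eps])

trace_norm = float(np.linalg.norm(rho - sigma, ord="nuc"))  # sum of singular values
divergence = relative_entropy(rho, sigma)
reverse = relative_entropy(sigma, rho)

print(trace_norm)   # 2 * eps, up to rounding
print(divergence)   # inf
print(reverse)      # finite, since rho has trivial kernel here
```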

Non-negativity of relative entropy

Corresponding classical statement

For the classical Kullback–Leibler divergence, it can be shown that

D_{\mathrm {KL} }(P\|Q)=\sum _{j}p_{j}\log {\frac {p_{j}}{q_{j}}}\geq 0,

and the equality holds if and only if P = Q. Colloquially, this means that the uncertainty calculated using erroneous assumptions is never smaller than the real amount of uncertainty.

To show the inequality, we rewrite

D_{\mathrm {KL} }(P\|Q)=\sum _{j}p_{j}\log {\frac {p_{j}}{q_{j}}}=\sum _{j}(-\log {\frac {q_{j}}{p_{j}}})(p_{j}).

Notice that log is a concave function. Therefore -log is convex. Applying Jensen's inequality, we obtain

D_{\mathrm {KL} }(P\|Q)=\sum _{j}(-\log {\frac {q_{j}}{p_{j}}})(p_{j})\geq -\log(\sum _{j}{\frac {q_{j}}{p_{j}}}p_{j})=0.

Jensen's inequality also states that equality holds if and only if, for all i, q_i = (Σq_j) p_i, i.e. P = Q.

The result

Klein's inequality states that the quantum relative entropy

S(\rho \|\sigma )=\operatorname {Tr} \rho (\log \rho -\log \sigma )

is non-negative in general. It is zero if and only if ρ = σ.

Proof

Let ρ and σ have spectral decompositions

\rho =\sum _{i}p_{i}v_{i}v_{i}^{*}\;,\;\sigma =\sum _{i}q_{i}w_{i}w_{i}^{*}.

So

\log \rho =\sum _{i}(\log p_{i})v_{i}v_{i}^{*}\;,\;\log \sigma =\sum _{i}(\log q_{i})w_{i}w_{i}^{*}.

Direct calculation gives

S(\rho \|\sigma )=\sum _{k}p_{k}\log p_{k}-\sum _{i,j}(p_{i}\log q_{j})|v_{i}^{*}w_{j}|^{2}

\qquad \quad \;=\sum _{i}p_{i}(\log p_{i}-\sum _{j}\log q_{j}|v_{i}^{*}w_{j}|^{2})

\qquad \quad \;=\sum _{i}p_{i}(\log p_{i}-\sum _{j}(\log q_{j})P_{ij}),

where P_{i j} = |v_i*w_j|².

Since the matrix (P_{i j})_{i j} is a doubly stochastic matrix and -log is a convex function, the above expression is

\geq \sum _{i}p_{i}(\log p_{i}-\log(\sum _{j}q_{j}P_{ij})).

Define r_i = Σ_jq_j P_{i j}. Then {r_i} is a probability distribution. From the non-negativity of classical relative entropy, we have

S(\rho \|\sigma )\geq \sum _{i}p_{i}\log {\frac {p_{i}}{r_{i}}}\geq 0.

The second part of the claim follows from the fact that, since -log is strictly convex, equality is achieved in

\sum _{i}p_{i}(\log p_{i}-\sum _{j}(\log q_{j})P_{ij})\geq \sum _{i}p_{i}(\log p_{i}-\log(\sum _{j}q_{j}P_{ij}))

if and only if (P_{i j}) is a permutation matrix, which implies ρ = σ, after a suitable labeling of the eigenvectors {v_i} and {w_i}.
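
The steps of the proof can be traced numerically on a random example. This is a sketch under stated assumptions: `random_density_matrix` is a hypothetical helper that normalizes G G* for a random complex matrix G, which generically yields full-rank states so that all logarithms are finite.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random full-rank density matrix (illustrative construction)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real

rng = np.random.default_rng(1)
rho = random_density_matrix(4, rng)
sigma = random_density_matrix(4, rng)

p, v = np.linalg.eigh(rho)      # spectral decomposition of rho
q, w = np.linalg.eigh(sigma)    # spectral decomposition of sigma
P = np.abs(v.conj().T @ w) ** 2  # P_ij = |v_i^* w_j|^2

# Doubly stochastic: every row and every column sums to 1.
row_sums = P.sum(axis=1)
col_sums = P.sum(axis=0)

# S(rho||sigma) = sum_i p_i (log p_i - sum_j (log q_j) P_ij)
s_quantum = float(np.sum(p * np.log(p)) - np.sum(p[:, None] * P * np.log(q)[None, :]))

# Classical lower bound with r_i = sum_j q_j P_ij
r = P @ q
kl_p_r = float(np.sum(p * np.log(p / r)))

print(row_sums, col_sums)
print(s_quantum, kl_p_r)   # expect s_quantum >= kl_p_r >= 0
```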

Joint convexity of relative entropy

The relative entropy is jointly convex. For $0\leq \lambda \leq 1$ and states $\rho _{1},\rho _{2},\sigma _{1},\sigma _{2}$ we have

$S(\lambda \rho _{1}+(1-\lambda )\rho _{2}\|\lambda \sigma _{1}+(1-\lambda )\sigma _{2})\leq \lambda S(\rho _{1}\|\sigma _{1})+(1-\lambda )S(\rho _{2}\|\sigma _{2})$ .
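
A numerical spot check of joint convexity might look like the following sketch. It is not a proof; the helpers `random_density_matrix`, `matrix_log` and `relative_entropy` are illustrative and assume full-rank states.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random full-rank density matrix (illustrative construction)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real

def matrix_log(a):
    """Logarithm of a positive definite Hermitian matrix via eigh."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(rho, sigma):
    """Tr rho (log rho - log sigma), assuming both states are full rank."""
    return float(np.trace(rho @ (matrix_log(rho) - matrix_log(sigma))).real)

rng = np.random.default_rng(2)
rho1, rho2, sigma1, sigma2 = (random_density_matrix(3, rng) for _ in range(4))

gaps = []
for lam in np.linspace(0.0, 1.0, 11):
    mixed = relative_entropy(lam * rho1 + (1 - lam) * rho2,
                             lam * sigma1 + (1 - lam) * sigma2)
    average = (lam * relative_entropy(rho1, sigma1)
               + (1 - lam) * relative_entropy(rho2, sigma2))
    gaps.append(average - mixed)   # joint convexity says this is >= 0

min_gap = min(gaps)
print(min_gap)   # non-negative up to rounding error
```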

Monotonicity of relative entropy

The relative entropy is non-increasing under completely positive trace-preserving (CPTP) operations ${\mathcal {N}}$ on density matrices,

$S({\mathcal {N}}(\rho )\|{\mathcal {N}}(\sigma ))\leq S(\rho \|\sigma )$ .

This inequality is called monotonicity of quantum relative entropy and was first proved by Lindblad.
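
As an illustration, monotonicity can be checked for one simple CPTP map, the depolarizing channel that mixes a state with the maximally mixed state. The choice of channel, the parameter name `t`, and the helper functions are assumptions of this sketch, which again assumes full-rank inputs.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random full-rank density matrix (illustrative construction)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real

def matrix_log(a):
    """Logarithm of a positive definite Hermitian matrix via eigh."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(rho, sigma):
    """Tr rho (log rho - log sigma), assuming both states are full rank."""
    return float(np.trace(rho @ (matrix_log(rho) - matrix_log(sigma))).real)

def depolarize(rho, t):
    """Depolarizing channel: (1 - t) rho + t Tr(rho) I/d, for 0 <= t <= 1."""
    d = rho.shape[0]
    return (1 - t) * rho + t * np.trace(rho).real * np.eye(d) / d

rng = np.random.default_rng(3)
rho = random_density_matrix(3, rng)
sigma = random_density_matrix(3, rng)

before = relative_entropy(rho, sigma)
after = [relative_entropy(depolarize(rho, t), depolarize(sigma, t))
         for t in np.linspace(0.0, 1.0, 6)]

print(before)
print(after)   # each entry should not exceed `before`
```

At t = 1 both states are mapped to the maximally mixed state, so the relative entropy after the channel is zero.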

An entanglement measure

Let a composite quantum system have state space

H=\otimes _{k}H_{k}

and ρ be a density matrix acting on H.

The relative entropy of entanglement of ρ is defined by

\;D_{\mathrm {REE} }(\rho )=\min _{\sigma }S(\rho \|\sigma )

where the minimum is taken over the family of separable states. A physical interpretation of the quantity is the optimal distinguishability of the state ρ from separable states.

Clearly, when ρ is not entangled

\;D_{\mathrm {REE} }(\rho )=0

by Klein's inequality.
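
To make the definition concrete, the sketch below evaluates S(ρ‖σ) for a two-qubit Bell state against one particular separable state, the equal mixture of |00⟩ and |11⟩. Because the minimum runs over all separable states, the resulting value is only an upper bound on D_REE(ρ); no optimization is attempted here. The helper `relative_entropy` and the choice of σ are illustrative assumptions.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """S(rho||sigma) in nats; inf if supp(rho) meets ker(sigma) (illustrative)."""
    p, v = np.linalg.eigh(rho)
    q, w = np.linalg.eigh(sigma)
    overlap = np.abs(v.conj().T @ w) ** 2
    in_supp_rho = p > tol
    in_ker_sigma = q <= tol
    if np.any(overlap[np.ix_(in_supp_rho, in_ker_sigma)] > tol):
        return float(np.inf)
    safe_log_q = np.log(np.where(in_ker_sigma, 1.0, q))
    ps = p[in_supp_rho]
    first = np.sum(ps * np.log(ps))
    second = np.sum(ps[:, None] * overlap[in_supp_rho] * safe_log_q[None, :])
    return float(first - second)

# Bell state (|00> + |11>)/sqrt(2) in the basis |00>, |01>, |10>, |11>
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho_bell = np.outer(phi, phi)

# A separable state: equal mixture of the product states |00> and |11>
sigma_sep = np.diag([0.5, 0.0, 0.0, 0.5])

upper_bound = relative_entropy(rho_bell, sigma_sep)
print(upper_bound, np.log(2.0))   # the two numbers should agree up to rounding

# For a separable state, choosing sigma equal to the state itself gives zero.
print(relative_entropy(sigma_sep, sigma_sep))
```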

Relation to other quantum information quantities

One reason the quantum relative entropy is useful is that several other important quantum information quantities are special cases of it. Often, theorems are stated in terms of the quantum relative entropy, which lead to immediate corollaries concerning the other quantities. Below, we list some of these relations.

Let ρ_AB be the joint state of a bipartite system with subsystem A of dimension n_A and B of dimension n_B. Let ρ_A, ρ_B be the respective reduced states, and I_A, I_B the respective identities. The maximally mixed states are I_A/n_A and I_B/n_B. Then it is possible to show with direct computation that

S(\rho _{A}\|I_{A}/n_{A})=\log(n_{A})-S(\rho _{A}),

S(\rho _{AB}\|\rho _{A}\otimes \rho _{B})=S(\rho _{A})+S(\rho _{B})-S(\rho _{AB})=I(A:B),

S(\rho _{AB}\|\rho _{A}\otimes I_{B}/n_{B})=\log(n_{B})+S(\rho _{A})-S(\rho _{AB})=\log(n_{B})-S(B|A),

where I(A:B) is the quantum mutual information and S(B|A) is the quantum conditional entropy.
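
The three identities can be checked numerically on a random bipartite state. The following sketch assumes a full-rank ρ_AB with n_A = 2 and n_B = 3; the helper names and the einsum-based partial traces are illustrative choices.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random full-rank density matrix (illustrative construction)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real

def matrix_log(a):
    """Logarithm of a positive definite Hermitian matrix via eigh."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def entropy(rho):
    """von Neumann entropy -Tr rho log rho in nats."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def relative_entropy(rho, sigma):
    """Tr rho (log rho - log sigma), assuming both states are full rank."""
    return float(np.trace(rho @ (matrix_log(rho) - matrix_log(sigma))).real)

n_a, n_b = 2, 3
rng = np.random.default_rng(4)
rho_ab = random_density_matrix(n_a * n_b, rng)

# Partial traces: index order after reshape is (a, b, a', b').
t = rho_ab.reshape(n_a, n_b, n_a, n_b)
rho_a = np.einsum("ibjb->ij", t)   # trace out B
rho_b = np.einsum("aiaj->ij", t)   # trace out A

s_a, s_b, s_ab = entropy(rho_a), entropy(rho_b), entropy(rho_ab)

lhs1 = relative_entropy(rho_a, np.eye(n_a) / n_a)
rhs1 = np.log(n_a) - s_a

lhs2 = relative_entropy(rho_ab, np.kron(rho_a, rho_b))
rhs2 = s_a + s_b - s_ab                     # mutual information I(A:B)

lhs3 = relative_entropy(rho_ab, np.kron(rho_a, np.eye(n_b) / n_b))
rhs3 = np.log(n_b) + s_a - s_ab             # log(n_B) - S(B|A)

print(lhs1, rhs1)
print(lhs2, rhs2)
print(lhs3, rhs3)
```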


References

Vedral, V. (8 March 2002). "The role of relative entropy in quantum information theory". Reviews of Modern Physics. American Physical Society (APS). 74 (1): 197–234. arXiv: quant-ph/0102094 . Bibcode:2002RvMP...74..197V. doi:10.1103/revmodphys.74.197. ISSN 0034-6861. S2CID 6370982.
Michael A. Nielsen, Isaac L. Chuang, "Quantum Computation and Quantum Information"
Marco Tomamichel, "Quantum Information Processing with Finite Resources -- Mathematical Foundations". arXiv:1504.00233
