History of information theory

The decisive event which established the discipline of information theory, and brought it to immediate worldwide attention, was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948.

In this revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion that

"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."

With it came the ideas of the information entropy and redundancy of a source, and its relevance through the source coding theorem; the mutual information and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem; the practical result of the Shannon–Hartley law for the channel capacity of a Gaussian channel; and the bit, a new way of seeing the most fundamental unit of information.

Before 1948

Early telecommunications

Some of the oldest methods of telecommunications implicitly use many of the ideas that would later be quantified in information theory. Modern telegraphy, starting in the 1830s, used Morse code, in which more common letters (like "E", which is expressed as one "dot") are transmitted more quickly than less common letters (like "J", which is expressed by one "dot" followed by three "dashes"). The idea of encoding information in this manner is the cornerstone of lossless data compression. A hundred years later, frequency modulation illustrated that bandwidth can be considered merely another degree of freedom. The vocoder, now largely regarded as an audio engineering curiosity, was originally designed in 1939 to use less bandwidth than that of the original message, in much the same way that mobile phones now trade off voice quality against bandwidth.
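
The economy of such frequency-aware coding can be made concrete with a small calculation. The sketch below is purely illustrative: the letter frequencies and per-letter transmission costs are rough round numbers chosen for the example, not historical measurements.

```python
# Frequent letters get short codes, so the average transmission cost drops
# below that of a code in which every letter costs the same.
# Frequencies and per-letter costs are approximate, for illustration only.

letter_freq = {"E": 0.127, "T": 0.091, "A": 0.082, "J": 0.002, "Q": 0.001}
morse_units = {"E": 1, "T": 3, "A": 5, "J": 13, "Q": 13}  # rough time units per letter

subset_total = sum(letter_freq.values())
avg_variable = sum(letter_freq[c] * morse_units[c] for c in letter_freq) / subset_total
avg_fixed = sum(letter_freq[c] * 9 for c in letter_freq) / subset_total  # 9 units: assumed fixed cost per letter

print(f"average cost, Morse-like code:   {avg_variable:.2f} units")
print(f"average cost, fixed-length code: {avg_fixed:.2f} units")
```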

Quantitative ideas of information

The most direct antecedents of Shannon's work were two papers published in the 1920s by Harry Nyquist and Ralph Hartley, who were both still research leaders at Bell Labs when Shannon arrived in the early 1940s.

Nyquist's 1924 paper, "Certain Factors Affecting Telegraph Speed", is mostly concerned with some detailed engineering aspects of telegraph signals. But a more theoretical section discusses quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation

W = K log m

where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant. [1]
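
A brief numerical illustration of the relation, with K set to 1 and a base-2 logarithm chosen here purely for convenience:

```python
import math

def nyquist_speed(m, K=1.0):
    """Nyquist's relation W = K log m: speed of transmitting 'intelligence'
    as a function of the number of distinguishable voltage levels m."""
    return K * math.log2(m)  # base-2 log and K = 1 are illustrative choices

# Squaring the number of levels (2 -> 4, 4 -> 16) doubles the speed.
for m in (2, 4, 8, 16):
    print(f"{m:2d} levels -> W = {nyquist_speed(m):.1f}")
```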

Hartley's 1928 paper, called simply "Transmission of Information", went further by using the word information (in a technical sense), and making explicitly clear that information in this context was a measurable quantity, reflecting only the receiver's ability to distinguish that one sequence of symbols had been intended by the sender rather than any other—quite regardless of any associated meaning or other psychological or semantic aspect the symbols might represent. This amount of information he quantified as

H = n log S

where S was the number of possible symbols, and n the number of symbols in a transmission. The natural unit of information was therefore the decimal digit, much later renamed the hartley in his honour as a unit of information. The Hartley information, H0, is still used as a quantity for the logarithm of the total number of possibilities. [2]
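
A short sketch of Hartley's measure, using a base-10 logarithm so that the result comes out in decimal digits (hartleys):

```python
import math

def hartley_information(S, n):
    """Hartley's measure H = n * log10(S): information in a message of n symbols
    drawn from an alphabet of S possible symbols, in decimal digits (hartleys)."""
    return n * math.log10(S)

print(hartley_information(10, 3))  # 3.0 hartleys: three decimal digits
print(hartley_information(26, 5))  # about 7.07 hartleys for five letters of a 26-symbol alphabet
```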

A similar unit of log10 probability, the ban, and its derived unit the deciban (one tenth of a ban), were introduced by Alan Turing in 1940 as part of the statistical analysis of the breaking of the German Second World War Enigma ciphers. The decibannage represented the reduction in (the logarithm of) the total number of possibilities (similar to the change in the Hartley information); and also the log-likelihood ratio (or change in the weight of evidence) that could be inferred for one hypothesis over another from a set of observations. The expected change in the weight of evidence is equivalent to what was later called the Kullback discrimination information.
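
The deciban can be illustrated with a minimal sketch of the weight of evidence as a scaled log-likelihood ratio; the numbers below are invented for illustration and are not drawn from the wartime procedures themselves.

```python
import math

def decibans(likelihood_ratio):
    """Weight of evidence in decibans: 10 * log10 of the likelihood ratio
    P(observation | hypothesis A) / P(observation | hypothesis B)."""
    return 10 * math.log10(likelihood_ratio)

# An observation three times as likely under hypothesis A as under B
# contributes about 4.8 decibans in favour of A; independent observations add.
print(decibans(3))                 # about 4.77
print(decibans(3) + decibans(2))   # evidence accumulates additively, about 7.78
```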

But underlying these notions was still the assumption of equal a-priori probabilities; they did not yet address the information content of events of unequal probability, nor any underlying picture of the questions raised by communicating such varied outcomes.

Entropy in statistical mechanics

One area where unequal probabilities were indeed well known was statistical mechanics, where Ludwig Boltzmann had, in the context of his H-theorem of 1872, first introduced the quantity

H = Σ_i f_i log f_i

as a measure of the breadth of the spread of states available to a single particle in a gas of like particles, where f_i represented the relative frequency of each possible state. Boltzmann argued mathematically that the effect of collisions between the particles would cause the H-function to decrease inevitably from any initial configuration until equilibrium was reached; and further identified it (up to sign and a constant factor) as an underlying microscopic rationale for the macroscopic thermodynamic entropy of Clausius.
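
A toy discrete version of the H-function, using the sum form given above (Boltzmann's own treatment uses an integral over a continuous velocity distribution), shows how H falls as a distribution spreads out:

```python
import math

def boltzmann_H(freqs):
    """H = sum_i f_i * log(f_i) for a discrete relative-frequency distribution."""
    return sum(f * math.log(f) for f in freqs if f > 0)

# A sharply peaked distribution has a higher H than a spread-out, equilibrium-like one;
# collisions drive the distribution toward the spread-out form, lowering H.
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
print(boltzmann_H(peaked))   # close to 0 (about -0.17)
print(boltzmann_H(uniform))  # -log 4, about -1.39
```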

Boltzmann's definition was soon reworked by the American mathematical physicist J. Willard Gibbs into a general formula for statistical-mechanical entropy, no longer requiring identical and non-interacting particles, but instead based on the probability distribution p_i for the complete microstate i of the total system:

S = -k_B Σ_i p_i ln p_i

This (Gibbs) entropy, from statistical mechanics, can be found to correspond directly to Clausius's classical thermodynamic definition.
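
The formal kinship with Shannon's later measure can be seen directly: the two expressions differ only in the constant factor and the base of the logarithm. A minimal sketch, with an arbitrary example distribution:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def gibbs_entropy(probs):
    """S = -k_B * sum_i p_i ln p_i over the microstate probabilities p_i."""
    return -K_B * sum(p * math.log(p) for p in probs if p > 0)

def shannon_entropy_bits(probs):
    """Shannon's H = -sum_i p_i log2 p_i: the same functional form, measured in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]
print(gibbs_entropy(probs))         # joules per kelvin
print(shannon_entropy_bits(probs))  # 1.75 bits
```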

Shannon himself was apparently not particularly aware of the close similarity between his new measure and earlier work in thermodynamics, but John von Neumann was. It is said that, when Shannon was deciding what to call his new measure and fearing the term 'information' was already over-used, von Neumann told him firmly: "You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage."

(Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer in the 1960s, are explored further in the article Entropy in thermodynamics and information theory.)

Development since 1948

The publication of Shannon's 1948 paper, "A Mathematical Theory of Communication", in the Bell System Technical Journal was the founding of information theory as we know it today. Many developments and applications of the theory have taken place since then, which have made possible many modern devices for data communication and storage, such as CD-ROMs and mobile phones.

Notable later developments are listed in the timeline of information theory. They include advances in source coding and data compression such as linear predictive coding, [3] the discrete cosine transform, [4] [5] the JPEG image compression standard, [6] and the MP3 and AAC audio formats, [7] [8] as well as decades of growth in data transmission rates. [9]

Related Research Articles

Boltzmann distribution: Probability distribution of energy states of a system

In statistical mechanics and mathematics, a Boltzmann distribution is a probability distribution or probability measure that gives the probability that a system will be in a certain state as a function of that state's energy and the temperature of the system. The distribution is expressed in the form

p_i ∝ exp(-ε_i / (k T))

where p_i is the probability of the system being in state i, ε_i is the energy of that state, k is the Boltzmann constant, and T is the temperature of the system.
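
A small numerical sketch of the distribution; the energy levels and temperature below are arbitrary illustrative values.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def boltzmann_probabilities(energies_joules, temperature_kelvin):
    """Normalized Boltzmann distribution p_i = exp(-E_i / kT) / Z."""
    weights = [math.exp(-e / (K_B * temperature_kelvin)) for e in energies_joules]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

# Three arbitrary energy levels at roughly room temperature (300 K):
levels = [0.0, 2e-21, 4e-21]
print(boltzmann_probabilities(levels, 300.0))  # roughly [0.50, 0.31, 0.19]
```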

Entropy: Property of a thermodynamic system

Entropy is a scientific concept that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynamics, where it was first recognized, to the microscopic description of nature in statistical physics, and to the principles of information theory. It has found far-ranging applications in chemistry and physics, in biological systems and their relation to life, in cosmology, economics, sociology, weather science, climate change, and information systems including the transmission of information in telecommunication.

Information theory is the mathematical study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. The field, in applied mathematics, is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering.

Entropy (information theory): Expected amount of information needed to specify the output of a stochastic data source

In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable X, which takes values in the alphabet 𝒳 and is distributed according to p : 𝒳 → [0, 1], the entropy is

H(X) = -Σ_{x ∈ 𝒳} p(x) log p(x).

Quantum information: Information held in the state of a quantum system

Quantum information is the information of the state of a quantum system. It is the basic entity of study in quantum information theory, and can be manipulated using quantum information processing techniques. Quantum information refers to both the technical definition in terms of Von Neumann entropy and the general computational term.

In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is a name given to two different but related techniques for constructing a prefix code based on a set of symbols and their probabilities.
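
As a rough sketch of the binary-splitting variant usually attributed to Fano (tie-breaking details differ between presentations, so this is one reasonable reading rather than a canonical implementation):

```python
def shannon_fano(symbol_probs):
    """Build a prefix code by Fano's method: sort symbols by probability and
    recursively split the list into two parts of nearly equal total probability,
    assigning '0' to one part and '1' to the other."""
    symbols = sorted(symbol_probs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        running, cut, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(2 * running - total)  # |left total - right total|
            if diff < best_diff:
                best_diff, cut = diff, i
        for s, _ in group[:cut]:
            codes[s] += "0"
        for s, _ in group[cut:]:
            codes[s] += "1"
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return codes

# Example: A=00, B=01, C=10, D=110, E=111
print(shannon_fano({"A": 0.39, "B": 0.18, "C": 0.18, "D": 0.15, "E": 0.10}))
```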

In information theory, the Shannon–Hartley theorem tells the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy-channel coding theorem to the archetypal case of a continuous-time analog communications channel subject to Gaussian noise. The theorem establishes Shannon's channel capacity for such a communication link, a bound on the maximum amount of error-free information per time unit that can be transmitted with a specified bandwidth in the presence of the noise interference, assuming that the signal power is bounded, and that the Gaussian noise process is characterized by a known power or power spectral density. The law is named after Claude Shannon and Ralph Hartley.
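
A worked example of the theorem; the bandwidth and signal-to-noise figures below are illustrative values for a telephone-grade channel, not taken from any particular source.

```python
import math

def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley theorem: C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# A 3 kHz channel at 30 dB signal-to-noise ratio (S/N = 1000):
snr = 10 ** (30 / 10)
print(channel_capacity(3000, snr))  # roughly 29,900 bits per second
```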

The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with largest entropy, in the context of precisely stated prior data.

In classical statistical mechanics, the H-theorem, introduced by Ludwig Boltzmann in 1872, describes the tendency to decrease in the quantity H in a nearly-ideal gas of molecules. As this quantity H was meant to represent the entropy of thermodynamics, the H-theorem was an early demonstration of the power of statistical mechanics as it claimed to derive the second law of thermodynamics—a statement about fundamentally irreversible processes—from reversible microscopic mechanics. It is thought to prove the second law of thermodynamics, albeit under the assumption of low-entropy initial conditions.

Ludwig Boltzmann: Austrian physicist and philosopher (1844–1906)

Ludwig Eduard Boltzmann was an Austrian physicist and philosopher. His greatest achievements were the development of statistical mechanics, and the statistical explanation of the second law of thermodynamics. In 1877 he provided the current definition of entropy, S = k_B ln Ω, where Ω is the number of microstates whose energy equals the system's energy, interpreted as a measure of statistical disorder of a system. Max Planck named the constant k_B the Boltzmann constant.

In statistical mechanics, a canonical ensemble is the statistical ensemble that represents the possible states of a mechanical system in thermal equilibrium with a heat bath at a fixed temperature. The system can exchange energy with the heat bath, so that the states of the system will differ in total energy.

In information theory, Shannon's source coding theorem establishes the statistical limits to possible data compression for data whose source is an independent identically-distributed random variable, and the operational meaning of the Shannon entropy.

In physics, maximum entropy thermodynamics views equilibrium thermodynamics and statistical mechanics as inference processes. More specifically, MaxEnt applies inference techniques rooted in Shannon information theory, Bayesian probability, and the principle of maximum entropy. These techniques are relevant to any situation requiring prediction from incomplete or insufficient data. MaxEnt thermodynamics began with two papers by Edwin T. Jaynes published in the 1957 Physical Review.

The mathematical expressions for thermodynamic entropy in the statistical thermodynamics formulation established by Ludwig Boltzmann and J. Willard Gibbs in the 1870s are similar to the information entropy of Claude Shannon and Ralph Hartley, developed in the 1940s.

In information theory, the noisy-channel coding theorem establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data nearly error-free up to a computable maximum rate through the channel. This result was presented by Claude Shannon in 1948 and was based in part on earlier work and ideas of Harry Nyquist and Ralph Hartley.

A timeline of events related to information theory, quantum information theory and statistical physics, data compression, error correcting codes and related subjects.

The concept of entropy developed in response to the observation that a certain amount of functional energy released from combustion reactions is always lost to dissipation or friction and is thus not transformed into useful work. Early heat-powered engines such as Thomas Savery's (1698), the Newcomen engine (1712) and the Cugnot steam tricycle (1769) were inefficient, converting less than two percent of the input energy into useful work output; a great deal of useful energy was dissipated or lost. Over the next two centuries, physicists investigated this puzzle of lost energy; the result was the concept of entropy.

The concept of entropy was first developed by German physicist Rudolf Clausius in the mid-nineteenth century as a thermodynamic property that predicts that certain spontaneous processes are irreversible or impossible. In statistical mechanics, entropy is formulated as a statistical property using probability theory. The statistical entropy perspective was introduced in the 1870s by Austrian physicist Ludwig Boltzmann, who established a new field of physics that provided the descriptive linkage between the macroscopic observation of nature and the microscopic view based on the rigorous treatment of large ensembles of microstates that constitute thermodynamic systems.

Introduction to entropy: Non-technical introduction to entropy

In thermodynamics, entropy is a numerical quantity that shows that many physical processes can go in only one direction in time. For example, cream and coffee can be mixed together, but cannot be "unmixed"; a piece of wood can be burned, but cannot be "unburned". The word 'entropy' has entered popular usage to refer to a lack of order or predictability, or a gradual decline into disorder. A more physical interpretation of thermodynamic entropy refers to the spread of energy or matter, or to the extent and diversity of microscopic motion.

Boltzmann's entropy formula: Equation in statistical mechanics

In statistical mechanics, Boltzmann's equation is a probability equation relating the entropy S, also written as S_B, of an ideal gas to the multiplicity W, the number of real microstates corresponding to the gas's macrostate:

S = k_B ln W

where k_B is the Boltzmann constant.

References

  1. "BSTJ 3: 2. April 1924: Certain Factors Affecting Telegraph Speed. (Nyquist, H.)". April 1924.
  2. "BSTJ 7: 3. July 1928: Transmission of Information. (Hartley, R.V.L.)". July 1928.
  3. Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol" (PDF). Found. Trends Signal Process. 3 (4): 203–303. doi:10.1561/2000000036. ISSN 1932-8346.
  4. Ahmed, Nasir (January 1991). "How I Came Up With the Discrete Cosine Transform". Digital Signal Processing. 1 (1): 4–5. doi:10.1016/1051-2004(91)90086-Z.
  5. Ghanbari, Mohammed (2003). Standard Codecs: Image Compression to Advanced Video Coding. Institution of Engineering and Technology. pp. 1–2. ISBN 9780852967102.
  6. "T.81 – Digital Compression and Coding of Continuous-Tone Still Images – Requirements and Guidelines" (PDF). CCITT. September 1992. Retrieved 12 July 2019.
  7. Guckert, John (Spring 2012). "The Use of FFT and MDCT in MP3 Audio Compression" (PDF). University of Utah. Retrieved 14 July 2019.
  8. Brandenburg, Karlheinz (1999). "MP3 and AAC Explained" (PDF). Archived (PDF) from the original on 2017-02-13.
  9. Jindal, Renuka P. (2009). "From millibits to terabits per second and beyond - over 60 years of innovation". 2009 2nd International Workshop on Electron Devices and Semiconductor Technology. pp. 1–6. doi:10.1109/EDST.2009.5166093. ISBN 978-1-4244-3831-0. S2CID 25112828.