Fibonacci coding

In mathematics and computing, Fibonacci coding is a universal code which encodes positive integers into binary code words. It is one example of representations of integers based on Fibonacci numbers. Each code word ends with "11" and contains no other instances of "11" before the end.

The Fibonacci code is closely related to the Zeckendorf representation, a positional numeral system that uses Zeckendorf's theorem and has the property that no number has a representation with consecutive 1s. The Fibonacci code word for a particular integer is exactly the integer's Zeckendorf representation with the order of its digits reversed and an additional "1" appended to the end.

Definition

For a number N, if d(0), d(1), ..., d(k−1), d(k) represent the digits of the code word representing N, then we have:

N = d(0)F(2) + d(1)F(3) + ... + d(k−1)F(k+1), and d(k−1) = d(k) = 1,

where F(i) is the ith Fibonacci number, and so F(i+2) is the ith distinct Fibonacci number starting with 1, 2, 3, 5, 8, 13, .... The last bit d(k) is always an appended bit of 1 and does not carry place value.

It can be shown that such a coding is unique, and the only occurrence of "11" in any code word is at the end, i.e. in d(k−1) and d(k). The penultimate bit is the most significant bit and the first bit is the least significant bit. Also, leading zeros cannot be omitted as they can be in, e.g., decimal numbers.

The first few Fibonacci codes are shown below, together with their so-called implied probability, the value for each number for which a code word of that size would be optimal (2^(−L) for a code word of length L).

Symbol   Fibonacci representation   Fibonacci code word   Implied probability
1        F(2)                       11                    1/4
2        F(3)                       011                   1/8
3        F(4)                       0011                  1/16
4        F(2) + F(4)                1011                  1/16
5        F(5)                       00011                 1/32
6        F(2) + F(5)                10011                 1/32
7        F(3) + F(5)                01011                 1/32
8        F(6)                       000011                1/64
9        F(2) + F(6)                100011                1/64
10       F(3) + F(6)                010011                1/64
11       F(4) + F(6)                001011                1/64
12       F(2) + F(4) + F(6)         101011                1/64
13       F(7)                       0000011               1/128
14       F(2) + F(7)                1000011               1/128

To encode an integer N:

  1. Find the largest Fibonacci number equal to or less than N; subtract this number from N, keeping track of the remainder.
  2. If the number subtracted was the ith Fibonacci number F(i), put a 1 in place i − 2 in the code word (counting the leftmost digit as place 0).
  3. Repeat the previous steps, substituting the remainder for N, until a remainder of 0 is reached.
  4. Place an additional 1 after the rightmost digit in the code word.

To decode a code word, remove the final "1", assign the remaining bits the values 1, 2, 3, 5, 8, 13, ... (the Fibonacci numbers), and sum the values of the "1" bits.
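
These procedures can be sketched directly in code. The following Python functions are an illustrative implementation of the steps above (fib_encode and fib_decode are names chosen here, not taken from any library):

    def fib_encode(n):
        """Encode a positive integer as a Fibonacci code word (a string of bits)."""
        assert n >= 1
        # Build the Fibonacci numbers F(2), F(3), ... = 1, 2, 3, 5, 8, ... up to n.
        fibs = [1, 2]
        while fibs[-1] < n:
            fibs.append(fibs[-1] + fibs[-2])
        if fibs[-1] > n:
            fibs.pop()                      # keep only Fibonacci numbers <= n
        bits = ['0'] * len(fibs)
        remainder = n
        # Greedily subtract the largest Fibonacci number that still fits;
        # this yields the Zeckendorf representation.
        for i in range(len(fibs) - 1, -1, -1):
            if fibs[i] <= remainder:
                bits[i] = '1'
                remainder -= fibs[i]
        return ''.join(bits) + '1'          # append the terminating "1"

    def fib_decode(code):
        """Decode a Fibonacci code word (a bit string ending in "11")."""
        bits = code[:-1]                    # drop the final appended "1"
        n, a, b = 0, 1, 2                   # a, b run through F(2), F(3), ...
        for bit in bits:
            if bit == '1':
                n += a
            a, b = b, a + b
        return n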

Comparison with other universal codes

Fibonacci coding has a useful property that sometimes makes it attractive in comparison to other universal codes: it is an example of a self-synchronizing code, making it easier to recover data from a damaged stream. With most other universal codes, if a single bit is altered, none of the data that comes after it will be correctly read. With Fibonacci coding, on the other hand, a changed bit may cause one token to be read as two, or cause two tokens to be read incorrectly as one, but reading a "0" from the stream will stop the errors from propagating further. Since the only stream that has no "0" in it is a stream of "11" tokens, the total edit distance between a stream damaged by a single bit error and the original stream is at most three.
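
As an illustration of this self-synchronizing behaviour, the sketch below reuses the hypothetical fib_encode and fib_decode from above (fib_stream_decode is likewise a name introduced here). It splits a concatenated bit stream at every "11" and decodes each token. Flipping a single bit merges the first two tokens into one garbled value, but decoding resynchronizes at the next "11" and the last token is still read correctly:

    def fib_stream_decode(stream):
        """Decode a concatenation of Fibonacci code words from a bit string."""
        values, token, prev = [], '', '0'
        for bit in stream:
            token += bit
            if bit == '1' and prev == '1':  # "11" terminates a code word
                values.append(fib_decode(token))
                token, prev = '', '0'
            else:
                prev = bit
        return values

    stream = fib_encode(4) + fib_encode(7) + fib_encode(3)    # "1011" + "01011" + "0011"
    print(fib_stream_decode(stream))                          # [4, 7, 3]
    damaged = stream[:2] + '0' + stream[3:]                   # flip the third bit
    print(fib_stream_decode(damaged))                         # [53, 3]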

This approach of encoding, using a sequence of symbols in which some patterns (like "11") are forbidden, can be freely generalized. [1]

Example

The following table shows that the number 65 is represented in Fibonacci coding as 0100100011, since 65 = 2 + 8 + 55. The first two Fibonacci numbers (0 and 1) are not used, and an additional 1 is always appended.

Fibonacci number   1   2   3   5   8   13   21   34   55   (additional)
Code word bit      0   1   0   0   1   0    0    0    1    1
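
Using the hypothetical fib_encode and fib_decode sketches from the previous section, this example can be checked directly:

    print(fib_encode(65))             # prints 0100100011
    print(fib_decode('0100100011'))   # prints 65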

Generalizations

The Fibonacci encodings for the positive integers are binary strings that end with "11" and contain no other instances of "11". This can be generalized to binary strings that end with N consecutive 1's and contain no other instances of N consecutive 1's. For instance, for N = 3 the positive integers are encoded as 111, 0111, 00111, 10111, 000111, 100111, 010111, 110111, 0000111, 1000111, 0100111, …. In this case, the number of encodings as a function of string length is given by the sequence of Tribonacci numbers.
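
A brute-force check of this count is straightforward, as in the following sketch (count_codes is a name introduced here, intended only as a quick verification for small lengths):

    from itertools import product

    def count_codes(length, n=3):
        """Count bit strings of the given length that end in n consecutive 1s
        and contain no other instance of n consecutive 1s."""
        run = '1' * n
        count = 0
        for bits in product('01', repeat=length):
            s = ''.join(bits)
            if s.endswith(run) and run not in s[:-1]:
                count += 1
        return count

    print([count_codes(L) for L in range(3, 10)])   # [1, 1, 2, 4, 7, 13, 24], the Tribonacci numbers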

For general constraints defining which symbols are allowed after a given symbol, the maximal information rate can be obtained by first finding the optimal transition probabilities using a maximal entropy random walk, then using an entropy coder (with the roles of encoder and decoder switched) to encode a message as a sequence of symbols fulfilling the found optimal transition probabilities.

See also

References

  1. Duda, Jarek (2007). "Optimal encoding on discrete lattice with translational invariant constrains using statistical algorithms". arXiv: 0710.3861 [cs.IT].

Further reading