Fibonacci coding

In mathematics and computing, Fibonacci coding is a universal code which encodes positive integers into binary code words. It is one example of representations of integers based on Fibonacci numbers. Each code word ends with "11" and contains no other instances of "11" before the end.

The Fibonacci code is closely related to the Zeckendorf representation, a positional numeral system that uses Zeckendorf's theorem and has the property that no number has a representation with consecutive 1s. The Fibonacci code word for a particular integer is exactly the integer's Zeckendorf representation with the order of its digits reversed and an additional "1" appended to the end.

Definition

For a number N, if d(0), d(1), …, d(k−1), d(k) represent the digits of the code word representing N, then we have:

$$N = \sum_{i=0}^{k-1} d(i)\,F(i+2), \qquad d(k-1) = d(k) = 1,$$

where F(i) is the ith Fibonacci number, and so F(i+2) is the ith distinct Fibonacci number starting with 1, 2, 3, 5, 8, 13, …. The last bit d(k) is always an appended bit of 1 and does not carry place value.

It can be shown that such a coding is unique, and the only occurrence of "11" in any code word is at the end, i.e., in digits d(k−1) and d(k). The penultimate bit is the most significant bit and the first bit is the least significant bit. Note also that leading zeros cannot be omitted, as they can be in, e.g., decimal numbers.

The first few Fibonacci codes are shown below, together with their so-called implied probability: a code word of length ℓ has implied probability 2^−ℓ, the probability a symbol would need for a code word of that length to be optimal.

Symbol   Fibonacci representation   Fibonacci code word   Implied probability
1        F(2)                       11                    1/4
2        F(3)                       011                   1/8
3        F(4)                       0011                  1/16
4        F(2) + F(4)                1011                  1/16
5        F(5)                       00011                 1/32
6        F(2) + F(5)                10011                 1/32
7        F(3) + F(5)                01011                 1/32
8        F(6)                       000011                1/64
9        F(2) + F(6)                100011                1/64
10       F(3) + F(6)                010011                1/64
11       F(4) + F(6)                001011                1/64
12       F(2) + F(4) + F(6)         101011                1/64
13       F(7)                       0000011               1/128
14       F(2) + F(7)                1000011               1/128

To encode an integer N:

  1. Find the largest Fibonacci number equal to or less than N; subtract this number from N, keeping track of the remainder.
  2. If the number subtracted was the ith Fibonacci number F(i), put a 1 in place i − 2 in the code word (counting the leftmost digit as place 0).
  3. Repeat the previous steps, substituting the remainder for N, until a remainder of 0 is reached.
  4. Place an additional 1 after the rightmost digit in the code word.
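
The steps above map directly onto a short program. The following Python sketch (the function name fib_encode is ours, chosen for illustration) implements the greedy procedure; the list fibs plays the role of F(2), F(3), … above.

    def fib_encode(n: int) -> str:
        """Fibonacci code word for a positive integer n, least significant bit first."""
        if n < 1:
            raise ValueError("Fibonacci coding is defined for positive integers")
        # Generate F(2), F(3), ... = 1, 2, 3, 5, 8, ... up to the largest value <= n.
        fibs = [1, 2]
        while fibs[-1] <= n:
            fibs.append(fibs[-1] + fibs[-2])
        fibs.pop()  # the last value generated exceeds n
        # Greedily subtract the largest Fibonacci number that still fits.
        bits = [0] * len(fibs)
        remainder = n
        for i in range(len(fibs) - 1, -1, -1):
            if fibs[i] <= remainder:
                bits[i] = 1
                remainder -= fibs[i]
        return "".join(map(str, bits)) + "1"  # place an additional 1 at the end

    print(fib_encode(65))  # -> "0100100011", matching the Example section below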

To decode a code word, remove the final "1", assign the remaining bits the values 1, 2, 3, 5, 8, 13, … (the Fibonacci numbers), and sum the values of the "1" bits.
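
Correspondingly, a minimal decoding sketch (again with an illustrative name, fib_decode, not from any standard library):

    def fib_decode(code: str) -> int:
        """Recover the integer from a Fibonacci code word such as those above."""
        if not code.endswith("11"):
            raise ValueError("a Fibonacci code word must end in '11'")
        bits = code[:-1]       # remove the final, appended 1
        total, a, b = 0, 1, 2  # a walks through the values 1, 2, 3, 5, 8, 13, ...
        for bit in bits:
            if bit == "1":
                total += a
            a, b = b, a + b
        return total

    print(fib_decode("0100100011"))  # -> 65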

Comparison with other universal codes

Fibonacci coding has a useful property that sometimes makes it attractive in comparison to other universal codes: it is an example of a self-synchronizing code, making it easier to recover data from a damaged stream. With most other universal codes, if a single bit is altered, none of the data that comes after it will be correctly read. With Fibonacci coding, on the other hand, a changed bit may cause one token to be read as two, or cause two tokens to be read incorrectly as one, but reading a "0" from the stream will stop the errors from propagating further. Since the only stream that has no "0" in it is a stream of "11" tokens, the total edit distance between a stream damaged by a single bit error and the original stream is at most three.
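
To illustrate this resynchronization concretely, the following Python sketch (tokenize is an ad hoc helper written for this example) splits a bit stream at each "11" terminator. Flipping a single bit garbles at most the tokens adjacent to the flip, and the reader realigns afterwards.

    def tokenize(stream: str) -> list:
        """Split a stream of concatenated Fibonacci code words at each '11'."""
        tokens, start, i = [], 0, 1
        while i < len(stream):
            if stream[i - 1] == "1" and stream[i] == "1":
                tokens.append(stream[start:i + 1])
                start, i = i + 1, i + 2
            else:
                i += 1
        return tokens

    # Code words taken from the table above: 2 -> "011", 7 -> "01011", 4 -> "1011".
    stream = "011" + "01011" + "1011"
    corrupt = stream[:4] + "0" + stream[5:]  # flip the bit at index 4 (a 1 becomes 0)
    print(tokenize(stream))   # ['011', '01011', '1011'] -> 2, 7, 4
    print(tokenize(corrupt))  # ['011', '00011', '1011'] -> 2, 5, 4: one token damaged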

This approach, encoding using a sequence of symbols in which some patterns (like "11") are forbidden, can be freely generalized. [1]

Example

The following table shows that the number 65 is represented in Fibonacci coding as 0100100011, since 65 = 2 + 8 + 55 = F(3) + F(6) + F(10). The first two Fibonacci numbers (0 and 1) are not used, and an additional 1 is always appended.

F(i+2)   1   2   3   5   8   13   21   34   55
d(i)     0   1   0   0   1   0    0    0    1

Appending the final 1 then yields the code word 0100100011.

Generalizations

The Fibonacci encodings for the positive integers are binary strings that end with "11" and contain no other instances of "11". This can be generalized to binary strings that end with N consecutive 1's and contain no other instances of N consecutive 1's. For instance, for N = 3 the positive integers are encoded as 111, 0111, 00111, 10111, 000111, 100111, 010111, 110111, 0000111, 1000111, 0100111, …. In this case, the number of encodings as a function of string length is given by the sequence of Tribonacci numbers.
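
As a sanity check on the Tribonacci claim, the brute-force Python sketch below (generalized_codes is an illustrative helper; within one length its enumeration order may differ from the listing above) counts the valid strings of each length for N = 3.

    from collections import Counter
    from itertools import product

    def generalized_codes(n_ones: int, max_len: int):
        """Yield binary strings of up to max_len bits that end in n_ones consecutive
        1s and contain no other occurrence of n_ones consecutive 1s."""
        terminator = "1" * n_ones
        for length in range(n_ones, max_len + 1):
            for body in product("01", repeat=length - n_ones):
                s = "".join(body) + terminator
                if s.find(terminator) == length - n_ones:  # terminator is the only run
                    yield s

    counts = Counter(len(s) for s in generalized_codes(3, 10))
    print([counts[l] for l in range(3, 11)])  # [1, 1, 2, 4, 7, 13, 24, 44]: Tribonacci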

For general constraints defining which symbols are allowed after a given symbol, the maximal information rate can be obtained by first finding the optimal transition probabilities using a maximal entropy random walk, and then using an entropy coder (with the encoder and decoder switched) to encode a message as a sequence of symbols fulfilling the found optimal transition probabilities.

See also

Related Research Articles

Hash function: Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

Huffman coding: Technique to compress data

In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

Information theory is the scientific study of the quantification, storage, and communication of digital information. The field was fundamentally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. The field is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering.

Entropy (information theory): Expected amount of information needed to specify the output of a stochastic data source

In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable $X$, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p\colon \mathcal{X} \to [0,1]$, the entropy is

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).$$

Range coding is an entropy coding method defined by G. Nigel N. Martin in a 1979 paper, which effectively rediscovered the FIFO arithmetic code first introduced by Richard Clark Pasco in 1976. Given a stream of symbols and their probabilities, a range coder produces a space-efficient stream of bits to represent these symbols and, given the stream and the probabilities, a range decoder reverses the process.

Elias γ code or Elias gamma code is a universal code encoding positive integers developed by Peter Elias. It is used most commonly when coding integers whose upper bound cannot be determined beforehand.

Elias δ code or Elias delta code is a universal code encoding the positive integers developed by Peter Elias.

Arithmetic coding: Form of entropy encoding used in data compression

Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a string of characters is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic encoding, frequently used characters will be stored with fewer bits and not-so-frequently occurring characters will be stored with more bits, resulting in fewer bits used in total. Arithmetic coding differs from other forms of entropy encoding, such as Huffman coding, in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, an arbitrary-precision fraction q, where 0.0 ≤ q < 1.0. It represents the current information as a range, defined by two numbers. A recent family of entropy coders called asymmetric numeral systems allows for faster implementations thanks to directly operating on a single natural number representing the current information.

A prefix code is a type of code system distinguished by its possession of the "prefix property", which requires that there is no whole code word in the system that is a prefix of any other code word in the system. It is trivially true for fixed-length code, so only a point of consideration in variable-length code.

Golomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code, making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.

Unary coding, or the unary numeral system and also sometimes called thermometer code, is an entropy encoding that represents a natural number, n, with a code of length n + 1 (or n): usually n ones followed by a zero (when n is taken to be a non-negative integer) or n − 1 ones followed by a zero (when n is taken to be strictly positive). For example, 5 is represented as 111110 or 11110. Some representations use n or n − 1 zeros followed by a one. The ones and zeros are interchangeable without loss of generality. Unary coding is both a prefix-free code and a self-synchronizing code.

In information theory, Shannon's source coding theorem establishes the limits to possible data compression, and the operational meaning of the Shannon entropy.

Zeckendorf's theorem: On the unique representation of integers as sums of non-consecutive Fibonacci numbers

In mathematics, Zeckendorf's theorem, named after Belgian amateur mathematician Edouard Zeckendorf, is a theorem about the representation of integers as sums of Fibonacci numbers.

Fibonacci word: Binary sequence from Fibonacci recurrence

A Fibonacci word is a specific sequence of binary digits. The Fibonacci word is formed by repeated concatenation in the same way that the Fibonacci numbers are formed by repeated addition.

Universal code (data compression)

In data compression, a universal code for integers is a prefix code that maps the positive integers onto binary codewords, with the additional property that whatever the true probability distribution on integers, as long as the distribution is monotonic, the expected lengths of the codewords are within a constant factor of the expected lengths that the optimal code for that probability distribution would have assigned. A universal code is asymptotically optimal if the ratio between actual and optimal expected lengths is bounded by a function of the information entropy of the code that, in addition to being bounded, approaches 1 as entropy approaches infinity.

Non-standard positional numeral systems here designates numeral systems that may loosely be described as positional systems, but that do not entirely comply with the standard description of positional systems.

In mathematics, negaFibonacci coding is a universal code which encodes nonzero integers into binary code words. It is similar to Fibonacci coding, except that it allows both positive and negative integers to be represented. All codes end with "11" and have no "11" before the end.

In mathematics, a sequence of natural numbers is called a complete sequence if every positive integer can be expressed as a sum of values in the sequence, using each value at most once.

Asymmetric numeral systems (ANS) is a family of entropy encoding methods introduced by Jarosław (Jarek) Duda from Jagiellonian University, used in data compression since 2014 due to improved performance compared to previous methods. ANS combines the compression ratio of arithmetic coding, with a processing cost similar to that of Huffman coding. In the tabled ANS (tANS) variant, this is achieved by constructing a finite-state machine to operate on a large alphabet without using multiplication.

In mathematics, the fibbinary numbers are the numbers whose binary representation does not contain two consecutive ones. That is, they are sums of distinct and non-consecutive powers of two.

References

  1. Duda, Jarek (2007). "Optimal encoding on discrete lattice with translational invariant constrains using statistical algorithms". arXiv: 0710.3861 [cs.IT].

Further reading