Fountain code

In coding theory, fountain codes (also known as rateless erasure codes) are a class of erasure codes with the property that a potentially limitless sequence of encoding symbols can be generated from a given set of source symbols such that the original source symbols can ideally be recovered from any subset of the encoding symbols of size equal to or only slightly larger than the number of source symbols. The term fountain or rateless refers to the fact that these codes do not exhibit a fixed code rate.

A fountain code is optimal if the original k source symbols can be recovered from any k successfully received encoding symbols (i.e., excluding those that were erased). Fountain codes are known that have efficient encoding and decoding algorithms and that allow the recovery of the original k source symbols from any k’ of the encoding symbols with high probability, where k’ is just slightly larger than k.
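
The principle can be illustrated with a toy rateless code, sketched below in Python: each encoding symbol is the XOR of a uniformly random subset of the k source symbols, and the receiver decodes by Gaussian elimination over GF(2). This dense random construction is a simplified illustration of ratelessness and near-optimal recovery, not an LT or Raptor code; the symbol size, the value of k, and the random-number usage are arbitrary choices for the example.

    import itertools
    import random

    def encode_symbol(source, rng):
        """One encoding symbol: the XOR of a uniformly random subset of the sources."""
        mask = rng.getrandbits(len(source)) or 1   # nonzero bitmask selecting the subset
        value = 0
        for i in range(len(source)):
            if (mask >> i) & 1:
                value ^= source[i]
        return mask, value                         # (which sources were mixed, their XOR)

    def decode(received, k):
        """Gaussian elimination over GF(2). Returns the k source symbols, or None
        if the received equations do not yet have full rank."""
        pivots = {}                                # pivot column -> reduced (mask, value)
        for mask, val in received:
            while mask:
                low = (mask & -mask).bit_length() - 1    # lowest set bit
                if low not in pivots:
                    pivots[low] = (mask, val)            # new pivot row
                    break
                pmask, pval = pivots[low]
                mask, val = mask ^ pmask, val ^ pval     # eliminate against known pivot
        if len(pivots) < k:
            return None                            # need more encoding symbols
        source = [0] * k
        for col in sorted(pivots, reverse=True):   # back-substitute, highest column first
            mask, val = pivots[col]
            for j in range(col + 1, k):
                if (mask >> j) & 1:
                    val ^= source[j]
            source[col] = val
        return source

    rng = random.Random(0)
    source = [rng.getrandbits(8) for _ in range(16)]     # k = 16 one-byte symbols
    fountain = (encode_symbol(source, rng) for _ in itertools.count())
    received, recovered = [], None
    while recovered is None:                             # any subset of the stream works
        received.append(next(fountain))
        recovered = decode(received, len(source))
    print(f"recovered {len(source)} symbols from {len(received)} encoding symbols")
    assert recovered == source

Typically the receiver succeeds after only slightly more than k received symbols, matching the near-optimality described above; practical codes such as LT and Raptor replace the dense random subsets with sparse structures to make encoding and decoding fast.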

LT codes were the first practical realization of fountain codes. Raptor codes and online codes were subsequently introduced, and achieve linear time encoding and decoding complexity through a pre-coding stage of the input symbols.

Applications

Fountain codes can be applied flexibly: at a fixed code rate, or where a fixed code rate cannot be determined a priori, and where efficient encoding and decoding of large amounts of data is required.

One example is that of a data carousel, where some large file is continuously broadcast to a set of receivers.[1] Using a fixed-rate erasure code, a receiver missing a source symbol (due to a transmission error) faces the coupon collector's problem: it must successfully receive an encoding symbol which it does not already have. This problem becomes much more apparent when using a traditional short-length erasure code, as the file must be split into several blocks, each being separately encoded: the receiver must now collect the required number of missing encoding symbols for each block. Using a fountain code, it suffices for a receiver to retrieve any subset of encoding symbols of size slightly larger than the set of source symbols. (In practice, the broadcast is typically scheduled for a fixed period of time by an operator based on the characteristics of the network and receivers and the desired delivery reliability, so the fountain code is used at a code rate determined dynamically when the file is scheduled to be broadcast.)
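
The difference can be quantified with a small idealized simulation (illustrative assumptions only: receptions arrive in uniformly random order, and the fountain receiver needs a fixed 2% reception overhead):

    import math
    import random

    def receptions_uncoded(k, rng):
        """Coupon collector: random receptions until all k distinct blocks are held."""
        have, n = set(), 0
        while len(have) < k:
            have.add(rng.randrange(k))   # each reception is a uniformly random block
            n += 1
        return n

    def receptions_fountain(k, overhead=0.02):
        """Idealized fountain receiver: any ceil(k * (1 + overhead)) symbols suffice."""
        return math.ceil(k * (1 + overhead))

    rng = random.Random(0)
    k = 1000
    mean = sum(receptions_uncoded(k, rng) for _ in range(20)) / 20
    print(round(mean), receptions_fountain(k))   # roughly 7500 vs. 1020 receptions

The uncoded receiver needs about k times the k-th harmonic number of receptions on average (the last few missing blocks dominate the waiting time), while the fountain receiver needs only about k.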

Another application is that of hybrid ARQ in reliable multicast scenarios: parity information that is requested by a receiver can potentially be useful for all receivers in the multicast group.

In standards

Raptor codes are the most efficient fountain codes at this time,[2] having very efficient linear time encoding and decoding algorithms, and requiring only a small constant number of XOR operations per generated symbol for both encoding and decoding.[3] IETF RFC 5053 specifies in detail a systematic Raptor code, which has been adopted into multiple standards beyond the IETF, such as within the 3GPP MBMS standard for broadcast file delivery and streaming services, the DVB-H IPDC standard for delivering IP services over DVB networks, and DVB-IPTV for delivering commercial TV services over an IP network. This code can be used with up to 8,192 source symbols in a source block, and a total of up to 65,536 encoded symbols generated for a source block. It has an average relative reception overhead of 0.2% when applied to source blocks with 1,000 source symbols, and a relative reception overhead of less than 2% with probability 99.9999%.[4] The relative reception overhead is defined as the extra encoding data required beyond the length of the source data to recover the original source data, measured as a percentage of the size of the source data. For example, if the relative reception overhead is 0.2%, then source data of size 1 megabyte can be recovered from 1.002 megabytes of encoding data.
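
Expressed as a (trivial) calculation, with the function name chosen for this example:

    def encoding_data_needed(source_size, relative_overhead):
        """Encoding data required for recovery, per the definition above."""
        return source_size * (1 + relative_overhead)

    # 0.2% relative reception overhead: a 1,000,000-byte source needs
    # 1,002,000 bytes of encoding data.
    print(encoding_data_needed(1_000_000, 0.002))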

A more advanced Raptor code with greater flexibility and improved reception overhead, called RaptorQ, has been specified in IETF RFC 6330. The specified RaptorQ code can be used with up to 56,403 source symbols in a source block, and a total of up to 16,777,216 encoded symbols generated for a source block. This code is able to recover a source block from any set of encoded symbols equal to the number of source symbols in the source block with high probability, and in rare cases from slightly more than the number of source symbols in the source block. The RaptorQ code is an integral part of the ROUTE instantiation specified in ATSC A-331 (ATSC 3.0).

For data storage

Erasure codes are used in data storage applications due to massive savings on the number of storage units for a given level of redundancy and reliability. The requirements of erasure code design for data storage, particularly for distributed storage applications, can be quite different from those of communication or data-streaming scenarios. One requirement of coding for data storage systems is systematic form, i.e., the original message symbols are part of the coded symbols.[citation needed] Systematic form enables reading off the message symbols from a storage unit without decoding. In addition, since the bandwidth and communication load between storage nodes can be a bottleneck, codes that minimize communication are particularly beneficial when a node fails and the system must be reconstructed to regain the initial level of redundancy. In that respect, fountain codes are expected to allow an efficient repair process in case of a failure: when a single encoded symbol is lost, it should not require too much communication and computation among the other encoded symbols to reconstruct the lost symbol. In fact, repair latency can sometimes be more important than savings in storage space. Repairable fountain codes[5] are designed to address fountain code design objectives for storage systems. A detailed survey of fountain codes and their applications can be found in [6].
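
As a deliberately simple illustration of these two requirements (a single-parity toy code, not the repairable fountain codes of [5]): the source symbols are stored verbatim, so they can be read without decoding, and a single lost symbol is rebuilt by XORing the survivors, which already shows why repair traffic is a design concern.

    from functools import reduce

    def systematic_encode(source):
        """Coded symbols = the source symbols verbatim plus one XOR parity symbol."""
        parity = reduce(lambda a, b: a ^ b, source)
        return source + [parity]

    def repair(coded, lost_index):
        """Rebuild a single lost symbol by XORing all surviving symbols. This reads
        every other storage node -- the repair traffic that storage-oriented code
        designs try to reduce."""
        survivors = [s for i, s in enumerate(coded) if i != lost_index]
        return reduce(lambda a, b: a ^ b, survivors)

    coded = systematic_encode([0x17, 0x2A, 0x3C, 0x41])
    assert repair(coded, 2) == coded[2]   # one failed unit, repaired from the rest
    print(coded[:4])                      # source symbols readable without decoding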

A different approach to distributed storage using fountain codes has been proposed in Liquid Cloud Storage.[7][8] Liquid Cloud Storage is based on using a large erasure code such as the RaptorQ code specified in IETF RFC 6330 (which provides significantly better data protection than other systems), using a background repair process (which significantly reduces the repair bandwidth requirements compared to other systems), and using a stream data organization (which allows fast access to data even when not all encoded symbols are available).

Notes

  1. Byers, J.; Luby, M.; Mitzenmacher, M.; Rege, A. (1998). "A Digital Fountain Approach to Reliable Distribution of Bulk Data" (PDF).
  2. "Qualcomm Raptor Technology - Forward Error Correction". 2014-05-30.
  3. (Shokrollahi 2006)
  4. Stockhammer, T.; Shokrollahi, A.; Watson, M.; Luby, M.; Gasiba, T. (March 2008). "Application Layer Forward Error Correction for Mobile Multimedia Broadcasting". In Furht, B.; Ahson, S. (eds.). Handbook of Mobile Broadcasting: DVB-H, DMB, ISDB-T and Media FLO. CRC Press.
  5. Asteris, Megasthenis; Dimakis, Alexandros G. (2014). "Repairable Fountain Codes". IEEE Journal on Selected Areas in Communications. 32 (5): 1037–1047. arXiv:1401.0734. doi:10.1109/JSAC.2014.140522. S2CID 1300984.
  6. Arslan, Suayb S. (2014). "Incremental Redundancy, Fountain Codes and Advanced Topics". arXiv:1402.6016 [cs.IT].
  7. Luby, Michael; Padovani, Roberto; Richardson, Thomas J.; Minder, Lorenz; Aggarwal, Pooja (2019). "Liquid Cloud Storage". ACM Transactions on Storage. 15: 1–49. arXiv:1705.07983. doi:10.1145/3281276. S2CID 738764.
  8. Luby, M.; Padovani, R.; Richardson, T.; Minder, L.; Aggarwal, P. (2017). "Liquid Cloud Storage". arXiv:1705.07983v1 [cs.DC].

References

  Shokrollahi, Amin (2006). "Raptor Codes". IEEE Transactions on Information Theory. 52 (6): 2551–2567. doi:10.1109/TIT.2006.874390.