In computer networking, linear network coding is a program in which intermediate nodes transmit data from source nodes to sink nodes by means of linear combinations.
Linear network coding may be used to improve a network's throughput, efficiency, and scalability, as well as reducing attacks and eavesdropping. The nodes of a network take several packets and combine for transmission. This process may be used to attain the maximum possible information flow in a network.
It has been proven that, theoretically, linear coding is enough to achieve the upper bound in multicast problems with one source. [1] However linear coding is not sufficient in general; even for more general versions of linearity such as convolutional coding and filter-bank coding. [2] Finding optimal coding solutions for general network problems with arbitrary demands is a hard problem, which can be NP-hard [3] [4] and even undecidable. [5] [6]
In a linear network coding problem, a group of nodes are involved in moving the data from source nodes to sink nodes. Each node generates new packets which are linear combinations of past received packets by multiplying them by coefficients chosen from a finite field, typically of size .
More formally, each node, with indegree, , generates a message from the linear combination of received messages by the formula:
Where the values are coefficients selected from . Since operations are computed in a finite field, the generated message is of the same length as the original messages. Each node forwards the computed value along with the coefficients, , used in the level, .
Sink nodes receive these network coded messages, and collect them in a matrix. The original messages can be recovered by performing Gaussian elimination on the matrix. [7] In reduced row echelon form, decoded packets correspond to the rows of the form
A network is represented by a directed graph . is the set of nodes or vertices, is the set of directed links (or edges), and gives the capacity of each link of . Let be the maximum possible throughput from node to node . By the max-flow min-cut theorem, is upper bounded by the minimum capacity of all cuts, which is the sum of the capacities of the edges on a cut, between these two nodes.
Karl Menger proved that there is always a set of edge-disjoint paths achieving the upper bound in a unicast scenario, known as the max-flow min-cut theorem. Later, the Ford–Fulkerson algorithm was proposed to find such paths in polynomial time. Then, Edmonds proved in the paper "Edge-Disjoint Branchings"[ which? ] the upper bound in the broadcast scenario is also achievable, and proposed a polynomial time algorithm.
However, the situation in the multicast scenario is more complicated, and in fact, such an upper bound can't be reached using traditional routing ideas. Ahlswede et al. proved that it can be achieved if additional computing tasks (incoming packets are combined into one or several outgoing packets) can be done in the intermediate nodes. [8]
The butterfly network [8] is often used to illustrate how linear network coding can outperform routing. Two source nodes (at the top of the picture) have information A and B that must be transmitted to the two destination nodes (at the bottom). Each destination node wants to know both A and B. Each edge can carry only a single value (we can think of an edge transmitting a bit in each time slot).
If only routing were allowed, then the central link would be only able to carry A or B, but not both. Supposing we send A through the center; then the left destination would receive A twice and not know B at all. Sending B poses a similar problem for the right destination. We say that routing is insufficient because no routing scheme can transmit both A and B to both destinations simultaneously. Meanwhile, it takes four time slots in total for both destination nodes to know A and B.
Using a simple code, as shown, A and B can be transmitted to both destinations simultaneously by sending the sum of the symbols through the two relay nodes – encoding A and B using the formula "A+B". The left destination receives A and A + B, and can calculate B by subtracting the two values. Similarly, the right destination will receive B and A + B, and will also be able to determine both A and B. Therefore, with network coding, it takes only three time slots and improves the throughput.
Random linear network coding [9] (RLNC) is a simple yet powerful encoding scheme, which in broadcast transmission schemes allows close to optimal throughput using a decentralized algorithm. Nodes transmit random linear combinations of the packets they receive, with coefficients chosen randomly, with a uniform distribution from a Galois field. If the field size is sufficiently large, the probability that the receiver(s) will obtain linearly independent combinations (and therefore obtain innovative information) approaches 1. It should however be noted that, although random linear network coding has excellent throughput performance, if a receiver obtains an insufficient number of packets, it is extremely unlikely that they can recover any of the original packets. This can be addressed by sending additional random linear combinations until the receiver obtains the appropriate number of packets.
There are three key parameters in RLNC. The first one is the generation size. In RLNC, the original data transmitted over the network is divided into packets. The source and intermediate nodes in the network can combine and recombine the set of original and coded packets. The original packets form a block, usually called a generation. The number of original packets combined and recombined together is the generation size. The second parameter is the packet size. Usually, the size of the original packets is fixed. In the case of unequally-sized packets, these can be zero-padded if they are shorter or split into multiple packets if they are longer. In practice, the packet size can be the size of the maximum transmission unit (MTU) of the underlying network protocol. For example, it can be around 1500 bytes in an Ethernet frame. The third key parameter is the Galois field used. In practice, the most commonly used Galois fields are binary extension fields. And the most commonly used sizes for the Galois fields are the binary field and the so-called binary-8 (). In the binary field, each element is one bit long, while in the binary-8, it is one byte long. Since the packet size is usually larger than the field size, each packet is seen as a set of elements from the Galois field (usually referred to as symbols) appended together. The packets have a fixed amount of symbols (Galois field elements), and since all the operations are performed over Galois fields, then the size of the packets does not change with subsequent linear combinations.
The sources and the intermediate nodes can combine any subset of the original and previously coded packets performing linear operations. To form a coded packet in RLNC, the original and previously coded packets are multiplied by randomly chosen coefficients and added together. Since each packet is just an appended set of Galois field elements, the operations of multiplication and addition are performed symbol-wise over each of the individual symbols of the packets, as shown in the picture from the example.
To preserve the statelessness of the code, the coding coefficients used to generate the coded packets are appended to the packets transmitted over the network. Therefore, each node in the network can see what coefficients were used to generate each coded packet. One novelty of linear network coding over traditional block codes is that it allows the recombination of previously coded packets into new and valid coded packets. This process is usually called recoding. After a recoding operation, the size of the appended coding coefficients does not change. Since all the operations are linear, the state of the recoded packet can be preserved by applying the same operations of addition and multiplication to the payload and the appended coding coefficients. In the following example, we will illustrate this process.
Any destination node must collect enough linearly independent coded packets to be able to reconstruct the original data. Each coded packet can be understood as a linear equation where the coefficients are known since they are appended to the packet. In these equations, each of the original packets is the unknown. To solve the linear system of equations, the destination needs at least linearly independent equations (packets).
In the figure, we can see an example of two packets linearly combined into a new coded packet. In the example, we have two packets, namely packet and packet . The generation size of our example is two. We know this because each packet has two coding coefficients () appended. The appended coefficients can take any value from the Galois field. However, an original, uncoded data packet would have appended the coding coefficients or , which means that they are constructed by a linear combination of zero times one of the packets plus one time the other packet. Any coded packet would have appended other coefficients. In our example, packet for instance has appended the coefficients . Since network coding can be applied at any layter of the communication protocol, these packets can have a header from the other layers, which is ignored in the network coding operations.
Now, lets assume that the network node wants to produce a new coded packet combining packet and packet . In RLNC, it will randomly choose two coding coefficients, and in the example. The node will multiply each symbol of packet by , and each symbol of packet by . Then, it will add the results symbol-wise to produce the new coded data. It will perform the same operations of multiplication and addition to the coding coefficients of the coded packets.
Linear network coding is still a relatively new subject. However, the topic has been vastly researched over the last twenty years. Nevertheless, there are still some misconceptions that are no longer valid:
Decoding computational complexity: Network coding decoders have been improved over the years. Nowadays, the algorithms are highly efficient and parallelizable. In 2016, with Intel Core i5 processors with SIMD instructions enabled, the decoding goodput of network coding was 750 MB/s for a generation size of 16 packets and 250 MB/s for a generation size of 64 packets. [10] Furthermore, today's algorithms can be vastly parallelizable, increasing the encoding and decoding goodput even further. [11]
Transmission Overhead: It is usually thought that the transmission overhead of network coding is high due to the need to append the coding coefficients to each coded packet. In reality, this overhead is negligible in most applications. The overhead due to coding coefficients can be computed as follows. Each packet has appended coding coefficients. The size of each coefficient is the number of bits needed to represent one element of the Galois field. In practice, most network coding applications use a generation size of no more than 32 packets per generation and Galois fields of 256 elements (binary-8). With these numbers, each packet needs bytes of appended overhead. If each packet is 1500 bytes long (i.e. the Ethernet MTU), then 32 bytes represent an overhead of only 2%.
Overhead due to linear dependencies: Since the coding coefficients are chosen randomly in RLNC, there is a chance that some transmitted coded packets are not beneficial to the destination because they are formed using a linearly dependent combination of packets. However, this overhead is negligible in most applications. The linear dependencies depend on the Galois fields' size and are practically independent of the generation size used. We can illustrate this with the following example. Let us assume we are using a Galois field of elements and a generation size of packets. If the destination has not received any coded packet, we say it has degrees of freedom, and then almost any coded packet will be useful and innovative. In fact, only the zero-packet (only zeroes in the coding coefficients) will be non-innovative. The probability of generating the zero-packet is equal to the probability of each of the coding coefficient to be equal to the zero-element of the Galois field. I.e., the probability of a non-innovative packet is of . With each successive innovative transmission, it can be shown that the exponent of the probability of a non innovative packet is reduced by one. When the destination has received innovative packets (i.e., it needs only one more packet to fully decode the data). Then the probability of a non innovative packet is of . We can use this knowledge to calculate the expected number of linearly dependent packets per generation. In the worst-case scenario, when the Galois field used contains only two elements (), the expected number of linearly dependent packets per generation is of 1.6 extra packets. If our generation size if of 32 or 64 packets, this represents an overhead of 5% or 2.5%, respectively. If we use the binary-8 field (), then the expected number of linearly dependent packets per generation is practically zero. Since it is the last packets the major contributor to the overhead due to linear dependencies, there are RLNC-based protocols such as tunable sparse network coding [12] that exploit this knowledge. These protocols introduce sparsity (zero-elements) in the coding coefficients at the beginning of the transmission to reduce the decoding complexity, and reduce the sparsity at the end of the transmission to reduce the overhead due to linear dependencies.
Over the years, multiple researchers and companies have integrated network coding solutions into their applications. [13] We can list some of the applications of network coding in different areas:
In telecommunications, orthogonal frequency-division multiplexing (OFDM) is a type of digital transmission used in digital modulation for encoding digital (binary) data on multiple carrier frequencies. OFDM has developed into a popular scheme for wideband digital communication, used in applications such as digital television and audio broadcasting, DSL internet access, wireless networks, power line networks, and 4G/5G mobile communications.
Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.
Network congestion in data networking and queueing theory is the reduced quality of service that occurs when a network node or link is carrying more data than it can handle. Typical effects include queueing delay, packet loss or the blocking of new connections. A consequence of congestion is that an incremental increase in offered load leads either only to a small increase or even a decrease in network throughput.
A wireless mesh network (WMN) is a communications network made up of radio nodes organized in a mesh topology. It can also be a form of wireless ad hoc network.
Wireless sensor networks (WSNs) refer to networks of spatially dispersed and dedicated sensors that monitor and record the physical conditions of the environment and forward the collected data to a central location. WSNs can measure environmental conditions such as temperature, sound, pollution levels, humidity and wind.
Proportional-fair scheduling is a compromise-based scheduling algorithm. It is based upon maintaining a balance between two competing interests: Trying to maximize the total throughput of the network while at the same time allowing all users at least a minimal level of service. This is done by assigning each data flow a data rate or a scheduling priority that is inversely proportional to its anticipated resource consumption.
Network calculus is "a set of mathematical results which give insights into man-made systems such as concurrent programs, digital circuits and communication networks." Network calculus gives a theoretical framework for analysing performance guarantees in computer networks. As traffic flows through a network it is subject to constraints imposed by the system components, for example:
A computer network is a set of computers sharing resources located on or provided by network nodes. Computers use common communication protocols over digital interconnections to communicate with each other. These interconnections are made up of telecommunication network technologies based on physically wired, optical, and wireless radio-frequency methods that may be arranged in a variety of network topologies.
Aircrack-ng is a network software suite consisting of a detector, packet sniffer, WEP and WPA/WPA2-PSK cracker and analysis tool for 802.11 wireless LANs. It works with any wireless network interface controller whose driver supports raw monitoring mode and can sniff 802.11a, 802.11b and 802.11g traffic. Packages are released for Linux and Windows.
A wireless ad hoc network (WANET) or mobile ad hoc network (MANET) is a decentralized type of wireless network. The network is ad hoc because it does not rely on a pre-existing infrastructure, such as routers or wireless access points. Instead, each node participates in routing by forwarding data for other nodes. The determination of which nodes forward data is made dynamically on the basis of network connectivity and the routing algorithm in use.
Carrier Interferometry(CI) is a spread spectrum scheme designed to be used in an Orthogonal Frequency-Division Multiplexing (OFDM) communication system for multiplexing and multiple access, enabling the system to support multiple users at the same time over the same frequency band.
In radio, cooperative multiple-input multiple-output is a technology that can effectively exploit the spatial domain of mobile fading channels to bring significant performance improvements to wireless communication systems. It is also called network MIMO, distributed MIMO, virtual MIMO, and virtual antenna arrays.
In radio, multiple-input and multiple-output (MIMO) is a method for multiplying the capacity of a radio link using multiple transmission and receiving antennas to exploit multipath propagation. MIMO has become an essential element of wireless communication standards including IEEE 802.11n, IEEE 802.11ac, HSPA+ (3G), WiMAX, and Long Term Evolution (LTE). More recently, MIMO has been applied to power-line communication for three-wire installations as part of the ITU G.hn standard and of the HomePlug AV2 specification.
In communication networks, cognitive network (CN) is a new type of data network that makes use of cutting edge technology from several research areas to solve some problems current networks are faced with. Cognitive network is different from cognitive radio (CR) as it covers all the layers of the OSI model.
EraMobile -Epidemic-based Reliable and Adaptive Multicast for Mobile ad hoc networks is a bio-inspired reliable multicast protocol targeting mission critical ad hoc networks. EraMobile supports group applications that require high reliability and low overhead with loose delivery time constraints. The protocol aims to deliver multicast data with maximum reliability and minimal network overhead under adverse network conditions. EraMobile adopts an epidemic-based approach, which uses gossip messages, to cope with dynamic topology changes due to the mobility of network nodes. EraMobile's epidemic mechanism does not require maintaining any tree- or mesh-like structure for multicast operation. It requires neither a global nor a partial view of the network, nor does it require information about neighboring nodes and group members. The lack of a central structure for multicast lowers the network overhead by eliminating redundant data transmissions. EraMobile contains a simple adaptivity mechanism which tunes the frequency of control packages based on the node density in the network. This adaptivity mechanism helps the delivery of data reliably in both sparse networks -in which network connectivity is prone to interruptions- and dense networks -in which congestion is likely because of shared wireless medium-.
A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.
Time Slotted Channel Hopping or Time Synchronized Channel Hopping (TSCH) is a channel access method for shared-medium networks.
Associativity-based routing is a mobile routing protocol invented for wireless ad hoc networks, also known as mobile ad hoc networks (MANETs) and wireless mesh networks. ABR was invented in 1993, filed for a U.S. patent in 1996, and granted the patent in 1999. ABR was invented by Chai Keong Toh while doing his Ph.D. at Cambridge University.
Orthogonal Time Frequency Space (OTFS) is a 2D modulation technique that transforms the information carried in the Delay-Doppler coordinate system. The information is transformed in the similar time-frequency domain as utilized by the traditional schemes of modulation such as TDMA, CDMA, and OFDM. It was first used for fixed wireless, and is now a contending waveform for 6G technology due to its robustness in high-speed vehicular scenarios.
George N. Rouskas is a computer scientist, academic, and author. He is an Alumni Distinguished Graduate Professor and Director of Graduate Programs in the Department of Computer Science at North Carolina State University.
Any receiver can then recover the source vectors using Gaussian elimination on the vectors in its h (or more) received packets.
{{cite book}}
: CS1 maint: location missing publisher (link)This article's use of external links may not follow Wikipedia's policies or guidelines.(March 2022) |