Content-addressable network

Last updated July 04, 2023

The content-addressable network (CAN) is a distributed, decentralized P2P infrastructure that provides hash table functionality on an Internet-like scale. CAN was one of the original four distributed hash table proposals, introduced concurrently with Chord, Pastry, and Tapestry.

Overview

Like other distributed hash tables, CAN is designed to be scalable, fault tolerant, and self-organizing. The architectural design is a virtual multi-dimensional Cartesian coordinate space, a type of overlay network, on a multi-torus. This n-dimensional coordinate space is a virtual logical address, completely independent of the physical location and physical connectivity of the nodes. Points within the space are identified with coordinates. The entire coordinate space is dynamically partitioned among all the nodes in the system such that every node possesses at least one distinct zone within the overall space.^[1]

Routing

A CAN node maintains a routing table that holds the IP address and virtual coordinate zone of each of its neighbors. A node routes a message towards a destination point in the coordinate space. The node first determines which neighboring zone is closest to the destination point, and then looks up that zone's node's IP address via the routing table.^[1]

Node joining

To join a CAN, a joining node must:

Find a node already in the overlay network.
Identify a zone that can be split
Update the routing tables of nodes neighboring the newly split zone.^[1]

To find a node already in the overlay network, bootstrapping nodes may be used to inform the joining node of IP addresses of nodes currently in the overlay network.^[1]

After the joining node receives an IP address of a node already in the CAN, it can attempt to identify a zone for itself. The joining node randomly picks a point in the coordinate space and sends a join request, directed to the random point, to one of the received IP addresses. The nodes already in the overlay network route the join request to the correct device via their zone-to-IP routing tables. Once the node managing the destination point's zone receives the join request, it may honor the join request by splitting its zone in half, allocating itself the first half, and allocating the joining node the second half. If it does not honor the join request, the joining node keeps picking random points in the coordinate space and sending join requests directed to these random points until it successfully joins the network.^[1]

After the zone split and allocation is complete, the neighboring nodes are updated with the coordinates of the two new zones and the corresponding IP addresses. Routing tables are updated and updates are propagated across the network.^[1]

Node departing

To handle a node departing, the CAN must

identify a node is departing
have the departing node's zone merged or taken over by a neighboring node
update the routing tables across the network.^[1]

Detecting a node's departure can be done, for instance, via heartbeat messages that periodically broadcast routing table information between neighbors. After a predetermined period of silence from a neighbor, that neighboring node is determined as failed and is considered a departing node.^[1] Alternatively, a node that is willingly departing may broadcast such a notice to its neighbors.

After a departing node is identified, its zone must be either merged or taken over. First the departed node's zone is analyzed to determine whether a neighboring node's zone can merge with the departed node's zone to form a valid zone. For example, a zone in a 2D coordinate space must be either a square or rectangle and cannot be L-shaped. The validation test may cycle through all neighboring zones to determine if a successful merge can occur. If one of the potential merges is deemed a valid merge, the zones are then merged. If none of the potential merges are deemed valid, then the neighboring node with the smallest zone takes over control of the departing node's zone.^[1] After a take-over, the take-over node may periodically attempt to merge its additionally controlled zones with respective neighboring zones.

If the merge is successful, routing tables of neighboring zones' nodes are updated to reflect the merge. The network will see the subsection of the overlay network as one, single zone after a merge and treat all routing processing with this mindset. To effectuate a take-over, the take-over node updates neighboring zones' nodes' routing tables, so that requests to either zone resolve to the take-over node. And, as such, the network still sees the subsection of the overlay network as two separate zones and treats all routing processing with this mindset.

Developers

Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker

Related Research Articles

An Internet Protocol address is a numerical label such as 192.0.2.1 that is connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: network interface identification and location addressing.

Routing is the process of selecting a path for traffic in a network or between or across multiple networks. Broadly, routing is performed in many types of networks, including circuit-switched networks, such as the public switched telephone network (PSTN), and computer networks, such as the Internet.

The Address Resolution Protocol (ARP) is a communication protocol used for discovering the link layer address, such as a MAC address, associated with a given internet layer address, typically an IPv4 address. This mapping is a critical function in the Internet protocol suite. ARP was defined in 1982 by RFC 826, which is Internet Standard STD 37.

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

Zigbee is an IEEE 802.15.4-based specification for a suite of high-level communication protocols used to create personal area networks with small, low-power digital radios, such as for home automation, medical device data collection, and other low-power low-bandwidth needs, designed for small scale projects which need wireless connection. Hence, Zigbee is a low-power, low data rate, and close proximity wireless ad hoc network.

GNUnet is a software framework for decentralized, peer-to-peer networking and an official GNU package. The framework offers link encryption, peer discovery, resource allocation, communication over many transports and various basic peer-to-peer algorithms for routing, multicast and network size estimation.

In computing, Chord is a protocol and algorithm for a peer-to-peer distributed hash table. A distributed hash table stores key-value pairs by assigning keys to different computers ; a node will store the values for all the keys for which it is responsible. Chord specifies how keys are assigned to nodes, and how a node can discover the value for a given key by first locating the node responsible for that key.

Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002. It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number or node ID. The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values.

An overlay network is a computer network that is layered on top of another network.

IP traceback is any method for reliably determining the origin of a packet on the Internet. The IP protocol does not provide for the authentication of the source IP address of an IP packet, enabling the source address to be falsified in a strategy called IP address spoofing, and creating potential internet security and stability problems.

A computer network is a set of computers sharing resources located on or provided by network nodes. Computers use common communication protocols over digital interconnections to communicate with each other. These interconnections are made up of telecommunication network technologies based on physically wired, optical, and wireless radio-frequency methods that may be arranged in a variety of network topologies.

BitVault is a content-addressable distributed storage system, developed by Microsoft Research in China. BitVault uses peer-to-peer technology to distribute the tasks of storing and managing data. As such, there is no central authority responsible for management of the system. Rather, it is self-managing, provides high availability, reliability and scales up in a self-organizing manner, with low administrative overhead, which is almost constant irrespective of the size of the distributed overlay network.

Pastry is an overlay network and routing network for the implementation of a distributed hash table (DHT) similar to Chord. The key–value pairs are stored in a redundant peer-to-peer network of connected Internet hosts. The protocol is bootstrapped by supplying it with the IP address of a peer already in the network and from then on via the routing table which is dynamically built and repaired. It is claimed that because of its redundant and decentralized nature there is no single point of failure and any single node can leave the network at any time without warning and with little or no chance of data loss. The protocol is also capable of using a routing metric supplied by an outside program, such as ping or traceroute, to determine the best routes to store in its routing table.

In network theory, small-world routing refers to routing methods for small-world networks. Networks of this type are peculiar in that relatively short paths exist between any two nodes. Determining these paths, however, can be a difficult problem from the perspective of an individual routing node in the network if no further information is known about the network as a whole.

Tapestry is a peer-to-peer overlay network which provides a distributed hash table, routing, and multicasting infrastructure for distributed applications. The Tapestry peer-to-peer system offers efficient, scalable, self-repairing, location-aware routing to nearby resources.

The BAlanced Tree Overlay Network (BATON) is a distributed tree structure designed for peer-to-peer (P2P) systems. Unlike other overlays that utilize a distributed hash table (DHT) like the Chord system, BATON organizes peers in a distributed tree to facilitate range search. Furthermore, BATON aims to maintain a balanced tree height, similar to the AVL tree, resulting in a bounded to $for search exact and range queries as well as update operations (join/leave).$

A bootstrapping node, also known as a rendezvous host, is a node in an overlay network that provides initial configuration information to newly joining nodes so that they may successfully join the overlay network. Bootstrapping nodes are predominantly found in decentralized peer-to-peer (P2P) networks because of the dynamically changing identities and configurations of member nodes in these networks.

An Internet Protocol Version 6 address is a numeric label that is used to identify and locate a network interface of a computer or a network node participating in a computer network using IPv6. IP addresses are included in the packet header to indicate the source and the destination of each packet. The IP address of the destination is used to make decisions about routing IP packets to other networks.

Scalable Source Routing (SSR) is a routing protocol for unstructured networks such as mobile ad hoc networks, mesh networks, or sensor networks. It combines source routing with routing along a virtual ring, and is based on the idea of "pushing Chord into the underlay".

The network interface layer, also commonly referred to as the data link layer or link layer, is the lowest layer in the TCP/IP model. This particular layer has several unique security vulnerabilities that can be exploited by a determined adversary.

References

1 2 3 4 5 6 7 8 9 Ratnasamy; et al. (2001). "A Scalable Content-Addressable Network" (PDF). In Proceedings of ACM SIGCOMM 2001. Retrieved 2013-05-20.{{cite journal}}: Cite journal requires |journal= (help)

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Ratnasamy01-1] 1 2 3 4 5 6 7 8 9 Ratnasamy; et al. (2001). "A Scalable Content-Addressable Network" (PDF). In Proceedings of ACM SIGCOMM 2001. Retrieved 2013-05-20.{{cite journal}}: Cite journal requires |journal= (help)

[1]