Mainline DHT

Last updated

Mainline DHT is the name given to the Kademlia-based distributed hash table (DHT) used by BitTorrent clients to find peers via the BitTorrent protocol. The idea of using a DHT for distributed tracking in BitTorrent was first implemented [1] [2] in Azureus 2.3.0.0 (now known as Vuze) in May 2005, from which it gained significant popularity. Unrelated but around the same time, BitTorrent, Inc. released a similar DHT into their client called Mainline DHT, and thus popularized the use of distributed tracking in the BitTorrent protocol. Measurement showed that by 2013, the concurrent number of users of Mainline DHT is from 16 million to 28 million, with intra-day changes of at least 10 million. [3]

Contents

Description

Mainline DHT is based on the popular Kademlia DHT design. [4] Prior to usage of a DHT for distributing peers, trackers were the only method of finding peers. [5] The key feature of using the DHT over trackers is that the decentralized approach favours the nature of the BitTorrent protocol. The DHT operates by distributing lists of peers identified by the SHA-1 hash of the torrent.

Each client randomly selects a 160-bit node ID, and the distance between nodes is calculated by the XOR of their IDs. Mainline DHT uses k-buckets to track known nodes.

To download a file, a client first queries its k-buckets for nodes closest to the torrent's info_hash. It connects to these nodes to find peers. If peers are found, the client downloads the file from multiple sources, similar to the standard BitTorrent protocol. [5]

Operation

The SHA-1 hash of a torrent, the infohash, is synonymous with a Kademlia key, which is used for finding peers (values) in the overlay network. To find peers in a swarm, a node sends a get_peers query with the infohash as key (equivalent to a Kademlia FIND_VALUE) to the closest known nodes (with respect to the key distance). Like Kademlia, if the node does not return the value (peers), it persists further in an iterative operation. However, after the search is exhausted, the client then also inserts the peer contact information for itself onto the responding nodes with IDs closest to the infohash of the torrent.

Token

Nodes use an additional measure known as a token to ensure others don't sign up other hosts for torrents. The return value for a query for peers includes this opaque value. For a node to announce that its controlling peer is downloading a torrent, it must present the token received from the same queried node in a recent query for peers. When a node attempts to "announce" a torrent, the queried node checks the token against the querying node's IP address.

Mainline DHT uses the SHA-1 hash of the IP address concatenated onto a secret that changes every five minutes for a token value. Tokens up to ten minutes old are accepted.

KRPC

A node in the Mainline DHT consists of an IP and port combination. Nodes communicate via an RPC protocol called KRPC. KRPC is a simple protocol that consists of nodes sending messages (queries, replies and errors) containing bencoded dictionaries over UDP.

A KRPC message is a single dictionary with two keys common to every message and additional keys depending on the type of message. Every message has a key "t" with a string value representing a transaction ID. This transaction ID is generated by the querying node and is echoed in the response, so responses may be correlated with multiple queries to the same node. The transaction ID should be encoded as a short string of binary numbers, typically two octets are enough as they cover 2^16 outstanding queries. The other key contained in every KRPC message is "y" with a single character value describing the type of message. The value of the "y" key is one of "q" for query, "r" for response, or "e" for error.

Queries

Queries, or KRPC message dictionaries with a "y" value of "q", contain two additional keys; "q" and "a". Key "q" has a string value containing the method name of the query. Key "a" has a dictionary value containing named arguments to the query.

Responses

Responses, or KRPC message dictionaries with a "y" value of "r", contain one additional key "r". The value of "r" is a dictionary containing named return values. Response messages are sent upon successful completion of a query.

Errors

Errors, or KRPC message dictionaries with a "y" value of "e", contain one additional key "e". The value of "e" is a list. The first element is an integer representing the error code. The second element is a string containing the error message. Errors are sent when a query cannot be fulfilled.

Routing table

Buckets are structured differently from those in Kademlia. Instead of a list of 160 buckets, BitTorrent starts with only one bucket. When a bucket becomes full, one of two things can happen:

Splitting is an operation that occurs only if our own node ID falls within the range of the bucket. The bucket being split is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones.

There are two benefits to this bucket implementation:

Implementations

Mainline DHT was first included in version 4.2.0 of the BitTorrent software (November 2005). Since then, it has been implemented by a number of other clients:

Related Research Articles

<span class="mw-page-title-main">Distributed hash table</span> Decentralized distributed system with lookup service

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

BitTorrent, also referred to simply as torrent, is a communication protocol for peer-to-peer file sharing (P2P), which enables users to distribute data and electronic files over the Internet in a decentralized manner. The protocol is developed and maintained by Rainberry, Inc., and was first released in 2001.

gtk-gnutella

gtk-gnutella is a peer-to-peer file sharing application which runs on the gnutella network. gtk-gnutella uses the GTK+ toolkit for its graphical user interface. Released under the GNU General Public License, gtk-gnutella is free software.

<span class="mw-page-title-main">GNUnet</span> Framework for decentralized, peer-to-peer networking which is part of the GNU Project

GNUnet is a software framework for decentralized, peer-to-peer networking and an official GNU package. The framework offers link encryption, peer discovery, resource allocation, communication over many transports and various basic peer-to-peer algorithms for routing, multicast and network size estimation.

In computing, Chord is a protocol and algorithm for a peer-to-peer distributed hash table. A distributed hash table stores key-value pairs by assigning keys to different computers ; a node will store the values for all the keys for which it is responsible. Chord specifies how keys are assigned to nodes, and how a node can discover the value for a given key by first locating the node responsible for that key.

Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002. It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number or node ID. The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values.

The Invisible Internet Project (I2P) is an anonymous network layer that allows for censorship-resistant, peer-to-peer communication. Anonymous connections are achieved by encrypting the user's traffic, and sending it through a volunteer-run network of roughly 55,000 computers distributed around the world. Given the high number of possible paths the traffic can transit, a third party watching a full connection is unlikely. The software that implements this layer is called an "I2P router", and a computer running I2P is called an "I2P node". I2P is free and open sourced, and is published under multiple licenses.

<span class="mw-page-title-main">Magnet URI scheme</span> Scheme that defines the format of magnet links

Magnet is a URI scheme that defines the format of magnet links, a de facto standard for identifying files (URN) by their content, via cryptographic hash value rather than by their location.

The Kad network is a peer-to-peer (P2P) network which implements the Kademlia P2P overlay protocol. The majority of users on the Kad Network are also connected to servers on the eDonkey network, and Kad Network clients typically query known nodes on the eDonkey network in order to find an initial node on the Kad network.

A BitTorrent tracker is a special type of server that assists in the communication between peers using the BitTorrent protocol.

Protocol encryption (PE), message stream encryption (MSE) or protocol header encrypt (PHE) are related features of some peer-to-peer file-sharing clients, including BitTorrent clients. They attempt to enhance privacy and confidentiality. In addition, they attempt to make traffic harder to identify by third parties including internet service providers (ISPs). However, encryption will not protect one from DMCA notices from sharing illegal content, as one is still uploading material and the monitoring firms can merely connect to the swarm.

Peer exchange or PEX is a communications protocol that augments the BitTorrent file sharing protocol. It allows a group of users that are collaborating to share a given file to do so more swiftly and efficiently.

BitVault is a content-addressable distributed storage system, developed by Microsoft Research in China. BitVault uses peer-to-peer technology to distribute the tasks of storing and managing data. As such, there is no central authority responsible for management of the system. Rather, it is self-managing, provides high availability, reliability and scales up in a self-organizing manner, with low administrative overhead, which is almost constant irrespective of the size of the distributed overlay network.

Pastry is an overlay network and routing network for the implementation of a distributed hash table (DHT) similar to Chord. The key–value pairs are stored in a redundant peer-to-peer network of connected Internet hosts. The protocol is bootstrapped by supplying it with the IP address of a peer already in the network and from then on via the routing table which is dynamically built and repaired. It is claimed that because of its redundant and decentralized nature there is no single point of failure and any single node can leave the network at any time without warning and with little or no chance of data loss. The protocol is also capable of using a routing metric supplied by an outside program, such as ping or traceroute, to determine the best routes to store in its routing table.

The following is a general comparison of BitTorrent clients, which are computer programs designed for peer-to-peer file sharing using the BitTorrent protocol.

This is a glossary of jargon related to peer-to-peer file sharing via the BitTorrent protocol.

Peer-to-peer file sharing (P2P) systems like Gnutella, KaZaA, and eDonkey/eMule, have become extremely popular in recent years, with the estimated user population in the millions. An academic research paper analyzed Gnutella and eMule protocols and found weaknesses in the protocol; many of the issues found in these networks are fundamental and probably common on other P2P networks. Users of file sharing networks, such as eMule and Gnutella, are subject to monitoring of their activity. Clients may be tracked by IP address, DNS name, software version they use, files they share, queries they initiate, and queries they answer to. Clients may also share their private files to the network without notice due to inappropriate settings.

Pharos is a hierarchical and decentralized network coordinate system. With the help of a simple two-level architecture, it achieves much better prediction accuracy then the representative Vivaldi coordinates, and it is incrementally deployable.

In the BitTorrent file distribution system, a torrent file or meta-info file is a computer file that contains metadata about files and folders to be distributed, and usually also a list of the network locations of trackers, which are computers that help participants in the system find each other and form efficient distribution groups called swarms. Torrent files are normally named with the extension .torrent.

TomP2P is a distributed hash table which provides a decentralized key-value infrastructure for distributed applications. Each peer has a table that can be configured either to be disk-based or memory-based to store its values.

References

  1. Jones, Ben (7 June 2015). "BitTorrent's DHT Turns 10 Years Old". TorrentFreak . Archived from the original on 2015-06-11. Retrieved 2015-07-05.
  2. "Vuze Changelog". Azureus.sourceforge.net. Archived from the original on 2006-12-01. Retrieved 2012-07-15.
  3. Wang, Liang; Kangasharju, Jussi (2013). "Measuring Large-Scale Distributed Systems: Case of BitTorrent Mainline DHT" (PDF). IEEE Peer-to-Peer. Archived (PDF) from the original on 12 May 2014. Retrieved 26 October 2013.
  4. Loewenstern, Andrew; Norberg, Arvid (2013-03-22). "DHT Protocol". BitTorrent.org. Archived from the original on 2015-05-20. Retrieved 2021-11-26.
  5. 1 2 Xinxing, Zhang; Zhihong, Tian; Luchen, Zhang (16 June 2016). "A Measurement Study on Mainline DHT and Magnet Link". 2016 IEEE First International Conference on Data Science in Cyberspace (DSC). pp. 11–19. doi:10.1109/DSC.2016.106. ISBN   978-1-5090-1192-6.
  6. "About – Deluge". Archived from the original on 2011-02-10. Retrieved 2014-06-29.