Gossip protocol

Last updated

A gossip protocol or epidemic protocol is a procedure or process of computer peer-to-peer communication that is based on the way epidemics spread. [1] Some distributed systems use peer-to-peer gossip to ensure that data is disseminated to all members of a group. Some ad-hoc networks have no central registry and the only way to spread common data is to rely on each member to pass it along to their neighbors.

Contents

Communication

The concept of gossip communication can be illustrated by the analogy of office workers spreading rumors. Let's say each hour the office workers congregate around the water cooler. Each employee pairs off with another, chosen at random, and shares the latest gossip. At the start of the day, Dave starts a new rumor: he comments to Bob that he believes that Charlie dyes his mustache. At the next meeting, Bob tells Alice, while Dave repeats the idea to Eve. After each water cooler rendezvous, the number of individuals who have heard the rumor roughly doubles (though this doesn't account for gossiping twice to the same person; perhaps Dave tries to tell the story to Frank, only to find that Frank already heard it from Alice). Computer systems typically implement this type of protocol with a form of random "peer selection": with a given frequency, each machine picks another machine at random and shares any rumors.

Variants and styles

There are probably hundreds of variants of specific gossip-like protocols because each use-scenario is likely to be customized to the organization's specific needs.

For example, a gossip protocol might employ some of these ideas:

Protocol types

It is useful to distinguish two prevailing styles of gossip protocol: [2]

Many protocols that predate the earliest use of the term "gossip" fall within this rather inclusive definition. For example, Internet routing protocols often use gossip-like information exchanges. A gossip substrate can be used to implement a standard routed network: nodes "gossip" about traditional point-to-point messages, effectively pushing traffic through the gossip layer. Bandwidth permitting, this implies that a gossip system can potentially support any classic protocol or implement any classical distributed service. However, such a broadly inclusive interpretation is rarely intended. More typically gossip protocols are those that specifically run in a regular, periodic, relatively lazy, symmetric and decentralized manner; the high degree of symmetry among nodes is particularly characteristic. Thus, while one could run a 2-phase commit protocol over a gossip substrate, doing so would be at odds with the spirit, if not the wording, of the definition.

The term convergently consistent is sometimes used to describe protocols that achieve exponentially rapid spread of information. For this purpose, a protocol must propagate any new information to all nodes that will be affected by the information within time logarithmic in the size of the system (the "mixing time" must be logarithmic in system size).

Examples

Suppose that we want to find the object that most closely matches some search pattern, within a network of unknown size, but where the computers are linked to one another and where each machine is running a small agent program that implements a gossip protocol.

If the messages might get large (for example, if many searches are active all at the same time), a size limit should be introduced. Also, searches should "age out" of the network.

It follows that within logarithmic time in the size of the network (the number of agents), any new search string will have reached all agents. Within an additional delay of the same approximate length, every agent will learn where the best match can be found. In particular, the agent that started the search will have found the best match.

For example, in a network with 25,000 machines, we can find the best match after about 30 rounds of gossip: 15 to spread the search string and 15 more to discover the best match. A gossip exchange could occur as often as once every tenth of a second without imposing undue load, hence this form of network search could search a big data center in about three seconds.

In this scenario, searches might automatically age out of the network after, say, 10 seconds. By then, the initiator knows the answer and there is no point in further gossip about that search.

Gossip protocols have also been used for achieving and maintaining distributed database consistency or with other types of data in consistent states, counting the number of nodes in a network of unknown size, spreading news robustly, organizing nodes according to some structuring policy, building so-called overlay networks, computing aggregates, sorting the nodes in a network, electing leaders, etc.

Epidemic algorithms

Gossip protocols can be used to propagate information in a manner rather similar to the way that a viral infection spreads in a biological population. Indeed, the mathematics of epidemics are often used to model the mathematics of gossip communication. The term epidemic algorithm is sometimes employed when describing a software system in which this kind of gossip-based information propagation is employed.

See also

Related Research Articles

The Association for Computing Machinery (ACM) is a US-based international learned society for computing. It was founded in 1947 and is the world's largest scientific and educational computing society. The ACM is a non-profit professional membership group, claiming nearly 110,000 student and professional members as of 2022. Its headquarters are in New York City.

<span class="mw-page-title-main">Client–server model</span> Distributed application structure in computing

The client–server model is a distributed application structure that partitions tasks or workloads between the providers of a resource or service, called servers, and service requesters, called clients. Often clients and servers communicate over a computer network on separate hardware, but both client and server may reside in the same system. A server host runs one or more server programs, which share their resources with clients. A client usually does not share any of its resources, but it requests content or service from a server. Clients, therefore, initiate communication sessions with servers, which await incoming requests. Examples of computer applications that use the client–server model are email, network printing, and the World Wide Web.

A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. Distributed computing is a field of computer science that studies distributed systems.

<span class="mw-page-title-main">Peer-to-peer</span> Type of decentralized and distributed network architecture

Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. They are said to form a peer-to-peer network of nodes.

<span class="mw-page-title-main">Distributed hash table</span> Decentralized distributed system with lookup service

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

In computing, Chord is a protocol and algorithm for a peer-to-peer distributed hash table. A distributed hash table stores key-value pairs by assigning keys to different computers ; a node will store the values for all the keys for which it is responsible. Chord specifies how keys are assigned to nodes, and how a node can discover the value for a given key by first locating the node responsible for that key.

In distributed data storage, a P-Grid is a self-organizing structured peer-to-peer system, which can accommodate arbitrary key distributions, still providing storage load-balancing and efficient search by using randomized routing.

An overlay network is a computer network that is layered on top of another network.

Clock synchronization is a topic in computer science and engineering that aims to coordinate otherwise independent clocks. Even when initially set accurately, real clocks will differ after some amount of time due to clock drift, caused by clocks counting time at slightly different rates. There are several problems that occur as a result of clock rate differences and several solutions, some being more acceptable than others in certain contexts.

Scott J. Shenker is an American computer scientist, and professor of computer science at the University of California, Berkeley. He is also the leader of the Extensible Internet Group at the International Computer Science Institute in Berkeley, California.

Delay-tolerant networking (DTN) is an approach to computer network architecture that seeks to address the technical issues in heterogeneous networks that may lack continuous network connectivity. Examples of such networks are those operating in mobile or extreme terrestrial environments, or planned networks in space.

<span class="mw-page-title-main">Computer network</span> Network that allows computers to share resources and communicate with each other

A computer network is a set of computers sharing resources located on or provided by network nodes. Computers use common communication protocols over digital interconnections to communicate with each other. These interconnections are made up of telecommunication network technologies based on physically wired, optical, and wireless radio-frequency methods that may be arranged in a variety of network topologies.

Pastry is an overlay network and routing network for the implementation of a distributed hash table (DHT) similar to Chord. The key–value pairs are stored in a redundant peer-to-peer network of connected Internet hosts. The protocol is bootstrapped by supplying it with the IP address of a peer already in the network and from then on via the routing table which is dynamically built and repaired. It is claimed that because of its redundant and decentralized nature there is no single point of failure and any single node can leave the network at any time without warning and with little or no chance of data loss. The protocol is also capable of using a routing metric supplied by an outside program, such as ping or traceroute, to determine the best routes to store in its routing table.

M. Dale Skeen is an American computer scientist. He specializes in designing and implementing large-scale computing systems, distributed computing and database management systems.

A fundamental problem in distributed computing and multi-agent systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires coordinating processes to reach consensus, or agree on some data value that is needed during computation. Example applications of consensus include agreeing on what transactions to commit to a database in which order, state machine replication, and atomic broadcasts. Real-world applications often requiring consensus include cloud computing, clock synchronization, PageRank, opinion formation, smart power grids, state estimation, control of UAVs, load balancing, blockchain, and others.

Virgil Dorin Gligor is a Romanian-American professor of electrical and computer engineering who specializes in the research of network security and applied cryptography.

Peer-to-peer SIP (P2P-SIP) is an implementation of a distributed voice over Internet Protocol (VoIP) or instant messaging communications application using a peer-to-peer (P2P) architecture in which session control between communication end points is facilitated with the Session Initiation Protocol (SIP).

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.

Social cloud computing, also peer-to-peer social cloud computing, is an area of computer science that generalizes cloud computing to include the sharing, bartering and renting of computing resources across peers whose owners and operators are verified through a social network or reputation system. It expands cloud computing past the confines of formal commercial data centers operated by cloud providers to include anyone interested in participating within the cloud services sharing economy. This in turn leads to more options, greater economies of scale, while bearing additional advantages for hosting data and computing services closer to the edge where they may be needed most.

Rachid Guerraoui is a Moroccan-Swiss computer scientist and a professor at the School of Computer and Communication Sciences at École Polytechnique Fédérale de Lausanne (EPFL), known for his contributions in the fields of concurrent and distributed computing. He is an ACM Fellow and the Chair in Informatics and Computational Science for the year 2018–2019 at Collège de France for distributed computing.

References

  1. Demers, Alan; Greene, Dan; Hauser, Carl; Irish, Wes; Larson, John (1987). "Epidemic algorithms for replicated database maintenance". Proceedings of the sixth annual ACM Symposium on Principles of distributed computing - PODC '87. pp. 1–12. doi:10.1145/41840.41841. ISBN   978-0-89791-239-6. OCLC   8876960204. S2CID   1889203.
  2. Jelasity, Márk (2011-01-01). "Gossip" (PDF). In Serugendo, Giovanna Di Marzo; Gleizes, Marie-Pierre; Karageorgos, Anthony (eds.). Self-organising Software. Natural Computing Series. Springer Berlin Heidelberg. pp. 139–162. doi:10.1007/978-3-642-17348-6_7. ISBN   978-3-642-17347-9. S2CID   214970849.

Further reading