Reliable multicast

A reliable multicast is any computer networking protocol that provides a reliable sequence of packets to multiple recipients simultaneously, making it suitable for applications such as multi-receiver file transfer.

Overview

Multicast is a network addressing method for the delivery of information to a group of destinations simultaneously, using the most efficient strategy to deliver the messages over each link of the network only once and creating copies only where the links to the multiple destinations split (typically at network switches and routers). However, like the User Datagram Protocol, multicast does not guarantee the delivery of a message stream: messages may be dropped, delivered multiple times, or delivered out of order. A reliable multicast protocol adds the ability for receivers to detect lost and/or out-of-order messages and take corrective action (similar in principle to TCP), resulting in a gap-free, in-order message stream.
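
The following sketch illustrates the receiver-side bookkeeping such a protocol needs: buffering out-of-order packets, detecting sequence-number gaps, and releasing a gap-free, in-order stream. It is a minimal illustration only; the class name ReorderBuffer and its methods are invented for this example and are not taken from any particular protocol.

```python
# Minimal sketch of receiver-side sequencing: hold out-of-order packets,
# detect gaps, and release a gap-free, in-order stream (illustrative only).

class ReorderBuffer:
    def __init__(self):
        self.next_seq = 0          # next sequence number expected
        self.pending = {}          # out-of-order packets keyed by sequence number

    def receive(self, seq, payload):
        """Accept a packet; return the payloads that are now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []              # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

    def missing(self):
        """Sequence numbers detected as gaps (candidates for repair requests)."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]

buf = ReorderBuffer()
print(buf.receive(0, "a"))   # ['a']
print(buf.receive(2, "c"))   # []  -> gap at sequence number 1
print(buf.missing())         # [1]
print(buf.receive(1, "b"))   # ['b', 'c']
```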

Reliability

The exact meaning of reliability depends on the specific protocol instance. A minimal definition of reliable multicast is eventual delivery of all the data to all the group members, without enforcing any particular delivery order. [1] However, not all reliable multicast protocols ensure this level of reliability; many of them trade efficiency for reliability, in different ways. For example, while TCP makes the sender responsible for transmission reliability, multicast NAK-based protocols shift the responsibility to the receivers: the sender never knows for sure that all the receivers have in fact received all the data. [2] RFC 2887 explores the design space for bulk data transfer, with a brief discussion of the various issues and some hints at the possible different meanings of reliable.
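
The NAK-based division of responsibility can be illustrated with a simplified sketch (not the design of any specific protocol; the bounded repair window and the multicast and send_nak callbacks are assumptions made for this example): the receiver detects gaps and requests repairs, while the sender keeps only a limited history and never learns positively that every receiver has everything.

```python
# Simplified sketch of NAK-based repair: the receiver reports losses, the
# sender answers repair requests from a bounded history (illustrative only).

class NakSender:
    def __init__(self, multicast, window=1024):
        self.multicast = multicast   # callback that sends (seq, payload) to the group
        self.window = window
        self.history = {}            # recent packets kept for possible repairs
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = payload
        if len(self.history) > self.window:
            self.history.pop(min(self.history))      # forget the oldest packet
        self.multicast(seq, payload)

    def on_nak(self, seq):
        if seq in self.history:                      # repair only if still buffered
            self.multicast(seq, self.history[seq])

class NakReceiver:
    def __init__(self, send_nak):
        self.send_nak = send_nak     # callback that asks the sender for a repair
        self.highest_seen = -1
        self.received = set()

    def on_packet(self, seq, payload):
        self.received.add(seq)
        if seq > self.highest_seen:
            # every sequence number skipped over is a detected loss
            for missing in range(self.highest_seen + 1, seq):
                if missing not in self.received:
                    self.send_nak(missing)           # receiver-driven loss report
            self.highest_seen = seq
```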

Reliable Group Data Delivery

Reliable Group Data Delivery (RGDD) is a form of multicasting in which an object is to be moved from a single source to a fixed set of receivers known before transmission begins. [3] [4] A variety of applications may need such delivery: the Hadoop Distributed File System (HDFS) replicates each chunk of data to two additional specific servers, VM replication to multiple servers may be required for the scale-out of applications, and data replication to multiple servers may be necessary for load balancing by allowing multiple servers to serve the same data from their local cached copies. Such delivery is frequent within datacenters because of the large number of servers communicating while running highly distributed applications.
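
The pattern can be shown with an in-process sketch in which the receiver set is fixed before the transfer starts and each member stores the object and forwards it to the next member of a chain, in the style of an HDFS write pipeline. The node names and chunk contents below are hypothetical and the chain is only simulated in memory.

```python
# In-process sketch of RGDD with a fixed, pre-known receiver set: each node
# stores the chunk and forwards it to the next node in the chain.

def pipeline_replicate(chunk, chain, stores):
    """Each node in the chain stores the chunk, then forwards it down the chain."""
    if not chain:
        return
    head, rest = chain[0], chain[1:]
    stores.setdefault(head, []).append(chunk)
    pipeline_replicate(chunk, rest, stores)   # "forward" to the next receiver

receivers = ["dn1", "dn2", "dn3"]             # fixed and known before transmission
stores = {}
for block in [b"block-0", b"block-1"]:
    pipeline_replicate(block, receivers, stores)

# The transfer is complete only when every receiver holds every block.
assert all(stores[r] == [b"block-0", b"block-1"] for r in receivers)
```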

RGDD may also occur across datacenters and is sometimes referred to as inter-datacenter Point to Multipoint (P2MP) Transfers. [5] Such transfers deliver huge volumes of data from one datacenter to multiple datacenters for various applications: search engines distribute search index updates periodically (e.g. every 24 hours), social media applications push new content to many cache locations across the world (e.g. YouTube and Facebook), and backup services make several geographically dispersed copies for increased fault tolerance. To maximize bandwidth utilization and reduce completion times of bulk transfers, a variety of techniques have been proposed for selection of multicast forwarding trees. [5] [6]
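
One simple heuristic for building such a forwarding tree (a sketch only, not the algorithm of the cited systems) is to merge the shortest weighted paths from the source to each receiver, so that links shared by several receivers carry the transfer only once. The topology and link weights below are arbitrary examples.

```python
import heapq

# Sketch: build a forwarding tree by merging shortest weighted paths from the
# source to each receiver over a datacenter-level graph (illustrative heuristic).

def shortest_paths(graph, source):
    """Dijkstra over {node: {neighbor: weight}}; returns a predecessor map."""
    dist, prev, heap = {source: 0}, {}, [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (dist[v], v))
    return prev

def forwarding_tree(graph, source, receivers):
    prev = shortest_paths(graph, source)
    edges = set()
    for r in receivers:
        node = r
        while node != source:                 # walk each receiver back to the source
            edges.add((prev[node], node))
            node = prev[node]
    return edges

# Hypothetical inter-datacenter topology with link weights (e.g. cost or latency).
g = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 1, "D": 3},
     "C": {"A": 4, "B": 1, "D": 1}, "D": {"B": 3, "C": 1}}
print(forwarding_tree(g, "A", ["C", "D"]))    # e.g. {('A','B'), ('B','C'), ('C','D')}
```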

Virtual synchrony

Modern systems like the Spread Toolkit, Quicksilver, and Corosync can achieve data rates of 10,000 multicasts per second or more, and can scale to large networks with huge numbers of groups or processes.

Most distributed computing platforms support one or more of these models. For example, the widely supported object-oriented CORBA platforms all support transactions and some CORBA products support transactional replication in the one-copy-serializability model. The "CORBA Fault Tolerant Objects standard" is based on the virtual synchrony model. Virtual synchrony was also used in developing the New York Stock Exchange fault-tolerance architecture, the French Air Traffic Control System, the US Navy AEGIS system, IBM's Business Process replication architecture for WebSphere and Microsoft's Windows Clustering architecture for Windows Longhorn enterprise servers. [7]

Systems that support virtual synchrony

Virtual synchrony was first supported by a system developed at Cornell University called the "Isis Toolkit". [8] Cornell's most recent version, Vsync, was released in 2013 under the name Isis2 (the name was changed from Isis2 to Vsync in 2015 in the wake of a terrorist attack in Paris by an extremist organization called ISIS), with periodic updates and revisions since that time. The most recent stable release, V2.2.2020, was released on November 14, 2015; the V2.2.2048 release is currently available in beta form. [9] Vsync is aimed at the massive data centers that support cloud computing.

Other such systems include the Horus system, [10] the Transis system, the Totem system, an IBM system called Phoenix, a distributed security key management system called Rampart, the "Ensemble system", [11] the Quicksilver system, "The OpenAIS project", [12] its derivative the Corosync Cluster Engine, and a number of products (including the IBM and Microsoft ones mentioned earlier).

Other existing or proposed protocols

Library support

Related Research Articles

Multicast: Computer networking technique for transmission from one sender to multiple receivers

In computer networking, multicast is group communication where data transmission is addressed to a group of destination computers simultaneously. Multicast can be one-to-many or many-to-many distribution. Multicast should not be confused with physical layer point-to-multipoint communication.

In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in a different address space, which is written as if it were a normal (local) procedure call, without the programmer explicitly writing the details for the remote interaction. That is, the programmer writes essentially the same code whether the subroutine is local to the executing program, or remote. This is a form of client–server interaction, typically implemented via a request–response message-passing system. In the object-oriented programming paradigm, RPCs are represented by remote method invocation (RMI). The RPC model implies a level of location transparency, namely that calling procedures are largely the same whether they are local or remote, but usually, they are not identical, so local calls can be distinguished from remote calls. Remote calls are usually orders of magnitude slower and less reliable than local calls, so distinguishing them is important.
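
A minimal round trip using Python's standard-library XML-RPC modules makes the point concrete that the call site reads like a local call even though the procedure runs elsewhere. The loopback address, port 8000, and the add function are arbitrary choices for this example.

```python
# A minimal RPC round trip with the standard-library xmlrpc modules.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    return a + b

# Serve the procedure in a background thread on an arbitrary local port.
server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy("http://127.0.0.1:8000")
print(client.add(2, 3))   # 5 -- written like a local call, executed remotely
server.shutdown()
```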

Message-oriented middleware (MOM) is software or hardware infrastructure supporting sending and receiving messages between distributed systems. MOM allows application modules to be distributed over heterogeneous platforms and reduces the complexity of developing applications that span multiple operating systems and network protocols. The middleware creates a distributed communications layer that insulates the application developer from the details of the various operating systems and network interfaces. APIs that extend across diverse platforms and networks are typically provided by MOM.

Content delivery network: Layer in the internet ecosystem addressing bottlenecks

A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance by distributing the service spatially relative to end users. CDNs came into existence in the late 1990s as a means for alleviating the performance bottlenecks of the Internet as the Internet was starting to become a mission-critical medium for people and enterprises. Since then, CDNs have grown to serve a large portion of the Internet content today, including web objects, downloadable objects, applications, live streaming media, on-demand streaming media, and social media sites.

An overlay network is a computer network that is layered on top of another network. The concept of overlay networking is distinct from the traditional model of OSI layered networks, and almost always assumes that the underlay network is an IP network of some kind.

In software architecture, publish–subscribe is a messaging pattern where publishers categorize messages into classes that are received by subscribers. This is contrasted to the typical messaging pattern model where publishers send messages directly to subscribers.
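
A small in-process sketch of the pattern (the Broker class and the topic name are invented for illustration): publishers send to a class of messages, or topic, without knowing which subscribers, if any, will receive them.

```python
# Tiny in-process publish-subscribe sketch: publishers address topics, not receivers.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)                  # deliver to every current subscriber

broker = Broker()
broker.subscribe("quotes", lambda m: print("receiver A:", m))
broker.subscribe("quotes", lambda m: print("receiver B:", m))
broker.publish("quotes", {"symbol": "XYZ", "price": 10.0})
```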

Push technology, also known as server push, refers to a method of communication on the Internet where the initial request for a transaction is initiated by the server, rather than the client. This approach is different from the more commonly known "pull" method, where information transmission is requested by the receiver or client.

IP multicast is a method of sending Internet Protocol (IP) datagrams to a group of interested receivers in a single transmission. It is the IP-specific form of multicast and is used for streaming media and other network applications. It uses specially reserved multicast address blocks in IPv4 and IPv6.
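
For example, a receiver joins an IPv4 multicast group with an ordinary UDP socket and the IP_ADD_MEMBERSHIP socket option. The group address 224.1.1.1 and port 5007 below are arbitrary choices from the multicast address range, used only for illustration.

```python
# Join an IPv4 multicast group and wait for one datagram (illustrative values).
import socket
import struct

GROUP, PORT = "224.1.1.1", 5007

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Ask the kernel to join the group on the default interface (0.0.0.0).
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, sender = sock.recvfrom(1024)   # blocks until a datagram arrives for the group
print(sender, data)
```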

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

In computer science, state machine replication (SMR) or state machine approach is a general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas. The approach also provides a framework for understanding and designing replication management protocols.
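
The idea in miniature: replicas that start in the same state and apply the same deterministic commands in the same order end in the same state. The counter state machine and command log below are invented for illustration, and the total ordering of commands would in practice come from an atomic broadcast or consensus layer, which is omitted here.

```python
# Sketch of state machine replication with a trivial deterministic state machine.

class CounterReplica:
    def __init__(self):
        self.value = 0

    def apply(self, command):
        op, amount = command
        if op == "add":
            self.value += amount
        elif op == "set":
            self.value = amount

log = [("add", 5), ("set", 2), ("add", 1)]       # the agreed, totally ordered log
replicas = [CounterReplica() for _ in range(3)]
for replica in replicas:
    for command in log:                          # same commands, same order
        replica.apply(command)

assert len({r.value for r in replicas}) == 1     # all replicas converge to 3
```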

Pragmatic General Multicast (PGM) is a reliable multicast computer network transport protocol. PGM provides a reliable sequence of packets to multiple recipients simultaneously, making it suitable for applications like multi-receiver file-transfer.

WAN optimization is a collection of techniques for improving data transfer across wide area networks (WANs). In 2008, the WAN optimization market was estimated at $1 billion and was expected to grow to $4.4 billion by 2014, according to Gartner, a technology research firm. In 2015, Gartner estimated the WAN optimization market at $1.1 billion.

Paxos is a family of protocols for solving consensus in a network of unreliable or fallible processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communications may experience failures.

Fault-tolerant messaging, in the context of computer systems and networks, refers to a design approach and set of techniques aimed at ensuring reliable and continuous communication between components or nodes even in the presence of errors or failures. This concept is especially critical in distributed systems, where components may be geographically dispersed and interconnected through networks, making them susceptible to various potential points of failure.

A gossip protocol or epidemic protocol is a procedure or process of computer peer-to-peer communication that is based on the way epidemics spread. Some distributed systems use peer-to-peer gossip to ensure that data is disseminated to all members of a group. Some ad-hoc networks have no central registry and the only way to spread common data is to rely on each member to pass it along to their neighbors.
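
A small simulation of push gossip (illustrative only; the group size and seed are arbitrary): in each round, every node that already holds the update forwards it to one randomly chosen peer, and the update typically reaches the whole group in a number of rounds that grows only logarithmically with the group size.

```python
# Simulate push gossip: count the rounds until every node holds the update.
import random

def gossip_rounds(num_nodes, seed=None):
    rng = random.Random(seed)
    informed = {0}                               # node 0 starts with the update
    rounds = 0
    while len(informed) < num_nodes:
        newly = set()
        for node in informed:
            peer = rng.randrange(num_nodes)      # push to one random peer
            newly.add(peer)
        informed |= newly
        rounds += 1
    return rounds

print(gossip_rounds(1000, seed=42))              # typically a small number of rounds
```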

Live distributed object

Live distributed object refers to a running instance of a distributed multi-party protocol, viewed from the object-oriented perspective, as an entity that has a distinct identity, may encapsulate internal state and threads of execution, and that exhibits a well-defined externally visible behavior.

Software-defined networking (SDN) technology is an approach to network management that enables dynamic, programmatically efficient network configuration to improve network performance and monitoring, in a manner more akin to cloud computing than to traditional network management. SDN is meant to address the static architecture of traditional networks and may be employed to centralize network intelligence in one network component by disassociating the forwarding process of network packets from the routing process. The control plane consists of one or more controllers, which are considered the brains of the SDN network, where the whole intelligence is incorporated. However, centralization has certain drawbacks related to security, scalability and elasticity.

Gbcast is a reliable multicast protocol that provides ordered, fault-tolerant (all-or-none) message delivery in a group of receivers within a network of machines that experience crash failure. The protocol is capable of solving Consensus in a network of unreliable processors, and can be used to implement state machine replication. Gbcast can be used in a standalone manner, or can support the virtual synchrony execution model, in which case Gbcast is normally used for group membership management while other, faster, protocols are often favored for routine communication tasks.

Kenneth P. Birman is a professor in the Department of Computer Science at Cornell University. He currently holds the N. Rama Rao Chair in Computer Science.

The Vsync software library is a BSD-licensed open source library written in C# for the .NET platform, providing a wide variety of primitives for fault-tolerant distributed computing, including: state machine replication, virtual synchrony process groups, atomic broadcast with several levels of ordering and durability, a distributed lock manager, persistent replicated data, a distributed key-value store, and scalable aggregation. The system implements the virtual synchrony execution model, and includes an implementation of Leslie Lamport's Paxos Protocol.

References

  1. Floyd, S.; Jacobson, V.; Liu, C. -G.; McCanne, S.; Zhang, L. (December 1997). "A reliable multicast framework for light-weight sessions and application level framing". IEEE/ACM Transactions on Networking. 5 (6): 784–803. doi:10.1109/90.650139. S2CID   221634489.
  2. Diot, C.; Dabbous, W.; Crowcroft, J. (April 1997). "Multipoint communication: A survey of protocols, functions, and mechanisms" (PDF). IEEE Journal on Selected Areas in Communications. 15 (3): 277–290. doi:10.1109/49.564128.
  3. C. Guo; et al. (November 1, 2012). "Datacast: A Scalable and Efficient Reliable Group Data Delivery Service For Data Centers". ACM. Retrieved July 26, 2017.
  4. T. Zhu; et al. (Oct 18, 2016). "MCTCP: Congestion-aware and robust multicast TCP in Software-Defined networks". 2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS). IEEE. pp. 1–10. doi:10.1109/IWQoS.2016.7590433. ISBN   978-1-5090-2634-0. S2CID   28159768.
  5. M. Noormohammadpour; et al. (July 10, 2017). "DCCast: Efficient Point to Multipoint Transfers Across Datacenters". USENIX. Retrieved July 26, 2017.
  6. M. Noormohammadpour; et al. (2018). "QuickCast: Fast and Efficient Inter-Datacenter Transfers using Forwarding Tree Cohorts" . Retrieved January 23, 2018.
  7. K. P. Birman (July 1999). "A Review of Experiences with Reliable Multicast". Software: Practice and Experience. 29 (9): 741–774. doi:10.1002/(SICI)1097-024X(19990725)29:9<741::AID-SPE259>3.0.CO;2-I. hdl: 1813/7380 .
  8. "Isis Toolkit"
  9. "Vsync Cloud Computing Library".
  10. "Horus system"
  11. "Ensemble system"
  12. "The OpenAIS project"
  13. https://infoscience.epfl.ch/record/32309

Further reading