Vsync (computing)

The Vsync software library is a BSD-licensed open source library written in C# for the .NET platform, providing a wide variety of primitives for fault-tolerant distributed computing, including: state machine replication, virtual synchrony process groups, atomic broadcast with several levels of ordering and durability, a distributed lock manager, persistent replicated data, a distributed key-value store (also called a distributed hash table, or DHT), and scalable aggregation. The system implements the virtual synchrony execution model, and includes an implementation of Leslie Lamport's Paxos protocol.
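
The following is a minimal, language-agnostic sketch of the virtual synchrony idea, written in Python; it is not the Vsync C# API, and all class and method names are hypothetical. The point it illustrates is that every group member observes membership changes ("views") and multicasts as one totally ordered sequence of events.

    # A toy illustration of virtual synchrony (NOT the Vsync C# API;
    # all names here are hypothetical). Every member sees the same
    # sequence of events: membership changes and multicasts.

    class Member:
        def __init__(self, name):
            self.name = name
            self.view = []          # last membership view delivered
            self.log = []           # events in delivery order

        def deliver(self, event):
            kind, payload = event
            if kind == "view":
                self.view = list(payload)
            self.log.append(event)

    class Group:
        """Totally ordered delivery of views and multicasts to all members."""
        def __init__(self):
            self.members = []

        def _broadcast(self, event):
            for m in self.members:
                m.deliver(event)

        def join(self, member):
            self.members.append(member)
            # a membership change is delivered as an event, like a message
            self._broadcast(("view", [m.name for m in self.members]))

        def multicast(self, message):
            self._broadcast(("msg", message))

    g = Group()
    a, b = Member("a"), Member("b")
    g.join(a)
    g.join(b)
    g.multicast("set x = 1")
    assert a.log[-2:] == b.log[-2:]   # both saw the same view, then the same message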

The main author is Ken Birman, a professor of computer science at Cornell University, and the library is the fourth in a series of Cornell-developed software libraries for reliable multicast. The first was the Isis Toolkit, developed in 1985 and ultimately used in the New York Stock Exchange, the French air traffic control system, the US Navy's AEGIS system, and other settings.[1]

Subsequent generations of the technology included the Horus System[2] and the Ensemble System.[3]

Vsync was originally released as "Isis2" in 2010, but Birman later renamed the package to avoid the name's similarity to ISIS. The name Vsync is a reference to the formal model used by the system, namely virtual synchrony.

Related Research Articles

Checkpointing is a technique that provides fault tolerance for computing systems. It consists of saving a snapshot of the application's state so that the application can restart from that point in case of failure. This is particularly important for long-running applications executing on failure-prone computing systems.
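
As a rough illustration in Python (the file name, format, and checkpoint interval are arbitrary assumptions), a long-running computation can periodically save its state and resume from the latest snapshot after a restart:

    # Toy checkpointing: periodically save the application state so a
    # restart resumes from the last snapshot instead of the beginning.
    import os
    import pickle

    CHECKPOINT = "state.ckpt"    # hypothetical checkpoint file

    def load_state():
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT, "rb") as f:
                return pickle.load(f)          # resume after a crash
        return {"next_item": 0, "total": 0}    # fresh start

    def save_state(state):
        with open(CHECKPOINT, "wb") as f:
            pickle.dump(state, f)

    state = load_state()
    for i in range(state["next_item"], 1_000):
        state["total"] += i
        state["next_item"] = i + 1
        if i % 100 == 0:                       # checkpoint every 100 items
            save_state(state)
    save_state(state)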

Amoeba is a distributed operating system developed by Andrew S. Tanenbaum and others at the Vrije Universiteit Amsterdam. The aim of the Amoeba project was to build a timesharing system that makes an entire network of computers appear to the user as a single machine. Development at the Vrije Universiteit was stopped: the source code of the latest version (5.3) was last modified on 30 July 1996.

<span class="mw-page-title-main">Concurrency (computer science)</span> Ability to execute a task in a non-serial manner

In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out of order or in partial order without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve the overall speed of execution on multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability of a program, algorithm, or problem into order-independent or partially ordered components or units of computation.
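
For example, the partial sums below are order-independent units of computation: they can be evaluated in any order, or in parallel, without changing the final result (a Python sketch):

    # Order-independent decomposition: partial sums may complete in any
    # order (or concurrently) and still combine to the same total.
    from concurrent.futures import ThreadPoolExecutor

    def partial_sum(chunk):
        return sum(chunk)

    data = list(range(1_000))
    chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(partial_sum, chunks))   # chunks may finish out of order

    assert sum(results) == sum(data)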

In software architecture, publish–subscribe is a messaging pattern in which publishers categorize messages into classes that are received by subscribers. This contrasts with the typical messaging model in which publishers send messages directly to subscribers.
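
A minimal in-process sketch of the pattern in Python follows; the broker, topic names, and callbacks are illustrative assumptions, not any particular product's API:

    # Toy publish/subscribe: publishers tag messages with a topic
    # ("class") and never address subscribers directly.
    from collections import defaultdict

    class Broker:
        def __init__(self):
            self.subscribers = defaultdict(list)   # topic -> callbacks

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def publish(self, topic, message):
            for callback in self.subscribers[topic]:
                callback(message)

    broker = Broker()
    broker.subscribe("stock.IBM", lambda m: print("trader saw:", m))
    broker.subscribe("stock.IBM", lambda m: print("auditor saw:", m))
    broker.publish("stock.IBM", {"price": 172.5})   # publisher knows only the topic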

Scott J. Shenker is an American computer scientist, and professor of computer science at the University of California, Berkeley. He is also the leader of the Extensible Internet Group at the International Computer Science Institute in Berkeley, California.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

In computer science, state machine replication (SMR) or state machine approach is a general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas. The approach also provides a framework for understanding and designing replication management protocols.
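
In miniature (a Python sketch in which the command order is simply given, rather than agreed on by a consensus or atomic-broadcast protocol), replicas that apply the same deterministic commands in the same order remain identical:

    # State machine replication in miniature: each replica is a
    # deterministic state machine, and all replicas apply the same
    # commands in the same order, so their states stay identical.
    class Replica:
        def __init__(self):
            self.state = {}

        def apply(self, command):
            op, key, value = command
            if op == "put":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)

    log = [("put", "x", 1), ("put", "y", 2), ("delete", "x", None)]

    replicas = [Replica() for _ in range(3)]
    for cmd in log:                 # same commands, same order, on every replica
        for r in replicas:
            r.apply(cmd)

    assert all(r.state == {"y": 2} for r in replicas)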

Paxos is a family of protocols for solving consensus in a network of unreliable or fallible processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communications may experience failures.
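
A highly simplified, failure-free Python sketch of a single Paxos decision (the two-phase "synod" protocol) is shown below; it omits networking, retries, and crash handling, and is meant only to show why a value, once chosen, stays chosen:

    # Simplified single-decree Paxos: Phase 1 collects promises from a
    # majority of acceptors; Phase 2 asks a majority to accept a value,
    # adopting any value an acceptor has already accepted.
    class Acceptor:
        def __init__(self):
            self.promised = -1        # highest ballot promised
            self.accepted = None      # (ballot, value) accepted so far

        def prepare(self, ballot):
            if ballot > self.promised:
                self.promised = ballot
                return True, self.accepted
            return False, None

        def accept(self, ballot, value):
            if ballot >= self.promised:
                self.promised = ballot
                self.accepted = (ballot, value)
                return True
            return False

    def propose(acceptors, ballot, value):
        # Phase 1: gather promises from a majority.
        promises = [a.prepare(ballot) for a in acceptors]
        granted = [acc for ok, acc in promises if ok]
        if len(granted) <= len(acceptors) // 2:
            return None
        # If any acceptor already accepted a value, propose that one instead.
        prior = [acc for acc in granted if acc is not None]
        if prior:
            value = max(prior)[1]
        # Phase 2: ask a majority to accept the (possibly adopted) value.
        votes = sum(a.accept(ballot, value) for a in acceptors)
        return value if votes > len(acceptors) // 2 else None

    acceptors = [Acceptor() for _ in range(3)]
    print(propose(acceptors, ballot=1, value="A"))   # -> "A"
    print(propose(acceptors, ballot=2, value="B"))   # -> "A": the earlier choice wins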

<span class="mw-page-title-main">Özalp Babaoğlu</span> Turkish computer scientist (born 1955)

Özalp Babaoğlu is a Turkish computer scientist. He is currently professor of computer science at the University of Bologna, Italy. He received a Ph.D. in 1981 from the University of California at Berkeley. He is the recipient of the 1982 Sakrison Memorial Award, the 1989 UNIX International Recognition Award, and the 1993 USENIX Association Lifetime Achievement Award for his contributions to the UNIX system community and to Open Industry Standards. Before moving to Bologna in 1988, Babaoğlu was an associate professor in the Department of Computer Science at Cornell University. He has participated in several European research projects in distributed computing and complex systems. Babaoğlu is an ACM Fellow and has served as a resident fellow of the Institute of Advanced Studies at the University of Bologna and on the editorial boards of ACM Transactions on Computer Systems, ACM Transactions on Autonomous and Adaptive Systems, and Springer-Verlag Distributed Computing.

<span class="mw-page-title-main">Werner Vogels</span> American computer scientist

Werner Hans Peter Vogels is the chief technology officer and vice president of Amazon in charge of driving technology innovation within the company. Vogels has broad internal and external responsibilities.

A gossip protocol or epidemic protocol is a procedure or process of computer peer-to-peer communication that is based on the way epidemics spread. Some distributed systems use peer-to-peer gossip to ensure that data is disseminated to all members of a group. Some ad-hoc networks have no central registry and the only way to spread common data is to rely on each member to pass it along to their neighbors.
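
A toy round-based "push" gossip simulation in Python follows; the node count and random peer selection are illustrative assumptions:

    # Push gossip: in each round, every node that knows the rumor
    # forwards it to one randomly chosen peer; with high probability the
    # rumor reaches all n nodes in roughly O(log n) rounds.
    import random

    nodes = list(range(20))
    informed = {0}                       # node 0 starts with the rumor

    rounds = 0
    while len(informed) < len(nodes):
        rounds += 1
        for node in list(informed):
            peer = random.choice(nodes)  # pick a random peer to "infect"
            informed.add(peer)

    print(f"all {len(nodes)} nodes informed after {rounds} rounds")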

A reliable multicast is any computer networking protocol that provides a reliable sequence of packets to multiple recipients simultaneously, making it suitable for applications such as multi-receiver file transfer.

<span class="mw-page-title-main">Live distributed object</span>

Live distributed object refers to a running instance of a distributed multi-party protocol, viewed from the object-oriented perspective, as an entity that has a distinct identity, may encapsulate internal state and threads of execution, and that exhibits a well-defined externally visible behavior.

The Corosync Cluster Engine is an open source implementation of the Totem Single Ring Ordering and Membership protocol. It was originally derived from the OpenAIS project and licensed under the new BSD License. The mission of the Corosync effort is to develop, release, and support a community-defined, open source cluster.

A distributed operating system is system software over a collection of independent, networked, communicating, and physically separate computational nodes. They handle jobs which are serviced by multiple CPUs. Each individual node holds a specific software subset of the global aggregate operating system. Each subset is a composite of two distinct service provisioners. The first is a ubiquitous minimal kernel, or microkernel, that directly controls that node's hardware. Second is a higher-level collection of system management components that coordinate the node's individual and collaborative activities. These components abstract microkernel functions and support user applications.

Gbcast is a reliable multicast protocol that provides ordered, fault-tolerant (all-or-none) message delivery in a group of receivers within a network of machines that experience crash failure. The protocol is capable of solving Consensus in a network of unreliable processors, and can be used to implement state machine replication. Gbcast can be used in a standalone manner, or can support the virtual synchrony execution model, in which case Gbcast is normally used for group membership management while other, faster, protocols are often favored for routine communication tasks.

Kenneth P. Birman is a professor in the Department of Computer Science at Cornell University. He currently holds the N. Rama Rao Chair in Computer Science.

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.
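
A small Python sketch of the chunking idea follows; the chunk size, server names, and hash-based placement rule are all illustrative assumptions rather than any specific system's design:

    # Toy chunk placement: split a file into fixed-size chunks and place
    # each chunk on a (hypothetical) server chosen by hashing, so that
    # different chunks can be read or processed in parallel.
    import hashlib

    CHUNK_SIZE = 64 * 1024          # 64 KiB chunks; real systems use much larger ones
    SERVERS = ["server-1", "server-2", "server-3"]

    def place_chunks(data: bytes):
        placement = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            server = SERVERS[int(digest, 16) % len(SERVERS)]
            placement.append((i // CHUNK_SIZE, server, digest[:12]))
        return placement

    for index, server, digest in place_chunks(b"x" * 200_000):
        print(f"chunk {index} -> {server} (id {digest})")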

Sanjay Ghemawat is an Indian American computer scientist and software engineer. He is currently a Senior Fellow at Google in the Systems Infrastructure Group. Ghemawat's work at Google, much of it in close collaboration with Jeff Dean, has included the big data processing model MapReduce, the Google File System, and the databases Bigtable and Spanner. Wired has described him as one of the "most important software engineers of the internet age".

References

  1. Ken Birman (2010). "A history of the virtual synchrony replication model". In Charron-Bost, Bernadette; Pedone, Fernando; Schiper, André (eds.). Replication (PDF). Berlin, Heidelberg: Springer-Verlag. pp. 91–120.
  2. Robbert Van Renesse, Silvio Maffeis and Ken Birman (April 1996). "Horus: A Flexible Group Communications System". Communications of the ACM. 39 (4): 76–83. doi:10.1145/227210.227229. S2CID 1400110.
  3. Xiaoming Liu; Christoph Kreitz; Robbert van Renesse; Jason Hickey; Mark Hayden; Ken Birman & Robert Constable (December 1999). "Building Reliable, High-Performance Communication Systems from Components" (PDF). In Proc. of the 17th ACM Symposium on Operating Systems Principles, Kiawah Island Resort, SC.