Heartbeat (computing)


In computer science, a heartbeat is a periodic signal generated by hardware or software to indicate normal operation or to synchronize other parts of a computer system. [1] [2] The heartbeat mechanism is one of the common techniques in mission-critical systems for providing high availability and fault tolerance of network services. It detects network or system failures of nodes or daemons that belong to a network cluster administered by a master server, so that the system can automatically adapt and rebalance, using the remaining redundant nodes in the cluster to take over the load of failed nodes and keep services running. [3] [1] Usually a heartbeat message is sent between machines at a regular interval on the order of seconds. [4] If the endpoint does not receive a heartbeat for a time (usually a few heartbeat intervals), the machine that should have sent the heartbeat is assumed to have failed. [5] Heartbeat messages are typically sent continuously on a periodic or recurring basis from the originator's start-up until the originator's shutdown. When the destination identifies a lack of heartbeat messages during an anticipated arrival period, the destination may determine that the originator has failed, shut down, or is generally no longer available.
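The timeout rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name `HeartbeatMonitor`, the parameters `interval` and `missed_allowed`, and the injectable `clock` are all assumptions made for the example.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per sender and reports senders that went silent.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(self, interval=1.0, missed_allowed=3, clock=time.monotonic):
        # A sender is presumed failed after "a few heartbeat intervals" of silence.
        self.timeout = interval * missed_allowed
        self.clock = clock
        self.last_seen = {}

    def record(self, sender):
        """Call whenever a heartbeat message arrives from `sender`."""
        self.last_seen[sender] = self.clock()

    def failed(self):
        """Return the senders whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)
```

Passing the clock in as a parameter keeps the detector testable without real waiting; in normal use the default monotonic clock avoids false alarms caused by wall-clock adjustments.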


Heartbeat protocol

A heartbeat protocol is generally used to negotiate and monitor the availability of a resource, such as a floating IP address. The procedure involves sending network packets to all the nodes in the cluster to verify their reachability. [3] Typically, when a heartbeat starts on a machine, it performs an election process with the other machines on the heartbeat network to determine which machine, if any, owns the resource. On heartbeat networks of more than two machines, it is important to take partitioning into account, where two halves of the network could be functioning but unable to communicate with each other. In such a situation, it is important that the resource is owned by only one machine, not by one machine in each partition.
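One common way to guarantee a single owner under partitioning is a majority quorum. The text above does not prescribe a method, so the following is only a sketch under that assumption; the function name `may_own_resource` and its parameters are hypothetical.

```python
def may_own_resource(reachable_nodes, cluster_size):
    """Return True if this partition holds a strict majority of the cluster.

    `reachable_nodes` is the set of nodes this machine can currently reach,
    including itself. Majority quorum is an assumed policy, not the only one.
    """
    return len(set(reachable_nodes)) > cluster_size // 2
```

With five nodes split three to two, only the three-node side may own the resource. With an even split, such as two against two in a four-node cluster, neither side qualifies, which trades availability for the guarantee that two owners never coexist.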

As a heartbeat is intended to indicate the health of a machine, it is important that the heartbeat protocol and the transport it runs on are as reliable as possible. Causing a failover because of a false alarm may, depending on the resource, be highly undesirable. It is also important to react quickly to an actual failure, which further underlines the need for reliable heartbeat messages. For this reason, it is often desirable to run a heartbeat over more than one transport, for instance an Ethernet segment using UDP/IP and a serial link.
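A minimal sketch of how redundant transports can reduce false alarms: the peer is declared failed only when heartbeats are overdue on every transport. The function name, the dictionary layout, and the transport labels are assumptions made for illustration.

```python
def peer_failed(last_seen_by_transport, now, timeout):
    """Declare a peer failed only if every transport has gone silent.

    `last_seen_by_transport` maps a transport label (for example "udp" or
    "serial", both hypothetical) to the time of the last heartbeat seen on it.
    An empty mapping is treated as "unknown", not as a failure.
    """
    if not last_seen_by_transport:
        return False
    return all(now - seen > timeout for seen in last_seen_by_transport.values())
```

If the Ethernet link drops but the serial link still carries heartbeats, the call returns `False` and no failover is triggered.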

The "cluster membership" of a node is a property of network reachability: if the master can communicate with the node, the node is considered a member of the cluster; otherwise it is considered "dead". [6] A heartbeat program as a whole consists of various subsystems. [7]

Heartbeat messages are sent periodically through techniques such as broadcast or, in larger clusters, multicast. [6] Since cluster managers (CMs) have transactions across the cluster, the most common pattern is to send heartbeat messages to all the nodes and "await" responses in a non-blocking fashion. [8] Heartbeat or keepalive messages make up the overwhelming majority of non-application-related cluster control messages, and they go to all the members of the cluster. For this reason, major critical systems also deliver heartbeats over non-IP channels such as serial ports. [9]
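The "send to all, await responses without blocking" pattern can be sketched with Python's `asyncio`. The network is simulated here: `ping` merely sleeps for a per-node delay, and the names `ping`, `collect_acks`, and `delays` are hypothetical.

```python
import asyncio


async def ping(node, delay):
    """Stand-in for sending a heartbeat and waiting for the reply (simulated)."""
    await asyncio.sleep(delay)
    return node


async def collect_acks(delays, timeout):
    """Ping every node concurrently; map each node to True if it answered in time."""

    async def one(node, delay):
        try:
            await asyncio.wait_for(ping(node, delay), timeout)
            return node, True
        except asyncio.TimeoutError:
            return node, False

    # gather() runs all pings concurrently, so one slow node delays nobody else.
    results = await asyncio.gather(*(one(n, d) for n, d in delays.items()))
    return dict(results)
```

For example, `asyncio.run(collect_acks({"a": 0.0, "b": 0.5}, timeout=0.1))` marks node `a` as responsive and node `b` as timed out, and the whole round takes roughly one timeout rather than the sum of all waits.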

Design and implementation

Every CM on the master server maintains a finite-state machine with three states for each node it administers: Down, Init, and Alive. [10] Whenever a new node joins, the CM changes the state of the node from Down to Init and broadcasts a "boot-up message". The node receives this message and executes a set of start-up procedures, then responds with an acknowledgment message. The CM then includes the node as a member of the cluster and transitions the state of the node from Init to Alive. Every node in the Alive state receives a periodic broadcast heartbeat message from the heartbeat subsystem (HS), which expects an acknowledgment message back within a timeout range. If the CM does not receive an acknowledgment message back, the node is considered unavailable, and the CM transitions that node from Alive to Down. [11] The procedures or scripts to run, and the actions to take on each state transition, are implementation details of the system.
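The per-node state machine can be written as a small transition table. The three states come from the description above; the event names (`join`, `boot_ack`, `heartbeat_ack`, `ack_timeout`) are labels invented for this sketch.

```python
from enum import Enum


class State(Enum):
    DOWN = "Down"
    INIT = "Init"
    ALIVE = "Alive"


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.DOWN, "join"): State.INIT,            # CM broadcasts the boot-up message
    (State.INIT, "boot_ack"): State.ALIVE,       # node acknowledged start-up
    (State.ALIVE, "heartbeat_ack"): State.ALIVE, # node answered a heartbeat
    (State.ALIVE, "ack_timeout"): State.DOWN,    # no acknowledgment within the timeout
}


def step(state, event):
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a table makes it easy to attach implementation-specific scripts or actions to individual entries.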

Heartbeat network

A heartbeat network is a private network that is shared only by the nodes in the cluster and is not accessible from outside the cluster. Cluster nodes use it to monitor each other's status and to exchange the messages necessary for maintaining the operation of the cluster. The heartbeat method relies on the FIFO nature of the signals sent across the network. By making sure that all messages have been received, the system ensures that events can be properly ordered. [12]

In this communications protocol, every node sends a message at a given interval, say delta, in effect confirming that it is alive and has a heartbeat. These messages are viewed as control messages that help determine that the network contains no delayed messages. A receiver node, called a "sync", maintains an ordered list of the received messages. Once a message with a timestamp later than a given marked time has been received from every node, the system determines that all messages up to that time have been received, since the FIFO property ensures that the messages from each node arrive in order. [13]
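The receiver's ordering rule can be sketched as follows, assuming FIFO delivery from each node. The class name `Sink`, its methods, and the convention that a heartbeat is a message with no payload are assumptions made for the example.

```python
class Sink:
    """Buffers messages and releases them in timestamp order once it is safe.

    Assumes each node's messages arrive in FIFO order, so a later timestamp
    from a node implies nothing earlier from that node is still in flight.
    """

    def __init__(self, nodes):
        self.latest = {n: float("-inf") for n in nodes}
        self.pending = []

    def receive(self, node, timestamp, payload=None):
        """Record a message; payload=None stands for a pure heartbeat."""
        self.latest[node] = timestamp
        if payload is not None:
            self.pending.append((timestamp, node, payload))

    def deliverable(self):
        """Release, in order, every message older than all nodes' latest timestamps."""
        horizon = min(self.latest.values())
        ready = sorted((m for m in self.pending if m[0] < horizon),
                       key=lambda m: (m[0], m[1]))
        self.pending = [m for m in self.pending if m[0] >= horizon]
        return ready
```

A message stays buffered until every node has been heard from at a later timestamp, which is why a silent node without heartbeats would stall delivery for everyone.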

In general, it is difficult to select a delta that is optimal for all applications. If delta is too small, it incurs too much overhead; if it is too large, performance degrades because everything waits for the next heartbeat signal. [14]
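The trade-off can be made concrete with a deliberately simplified model: each of the nodes sends one heartbeat per delta, and a receiver may wait up to one delta for the next heartbeat. Both the model and the function name `heartbeat_cost` are assumptions for illustration, not figures from the cited source.

```python
def heartbeat_cost(num_nodes, delta_seconds):
    """Estimate control traffic and worst-case wait for a given delta.

    Simplified model (an assumption): one heartbeat per node per delta,
    and at most one delta of waiting for the next heartbeat.
    """
    return {
        "messages_per_second": num_nodes / delta_seconds,
        "max_wait_seconds": delta_seconds,
    }
```

Under this model, halving delta doubles the heartbeat traffic while halving the worst-case wait, so the choice depends on which cost the application tolerates better.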


Notes

  1. Hou & Huang 2003, p. 1.
  2. "Definition of Heartbeat". pcmag.com Encyclopedia. Retrieved 7 October 2020.
  3. Robertson 2000, p. 1.
  4. US 4710926, Donald W. Brown, James W. Leth, James E. Vandendorpe, "Fault recovery in a distributed processing system", published 1987-12-01.
  5. Kawazoe Aguilera, Marcos; Chen, Wei; Toueg, Sam (1997). "Heartbeat: A timeout-free failure detector for quiescent reliable communication". Distributed Algorithms. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 126–140. doi:10.1007/bfb0030680. hdl:1813/7286. ISBN 978-3-540-63575-8. ISSN 0302-9743.
  6. Robertson 2000, p. 2.
  7. Robertson 2000, pp. 1–2.
  8. Robertson 2000, pp. 2–3.
  9. Robertson 2000, p. 5.
  10. Li, Yu & Wu 2009, p. 2.
  11. Li, Yu & Wu 2009, pp. 2–3.
  12. Nikoletseas 2011, p. 304.
  13. Nikoletseas 2011, pp. 304–305.
  14. Nikoletseas 2011, p. 306.


References