Circuit breaker design pattern

Last updated

Circuit breaker is a design pattern used in software development. It is used to detect failures and encapsulates the logic of preventing a failure from constantly recurring, during maintenance, temporary external system failure or unexpected system difficulties. Circuit breaker pattern prevents cascading failures particularly in distributed systems. [1]

Contents

Circuit breaker pattern should be used along with other patterns such as retry, fallback and timeout pattern. This helps the system to be more fault tolerant. [2]

Common uses

Assume that an application connects to a database 100 times per second and the database fails. The application designer does not want to have the same error reoccur constantly. They also want to handle the error quickly and gracefully without waiting for TCP connection timeout.

Generally Circuit Breaker can be used to check the availability of an external service. An external service can be a database server or a web service used by the application.

Circuit breaker detects failures and prevents the application from trying to perform the action that is doomed to fail (until it's safe to retry).

Implementation

Implementations of the Circuit Breaker Design Pattern need to retain the state of the connection over a series of requests. It must offload the logic to detect failures from the actual requests. Therefore, the state machine within the circuit breaker needs to operate in some sense concurrently with the requests passing through it. One way this can be achieved is asynchronously.

In a multi-node (clustered) server, the state of the upstream service will need to be reflected across all the nodes in the cluster. Therefore, implementations may need to use a persistent storage layer, e.g. a network cache such as Memcached or Redis, or local cache (disk or memory based) to record the availability of what is, to the application, an external service.

Circuit Breaker records the state of the external service on a given interval.

Before the external service is used from the application, the storage layer is queried to retrieve the current state.

Performance implication

While it's safe to say that the benefits outweigh the consequences, implementing Circuit Breaker will negatively affect storage space, application complexity, and computational cost to the executing application. This is because it adds additional code into the execution path to check for the state of the circuit. This can be seen in the PHP example below, where checking APC for the database status costs a few extra cycles. Also, running the circuit breaker code itself consumes resources on the system where it is running, thus leaving less execution power for "real" applications.[ why? ]

By how much depends on the storage layer used and generally available resources. The largest factors in this regard are the type of cache, for example, disk-based vs. memory-based and local vs. network.

Different states of circuit breaker

Closed state

When everything is normal, the circuit breakers remained closed, and all the request passes through to the services as shown below. If the number of failures increases beyond the threshold, the circuit breaker trips and goes into an open state.

Circuit Breaker Closed State Circuit Breaker -Closed state.png
Circuit Breaker Closed State

Open state

In this state circuit breaker returns an error immediately without even invoking the services. The Circuit breakers move into the half-open state after a timeout period elapses. Usually, it will have a monitoring system where the timeout will be specified.

Circuit Breaker Open State Circuit Breaker -Openstate.png
Circuit Breaker Open State

Half-open state

In this state, the circuit breaker allows a limited number of requests from the service to pass through and invoke the operation. If the requests are successful, then the circuit breaker will go to the closed state. However, if the requests continue to fail, then it goes back to Open state.

Circuit Breaker Half Open State Circuit Breaker -Half Open state.png
Circuit Breaker Half Open State

Example implementation

PHP

The following is a sample implementation in PHP. The proof of concept stores the status of a MySQL server into a shared memory cache (APC User Cache).

Check

The following script could be run on a set interval through crontab.

$mysqli=newmysqli("localhost","username","password");if($mysqli->connect_error){apcu_add("dbStatus","down");}else{apcu_add("dbStatus","up");$mysqli->close();}

Usage in an application

if(apcu_fetch("dbStatus")==="down"){echo"The database server is currently not available. Please try again in a minute.";exit;}$mysqli=newmysqli("localhost","username","password","database");$result=$mysqli->query("SELECT * FROM table");

Related Research Articles

<span class="mw-page-title-main">Web server</span> Computer software that distributes web pages

A web server is computer software and underlying hardware that accepts requests via HTTP or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiates communication by making a request for a web page or other resource using HTTP, and the server responds with the content of that resource or an error message. A web server can also accept and store resources sent from the user agent if configured to do so.

<span class="mw-page-title-main">Load balancing (computing)</span> Set of techniques to improve the distribution of workloads across multiple computing resources

In computing, load balancing is the process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient. Load balancing can optimize response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.

Memcached is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source must be read. Memcached is free and open-source software, licensed under the Revised BSD license. Memcached runs on Unix-like operating systems and on Microsoft Windows. It depends on the libevent library.

In computer networking, localhost is a hostname that refers to the current computer used to access it. The name localhost is reserved for loopback purposes. It is used to access the network services that are running on the host via the loopback network interface. Using the loopback interface bypasses any local network interface hardware.

An edge case is a problem or situation that occurs only at an extreme operating parameter. For example, a stereo speaker might noticeably distort audio when played at maximum volume, even in the absence of any other extreme setting or condition.

REST is a software architectural style that was created to guide the design and development of the architecture for the World Wide Web. REST defines a set of constraints for how the architecture of a distributed, Internet-scale hypermedia system, such as the Web, should behave. The REST architectural style emphasises uniform interfaces, independent deployment of components, the scalability of interactions between them, and creating a layered architecture to promote caching to reduce user-perceived latency, enforce security, and encapsulate legacy systems.

Code injection is a class of computer security exploits in which a vulnerable computer program is tricked into misinterpreting external data as part of its code. An attacker thereby introduces code into the program and changes the course of its execution. The result of successful code injection can be disastrous, for example, by allowing computer viruses or computer worms to propagate.

<span class="mw-page-title-main">RapidIO</span> High-speed interconnect technology

The RapidIO architecture is a high-performance packet-switched electrical connection technology. It supports messaging, read/write and cache coherency semantics. Based on industry-standard electrical specifications such as those for Ethernet, RapidIO can be used as a chip-to-chip, board-to-board, and chassis-to-chassis interconnect.

<span class="mw-page-title-main">Dependency injection</span> Software programming technique

In software engineering, dependency injection is a programming technique in which an object or function receives other objects or functions that it requires, as opposed to creating them internally. Dependency injection aims to separate the concerns of constructing objects and using them, leading to loosely coupled programs. The pattern ensures that an object or function that wants to use a given service should not have to know how to construct those services. Instead, the receiving 'client' is provided with its dependencies by external code, which it is not aware of. Dependency injection makes implicit dependencies explicit and helps solve the following problems:

In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only keys need to be remapped on average where is the number of keys and is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

A Domain Name System (DNS) zone file is a text file that describes a DNS zone. A DNS zone is a subset, often a single domain, of the hierarchical domain name structure of the DNS. The zone file contains mappings between domain names and IP addresses and other resources, organized in the form of text representations of resource records (RR). A zone file may be either a DNS master file, authoritatively describing a zone, or it may be used to list the contents of a DNS cache.

Database caching is a process included in the design of computer applications which generate web pages on-demand (dynamically) by accessing backend databases.

Web2py is an open-source web application framework written in the Python programming language. Web2py allows web developers to program dynamic web content using Python. Web2py is designed to help reduce tedious web development tasks, such as developing web forms from scratch, although a web developer may build a form from scratch if required.

Gearman is an open-source application framework designed to distribute appropriate computer tasks to multiple computers, so large tasks can be done more quickly. In some cases, load balancing rather than raw speed may be the main goal; a Web server, for instance, could use Gearman to send tasks for which it is not optimized to another computer.

IBM WebSphere Application Server for z/OS is one of the platform implementations of IBM's WebSphere Application Server family. The latest version is Version 9.0.

Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. Fault-tolerant software has the ability to satisfy requirements despite failures.

<span class="mw-page-title-main">Raft (algorithm)</span> Consensus algorithm

Raft is a consensus algorithm designed as an alternative to the Paxos family of algorithms. It was meant to be more understandable than Paxos by means of separation of logic, but it is also formally proven safe and offers some additional features. Raft offers a generic way to distribute a state machine across a cluster of computing systems, ensuring that each node in the cluster agrees upon the same series of state transitions. It has a number of open-source reference implementations, with full-specification implementations in Go, C++, Java, and Scala. It is named after Reliable, Replicated, Redundant, And Fault-Tolerant.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by a worldwide community of contributors, and the trademark is held by the Cloud Native Computing Foundation.

Asynchrony, in computer programming, refers to the occurrence of events independent of the main program flow and ways to deal with such events. These may be "outside" events such as the arrival of signals, or actions instigated by a program that take place concurrently with program execution, without the program hanging to wait for results. Asynchronous input/output is an example of the latter case of asynchrony, and lets programs issue commands to storage or network devices that service these requests while the processor continues executing the program. Doing so provides a degree of parallelism.

References

  1. Machine Learning in Microservices Productionizing Microservices Architecture for Machine Learning Solutions. Packt Publishing. 2023. ISBN   9781804612149.
  2. Kubernetes Native Microservices with Quarkus and MicroProfile. Manning. 2022. ISBN   9781638357155.