Circuit breaker design pattern

Last updated

Circuit breaker is a design pattern used in software development. It is used to detect failures and encapsulates the logic of preventing a failure from constantly recurring, during maintenance, temporary external system failure or unexpected system difficulties. Circuit breaker pattern prevents cascading failures particularly in distributed systems. [1]

Contents

Circuit breaker pattern should be used along with other patterns such as retry, fallback and timeout pattern. This helps the system to be more fault tolerant. [2]

Common uses

Assume that an application connects to a database 100 times per second and the database fails. The application designer does not want to have the same error reoccur constantly. They also want to handle the error quickly and gracefully without waiting for TCP connection timeout.

Generally Circuit Breaker can be used to check the availability of an external service. An external service can be a database server or a web service used by the application.

Circuit breaker detects failures and prevents the application from trying to perform the action that is doomed to fail (until it's safe to retry).

Implementation

Implementations of the Circuit Breaker Design Pattern need to retain the state of the connection over a series of requests. It must offload the logic to detect failures from the actual requests. Therefore, the state machine within the circuit breaker needs to operate in some sense concurrently with the requests passing through it. One way this can be achieved is asynchronously.

In a multi-node (clustered) server, the state of the upstream service will need to be reflected across all the nodes in the cluster. Therefore, implementations may need to use a persistent storage layer, e.g. a network cache such as Memcached or Redis, or local cache (disk or memory based) to record the availability of what is, to the application, an external service.

Circuit Breaker records the state of the external service on a given interval.

Before the external service is used from the application, the storage layer is queried to retrieve the current state.

Performance implication

While it's safe to say that the benefits outweigh the consequences, implementing Circuit Breaker will negatively affect storage space, application complexity, and computational cost to the executing application. This is because it adds additional code into the execution path to check for the state of the circuit. This can be seen in the PHP example below, where checking APC for the database status costs a few extra cycles. Also, running the circuit breaker code itself consumes resources on the system where it is running, thus leaving less execution power for "real" applications.[ why? ]

By how much depends on the storage layer used and generally available resources. The largest factors in this regard are the type of cache, for example, disk-based vs. memory-based and local vs. network.

Different states of circuit breaker

Closed state

When everything is normal, the circuit breakers remained closed, and all the request passes through to the services as shown below. If the number of failures increases beyond the threshold, the circuit breaker trips and goes into an open state.

Circuit Breaker Closed State Circuit Breaker -Closed state.png
Circuit Breaker Closed State

Open state

In this state circuit breaker returns an error immediately without even invoking the services. The Circuit breakers move into the half-open state after a timeout period elapses. Usually, it will have a monitoring system where the timeout will be specified.

Circuit Breaker Open State Circuit Breaker -Openstate.png
Circuit Breaker Open State

Half-open state

In this state, the circuit breaker allows a limited number of requests from the Microservice to passthrough and invoke the operation. If the requests are successful, then the circuit breaker will go to the closed state. However, if the requests continue to fail, then it goes back to Open state.

Circuit Breaker Half Open State Circuit Breaker -Half Open state.png
Circuit Breaker Half Open State

Example implementation

PHP

The following is a sample implementation in PHP. The proof of concept stores the status of a MySQL server into a shared memory cache (APC User Cache).

Check

The following script could be run on a set interval through crontab.

$mysqli=newmysqli("localhost","username","password");if($mysqli->connect_error){apcu_add("dbStatus","down");}else{apcu_add("dbStatus","up");$mysqli->close();}

Usage in an application

if(apcu_fetch("dbStatus")==="down"){echo"The database server is currently not available. Please try again in a minute.";exit;}$mysqli=newmysqli("localhost","username","password","database");$result=$mysqli->query("SELECT * FROM table");

Related Research Articles

<span class="mw-page-title-main">Cache (computing)</span> Additional storage that enables faster access to main storage

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.

<span class="mw-page-title-main">Web server</span> Computer software that distributes web pages

A web server is computer software and underlying hardware that accepts requests via HTTP or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiates communication by making a request for a web page or other resource using HTTP, and the server responds with the content of that resource or an error message. A web server can also accept and store resources sent from the user agent if configured to do so.

<span class="mw-page-title-main">Load balancing (computing)</span> Set of techniques to improve the distribution of workloads across multiple computing resources

In computing, load balancing is the process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient. Load balancing can optimize response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.

Memcached is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source must be read. Memcached is free and open-source software, licensed under the Revised BSD license. Memcached runs on Unix-like operating systems and on Microsoft Windows. It depends on the libevent library.

In computer networking, localhost is a hostname that refers to the current computer used to access it. The name localhost is reserved for loopback purposes. It is used to access the network services that are running on the host via the loopback network interface. Using the loopback interface bypasses any local network interface hardware.

REST is a software architectural style that was created to guide the design and development of the architecture for the World Wide Web. REST defines a set of constraints for how the architecture of a distributed, Internet-scale hypermedia system, such as the Web, should behave. The REST architectural style emphasises uniform interfaces, independent deployment of components, the scalability of interactions between them, and creating a layered architecture to promote caching to reduce user-perceived latency, enforce security, and encapsulate legacy systems.

In computing, a data source name is a string that has an associated data structure used to describe a connection to a data source. Most commonly used in connection with ODBC, DSNs also exist for JDBC and for other data access mechanisms. The term often overlaps with "connection string". Most systems do not make a distinction between DSNs or connection strings and the term can often be used interchangeably.

<span class="mw-page-title-main">RapidIO</span> High-speed interconnect technology

The RapidIO architecture is a high-performance packet-switched electrical connection technology. It supports messaging, read/write and cache coherency semantics. Based on industry-standard electrical specifications such as those for Ethernet, RapidIO can be used as a chip-to-chip, board-to-board, and chassis-to-chassis interconnect.

<span class="mw-page-title-main">Dependency injection</span> Software programming technique

In software engineering, dependency injection is a programming technique in which an object or function receives other objects or functions that it requires, as opposed to creating them internally. Dependency injection aims to separate the concerns of constructing objects and using them, leading to loosely coupled programs. The pattern ensures that an object or function that wants to use a given service should not have to know how to construct those services. Instead, the receiving 'client' is provided with its dependencies by external code, which it is not aware of. Dependency injection makes implicit dependencies explicit and helps solve the following problems:

A Domain Name System (DNS) zone file is a text file that describes a DNS zone. A DNS zone is a subset, often a single domain, of the hierarchical domain name structure of the DNS. The zone file contains mappings between domain names and IP addresses and other resources, organized in the form of text representations of resource records (RR). A zone file may be either a DNS master file, authoritatively describing a zone, or it may be used to list the contents of a DNS cache.

Database caching is a process included in the design of computer applications which generate web pages on-demand (dynamically) by accessing backend databases.

Web2py is an open-source web application framework written in the Python programming language. Web2py allows web developers to program dynamic web content using Python. Web2py is designed to help reduce tedious web development tasks, such as developing web forms from scratch, although a web developer may build a form from scratch if required.

TNSDL stands for TeleNokia Specification and Description Language. TNSDL is based on the ITU-T SDL-88 language. It is used exclusively at Nokia Networks, primarily for developing applications for telephone exchanges.

IBM WebSphere Application Server for z/OS is one of the platform implementations of IBM's WebSphere Application Server family. The latest version is Version 9.0.

Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. Fault-tolerant software has the ability to satisfy requirements despite failures.

In database management systems (DBMS), a prepared statement, parameterized statement, or parameterized query is a feature where the database pre-compiles SQL code and stores the results, separating it from data. Benefits of prepared statements are:

<span class="mw-page-title-main">Raft (algorithm)</span> Consensus algorithm

Raft is a consensus algorithm designed as an alternative to the Paxos family of algorithms. It was meant to be more understandable than Paxos by means of separation of logic, but it is also formally proven safe and offers some additional features. Raft offers a generic way to distribute a state machine across a cluster of computing systems, ensuring that each node in the cluster agrees upon the same series of state transitions. It has a number of open-source reference implementations, with full-specification implementations in Go, C++, Java, and Scala. It is named after Reliable, Replicated, Redundant, And Fault-Tolerant.

Asynchrony, in computer programming, refers to the occurrence of events independent of the main program flow and ways to deal with such events. These may be "outside" events such as the arrival of signals, or actions instigated by a program that take place concurrently with program execution, without the program hanging to wait for results. Asynchronous input/output is an example of the latter case of asynchrony, and lets programs issue commands to storage or network devices that service these requests while the processor continues executing the program. Doing so provides a degree of parallelism.

Server-side request forgery (SSRF) is a type of computer security exploit where an attacker abuses the functionality of a server causing it to access or manipulate information in the realm of that server that would otherwise not be directly accessible to the attacker.

References

  1. Machine Learning in Microservices Productionizing Microservices Architecture for Machine Learning Solutions. Packt Publishing. 2023. ISBN   9781804612149.
  2. Kubernetes Native Microservices with Quarkus and MicroProfile. Manning. 2022. ISBN   9781638357155.