Cache stampede

A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load. This behaviour is sometimes also called dog-piling.[1][2]

To understand how cache stampedes occur, consider a web server that uses memcached to cache rendered pages for some period of time, in order to ease system load. Under particularly high load to a single URL, the system remains responsive as long as the resource remains cached, with requests being handled by accessing the cached copy. This minimizes the number of expensive rendering operations.

Under low load, cache misses result in a single recalculation of the rendering operation. The system will continue as before, with the average load being kept very low because of the high cache hit rate.

However, under very heavy load, when the cached version of that page expires, there may be sufficient concurrency in the server farm that multiple threads of execution will all attempt to render the content of that page simultaneously. None of the concurrent servers knows that the others are doing the same rendering at the same time. If the load is sufficiently high, this alone may be enough to bring about congestion collapse of the system by exhausting shared resources. Congestion collapse prevents the page from ever being completely re-rendered and re-cached, as every attempt to do so times out. Thus, a cache stampede reduces the cache hit rate to zero and keeps the system in continuous congestion collapse as it attempts to regenerate the resource for as long as the load remains very heavy.

To give a concrete example, assume the page in consideration takes 3 seconds to render and we have a traffic of 10 requests per second. Then, when the cached page expires, we have 30 processes simultaneously recomputing the rendering of the page and updating the cache with the rendered page.
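
The size of the pile-up is simply the product of the render time and the request rate. A back-of-envelope sketch in Python, using the numbers from the example above:

    render_time_s = 3   # seconds to re-render the page
    request_rate = 10   # requests per second hitting the URL

    # Every request arriving while the first recomputation is still
    # running finds an empty cache and starts its own recomputation.
    concurrent_recomputations = render_time_s * request_rate
    print(concurrent_recomputations)  # 30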

Typical cache usage

Below is a typical cache usage pattern for an item that needs to be updated every ttl units of time:

function fetch(key, ttl) {
    value ← cache_read(key)
    if (!value) {
        value ← recompute_value()
        cache_write(key, value, ttl)
    }
    return value
}

If the function recompute_value() takes a long time and the key is accessed frequently, many processes will simultaneously call recompute_value() upon expiration of the cache value.

In typical web applications, the function recompute_value() may query a database, access other services, or perform some complicated operation (which is why this particular computation is being cached in the first place). When the request rate is high, the database (or any other shared resource) will suffer from an overload of requests/queries, which may in turn cause a system collapse.
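
To make the pattern concrete, the following is a minimal runnable Python sketch of the same fetch logic. The in-process dict standing in for memcached and the names cache_read, cache_write, and recompute_value are illustrative assumptions rather than a real client API.

    import time

    _cache = {}  # key -> (value, expiry_timestamp); stand-in for memcached

    def cache_read(key):
        entry = _cache.get(key)
        if entry is None:
            return None
        value, expiry = entry
        return value if time.time() < expiry else None

    def cache_write(key, value, ttl):
        _cache[key] = (value, time.time() + ttl)

    def recompute_value():
        time.sleep(3)            # stand-in for an expensive render or DB query
        return "rendered page"

    def fetch(key, ttl):
        value = cache_read(key)
        if value is None:        # cache miss: every concurrent caller lands here
            value = recompute_value()
            cache_write(key, value, ttl)
        return value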

Cache stampede mitigation

Several approaches have been proposed to mitigate cache stampedes (also known as dogpile prevention). They can be roughly grouped into three main categories.

Locking

To prevent multiple simultaneous recomputations of the same value, upon a cache miss each process attempts to acquire a lock for that cache key and recomputes the value only if it acquires the lock.

There are different implementation options for the case when the lock is not acquired:

  - Wait until the value is recomputed, then return it.
  - Return a "not-found" and have the client handle the absence of the value gracefully.
  - Keep a stale item in the cache to be served while the new value is recomputed.

If implemented properly, locking can prevent stampedes altogether, but it requires an extra write for the locking mechanism. Apart from doubling the number of writes, the main drawback is implementing the locking mechanism correctly, taking care of edge cases such as failure of the process holding the lock, tuning the lock's time-to-live, and race conditions.
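
As an illustration, the following sketch extends the naive fetch above with a cache-based lock. It assumes an atomic add-if-absent operation (memcached's add and Redis's SET with NX and EX behave this way); the single-process stand-in below is not truly atomic across machines, and all names are illustrative.

    import time

    LOCK_TTL = 10  # seconds; should comfortably exceed the recompute time

    def cache_add(key, value, ttl):
        # Stand-in for an atomic add-if-absent. Real deployments would use
        # memcached's "add" or Redis's SET NX EX for cross-process atomicity.
        if cache_read(key) is None:
            cache_write(key, value, ttl)
            return True
        return False

    def cache_delete(key):
        _cache.pop(key, None)

    def fetch_with_lock(key, ttl):
        value = cache_read(key)
        if value is not None:
            return value
        if cache_add("lock:" + key, 1, LOCK_TTL):  # we won the recompute lock
            try:
                value = recompute_value()
                cache_write(key, value, ttl)
            finally:
                cache_delete("lock:" + key)
            return value
        # Lost the race: another process is already recomputing. Here we wait
        # for the fresh value; serving a stale copy or returning a "not-found"
        # are the other options listed above.
        for _ in range(50):
            time.sleep(0.1)
            value = cache_read(key)
            if value is not None:
                return value
        return recompute_value()  # fallback in case the lock holder died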

External recomputation

This solution moves the recomputation of the cache value from the processes needing it to an external process. The external recomputation can be triggered in different ways:

  - When the cache value approaches its expiration.
  - Periodically.
  - When a process needing the value encounters a cache miss.

This approach requires one more moving part, the external process, which needs to be maintained and monitored. In addition, this solution requires unnatural code separation/duplication, and it is mostly suited to static cache keys (i.e., keys that are not dynamically generated, as in the case of keys indexed by an id).
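
As an illustration, here is a minimal sketch of the periodic-trigger variant, reusing the cache_write helper from the earlier sketches; the key table, recompute functions, and refresh interval are all illustrative assumptions.

    import threading
    import time

    # Keys this refresher keeps warm, each with its recompute function and
    # TTL. Static keys work best here, matching the limitation noted above.
    REFRESH_KEYS = {
        "home_page": (lambda: "rendered home page", 60),
    }

    def refresher_loop(interval=30):
        # Recompute every key periodically, well inside its TTL, so the
        # request-handling processes never observe an expired value.
        while True:
            for key, (recompute, ttl) in REFRESH_KEYS.items():
                cache_write(key, recompute(), ttl)
            time.sleep(interval)

    threading.Thread(target=refresher_loop, daemon=True).start()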

Probabilistic early expiration

With this approach, each process may recompute the cache value before its expiration by making an independent probabilistic decision, where the probability of performing the early recomputation increases as the value approaches its expiration. Since the probabilistic decision is made independently by each process, the effect of the stampede is mitigated, as fewer processes will start recomputing the value at the same time.

The following implementation, based on an exponential distribution, has been shown to be optimal in terms of its effectiveness in preventing stampedes and how early recomputations can happen.[3]

function x-fetch(key, ttl, beta=1) {
    value, delta, expiry ← cache_read(key)
    if (!value || (time() - delta * beta * log(rand(0,1))) ≥ expiry) {
        start ← time()
        value ← recompute_value()
        delta ← time() - start
        cache_write(key, (value, delta), ttl)
    }
    return value
}

The parameter beta can be set to a value greater than 1 to favor earlier recomputations and further reduce stampedes, but the authors show that setting beta=1 works well in practice. The variable delta represents the time to recompute the value and is used to scale the probability distribution appropriately.

This approach is simple to implement and effectively reduces cache stampedes by automatically favoring early recomputations as the traffic rate increases. One drawback is that it takes more memory in the cache, since the value must be bundled with delta; when the caching system does not support retrieval of the key's expiration time, the expiry (that is, time() + ttl) must also be stored in the bundle.
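
For reference, here is a direct Python transcription of x-fetch, reusing recompute_value from the earlier sketches. The bundle helpers below are illustrative stand-ins for a caching system that stores (value, delta, expiry) together.

    import math
    import random
    import time

    _bundles = {}  # key -> (value, delta, expiry)

    def cache_read_bundle(key):
        return _bundles.get(key, (None, None, None))

    def cache_write_bundle(key, value, delta, ttl):
        _bundles[key] = (value, delta, time.time() + ttl)

    def x_fetch(key, ttl, beta=1.0):
        value, delta, expiry = cache_read_bundle(key)
        u = random.random() or 1e-12  # guard against log(0)
        # log(u) is negative, so the subtracted term shifts "now" forward by
        # a random amount scaled by delta and beta, occasionally forcing a
        # recomputation before the actual expiry.
        if value is None or time.time() - delta * beta * math.log(u) >= expiry:
            start = time.time()
            value = recompute_value()
            delta = time.time() - start
            cache_write_bundle(key, value, delta, ttl)
        return value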

References

  1. Galbraith, Patrick (2009), Developing Web Applications with Apache, MySQL, memcached, and Perl, John Wiley & Sons, p. 353, ISBN 9780470538326.
  2. Allspaw, John; Robbins, Jesse (2010), Web Operations: Keeping the Data On Time, O'Reilly Media, pp. 128–132, ISBN 9781449394158.
  3. Vattani, A.; Chierichetti, F.; Lowenstein, K. (2015), "Optimal Probabilistic Cache Stampede Prevention" (PDF), Proceedings of the VLDB Endowment, 8 (8): 886–897, doi:10.14778/2757807.2757813, ISSN 2150-8097.