Thundering herd problem

Last updated

In computer science, the thundering herd problem occurs when a large number of processes or threads waiting for an event are awoken when that event occurs, but only one process is able to handle the event. When the processes wake up, they will each try to handle the event, but only one will win. All processes will compete for resources, possibly freezing the computer, until the herd is calmed down again. [1]

Contents

Mitigation

The Linux kernel serializes responses for requests to a single file descriptor, so only one thread or process is woken up. [2] For epoll() in version 4.5 of the Linux kernel, the EPOLLEXCLUSIVE flag was added. Thus several epoll sets (different threads or different processes) may wait on the same resource and only one set will be woken up. For certain workloads this flag can give significant processing time reduction. [3]

Similarly in Microsoft Windows, I/O completion ports can mitigate the thundering herd problem, as they can be configured such that only one of the threads waiting on the completion port is woken up when an event occurs. [4]

In systems that rely on a backoff mechanism (e.g. exponential backoff), the clients will retry failed calls by waiting a specific amount of time between consecutive retries. In order to avoid the thundering herd problem, jitter can be purposefully introduced in order to break the synchronization across the clients, thereby avoiding collisions. In this approach, randomness is added to the wait intervals between retries, so that clients are no longer synchronized.

See also

Related Research Articles

<span class="mw-page-title-main">Microkernel</span> Kernel that provides fewer services than a traditional kernel

In computer science, a microkernel is the near-minimum amount of software that can provide the mechanisms needed to implement an operating system (OS). These mechanisms include low-level address space management, thread management, and inter-process communication (IPC).

<span class="mw-page-title-main">Thread (computing)</span> Smallest sequence of programmed instructions that can be managed independently by a scheduler

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. The implementation of threads and processes differs between operating systems. In Modern Operating Systems, Tanenbaum shows that many distinct models of process organization are possible. In many cases, a thread is a component of a process. The multiple threads of a given process may be executed concurrently, sharing resources such as memory, while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.

<span class="mw-page-title-main">System call</span> Way for programs to access kernel services

In computing, a system call is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

<span class="mw-page-title-main">Load (computing)</span> Amount of computational work that a computer system performs

In UNIX computing, the system load is a measure of the amount of computational work that a computer system performs. The load average represents the average system load over a period of time. It conventionally appears in the form of three numbers which represent the system load during the last one-, five-, and fifteen-minute periods.

In software engineering, a spinlock is a lock that causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking whether the lock is available. Since the thread remains active but is not performing a useful task, the use of such a lock is a kind of busy waiting. Once acquired, spinlocks will usually be held until they are explicitly released, although in some implementations they may be automatically released if the thread being waited on blocks or "goes to sleep".

In Unix and Unix-like computer operating systems, a file descriptor is a process-unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket.

In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response, or by returning the value read from the memory location.

In computing, a futex is a kernel system call that programmers can use to implement basic locking, or as a building block for higher-level locking abstractions such as semaphores and POSIX mutexes or condition variables.

Exponential backoff is an algorithm that uses feedback to multiplicatively decrease the rate of some process, in order to gradually find an acceptable rate. These algorithms find usage in a wide range of systems and processes, with radio networks and computer networks being particularly notable.

In computer science, a ticket lock is a synchronization mechanism, or locking algorithm, that is a type of spinlock that uses "tickets" to control which thread of execution is allowed to enter a critical section.

In computer science, asynchronous I/O is a form of input/output processing that permits other processing to continue before the transmission has finished. A name used for asynchronous I/O in the Windows API is overlapped I/O.

In computer science, the event loop is a programming construct or design pattern that waits for and dispatches events or messages in a program. The event loop works by making a request to some internal or external "event provider", then calls the relevant event handler. The event loop is also sometimes referred to as the message dispatcher, message loop, message pump, or run loop.

In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action. Data synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain data integrity. Process synchronization primitives are commonly used to implement data synchronization.

Doors is an inter-process communication facility for Unix computer systems. They provide a form of procedure call.

<span class="mw-page-title-main">Process management (computing)</span> Computer system for maintaining order among running programs

A process is a program in execution, and an integral part of any modern-day operating system (OS). The OS must allocate resources to processes, enable processes to share and exchange information, protect the resources of each process from other processes and enable synchronization among processes. To meet these requirements, the OS must maintain a data structure for each process, which describes the state and resource ownership of that process, and which enables the OS to exert control over each process.

<span class="mw-page-title-main">Grand Central Dispatch</span> Technology developed by Apple Inc

Grand Central Dispatch, is a technology developed by Apple Inc. to optimize application support for systems with multi-core processors and other symmetric multiprocessing systems. It is an implementation of task parallelism based on the thread pool pattern. The fundamental idea is to move the management of the thread pool out of the hands of the developer, and closer to the operating system. The developer injects "work packages" into the pool oblivious of the pool's architecture. This model improves simplicity, portability and performance.

Kqueue is a scalable event notification interface introduced in FreeBSD 4.1 in July 2000, also supported in NetBSD, OpenBSD, DragonFly BSD, and macOS. Kqueue was originally authored in 2000 by Jonathan Lemon, then involved with the FreeBSD Core Team. Kqueue makes it possible for software like nginx to solve the c10k problem.

epoll is a Linux kernel system call for a scalable I/O event notification mechanism, first introduced in version 2.5.44 of the Linux kernel. Its function is to monitor multiple file descriptors to see whether I/O is possible on any of them. It is meant to replace the older POSIX select(2) and poll(2) system calls, to achieve better performance in more demanding applications, where the number of watched file descriptors is large (unlike the older system calls, which operate in O(n) time, epoll operates in O(1) time).

Synchronous Interprocess Messaging Project for LINUX (SIMPL) is a free and open-source project that allows QNX-style synchronous message passing by adding a Linux library using user space techniques like shared memory and Unix pipes to implement SendMssg/ReceiveMssg/ReplyMssg inter-process messaging mechanisms.

Enduro/X is an open-source middleware platform for distributed transaction processing. It is built on proven APIs such as X/Open group's XATMI and XA. The platform is designed for building real-time microservices based applications with a clusterization option. Enduro/X functions as an extended drop-in replacement for Oracle Tuxedo. The platform uses in-memory POSIX Kernel queues which insures high interprocess communication throughput.

References

  1. "Thundering Herd Problem". The Jargon File (version 4.4.7). Retrieved 9 July 2019.
  2. "Does the Thundering Herd Problem exist on Linux anymore". stackoverflow.com. Retrieved 2019-07-09.
  3. Madars, Vitolins (2015-12-05). "EPOLLEXCLUSIVE Linux Kernel patch testing". mvitolin. Retrieved 2020-08-11.
  4. "IO Completion Ports — Matt Godbolt's blog". xania.org. Retrieved 2019-01-23.