Thundering herd problem

Last updated December 23, 2024

In computer science, the thundering herd problem occurs when a large number of processes or threads waiting for an event are awakened when that event occurs, but only one process is able to handle the event. When the processes wake up, they will each try to handle the event, but only one will win. All processes will compete for resources, possibly freezing the computer, until the herd is calmed down again.^[1]

Mitigation

The Linux kernel serializes responses for requests to a single file descriptor, so only one thread or process is woken up.^[2] For epoll() in version 4.5 of the Linux kernel, the EPOLLEXCLUSIVE flag was added. Thus several epoll sets (different threads or different processes) may wait on the same resource and only one set will be woken up. For certain workloads this flag can give significant processing time reduction.^[3]

Similarly in Microsoft Windows, I/O completion ports can mitigate the thundering herd problem, as they can be configured such that only one of the threads waiting on the completion port is woken up when an event occurs.^[4]

In systems that rely on a backoff mechanism (e.g. exponential backoff), the clients will retry failed calls by waiting a specific amount of time between consecutive retries. In order to avoid the thundering herd problem, jitter can be purposefully introduced in order to break the synchronization across the clients, thereby avoiding collisions. In this approach, randomness is added to the wait intervals between retries, so that clients are no longer synchronized.

Related Research Articles

In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state. This allows multiple processes to share a single central processing unit (CPU), and is an essential feature of a multiprogramming or multitasking operating system. In a traditional CPU, each process - a program in execution - utilizes the various CPU registers to store data and hold the current state of the running process. However, in a multitasking operating system, the operating system switches between processes or threads to allow the execution of multiple processes simultaneously. For every switch, the operating system must save the state of the currently running process, followed by loading the next process state, which will run on the CPU. This sequence of operations that stores the state of the running process and loads the following running process is called a context switch.

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. In many cases, a thread is a component of a process.

<span class="mw-page-title-main">Inter-process communication</span> How computer operating systems enable data sharing

In computer science, inter-process communication (IPC), also spelled interprocess communication, are the mechanisms provided by an operating system for processes to manage shared data. Typically, applications can use IPC, categorized as clients and servers, where the client requests data and the server responds to client requests. Many applications are both clients and servers, as commonly seen in distributed computing.

<span class="mw-page-title-main">Load (computing)</span> Amount of computational work that a computer system performs

In UNIX computing, the system load is a measure of the amount of computational work that a computer system performs. The load average represents the average system load over a period of time. It conventionally appears in the form of three numbers which represent the system load during the last one-, five-, and fifteen-minute periods.

In computing, scheduling is the action of assigning resources to perform tasks. The resources may be processors, network links or expansion cards. The tasks may be threads, processes or data flows.

In software engineering, a spinlock is a lock that causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking whether the lock is available. Since the thread remains active but is not performing a useful task, the use of such a lock is a kind of busy waiting. Once acquired, spinlocks will usually be held until they are explicitly released, although in some implementations they may be automatically released if the thread being waited on blocks or "goes to sleep".

In Unix and Unix-like computer operating systems, a file descriptor is a process-unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket.

In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response, or by returning the value read from the memory location, thus "swapping" the read and written values.

In computer systems programming, an interrupt handler, also known as an interrupt service routine or ISR, is a special block of code associated with a specific interrupt condition. Interrupt handlers are initiated by hardware interrupts, software interrupt instructions, or software exceptions, and are used for implementing device drivers or transitions between protected modes of operation, such as system calls.

In computing, a futex is a kernel system call that programmers can use to implement basic locking, or as a building block for higher-level locking abstractions such as semaphores and POSIX mutexes or condition variables.

In computer science, a ticket lock is a synchronization mechanism, or locking algorithm, that is a type of spinlock that uses "tickets" to control which thread of execution is allowed to enter a critical section.

In computer science, asynchronous I/O is a form of input/output processing that permits other processing to continue before the I/O operation has finished. A name used for asynchronous I/O in the Windows API is overlapped I/O.

In computer science, the event loop is a programming construct or design pattern that waits for and dispatches events or messages in a program. The event loop works by making a request to some internal or external "event provider", then calls the relevant event handler.

In computer science, synchronization is the task of coordinating multiple processes to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action.

Doors is an inter-process communication facility for Unix computer systems. They provide a form of procedure call.

A process is a program in execution, and an integral part of any modern-day operating system (OS). The OS must allocate resources to processes, enable processes to share and exchange information, protect the resources of each process from other processes and enable synchronization among processes. To meet these requirements, The OS must maintain a data structure for each process, which describes the state and resource ownership of that process, and which enables the operating system to exert control over each process.

Kqueue is a scalable event notification interface introduced in FreeBSD 4.1 in July 2000, also supported in NetBSD, OpenBSD, DragonFly BSD, and macOS. Kqueue was originally authored in 2000 by Jonathan Lemon, then involved with the FreeBSD Core Team. Kqueue makes it possible for software like nginx to solve the c10k problem. The term "kqueue" refers to its function as a "kernel event queue"

epoll is a Linux kernel system call for a scalable I/O event notification mechanism, first introduced in version 2.5.45 of the Linux kernel. Its function is to monitor multiple file descriptors to see whether I/O is possible on any of them. It is meant to replace the older POSIX select(2) and poll(2) system calls, to achieve better performance in more demanding applications, where the number of watched file descriptors is large (unlike the older system calls, which operate in O(n) time, epoll operates in O(1) time).

Synchronous Interprocess Messaging Project for LINUX (SIMPL) is a free and open-source project that allows QNX-style synchronous message passing by adding a Linux library using user space techniques like shared memory and Unix pipes to implement SendMssg/ReceiveMssg/ReplyMssg inter-process messaging mechanisms.

Enduro/X is an open-source middleware platform for distributed transaction processing. It is built on proven APIs such as X/Open group's XATMI and XA. The platform is designed for building real-time microservices based applications with a clusterization option. Enduro/X functions as an extended drop-in replacement for Oracle Tuxedo. The platform uses in-memory POSIX message queues which insures high inter-process communication throughput.

References

↑ "Thundering Herd Problem". The Jargon File (version 4.4.7). Retrieved 9 July 2019.
↑ "Does the Thundering Herd Problem exist on Linux anymore". stackoverflow.com. Retrieved 2019-07-09.
↑ Madars, Vitolins (2015-12-05). "EPOLLEXCLUSIVE Linux Kernel patch testing". mvitolin. Retrieved 2020-08-11.
↑ "IO Completion Ports — Matt Godbolt's blog". xania.org. Retrieved 2019-01-23.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] "Thundering Herd Problem". The Jargon File (version 4.4.7). Retrieved 9 July 2019.

[2] "Does the Thundering Herd Problem exist on Linux anymore". stackoverflow.com. Retrieved 2019-07-09.

[3] Madars, Vitolins (2015-12-05). "EPOLLEXCLUSIVE Linux Kernel patch testing". mvitolin. Retrieved 2020-08-11.

[4] "IO Completion Ports — Matt Godbolt's blog". xania.org. Retrieved 2019-01-23.

[1]

[2]

[3]

[4]

Thundering herd problem

Contents

Mitigation

See also

Related Research Articles

References

External links