Futex

Last updated

In computing, a futex (short for "fast userspace mutex") is a kernel system call that programmers can use to implement basic locking, or as a building block for higher-level locking abstractions such as semaphores and POSIX mutexes or condition variables.

Contents

A futex consists of a kernel-space wait queue that is attached to an atomic integer in userspace. Multiple processes or threads operate on the integer entirely in userspace (using atomic operations to avoid interfering with one another), and only resort to relatively expensive[ citation needed ] system calls to request operations on the wait queue (for example to wake up waiting processes, or to put the current process on the wait queue). A properly programmed futex-based lock will not use system calls except when the lock has contended; since most operations do not require arbitration between processes, this will not happen in most cases.

History

Hubertus Franke (IBM Thomas J. Watson Research Center), Matthew Kirkwood, Ingo Molnár (Red Hat), and Rusty Russell (IBM Linux Technology Center) originated the futex mechanism on Linux in 2002. [1] In the same year, discussions took place on a proposal to make futexes accessible via the file system by creating a special node in /dev or /proc. However, Linus Torvalds strongly opposed this idea and rejected any related patches. [2]

Futexes then appeared for the first time in version 2.5.7 of the Linux kernel development series; the semantics stabilized as of version 2.5.40, and futexes have been part of the Linux kernel mainline since the December 2003 release of 2.6.x stable kernel series.

Futexes have been implemented in Microsoft Windows since Windows 8 or Windows Server 2012 under the name WaitOnAddress. [3]

In 2013, Microsoft patented futexes and the patent was granted in 2014. [4]

In May 2014, the CVE system announced a vulnerability discovered in the Linux kernel's futex subsystem that allowed denial-of-service attacks or local privilege escalation. [5] [6]

In May 2015, the Linux kernel introduced a deadlock bug via Commit b0c29f79ecea that caused a hang in user applications. The bug affected many enterprise Linux distributions, including 3.x and 4.x kernels, and Red Hat Enterprise Linux version 5, 6 and 7, SUSE Linux 12 and Amazon Linux. [7]

Futexes have been implemented in OpenBSD since 2016. [8]

The futex mechanism is one of the core concepts of the Zircon kernel [9] in Google's Fuchsia operating system since at least April 2018. [10]

Apple implemented futex in iOS/iPadOS/tvOS 17.4, macOS 14.4, watchOS 10.4 and visionOS 1.1. [11]

Operations

Futexes have two basic operations, WAIT and WAKE.

If the value stored at the address addr is val, puts the current thread to sleep.
Wakes up num number of threads waiting on the address addr.

For more advanced uses, there are a number of other operations, the most used being REQUEUE and WAKE_OP, which both function as more generic WAKE operations. [12]

If the value stored at the address old_addr is val, wakes num_wake threads waiting on the address old_addr, and enqueues num_move threads waiting on the address old_addr to now wait on the address new_addr. This can be used to avoid the thundering herd problem on wake. [13] [14]
Will read addr2, perform op with op_arg on it, and storing the result back to addr2. Then it will wake num1 threads waiting on addr1, and, if the previously read value from addr2 matches cmp_arg using comparison cmp, will wake num2 threads waiting on addr2. This very flexible and generic wake mechanism is useful for implementing many synchronization primitives.

See also

Related Research Articles

In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state. This allows multiple processes to share a single central processing unit (CPU), and is an essential feature of a multiprogramming or multitasking operating system. In a traditional CPU, each process - a program in execution - utilizes the various CPU registers to store data and hold the current state of the running process. However, in a multitasking operating system, the operating system switches between processes or threads to allow the execution of multiple processes simultaneously. For every switch, the operating system must save the state of the currently running process, followed by loading the next process state, which will run on the CPU. This sequence of operations that stores the state of the running process and the loading of the following running process is called a context switch.

<span class="mw-page-title-main">Microkernel</span> Kernel that provides fewer services than a traditional kernel

In computer science, a microkernel is the near-minimum amount of software that can provide the mechanisms needed to implement an operating system (OS). These mechanisms include low-level address space management, thread management, and inter-process communication (IPC).

XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions; Red Hat Enterprise Linux uses it as its default file system.

<span class="mw-page-title-main">Thread (computing)</span> Smallest sequence of programmed instructions that can be managed independently by a scheduler

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. In many cases, a thread is a component of a process.

<span class="mw-page-title-main">System call</span> Way for programs to access kernel services

In computing, a system call is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

In computing, scheduling is the action of assigning resources to perform tasks. The resources may be processors, network links or expansion cards. The tasks may be threads, processes or data flows.

<span class="mw-page-title-main">Network interface controller</span> Hardware component that connects a computer to a network

A network interface controller is a computer hardware component that connects a computer to a computer network.

In computer science, read-copy-update (RCU) is a synchronization mechanism that avoids the use of lock primitives while multiple threads concurrently read and update elements that are linked through pointers and that belong to shared data structures.

In computer programming, a callback or callback function is any reference to executable code that is passed as an argument to another piece of code; that code is expected to call back (execute) the callback function as part of its job. This execution may be immediate as in a synchronous callback, or it might happen at a later point in time as in an asynchronous callback. They are also called blocking and non-blocking.

<span class="mw-page-title-main">DTrace</span> Dynamic tracing framework for kernel and applications

DTrace is a comprehensive dynamic tracing framework originally created by Sun Microsystems for troubleshooting kernel and application problems on production systems in real time. Originally developed for Solaris, it has since been released under the free Common Development and Distribution License (CDDL) in OpenSolaris and its descendant illumos, and has been ported to several other Unix-like systems.

In concurrent programming, a monitor is a synchronization construct that prevents threads from concurrently accessing a shared object's state and allows them to wait for the state to change. They provide a mechanism for threads to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task. A monitor consists of a mutex (lock) and at least one condition variable. A condition variable is explicitly 'signalled' when the object's state is modified, temporarily passing the mutex to another thread 'waiting' on the conditional variable.

Netlink is a socket family used for inter-process communication (IPC) between both the kernel and userspace processes, and between different userspace processes, in a way similar to the Unix domain sockets available on certain Unix-like operating systems, including its original incarnation as a Linux kernel interface, as well as in the form of a later implementation on FreeBSD. Similarly to the Unix domain sockets, and unlike INET sockets, Netlink communication cannot traverse host boundaries. However, while the Unix domain sockets use the file system namespace, Netlink sockets are usually addressed by process identifiers (PIDs).

In computer science, a readers–writer is a synchronization primitive that solves one of the readers–writers problems. An RW lock allows concurrent access for read-only operations, whereas write operations require exclusive access. This means that multiple threads can read the data in parallel but an exclusive lock is needed for writing or modifying data. When a writer is writing the data, all other writers and readers will be blocked until the writer is finished writing. A common use might be to control access to a data structure in memory that cannot be updated atomically and is invalid until the update is complete.

<span class="mw-page-title-main">OpenCL</span> Open standard for programming heterogenous computing systems, such as CPUs or GPUs

OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.

<span class="mw-page-title-main">Grand Central Dispatch</span> Technology developed by Apple Inc

Grand Central Dispatch, is a technology developed by Apple Inc. to optimize application support for systems with multi-core processors and other symmetric multiprocessing systems. It is an implementation of task parallelism based on the thread pool pattern. The fundamental idea is to move the management of the thread pool out of the hands of the developer, and closer to the operating system. The developer injects "work packages" into the pool oblivious of the pool's architecture. This model improves simplicity, portability and performance.

<span class="mw-page-title-main">Linux kernel</span> Operating system kernel

The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally written in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU operating system, which was written to be a free (libre) replacement for Unix.

nftables is a subsystem of the Linux kernel providing filtering and classification of network packets/datagrams/frames. It has been available since Linux kernel 3.13 released on 19 January 2014.

CIFSD is an open-source in-kernel CIFS/SMB server created by Namjae Jeon for the Linux kernel. Initially the goal is to provide improved file I/O performance, but the bigger goal is to have some new features which are much easier to develop and maintain inside the kernel and expose the layers fully. Directions can be attributed to sections where Samba is moving to a few modules inside the kernel to have features like Remote direct memory access (RDMA) to work with actual performance gain.

<span class="mw-page-title-main">Fuchsia (operating system)</span> Computer operating system by Google

Fuchsia is an open-source capability-based operating system developed by Google. In contrast to Google's Linux-based operating systems such as ChromeOS and Android, Fuchsia is based on a custom kernel named Zircon. It publicly debuted as a self-hosted git repository in August 2016 without any official corporate announcement. After years of development, its official product launch was in 2021 on the first-generation Google Nest Hub, replacing its original Linux-based Cast OS.

io_uring is a Linux kernel system call interface for storage device asynchronous I/O operations addressing performance issues with similar interfaces provided by functions like read /write or aio_read /aio_write etc. for operations on data accessed by file descriptors.

References

  1. "Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux" by Franke, Russell, Kirkwood. Published in 2002 for the Ottawa Linux Symposium.
  2. Torvalds, Linus. "Futex Asynchronous Interface".
  3. "WaitOnAddress function" . Retrieved 2019-11-01.
  4. "US8782674B2 Wait on address synchronization interface" . Retrieved 2019-11-01.
  5. CVE-2014-3153
  6. "[SECURITY][DSA 2949-1] linux security update". Lists.debian.org. 2014-06-05. Retrieved 2014-06-08.
  7. "Linux futex_wait() bug..." 2015-05-13. Retrieved 2018-03-24.
  8. Mazurek, Michal. "'Futexes for OpenBSD' - MARC". marc.info. Retrieved 30 April 2017.
  9. "Zircon Kernel Concepts". fuchsia.dev. Retrieved 20 October 2019.
  10. "zx_futex_wait". fuchsia.dev. Retrieved 20 October 2019.
  11. "os_sync_wait_on_address". Apple Developer Documentation. Retrieved 2024-03-14.
  12. Futexes Are Tricky Ulrich Drepper (Red Hat, v1.6, 2011)
  13. Linux futex(2) man page, FUTEX_CMP_REQUEUE section
  14. Zircon zx_futex_requeue documentation