Kqueue

Kqueue is a scalable event notification interface introduced in FreeBSD 4.1 in July 2000, [1] [2] and also supported in NetBSD, OpenBSD, DragonFly BSD, and macOS. It was originally authored in 2000 by Jonathan Lemon, [1] [2] then involved with the FreeBSD Core Team. Kqueue makes it possible for software such as nginx to solve the c10k problem. [3] [4] The term "kqueue" refers to its function as a "kernel event queue". [1] [2]

Kqueue provides efficient input and output event pipelines between the kernel and userland. A program can therefore modify event filters and receive pending events with a single kevent(2) system call per iteration of its main event loop. This contrasts with the older polling system calls poll(2) and select(2), which are less efficient, especially when polling for events on numerous file descriptors.

Kqueue handles not only file descriptor events but also various other notifications, such as file modification monitoring, signals, asynchronous I/O (AIO) events, child process state changes, and timers that support nanosecond resolution. It also provides user-defined events in addition to the ones provided by the kernel.

Some other operating systems that traditionally supported only select(2) and poll(2) now also provide more efficient alternatives, such as epoll on Linux and I/O completion ports on Windows and Solaris.

libkqueue is a user space implementation of kqueue(2), which translates calls to an operating system's native backend event mechanism. [5]

API

The function prototypes and types are found in sys/event.h. [6]

int kqueue(void);

Creates a new kernel event queue and returns a descriptor.

int kevent(int kq, const struct kevent *changelist, int nchanges, struct kevent *eventlist, int nevents, const struct timespec *timeout);

Registers events with the queue, then waits for and returns any pending events to the user. In contrast to epoll, kqueue uses the same function to register and wait for events, and multiple event sources may be registered and modified in a single call. The changelist array passes modifications (changing the type of events to wait for, registering new event sources, etc.) to the event queue; these are applied before waiting begins. nchanges is the number of entries in changelist, and nevents is the size of the user-supplied eventlist array that receives events from the event queue. timeout bounds the wait; a null pointer means wait indefinitely.

EV_SET(kev, ident, filter, flags, fflags, data, udata);

A macro that is used for convenient initialization of a struct kevent object.

See also

OS-independent libraries with support for kqueue:

Kqueue equivalent for other platforms:

References

  1. Jonathan Lemon (2000). "kqueue, kevent - kernel event notification mechanism". BSD Cross Reference. FreeBSD, OpenBSD, NetBSD, DragonFly BSD.
  2. Jonathan Lemon (2001-05-01). Kqueue: A generic and scalable event notification facility (PDF). Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference. USENIX (published June 25–30, 2001).
  3. "Connection processing methods". nginx.org.
  4. Andrew Alexeev (2012). "§14. nginx". In Amy Brown; Greg Wilson (eds.). The Architecture of Open Source Applications, Volume II: Structure, Scale and a Few More Fearless Hacks. Lulu.com. ISBN 9781105571817.
  5. libkqueue on GitHub.
  6. kqueue(2), FreeBSD System Calls Manual.