Ioctl

Last updated

In computing, ioctl (an abbreviation of input/output control) is a system call for device-specific input/output operations and other operations which cannot be expressed by regular file semantics. It takes a parameter specifying a request code; the effect of a call depends completely on the request code. Request codes are often device-specific. For instance, a CD-ROM device driver which can instruct a physical device to eject a disc would provide an ioctl request code to do so. Device-independent request codes are sometimes used to give userspace access to kernel functions which are only used by core system software or still under development.

Contents

The ioctl system call first appeared in Version 7 of Unix under that name. It is supported by most Unix and Unix-like systems, including Linux and macOS, though the available request codes differ from system to system. Microsoft Windows provides a similar function, named "DeviceIoControl", in its Win32 API.

Background

Conventional operating systems can be divided into two layers, userspace and the kernel. Application code such as a text editor resides in userspace, while the underlying facilities of the operating system, such as the network stack, reside in the kernel. Kernel code handles sensitive resources and implements the security and reliability barriers between applications; for this reason, user mode applications are prevented by the operating system from directly accessing kernel resources.

Userspace applications typically make requests to the kernel by means of system calls, whose code lies in the kernel layer. A system call usually takes the form of a "system call vector", in which the desired system call is indicated with an index number. For instance, exit() might be system call number 1, and write() number 4. The system call vector is then used to find the desired kernel function for the request. In this way, conventional operating systems typically provide several hundred system calls to the userspace.

Though an expedient design for accessing standard kernel facilities, system calls are sometimes inappropriate for accessing non-standard hardware peripherals. By necessity, most hardware peripherals (aka devices) are directly addressable only within the kernel. But user code may need to communicate directly with devices; for instance, an administrator might configure the media type on an Ethernet interface. Modern operating systems support diverse devices, many of which offer a large collection of facilities. Some of these facilities may not be foreseen by the kernel designer, and as a consequence it is difficult for a kernel to provide system calls for using the devices.

To solve this problem, the kernel is designed to be extensible, and may accept an extra module called a device driver which runs in kernel space and can directly address the device. An ioctl interface is a single system call by which userspace may communicate with device drivers. Requests on a device driver are vectored with respect to this ioctl system call, typically by a handle to the device and a request number. The basic kernel can thus allow the userspace to access a device driver without knowing anything about the facilities supported by the device, and without needing an unmanageably large collection of system calls.

Uses

Hardware device configuration

A common use of ioctl is to control hardware devices.

For example, on Win32 systems, ioctl calls can communicate with USB devices, or they can discover drive-geometry information of the attached storage-devices.

On OpenBSD and NetBSD, ioctl is used by the bio(4) pseudo-device driver and the bioctl utility to implement RAID volume management in a unified vendor-agnostic interface similar to ifconfig . [1] [2]

On NetBSD, ioctl is also used by the sysmon framework. [3]

Terminals

One use of ioctl in code exposed to end-user applications is terminal I/O.

Unix operating systems have traditionally made heavy use of command-line interfaces, originally with hardware text terminals such as VT100s attached to serial ports, and later with terminal emulators and remote login servers using pseudoterminals. Serial port devices and pseudoterminals are both controlled and configured using ioctl calls. For instance, the display size is set using the TIOCSWINSZ call. The TIOCSTI (terminal I/O control, simulate terminal input) ioctl function can push a character into a device stream. [4]

Kernel extensions

When applications need to extend the kernel, for instance to accelerate network processing, ioctl calls provide a convenient way to bridge userspace code to kernel extensions. Kernel extensions can provide a location in the filesystem that can be opened by name, through which an arbitrary number of ioctl calls can be dispatched, allowing the extension to be programmed without adding system calls to the operating system.

sysctl alternative

According to an OpenBSD developer, ioctl and sysctl are the two system calls for extending the kernel, with sysctl possibly being the simpler of the two. [5]

In NetBSD, the sysmon_envsys framework for hardware monitoring uses ioctl through proplib ; whereas OpenBSD and DragonFly BSD instead use sysctl for their corresponding hw.sensors framework. The original revision of envsys in NetBSD was implemented with ioctl before proplib was available, and had a message suggesting that the framework is experimental, and should be replaced by a sysctl(8) interface, should one be developed, [6] [7] which potentially explains the choice of sysctl in OpenBSD with its subsequent introduction of hw.sensors in 2003. However, when the envsys framework was redesigned in 2007 around proplib, the system call remained as ioctl, and the message was removed. [8]

Implementations

Unix

The ioctl system call first appeared in Version 7 Unix, as a replacement for the stty [9] and gtty system calls, with an additional request code argument. An ioctl call takes as parameters:

  1. an open file descriptor
  2. a request code number
  3. an untyped pointer to data (either going to the driver, coming back from the driver, or both).

The kernel generally dispatches an ioctl call straight to the device driver, which can interpret the request number and data in whatever way required. The writers of each driver document request numbers for that particular driver and provide them as constants in a header file.

Request numbers usually combine a code identifying the device or class of devices for which the request is intended and a number indicating the particular request; the code identifying the device or class of devices is usually a single ASCII character. Some Unix systems, including 4.2BSD and later BSD releases, operating systems derived from those releases, and Linux, have conventions that also encode within the request number the size of the data to be transferred to/from the device driver and the direction of the data transfer. Regardless of whether any such conventions are followed, the kernel and the driver collaborate to deliver a uniform error code (denoted by the symbolic constant ENOTTY) to an application which makes a request of a driver which does not recognise it.

The mnemonic ENOTTY (traditionally associated with the textual message " Not a typewriter ") derives from the earliest systems that incorporated an ioctl call, where only the teletype (tty) device raised this error. Though the symbolic mnemonic is fixed by compatibility requirements, some modern systems more helpfully render a more general message such as "Inappropriate device control operation" (or a localization thereof).

TCSETS exemplifies an ioctl call on a serial port. The normal read and write calls on a serial port receive and send data bytes. An ioctl(fd,TCSETS,data) call, separate from such normal I/O, controls various driver options like handling of special characters, or the output signals on the port (such as the DTR signal).

Win32

A Win32 DeviceIoControl takes as parameters:

  1. an open object handle (the Win32 equivalent of a file descriptor)
  2. a request code number (the "control code")
  3. a buffer for input parameters
  4. length of the input buffer
  5. a buffer for output results
  6. length of the output buffer
  7. an OVERLAPPED structure, if overlapped I/O is being used.

The Win32 device control code takes into consideration the mode of the operation being performed.

There are 4 defined modes of operation, impacting the security of the device driver -

  1. METHOD_IN_DIRECT: The buffer address is verified to be readable by the user mode caller.
  2. METHOD_OUT_DIRECT: The buffer address is verified to be writable by the user mode caller.
  3. METHOD_NEITHER: User mode virtual addresses are passed to the driver without mapping or validation.
  4. METHOD_BUFFERED: IO Manager controlled shared buffers are used to move data to and from user mode.

Alternatives

Other vectored call interfaces

Devices and kernel extensions may be linked to userspace using additional new system calls, although this approach is rarely taken, because operating system developers try to keep the system call interface focused and efficient.

On Unix operating systems, two other vectored call interfaces are popular: the fcntl ("file control") system call configures open files, and is used in situations such as enabling non-blocking I/O; and the setsockopt ("set socket option") system call configures open network sockets, a facility used to configure the ipfw packet firewall on BSD Unix systems.

Memory mapping

Unix
Device interfaces and input/output capabilities are sometimes provided using memory-mapped files. Applications that interact with devices open a location on the filesystem corresponding to the device, as they would for an ioctl call, but then use memory mapping system calls to tie a portion of their address space to that of the kernel. This interface is a far more efficient way to provide bulk data transfer between a device and a userspace application; individual ioctl or read/write system calls inflict overhead due to repeated userspace-to-kernel transitions, where access to a memory-mapped range of addresses incurs no such overhead.
Win32
Buffered IO methods or named file mapping objects can be used; however, for simple device drivers the standard DeviceIoControl METHOD_ accesses are sufficient.

Netlink is a socket-like mechanism for inter-process communication (IPC), designed to be a more flexible successor to ioctl.

Implications

Complexity

ioctl calls minimize the complexity of the kernel's system call interface. However, by providing a place for developers to "stash" bits and pieces of kernel programming interfaces, ioctl calls complicate the overall user-to-kernel API. A kernel that provides several hundred system calls may provide several thousand ioctl calls.

Though the interface to ioctl calls appears somewhat different from conventional system calls, there is in practice little difference between an ioctl call and a system call; an ioctl call is simply a system call with a different dispatching mechanism. Many of the arguments against expanding the kernel system call interface could therefore be applied to ioctl interfaces.

To application developers, system calls appear no different from application subroutines; they are simply function calls that take arguments and return values. The core libraries (e.g. libc ) mask the complexity involved in invoking system calls. The same is true for ioctls, where driver interfaces usually come with a user space library. (E.g. Mesa for the Direct Rendering Infrastructure of graphics drivers.)

Libpcap and libdnet are two examples of third-party wrapper Unix libraries designed to mask the complexity of ioctl interfaces, for packet capture and packet I/O, respectively.

Security

In traditional design, kernels resided in ring 0, separated from device drivers in ring 1, and in microkernels, also from each other. This has largely been given up due adding the same overhead of transitioning between rings to driver/kernel interfaces, that syscalls impose on kernel/user space interfaces. This has led to the difficult-in-practice requirement that all drivers, which now reside in ring 0 as well, must uphold the same level of security as the kernel core.

While the user-to-kernel interfaces of mainstream operating systems are often audited heavily for code flaws and security vulnerabilities prior to release, these audits typically focus on the well-documented system call interfaces. For instance, auditors might ensure that sensitive security calls such as changing user IDs are only available to administrative users.
Because the handler for an ioctl call also resides directly in ring 0, the input from userspace should be validated just as carefully. As vulnerabilities in device drivers can be exploited by local users, e.g. by passing invalid buffers to ioctl calls.
In practice, this is not the case. ioctl interfaces are larger, more diverse, and less well defined, and thus harder to audit than system calls. Furthermore, because ioctl calls can be provided by third-party developers, often after the core operating system has been released, ioctl call implementations may generally receive less scrutiny and thus harbor more vulnerabilities. Finally, some ioctl calls, particularly for third-party device drivers, can be entirely undocumented.

Varying fixes for this have been created, with the goal of achieving an equivalent to the former security, while keeping the gained speed.
Win32 and Unix operating systems can protect a userspace device name from access by applications with specific access controls applied to the device. Security problems can arise when device driver developers do not apply appropriate access controls to the userspace accessible object.
Some modern operating systems protect the kernel from hostile userspace code (such as applications that have been infected by buffer overflow exploits) using system call wrappers. System call wrappers implement role-based access control by specifying which system calls can be invoked by which applications; wrappers can, for instance, be used to "revoke" the right of a mail program to spawn other programs. ioctl interfaces complicate system call wrappers because there are large numbers of them, each taking different arguments, some of which may be required by normal programs.
Furthermore, such solutions negate the gained reduction of overhead.

Further reading

Related Research Articles

<span class="mw-page-title-main">Thread (computing)</span> Smallest sequence of programmed instructions that can be managed independently by a scheduler

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. In many cases, a thread is a component of a process.

<span class="mw-page-title-main">System call</span> Way for programs to access kernel services

In computing, a system call is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

The Open Sound System (OSS) is an interface for making and capturing sound in Unix and Unix-like operating systems. It is based on standard Unix devices system calls. The term also sometimes refers to the software in a Unix kernel that provides the OSS interface; it can be thought of as a device driver for sound controller hardware. The goal of OSS is to allow the writing of sound-based applications that are agnostic of the underlying sound hardware.

<span class="mw-page-title-main">Text-based user interface</span> Type of interface based on outputting to or controlling a text display

In computing, text-based user interfaces (TUI), is a retronym describing a type of user interface (UI) common as an early form of human–computer interaction, before the advent of bitmapped displays and modern conventional graphical user interfaces (GUIs). Like modern GUIs, they can use the entire screen area and may accept mouse and other inputs. They may also use color and often structure the display using box-drawing characters such as ┌ and ╣. The modern context of use is usually a terminal emulator.

udev is a device manager for the Linux kernel. As the successor of devfsd and hotplug, udev primarily manages device nodes in the /dev directory. At the same time, udev also handles all user space events raised when hardware devices are added into the system or removed from it, including firmware loading as required by certain devices.

The Direct Rendering Manager (DRM) is a subsystem of the Linux kernel responsible for interfacing with GPUs of modern video cards. DRM exposes an API that user-space programs can use to send commands and data to the GPU and perform operations such as configuring the mode setting of the display. DRM was first developed as the kernel-space component of the X Server Direct Rendering Infrastructure, but since then it has been used by other graphic stack alternatives such as Wayland and standalone applications and libraries such as SDL2 and Kodi.

The proc filesystem (procfs) is a special filesystem in Unix-like operating systems that presents information about processes and other system information in a hierarchical file-like structure, providing a more convenient and standardized method for dynamically accessing process data held in the kernel than traditional tracing methods or direct access to kernel memory. Typically, it is mapped to a mount point named /proc at boot time. The proc file system acts as an interface to internal data structures about running processes in the kernel. In Linux, it can also be used to obtain information about the kernel and to change certain kernel parameters at runtime (sysctl).

<span class="mw-page-title-main">Architecture of Windows NT</span> Structure of the operating system

The architecture of Windows NT, a line of operating systems produced and sold by Microsoft, is a layered design that consists of two main components, user mode and kernel mode. It is a preemptive, reentrant multitasking operating system, which has been designed to work with uniprocessor and symmetrical multiprocessor (SMP)-based computers. To process input/output (I/O) requests, it uses packet-driven I/O, which utilizes I/O request packets (IRPs) and asynchronous I/O. Starting with Windows XP, Microsoft began making 64-bit versions of Windows available; before this, there were only 32-bit versions of these operating systems.

<span class="mw-page-title-main">Linux kernel interfaces</span> An overview and comparison of the Linux kernel API and ABI.

The Linux kernel provides multiple interfaces to user-space and kernel-mode code that are used for varying purposes and that have varying properties by design. There are two types of application programming interface (API) in the Linux kernel:

  1. the "kernel–user space" API; and
  2. the "kernel internal" API.
sysctl Unix-like software that manages kernel attributes

sysctl is a software mechanism in some Unix-like operating systems that reads and modifies the attributes of the system kernel such as its version number, maximum limits, and security settings. It is available both as a system call for compiled programs, and an administrator command for interactive use and scripting. Linux additionally exposes sysctl as a virtual file system.

In computer networking, STREAMS is the native framework in Unix System V for implementing character device drivers, network protocols, and inter-process communication. In this framework, a stream is a chain of coroutines that pass messages between a program and a device driver. STREAMS originated in Version 8 Research Unix, as Streams.

Netlink is a socket family used for inter-process communication (IPC) between both the kernel and userspace processes, and between different userspace processes, in a way similar to the Unix domain sockets available on certain Unix-like operating systems, including its original incarnation as a Linux kernel interface, as well as in the form of a later implementation on FreeBSD. Similarly to the Unix domain sockets, and unlike INET sockets, Netlink communication cannot traverse host boundaries. However, while the Unix domain sockets use the file system namespace, Netlink sockets are usually addressed by process identifiers (PIDs).

A line discipline (LDISC) is a layer in the terminal subsystem in some Unix-like systems. The terminal subsystem consists of three layers: the upper layer to provide the character device interface, the lower hardware driver to communicate with the hardware or pseudo terminal, and the middle line discipline to implement behavior common to terminal devices.

In Unix-like operating systems, a device file, device node, or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. There are also special files in DOS, OS/2, and Windows. These special files allow an application program to interact with a device by using its device driver via standard input/output system calls. Using standard system calls simplifies many programming tasks, and leads to consistent user-space I/O mechanisms regardless of device features and functions.

ethtool is the primary means in Linux kernel-based operating systems for displaying and modifying the parameters of network interface controllers (NICs) and their associated device driver software from application programs running in userspace.

The POSIX terminal interface is the generalized abstraction, comprising both an application programming interface for programs, and a set of behavioural expectations for users of a terminal, as defined by the POSIX standard and the Single Unix Specification. It is a historical development from the terminal interfaces of BSD version 4 and Seventh Edition Unix.

The Seventh Edition Unix terminal interface is the generalized abstraction, comprising both an application programming interface for programs and a set of behavioural expectations for users, of a terminal as historically available in Seventh Edition Unix. It has been largely superseded by the POSIX terminal interface.

<span class="mw-page-title-main">Longene</span> Linux distribution

Longene is a Linux-based operating system kernel intended to be binary compatible with application software and device drivers made for Microsoft Windows and Linux. As of 1.0-rc2, it consists of a Linux kernel module implementing aspects of the Windows kernel and a modified Wine distribution designed to take advantage of the more native interface. Longene is written in the C programming language and is free and open source software. It is licensed under the terms of the GNU General Public License version 2 (GPLv2).

The envsys framework is a kernel-level hardware monitoring sensors framework in NetBSD. As of 4 March 2019, the framework is used by close to 85 device drivers to export various environmental monitoring sensors, as evidenced by references of the sysmon_envsys_register symbol within the sys path of NetBSD; with temperature sensors, ENVSYS_STEMP, being the most likely type to be exported by any given driver. Sensors are registered with the kernel through sysmon_envsys(9) API. Consumption and monitoring of sensors from the userland is performed with the help of envstat utility through proplib(3) through ioctl(2) against the /dev/sysmon pseudo-device file, the powerd power management daemon that responds to kernel events by running scripts from /etc/powerd/scripts/, as well as third-party tools like symon and GKrellM from pkgsrc.

The bio(4) pseudo-device driver and the bioctl(8) utility implement a generic RAID volume management interface in OpenBSD and NetBSD. The idea behind this software is similar to ifconfig, where a single utility from the operating system can be used to control any RAID controller using a generic interface, instead of having to rely on many proprietary and custom RAID management utilities specific for each given hardware RAID manufacturer. Features include monitoring of the health status of the arrays, controlling identification through blinking the LEDs and managing of sound alarms, and specifying hot spare disks. Additionally, the softraid configuration in OpenBSD is delegated to bioctl as well; whereas the initial creation of volumes and configuration of hardware RAID is left to card BIOS as non-essential after the operating system has already been booted. Interfacing between the kernel and userland is performed through the ioctl system call through the /dev/bio pseudo-device.

References

  1. Niklas Hallqvist (2002); Marco Peereboom (2006). "bio(4) — block I/O ioctl tunnel pseudo-device". BSD Cross Reference. OpenBSD.{{cite web}}: CS1 maint: numeric names: authors list (link)
  2. Marco Peereboom (2005). "bioctl(8) — RAID management interface". BSD Cross Reference. OpenBSD.
  3. "sysmon(4) — system monitoring and power management interface". NetBSD. An ioctl(2) interface available via /dev/sysmon.
  4. Christiansen, Tom; Torkington, Nathan (1998). "12: Packages, Libraries, and Modules". Perl Cookbook: Solutions & Examples for Perl Programmers (2 ed.). Sebastopol, California: O'Reilly Media, Inc. (published 2003). p. 482. ISBN   9780596554965 . Retrieved 2016-11-15. [...] TIOCSTI [...] stands for 'terminal I/O control, simulate terminal input.' On systems that implement this function, it will push one character into your device stream so that the next time any process reads from that device, it gets the character you put there.
  5. Federico Biancuzzi (2004-10-28). "OpenBSD 3.6 Live". ONLamp . O'Reilly Media. Archived from the original on 2004-10-29. Retrieved 2019-03-20. There are two system calls that can be used to add functionality to the kernel (without adding yet another system call): ioctl(2) and sysctl(3). The latter was chosen because it was very simple to implement the new feature.
  6. Tim Rightnour; Bill Squier (2007-12-19). "envsys -- Environmental Systems API". NetBSD 4.0. This API is experimental and may be deprecated at any time ... This entire API should be replaced by a sysctl(8) interface or a kernel events mechanism, should one be developed.
  7. Constantine A. Murenin (2007-04-17). "3.5. NetBSD's sysmon(4)". Generalised Interfacing with Microprocessor System Hardware Monitors. Proceedings of 2007 IEEE International Conference on Networking, Sensing and Control, 15–17 April 2007. London, United Kingdom: IEEE. pp. 901–906. doi:10.1109/ICNSC.2007.372901. ISBN   978-1-4244-1076-7. IEEE ICNSC 2007, pp. 901—906.
  8. Constantine A. Murenin (2010-05-21). "6.1. Framework timeline; 7.1. NetBSD envsys / sysmon". OpenBSD Hardware Sensors — Environmental Monitoring and Fan Control (MMath thesis). University of Waterloo: UWSpace. hdl:10012/5234. Document ID: ab71498b6b1a60ff817b29d56997a418.
  9. McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139. Archived from the original (PDF) on 2023-07-30.