File descriptor

In Unix and Unix-like computer operating systems, a file descriptor (FD, less frequently fildes) is a process-unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket.

File descriptors typically have non-negative integer values, with negative values being reserved to indicate "no value" or error conditions.

File descriptors are a part of the POSIX API. Each Unix process (except perhaps daemons) should have three standard POSIX file descriptors, corresponding to the three standard streams:

Integer value   Name              <unistd.h> symbolic constant [1]   <stdio.h> file stream [2]
0               Standard input    STDIN_FILENO                       stdin
1               Standard output   STDOUT_FILENO                      stdout
2               Standard error    STDERR_FILENO                      stderr
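
As an illustration of the three standard descriptors, the following Python sketch starts a child process that uses only the raw descriptor numbers 0, 1 and 2. Python's os.read and os.write are thin wrappers around the corresponding system calls. The names CHILD and run_child are hypothetical helpers introduced for this example.

```python
import subprocess
import sys

# Source of the child program: it touches only the raw descriptors 0, 1 and 2.
CHILD = (
    "import os\n"
    "data = os.read(0, 1024)\n"      # 0: standard input
    "os.write(1, data.upper())\n"    # 1: standard output
    "os.write(2, b'done')\n"         # 2: standard error
)


def run_child(payload):
    """Run the child with `payload` on its stdin; return (stdout, stderr) bytes."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=payload,
        capture_output=True,
        check=True,
    )
    return proc.stdout, proc.stderr
```

The parent connects pipes to the child's descriptors 0, 1 and 2 before the child starts, which is why the child can use them without opening anything itself.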

Overview

Figure: File descriptors for a single process, file table and inode table. Note that multiple file descriptors can refer to the same file table entry (e.g., as a result of the dup system call) and that multiple file table entries can in turn refer to the same inode (if it has been opened multiple times; the table is still simplified because it represents inodes by file names, even though an inode can have multiple names). File descriptor 3 does not refer to anything in the file table, signifying that it has been closed.

In the traditional implementation of Unix, a file descriptor is an index into a per-process file descriptor table maintained by the kernel. Each entry in that table in turn refers to an entry in a system-wide table of files opened by all processes, called the file table. A file table entry records the mode with which the file (or other resource) has been opened: for reading, writing, appending, and possibly other modes. It also refers to an entry in a third table, the inode table, which describes the actual underlying files. [3] To perform input or output, the process passes the file descriptor to the kernel through a system call, and the kernel accesses the file on behalf of the process. The process does not have direct access to the file or inode tables.
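
The relationship between the three tables can be observed from user space. The following Python sketch, with a hypothetical function name demo_offsets, duplicates one descriptor and separately opens the same file a second time. It assumes a POSIX system: the duplicate shares a file table entry, and therefore the file offset, while the second open gets its own entry that refers to the same inode.

```python
import os
import tempfile


def demo_offsets():
    """Return (alias_offset, other_offset, same_inode) after reading 2 bytes via fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abcdef")
        os.lseek(fd, 0, os.SEEK_SET)

        alias = os.dup(fd)                   # new descriptor, same file table entry
        other = os.open(path, os.O_RDONLY)   # new file table entry, same inode
        try:
            os.read(fd, 2)                   # advances the offset shared by fd and alias
            alias_offset = os.lseek(alias, 0, os.SEEK_CUR)
            other_offset = os.lseek(other, 0, os.SEEK_CUR)
            same_inode = os.fstat(fd).st_ino == os.fstat(other).st_ino
        finally:
            os.close(alias)
            os.close(other)
        return alias_offset, other_offset, same_inode
    finally:
        os.close(fd)
        os.unlink(path)
```

Reading through fd moves the offset seen through alias, but not the one seen through other, even though all three descriptors refer to the same underlying file.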

On Linux, the set of file descriptors open in a process can be accessed under the path /proc/PID/fd/, where PID is the process identifier. The entries /proc/PID/fd/0, /proc/PID/fd/1, and /proc/PID/fd/2 correspond to stdin, stdout, and stderr respectively. As a shortcut, any running process can also access its own file descriptors through the directories /proc/self/fd and /dev/fd. [4]
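
A minimal sketch of reading this directory from Python follows, assuming a Linux system with /proc mounted. The function name list_open_fds is hypothetical. Each entry under /proc/self/fd is a symbolic link whose target describes what the descriptor refers to.

```python
import os


def list_open_fds(fd_dir="/proc/self/fd"):
    """Map each open descriptor number to the target of its /proc symlink."""
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor used to scan the directory is itself listed,
            # but it is already closed by the time the link is read.
            continue
    return result
```

For regular files the link target is a path. For pipes and sockets Linux reports a synthetic name such as pipe:[...] or socket:[...] rather than a file system path.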

In Unix-like systems, file descriptors can refer to any Unix file type named in a file system. As well as regular files, this includes directories, block and character devices (also called "special files"), Unix domain sockets, and named pipes. File descriptors can also refer to other objects that do not normally exist in the file system, such as anonymous pipes and network sockets.
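
The kind of object behind a descriptor can be queried with fstat. The sketch below, using a hypothetical helper fd_kind, classifies a descriptor by the file type bits of its mode. It assumes a POSIX system and covers the file types mentioned above.

```python
import os
import stat


def fd_kind(fd):
    """Return a short description of the object that `fd` refers to."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "pipe"                # named or anonymous
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    return "other"
```

The same call works whether or not the object has a name in the file system, so an anonymous pipe created with os.pipe() is classified in the same way as a named pipe.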

The FILE data structure in the C standard I/O library usually includes a low level file descriptor for the object in question on Unix-like systems. The overall data structure provides additional abstraction and is instead known as a file handle.
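
Python's buffered file objects are not the C FILE structure, but they play an analogous role and make the layering easy to observe. In the sketch below, with a hypothetical function name demo_buffering, a buffered stream wraps the write end of a pipe. Data written to the stream reaches the underlying descriptor only when the buffer is flushed.

```python
import os


def demo_buffering():
    """Return (same_fd, visible_before_flush, data_after_flush)."""
    r, w = os.pipe()
    os.set_blocking(r, False)        # make an empty pipe raise instead of blocking
    stream = os.fdopen(w, "wb")      # buffered stream layered over descriptor w
    try:
        same_fd = stream.fileno() == w
        stream.write(b"hi")          # stays in the user-space buffer for now
        try:
            os.read(r, 16)
            visible_before_flush = True
        except BlockingIOError:
            visible_before_flush = False
        stream.flush()               # now a write system call is issued on w
        data = os.read(r, 16)
        return same_fd, visible_before_flush, data
    finally:
        stream.close()               # also closes w
        os.close(r)
```

The stream adds buffering and a richer interface, while all actual input and output still goes through the descriptor it holds.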

Operations on file descriptors

The following lists typical operations on file descriptors on modern Unix-like systems. Most of these functions are declared in the <unistd.h> header, but some are in the <fcntl.h> header instead.

Creating file descriptors

Deriving file descriptors

Operations on a single file descriptor

Operations on multiple file descriptors

Operations on the file descriptor table

The fcntl() function is used to perform various operations on a file descriptor, depending on the command argument passed to it. There are commands to get and set attributes associated with a file descriptor, including F_GETFD, F_SETFD, F_GETFL and F_SETFL.
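
The commands named above can be sketched with Python's fcntl module, which exposes the same call. The helper names set_cloexec and set_nonblocking are hypothetical. F_GETFD and F_SETFD act on per-descriptor flags such as FD_CLOEXEC, whereas F_GETFL and F_SETFL act on the file status flags such as O_NONBLOCK.

```python
import fcntl
import os


def set_cloexec(fd, enabled=True):
    """Set or clear the close-on-exec flag of a single descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)


def set_nonblocking(fd):
    """Add O_NONBLOCK to the file status flags reachable through fd."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
```

In both helpers the current flags are read first and then written back with one bit changed, so that unrelated flags are preserved.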

Operations that modify process state

File locking

Sockets

Miscellaneous

Upcoming operations

A series of newer operations on file descriptors has been added to many modern Unix-like systems and to numerous C libraries. They were proposed for standardization in The Open Group's Extended API Set [7] and have since been incorporated into POSIX.1-2008. The at suffix (as in openat()) signifies that the function takes an additional first argument supplying a file descriptor from which relative paths are resolved; the forms lacking the at suffix are thus equivalent to passing a file descriptor corresponding to the current working directory. The purpose of these operations is to defend against a certain class of TOCTOU attacks.
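
Python exposes this style of call through the dir_fd parameter of os.open. The following sketch, with hypothetical helper names read_relative and demo_rename_safe, assumes a platform where os.open supports dir_fd and O_DIRECTORY, such as Linux. It shows that a path resolved relative to a directory descriptor keeps referring to the same directory even after that directory has been renamed.

```python
import os
import shutil
import tempfile


def read_relative(dir_fd, name, limit=4096):
    """Open `name` relative to the directory behind dir_fd and read from it."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, limit)
    finally:
        os.close(fd)


def demo_rename_safe():
    """Read a file through a directory descriptor after the directory is renamed."""
    base = tempfile.mkdtemp()
    original = os.path.join(base, "original")
    moved = os.path.join(base, "moved")
    os.mkdir(original)
    with open(os.path.join(original, "data.txt"), "wb") as f:
        f.write(b"payload")

    dir_fd = os.open(original, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.rename(original, moved)   # the old path name no longer exists
        return read_relative(dir_fd, "data.txt")
    finally:
        os.close(dir_fd)
        shutil.rmtree(base)
```

Because the lookup starts from the descriptor rather than from a path string, a rename or replacement of the directory between the check and the use does not redirect the open to a different location.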

File descriptors as capabilities

Unix file descriptors behave in many ways as capabilities. They can be passed between processes across Unix domain sockets using the sendmsg() system call. Note, however, that what is actually passed is a reference to an "open file description" that has mutable state (the file offset, and the file status and access flags). This complicates the secure use of file descriptors as capabilities, since when programs share access to the same open file description, they can interfere with each other's use of it by changing its offset or whether it is blocking or non-blocking, for example. [8] [9] In operating systems that are specifically designed as capability systems, there is very rarely any mutable state associated with a capability itself.
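
Descriptor passing and the resulting shared state can be sketched as follows. The helper names send_fd, recv_fd and demo_shared_offset are hypothetical. For brevity both ends of the Unix domain socket live in one process, which is an assumption of this sketch; in practice they would be in different processes. The descriptor is sent as SCM_RIGHTS ancillary data with sendmsg().

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd):
    """Send one descriptor as SCM_RIGHTS ancillary data."""
    rights = array.array("i", [fd])
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, rights)])


def recv_fd(sock):
    """Receive one descriptor sent by send_fd."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return fds[0]


def demo_shared_offset():
    """Return (is_new_number, sender_offset) after the receiver reads 5 bytes."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        os.lseek(fd, 0, os.SEEK_SET)

        send_fd(a, fd)
        received = recv_fd(b)        # a different number for the same open file description
        try:
            os.read(received, 5)     # the receiver consumes 5 bytes
            sender_offset = os.lseek(fd, 0, os.SEEK_CUR)
        finally:
            os.close(received)
        return received != fd, sender_offset
    finally:
        a.close()
        b.close()
        os.close(fd)
        os.unlink(path)
```

The received descriptor has a different number, yet a read through it moves the offset observed by the sender, which is the interference described above.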

A Unix process' file descriptor table is an example of a C-list.

References

  1. The Open Group. "The Open Group Base Specifications Issue 7, IEEE Std 1003.1-2008, 2016 Edition". Retrieved 2017-09-21.
  2. The Open Group. "The Open Group Base Specifications Issue 7, IEEE Std 1003.1-2008, 2016 Edition". <stdio.h>. Retrieved 2017-09-21.
  3. Bach, Maurice J. (1986). The Design of the UNIX Operating System (8 ed.). Prentice-Hall. pp. 92–96. ISBN 9780132017992.
  4. "Devices - What does the output of 'll /Proc/Self/Fd/' (From 'll /Dev/Fd') mean?".
  5. The Open Group. "The Open Group Base Specifications Issue 7, IEEE Std 1003.1-2008, 2018 Edition – creat". Retrieved 2019-04-11.
  6. Stephen Kitt, Michael Kerrisk. "close_range(2) — Linux manual page". Retrieved 2021-03-22.
  7. Extended API Set, Part 2. The Open Group. October 2006. ISBN 1931624674.
  8. Brinkmann, Marcus (2009-02-04). "Building a bridge: library API's and file descriptors?". cap-talk. Archived from the original on 2012-07-30. Retrieved 2017-09-21.
  9. de Boyne Pollard, Jonathan (2007). "Don't set shared file descriptors to non-blocking I/O mode". Retrieved 2017-09-21.