Unix file types

The seven standard Unix file types are regular, directory, symbolic link, FIFO special, block special, character special, and socket as defined by POSIX. [1] Different OS-specific implementations allow more types than what POSIX requires (e.g. Solaris doors). A file's type can be identified by the ls -l command, which displays the type in the first character of the file-system permissions field.

For regular files, Unix does not impose or provide any internal file structure; their structure and interpretation are therefore entirely dependent on the software using them. [2] However, the file command can usually be used to determine what type of data they contain. [3]

Representations

Numeric

In the stat structure, file type and permissions (the mode) are stored together in an st_mode bit field, which has a size of at least 12 bits (3 bits to specify the type among the seven possible types of files; 9 bits for permissions). POSIX defines the permissions to occupy the least-significant 9 bits, but leaves the layout of the remaining bits undefined. [1]

By convention, the mode is a 16-bit value written out as a six-digit octal number without a leading zero. The format part occupies the leading 4 bits (the first 2 octal digits), and "10" (1000 in binary) usually stands for a regular file. The next 3 bits (1 digit) are usually used for setuid, setgid, and sticky. The last 9 bits (3 digits) are the part already defined by POSIX to contain the permissions. An example is "100644" for a typical regular file. This format can be seen in git, tar, and ar, among other places. [4]
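The conventional layout can be illustrated with a few shifts and masks. The following Python sketch assumes the 16-bit convention described above (4 type bits, 3 special bits, 9 permission bits), which POSIX does not guarantee; the function name split_mode is hypothetical and chosen only for this example.

```python
def split_mode(mode: int) -> tuple[int, int, int]:
    """Split a conventional 16-bit mode into (type, special, permissions)."""
    file_type = (mode >> 12) & 0o17   # leading 4 bits: the format part
    special = (mode >> 9) & 0o7       # next 3 bits: setuid, setgid, sticky
    permissions = mode & 0o777        # last 9 bits: the part POSIX defines
    return file_type, special, permissions


file_type, special, permissions = split_mode(0o100644)
print(f"{file_type:02o} {special:o} {permissions:03o}")  # prints "10 0 644"
```

Printing each part in octal gives back the digit groups of the original six-digit number.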

The type of a file can be tested using macros like S_ISDIR. Such a check is usually performed by masking the mode with S_IFMT (often the octal number "170000" under the leading-4-bits convention) and checking whether the result matches S_IFDIR. S_IFMT is not a core POSIX concept but an X/Open System Interfaces (XSI) extension; systems conforming only to POSIX may use some other method. [1]
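A minimal sketch of such a check, using Python's stat module, which mirrors the C macros. The mask value 0o170000 is the conventional one mentioned above and is an assumption about the platform; the helper name is_directory is hypothetical.

```python
import os
import stat

S_IFMT_MASK = 0o170000  # conventional type mask; not guaranteed by POSIX


def is_directory(mode: int) -> bool:
    """Mask out the type bits and compare them with S_IFDIR."""
    return (mode & S_IFMT_MASK) == stat.S_IFDIR


mode = os.stat(".").st_mode
# The hand-written check and the library's S_ISDIR should agree.
print(is_directory(mode), stat.S_ISDIR(mode))
```

In practice the S_IS* helpers are preferable to a hard-coded mask, since they do not depend on the bit layout.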

Mode string

Take for example one line in the ls -l output:

drwxr-xr-x 2 root root     0 Jan  1  1970 home

POSIX specifies [5] the format of the output for the long format (-l option). In particular, the first field (before the first space) is dubbed the "file mode string", here drwxr-xr-x. Its first character describes the file type, here d (directory). The rest of this string indicates the file permissions.
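As an illustration, the mode string can be rebuilt from a numeric mode. This Python sketch is a simplification: it ignores the setuid, setgid, and sticky bits, and it uses "-" for regular files as ls does. The names TYPE_CHARS and mode_string are hypothetical; the standard library's stat.filemode() performs the full conversion.

```python
import stat

# Type characters shown in the first position of the mode string.
TYPE_CHARS = {
    stat.S_IFREG: "-",
    stat.S_IFDIR: "d",
    stat.S_IFLNK: "l",
    stat.S_IFIFO: "p",
    stat.S_IFSOCK: "s",
    stat.S_IFCHR: "c",
    stat.S_IFBLK: "b",
}


def mode_string(mode: int) -> str:
    """Build a mode string from a mode, ignoring setuid/setgid/sticky."""
    chars = [TYPE_CHARS.get(stat.S_IFMT(mode), "?")]
    for shift in (6, 3, 0):  # owner, group, others
        bits = (mode >> shift) & 0o7
        chars.append("r" if bits & 0o4 else "-")
        chars.append("w" if bits & 0o2 else "-")
        chars.append("x" if bits & 0o1 else "-")
    return "".join(chars)


print(mode_string(0o040755))    # prints "drwxr-xr-x"
print(stat.filemode(0o040755))  # library equivalent, same result
```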

Examples of implementations

The GNU coreutils version of ls calls filemode(), a function provided by the gnulib library [6], to get the mode string.

FreeBSD uses a simpler approach but allows a smaller number of file types. [7]

Directory

The most common special file is the directory. The layout of a directory file is defined by the filesystem used. As several filesystems are available under Unix, both native and non-native, there is no single directory file layout.

A directory is marked with a d as the first letter in the mode field in the output of ls -dl [5] or stat, e.g.

$ ls -dl /
drwxr-xr-x 26 root root 4096 Sep 22 09:29 /
$ stat /
  File: "/"
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 802h/2050d      Inode: 128         Links: 26
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
...

Symbolic link

A symbolic link is a reference to another file. This special file is stored as a textual representation of the referenced file's path, which means the destination may be a relative path or may not exist at all.

A symbolic link is marked with an l (lower case L) as the first letter of the mode string, e.g. in this abbreviated ls -l output: [5]

lrwxrwxrwx ... termcap -> /usr/share/misc/termcap
lrwxrwxrwx ... S03xinetd -> ../init.d/xinetd
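Because a symbolic link stores only a path, it can be created even when its target is missing. The Python sketch below creates such a dangling link in a temporary directory; the link name and target path are made up for the example. It uses os.lstat() rather than os.stat() so that the link itself is inspected instead of its target.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "dangling")
    # Relative target that is assumed not to exist.
    os.symlink("../no-such-dir/missing-target", link)

    link_mode = os.lstat(link).st_mode       # lstat does not follow the link
    is_link = stat.S_ISLNK(link_mode)
    type_char = stat.filemode(link_mode)[0]  # first character of the mode string
    stored_path = os.readlink(link)          # the text stored in the link
    target_exists = os.path.exists(link)     # exists() follows the link

print(is_link, type_char, stored_path, target_exists)
```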

FIFO (named pipe)

One of the strengths of Unix has always been inter-process communication. Among the facilities provided by the OS are pipes, which connect the output of one process to the input of another. This works when both processes descend from the same parent process and are started by the same user, but there are circumstances where the communicating processes must use FIFOs, here referred to as named pipes. One such circumstance occurs when the processes must be executed under different user names and permissions.

Named pipes are special files that can exist anywhere in the file system. They can be created with the command mkfifo as in mkfifo mypipe.

A named pipe is marked with a p as the first letter of the mode string, e.g. in this abbreviated ls -l output: [5]

prw-rw---- ... mypipe
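A small Python sketch of a named pipe in use. For brevity the writer and the reader are two threads of one process, which is a simplification of the two-process case described above; the pipe name mypipe follows the example, and the helper name writer is hypothetical.

```python
import os
import stat
import tempfile
import threading


def writer(path: str) -> None:
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with open(path, "w") as fifo:
        fifo.write("hello through the pipe\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mypipe")
    os.mkfifo(path, 0o660)  # same effect as the mkfifo command

    fifo_mode = os.stat(path).st_mode
    is_fifo = stat.S_ISFIFO(fifo_mode)
    type_char = stat.filemode(fifo_mode)[0]

    thread = threading.Thread(target=writer, args=(path,))
    thread.start()
    with open(path) as fifo:   # blocks until the writer has opened the FIFO
        received = fifo.read() # returns at end of file, when the writer closes
    thread.join()

print(is_fifo, type_char, received.strip())
```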

Socket

A socket is a special file used for inter-process communication between two processes. In addition to sending data, processes can send file descriptors across a Unix domain socket connection using the sendmsg() and recvmsg() system calls.

Unlike named pipes, which allow only unidirectional data flow, sockets are fully duplex-capable.

A socket is marked with an s as the first letter of the mode string, e.g.

srwxrwxrwx /tmp/.X11-unix/X0
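The following Python sketch shows both properties on a system with Unix domain sockets: binding a socket to a path creates a file of type socket, and a connected pair carries data in both directions. The file name demo.sock is made up for the example, and socketpair() is used instead of a full listen/connect sequence to keep it short.

```python
import os
import socket
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)  # binding creates the socket file

    sock_mode = os.stat(path).st_mode
    is_socket = stat.S_ISSOCK(sock_mode)
    type_char = stat.filemode(sock_mode)[0]
    server.close()

# A connected pair of Unix domain sockets: data flows both ways.
left, right = socket.socketpair()
left.sendall(b"ping")
from_left = right.recv(4)
right.sendall(b"pong")
from_right = left.recv(4)
left.close()
right.close()

print(is_socket, type_char, from_left, from_right)
```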

Device file (block, character)

In Unix, almost all things are handled as files and have a location in the file system, even hardware devices like hard drives. The great exception is network devices, which do not turn up in the file system but are handled separately.

Device files are used to apply access rights to the devices and to direct operations on the files to the appropriate device drivers.

Unix makes a distinction between character devices and block devices. Roughly, a character device provides or accepts only a serial stream of data, while a block device is randomly accessible.

The distinction is not strict, however: disk partitions, for example, may have both character devices that provide unbuffered random access to blocks on the partition and block devices that provide buffered random access to blocks on the partition.

A character device is marked with a c as the first letter of the mode string and a block device is marked with a b, e.g. in this abbreviated ls -l output: [5]

crw-rw-rw- ... /dev/null
brw-rw---- ... /dev/sda
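A device file can be inspected like any other file. This Python sketch assumes that /dev/null exists, as it does on typical Unix-like systems; the device numbers it prints come from st_rdev and differ between systems, so no particular values are claimed here.

```python
import os
import stat

info = os.stat("/dev/null")  # assumed to exist on the system
is_char_device = stat.S_ISCHR(info.st_mode)
is_block_device = stat.S_ISBLK(info.st_mode)
type_char = stat.filemode(info.st_mode)[0]

# st_rdev identifies the device; it splits into major and minor numbers.
major, minor = os.major(info.st_rdev), os.minor(info.st_rdev)

print(type_char, is_char_device, is_block_device, major, minor)
```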

References

  1. "<sys/stat.h>". The Open Group Base Specifications Issue 6. The Open Group. 21 July 2019. Archived from the original on 27 November 2016. Retrieved 10 February 2017.
  2. Loukides, Mike (October 2002). "When Is a File Not a File?". Unix Power Tools (3 ed.). O'Reilly. p. 80. ISBN 9780596003302. A file is nothing more than a stream of bytes ...
  3. "file". IEEE Std 1003.1-2017 (POSIX). The Open Group. 2018. Archived from the original on 2018-10-12. Retrieved 2023-10-26.
  4. Kitt, Stephen. "What file mode is a symlink?". Unix & Linux Stack Exchange.
  5. "ls". IEEE Std 1003.1-2008 (POSIX). The Open Group. 11 March 2017. Archived from the original on 3 August 2017. Retrieved 10 February 2017.
  6. "filemode function in GNU coreutils". GNU. 11 March 2017.
  7. "printtype function from FreeBSD". FreeBSD. 11 March 2017.