File (command)

file
Developer(s): AT&T Bell Laboratories
Initial release: 1973, as part of Unix Research Version 4; 1986, open-source reimplementation
Stable release: 5.46 [1] / 27 November 2024
Repository: github.com/file/file
Written in: C
Operating system: Unix, Unix-like, Plan 9, IBM i
Platform: Cross-platform
Type: File type detector
License: BSD license, CDDL; Plan 9: MIT License
Website: darwinsys.com/file/

The file command is a standard program of Unix and Unix-like operating systems for recognizing the type of data contained in a computer file.

History

The file command first appeared in Unix Research Version 4 [2] in 1973. System V brought a major update with several important changes, most notably moving the file type information out of the binary and into an external text file.

Most major BSD and Linux distributions use a free, open-source reimplementation written from scratch by Ian Darwin [3] in 1986–87; it keeps file type information in a text file whose format is based on that of the System V version. Geoff Collyer expanded it in 1989, and it has since had input from many others, including Guy Harris, Chris Lowth and Eric Fischer; since late 1993 its maintenance has been organized by Christos Zoulas. OpenBSD has its own subset implementation, also written from scratch, which still uses the Darwin/Zoulas collection of magic-file data.

The file command has also been ported to the IBM i operating system. [4]

Specification

The Single UNIX Specification (SUS) specifies that a series of tests is performed on each file named on the command line:

  1. If the file cannot be read, or its Unix file type is undetermined, file indicates that the file was processed but its type was undetermined.
  2. file must be able to determine the types directory, FIFO, socket, block special file, and character special file.
  3. Zero-length files are identified as such.
  4. An initial part of the file is considered, and file applies position-sensitive tests.
  5. The entire file is considered, and file applies context-sensitive tests.
  6. Otherwise, the file is identified as a data file.
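
The ordering of these tests can be illustrated with a short Python sketch. It is not how any real file implementation is written: the function name classify, the two hard-coded signatures, the returned strings, and the use of a UTF-8 decode as the only context-sensitive test are all assumptions made for illustration.

```python
import os
import stat

# Hypothetical signatures for illustration; a real implementation reads them
# from the magic database instead of hard-coding them.
SIGNATURES = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
]

# Step 2: file types decided from the inode, without reading any content.
SPECIAL_TYPES = [
    (stat.S_ISDIR, "directory"),
    (stat.S_ISFIFO, "fifo"),
    (stat.S_ISSOCK, "socket"),
    (stat.S_ISBLK, "block special"),
    (stat.S_ISCHR, "character special"),
]


def classify(path, limit=1 << 20):
    """Return a short type description, trying the tests in the SUS order."""
    try:
        info = os.stat(path)
    except OSError:
        return "type undetermined"          # step 1: cannot examine the file
    for predicate, name in SPECIAL_TYPES:   # step 2: non-regular file types
        if predicate(info.st_mode):
            return name
    if info.st_size == 0:                   # step 3: zero-length file
        return "empty"
    try:
        with open(path, "rb") as handle:
            data = handle.read(limit)       # sketch: read at most `limit` bytes
    except OSError:
        return "type undetermined"          # step 1: file cannot be read
    for offset, magic, name in SIGNATURES:  # step 4: position-sensitive tests
        if data[offset:offset + len(magic)] == magic:
            return name
    try:                                    # step 5: crude context-sensitive test
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "data"                       # step 6: fall back to "data"
```

Calling classify on a directory returns "directory" without opening it, while a regular file is only read once the inode-based tests have been exhausted.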

file's position-sensitive tests are normally implemented by matching various locations within the file against a textual database of magic numbers (see the Usage section). This differs from other simpler methods such as file extensions and schemes like MIME.

In the System V implementation, the Ian Darwin implementation, and the OpenBSD implementation, the file command uses a database to drive the probing of the lead bytes. That database is stored in a file called magic, usually located at /etc/magic, /usr/share/file/magic or a similar path.
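
As a rough illustration of a database-driven probe, the following Python sketch parses a heavily simplified, magic-like table and matches byte strings at fixed offsets. Only the four-column shape (offset, type, test, message) follows the magic format; the names MAGIC_TEXT, parse_magic and identify are hypothetical, only the string type is handled, and real magic files support many more types, operators and continuation lines.

```python
# A tiny magic-like table: offset, type, test value (with octal escapes), message.
MAGIC_TEXT = r"""
# offset  type    test       message
0         string  \177ELF    ELF
0         string  \037\213   gzip compressed data
0         string  %PDF-      PDF document
257       string  ustar      POSIX tar archive
"""


def parse_magic(text):
    """Parse the simplified table into (offset, pattern bytes, message) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        offset, kind, test, message = line.split(None, 3)
        if kind != "string":
            continue                      # this sketch only knows string tests
        # Turn escapes such as \177 into the raw bytes they stand for.
        pattern = test.encode("ascii").decode("unicode_escape").encode("latin-1")
        rules.append((int(offset), pattern, message))
    return rules


def identify(data, rules):
    """Return the message of the first rule whose bytes match, else 'data'."""
    for offset, pattern, message in rules:
        if data[offset:offset + len(pattern)] == pattern:
            return message
    return "data"


RULES = parse_magic(MAGIC_TEXT)
```

Keeping the rules in text rather than in code is what allows new file types to be recognized by editing the database alone, which is the change the System V version introduced.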

Usage

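The SUS [5] mandates the following options:

  -M file: specifies a file of position-sensitive tests; the default position-sensitive and context-sensitive tests are not performed unless -d is also given.
  -m file: as -M, but the default tests are performed after the tests in file.
  -d: performs the default position-sensitive and context-sensitive tests; this is the default unless -M or -m is specified.
  -h: does not dereference symbolic links; a link is identified as a symbolic link.
  -i: does not classify a regular file beyond identifying its basic file type.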

Other Unix and Unix-like operating systems may add options beyond these. Ian Darwin's implementation adds -s ('special files'), -k ('keep-going') and -r ('raw'), among many others (examples below). [6]

When file looks at content, it reports only what the file looks like, not what it is. The program is easy to fool by placing a magic number at the start of a file whose content does not match it. The command is therefore not usable as a security tool except in specific situations.
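
The weakness can be demonstrated without file itself. The Python sketch below uses a hypothetical looks_like_gzip check that, like a magic-number test, inspects only the two leading gzip signature bytes, and compares it with actually attempting decompression; the helper names and sample data are assumptions for illustration.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # the two-byte gzip signature


def looks_like_gzip(data):
    """Magic-number style check: only the first two bytes are examined."""
    return data.startswith(GZIP_MAGIC)


def is_valid_gzip(data):
    """Stronger check: the data must actually decompress as gzip."""
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError, zlib.error):
        return False


# A forged "gzip" file: correct signature, arbitrary content after it.
fake = GZIP_MAGIC + b"this is not compressed data"
# A genuine gzip stream for comparison.
real = gzip.compress(b"hello")
```

Here looks_like_gzip accepts both fake and real, while is_valid_gzip accepts only real, which is why a signature match alone should not be treated as proof of a file's format.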

Examples

$ file file.c
file.c: C program text
$ file program
program: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), stripped
$ file /dev/hda1
/dev/hda1: block special (0/0)
$ file -s /dev/hda1
/dev/hda1: Linux/i386 ext2 filesystem

The -s option is non-standard and available only in Ian Darwin's implementation; it tells file to read device files and try to identify their contents rather than merely reporting them as device files. Normally file does not read device files, since doing so can have undesirable side effects.

$ file -k -r libmagic-dev_5.35-4_armhf.deb  # (on Linux)
libmagic-dev_5.35-4_armhf.deb: Debian binary package (format 2.0) - current ar archive - data

With the non-standard -k option of Ian Darwin's implementation, the program does not stop after the first match but continues looking for other matching patterns. The -r option, which is available in some versions, causes the unprintable newline character to be displayed in its raw form rather than in its octal representation.

$ file compressed.gz
compressed.gz: gzip compressed data, deflated, original filename, `compressed', last modified: Thu Jan 26 14:08:23 2006, os: Unix
$ file -i compressed.gz  # (on Linux)
compressed.gz: application/x-gzip; charset=binary
$ file data.ppm
data.ppm: Netpbm PPM "rawbits" image data
$ file /bin/cat
/bin/cat: Mach-O universal binary with 2 architectures
/bin/cat (for architecture ppc7400): Mach-O executable ppc
/bin/cat (for architecture i386): Mach-O executable i386
$ file /usr/bin/vi
/usr/bin/vi: symbolic link to vim

Identification of symbolic links is not available on all platforms, and links are dereferenced if -L is passed or POSIXLY_CORRECT is set.

Libmagic library

As of version 4.00 of the Ian Darwin/Christos Zoulas version of file, the functionality of file is incorporated into a libmagic library that is accessible via C (and C-compatible) linking; [8] [9] file is implemented using that library. [10] [11]

References

  1. "[File] FIle 5.46 is now available". 27 November 2024. Retrieved 28 November 2024.
  2. "Source of the UNIX V4 "file" man page". Archived from the original on 2019-12-10. Retrieved 2022-03-13.
  3. The early history of this program is recorded in its private CVS repository; see the log of the main program (archived 2017-04-01 at the Wayback Machine).
  4. "IBM System i Version 7.2 Programming Qshell" (PDF). IBM. Archived (PDF) from the original on 2021-03-05. Retrieved 2020-09-05.
  5. "The Open Group Base Specifications Issue 7 — file command". Archived from the original on 2018-10-12. Retrieved 2014-08-20.
  6. file(1), Linux User Manual – User Commands
  7. file(1), NetBSD General Commands Manual
  8. libmagic(3), Linux Programmer's Manual – Library Functions
  9. libmagic(3), NetBSD Library Functions Manual
  10. Zoulas, Christos (February 27, 2003). "file-3.41 is now available". File (Mailing list). Archived from the original on March 4, 2016. Retrieved January 1, 2013.
  11. Zoulas, Christos (March 24, 2003). "file-4.00 is now available". File (Mailing list). Archived from the original on December 28, 2016. Retrieved January 1, 2013.
