Cpio

Last updated
cpio
Original author(s) Dick Haight
Developer(s) AT&T Bell Laboratories
Initial release1977;46 years ago (1977)
Operating system Unix and Unix-like
Type Command
License GNU cpio: GPLv3
libarchive bsdcpio: New BSD License
cpio
Filename extension
.cpio
Internet media type
application/x-cpio
Uniform Type Identifier (UTI) public.cpio-archive
Type of format File archiver

cpio is a general file archiver utility and its associated file format. It is primarily installed on Unix-like computer operating systems. The software utility was originally intended as a tape archiving program as part of the Programmer's Workbench (PWB/UNIX), and has been a component of virtually every Unix operating system released thereafter. Its name is derived from the phrase copy in and out, in close description of the program's use of standard input and standard output in its operation.

Contents

All variants of Unix also support other backup and archiving programs, such as tar, which has become more widely recognized. [1] The use of cpio by the RPM Package Manager, in the initramfs program of Linux kernel 2.6, and in Apple's Installer (pax) make cpio an important archiving tool.

Since its original design, cpio and its archive file format have undergone several, sometimes incompatible, revisions. Most notable is the change, now an operational option, from the use of a binary format of archive file meta information to an ASCII-based representation.

The POSIX standard abandoned cpio in favor of pax.[ citation needed ]

History

cpio appeared in Version 7 Unix as part of the Programmer's Workbench project. [2]

Operation and archive format

cpio was originally designed to store backup file archives on a tape device in a sequential, contiguous manner. It does not compress any content, but resulting archives are often compressed using gzip or other external compressors.

Archive creation

When creating archives during the copy-out operation, initiated with the -o command line flag, cpio reads file and directory path names from its standard input channel and writes the resulting archive byte stream to its standard output. Cpio is therefore typically used with other utilities that generate the list of files to be archived, such as the find program.

The resulting cpio archive is a sequence of files and directories concatenated into a single archive, separated by header sections with file meta information, such as filename, inode number, ownership, permissions, and timestamps. By convention, the file name of an archive is usually given the file extension cpio.

This example uses the find utility to generate a list of path names starting in the current directory to create an archive of the directory tree:

$ find.-depth-print|cpio-o>/path/archive.cpio 

Extraction

During the copy-in operation, initiated by the command line flag i, cpio reads an archive from its standard input and recreates the archived files in the operating system's file system.

$ cpio-i-vd<archive.cpio 

Command line flag d tells cpio to construct directories as necessary. Flag v (verbose) lists file names as they are extracted.

Any remaining command line arguments other than the option flags are shell-like globbing-patterns; only files in the archive with matching names are copied from the archive. The following example extracts the file /etc/fstab from the archive:

$ cpio-i-d/etc/fstab<archive.cpio 

List

The files contained in a cpio archive may be listed with this invocation:

$ cpio-t<archive.cpio 

List may be useful since a cpio archive may contain absolute rather than relative paths (e.g., /bin/ls vs. bin/ls).

Copy

Cpio supports a third type of operation which copies files. It is initiated with the pass-through option flag (p). This mode combines the copy-out and copy-in steps without actually creating any file archive. In this mode, cpio reads path names on standard input like the copy-out operation, but instead of creating an archive, it recreates the directories and files at a different location in the file system, as specified by the path given as a command line argument.

This example copies the directory tree starting at the current directory to another path new-path in the file system, preserving files modification times (flag m), creating directories as needed (d), replacing any existing files unconditionally (u), while producing a progress listing on standard output (v):

$ find.-depth-print|cpio-p-dumvnew-path 

POSIX standardization

The cpio utility is standardized in POSIX.1-1988, but was omitted from POSIX.1-2001 because of its file size and other limitations. For example, the GNU version offers various output format options, such as "bin" (default, and obsolete) and "ustar" (POSIX tar), having a file size limitations of 2,147,483,647 bytes (2 GB) and 8,589,934,591 bytes (8 GB), respectively. [3]

The cpio, ustar, and pax file formats are defined by POSIX.1-2001 for the pax utility, which is currently POSIX 1003.1-2008 compliant, and so it can read and write cpio and ustar formatted archives.

Implementations

Most Linux distributions provide the GNU version of cpio. [4] FreeBSD and macOS use the BSD-licensed bsdcpio provided with libarchive. [5]

See also

Related Research Articles

The Single UNIX Specification (SUS) is a standard for computer operating systems, compliance with which is required to qualify for using the "UNIX" trademark. The standard specifies programming interfaces for the C language, a command-line shell, and user commands. The core specifications of the SUS known as Base Specifications are developed and maintained by the Austin Group, which is a joint working group of IEEE, ISO/IEC JTC 1/SC 22/WG 15 and The Open Group. If an operating system is submitted to The Open Group for certification, and passes conformance tests, then it is deemed to be compliant with a UNIX standard such as UNIX 98 or UNIX 03.

In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from "tape archive", as it was originally developed to write data to sequential I/O devices with no file system of their own. The archive data sets created by tar contain various file system parameters, such as name, timestamps, ownership, file-access permissions, and directory organization. POSIX abandoned tar in favor of pax, yet tar sees continued widespread use.

ls Command to list files and directories in Unix and Unix-like operating systems

In computing, ls is a command to list computer files and directories in Unix and Unix-like operating systems. It is specified by POSIX and the Single UNIX Specification.

compress is a Unix shell compression program based on the LZW compression algorithm. Compared to gzip's fastest setting, compress is slightly slower at compression, slightly faster at decompression, and has a significantly lower compression ratio. 1.8 MiB of memory is used to compress the Hutter Prize data, slightly more than gzip's slowest setting.

pwd Directory information command on various operating systems

In Unix-like and some other operating systems, the pwd command writes the full pathname of the current working directory to the standard output.

ln (Unix) Unix file management utility

The ln command is a standard Unix command utility used to create a hard link or a symbolic link (symlink) to an existing file or directory. The use of a hard link allows multiple filenames to be associated with the same file since a hard link points to the inode of a given file, the data of which is stored on disk. On the other hand, symbolic links are special files that refer to other files by name.

The archiver, also known simply as ar, is a Unix utility that maintains groups of files as a single archive file. Today, ar is generally used only to create and update static library files that the link editor or linker uses and for generating .deb packages for the Debian family; it can be used to create archives for any purpose, but has been largely replaced by tar for purposes other than static libraries. An implementation of ar is included as one of the GNU Binutils.

xargs is a command on Unix and most Unix-like operating systems used to build and execute commands from standard input. It converts input from standard input into arguments to a command.

dd is a command-line utility for Unix, Plan 9, Inferno, and Unix-like operating systems and beyond, the primary purpose of which is to convert and copy files. On Unix, device drivers for hardware and special device files appear in the file system just like normal files; dd can also read and/or write from/to these files, provided that function is implemented in their respective driver. As a result, dd can be used for tasks such as backing up the boot sector of a hard drive, and obtaining a fixed amount of random data. The dd program can also perform conversions on the data as it is copied, including byte order swapping and conversion to and from the ASCII and EBCDIC text encodings.

pax is an archiving utility available for various operating systems and defined since 1995. Rather than sort out the incompatible options that have crept up between tar and cpio, along with their implementations across various versions of Unix, the IEEE designed new archive utility pax that could support various archive formats with useful options from both archivers. The pax command is available on Unix and Unix-like operating systems and on IBM i, and Microsoft Windows NT until Windows 2000.

cp (Unix) Unix command utility

In computing, cp is a command in various Unix and Unix-like operating systems for copying files and directories. The command has three principal modes of operation, expressed by the types of arguments presented to the program for copying a file to another file, one or more files to a directory, or for copying entire directories to another directory.

In computing, cut is a command line utility on Unix and Unix-like operating systems which is used to extract sections from each line of input — usually from a file. It is currently part of the GNU coreutils package and the BSD Base System.

cksum Unix command

cksum is a command in Unix and Unix-like operating systems that generates a checksum value for a file or stream of data. The cksum command reads each file given in its arguments, or standard input if no arguments are provided, and outputs the file's 32-bit cyclic redundancy check (CRC) checksum and byte count. The CRC output by cksum is different from the CRC-32 used in zip, PNG and zlib.

df (Unix) Standard Unix command

df is a standard Unix command used to display the amount of available disk space for file systems on which the invoking user has appropriate read access. df is typically implemented using the statfs or statvfs system calls.

mv is a Unix command that moves one or more files or directories from one place to another. If both filenames are on the same filesystem, this results in a simple file rename; otherwise the file content is copied to the new location and the old file is removed. Using mv requires the user to have write permission for the directories the file will move between. This is because mv changes the content of both directories involved in the move. When using the mv command on files located on the same filesystem, the file's timestamp is not updated.

In Unix-like and some other operating systems, find is a command-line utility that locates files based on some user-specified criteria and either prints the pathname of each matched object or, if another action is requested, performs that action on each matched object.

In computing, tee is a command in command-line interpreters (shells) using standard streams which reads standard input and writes it to both standard output and one or more files, effectively duplicating its input. It is primarily used in conjunction with pipes and filters. The command is named after the T-splitter used in plumbing.

sum is a legacy utility available on some Unix and Unix-like operating systems. This utility outputs a 16-bit checksum of each argument file, as well as the number of blocks they take on disk. Two different checksum algorithms are in use. POSIX abandoned sum in favor of cksum.

cat (Unix) Unix command utility

cat is a standard Unix utility that reads files sequentially, writing them to standard output. The name is derived from its function to (con)catenate files. It has been ported to a number of operating systems.

libarchive is a free and open-source library for reading and writing various archive and compression formats. It is written in C and works on most Unix-like systems and Windows.

References

  1. Peek, J; O'Reilly, T; Loukides, M (1997). Unix Power Tools. O'Reilly & Associates, Inc. p. 38.13. ISBN   1-565-92260-3.
  2. McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
  3. cpio info document, in the Options node, bsdcpio manual page
  4. "Cpio". GNU.org.
  5. "libarchive".