Cat (Unix)

cat
Original author(s): Ken Thompson, Dennis Ritchie
Developer(s): AT&T Bell Laboratories
Initial release: November 3, 1971
Operating system: Unix, Unix-like, Plan 9, Inferno, ReactOS
Platform: Cross-platform
Type: Command
License: coreutils: GPLv3+; ReactOS: GPLv2+

cat is a standard Unix utility that reads files sequentially, writing them to standard output. The name is derived from its function to (con)catenate files (from Latin catenare, "to chain"). [1] [2] It has been ported to a number of operating systems.

The other primary purpose of cat, aside from concatenation, is file printing, which lets the user view the contents of a file. Printing to files and to the terminal are the most common uses of cat. [3]

History

cat was part of the early versions of Unix, e.g., Version 1, and replaced pr, a PDP-7 and Multics utility for copying a single file to the screen. [4] It was written by Ken Thompson and Dennis Ritchie. The version of cat bundled in GNU coreutils was written by Torbjorn Granlund and Richard Stallman. [5] The ReactOS version was written by David Welch, Semyon Novikov, and Hermès Bélusca. [6]

Over time, alternative utilities such as tac and bat also became available, each offering additional features. [7] [8]

Usage

The cat utility serves a dual purpose: concatenating and printing. With a single argument, it is often used to print a file to the user's terminal emulator (or historically to a computer terminal or teletype). With more than one argument, it concatenates several files. The combined result is by default also printed to the terminal, but often users redirect the result into yet another file. [9] Hence printing a single file to the terminal is a special use-case of this concatenation program. Yet, this is its most common use. [3]

The Single Unix Specification defines the operation of cat as reading files in the sequence given in its arguments and writing their contents to the standard output in the same sequence. The specification mandates support for one option flag, -u, for unbuffered output, meaning that each byte is written after it has been read. Some implementations, such as the one in GNU Core Utilities, do this by default and ignore the flag. [10]

If one of the input filenames is specified as a single hyphen (-), then cat reads from standard input at that point in the sequence. If no files are specified, cat reads from standard input only.
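The behavior described above can be tried with a short sketch. The scratch directory and the file names (head.txt, foot.txt) are illustrative assumptions, not part of any specification.

```shell
# Scratch directory and file names are illustrative only.
tmp=$(mktemp -d)
printf 'header\n' > "$tmp/head.txt"
printf 'footer\n' > "$tmp/foot.txt"

# "-" marks the point in the sequence where cat reads standard input
# (here, the text arriving through the pipe).
result=$(printf 'body\n' | cat "$tmp/head.txt" - "$tmp/foot.txt")
printf '%s\n' "$result"

# With no file operands, cat copies standard input to standard output.
echoed=$(printf 'only stdin\n' | cat)
printf '%s\n' "$echoed"

rm -rf "$tmp"
```

The first command prints the piped text between the contents of the two files, in the order the operands were given.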

The command syntax is:

cat [options] [file_names]

Options

Example of some cat options: [11]

Use cases

cat can be used to pipe a file to a program that expects plain text or binary data on its input stream. cat does not destroy non-text bytes when concatenating and outputting. As such, its two main use cases are text files and certain format-compatible types of binary files.

Concatenation of text is limited to text files using the same legacy encoding, such as ASCII. cat does not provide a way to concatenate Unicode text files that have a Byte Order Mark or files using different text encodings from each other.
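As a minimal sketch of the Byte Order Mark issue, the following creates two small UTF-8 files that each begin with a BOM (bytes EF BB BF) and concatenates them. The file names are hypothetical; od is used only to make the bytes visible.

```shell
tmp=$(mktemp -d)

# Each file: a UTF-8 BOM (EF BB BF, written as octal escapes) plus one line.
printf '\357\273\277first\n'  > "$tmp/a.txt"
printf '\357\273\277second\n' > "$tmp/b.txt"

# cat copies the bytes verbatim; it does not strip or merge BOMs.
cat "$tmp/a.txt" "$tmp/b.txt" > "$tmp/joined.txt"

# Hex dump of the result with whitespace removed, for easy inspection.
hex=$(od -An -tx1 "$tmp/joined.txt" | tr -d ' \n')
printf '%s\n' "$hex"

rm -rf "$tmp"
```

In the dump, the sequence efbbbf appears twice: once at the start and once in the middle, where the second file began. A program reading the joined file as UTF-8 would see a stray U+FEFF character at that point.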

For many structured binary data sets, the resulting combined file may not be valid; for example, if a file has a unique header or footer, the result will spuriously duplicate these. However, for some multimedia digital container formats, the resulting file is valid, and so cat provides an effective means of appending files. Video streams can be a significant example of files that cat can concatenate without issue, e.g. the MPEG program stream (MPEG-1 and MPEG-2) and DV (Digital Video) formats, which are fundamentally simple streams of packets.

Examples

Command
    Explanation
cat file1.txt
    Display the contents of a file
cat file1.txt file2.txt
    Concatenate two text files and display the result in the terminal
cat file1.txt file2.txt > newcombinedfile.txt
    Concatenate two text files and write them to a new file
cat >newfile.txt
    Create a file called newfile.txt. Type the desired input and press CTRL+D to finish. The text will be in file newfile.txt.
cat -n file1.txt file2.txt > newnumberedfile.txt
    Some implementations of cat, with option -n, can also number lines
cat file1.txt > file2.txt
    Copy the contents of file1.txt into file2.txt
cat file1.txt >> file2.txt
    Append the contents of file1.txt to file2.txt
cat file1.txt file2.txt file3.txt | sort > test4
    Concatenate the files, sort the complete set of lines, and write the output to a newly created file
cat file1.txt file2.txt | less
    Run the program "less" with the concatenation of file1 and file2 as its input
cat file1.txt | grep example
    Print the lines of file1.txt that contain the word "example"
command | cat
    Suppress the special behavior of "command" (e.g. paging) that it applies when writing directly to a TTY (cf. UUOC below)
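A few of these commands can be run end to end in a scratch directory. This is only a sketch: the file contents (alpha, beta) are made up for illustration, and the files are created under a temporary directory so nothing existing is overwritten.

```shell
tmp=$(mktemp -d)
printf 'alpha\n' > "$tmp/file1.txt"
printf 'beta\n'  > "$tmp/file2.txt"

# Concatenate two files and write the result to a new file.
cat "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/newcombinedfile.txt"
combined=$(cat "$tmp/newcombinedfile.txt")

# Append the contents of file1.txt to file2.txt.
cat "$tmp/file1.txt" >> "$tmp/file2.txt"
appended=$(cat "$tmp/file2.txt")

# Concatenate, then sort the complete set of lines.
sorted=$(cat "$tmp/file1.txt" "$tmp/file2.txt" | sort)

printf 'combined:\n%s\n' "$combined"
printf 'appended:\n%s\n' "$appended"
printf 'sorted:\n%s\n' "$sorted"

rm -rf "$tmp"
```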

Unix culture

Jargon file definition

The Jargon File version 4.4.7 lists this as the definition of cat:

  1. To spew an entire file to the screen or some other output sink without pause (syn. blast).
  2. By extension, to dump large amounts of data at an unprepared target or with no intention of browsing it carefully. Usage: considered silly. Rare outside Unix sites. See also dd, BLT.

Among Unix fans, cat(1) is considered an excellent example of user-interface design, because it delivers the file contents without such verbosity as spacing or headers between the files, and because it does not require the files to consist of lines of text, but works with any sort of data.

Among Unix critics, cat(1) is considered the canonical example of bad user-interface design, because of its woefully unobvious name. It is far more often used to blast a single file to standard output than to concatenate two or more files. The name cat for the former operation is just as unintuitive as, say, LISP's cdr. [citation needed]

Useless use of cat

Useless use of cat (UUOC) is common Unix jargon for command line constructs that only provide a function of convenience to the user. [12] In computing, the word "abuse", [13] in the second sense of the definition, is used to disparage the excessive or unnecessary use of a language construct; thus, abuse of cat is sometimes called "cat abuse". A common example of cat abuse is:

cat filename | command arg1 arg2 argn

This can be rewritten using redirection of stdin instead, in either of the following forms (the first is more traditional):

command arg1 arg2 argn < filename
<filename command arg1 arg2 argn

Beyond other benefits, the input redirection forms allow command to perform random access on the file, whereas the cat examples do not. This is because the redirection form opens the file as the stdin file descriptor which command can fully access, while the cat form simply provides the data as a stream of bytes.
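The difference can be observed from inside the receiving command. The sketch below assumes a Linux-like system where /dev/stdin refers to the process's own standard input (this path is not guaranteed everywhere), and describe_stdin is a hypothetical helper written for this illustration.

```shell
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/in.txt"

# Hypothetical helper: report what kind of object standard input is.
# Assumes /dev/stdin resolves to the current process's standard input.
describe_stdin() {
    if [ -f /dev/stdin ]; then
        echo "regular file"   # seekable: random access is possible
    elif [ -p /dev/stdin ]; then
        echo "pipe"           # a byte stream: sequential reads only
    else
        echo "other"
    fi
}

redirected=$(describe_stdin < "$tmp/in.txt")
piped=$(cat "$tmp/in.txt" | describe_stdin)

printf 'with redirection: %s\n' "$redirected"
printf 'with cat and a pipe: %s\n' "$piped"

rm -rf "$tmp"
```

With redirection the command receives the file itself as standard input, whereas through cat it receives only a pipe.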

Another common case where cat is unnecessary is where a command defaults to operating on stdin, but will read from a file, if the filename is given as an argument. This is the case for many common commands; the following examples

cat file | grep pattern
cat file | less

can instead be written as

grep pattern file
less file
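As a quick check that the forms are interchangeable for grep, this sketch runs the same search three ways on a made-up sample file (the file name and its contents are assumptions for illustration).

```shell
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/file"

# The same search, three ways: via cat, with a file argument,
# and with input redirection.
via_cat=$(cat "$tmp/file" | grep an)
direct=$(grep an "$tmp/file")
via_redirect=$(grep an < "$tmp/file")

printf '%s\n%s\n%s\n' "$via_cat" "$direct" "$via_redirect"

rm -rf "$tmp"
```

All three print the same matching line; the last two avoid starting an extra process.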

A common interactive use of cat for a single file is to output the content of a file to standard output. However, if the output is piped or redirected, cat is unnecessary.

A pipeline written with a redundant cat might still be preferred for readability, since a pipeline read left to right can be easier to follow. [14] In addition, mistyping the redirection symbol > instead of < (the keys are often adjacent on keyboards) can permanently destroy the contents of a file, an accident known as clobbering; using cat with pipes is one way to avoid this. Compare:

command < in | command2 > out
<in command | command2 > out

with:

cat in | command | command2 > out
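The clobbering risk can be reproduced safely in a scratch directory. In this sketch the intended command is sort < in, and the typo replaces < with >; standard input is taken from /dev/null only so that the demonstration does not wait for keyboard input.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\n' > "$tmp/in"
before=$(wc -c < "$tmp/in")

# Intended:  sort < "$tmp/in"
# Typo:      > instead of <. The shell truncates the file before
# sort even starts, so the original contents are lost.
sort > "$tmp/in" < /dev/null

after=$(wc -c < "$tmp/in")
printf 'bytes before: %s, bytes after: %s\n' "$before" "$after"

rm -rf "$tmp"
```

With cat in | sort, the input file is only ever named as an argument to cat, so this particular slip cannot occur on it.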

References

  1. "In Unix, what do some obscurely named commands stand for?". University Information Technology Services. Indiana University.
  2. Kernighan, Brian W.; Pike, Rob (1984). The UNIX Programming Environment. Addison-Wesley. p. 15.
  3. Pike, Rob; Kernighan, Brian W. Program design in the UNIX environment (PDF) (Report). p. 3.
  4. McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
  5. cat(1), Linux User Commands Manual.
  6. "reactos/cat.c at master · reactos/reactos · GitHub". github.com. Retrieved August 28, 2021.
  7. "tac(1) - Linux manual page". man7.org.
  8. "sharkdp/bat". December 2, 2021 via GitHub.
  9. UNIX programmers manual (PDF). bitsavers.org (Report). November 3, 1971. p. 32. Archived from the original (PDF) on 2006-06-17.
  10. GNU Coreutils. "GNU Coreutils manual". GNU. Retrieved 1 March 2017.
  11. OpenBSD manual page and the GNU Core Utilities version of cat
  12. Brian Blackmore (1994-12-05). "Perl or Sed?". Newsgroup: comp.unix.shell. Retrieved 2024-02-12.
  13. "Merriam Webster's Definition of Abuse". Retrieved 2021-02-25.
  14. Nguyen, Dan. "Stanford Computational Journalism Lab". stanford.edu. Retrieved 2017-10-08.