Cut (Unix)

cut
Original author(s) AT&T Bell Laboratories
Developer(s) Various open-source and commercial developers
Initial release February 1985
Operating system Unix, Unix-like, IBM i
Platform Cross-platform
Type Command
License coreutils: GPLv3+

In computing, cut is a command line utility on Unix and Unix-like operating systems which is used to extract sections from each line of input — usually from a file. It is currently part of the GNU coreutils package and the BSD Base System.


Extraction of line segments can typically be done by bytes (-b), characters (-c), or fields (-f) separated by a delimiter (-d; the tab character by default). A range must be provided in each case, consisting of one of N, N-M, N- (N to the end of the line), or -M (beginning of the line to M), where N and M are counted from 1 (there is no zeroth value). Since version 6, an error is thrown if a zeroth value is included. Prior to this, the value was ignored and assumed to be 1.
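The four range forms can be tried on a single line of input. The following is a minimal sketch rather than an excerpt from any manual; the ten-character sample string and the variable name line are assumptions chosen so that positions are easy to read off.

```shell
# Sample input (an assumption for illustration): positions 1..10 map to a..j.
line='abcdefghij'

printf '%s\n' "$line" | cut -c 3      # N:   only the 3rd character      -> c
printf '%s\n' "$line" | cut -c 3-5    # N-M: 3rd through 5th character   -> cde
printf '%s\n' "$line" | cut -c 8-     # N-:  8th character to line end   -> hij
printf '%s\n' "$line" | cut -c -4     # -M:  line start through the 4th  -> abcd
```

The same list syntax applies to -b and -f; only the unit being counted changes.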

History

The original Bell Labs version was written by Gottfried W. R. Luderer. [1] [2] cut is part of the X/Open Portability Guide since issue 2 of 1987. It was inherited into the first version of POSIX.1 and the Single Unix Specification. [3] It first appeared in AT&T System III UNIX in 1982. [4]

The version of cut bundled in GNU coreutils was written by David M. Ihnat, David MacKenzie, and Jim Meyering. [5] The command is available as a separate package for Microsoft Windows as part of the UnxUtils collection of native Win32 ports of common GNU Unix-like utilities. [6] The cut command has also been ported to the IBM i operating system. [7]

Examples

Assuming a file named "file" containing the lines:

foo:bar:baz:qux:quux
one:two:three:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:theta:iota:kappa:lambda:mu
the quick brown fox jumps over the lazy dog

To output the fourth through tenth characters of each line:

$ cut -c 4-10 file
:bar:ba
:two:th
ha:beta
 quick

To output the fifth field through the end of each line, using the colon character as the field delimiter:

$ cut -d ":" -f 5- file
quux
five:six:seven
epsilon:zeta:eta:theta:iota:kappa:lambda:mu
the quick brown fox jumps over the lazy dog

(Note that because the colon character is not found in the last line, the entire line is shown.)

Option -d specifies a single-character delimiter (in the example above, a colon) which serves as the field separator. Option -f specifies the range of fields included in the output (here, from the fifth field to the end of the line). Option -d presupposes usage of option -f.

To output the third field of each line using space as the field delimiter:

$ cut -d " " -f 3 file
foo:bar:baz:qux:quux
one:two:three:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:theta:iota:kappa:lambda:mu
brown

(Note that because the space character is not found in the first three lines these entire lines are shown.)

To separate two words joined by a given delimiter:

$ line=process.processid
$ cut -d "." -f 1 <<< $line
process
$ cut -d "." -f 2 <<< $line
processid

Syntax

cut [-b list] [-c list] [-f list] [-n] [-d delim] [-s] [file]

Flags which may be used include:

-b
Bytes; a list following -b specifies a range of bytes which will be returned, e.g. cut -b1-66 would return the first 66 bytes of a line. Note: if used in conjunction with -n, no multi-byte characters will be split. Note also: -b will only work on input lines of less than 1023 bytes.
-c
Characters; a list following -c specifies a range of characters which will be returned, e.g. cut -c1-66 would return the first 66 characters of a line.
-f
Fields; a list following -f specifies which fields, as separated by the delimiter, will be returned.
list
A comma-separated or blank-separated list of integer field numbers, in increasing order. The - indicator may be supplied as shorthand for ranges of fields, e.g. 4-6 for fields 4 to 6, or 5- for field 5 to the end of the line.
-n
Used in combination with -b; suppresses splitting of multi-byte characters.
-d
Delimiter; the character immediately following the -d option is the field delimiter for use in conjunction with the -f option; the default delimiter is tab. Space and other characters with special meanings within the context of the shell in use must be quoted or escaped as necessary.
-s
Suppresses lines which contain no field delimiter when -f is specified. Without -s, such lines are passed through unchanged.
file
The file (and accompanying path if necessary) to process as input. If no file is specified then standard input will be used.
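
As a hedged illustration of how -d, -f and -s interact, the sketch below feeds three made-up lines to cut on standard input. The sample data and the variable name sample are assumptions chosen for the example, not taken from any implementation's documentation.

```shell
# Three sample lines (an assumption); the middle one contains no colon.
sample='a:b:c:d
no delimiter here
w:x:y:z'

# A comma-separated list selects fields 1 and 3. Without -s, the line
# that lacks the delimiter is passed through unchanged.
printf '%s\n' "$sample" | cut -d ':' -f 1,3

# With -s, the line that lacks the delimiter is dropped from the output.
printf '%s\n' "$sample" | cut -d ':' -s -f 1,3
```

The first command prints a:c, the untouched middle line, and w:y; the second prints only a:c and w:y. Selected fields are joined in the output by the same delimiter that separated them in the input.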


References

  1. "cut(1) - OpenBSD manual pages".
  2. "[TUHS] A portrait of cut(1)". 15 January 2020.
  3. "cut". Shell and Utilities Reference, The Single UNIX Specification, Version 4. The Open Group.
  4. "cut(1)". FreeBSD General Commands Manual.
  5. "cut(1)". Linux General Commands Manual.
  6. "Native Win32 ports of some GNU utilities". unxutils.sourceforge.net.
  7. IBM. "IBM System i Version 7.2 Programming Qshell" (PDF). IBM. Archived (PDF) from the original on 2020-09-18. Retrieved 2020-09-05.