Operating system | Unix and Unix-like |
---|---|
Platform | Cross-platform |
Type | Command |
License | coreutils: GPLv3+ |
paste is a Unix command line utility which is used to join files horizontally (parallel merging) by outputting lines consisting of the sequentially corresponding lines of each file specified, separated by tabs, to the standard output.
The original Bell Labs version was written by Gottfried W. R. Luderer. [1] [2] The version of paste
bundled in GNU coreutils was written by David M. Ihnat and David MacKenzie. [3] The command is available as a separate package for Microsoft Windows as part of the UnxUtils collection of native Win32 ports of common GNU Unix-like utilities. [4]
The paste utility is invoked with the following syntax:
paste [options] [file1 ..]
Once invoked, paste will read all its file arguments. For each corresponding line, paste will append the contents of each file at that line to its output along with a tab. When it has completed its operation for the last file, paste will output a newline character and move on to the next line.
paste exits after all streams return end of file. The number of lines in the output stream will equal the number of lines in the input file with the largest number of lines. Missing values are represented by empty strings.
Though potentially useful, an option to have paste emit an alternate string for a missing field (such as "NA") is not standard.
A sequence of empty records at the bottom of a column of the output stream may or may not have been present in the input file corresponding to that column as explicit empty records, unless you know the input file supplied all rows explicitly (e.g. in the canonical case where all input files all do indeed have the same number of lines).
The paste utility accepts the following options:
-d|--delimiters delimiters
, which specifies a list of delimiters to be used instead of tabs for separating consecutive values on a single line. Each delimiter is used in turn; when the list has been exhausted, paste begins again at the first delimiter.
-s|--serial
, which causes paste to append the data in serial rather than in parallel; that is, in a horizontal rather than vertical fashion.
For the following examples, assume that names.txt is a plain-text file that contains the following information:
Mark Smith Bobby Brown Sue Miller Jenny Igotit
and that numbers.txt is another plain-text file that contains the following information:
555-1234 555-9876 555-6743 867-5309
The following example shows the invocation of paste with names.txt and numbers.txt as well as the resulting output:
$ pastenames.txtnumbers.txt Mark Smith 555-1234Bobby Brown 555-9876Sue Miller 555-6743Jenny Igotit 867-5309
When invoked with the --serial
option (-s
on BSD or older systems), the output of paste is adjusted such that the information is presented in a horizontal fashion:
$ paste--serialnames.txtnumbers.txt Mark Smith Bobby Brown Sue Miller Jenny Igotit555-1234 555-9876 555-6734 867-5309
Finally, the use of the --delimiters
option (-d
on BSD or older systems) is illustrated in the following example:
$ paste--delimiters.names.txtnumbers.txt Mark Smith.555-1234Bobby Brown.555-9876Sue Miller.555-6743Jenny Igotit.867-5309
As an example usage of both, the paste command can be used to concatenate multiple consecutive lines into a single row:
$ paste--serial--delimiters'\t\n'names.txt Mark Smith Bobby BrownSue Miller Jenny Igotit
sed is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed was based on the scripting features of the interactive editor ed and the earlier qed. It was one of the earliest tools to support regular expressions, and remains in use for text processing, most notably with the substitution command. Popular alternative tools for plaintext string manipulation and "stream editing" include AWK and Perl.
In computing, the utility diff is a data comparison tool that computes and displays the differences between the contents of files. Unlike edit distance notions used for other purposes, diff is line-oriented rather than character-oriented, but it is like Levenshtein distance in that it tries to determine the smallest set of deletions and insertions to create one file from the other. The utility displays the changes in one of several standard formats, such that both humans or computers can parse the changes, and use them for patching.
The computer tool patch is a Unix program that updates text files according to instructions contained in a separate file, called a patch file. The patch file is a text file that consists of a list of differences and is produced by running the related diff program with the original and updated file as arguments. Updating files with patch is often referred to as applying the patch or simply patching the files.
xargs is a command on Unix and most Unix-like operating systems used to build and execute commands from standard input. It converts input from standard input into arguments to a command.
join
is a command in Unix and Unix-like operating systems that merges the lines of two sorted text files based on the presence of a common field. It is similar to the join operator used in relational databases but operating on text files.
bc, for basic calculator, is "an arbitrary-precision calculator language" with syntax similar to the C programming language. bc is typically used as either a mathematical scripting language or as an interactive mathematical shell.
In computing, cut
is a command line utility on Unix and Unix-like operating systems which is used to extract sections from each line of input — usually from a file. It is currently part of the GNU coreutils package and the BSD Base System.
cksum
is a command in Unix and Unix-like operating systems that generates a checksum value for a file or stream of data. The cksum command reads each file given in its arguments, or standard input if no arguments are provided, and outputs the file's 32-bit cyclic redundancy check (CRC) checksum and byte count. The CRC output by cksum is different from the CRC-32 used in zip, PNG and zlib.
split
is a utility on Unix, Plan 9, and Unix-like operating systems most commonly used to split a computer file into two or more smaller files.
nl is a Unix utility for numbering lines, either from a file or from standard input, reproducing output on standard output.
In computing, more
is a command to view the contents of a text file one screen at a time. It is available on Unix and Unix-like systems, DOS, Digital Research FlexOS, IBM/Toshiba 4690 OS, IBM OS/2, Microsoft Windows and ReactOS. Programs of this sort are called pagers. more
is a very basic pager, originally allowing only forward navigation through a file, though newer implementations do allow for limited backward movement.
tail is a program available on Unix, Unix-like systems, FreeDOS and MSX-DOS used to display the tail end of a text file or piped data.
In computing, tee
is a command in command-line interpreters (shells) using standard streams which reads standard input and writes it to both standard output and one or more files, effectively duplicating its input. It is primarily used in conjunction with pipes and filters. The command is named after the T-splitter used in plumbing.
yes
is a command on Unix and Unix-like operating systems, which outputs an affirmative response, or a user-defined string of text, continuously until killed.
cpio is a general file archiver utility and its associated file format. It is primarily installed on Unix-like computer operating systems. The software utility was originally intended as a tape archiving program as part of the Programmer's Workbench (PWB/UNIX), and has been a component of virtually every Unix operating system released thereafter. Its name is derived from the phrase copy in and out, in close description of the program's use of standard input and standard output in its operation.
In computing, sort is a standard command line program of Unix and Unix-like operating systems, that prints the lines of its input or concatenation of all files listed in its argument list in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire input is taken as sort key. Blank space is the default field separator. The command supports a number of command-line options that can vary by implementation. For instance the "-r
" flag will reverse the sort order.
The tsort program is a command line utility on Unix and Unix-like platforms, that performs a topological sort on its input. As of 2017, it is part of the POSIX.1 standard.
sum is a legacy utility available on some Unix and Unix-like operating systems. This utility outputs a 16-bit checksum of each argument file, as well as the number of blocks they take on disk. Two different checksum algorithms are in use. POSIX abandoned sum
in favor of cksum.
The csplit
command in Unix and Unix-like operating systems is a utility that is used to split a file into two or more smaller files determined by context lines.
cat
is a standard Unix utility that reads files sequentially, writing them to standard output. The name is derived from its function to (con)catenate files . It has been ported to a number of operating systems.