Paste (Unix)

paste
Operating system Unix and Unix-like
Platform Cross-platform
Type Command
License coreutils: GPLv3+

paste is a Unix command-line utility that joins files horizontally (parallel merging): it writes to standard output lines made up of the sequentially corresponding lines of each specified file, separated by tabs.

History

The original Bell Labs version was written by Gottfried W. R. Luderer. [1] [2] The version of paste bundled in GNU coreutils was written by David M. Ihnat and David MacKenzie. [3] The command is available as a separate package for Microsoft Windows as part of the UnxUtils collection of native Win32 ports of common GNU Unix-like utilities. [4]

Usage

The paste utility is invoked with the following syntax:

paste [options] [file1 ..]

Description

Once invoked, paste reads all its file arguments in parallel. For each output line, it writes the corresponding line of each file in turn, separating consecutive entries with a tab. After the entry from the last file, paste writes a newline character and moves on to the next line.

paste exits after all streams return end of file. The number of lines in the output stream will equal the number of lines in the input file with the largest number of lines. Missing values are represented by empty strings.
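The behaviour with inputs of unequal length can be checked with a small sketch. The file names long.txt and short.txt are made up for illustration, and tr is used only to make the tab delimiters visible as '|':

```shell
# Scratch directory so the sketch does not touch existing files.
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/long.txt"   # three lines
printf '1\n'       > "$tmp/short.txt"  # one line

# Parallel merge; tr turns each tab into '|' purely for visibility.
merged=$(paste "$tmp/long.txt" "$tmp/short.txt" | tr '\t' '|')
printf '%s\n' "$merged"
# a|1
# b|
# c|

rm -rf "$tmp"
```

The output has three lines, matching the longer file, and the exhausted file contributes an empty field after the delimiter.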

Though potentially useful, an option to have paste emit an alternate string for a missing field (such as "NA") is not standard.

A run of empty records at the bottom of an output column is therefore ambiguous: the corresponding input file may have contained explicit empty lines there, or it may simply have ended early. The two cases can be told apart only when the input files are known to supply every row explicitly (e.g. in the canonical case where all input files have the same number of lines).
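Since paste itself has no standard option for a placeholder, one possible workaround is to post-process its output. The following is a minimal sketch, assuming tab-delimited output and a POSIX awk; the file names and the "NA" marker are illustrative choices, not part of paste:

```shell
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/long.txt"
printf '1\n'       > "$tmp/short.txt"

# Replace empty tab-separated fields in the paste output with "NA".
# Caveat: this cannot tell a missing line from an explicitly empty one.
filled=$(paste "$tmp/long.txt" "$tmp/short.txt" |
  awk 'BEGIN { FS = OFS = "\t" }
       { for (i = 1; i <= NF; i++) if ($i == "") $i = "NA"; print }' |
  tr '\t' '|')   # tabs shown as '|' for visibility only
printf '%s\n' "$filled"
# a|1
# b|NA
# c|NA

rm -rf "$tmp"
```

Because of the ambiguity described above, the filter also rewrites fields that were genuinely empty in the input, so it is only safe when empty input lines carry no meaning.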

Options

The paste utility accepts the following options:

-d|--delimiters delimiters, which specifies a list of delimiters to be used instead of tabs for separating consecutive values on a single line. Each delimiter is used in turn; when the list has been exhausted, paste begins again at the first delimiter.

-s|--serial, which causes paste to process the files one at a time rather than in parallel: all lines of each file are joined into a single output line, so each file is laid out horizontally rather than vertically.
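Both options can be tried on small generated inputs. This sketch uses the short forms -d and -s; the files f1.txt to f4.txt and their contents are invented for the example:

```shell
tmp=$(mktemp -d)
# Four two-line files; "r1c3" means row 1 of file 3.
for n in 1 2 3 4; do
  printf 'r1c%s\nr2c%s\n' "$n" "$n" > "$tmp/f$n.txt"
done

# Two delimiters but three gaps per line: ',' then ';' then ',' again.
cycled=$(paste -d ',;' "$tmp/f1.txt" "$tmp/f2.txt" "$tmp/f3.txt" "$tmp/f4.txt")
printf '%s\n' "$cycled"
# r1c1,r1c2;r1c3,r1c4
# r2c1,r2c2;r2c3,r2c4

# Serial mode: the lines of each file are joined into one output line.
serial=$(paste -s -d ',' "$tmp/f1.txt" "$tmp/f2.txt")
printf '%s\n' "$serial"
# r1c1,r2c1
# r1c2,r2c2

rm -rf "$tmp"
```

In parallel mode the delimiter list restarts at the beginning of every output line, which is why both rows show the same ',' ';' ',' pattern.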

Examples

For the following examples, assume that names.txt is a plain-text file that contains the following information:

Mark Smith
Bobby Brown
Sue Miller
Jenny Igotit

and that numbers.txt is another plain-text file that contains the following information:

555-1234
555-9876
555-6743
867-5309

The following example shows the invocation of paste with names.txt and numbers.txt as well as the resulting output:

$ paste names.txt numbers.txt
Mark Smith	555-1234
Bobby Brown	555-9876
Sue Miller	555-6743
Jenny Igotit	867-5309

When invoked with the --serial option (-s on BSD or older systems), paste presents the information horizontally, writing each file on a single output line:

$ paste --serial names.txt numbers.txt
Mark Smith	Bobby Brown	Sue Miller	Jenny Igotit
555-1234	555-9876	555-6743	867-5309

Finally, the use of the --delimiters option (-d on BSD or older systems) is illustrated in the following example:

$ paste --delimiters . names.txt numbers.txt
Mark Smith.555-1234
Bobby Brown.555-9876
Sue Miller.555-6743
Jenny Igotit.867-5309

Combining both options, paste can fold consecutive lines of a single file into rows. Here the delimiter list alternates between a tab and a newline, so every two input lines form one output line:

$ paste --serial --delimiters '\t\n' names.txt
Mark Smith	Bobby Brown
Sue Miller	Jenny Igotit
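The same pairing of consecutive lines can also be sketched by reading standard input twice. This assumes a paste that follows POSIX in treating each `-` operand as standard input; tr is again used only to show tabs as '|':

```shell
tmp=$(mktemp -d)
printf 'Mark Smith\nBobby Brown\nSue Miller\nJenny Igotit\n' > "$tmp/names.txt"

# Each '-' takes the next line of standard input, so two of them pair up lines.
pairs=$(paste - - < "$tmp/names.txt" | tr '\t' '|')

# Serial form with a tab/newline delimiter list, for comparison.
serial=$(paste -s -d '\t\n' "$tmp/names.txt" | tr '\t' '|')

printf '%s\n' "$pairs"
# Mark Smith|Bobby Brown
# Sue Miller|Jenny Igotit

rm -rf "$tmp"
```

Adding more `-` operands widens the rows accordingly, whereas the serial form needs a longer delimiter list to achieve the same effect.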

References

  1. "paste(1) - OpenBSD manual pages".
  2. "[TUHS] A portrait of cut(1)".
  3. "Paste(1): Merge lines of files - Linux man page".
  4. "Native Win32 ports of some GNU utilities". unxutils.sourceforge.net.