Csplit

csplit
Operating system: Unix and Unix-like
Platform: Cross-platform
Type: Command
License: GNU GPL v3

The csplit command in Unix and Unix-like operating systems is a utility that splits a file into two or more smaller files at points determined by context lines.


History

csplit first appeared in PWB UNIX. [2] It has been part of the X/Open Portability Guide since issue 2 of 1987, and was inherited into the first version of POSIX and the Single Unix Specification. [1]

The version of csplit bundled in GNU coreutils was written by Stuart Kemp and David MacKenzie. [3] The command is available as a separate package for Microsoft Windows as part of the UnxUtils collection of native Win32 ports of common GNU Unix-like utilities. [4]

Usage

The command syntax is:

csplit [OPTION]... FILE PATTERN... 

The patterns may be line numbers or regular expressions. The program outputs pieces of the file separated by the patterns into files xx00, xx01, etc., and outputs the size of each piece, in bytes, to standard output.
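As a minimal sketch of the behaviour described above, the following splits a small sample file using one regular expression pattern and one line number. The file name book.txt and its contents are hypothetical, and mktemp -d is assumed to be available for a scratch directory.

```shell
# Work in a scratch directory so the xx* pieces do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: three chapters of two lines each.
printf '%s\n' 'Chapter 1' 'alpha' 'Chapter 2' 'beta' 'Chapter 3' 'gamma' > book.txt

# Split before the line matching "Chapter 2" (regular expression pattern)
# and again before line 5 (line number pattern).
# csplit reports the size of each piece in bytes on standard output.
sizes=$(csplit book.txt '/Chapter 2/' 5)
echo "$sizes"

# The pieces are written to xx00, xx01 and xx02.
ls xx*
```

With this input, each piece holds one chapter heading and the line that follows it.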

The optional parameters modify the behaviour of the program in various ways. For example, the default prefix string (xx) and number of digits (2) in the output filenames can be changed.
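For illustration, the sketch below changes the prefix and digit count with the -f and -n options, and also uses -s to suppress the size report. The sample file and the prefix "part" are hypothetical choices, not anything the command requires.

```shell
# Work in a scratch directory (mktemp -d is assumed to be available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input.
printf '%s\n' 'Chapter 1' 'alpha' 'Chapter 2' 'beta' 'Chapter 3' 'gamma' > book.txt

# -f part : use "part" instead of the default prefix "xx"
# -n 3    : use three digits in the suffix instead of two
# -s      : do not print the size of each piece
csplit -s -f part -n 3 book.txt '/Chapter 2/' '/Chapter 3/'

# The pieces are now named part000, part001 and part002.
ls part*
```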

As with most Unix utilities, a return code of 0 indicates success, while nonzero values indicate failure.
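A script can branch on that return code. The sketch below, which assumes a hypothetical sample file and a pattern that matches no line, captures the status and reports it; the exact nonzero value is not relied upon.

```shell
# Work in a scratch directory (mktemp -d is assumed to be available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input with no line matching the pattern used below.
printf '%s\n' 'Chapter 1' 'alpha' > book.txt

# The pattern cannot be found, so csplit is expected to fail.
# Its error message is discarded to keep the output short.
if csplit -s book.txt '/No such heading/' 2>/dev/null; then
    status=0
else
    status=$?
fi

echo "csplit exited with status $status"
```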

Comparison to split

The split command also splits a file into pieces, but it cuts at fixed intervals rather than at context lines, so all of its pieces are of a fixed size (measured in lines or bytes).
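The difference can be seen by running both commands on the same input. This is a sketch with a hypothetical sample file; the prefixes chunk_ and ctx are arbitrary.

```shell
# Work in a scratch directory (mktemp -d is assumed to be available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: an introduction line, then two chapters.
printf '%s\n' 'Intro' 'Chapter 1' 'alpha' 'beta' 'Chapter 2' 'gamma' > book.txt

# split cuts every two lines regardless of content:
# chunk_aa, chunk_ab and chunk_ac each hold two lines.
split -l 2 book.txt chunk_

# csplit cuts where the context lines are found:
# ctx00 holds the introduction, ctx01 and ctx02 hold one chapter each.
csplit -s -f ctx book.txt '/Chapter 1/' '/Chapter 2/'

wc -l chunk_* ctx*
```

Here split separates the "Chapter 1" heading from its body, while csplit keeps each chapter together.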

References

  1. "csplit". Commands & Utilities Reference, The Single UNIX Specification, Issue 7. The Open Group.
  2. "csplit(1)". FreeBSD General Commands Manual.
  3. "csplit(1)". Linux man page.
  4. "Native Win32 ports of some GNU utilities". unxutils.sourceforge.net.
