Xargs

xargs
Developer(s) Various open-source and commercial developers
Operating system Unix, Unix-like, Plan 9, IBM i
Platform Cross-platform
Type Command

xargs (short for "extended arguments"[1]) is a command on Unix and most Unix-like operating systems used to build and execute command lines from standard input. It converts items read from standard input into arguments to a command.


Some commands such as grep and awk can take input either as command-line arguments or from the standard input. However, others such as cp and echo can only take input as arguments, which is why xargs is necessary.
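
As a minimal illustration of the difference (a.txt and b.txt are placeholder names):

$ printf 'a.txt\nb.txt\n' | echo        # echo ignores its standard input; prints only an empty line
$ printf 'a.txt\nb.txt\n' | xargs echo  # xargs passes the items as arguments; prints: a.txt b.txt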

A port of an older version of GNU xargs is available for Microsoft Windows as part of the UnxUtils collection of native Win32 ports of common GNU Unix-like utilities. [2] A ground-up rewrite named wargs is part of the open-source TextTools [3] project. The xargs command has also been ported to the IBM i operating system. [4]

Examples

One use case of the xargs command is to remove a list of files using the rm command. POSIX systems have an ARG_MAX limit on the maximum total length of the command line, [5] [6] so a command such as rm /path/* or rm $(find /path -type f) may fail with the error message "Argument list too long", meaning that the exec system call's limit on the length of a command line was exceeded. (The latter invocation is also incorrect, as the shell may expand globs in the output of find.)
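
The limit can be inspected on a given system with getconf; the value shown below is only illustrative and varies between systems:

$ getconf ARG_MAX
2097152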

This can be rewritten using the xargs command to break the list of arguments into sublists small enough to be acceptable:

$ find /path -type f -print | xargs rm 

In the above example, the find utility feeds the input of xargs with a long list of file names. xargs then splits this list into sublists and calls rm once for every sublist.
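
Prepending echo to the command is a common way to preview the command lines xargs would generate without actually deleting anything:

$ find /path -type f -print | xargs echo rm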

Some implementations of xargs can also be used to parallelize operations, with the -P maxprocs argument specifying how many parallel processes should be used to execute the commands over the input argument lists. However, the output streams may not be synchronized. This can be overcome by using an --output file argument where possible, and then combining the results after processing. The following example runs up to 24 processes at a time, launching a new one whenever one finishes.

$ find /path -name '*.foo' | xargs -P 24 -I '{}' /cpu/bound/process '{}' -o '{}'.out 
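
Once all processes have finished, the per-file outputs can be combined, for example (a sketch assuming the hypothetical .out files produced above; combined_results is a placeholder name):

$ find /path -name '*.foo.out' -exec cat {} + > combined_results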

xargs often covers the same functionality as the command substitution feature of many shells, denoted by the backquote notation (`...` or $(...)). xargs is also a good companion for commands that output long lists of files, such as find, locate and grep, but only if one uses -0 (or, equivalently, --null), since xargs without -0 deals badly with file names containing ', " and space. GNU Parallel is a similar tool that offers better compatibility with find, locate and grep when file names may contain ', ", and space (newline still requires -0).
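
A sketch of the two approaches side by side (paths are placeholders):

$ rm $(find /path -type f -name '*~')                   # command substitution: may hit ARG_MAX and mishandles spaces
$ find /path -type f -name '*~' -print0 | xargs -0 rm   # xargs: splits long lists and, with -0, handles special characters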

Placement of arguments

-I option: single argument

The xargs command offers options to insert the listed arguments at some position other than the end of the command line. The -I option to xargs takes a string that will be replaced with the supplied input before the command is executed. A common choice is %.

$ mkdir ~/backups
$ find /path -type f -name '*~' -print0 | xargs -0 -I % cp -a % ~/backups 

The replacement string may appear multiple times in the command part. Using -I also causes xargs to use exactly one line of input for each invocation of the command.
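
For example, the replacement string can be used more than once in the same command (file names here are placeholders):

$ printf 'file1\nfile2\n' | xargs -I % echo "% will be backed up as %.bak"
file1 will be backed up as file1.bak
file2 will be backed up as file2.bak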

Shell trick: any number

Another way to achieve a similar effect is to use a shell as the launched command, and deal with the complexity in that shell, for example:

$ mkdir ~/backups
$ find /path -type f -name '*~' -print0 | xargs -0 sh -c 'for filename; do cp -a "$filename" ~/backups; done' sh 

The word sh at the end of the line is consumed by the POSIX shell invocation sh -c as $0, the "executable name" part of the positional parameters (argv). If it were not present, the name of the first matched file would instead be assigned to $0, and that file would not be copied to ~/backups. Any other word can be used to fill in that blank, my-xargs-script for example.

Since cp accepts multiple files at once, one can also simply do the following:

$ find /path -type f -name '*~' -print0 | xargs -0 sh -c 'if [ $# -gt 0 ]; then cp -a "$@" ~/backups; fi' sh 

This script runs cp with all the files given to it when there are any arguments passed. Doing so is more efficient since only one invocation of cp is done for each invocation of sh.
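
With GNU cp, the -t (target directory) option lets the destination be given before the files, so the shell wrapper can be avoided entirely (GNU-specific, not POSIX):

$ find /path -type f -name '*~' -print0 | xargs -0 cp -a -t ~/backups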

Separator problem

Many Unix utilities are line-oriented. These may work with xargs as long as the lines do not contain ', ", or a space. Some Unix utilities can use NUL as the record separator (e.g. Perl (requires -0 and \0 instead of \n), locate (requires -0), find (requires -print0), grep (requires -z or -Z), sort (requires -z)). Using -0 for xargs deals with the problem, but many Unix utilities cannot use NUL as the separator (e.g. head, tail, ls, echo, sed, tar -v, wc, which).

However, users often forget this and assume that xargs is also line-oriented, which is not the case: by default, xargs separates its input on newlines and on blanks within lines, and substrings containing blanks must be single- or double-quoted.
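
A small demonstration of the default splitting and quoting rules:

$ printf 'one two\n"three four"\n' | xargs -n 1 echo
one
two
three four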

The separator problem is illustrated here:

# Make some targets to practice on
touch important_file
touch 'not important_file'
mkdir -p '12" records'

find . -name not\* | tail -1 | xargs rm
find \! -name . -type d | tail -1 | xargs rmdir

Running the above will cause important_file to be removed but will remove neither the directory called 12" records, nor the file called not important_file.

The proper fix is to use the GNU-specific -print0 option, but tail (and other tools) do not support NUL-terminated strings, so it is dropped from the pipeline:

# use the same preparation commands as above
find . -name not\* -print0 | xargs -0 rm
find \! -name . -type d -print0 | xargs -0 rmdir

When using the -print0 option, entries are separated by a null character instead of an end-of-line. This is equivalent to the more verbose command find . -name not\* | tr \\n \\0 | xargs -0 rm, or, more briefly, to switching xargs into (non-POSIX) line-oriented mode with the -d (delimiter) option: find . -name not\* | xargs -d '\n' rm

but in general using -0 with -print0 should be preferred, since newlines in filenames are still a problem.

GNU parallel is an alternative to xargs that is designed to have the same options, but is line-oriented. Thus, using GNU Parallel instead, the above would work as expected. [7]

For Unix environments where xargs supports neither the -0 nor the -d option (e.g. Solaris, AIX), the POSIX standard states that one can simply backslash-escape every character: find . -name not\* | sed 's/\(.\)/\\\1/g' | xargs rm. [8] Alternatively, one can avoid using xargs at all, either by using GNU parallel or by using the -exec ... + functionality of find.
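
For example, the deletion from the earlier examples can be performed entirely within find, with no xargs involved:

$ find . -name not\* -exec rm {} +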

Operating on a subset of arguments at a time

One might be dealing with commands that can only accept one or maybe two arguments at a time. For example, the diff command operates on two files at a time. The -n option to xargs specifies how many arguments at a time to supply to the given command. The command will be invoked repeatedly until all input is exhausted. Note that on the last invocation one might get fewer than the desired number of arguments if there is insufficient input. Use xargs to break up the input into two arguments per line:

$ echo {0..9} | xargs -n 2
0 1
2 3
4 5
6 7
8 9

In addition to running based on a specified number of arguments at a time, one can also invoke a command for each line of input with the -L 1 option. One can use an arbitrary number of lines at a time, but one is most common. Here is how one might diff every git commit against its parent. [9]

$ git log --format="%H %P" | xargs -L 1 git diff 

Encoding problem

The argument separator processing of xargs is not the only problem with using the xargs program in its default mode. Most Unix tools which are often used to manipulate filenames (for example sed, basename, sort, etc.) are text processing tools. However, Unix path names are not really text. Consider a path name /aaa/bbb/ccc. The /aaa directory and its bbb subdirectory can in general be created by different users with different environments. That means these users could have a different locale setup, and that means that aaa and bbb do not even necessarily have to have the same character encoding. For example, aaa could be in UTF-8 and bbb in Shift JIS. As a result, an absolute path name in a Unix system may not be correctly processable as text under a single character encoding. Tools which rely on their input being text may fail on such strings.

One workaround for this problem is to run such tools in the C locale, which essentially processes the bytes of the input as-is. However, this will change the behavior of the tools in ways the user may not expect (for example, some of the user's expectations about case-folding behavior may not be met).
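
A sketch of that workaround applied to a single pipeline, assuming GNU sort (its -z option for NUL-separated records is not required by POSIX):

$ find /path -type f -print0 | LC_ALL=C sort -z | xargs -0 ls -ld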

References

  1. "The Unix Acronym List: The Complete List". www.roesler-ac.de. Retrieved 2020-04-12.
  2. "Native Win32 ports of some GNU utilities". unxutils.sourceforge.net.
  3. "Text processing tools for Windows".
  4. IBM. "IBM System i Version 7.2 Programming Qshell" (PDF). Retrieved 2020-09-05.
  5. "GNU Core Utilities Frequently Asked Questions" . Retrieved December 7, 2015.
  6. "The maximum length of arguments for a new process". www.in-ulm.de.
  7. Differences Between xargs and GNU Parallel. GNU.org. Accessed February 2012.
  8. xargs   Shell and Utilities Reference, The Single UNIX Specification , Version 4 from The Open Group
  9. Cosmin Stejerean. "Things you (probably) didn't know about xargs" . Retrieved December 7, 2015.
