Find (Unix)

find
Original author(s) Dick Haight
Developer(s) AT&T Bell Laboratories
Operating system Unix, Unix-like, Plan 9, IBM i
Platform Cross-platform
Type Command

In Unix-like operating systems, find is a command-line utility that locates files based on some user-specified criteria and either prints the pathname of each matched object or, if another action is requested, performs that action on each matched object.

It initiates a search from a desired starting location and then recursively traverses the nodes (directories) of a hierarchical structure (typically a tree). find can traverse and search through different file systems of partitions belonging to one or more storage devices mounted under the starting directory. [1]

The possible search criteria include a pattern to match against the filename or a time range to match against the modification time or access time of the file. By default, find returns a list of all files below the current working directory, although users can limit the search to any desired maximum number of levels under the starting directory.

The related locate programs use a database of indexed files obtained through find (updated at regular intervals, typically by cron job) to provide a faster method of searching the entire file system for files by name.

History

find appeared in Version 5 Unix as part of the Programmer's Workbench project. It was written by Dick Haight alongside cpio; [2] the two were designed to be used together. [3]

The GNU find implementation was originally written by Eric Decker. It was later enhanced by David MacKenzie, Jay Plett, and Tim Wood. [4]

The find command has also been ported to the IBM i operating system. [5]

Find syntax

$ find [-H|-L] path... [operand_expression...]

The two options control how the find command should treat symbolic links. The default behaviour is never to follow symbolic links. The -L flag will cause the find command to follow symbolic links. The -H flag will only follow symbolic links while processing the command line arguments. These flags are specified in the POSIX standard for find. [6] A common extension is the -P flag, for explicitly disabling symlink following. [7] [8]
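The difference between the default behaviour, -L, and -H can be seen in a small experiment. The following is a minimal sketch, not taken from the article: the scratch directory and the names real, start, link, and inside.txt are hypothetical and chosen only for illustration.

```shell
# Hypothetical scratch layout: start/link is a symbolic link to ../real.
tmp=$(mktemp -d)
mkdir "$tmp/real" "$tmp/start"
touch "$tmp/real/inside.txt"
ln -s ../real "$tmp/start/link"
cd "$tmp/start"

# Default: symbolic links are not followed, so nothing is found.
default=$(find . -name 'inside.txt')

# -L: symbolic links met during the traversal are followed.
followed=$(find -L . -name 'inside.txt')

# -H: only symbolic links named on the command line are followed.
via_arg=$(find -H link -name 'inside.txt')
not_arg=$(find -H . -name 'inside.txt')

printf 'default: [%s]\n-L .:    [%s]\n-H link: [%s]\n-H .:    [%s]\n' \
    "$default" "$followed" "$via_arg" "$not_arg"

cd / && rm -rf "$tmp"
```

Only the -L run and the -H run that names the link directly should report inside.txt.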

At least one path must precede the expression. find interprets wildcards itself, so patterns must be quoted carefully to keep the shell from expanding them first.

Expression elements are separated by the command-line argument boundary, usually represented as whitespace in shell syntax. They are evaluated from left to right. They can contain logical elements such as AND (-and or -a) and OR (-or or -o) as well as predicates (filters and actions).

GNU find has a large number of additional features not specified by POSIX.

Predicates

Commonly-used primaries include:

If the expression uses none of -print0, -print, -exec, or -ok, find defaults to performing -print if the conditions test as true.

Operators

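Operators can be used to combine and modify the expressions of the find command. Operators are listed in order of decreasing precedence:

( expr ): forces precedence
! expr: true if expr is false
expr1 expr2 (or expr1 -a expr2): AND; expr2 is not evaluated if expr1 is false
expr1 -o expr2 (or expr1 -or expr2): OR; expr2 is not evaluated if expr1 is true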

$ find . -name 'fileA_*' -o -name 'fileB_*'

This command searches the current working directory tree for files whose names start with fileA_ or fileB_. We quote the fileA_* so that the shell does not expand it.

$ find . -name 'foo.cpp' '!' -path './.svn/*'

This command searches the current working directory tree except the subdirectory tree ".svn" for files whose name is "foo.cpp". We quote the ! so that it's not interpreted by the shell as the history substitution character.
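Because the implicit AND binds more tightly than -o, an action written after an OR chain applies only to the last branch unless the chain is grouped. The sketch below illustrates this under assumed conditions: the scratch directory and the file name other are hypothetical, while fileA_1 and fileB_1 follow the naming used in the example above.

```shell
tmp=$(mktemp -d)
touch "$tmp/fileA_1" "$tmp/fileB_1" "$tmp/other"
cd "$tmp"

# Parsed as: -name 'fileA_*'  -o  ( -name 'fileB_*' -a -print )
# so -print is reached only for fileB_1.
ungrouped=$(find . -name 'fileA_*' -o -name 'fileB_*' -print | sort)

# Escaped parentheses group the OR, so -print applies to both branches.
grouped=$(find . \( -name 'fileA_*' -o -name 'fileB_*' \) -print | sort)

# ! negates the test that follows it.
negated=$(find . -type f ! -name 'file*' | sort)

printf 'ungrouped:\n%s\ngrouped:\n%s\nnegated:\n%s\n' \
    "$ungrouped" "$grouped" "$negated"

cd / && rm -rf "$tmp"
```

When no action is given at all, the implied -print covers the whole expression, which is why the first example in this section needs no parentheses.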

POSIX protection from infinite output

Real-world file systems often contain looped structures created through the use of hard or soft links. The POSIX standard requires that

The find utility shall detect infinite loops; that is, entering a previously visited directory that is an ancestor of the last file encountered. When it detects an infinite loop, find shall write a diagnostic message to standard error and shall either recover its position in the hierarchy or terminate.

Examples

From the current working directory

$ find . -name 'my*'

This searches the current working directory tree for files whose names start with my. The single quotes prevent shell expansion; without them the shell would replace my* with the list of files whose names begin with my in the current working directory. In newer versions of the program, the directory may be omitted, and it will imply the current working directory.

Regular files only

$ find . -name 'my*' -type f

This limits the results of the above search to regular files only, excluding directories, special files, symbolic links, etc. my* is enclosed in single quotes (apostrophes) because otherwise the shell would replace it with the list of files in the current working directory whose names start with my.
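The effect of adding -type f can be checked with a small sketch. The scratch directory and the names mydir, myfile, mylink, and other are hypothetical, picked so that a directory, a regular file, and a symbolic link all match the my* pattern.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/mydir"
touch "$tmp/myfile" "$tmp/other"
ln -s myfile "$tmp/mylink"
cd "$tmp"

# Name test only: the directory, the regular file and the link all match.
all=$(find . -name 'my*' | LC_ALL=C sort)

# Adding -type f keeps only the regular file.
regular=$(find . -name 'my*' -type f | LC_ALL=C sort)

printf 'all:\n%s\nregular:\n%s\n' "$all" "$regular"

cd / && rm -rf "$tmp"
```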

Commands

The previous examples created listings of results because, by default, find executes the -print action. (Note that early versions of the find command had no default action at all; therefore the resulting list of files would be discarded, to the bewilderment of users.)

$ find . -name 'my*' -type f -ls

This prints extended file information.

Search all directories

$ find / -name myfile -type f -print

This searches every directory for a regular file whose name is myfile and prints it to the screen. It is generally not a good idea to look for files this way. This can take a considerable amount of time, so it is best to specify the directory more precisely. Some operating systems may mount dynamic file systems that are not congenial to find. More complex filenames including characters special to the shell may need to be enclosed in single quotes.

Search all but one subdirectory tree

$ find / -path excluded_path -prune -o -type f -name myfile -print

This searches every directory except the subdirectory tree excluded_path (full path including the leading /) that is pruned by the -prune action, for a regular file whose name is myfile.
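The -prune idiom can be tried safely on a scratch tree rather than on /. This is a minimal sketch with hypothetical directory names keep and skip; note that the -path pattern has to match the path as find prints it, which is ./skip here because the search starts at the current directory.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/keep" "$tmp/skip"
touch "$tmp/keep/myfile" "$tmp/skip/myfile"
cd "$tmp"

# Left of -o: when the path is ./skip, -prune stops find from descending.
# Right of -o: everything else is tested and printed as usual.
pruned=$(find . -path ./skip -prune -o -type f -name myfile -print)

# Without pruning, both copies are found.
unpruned=$(find . -type f -name myfile | sort)

printf 'pruned:\n%s\nunpruned:\n%s\n' "$pruned" "$unpruned"

cd / && rm -rf "$tmp"
```

The explicit -print matters: with only the implied -print, the pruned directory itself would also be listed.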

Specify a directory

$ find /home/weedly -name myfile -type f -print

This searches the /home/weedly directory tree for regular files named myfile. You should always specify the directory to the deepest level you can remember.

Search several directories

$ find local /tmp -name mydir -type d -print

This searches the local subdirectory tree of the current working directory and the /tmp directory tree for directories named mydir.

Ignore errors

If you're doing this as a user other than root, you might want to ignore permission denied (and any other) errors. Since errors are printed to stderr, they can be suppressed by redirecting stderr to /dev/null. The following example shows how to do this in the bash shell:

$ find / -name myfile -type f -print 2> /dev/null

If you are a csh or tcsh user, you cannot redirect stderr without redirecting stdout as well. You can use sh to run the find command to get around this:

$ sh -c "find / -name myfile -type f -print 2> /dev/null"

An alternate method when using csh or tcsh is to pipe the output from stdout and stderr into a grep command. This example shows how to suppress lines that contain permission denied errors.

$ find . -name myfile |& grep -v 'Permission denied'

Find any one of differently named files

$ find . \( -name '*jsp' -o -name '*java' \) -type f -ls

The -ls action prints extended information, and the example finds any regular file whose name ends with either 'jsp' or 'java'. The parentheses are required: without them, -type f and -ls would apply only to the second -name test. In many shells the parentheses must be escaped with a backslash (\( and \)) to prevent them from being interpreted as special shell characters. The -ls action is not available on all versions of find.

Execute an action

$ find /var/ftp/mp3 -name '*.mp3' -type f -exec chmod 644 {} \;

This command changes the permissions of all regular files whose names end with .mp3 in the directory tree /var/ftp/mp3. The action is carried out by specifying the statement -exec chmod 644 {} \; in the command. For every regular file whose name ends in .mp3, the command chmod 644 {} is executed replacing {} with the name of the file. The semicolon (backslashed to avoid the shell interpreting it as a command separator) indicates the end of the command. Permission 644, usually shown as rw-r--r--, gives the file owner full permission to read and write the file, while other users have read-only access. In some shells, the {} must be quoted. The trailing ";" is customarily quoted with a leading "\", but could just as effectively be enclosed in single quotes.
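The same -exec chmod pattern can be rehearsed on a throwaway tree before it is run on real data. The sketch below assumes a hypothetical scratch directory containing song.mp3 and notes.txt, and uses the -perm test to confirm which files ended up with mode 644.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/mp3"
touch "$tmp/mp3/song.mp3" "$tmp/mp3/notes.txt"
chmod 600 "$tmp/mp3/song.mp3" "$tmp/mp3/notes.txt"

# chmod runs once per matching file; {} is replaced by that file's path.
find "$tmp/mp3" -name '*.mp3' -type f -exec chmod 644 {} \;

# -perm 644 matches files whose permission bits are exactly rw-r--r--.
changed=$(find "$tmp/mp3" -type f -perm 644)
printf '%s\n' "$changed"

rm -rf "$tmp"
```

Only song.mp3 should be listed; notes.txt keeps mode 600 because it does not match the -name test.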

Note that the command itself should not be quoted; otherwise you get error messages like

find: echo "mv ./3bfn rel071204": No such file or directory

which means that find is trying to run a file called 'echo "mv ./3bfn rel071204"' and failing.

If you will be executing over many results, it is more efficient to use a variant of the exec primary that collects filenames up to ARG_MAX and then executes COMMAND with a list of filenames.

$ find . -exec COMMAND {} +

This will ensure that filenames with whitespaces are passed to the executed COMMAND without being split up by the shell.
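The difference between the two terminators can be made visible by counting the arguments each invocation receives. In this sketch the inline sh -c 'echo "$#"' script is only an illustrative stand-in for COMMAND, and the scratch files (one with a space in its name) are hypothetical.

```shell
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b c" "$tmp/d"
cd "$tmp"

# \; runs the command once per file: three invocations, one argument each.
per_file=$(find . -type f -exec sh -c 'echo "$#"' sh {} \;)

# + batches the names: one invocation receives all three arguments,
# and "b c" arrives as a single argument despite the space.
batched=$(find . -type f -exec sh -c 'echo "$#"' sh {} +)

printf 'per file:\n%s\nbatched:\n%s\n' "$per_file" "$batched"

cd / && rm -rf "$tmp"
```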

Delete files and directories

The -delete action is a GNU extension, and using it turns on -depth. So, if you are testing a find command with -print instead of -delete in order to figure out what will happen before going for it, you need to use -depth -print.
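The reason -depth matters for a dry run is that it changes the order in which entries are visited: children come before their parent directory, which is the order a deletion needs. A minimal sketch with a hypothetical two-level tree:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/top/sub"
cd "$tmp"

# Default (pre-order): a directory is listed before its contents.
pre=$(find top -print)

# -depth (post-order): contents are listed before the directory itself.
post=$(find top -depth -print)

printf 'default:\n%s\n-depth:\n%s\n' "$pre" "$post"

cd / && rm -rf "$tmp"
```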

Delete empty files and print the names (note that -empty is a vendor unique extension from GNU find that may not be available in all find implementations):

$ find . -empty -delete -print

Delete empty regular files:

$ find . -type f -empty -delete

Delete empty directories:

$ find . -type d -empty -delete

Delete empty files named 'bad':

$ find . -name bad -empty -delete

Warning: the -delete action should be used with conditions such as -empty or -name:

$ find . -delete # this deletes all in .
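One cautious workflow is to preview the matches first and only then swap -print for -delete. The sketch below assumes a find that supports the -empty and -delete extensions (GNU find does; other implementations may not) and uses hypothetical file names in a scratch directory.

```shell
tmp=$(mktemp -d)
cd "$tmp"
: > empty.txt                 # zero-length file
printf 'data\n' > full.txt    # non-empty file

# Dry run: same tests, but -depth -print instead of -delete.
would_delete=$(find . -depth -type f -empty -print)

# Real run: only files passing both -type f and -empty are removed.
find . -type f -empty -delete

remaining=$(find . -type f)
printf 'would delete: %s\nremaining:    %s\n' "$would_delete" "$remaining"

cd / && rm -rf "$tmp"
```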

Search for a string

This command will search all files from the /tmp directory tree for a string:

$ find /tmp -type f -exec grep 'search string' /dev/null '{}' \+

The /dev/null argument is used to show the name of the file before the text that is found. Without it, only the text found is printed. (Alternatively, some versions of grep support a -H flag that forces the file name to be printed.) GNU grep can be used on its own to perform this task:

$ grep -r 'search string' /tmp
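The effect of the extra /dev/null operand described above can be reproduced with a single file. This is a small sketch using a hypothetical note.txt in a scratch directory; \; is used so that grep is given exactly one real file per invocation.

```shell
tmp=$(mktemp -d)
printf 'search string here\n' > "$tmp/note.txt"
cd "$tmp"

# One file operand: grep prints only the matching line.
bare=$(find . -type f -exec grep 'search string' '{}' \;)

# Two operands (/dev/null never matches): grep prefixes the file name.
named=$(find . -type f -exec grep 'search string' /dev/null '{}' \;)

printf 'without /dev/null: %s\nwith /dev/null:    %s\n' "$bare" "$named"

cd / && rm -rf "$tmp"
```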

Example of search for "LOG" in jsmith's home directory tree:

$ find ~jsmith -exec grep LOG '{}' /dev/null \; -print
/home/jsmith/scripts/errpt.sh:cp $LOG $FIXEDLOGNAME
/home/jsmith/scripts/errpt.sh:cat $LOG
/home/jsmith/scripts/title:USER=$LOGNAME

Example of search for the string "ERROR" in all XML files in the current working directory tree:

$ find . -name "*.xml" -exec grep "ERROR" /dev/null '{}' \+

The double quotes (" ") surrounding the search string and single quotes (' ') surrounding the braces are optional in this example, but needed to allow spaces and some other special characters in the string. Note with more complex text (notably in most popular shells descended from `sh` and `csh`) single quotes are often the easier choice, since double quotes do not prevent all special interpretation. Quoting filenames which have English contractions demonstrates how this can get rather complicated, since a string with an apostrophe in it is easier to protect with double quotes:

$ find . -name "file-containing-can't" -exec grep "can't" '{}' \; -print

Search for all files owned by a user

$ find . -user <userid>

Search in case insensitive mode

Note that -iname is not in the standard and may not be supported by all implementations.

$ find . -iname 'MyFile*'

If the -iname switch is not supported on your system then workaround techniques may be possible such as:

$ find . -name '[mM][yY][fF][iI][lL][eE]*'
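The bracket-expression workaround relies only on the standard -name test, so it can be checked on any implementation. The following sketch uses hypothetical file names that differ in more than letter case, so that it also behaves sensibly on case-insensitive file systems.

```shell
tmp=$(mktemp -d)
touch "$tmp/MyFile.txt" "$tmp/MYFILE.log" "$tmp/myfile" "$tmp/yourfile"
cd "$tmp"

# Each bracket pair accepts either case of one letter, so the pattern
# matches "myfile" in any mix of upper and lower case.
matches=$(find . -name '[mM][yY][fF][iI][lL][eE]*' | LC_ALL=C sort)
printf '%s\n' "$matches"

cd / && rm -rf "$tmp"
```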

Search files by size

Searching files whose size is between 100 kilobytes and 500 kilobytes:

$ find . -size +100k -a -size -500k

Searching empty files:

$ find . -size 0k

Searching non-empty files:

$ find . ! -size 0k

Search files by name and size

$ find /usr/src ! \( -name '*,v' -o -name '.*,v' \) -print

This command searches the /usr/src directory tree. All files whose names match '*,v' or '.*,v' are excluded.

$ for file in $(find /opt \( -name error_log -o -name 'access_log' -o -name 'ssl_engine_log' -o -name 'rewrite_log' -o -name 'catalina.out' \) -size +300000k -a -size -5000000k); do cat /dev/null > $file; done

The unit suffix should be one of [bckw]: 'b' means 512-byte blocks, 'c' means bytes, 'k' means kilobytes (units of 1024 bytes), and 'w' means 2-byte words. The size does not count indirect blocks, but it does count blocks in sparse files that are not actually allocated.
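The size range test from this section can be exercised on files of known size. This sketch assumes a find that accepts the k suffix (GNU and BSD implementations do; it is not required by POSIX) and creates hypothetical files of 0, 50, 200, and 600 kilobytes with dd.

```shell
tmp=$(mktemp -d)
cd "$tmp"

# Files of known size: 50, 200 and 600 blocks of 1024 bytes, plus an empty one.
dd if=/dev/zero of=small  bs=1024 count=50  2>/dev/null
dd if=/dev/zero of=medium bs=1024 count=200 2>/dev/null
dd if=/dev/zero of=large  bs=1024 count=600 2>/dev/null
: > empty

# More than 100k AND less than 500k: only the 200k file qualifies.
in_range=$(find . -type f -size +100k -a -size -500k)

# Size of exactly zero: only the empty file qualifies.
empties=$(find . -type f -size 0k)

printf 'in range: %s\nempty:    %s\n' "$in_range" "$empties"

cd / && rm -rf "$tmp"
```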

Searching files by time

Date ranges can be used to, for example, list files changed since a backup.

Files modified a relative number of days ago:

Example to find all text files in the document folder modified within the last week (7 days):

$ find ~/Documents/ -iname "*.txt" -mtime -7

Files modified before or after an absolute date and time:

Example to find all text files last edited in February 2017:

$ find ~/Documents/ -iname "*.txt" -newermt 2017-02-01 -not -newermt 2017-03-01

List all text files edited more recently than "document.txt":

$ find ~/Documents/ -iname "*.txt" -newer document.txt
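The relative and reference-file time tests can be tried without waiting for files to age by setting a modification time explicitly with touch -t. The file names old.txt and recent.txt and the chosen timestamp are hypothetical; only the standard -mtime and -newer tests are used.

```shell
tmp=$(mktemp -d)
cd "$tmp"

touch -t 201702150000 old.txt   # modification time set to 15 February 2017
touch recent.txt                # modification time is now

# -mtime -7: modified less than 7 days ago.
last_week=$(find . -name '*.txt' -mtime -7)

# -newer FILE: modified more recently than FILE.
newer=$(find . -name '*.txt' -newer old.txt)

printf 'last week: %s\nnewer:     %s\n' "$last_week" "$newer"

cd / && rm -rf "$tmp"
```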

References

  1. "find(1) – Linux manual page". man7.org. Retrieved 2019-11-19.
  2. McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
  3. "libarchive/libarchive". GitHub. Retrieved 2015-10-04.
  4. Finding Files
  5. "IBM System i Version 7.2 Programming Qshell" (PDF). IBM. Retrieved 2020-09-05.
  6. find: find files – Shell and Utilities Reference, The Single UNIX Specification, Version 4 from The Open Group
  7. find(1) – FreeBSD General Commands Manual
  8. find(1) – Linux User Manual – User Commands
  9. "google / walk: Plan 9 style utilities to replace find(1)". GitHub. Retrieved 30 March 2020.
  10. Peter, David (30 March 2020). "sharkdp/fd: A simple, fast and user-friendly alternative to 'find'". GitHub.