Original author(s) | Ken Thompson [1] [2] |
---|---|
Developer(s) | AT&T Bell Laboratories |
Initial release | November 1973 [1] |
Written in | C |
Operating system | Unix, Unix-like, Plan 9, Inferno, OS-9, MSX-DOS, IBM i |
Platform | Cross-platform |
Type | Command |
grep is a command-line utility for searching plaintext datasets for lines that match a regular expression. Its name comes from the ed command g/re/p (global regular expression search and print), which has the same effect. [3] [4] grep was originally developed for the Unix operating system, but later became available for all Unix-like systems and some others such as OS-9. [5]
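The core behaviour described above, printing every line that matches a regular expression, can be sketched in a few lines of Python. This is only an illustration: the function name `grep_lines` and the sample data are hypothetical, and Python's `re` syntax differs in details from the regular expressions that grep itself accepts.

```python
import re
from typing import Iterable, Iterator


def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line that contains a match for the regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        # As with g/re/p, a line is selected if the pattern matches anywhere in it.
        if regex.search(line):
            yield line


sample = ["alpha", "beta", "gamma", "delta"]
# Select lines whose second character is "e".
print(list(grep_lines(r"^.e", sample)))  # ['beta', 'delta']
```

Because the function consumes its input one line at a time, it can be handed an open file object and does not need to hold the whole file in memory.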
Before it was named, grep was a private utility written by Ken Thompson to search files for certain patterns. Doug McIlroy, unaware of its existence, asked Thompson to write such a program. Responding that he would think about such a utility overnight, Thompson actually corrected bugs and made improvements for about an hour on his own program called "s" (short for "search"). The next day he presented the program to McIlroy, who said it was exactly what he wanted. Thompson's account may explain the belief that grep was written overnight. [6]
Thompson wrote the first version in PDP-11 assembly language to help Lee E. McMahon analyze the text of The Federalist Papers to determine authorship of the individual papers. [7] The ed text editor (also authored by Thompson) had regular expression support but could not be used to search through such a large amount of text, as it loaded the entire file into memory to enable random access editing. Thompson therefore excerpted that regexp code into a standalone tool which would instead process arbitrarily long files sequentially without buffering too much into memory. [1] He chose the name because in ed, the command g/re/p would print all lines featuring a specified pattern match. [8] [9] grep was first included in Version 4 Unix. Stating that it is "generally cited as the prototypical software tool", McIlroy credited grep with "irrevocably ingraining" Thompson's tools philosophy in Unix. [10]
A variety of grep implementations are available in many operating systems and software development environments. [11] Early variants included egrep and fgrep, introduced in Version 7 Unix. [10] The egrep variant supports an extended regular expression syntax added by Alfred Aho after Ken Thompson's original regular expression implementation. [12] The fgrep variant searches for any of a list of fixed strings using the Aho–Corasick string matching algorithm. [13] Binaries of these variants exist in modern systems, usually linking to grep or calling grep as a shell script with the appropriate flag added, e.g. exec grep -E "$@". Although egrep and fgrep are so commonly deployed on POSIX systems that the POSIX specification mentions their widespread existence, they are not part of POSIX itself. [14]
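The difference between fixed-string and regular-expression matching can be illustrated with a small Python sketch. The function names are hypothetical, and the fixed-string version uses a plain substring test rather than the Aho–Corasick algorithm mentioned above, so it shows only the matching semantics, not the performance characteristics.

```python
import re
from typing import Iterable, List


def fixed_string_grep(needles: Iterable[str], lines: Iterable[str]) -> List[str]:
    """Return lines containing any of the literal strings (fgrep-like semantics)."""
    needle_list = list(needles)
    return [line for line in lines if any(n in line for n in needle_list)]


def regex_grep(pattern: str, lines: Iterable[str]) -> List[str]:
    """Return lines matching the pattern interpreted as a regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


lines = ["a.c", "abc", "a+b"]
# As a fixed string, "a.c" matches only the literal text.
print(fixed_string_grep(["a.c"], lines))  # ['a.c']
# As a regular expression, "." matches any character.
print(regex_grep("a.c", lines))  # ['a.c', 'abc']
```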
Other commands contain the word "grep" to indicate they are search tools, typically ones that rely on regular expression matches. The pgrep utility, for instance, displays the processes whose names match a given regular expression. [15]
In the Perl programming language, grep is a built-in function that finds elements in a list that satisfy a certain property. [16] This higher-order function is typically named filter or where in other languages.
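As a rough illustration of this higher-order function, Python offers both the built-in `filter` and list comprehensions. The sample list and variable names are hypothetical; the Perl line in the comment is shown only for comparison.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Roughly what Perl would write as: grep { $_ % 2 == 0 } @numbers
evens = list(filter(lambda n: n % 2 == 0, numbers))

# The same selection written as a list comprehension.
evens_comprehension = [n for n in numbers if n % 2 == 0]

print(evens)  # [2, 4, 6]
```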
The pcregrep command is an implementation of grep that uses Perl regular expression syntax. [17] Similar functionality can be invoked in the GNU version of grep with the -P flag. [18]
Ports of grep (within Cygwin and GnuWin32, for example) also run under Microsoft Windows. Some versions of Windows feature the similar qgrep or findstr command. [19]
A grep command is also part of ASCII's MSX-DOS2 Tools for MSX-DOS version 2. [20]
The grep, egrep, and fgrep commands have also been ported to the IBM i operating system. [21]
Adobe InDesign has offered GREP functions since the CS3 version (2007), [22] in the "GREP" tab of the find/change dialog box. [23] InDesign CS4 [24] introduced "GREP styles" in paragraph styles. [25]
agrep (approximate grep) is an open-source approximate string matching program, developed by Udi Manber and Sun Wu between 1988 and 1991, [26] for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows.
agrep matches even when the text only approximately fits the search pattern. [27]
The following invocation finds netmasks in file myfile, but also any other word that can be derived from it, given no more than two substitutions.
agrep -2 netmasks myfile
This example generates a list of matches with the closest matches, that is, those with the fewest substitutions, listed first. The command flag -B means "best":
agrep -B netmasks myfile
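The idea of matching within a limited number of substitutions can be sketched in Python. This is not agrep's actual algorithm: it follows the description above by counting only character substitutions between words of equal length, it works on a list of words rather than on lines of a file, and the function names and word list are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def substitution_count(a: str, b: str) -> Optional[int]:
    """Number of positions where a and b differ, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def approx_words(pattern: str, words: Iterable[str], max_subs: int) -> List[Tuple[int, str]]:
    """Return (substitutions, word) pairs within max_subs, closest matches first."""
    hits = []
    for word in words:
        count = substitution_count(pattern, word)
        if count is not None and count <= max_subs:
            hits.append((count, word))
    # Sort by substitution count only, so ties keep their input order.
    return sorted(hits, key=lambda hit: hit[0])


words = ["nutmacks", "netmasks", "hatmaskz", "netmarks", "network"]
print(approx_words("netmasks", words, 2))
# [(0, 'netmasks'), (1, 'netmarks'), (2, 'nutmacks')]
```

Sorting by substitution count mirrors the "best first" ordering described for the -B flag, although the real tool decides what counts as best on its own terms.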
In December 2003, the Oxford English Dictionary Online added "grep" as both a noun and a verb. [28]
A common verb usage is the phrase "You can't grep dead trees"—meaning one can more easily search through digital media, using tools such as grep, than one could with a hard copy (i.e. one made from "dead trees", which in this context is a dysphemism for paper). [29]
AWK is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and it is a standard feature of most Unix-like operating systems.
ed is a line editor for Unix and Unix-like operating systems. It was one of the first parts of the Unix operating system that was developed, in August 1969. It remains part of the POSIX and Open Group standards for Unix-based operating systems, alongside the more sophisticated full-screen editor vi.
A regular expression, sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory.
sed is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed was based on the scripting features of the interactive editor ed and the earlier qed. It was one of the earliest tools to support regular expressions, and remains in use for text processing, most notably with the substitution command. Popular alternative tools for plaintext string manipulation and "stream editing" include AWK and Perl.
uniq is a utility command on Unix, Plan 9, Inferno, and Unix-like operating systems which, when fed a text file or standard input, outputs the text with adjacent identical lines collapsed to one, unique line of text.
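The collapsing of adjacent identical lines can be sketched with `itertools.groupby` from the Python standard library. The function name `uniq` here is only a stand-in for illustration and covers none of the real command's options.

```python
from itertools import groupby
from typing import Iterable, List


def uniq(lines: Iterable[str]) -> List[str]:
    """Collapse each run of adjacent identical lines into a single line."""
    # groupby starts a new group whenever the value changes, so only
    # adjacent duplicates are merged; non-adjacent repeats are kept.
    return [key for key, _group in groupby(lines)]


print(uniq(["a", "a", "b", "a", "a", "c"]))  # ['a', 'b', 'a', 'c']
```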
A newline is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a sequence of characters, is used to signify the end of a line of text and the start of a new one.
Sam is a multi-file text editor based on structural regular expressions. It was originally designed in the early 1980s at Bell Labs by Rob Pike with the help of Ken Thompson and other Unix developers for the Blit windowing terminal running on v9 Unix; it was later ported to other systems. Sam follows a classical modular Unix aesthetic. It is internally simple, its power leveraged by the composability of a small command language and extensibility through shell integration.
The computer tool patch is a Unix program that updates text files according to instructions contained in a separate file, called a patch file. The patch file is a text file that consists of a list of differences and is produced by running the related diff program with the original and updated file as arguments. Updating files with patch is often referred to as applying the patch or simply patching the files.
glob is a libc function for globbing, which is the archetypal use of pattern matching against the names in a filesystem directory such that a name pattern is expanded into a list of names matching that pattern. Although globbing may now refer to glob-style pattern matching of any string, not just expansion into a list of filesystem names, the original meaning of the term is still widespread.
xargs is a command on Unix and most Unix-like operating systems used to build and execute commands from standard input. It converts input from standard input into arguments to a command.
tr is a command in Unix, Plan 9, Inferno, and Unix-like operating systems. It is an abbreviation of translate or transliterate, indicating its operation of replacing or removing specific characters in its input data set.
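Replacing or removing specific characters can be illustrated with Python's `str.translate`. The helper names `tr` and `tr_delete` are hypothetical, and the sketch handles only plain character lists, not the ranges or character classes that the real command accepts.

```python
def tr(text: str, set1: str, set2: str) -> str:
    """Replace each character of set1 with the character at the same position in set2."""
    # str.maketrans requires both strings to have the same length.
    return text.translate(str.maketrans(set1, set2))


def tr_delete(text: str, chars: str) -> str:
    """Remove every occurrence of the given characters."""
    return text.translate(str.maketrans("", "", chars))


print(tr("hello", "el", "ip"))  # hippo
print(tr_delete("hello", "l"))  # heo
```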
wc is a command in Unix, Plan 9, Inferno, and Unix-like operating systems. The program reads either standard input or a list of computer files and generates one or more of the following statistics: newline count, word count, and byte count. If a list of files is provided, both individual file and total statistics follow.
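The three statistics can be computed for a byte string with a few lines of Python. This is a simplified sketch: the function name `wc` is a stand-in, and words are taken to be runs of bytes separated by ASCII whitespace, which may not agree with a given implementation in every locale.

```python
from typing import Tuple


def wc(data: bytes) -> Tuple[int, int, int]:
    """Return (newline count, word count, byte count) for the given data."""
    newline_count = data.count(b"\n")
    # bytes.split() with no argument splits on runs of ASCII whitespace.
    word_count = len(data.split())
    byte_count = len(data)
    return newline_count, word_count, byte_count


print(wc(b"one two\nthree\n"))  # (2, 3, 14)
```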
nl is a Unix utility for numbering lines, either from a file or from standard input, reproducing output on standard output.
In Unix-like operating systems, find is a command-line utility that locates files based on some user-specified criteria and either prints the pathname of each matched object or, if another action is requested, performs that action on each matched object.
tail is a program available on Unix, Unix-like systems, FreeDOS and MSX-DOS used to display the tail end of a text file or piped data.
A filter is a computer program or subroutine to process a stream, producing another stream. While a single filter can be used individually, they are frequently strung together to form a pipeline.
In computing, sort is a standard command-line program of Unix and Unix-like operating systems that prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire input is taken as the sort key. Blank space is the default field separator. The command supports a number of command-line options that can vary by implementation. For instance, the "-r" flag reverses the sort order.
In computing, find is a command in the command-line interpreters (shells) of a number of operating systems. It is used to search for a specific text string in a file or files. The command sends the specified lines to the standard output device.
Unix is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others. Initially intended for use inside the Bell System, AT&T licensed Unix to outside parties in the late 1970s, leading to a variety of both academic and commercial Unix variants from vendors including University of California, Berkeley (BSD), Microsoft (Xenix), Sun Microsystems (SunOS/Solaris), HP/HPE (HP-UX), and IBM (AIX).
cat is a standard Unix utility that reads files sequentially, writing them to standard output. The name is derived from its function to (con)catenate files. It has been ported to a number of operating systems.