Original author(s) | Ken Thompson [1] [2] |
---|---|
Developer(s) | AT&T Bell Laboratories |
Initial release | November 1973 [1] |
Written in | C |
Operating system | Unix, Unix-like, Plan 9, Inferno, OS-9, MSX-DOS, IBM i |
Platform | Cross-platform |
Type | Command |
grep is a command-line utility for searching plaintext datasets for lines that match a regular expression. Its name comes from the ed command g/re/p (global regular expression search and print), which has the same effect. [3] [4] grep was originally developed for the Unix operating system, but later became available for all Unix-like systems and some others such as OS-9. [5]
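As a minimal sketch of line-oriented regular expression search, the following commands filter a few lines of input through grep. The sample text, the helper function sample, and the patterns are made up for illustration; only options specified by POSIX (-n and -c) are used.

```shell
# Hypothetical sample input: three lines written to standard output.
sample() {
    printf '%s\n' 'alpha one' 'beta two' 'alpha three'
}

# Print every line that matches the regular expression ^alpha
sample | grep '^alpha'

# -n prefixes each matching line with its line number
sample | grep -n 'two$'

# -c prints only the number of matching lines
sample | grep -c 'alpha'
```

In ed, the command g/^alpha/p applied to a buffer holding the same three lines would print the same lines as the first grep command.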
Before it was named, grep was a private utility written by Ken Thompson to search files for certain patterns. Doug McIlroy, unaware of its existence, asked Thompson to write such a program. Responding that he would think about such a utility overnight, Thompson actually corrected bugs and made improvements for about an hour on his own program called s (short for "search"). The next day he presented the program to McIlroy, who said it was exactly what he wanted. Thompson's account may explain the belief that grep was written overnight. [6]
Thompson wrote the first version in PDP-11 assembly language to help Lee E. McMahon analyze the text of The Federalist Papers to determine authorship of the individual papers. [7] The ed text editor (also authored by Thompson) had regular expression support but could not be used to search through such a large amount of text, because it loaded the entire file into memory to enable random-access editing. Thompson therefore excerpted that regexp code into a standalone tool that would instead process arbitrarily long files sequentially without buffering too much into memory. [1] He chose the name because in ed, the command g/re/p would print all lines featuring a specified pattern match. [8] [9]
grep was first included in Version 4 Unix. Stating that it is "generally cited as the prototypical software tool", McIlroy credited grep with "irrevocably ingraining" Thompson's tools philosophy in Unix. [10]
A variety of grep implementations are available in many operating systems and software development environments. [11] Early variants included egrep and fgrep, introduced in Version 7 Unix. [10] The egrep variant supports an extended regular expression syntax added by Alfred Aho after Ken Thompson's original regular expression implementation. [12] The fgrep variant searches for any of a list of fixed strings using the Aho–Corasick string matching algorithm. [13] Binaries of these variants exist in modern systems, usually linking to grep or calling grep as a shell script with the appropriate flag added, e.g. exec grep -E "$@". Although egrep and fgrep are so commonly deployed on POSIX systems that the POSIX specification mentions their widespread existence, they are not part of POSIX. [14]
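The following sketch shows how the -E and -F options of grep stand in for the older egrep and fgrep commands. The sample lines and the names sample and my_egrep are hypothetical. The wrapper function mirrors the one-line script quoted above, but omits exec because a function runs inside the calling shell.

```shell
# Hypothetical sample input; one line contains a regex metacharacter.
sample() {
    printf '%s\n' 'cat' 'dog' 'a.b' 'axb'
}

# -E: extended regular expressions, so | means alternation (as in egrep)
sample | grep -E 'cat|dog'

# -F: the pattern is a fixed string, so . matches only a literal dot (as in fgrep)
sample | grep -F 'a.b'

# Without -F the dot is a metacharacter that matches any single character
sample | grep 'a.b'

# A wrapper in the style of the egrep script; my_egrep is an illustrative name
my_egrep() {
    grep -E "$@"
}
sample | my_egrep -c 'cat|dog'
```

With this input, the -F search matches only the line a.b, while the same pattern without -F also matches axb.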
Other commands contain the word "grep" to indicate they are search tools, typically ones that rely on regular expression matches. The pgrep utility, for instance, displays the processes whose names match a given regular expression. [15]
In the Perl programming language, grep is the name of the built-in function that finds elements in a list that satisfy a certain property. [16] This higher-order function is typically named filter or where in other languages.
The pcregrep command is an implementation of grep that uses Perl regular expression syntax. [17] Similar functionality can be invoked in the GNU version of grep with the -P flag. [18]
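As a hedged illustration of the -P flag, the sketch below first checks whether the installed grep accepts -P, since the option is a GNU extension and may be missing or compiled out. The helper name has_pcre and the sample lines are assumptions made for this example.

```shell
# Succeeds only if this grep accepts -P (assumption: GNU grep built with PCRE support).
has_pcre() {
    printf 'x\n' | grep -P 'x' >/dev/null 2>&1
}

if has_pcre; then
    # \d and the lookahead (?= USD) are Perl syntax, not POSIX regular expressions
    printf '%s\n' 'price: 42 USD' 'price: 17 EUR' | grep -P '\d+(?= USD)'
else
    echo 'this grep does not support -P' >&2
fi
```

The lookahead requires the digits to be followed by " USD", so only the first sample line is selected when -P is available.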
Ports of grep (within Cygwin and GnuWin32, for example) also run under Microsoft Windows. Some versions of Windows feature the similar qgrep or findstr command. [19]
A grep command is also part of ASCII's MSX-DOS2 Tools for MSX-DOS version 2. [20]
The grep, egrep, and fgrep commands have also been ported to the IBM i operating system. [21]
Adobe InDesign provides GREP functions: since the CS3 version (2007) [22] in the "GREP" tab of the find/change dialog box, [23] and since InDesign CS4 [24] as "GREP styles" in paragraph styles. [25]
agrep (approximate grep) is an open-source approximate string matching program, developed by Udi Manber and Sun Wu between 1988 and 1991, [26] for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows.
agrep matches even when the text only approximately fits the search pattern. [27]
The following invocation finds netmasks in the file myfile, as well as any other word that can be derived from it with no more than two substitutions:
agrep -2 netmasks myfile
This example lists the matches with the closest ones first, that is, those with the fewest substitutions. The flag -B means best:
agrep -B netmasks myfile
In December 2003, the Oxford English Dictionary Online added "grep" as both a noun and a verb. [28]
A common verb usage is the phrase "You can't grep dead trees", meaning that one can more easily search through digital media, using tools such as grep, than through a hard copy (i.e. one made from "dead trees", which in this context is a dysphemism for paper). [29]