Grep

grep
Original author(s) Ken Thompson [1] [2]
Developer(s) AT&T Bell Laboratories
Initial release November 1973 [1]
Written in C
Operating system Unix, Unix-like, Plan 9, Inferno, OS-9, MSX-DOS, IBM i
Platform Cross-platform
Type Command

grep is a command-line utility for searching plaintext datasets for lines that match a regular expression. Its name comes from the ed command g/re/p (global regular expression search and print), which has the same effect. [3] [4] grep was originally developed for the Unix operating system, but later became available for all Unix-like systems and some others such as OS-9. [5]

History

Before it was named, grep was a private utility written by Ken Thompson to search files for certain patterns. Doug McIlroy, unaware of its existence, asked Thompson to write such a program. Responding that he would think about such a utility overnight, Thompson actually corrected bugs and made improvements for about an hour on his own program called "s" (short for "search"). The next day he presented the program to McIlroy, who said it was exactly what he wanted. Thompson's account may explain the belief that grep was written overnight. [6]

Thompson wrote the first version in PDP-11 assembly language to help Lee E. McMahon analyze the text of The Federalist Papers to determine authorship of the individual papers. [7] The ed text editor (also authored by Thompson) had regular expression support but could not be used to search through such a large amount of text, because it loaded the entire file into memory to enable random-access editing. Thompson therefore excerpted the regexp code into a standalone tool that would instead process arbitrarily long files sequentially without buffering too much into memory. [1] He chose the name because in ed, the command g/re/p would print all lines featuring a specified pattern match. [8] [9] grep was first included in Version 4 Unix. Stating that it is "generally cited as the prototypical software tool", McIlroy credited grep with "irrevocably ingraining" Thompson's tools philosophy in Unix. [10]
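The sequential design can be illustrated with a short Python sketch. This is not Thompson's code (the original was PDP-11 assembly), and the function name grep_stream and the sample text are invented for illustration. It reads its input one line at a time, so memory use depends on the longest line rather than on the size of the file, and like g/re/p in ed it emits every line in which the regular expression matches. Python's re syntax is assumed here and differs in detail from ed's.

```python
import io
import re


def grep_stream(pattern, lines):
    """Yield each line in which the regular expression matches somewhere.

    `lines` can be any iterable of strings, such as an open file, which
    Python reads lazily, one line at a time.
    """
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")


# Hypothetical sample input standing in for a large file on disk.
sample = io.StringIO("alpha\nbeta\ngamma\ndelta\n")
for hit in grep_stream(r"^[bd]", sample):
    print(hit)
```

Because grep_stream is a generator, results become available as soon as a matching line has been read, without waiting for the end of the input.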

Implementations

A variety of grep implementations are available in many operating systems and software development environments. [11] Early variants included egrep and fgrep, introduced in Version 7 Unix. [10] The egrep variant supports an extended regular expression syntax added by Alfred Aho after Ken Thompson's original regular expression implementation. [12] The fgrep variant searches for any of a list of fixed strings using the Aho–Corasick string matching algorithm. [13] Binaries of these variants exist in modern systems, usually linking to grep or calling grep as a shell script with the appropriate flag added, e.g. exec grep -E "$@". Although egrep and fgrep are so commonly deployed on POSIX systems that the POSIX specification mentions their widespread existence, they are not part of POSIX. [14]
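The practical difference between the two variants can be sketched in Python. The function names egrep_like and fgrep_like are hypothetical, Python's re module stands in for the extended regular expression syntax (the two agree on simple cases such as alternation but are not identical), and the fixed-string search uses plain substring tests rather than the Aho–Corasick algorithm.

```python
import re


def egrep_like(pattern, lines):
    """Keep lines matching a regular expression (metacharacters are active)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def fgrep_like(strings, lines):
    """Keep lines containing any of the fixed strings, taken literally."""
    return [line for line in lines if any(s in line for s in strings)]


lines = ["a.b", "axb", "cat", "dog"]
print(egrep_like(r"a.b", lines))          # "." stands for any character
print(fgrep_like(["a.b"], lines))         # "." is an ordinary character
print(egrep_like(r"cat|dog", lines))      # alternation inside one pattern
print(fgrep_like(["cat", "dog"], lines))  # a list of fixed strings
```

The first two calls differ only in how the dot is interpreted; the last two select the same lines by different means.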

Other commands contain the word "grep" to indicate they are search tools, typically ones that rely on regular expression matches. The pgrep utility, for instance, displays the processes whose names match a given regular expression. [15]

In the Perl programming language, grep is a built-in function that finds elements in a list that satisfy a certain property. [16] This higher-order function is typically named filter or where in other languages.
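As a rough illustration of the same higher-order operation in another language, the Python sketch below selects list elements that satisfy a property, once with the built-in filter and once with a list comprehension. The word list and variable names are invented for the example.

```python
words = ["grep", "sed", "awk", "egrep", "fgrep"]

# Built-in filter() with a predicate function.
with_grep = list(filter(lambda w: "grep" in w, words))

# The equivalent list comprehension, often preferred in Python.
three_letters = [w for w in words if len(w) == 3]

print(with_grep)
print(three_letters)
```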

The pcregrep command is an implementation of grep that uses Perl regular expression syntax. [17] Similar functionality can be invoked in the GNU version of grep with the -P flag. [18]

Ports of grep (within Cygwin and GnuWin32, for example) also run under Microsoft Windows. Some versions of Windows feature the similar qgrep or findstr command. [19]

A grep command is also part of ASCII's MSX-DOS2 Tools for MSX-DOS version 2. [20]

The grep, egrep, and fgrep commands have also been ported to the IBM i operating system. [21]

Adobe InDesign has offered GREP functions since the CS3 version (2007): [22] a "GREP" tab in the find/change dialog box, [23] and, introduced with InDesign CS4, [24] "GREP styles" in paragraph styles. [25]

agrep

agrep (approximate grep) is an open-source approximate string matching program, developed by Udi Manber and Sun Wu between 1988 and 1991, [26] for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows.

agrep matches even when the text only approximately fits the search pattern. [27]

The following invocation finds netmasks in the file myfile, as well as any other word that can be derived from it with no more than two substitutions.

agrep -2 netmasks myfile

This example lists the matches with the closest ones, that is, those requiring the fewest substitutions, first. The -B flag means "best":

agrep -B netmasks myfile
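The idea behind both invocations can be sketched in Python. This is not agrep's own algorithm: it is the textbook dynamic-programming approach to approximate substring matching, and it counts insertions and deletions as errors in addition to substitutions. The names min_errors, approx_grep and best_first, and the sample lines, are hypothetical.

```python
def min_errors(pattern, text):
    """Fewest edits needed to turn `pattern` into some substring of `text`."""
    # prev[i]: fewest edits matching pattern[:i] against a substring of the
    # text that ends at the previous position.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # an approximate match may start anywhere in the text
        for i, pc in enumerate(pattern, start=1):
            cost = 0 if pc == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in the text
                           cur[i - 1] + 1))     # pattern character missing
        best = min(best, cur[-1])
        prev = cur
    return best


def approx_grep(pattern, lines, max_errors):
    """Keep lines containing the pattern with at most `max_errors` edits."""
    return [line for line in lines if min_errors(pattern, line) <= max_errors]


def best_first(pattern, lines, max_errors):
    """Like approx_grep, but with the closest matches listed first."""
    scored = [(min_errors(pattern, line), line) for line in lines]
    scored.sort(key=lambda pair: pair[0])  # stable: ties keep input order
    return [line for errors, line in scored if errors <= max_errors]


lines = ["two netmarks found", "set the netmasks here", "unrelated line"]
print(approx_grep("netmasks", lines, 2))
print(best_first("netmasks", lines, 2))
```

The running time grows with the product of the pattern length and the line length, which is acceptable for a demonstration but slower than the algorithms a dedicated tool would use.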

Usage as a verb

In December 2003, the Oxford English Dictionary Online added "grep" as both a noun and a verb. [28]

A common verb usage is the phrase "You can't grep dead trees"—meaning one can more easily search through digital media, using tools such as grep, than one could with a hard copy (i.e. one made from "dead trees", which in this context is a dysphemism for paper). [29]

References

  1. Kernighan, Brian (1984). The Unix Programming Environment. Prentice Hall. p. 102. ISBN 0-13-937681-X.
  2. “grep was a private command of mine for quite a while before i made it public.” -Ken Thompson. Archived 2015-05-26 at the Wayback Machine. By Benjamin Rualthanzauva, published on Feb 5, 2014, Medium.
  3. Hauben et al. 1997, Ch. 9
  4. Raymond, Eric. "grep". Jargon File. Archived from the original on 2006-06-17. Retrieved 2006-06-29.
  5. Paul S. Dayan (1992). The OS-9 Guru - 1 : The Facts. Galactic Industrial Limited. ISBN 0-9519228-0-7.
  6. VCF East 2019 -- Brian Kernighan interviews Ken Thompson (video). YouTube. 6 May 2019. Archived from the original on 2021-12-11. (35 mins)
  7. Computerphile, Where GREP Came From, interview with Brian Kernighan
  8. "ed regexes". perl.plover.com. Archived from the original on 20 October 2017. Retrieved 24 April 2018.
  9. "How Grep Got its Name". robots.thoughtbot.com. Archived from the original on 9 August 2017. Retrieved 24 April 2018.
  10. McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139. Archived (PDF) from the original on 2017-11-11.
  11. Abou-Assaleh, Tony; Wei Ai (March 2004). Survey of Global Regular Expression Print (GREP) Tools (Technical report). Dalhousie University.
  12. Hume, Andrew (1988). "A Tale of Two Greps". Software: Practice and Experience. 18 (11): 1063. doi:10.1002/spe.4380181105. S2CID 6395770.
  13. Meurant, Gerard (12 Sep 1990). Algorithms and Complexity. Elsevier Science. p. 278. ISBN 9780080933917. Archived from the original on 4 March 2016. Retrieved 12 December 2015.
  14. "grep". www.pubs.opengroup.org. The Open Group. Archived from the original on 28 November 2015. Retrieved 12 December 2015.
  15. "pgrep(1)". www.linux.die.net. Archived from the original on 22 December 2015. Retrieved 12 December 2015.
  16. "grep". www.perldoc.perl.org. Archived from the original on 7 December 2015. Retrieved 12 December 2015.
  17. "pcregrep man page". www.pcre.org. University of Cambridge. Archived from the original on 23 December 2015. Retrieved 12 December 2015.
  18. "grep(1)". www.linux.die.net. Archived from the original on 10 December 2015. Retrieved 12 December 2015.
  19. Spalding, George (2000). Windows 2000 administration. Network professional's library. Osborne/McGraw-Hill. p. 634. ISBN 978-0-07-882582-8. Retrieved 2010-12-10. QGREP.EXE[:] A similar tool to grep in UNIX, this tool can be used to search for a text string
  20. "MSX-DOS2 Tools User's Manual by ASCII Corporation". April 1993.
  21. IBM. "IBM System i Version 7.2 Programming Qshell" (PDF). IBM. Retrieved 2020-09-05.
  22. "Review: Adobe InDesign CS3 - CreativePro.com". creativepro.com. 20 April 2007. Archived from the original on 5 January 2018. Retrieved 24 April 2018.
  23. "InDesign Help: find/change". Archived from the original on 2016-08-28. Retrieved 2016-08-12.
  24. "InDesign: GREP Styles (1) Setting text between parentheses in Italic". Archived from the original on 2017-09-24. Retrieved 2018-01-05.
  25. "InDesign Help: GREP styles". Archived from the original on 2016-08-28. Retrieved 2016-08-12.
  26. Wu, Sun; Manber, Udi (20–24 January 1992). Agrep -- a fast approximate pattern-matching tool. 1992 Winter USENIX Conference. San Francisco, California. CiteSeerX 10.1.1.89.5424.
  27. S. Lee Henry (June 1998). "Proper Searching". Sun Expert. pp. 35–26.
  28. "New words list December 2003". Oxford English Dictionary. Retrieved 2021-12-06.
  29. Jargon File, article "Documentation"