Metacharacter

Last updated

A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine.

Contents

In POSIX extended regular expressions, there are 14 metacharacters that must be escaped (preceded by a backslash (\)) in order to drop their special meaning and be treated literally inside an expression: opening and closing square brackets ([ and ]); backslash (\); caret (^); dollar sign ($); period/full stop/dot (.); vertical bar/pipe symbol (|); question mark (?); asterisk (*); plus and minus signs (+ and -); opening and closing curly brackets/braces ({ and }); and opening and closing parentheses (( and )).

For example, to match the arithmetic expression (1+1)*3=6 with a regex, the correct regex is \(1\+1\)\*3=6; otherwise, the parentheses, plus sign, and asterisk will have special meanings.

Other examples

Some other characters may have special meaning in some environments.

Escaping

The term "to escape a metacharacter" means to make the metacharacter ineffective (to strip it of its special meaning), causing it to have its literal meaning. For example, in PCRE, a dot (".") stands for any single character. The regular expression "A.C" will match "ABC", "A3C", or even "A C". However, if the "." is escaped, it will lose its meaning as a metacharacter and will be interpreted literally as ".", causing the regular expression "A\.C" to only match the string "A.C".

The usual way to escape a character in a regex and elsewhere is by prefixing it with a backslash ("\"). Other environments may employ different methods, like MS-DOS/Windows Command Prompt, where a caret ("^") is used instead. [2]

See also

Related Research Articles

<span class="mw-page-title-main">Regular expression</span> Sequence of characters that forms a search pattern

A regular expression is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory.

sed Standard UNIX utility for editing streams of data

sed is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed was based on the scripting features of the interactive editor ed and the earlier qed. It was one of the earliest tools to support regular expressions, and remains in use for text processing, most notably with the substitution command. Popular alternative tools for plaintext string manipulation and "stream editing" include AWK and Perl.

In computing and telecommunication, an escape character is a character that invokes an alternative interpretation on the following characters in a character sequence. An escape character is a particular case of metacharacters. Generally, the judgement of whether something is an escape character or not depends on the context.

In software, a wildcard character is a kind of placeholder represented by a single character, such as an asterisk, which can be interpreted as a number of literal characters or an empty string. It is often used in file searches so the full name need not be typed.

In computer science, an escape sequence is a combination of characters that has a meaning other than the literal characters contained therein; it is marked by one or more preceding characters.

<span class="mw-page-title-main">COMMAND.COM</span> Default command line for MS-DOS and Windows 9x

COMMAND.COM is the default command-line interpreter for MS-DOS, Windows 95, Windows 98 and Windows Me. In the case of DOS, it is the default user interface as well. It has an additional role as the usual first program run after boot, hence being responsible for setting up the system by running the AUTOEXEC.BAT configuration file, and being the ancestor of all processes.

<span class="mw-page-title-main">C shell</span> Unix shell

The C shell is a Unix shell created by Bill Joy while he was a graduate student at University of California, Berkeley in the late 1970s. It has been widely distributed, beginning with the 2BSD release of the Berkeley Software Distribution (BSD) which Joy first distributed in 1978. Other early contributors to the ideas or the code were Michael Ubell, Eric Allman, Mike O'Brien and Jim Kulp.

dir (command) Directory information command on various operating systems

In computing, dir (directory) is a command in various computer operating systems used for computer file and directory listing. It is one of the basic commands to help navigate the file system. The command is usually implemented as an internal command in the command-line interpreter (shell). On some systems, a more graphical representation of the directory structure can be displayed using the tree command.

The backslash\ is a typographical mark used mainly in computing and mathematics. It is the mirror image of the common slash /. It is a relatively recent mark, first documented in the 1930s. It is sometimes called a hack, whack, escape, reverse slash, slosh, downwhack, backslant, backwhack, bash, reverse slant, reverse solidus, and reversed virgule.

A string literal or anonymous string is a literal for a string value in the source code of a computer program. Modern programming languages commonly use a quoted sequence of characters, formally "bracketed delimiters", as in x = "foo", where "foo" is a string literal with value foo. Methods such as escape sequences can be used to avoid the problem of delimiter collision and allow the delimiters to be embedded in a string. There are many alternate notations for specifying string literals especially in complicated cases. The exact notation depends on the programming language in question. Nevertheless, there are general guidelines that most modern programming languages follow.

A path is a string of characters used to uniquely identify a location in a directory structure. It is composed by following the directory tree hierarchy in which components, separated by a delimiting character, represent each directory. The delimiting character is most commonly the slash ("/"), the backslash character ("\"), or colon (":"), though some operating systems may use a different delimiter. Paths are used extensively in computer science to represent the directory/file relationships common in modern operating systems and are essential in the construction of Uniform Resource Locators (URLs). Resources can be represented by either absolute or relative paths.

In computer programming, glob patterns specify sets of filenames with wildcard characters. For example, the Unix Bash shell command mv *.txttextfiles/ moves all files with names ending in .txt from the current directory to the directory textfiles. Here, * is a wildcard standing for "any string of characters except /" and *.txt is a glob pattern. The other common wildcard is the question mark (?), which stands for one character. For example, mv?.txtshorttextfiles/ will move all files named with a single character followed by .txt from the current directory to directory shorttextfiles, while ??.txt would match all files whose name consists of 2 characters followed by .txt.

In computing, echo is a command that outputs the strings that are passed to it as arguments. It is a command available in various operating system shells and typically used in shell scripts and batch files to output status text to the screen or a computer file, or as a source part of a pipeline.

Command-line completion is a common feature of command-line interpreters, in which the program automatically fills in partially typed commands.

Command Prompt, also known as cmd.exe or cmd, is the default command-line interpreter for the OS/2, eComStation, ArcaOS, Microsoft Windows, and ReactOS operating systems. On Windows CE .NET 4.2, Windows CE 5.0 and Windows Embedded CE 6.0 it is referred to as the Command Processor Shell. Its implementations differ between operating systems, but the behavior and basic set of commands are consistent. cmd.exe is the counterpart of COMMAND.COM in DOS and Windows 9x systems, and analogous to the Unix shells used on Unix-like systems. The initial version of cmd.exe for Windows NT was developed by Therese Stowell. Windows CE 2.11 was the first embedded Windows release to support a console and a Windows CE version of cmd.exe. The ReactOS implementation of cmd.exe is derived from FreeCOM, the FreeDOS command line interpreter.

<span class="mw-page-title-main">Comparison of command shells</span>

A command shell is a command-line interface to interact with and manipulate a computer's operating system.

In computer programming, leaning toothpick syndrome (LTS) is the situation in which a quoted expression becomes unreadable because it contains a large number of escape characters, usually backslashes ("\"), to avoid delimiter collision.

A batch file is a script file in DOS, OS/2 and Microsoft Windows. It consists of a series of commands to be executed by the command-line interpreter, stored in a plain text file. A batch file may contain any command the interpreter accepts interactively and use constructs that enable conditional branching and looping within the batch file, such as IF, FOR, and GOTO labels. The term "batch" is from batch processing, meaning "non-interactive execution", though a batch file might not process a batch of multiple data.

find (Windows)

In computing, find is a command in the command-line interpreters (shells) of a number of operating systems. It is used to search for a specific text string in a file or files. The command sends the specified lines to the standard output device.

<span class="mw-page-title-main">Command-line interface</span> Computer interface that uses text

A command-line interface (CLI) is a means of interacting with a device or computer program with successive lines of text. Typically a user or client sends characters to a process or device, and the process or device responds with lines of text as well. Such access was first provided by computer terminals starting in the mid-1960s. This provided an interactive environment not available with punched cards or other input methods.

References

  1. "Character entity references in HTML 4". www.w3.org. W3C. December 24, 1999. Retrieved 2018-11-19.
  2. 1 2 3 "Command shell overview". docs.microsoft.com. Microsoft. September 10, 2009. Retrieved 2018-11-19.
  3. "The Open Group Base Specifications Issue 7: fprintf". pubs.opengroup.org. The Open Group. 2018. Retrieved 2018-11-19.
  4. 1 2 "LIKE (Transact-SQL)". docs.microsoft.com. Microsoft. March 14, 2017. Retrieved 2018-11-19.