Escape character

Last updated

In computing and telecommunications, an escape character is a character that invokes an alternative interpretation on the following characters in a character sequence. An escape character is a particular case of metacharacters. Generally, the judgement of whether something is an escape character or not depends on the context.

Contents

In the telecommunications field, escape characters are used to indicate that the following characters are encoded differently. This is used to alter control characters that would otherwise be noticed and acted on by the underlying telecommunications hardware, such as illegal characters. In this context, the use of escape characters is often referred to as quoting.

Definition

An escape character may not have its own meaning, so all escape sequences are of two or more characters.

Escape characters are part of the syntax for many programming languages, data formats, and communication protocols. For a given alphabet an escape character's purpose is to start character sequences (so named escape sequences), which have to be interpreted differently from the same characters occurring without the prefixed escape character.

The functions of escape sequences include:

Control character

Generally, an escape character is not a particular case of (device) control characters, nor vice versa. If we define control characters as non-graphic, or as having a special meaning for an output device (e.g. printer or text terminal) then any escape character for this device is a control one. But escape characters used in programming (such as the backslash, "\") are graphic, hence are not control characters. Conversely most (but not all) of the ASCII "control characters" have some control function in isolation, therefore they are not escape characters.

In many programming languages, an escape character also forms some escape sequences which are referred to as control characters. For example, line break has an escape sequence of \n.

Examples

JavaScript

JavaScript uses the \ (backslash) as an escape character for: [1] [2]

The \v and \0 escapes are not allowed in JSON strings.

Example code:

console.log("Using \\n \nWill shift the characters after \\n one row down")console.log("Using \\t \twill shift the characters after \\t one tab length to the right")console.log("Using \\r \rWill imitate a carriage return, which means shifting to the start of the row")// can be used to clear the screen on some terminals. Windows uses \r\n instead of \n alone

ASCII escape character

The ASCII "escape" character (octal: \033, hexadecimal: \x1B, or, in decimal, 27, also represented by the sequences ^[ or \e) is used in many output devices to start a series of characters called a control sequence or escape sequence. Typically, the escape character was sent first in such a sequence to alert the device that the following characters were to be interpreted as a control sequence rather than as plain characters, then one or more characters would follow to specify some detailed action, after which the device would go back to interpreting characters normally. For example, the sequence of ^[, followed by the printable characters [2;10H, would cause a Digital Equipment Corporation (DEC) VT102 terminal to move its cursor to the 10th cell of the 2nd line of the screen. This was later developed into ANSI escape codes covered by the ANSI X3.64 standard. The escape character also starts each command sequence in the Hewlett-Packard Printer Command Language.

An early reference to the term "escape character" is found in Bob Bemer's IBM technical publications, who is credited with inventing this mechanism during his work on the ASCII character set. [3]

The Escape key is usually found on standard PC keyboards. However, it is commonly absent from keyboards for PDAs and other devices not designed primarily for ASCII communications. The DEC VT220 series was one of the few popular keyboards that did not have a dedicated Esc key, instead of using one of the keys above the main keypad. In user interfaces of the 1970s–1980s it was not uncommon to use this key as an escape character, but in modern desktop computers, such use is dropped. Sometimes the key was identified with AltMode (for alternative mode). Even with no dedicated key, the escape character code could be generated by typing [ while simultaneously holding down Ctrl .

Programming and data formats

Many modern programming languages specify the double-quote character (") as a delimiter for a string literal. The backslash (\) escape character typically provides two ways to include double-quotes inside a string literal, either by modifying the meaning of the double-quote character embedded in the string (\" becomes "), or by modifying the meaning of a sequence of characters including the hexadecimal value of a double-quote character (\x22 becomes ").

C, C++, Java, and Ruby all allow exactly the same two backslash escape styles. The PostScript language and Microsoft Rich Text Format also use backslash escapes. The quoted-printable encoding uses the equals sign as an escape character.

URL and URI use %-escapes to quote characters with a special meaning, as for non-ASCII characters. The ampersand (&) character may be considered as an escape character in SGML and derived formats such as HTML and XML.

Some programming languages also provide other ways to represent special characters in literals, without requiring an escape character (see e.g. delimiter collision).

Communication protocols

The Point-to-Point Protocol (PPP) uses the 0x7D octet (\175, or ASCII: }) as an escape character. The octet immediately following should be XORed by 0x20 before being passed to a higher level protocol. This is applied to both 0x7D itself and the control character 0x7E (which is used in PPP to mark the beginning and end of a frame) when those octets need to be transmitted by a higher level protocol encapsulated by PPP, as well as other octets negotiated when the link is established. That is, when a higher level protocol wishes to transmit 0x7D, it is transmitted as the sequence 0x7D 0x5D, and 0x7E is transmitted as 0x7D 0x5E.

Bourne shell

In Bourne shell (sh), the asterisk (*) and question mark (?) characters are wildcard characters expanded via globbing. Without a preceding escape character, an * will expand to the names of all files in the working directory that do not start with a period if and only if there are such files, otherwise * remains unexpanded. So to refer to a file literally called "*", the shell must be told not to interpret it in this way, by preceding it with a backslash (\). This modifies the interpretation of the asterisk (*). Compare:

 
rm*# delete all files in the current directory
rm\*# delete the file named *

Windows Command Prompt

The Windows command-line interpreter uses a caret character (^) to escape reserved characters that have special meanings (in particular: &, |, (, ), <, >, ^). [4] The DOS command-line interpreter, though it has similar syntax, does not support this.

For example, on the Windows Command Prompt, this will result in a syntax error.

C:\>echo<hello world> The syntax of the command is incorrect.

whereas this will output the string: <hello world>

C:\>echo^<hello world^><hello world>

Windows PowerShell

In Windows, the backslash is used as a path separator; therefore, it generally cannot be used as an escape character. PowerShell uses backtick [5] ( ` ) instead.

For example, the following command:

PS C:\> echo "`tFirst line`nNew line"        First lineNew line

Others

See also

Related Research Articles

<span class="mw-page-title-main">ASCII</span> American character encoding standard

ASCII, an acronym for American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. ASCII has just 128 code points, of which only 95 are printable characters, which severely limit its scope. The set of available punctuation had significant impact on the syntax of computer languages and text markup. ASCII hugely influenced the design of character sets used by modern computers, including Unicode which has over a million code points, but the first 128 of these are the same as ASCII.

In computing and telecommunications, a control character or non-printing character (NPC) is a code point in a character set that does not represent a written character or symbol. They are used as in-band signaling to cause effects other than the addition of a symbol to the text. All other characters are mainly graphic characters, also known as printing characters, except perhaps for "space" characters. In the ASCII standard there are 33 control characters, such as code 7, BEL, which rings a terminal bell.

Multipurpose Internet Mail Extensions (MIME) is a standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

In computer science, an escape sequence is a combination of characters that has a meaning other than the literal characters contained therein; it is marked by one or more preceding characters.

<span class="mw-page-title-main">ANSI escape code</span> Method used for display options on video text terminals

ANSI escape sequences are a standard for in-band signaling to control cursor location, color, font styling, and other options on video text terminals and terminal emulators. Certain sequences of bytes, most starting with an ASCII escape character and a bracket character, are embedded into text. The terminal interprets these sequences as commands, rather than text to display verbatim.

The backslash\ is a mark used mainly in computing and mathematics. It is the mirror image of the common slash /. It is a relatively recent mark, first documented in the 1930s. It is sometimes called a hack, whack, escape, reverse slash, slosh, downwhack, backslant, backwhack, bash, reverse slant, reverse solidus, and reversed virgule.

A string literal or anonymous string is a literal for a string value in the source code of a computer program. Modern programming languages commonly use a quoted sequence of characters, formally "bracketed delimiters", as in x = "foo", where, "foo" is a string literal with value foo. Methods such as escape sequences can be used to avoid the problem of delimiter collision and allow the delimiters to be embedded in a string. There are many alternate notations for specifying string literals especially in complicated cases. The exact notation depends on the programming language in question. Nevertheless, there are general guidelines that most modern programming languages follow.

A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system.

In computer programming, Base64 is a group of binary-to-text encoding schemes that transforms binary data into a sequence of printable characters, limited to a set of 64 unique characters. More specifically, the source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters.

<span class="mw-page-title-main">Newline</span> Special characters in computing signifying the end of a line of text

A newline is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a sequence of characters, is used to signify the end of a line of text and the start of a new one.

An email address identifies an email box to which messages are delivered. While early messaging systems used a variety of formats for addressing, today, email addresses follow a set of specific rules originally standardized by the Internet Engineering Task Force (IETF) in the 1980s, and updated by RFC 5322 and 6854. The term email address in this article refers to just the addr-spec in Section 3.4 of RFC 5322. The RFC defines address more broadly as either a mailbox or group. A mailbox value can be either a name-addr, which contains a display-name and addr-spec, or the more common addr-spec alone.

UTF-7 is an obsolete variable-length character encoding for representing Unicode text using a stream of ASCII characters. It was originally intended to provide a means of encoding Unicode text for use in Internet E-mail messages that was more efficient than the combination of UTF-8 with quoted-printable.

Quoted-Printable, or QP encoding, is a binary-to-text encoding system using printable ASCII characters to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean. Historically, because of the wide range of systems and protocols that could be used to transfer messages, e-mail was often assumed to be non-8-bit-clean – however, modern SMTP servers are in most cases 8-bit clean and support 8BITMIME extension. It can also be used with data that contains non-permitted octets or line lengths exceeding SMTP limits. It is defined as a MIME content transfer encoding for use in e-mail.

The backtick` is a typographical mark used mainly in computing. It is also known as backquote, grave, or grave accent.

<span class="mw-page-title-main">Delimiter</span> Characters that specify the boundary between regions in a data stream

A delimiter is a sequence of one or more characters for specifying the boundary between separate, independent regions in plain text, mathematical expressions or other data streams. An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values. Another example of a delimiter is the time gap used to separate letters and words in the transmission of Morse code.

Ascii85, also called Base85, is a form of binary-to-text encoding developed by Paul E. Rutter for the btoa utility. By using five ASCII characters to represent four bytes of binary data, it is more efficient than uuencode or Base64, which use four characters to represent three bytes of data.

A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the communication channel does not allow binary data or is not 8-bit clean. PGP documentation uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.

The octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, as the byte has historically been used for storage units of a variety of sizes.

.properties is a file extension for files mainly used in Java-related technologies to store the configurable parameters of an application. They can also be used for storing strings for Internationalization and localization; these are known as Property Resource Bundles.

In the C programming language, an escape sequence is specially delimited text in a character or string literal that represents one or more other characters to the compiler. It allows a programmer to specify characters that are otherwise difficult or impossible to specify in a literal.

References

  1. "JavaScript character escape sequences". Mathias Bynens. 21 December 2011. Retrieved 2014-06-30.
  2. "Special Characters (JavaScript)". Microsoft Developer Network. Archived from the original on Dec 14, 2014. Retrieved 2014-06-30.
  3. Bemer, Bob (Oct 25, 2003). "How Bob Bemer Invented the ESCAPE Sequence and Key". Bob Bemer. Archived from the original on 4 January 2018. Retrieved 22 March 2018.
  4. Tim Hill (1998). "The Windows NT Command Shell". Microsoft Learn. MacMillan Technical Publishing. Retrieved 2010-01-13.
  5. "about_Escape_Characters". Microsoft Developer Network. 2014-05-08. Archived from the original on 2016-11-25. Retrieved 2016-11-24.

PD-icon.svg This article incorporates public domain material from Federal Standard 1037C. General Services Administration. Archived from the original on 2022-01-22.