Escape sequence


In computer science, an escape sequence is a combination of characters that has a meaning other than the literal characters contained therein; [1] it is marked by one or more preceding (and possibly terminating) characters. [2]


Examples

Control sequences

When this series of characters is used to change the state of a computer or its attached peripheral devices, rather than to be displayed or printed as regular data bytes would be, it is also known as a control sequence, reflecting its use in device control. Control sequences begin with a Control Sequence Introducer, originally the ASCII "escape character" (code 27 decimal), often written "Esc" on keycaps.

With the introduction of ANSI terminals, most escape sequences began with the two characters "ESC" then "[", or with a specially allocated CSI character with code 155 (decimal).
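The following Python sketch is for illustration and is not drawn from the cited sources. It builds ECMA-48 "Select Graphic Rendition" (SGR) sequences, which set display attributes. Each sequence is the CSI, then semicolon-separated numeric parameters, then the final character m. Parameter 31 selects a red foreground and 0 resets the attributes. The helper name sgr is hypothetical. The single-character CSI (code 155) is included only for comparison; the two-character ESC [ form is the more widely supported of the two.

```python
ESC = "\x1b"            # ASCII escape character, code 27 decimal
CSI_7BIT = ESC + "["    # two-character Control Sequence Introducer
CSI_8BIT = "\x9b"       # single-character CSI, code 155 decimal


def sgr(*params: int, csi: str = CSI_7BIT) -> str:
    """Return a Select Graphic Rendition sequence, e.g. ESC [ 1 ; 31 m."""
    return csi + ";".join(str(p) for p in params) + "m"


if __name__ == "__main__":
    red, reset = sgr(31), sgr(0)   # 31 = red foreground, 0 = reset attributes
    print(red + "warning" + reset + ": shown in red on an ANSI terminal")
```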

Not all control sequences used an escape character; for example:

Escape sequences in communications are commonly used when a computer and a peripheral have only a single channel through which to send information back and forth (so escape sequences are an example of in-band signaling). [11] [12] They were common when most dumb terminals used ASCII with 7 data bits for communication. They were sometimes used to switch to a different character set for "foreign" or graphics characters that would otherwise have been restricted by the 128 codes available in 7 data bits. Even relatively "dumb" terminals responded to some escape sequences. For example, the original mechanical Teletype printers (on which "glass Teletypes" or VDUs were based) responded to characters 27 and 31 by switching between letters and figures modes.

Keyboard

An escape character is usually assigned to the Esc key on a computer keyboard, and it can also be sent on its own rather than as part of an escape sequence. For example, the Esc key may be used as an input character in editors such as vi, [13] or for backing up one level in a menu in some applications. [14] The Hewlett Packard HP 2640 terminals had a key for a "display functions" mode, which displayed graphics for all control characters, including Esc, to aid in debugging applications.

If the Esc key and other keys that send escape sequences are both supposed to be meaningful to an application, an ambiguity arises if a character terminal is in use. When the application receives the ASCII escape character, it cannot tell whether that character came from the user pressing the Esc key or is the first character of an escape sequence (e.g., one produced by an arrow key press). The traditional way to resolve the ambiguity is to observe whether another character quickly follows the escape character. If none does, the escape character is assumed not to be part of an escape sequence. This heuristic can fail, especially on slow communication links, where the characters of a sequence may arrive with noticeable gaps between them.
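The timing heuristic can be sketched in Python as follows. This is a simplified model, not any particular terminal library's algorithm, and it rests on three assumptions:

- Input arrives as hypothetical (timestamp, byte) pairs.
- The 50 ms timeout is an arbitrary choice.
- Every escape sequence is exactly three bytes long, as ESC [ A is for the up-arrow key on many terminals. Real sequences vary in length.

```python
from typing import List, Tuple

ESC = 0x1B


def classify_keys(events: List[Tuple[float, int]], timeout: float = 0.05) -> List[str]:
    """Split timestamped input bytes into key names.

    An ESC followed by another byte within `timeout` seconds is taken as the
    start of an escape sequence; an ESC with no quick follow-up is taken as
    a press of the Esc key itself.
    """
    keys: List[str] = []
    i = 0
    while i < len(events):
        t, b = events[i]
        follows_quickly = i + 1 < len(events) and events[i + 1][0] - t <= timeout
        if b == ESC and follows_quickly:
            # Simplification: assume a fixed three-byte sequence such as ESC [ A.
            tail = "".join(chr(byte) for _, byte in events[i + 1:i + 3])
            keys.append("SEQ " + tail)
            i += 3
        elif b == ESC:
            keys.append("Esc")
            i += 1
        else:
            keys.append(chr(b))
            i += 1
    return keys
```

Suppose the three bytes of ESC [ A arrive 0.2 seconds apart, as they might on a slow link. This sketch then returns Esc, [ and A instead of one arrow-key sequence, which is the failure mode described above.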

Escape sequences date back at least to the 1874 Baudot code. [15] [16] [17]

Modem control

The Hayes command set, for instance, defines a single escape sequence, +++. Because +++ may also occur in ordinary data, the sender pauses for one second before and after sending it so that the modem can recognize it as the escape sequence. When the modem detects this in a stream of data, it leaves its normal mode of operation, in which it simply sends any characters to the phone line. It enters a command mode, in which the following data is interpreted as part of the command language. Sending the O command switches the modem back to online mode.
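The guard-time rule can be modeled in a few lines of Python. This is an illustrative sketch, not modem firmware. The event format, the function name find_escape, and the now parameter are assumptions. The now parameter is the current time, needed to judge the silence after a trailing +++. The sketch models only the pauses before and after +++.

```python
from typing import List, Tuple

GUARD_TIME = 1.0  # seconds of silence required before and after "+++"


def find_escape(events: List[Tuple[float, str]], now: float,
                guard: float = GUARD_TIME) -> int:
    """Return the index of the first '+' of a guarded +++, or -1 if none.

    events: (arrival_time_in_seconds, character) pairs in arrival order.
    now: current time, used when +++ is the last thing received.
    """
    for i in range(len(events) - 2):
        if "".join(c for _, c in events[i:i + 3]) != "+++":
            continue
        first, last = events[i][0], events[i + 2][0]
        # Silence before: gap since the previous character (none counts as idle).
        before = first - events[i - 1][0] if i > 0 else float("inf")
        # Silence after: gap until the next character, or until `now`.
        after = (events[i + 3][0] if i + 3 < len(events) else now) - last
        if before >= guard and after >= guard:
            return i
    return -1
```

A +++ embedded in a continuous stream of data is ignored. The same three characters surrounded by at least a second of silence are recognized.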

The Hayes command set is modal, switching from command mode to online mode. [18] [19] This is not appropriate in the case where the commands and data will switch back and forth rapidly. An example of a non-modal escape sequence control language is the VT100, which used a series of commands prefixed by a Control Sequence Introducer.

Comparison with control characters

A control character is a character that, in isolation, has some control function, such as carriage return (CR). Escape sequences, by contrast, consist of one or more escape characters which change the interpretation of subsequent characters.

ASCII video data terminals

The VT52 terminal used simple digraph commands like escape-A: in isolation, "A" simply meant the letter "A", but as part of the escape sequence "escape-A", it had a different meaning. The VT52 also supported parameters: it was not a straightforward control language encoded as substitution.
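A few VT52 sequences show the difference between the letter A and the command escape-A. The sketch below follows the commonly documented VT52 command set:

- ESC A moves the cursor up.
- ESC Y takes a row and a column, each sent as a character offset by 32 from a zero-based position.

The constant and function names are illustrative only.

```python
ESC = "\x1b"

# VT52 commands are ESC followed by a single letter.
CURSOR_UP = ESC + "A"     # outside a sequence, "A" is just the letter A
CURSOR_DOWN = ESC + "B"
CURSOR_RIGHT = ESC + "C"
CURSOR_LEFT = ESC + "D"
CURSOR_HOME = ESC + "H"

ROWS, COLS = 24, 80       # VT52 screen size


def vt52_move(row: int, col: int) -> str:
    """Direct cursor addressing: ESC Y, then chr(32 + row), chr(32 + col), zero-based."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("position is off the 24 x 80 screen")
    return ESC + "Y" + chr(32 + row) + chr(32 + col)
```

The ESC Y command shows the VT52's parameters: its two arguments are ordinary printable characters whose meaning depends on their position after the escape.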

The later VT100 terminal implemented the more sophisticated ANSI escape sequences standard (now ECMA-48) for functions such as controlling cursor movement, character set, and display enhancements. The Hewlett Packard HP 2640 series had perhaps the most elaborate escape sequences for block and character modes, programming keys and their soft labels, graphics vectors, and even saving data to tape or disk files.
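ECMA-48 gives control sequences a regular structure:

1. the CSI;
2. any number of parameter bytes (0x30 to 0x3F, that is, digits and separators such as ;);
3. any intermediate bytes (0x20 to 0x2F);
4. one final byte (0x40 to 0x7E) that selects the function.

The Python sketch below splits text into plain runs and parsed sequences. It recognizes only the 7-bit ESC [ form, and the function name is hypothetical. For example, ESC [ 2 ; 5 H is the Cursor Position function with row 2 and column 5.

```python
import re
from typing import List, Tuple, Union

# 7-bit ECMA-48 control sequence: ESC [, parameter bytes 0x30-0x3F,
# intermediate bytes 0x20-0x2F, then a single final byte 0x40-0x7E.
CSI_RE = re.compile(r"\x1b\[([\x30-\x3f]*)([\x20-\x2f]*)([\x40-\x7e])")

Piece = Union[str, Tuple[str, str, str]]


def split_control_sequences(text: str) -> List[Piece]:
    """Return plain-text runs and (parameters, intermediates, final) tuples."""
    pieces: List[Piece] = []
    pos = 0
    for m in CSI_RE.finditer(text):
        if m.start() > pos:
            pieces.append(text[pos:m.start()])   # text to display verbatim
        pieces.append((m.group(1), m.group(2), m.group(3)))
        pos = m.end()
    if pos < len(text):
        pieces.append(text[pos:])
    return pieces
```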

Use in DOS and Windows

The ANSI.SYS device driver [20] can be used to enable interpretation of ANSI (ECMA-48) terminal escape sequences under DOS (for example, by using $e in the PROMPT command) or in command windows in 16-bit Windows. The rise of GUI applications, which write directly to display cards, has greatly reduced the use of escape sequences on Microsoft platforms. However, they can still be used to create interactive, random-access, character-based screen interfaces with character-based library routines such as printf, without resorting to a GUI program.

Use in Linux and Unix displays

The default text terminal and terminal emulator windows (such as xterm) respond to ANSI escape sequences.

Quoting escape

Overview

When an escape character is needed within the quoted/escaped string, there are two strategies used within programming and scripting languages:

An example of the latter is the caret (^) in CMD. The following command outputs "You can do so via Cut&Paste", whereas without the caret the ampersand would be treated as a command separator. [22]

echo You can do so via Cut^&Paste
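A program that builds CMD command lines might apply the same caret rule automatically. This Python sketch is illustrative, and the helper name caret_escape is hypothetical. It handles only the metacharacters &, |, <, > and ^ outside double quotes. It ignores % variable expansion and quoted text, which follow different rules.

```python
# Metacharacters that CMD interprets outside double quotes (a common subset;
# variable expansion with % and ! is not handled here).
CMD_SPECIAL = set("&|<>^")


def caret_escape(text: str) -> str:
    """Prefix each CMD metacharacter with ^ so that it is taken literally."""
    return "".join("^" + ch if ch in CMD_SPECIAL else ch for ch in text)


if __name__ == "__main__":
    print("echo You can do so via " + caret_escape("Cut&Paste"))
```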

In detail

A common use of escape sequences is to remove control characters from a binary data stream so that they do not trigger their control function by mistake. In this case, the control character is replaced by a defined "escape character" (which need not be the US-ASCII escape character) and one or more other characters. After the data leaves the context where the control character would have caused an action, the sequence is recognized and replaced by the removed character. [22] To transmit the "escape character" itself, two copies are sent. [21]
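The scheme can be sketched in Python with a hypothetical escape byte, 0x7E (~), chosen only for illustration:

- Each C0 control byte (0x00 to 0x1F) is sent as the escape byte followed by the control code plus 0x40. A line feed, 0x0A, therefore becomes ~J, as in caret notation.
- A literal escape byte is sent as two copies.

Real protocols that use this technique define their own escape bytes and substitution rules.

```python
ESC_BYTE = 0x7E  # hypothetical escape byte ('~'), for illustration only


def encode(data: bytes) -> bytes:
    """Replace control bytes so none reach the channel unescaped."""
    out = bytearray()
    for b in data:
        if b == ESC_BYTE:
            out += bytes([ESC_BYTE, ESC_BYTE])   # literal escape byte: two copies
        elif b < 0x20:
            out += bytes([ESC_BYTE, b + 0x40])   # e.g. 0x0A (LF) -> b"~J"
        else:
            out.append(b)
    return bytes(out)


def decode(data: bytes) -> bytes:
    """Reverse encode(); raises IndexError if the stream ends mid-escape."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC_BYTE:
            nxt = data[i + 1]
            out.append(ESC_BYTE if nxt == ESC_BYTE else nxt - 0x40)
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)
```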

In many programming languages and command-line interfaces, escape sequences are used in character literals and string literals to express characters that are not printable or that clash with the syntax of characters or strings. For example, control characters themselves might not be allowed in program source code, or may have undesirable side effects if typed into a command. The closing quotation character is another problem, which can be solved by escaping it. In most contexts the escape character is the backslash ("\").

Samples

For example, the single quotation mark character might be expressed as '\'' since writing ''' is not acceptable.

Many modern programming languages specify the doublequote character (") as a delimiter for a string literal. The backslash escape character typically provides two ways to include doublequotes inside a string literal. One is to escape the embedded doublequote character itself (\"); the other is to give the hexadecimal value of the doublequote character (\x22). Both sequences encode a literal doublequote (").

In Perl or Python 2

print "Nancy said "Hello World!" to the crowd.";

produces a syntax error, whereas:

print "Nancy said \"Hello World!\" to the crowd.";  # example of \"

produces the intended output. Another alternative:

print "Nancy said \x22Hello World!\x22 to the crowd.";  # example of \x22

uses "\x" to indicate the following two characters are hexadecimal digits, "22" being the ASCII value for a doublequote in hexadecimal.
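A compiler or interpreter must turn such escapes back into the characters they stand for. The Python sketch below decodes a small, assumed subset of backslash escapes: \", \', \\, \n, \t, and \x followed by exactly two hexadecimal digits. Real languages define more escapes, and in C and C++ \x is not limited to two digits.

```python
import re

# Assumed subset of single-character escapes handled by this sketch.
SIMPLE = {'"': '"', "'": "'", "\\": "\\", "n": "\n", "t": "\t"}
# A backslash followed by either x plus two hex digits, or any one character.
ESCAPE_RE = re.compile(r"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)


def unescape(s: str) -> str:
    """Replace each recognized backslash escape with the character it denotes."""
    def replace(m: "re.Match[str]") -> str:
        body = m.group(1)
        if body[0] == "x" and len(body) == 3:
            return chr(int(body[1:], 16))        # \x22 -> '"'
        if body in SIMPLE:
            return SIMPLE[body]
        raise ValueError("unknown escape \\" + body)
    return ESCAPE_RE.sub(replace, s)
```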

C, C++, and Ruby support both of these backslash escape styles; Java supports \" but has no \x hexadecimal escape. The PostScript language and Microsoft Rich Text Format also use backslash escapes. The quoted-printable encoding uses the equals sign as an escape character.
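Python's standard quopri module implements quoted-printable encoding (defined in RFC 2045), so the equals-sign escape can be observed directly. A literal = must itself be escaped as =3D, where 3D is its hexadecimal ASCII code.

```python
import quopri

encoded = quopri.encodestring(b"a=b")
print(encoded)                        # b'a=3Db'
print(quopri.decodestring(encoded))   # b'a=b'
```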

URLs and URIs use percent-encoding to quote characters that have a special meaning, as well as non-ASCII characters.
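In Python, urllib.parse.quote applies percent-encoding. A reserved character such as & becomes %26, and a space becomes %20. A non-ASCII character is first encoded as UTF-8 (by default), and each resulting byte is written as % followed by two hexadecimal digits.

```python
from urllib.parse import quote, unquote

encoded = quote("Cut&Paste café")
print(encoded)            # Cut%26Paste%20caf%C3%A9
print(unquote(encoded))   # Cut&Paste café
```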

Another similar (and partially overlapping) syntactic trick is stropping.

Some programming languages also provide other ways to represent special characters in literals, without requiring an escape character (see e.g. delimiter collision).


References

  1. "Escape Sequence".
  2. "Characters". The Java Tutorials.
  3. "Escape Sequences". Character combinations consisting of a backslash \ followed by a letter or by a combination of digits are called escape sequences.
  4. "ISO/IEC 9899:201x Committee Draft N1570" (PDF). 5.1.1.2 Translation phases, 2.: Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. [...]
  5. "Escape sequences". IBM.
  6. "Chapter 5 – AT Commands" (PDF).
  7. "AT Command Set and Register Summary for Analog Modem Modules".
  8. "Data General terminals: discussion of".
  9. "What's a Terminal?".
  10. "Data General DG210 DG211 Terminal Emulation Software".
  11. "Escape sequence".
  12. "Terminals & Printers Handbook Glossary".
  13. "Twelve Useful "vi" Commands". vi commands […] Pressing the Esc (Escape) key is how you […]
  14. "Five Unexpected Uses for the Esc Key". PCWorld. 2009-10-29.
  15. "What is ASCII? The Economist explains". The Economist. 2013-06-09.
  16. "Baudot and CCITT code". The Baudot code, invented in 1870 and patented in 1874 by J. Baudot is […]
  17. "Guide to the use of Character Sets in Europe". elements C0 and C1 of control characters […] a 5-bit code patented by Jean-Maurice-Emile Baudot (1845-1903) in 1874
  18. "Basic Hayes AT Command Set". 2011-02-05. +++ - "Escape Sequence" - This command initiates an escape sequence to return the modem to the on-line command mode
  19. "Modem Programming Basics". When a modem is in command mode, the modem can accept commands from you
  20. 17. Understanding ANSI.SYS - Special Edition Using MS-DOS 6.22.
  21. "Apostrophe Editing ('aaa') (FORTRAN 77 Language Reference)". Within the field, two consecutive apostrophes […]
  22. "CMD - Batch - Escaping with Caret".