Graphic character

Last updated

In ISO/IEC 646 (commonly known as ASCII) and related standards including ISO 8859 and Unicode, a graphic character, also known as printing character (or printable character), is any character intended to be written, printed, or otherwise displayed in a form that can be read by humans. In other words, it is any encoded character that is associated with one or more glyphs.

Contents

ISO/IEC 646

In ISO 646, graphic characters are contained in rows 2 through 7 of the code table. However, two of the characters in these rows, namely the space character SP at row 2 column 0 and the delete character  DEL (also called the rubout character) at row 7 column 15, require special mention.

The space is considered to be both a graphic character and a control character in ISO 646. It can have a visible form, and also a control function (moving the print head). [1]

The delete character is strictly a control character, not a graphic character. This is true not only in ISO 646, but also in all related[ clarification needed ] standards including Unicode. However, many modern character sets deviate from ISO 646, and as a result a graphic character might[ where? ] occupy the position originally reserved for the delete character.

Unicode

In Unicode, Graphic characters are those with General Category Letter, Mark, Number, Punctuation, Symbol or Zs=space. Other code points (General categories Control, Zl=line separator, Zp=paragraph separator) are Format, Control, Private Use, Surrogate, Noncharacter or Reserved (unassigned). [2]

Spacing and non-spacing characters

Most graphic characters are spacing characters, which means that each instance of a spacing character has to occupy some area in a graphic representation. For a teletype or a typewriter this implies moving of the carriage after typing of a character. In the context of text mode display, each spacing character occupies one rectangular character box of equal sizes. Or maybe two adjacent ones, for non-alphabetic characters of East Asian languages. If a text is rendered using proportional fonts, widths of character boxes are not equal, but are positive.

There exist also non-spacing graphic characters. Most of non-spacing characters are modifiers , also called combining characters in Unicode, such as diacritical marks. Although non-spacing graphic characters are uncommon in traditional code pages, there are many such in Unicode. A combining character has its distinct glyph, but it applies to a character box of another character, a spacing one. In some historical systems such as line printers this was implemented as overstrike.

Note that not all modifiers are non-spacing – there exists Spacing Modifier Letters Unicode block.

See also

Related Research Articles

<span class="mw-page-title-main">ASCII</span> American character encoding standard

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of technical limitations of computer systems at the time it was invented, ASCII has just 128 code points, of which only 95 are printable characters, which severely limited its scope. Many computer systems instead use Unicode, which has millions of code points, but the first 128 of these are the same as the ASCII set.

In computing and telecommunication, a control character or non-printing character (NPC) is a code point in a character set that does not represent a written character or symbol. They are used as in-band signaling to cause effects other than the addition of a symbol to the text. All other characters are mainly graphic characters, also known as printing characters, except perhaps for "space" characters. In the ASCII standard there are 33 control characters, such as code 7, BEL, which rings a terminal bell.

ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12. The ISO working group maintaining this series of standards has been disbanded.

<span class="mw-page-title-main">Unicode</span> Character encoding standard

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines as of the current version (15.0) 149,186 characters covering 161 modern and historic scripts, as well as symbols, thousands of emoji, and non-visual control and formatting codes.

ISO/IEC 646 is a set of ISO/IEC standards, described as Information technology — ISO 7-bit coded character set for information interchange and developed in cooperation with ASCII at least since 1964. Since its first edition in 1967 it has specified a 7-bit character code from which several national standards are derived.

<span class="mw-page-title-main">Newline</span> Special characters in computing signifying the end of a line of text

Newline is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a sequence of characters, is used to signify the end of a line of text and the start of a new one.

ISO/IEC 2022Information technology—Character code structure and extension techniques, is an ISO/IEC standard in the field of character encoding. Originating in 1971, it was most recently revised in 1994.

Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese.

GB/T 2312-1980 is a key official character set of the People's Republic of China, used for Simplified Chinese characters. GB2312 is the registered internet name for EUC-CN, which is its usual encoded form. GB refers to the Guobiao standards (国家标准), whereas the T suffix denotes a non-mandatory standard.

<span class="mw-page-title-main">ArmSCII</span> Set of obsolete single-byte character encodings

ArmSCII or ARMSCII is a set of obsolete single-byte character encodings for the Armenian alphabet defined by Armenian national standard 166–9. ArmSCII is an acronym for Armenian Standard Code for Information Interchange, similar to ASCII for the American standard. It has been superseded by the Unicode standard.

<span class="mw-page-title-main">Code page 437</span> Character set of the original IBM PC

Code page 437 is the character set of the original IBM PC. It is also known as CP437, OEM-US, OEM 437, PC-8, or DOS Latin US. The set includes all printable ASCII characters as well as some accented letters (diacritics), Greek letters, icons, and line-drawing symbols. It is sometimes referred to as the "OEM font" or "high ASCII", or as "extended ASCII".

The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.

T.51 / ISO/IEC 6937:2001, Information technology — Coded graphic character set for text communication — Latin alphabet, is a multibyte extension of ASCII, or rather of ISO/IEC 646-IRV. It was developed in common with ITU-T for telematic services under the name of T.51, and first became an ISO standard in 1983. Certain byte codes are used as lead bytes for letters with diacritics (accents). The value of the lead byte often indicates which diacritic that the letter has, and the follow byte then has the ASCII-value for the letter that the diacritic is on.

<span class="mw-page-title-main">Universal Character Set characters</span> Complete list of the characters available on most computers

The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate, and transmit—interchange—UCS-encoded text strings from one to another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.

Many Unicode characters are used to control the interpretation or display of text, but these characters themselves have no visual or spatial representation. For example, the null character is used in C-programming application environments to indicate the end of a string of characters. In this way, these programs only require a single starting memory address for a string, since the string ends once the program reads the null character.

<span class="mw-page-title-main">OCR-A</span> Typeface designed for early computer OCR

OCR-A is a font issued in 1966 and first implemented in 1968. A special font was needed in the early days of computer optical character recognition, when there was a need for a font that could be recognized not only by the computers of that day, but also by humans. OCR-A uses simple, thick strokes to form recognizable characters. The font is monospaced (fixed-width), with the printer required to place glyphs 0.254 cm apart, and the reader required to accept any spacing between 0.2286 cm and 0.4572 cm.

<span class="mw-page-title-main">Extended ASCII</span> Nick-name for 8-bit ASCII-derived character sets

Extended ASCII is a repertoire of character encodings that include the original 96 ASCII character set, plus up to 128 additional characters. There is no formal definition of "extended ASCII", and even use of the term is sometimes criticized, because it can be mistakenly interpreted to mean that the American National Standards Institute (ANSI) had updated its ANSI X3.4-1986 standard to include more characters, or that the term identifies a single unambiguous encoding, neither of which is the case.

The ISO basic Latin alphabet is an international standard for a Latin-script alphabet that consists of two sets of 26 letters, codified in various national and international standards and used widely in international communication. They are the same letters that comprise the current English alphabet. Since medieval times, they are also the same letters of the modern Latin alphabet. The order is also important for sorting words into alphabetical order.

JIS X 0208 is a 2-byte character set specified as a Japanese Industrial Standard, containing 6879 graphic characters suitable for writing text, place names, personal names, and so forth in the Japanese language. The official title of the current standard is 7-bit and 8-bit double byte coded KANJI sets for information interchange. It was originally established as JIS C 6226 in 1978, and has been revised in 1983, 1990, and 1997. It is also called Code page 952 by IBM. The 1978 version is also called Code page 955 by IBM.

ISO 2047 is a standard for graphical representation of the control characters for debugging purposes, such as may be found in the character generator of a computer terminal; it also establishes a two-letter abbreviation of each control character. The graphics and two-letter codes are essentially unchanged from the 1968 European standard ECMA-17 and the 1973 American standard ANSI X3.32-1973. It became an ISO standard in 1975. It is also standardized as GB/T 3911-1983 in China, as KS X 1010 in Korea, and was enacted in Japan as "graphical representation of information exchange capabilities for character" JIS X 0209:1976.

References

  1. L.R. Henderson; A.M. Mumford (20 May 2014). The Computer Graphics Metafile: Butterworth Series in Computer Graphics Standards. Elsevier Science. p. 102. ISBN   978-1-4831-4484-9.
  2. https://www.unicode.org/versions/Unicode5.2.0/ch02.pdf#G25564 Chapter 2, table 2.3