ISO/IEC 10367

ISO/IEC 10367:1991 is a standard developed by ISO/IEC JTC 1/SC 2 [1] that defines graphical character sets for use in character encodings implementing levels 2 and 3 of ISO/IEC 4873 [2] (as opposed to ISO/IEC 8859, which defines character encodings at level 1 of ISO/IEC 4873).

Relationship to ISO/IEC 8859

The parts of ISO/IEC 8859 define complete encodings at level 1 of ISO/IEC 4873 (i.e., as stateless extended ASCII single-byte encodings, reserving the C1 area), and do not allow for use of multiple parts together. For use at levels 2 and 3 of ISO/IEC 4873 (i.e., with shift codes for additional graphical character sets), ISO/IEC 8859 stipulates that equivalent sets from ISO/IEC 10367 should be used instead. [3]

ISO/IEC 10367:1991 includes ASCII, as well as sets matching the G1 sets used for the right-hand sides (non-ASCII parts) of ISO/IEC 6937 (ITU T.51) and of ISO/IEC 8859 parts 1 through 9 (i.e., those parts that existed as of 1991, when it was published), a set of additional Roman characters supplementing some of those parts, and a set of box drawing characters (shown below). [2] [4]

Supplementary G3 Latin set

ISO/IEC 10367 includes the ISO-IR-154 graphical set, which is intended to supplement Latin alphabets number 1, 2 and 5 (i.e., ISO-8859-1, ISO-8859-2 and ISO-8859-9). [4] Specifically, it is intended for use as the G3 set in a profile of ISO/IEC 4873 in which the G1 and G2 sets are the right-hand side of ISO-8859-2 and that of either ISO-8859-1 or ISO-8859-9. [5] These configurations represent the entire ISO/IEC 6937 repertoire (ITU T.51 Annex A) without non-spacing codes. [6]

For instance, the letter Ĉ would be encoded under ISO/IEC 4873 level 2 as 0x8F 0x23 if this set is included.
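
The single-shift mechanism behind this example can be sketched as follows. This is a minimal illustration, not a full ISO/IEC 4873 decoder: the table `G3_EXCERPT` holds only the one code point stated above, and the function names `encode_g3` and `decode_level2` are hypothetical. It assumes that SINGLE SHIFT THREE (0x8F) affects only the byte that immediately follows it, and that all other bytes below 0x80 are ASCII.

```python
SS3 = 0x8F  # SINGLE SHIFT THREE, a C1 control code

# Assumption: an excerpt of the G3 set with only the position named in the text.
G3_EXCERPT = {0x23: "Ĉ"}


def encode_g3(ch: str) -> bytes:
    """Encode one character of the G3 excerpt as SS3 followed by its code."""
    for code, candidate in G3_EXCERPT.items():
        if candidate == ch:
            return bytes([SS3, code])
    raise ValueError(f"{ch!r} is not in the G3 excerpt")


def decode_level2(data: bytes) -> str:
    """Decode ASCII bytes plus SS3-prefixed characters from the G3 excerpt."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == SS3:
            # The shift applies to exactly one following byte.
            if i + 1 >= len(data):
                raise ValueError("SS3 at end of input")
            following = data[i + 1]
            if following not in G3_EXCERPT:
                raise ValueError(f"0x{following:02X} is not in the G3 excerpt")
            out.append(G3_EXCERPT[following])
            i += 2
        elif byte < 0x80:
            out.append(chr(byte))  # G0 is ASCII
            i += 1
        else:
            raise ValueError(f"byte 0x{byte:02X} is not handled by this sketch")
    return "".join(out)
```

Here `encode_g3("Ĉ")` yields the two bytes 0x8F 0x23, and decoding those bytes between ASCII letters restores the letter in place.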

Some characters in this set also appear in the right-hand side of ISO-8859-1 or ISO-8859-9. Under the current edition of ISO/IEC 4873 / ECMA-43 (though not earlier editions), [7] a character must be taken from the lowest-numbered working set in which it appears, so those characters are not used from this G3 set when the respective ISO-8859 right-hand side is designated as the G1 or G2 set. [8]
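
The unique-coding rule can be illustrated with a short sketch. The helper names `right_hand_side` and `lowest_working_set` are hypothetical, `G3_EXCERPT` is only a handful of characters from the supplementary set, and placing ISO-8859-2 in G1 with ISO-8859-1 or ISO-8859-9 in G2 is an assumption made for the example (the profile only requires that both occupy G1 and G2). The right-hand sides are taken from Python's built-in `iso8859_1`, `iso8859_2` and `iso8859_9` codecs.

```python
from typing import Optional


def right_hand_side(codec_name: str) -> set:
    """Return the characters at bytes 0xA0-0xFF of a single-byte codec."""
    return set(bytes(range(0xA0, 0x100)).decode(codec_name))


def lowest_working_set(ch: str, working_sets: list) -> Optional[int]:
    """Return the index (0 for G0 up to 3 for G3) of the first set containing ch."""
    for index, repertoire in enumerate(working_sets):
        if ch in repertoire:
            return index
    return None


ASCII_G0 = {chr(code) for code in range(0x20, 0x7F)}
G3_EXCERPT = set("Ĉĉðþı")  # assumption: a few characters of the G3 set

# Assumed ordering: ISO-8859-2 as G1, ISO-8859-1 or ISO-8859-9 as G2.
profile_latin1 = [ASCII_G0, right_hand_side("iso8859_2"),
                  right_hand_side("iso8859_1"), G3_EXCERPT]
profile_latin5 = [ASCII_G0, right_hand_side("iso8859_2"),
                  right_hand_side("iso8859_9"), G3_EXCERPT]
```

With ISO-8859-1 in G2, þ resolves to G2 and so is not taken from G3. With ISO-8859-9 in G2, þ is only available from G3, while ı resolves to G2. Ĉ comes from G3 in both profiles.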

ISO/IEC 10367 supplementary G3 Latin set [5]
0123456789ABCDEF
2x/Ax Ā Ĉ Ċ Ė Ē Ĝ
3x/Bx ā ĉ ċ ð ė ē ĝ
4x/Cx Ğ Ġ Ģ Ĥ Ħ Ĩ İ Ī Į Ĳ Ĵ Ķ Ļ Ŀ Ņ
5x/Dx Ŋ Ō Œ Ŗ Ŝ Ŧ Þ Ũ Ŭ Ū Ų Ŵ Ý Ŷ Ÿ
6x/Ex ğ ġ ģ ĥ ħ ĩ ı ī į ĳ ĵ ķ ļ ŀ ņ
7x/Fx ĸ ŋ ō œ ŗ ŝ ŧ þ ũ ŭ ū ų ŵ ý ŷ ŉ
  Also in ISO-8859-1: ð, Þ, Ý, þ, ý
  Also in ISO-8859-9: Ğ, İ, ğ, ı

Box drawing set

The following shows the box drawing set from ISO/IEC 10367, which is registered for ISO/IEC 2022 use as ISO-IR-155. It does not use the 0x20/A0 or 0x7F/FF positions, but is nonetheless registered as a 96-character set. [9]

The libintl-perl library includes an "ISO_10367-BOX" codec, which encodes and decodes ASCII over GL and the ISO-IR-155 box drawing set over GR, with a few deviations. Specifically, it uses double-lined box-drawing characters in place of heavy-lined ones, and it replaces the upper half block (▀) at 0xCB with the private-use character U+E019, documented as "Unit space B". [10]
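
The GL/GR split and the 0xCB deviation can be sketched as below. This is an illustration only and does not reproduce libintl-perl's code or its full mapping table: `ISO_IR_155_EXCERPT`, `LIBINTL_PERL_OVERRIDES` and `decode_box` are hypothetical names, and the tables contain just the one position described above. Bytes at 0x80 and above that are missing from the table, including C1 controls, decode to U+FFFD in this sketch.

```python
# Assumption: only the position named in the text is filled in.
ISO_IR_155_EXCERPT = {0xCB: "\u2580"}      # UPPER HALF BLOCK per ISO-IR-155
LIBINTL_PERL_OVERRIDES = {0xCB: "\uE019"}  # private-use "Unit space B"


def decode_box(data: bytes, gr_table: dict) -> str:
    """Decode ASCII over GL and a (partial) box drawing table over GR."""
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))  # GL: ASCII
        else:
            # GR: look up the table; unknown positions become U+FFFD.
            chars.append(gr_table.get(byte, "\uFFFD"))
    return "".join(chars)


standard = decode_box(b"A\xcb", ISO_IR_155_EXCERPT)
perl_like = decode_box(b"A\xcb", {**ISO_IR_155_EXCERPT, **LIBINTL_PERL_OVERRIDES})
```

Layering the overrides on top of the registered table keeps the deviations explicit, so the same decoder can serve either variant.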

ISO/IEC 10367 box drawing set [9]
0123456789ABCDEF
2x/Ax
3x/Bx
4x/Cx
5x/Dx
6x/Ex
7x/Fx

References

  1. ISO/IEC JTC 1/SC 2 (1991). "Information technology — Standardized coded graphic character sets for use in 8-bit codes". ISO. ISO/IEC 10367:1991.
  2. van Wingen, Johan W (1999). "8. Code Extension, ISO 2022 and 2375, ISO 4873 and 10367". Character sets. Letters, tokens and codes. Terena. Archived from the original on 2020-08-01.
  3. ISO/IEC JTC 1/SC 2 (1998-02-12). Final Text of DIS 8859-10, Information Technology — 8-bit single-byte coded graphic character sets — Part 10: Latin alphabet No. 6 (PDF). ISO/IEC FDIS 8859-10:1998, JTC1/SC2 N2992, WG3 N415.
  4. "8-Bit Character Sets - ISO/IEC 10367". Guide to the use of Character Sets in Europe. DKUUG.
  5. ECMA (1990-03-01). Supplementary Set for Latin Alphabets 1, 2 and 5 (PDF). ITSCJ/IPSJ. ISO-IR-154.
  6. ISO/IEC JTC 1/SC 2/WG 3 (1998-04-15). "Annex E: Alternative coded representation of the repertoire with no non-spacing diacritical marks". WD 6937, Coded graphic character set for text communication - Latin alphabet (PDF). p. 37. JTC1/SC2/N454.
  7. ECMA (1991). "Main differences between the second edition (1985) and the present (third) edition of this ECMA Standard". ECMA-43: 8-Bit Coded Character Set Structure and Rules (PDF) (ECMA Standard) (3rd ed.). p. 23.
  8. ECMA (1991). "Unique coding of characters". ECMA-43: 8-Bit Coded Character Set Structure and Rules (PDF) (ECMA Standard) (3rd ed.). p. 10.
  9. ISO/IEC/JTC1/SC2/WG3 (1990-04-16). Basic Box-Drawings Set (PDF). ITSCJ/IPSJ. ISO-IR-155.
  10. Flohr, Guido. "Conversion routines for ISO_10367_BOX". libintl-perl. Locale::RecodeData::ISO_10367_BOX.