ISO-8859-8-I

Last updated

ISO-8859-8-I is the IANA charset name for the character encoding ISO/IEC 8859-8 used together with the control codes from ISO/IEC 6429 for the C0 (00–1F hex) and C1 (80–9F) parts. The characters are in logical order.

Internet Assigned Numbers Authority Department of ICANN, a nonprofit private American corporation

The Internet Assigned Numbers Authority (IANA) is a function of ICANN, a nonprofit private American corporation that oversees global IP address allocation, autonomous system number allocation, root zone management in the Domain Name System (DNS), media types, and other Internet Protocol-related symbols and Internet numbers.

Character encoding is used to represent a repertoire of characters by some kind of encoding system. Depending on the abstraction level and context, corresponding code points and the resulting code space may be regarded as bit patterns, octets, natural numbers, electrical pulses, etc. A character encoding is used in computation, data storage, and transmission of textual data. "Character set", "character map", "codeset" and "code page" are related, but not identical, terms.

ISO/IEC 8859-8, Information technology — 8-bit single-byte coded graphic character sets — Part 8: Latin/Hebrew alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings. ISO/IEC 8859-8:1999 from 1999 represents its second and current revision, preceded by the first edition ISO/IEC 8859-8:1988 in 1988. It is informally referred to as Latin/Hebrew. ISO/IEC 8859-8 covers all the Hebrew letters, but no Hebrew vowel signs. IBM assigned code page 916 to it. This character set was also adopted by Israeli Standard SI1311:2002, with some extensions.

Escape sequences (from ISO/IEC 6429 or ISO/IEC 2022) are not to be interpreted. Most applications only interpret the control codes for LF, CR, and HT. A few applications also interpret VT, FF, and NEL (in C1). Very few applications interpret the other C0 and C1 control codes.

ISO/IEC 2022Information technology—Character code structure and extension techniques, is an ISO standard specifying

A carriage return, sometimes known as a cartridge return and often shortened to CR, <CR> or return, is a control character or mechanism used to reset a device's position to the beginning of a line of text. It is closely associated with the line feed and newline concepts, although it can be considered separately in its own right.

Newline Special character in computing signifying the end of a line of text

Newline is a control character or sequence of control characters in a character encoding specification that is used to signify the end of a line of text and the start of a new one. Some text editors set this special character when pressing the ↵ Enter key.

ISO-8859-8 is sometimes in logical order (HTML, XML), and sometimes in visual (left-to-right) order (plain text without any markup). The WHATWG Encoding Standard used by HTML5 treats ISO-8859-8 and ISO-8859-8-I as distinct encodings with the same mapping due to influence on the layout direction, but notes that this no longer applies to ISO-8859-6 (Arabic), only to ISO-8859-8. [1]

The Web Hypertext Application Technology Working Group (WHATWG) is a community of people interested in evolving HTML and related technologies. The WHATWG was founded by individuals from Apple Inc., the Mozilla Foundation and Opera Software, leading Web browser vendors, in 2004. Google subsequently joined the WHATWG with the move of the then-editor of the WHATWG specifications, Ian Hickson, to Google.

HTML5 Fifth and current version of the hypertext markup language for structuring and presenting content for the World Wide Web

HTML5 is a software solution stack that defines the properties and behaviors of web page content by implementing a markup based pattern to it.

Logical order for this charset requires bidi processing for display.

The Microsoft Windows code page for Hebrew, Windows-1255, uses logical order, and adds support for vowel points as combining characters, and some additional punctuation. It is mostly an extension of ISO-8859-8-I without C1 controls, except for the omission of the double underscore, and replacement of the universal currency sign (¤) with the sheqel sign (₪).

Microsoft Windows is a group of several graphical operating system families, all of which are developed, marketed and sold by Microsoft. Each family caters to a certain sector of the computing industry. Active Microsoft Windows families include Windows NT and Windows IoT; these may encompass subfamilies, e.g. Windows Server or Windows Embedded Compact. Defunct Microsoft Windows families include Windows 9x, Windows Mobile and Windows Phone.

Windows-1255 is a code page used under Microsoft Windows to write Hebrew. It is an almost compatible superset of ISO 8859-8 – most of the symbols are in the same positions, but Windows-1255 adds vowel-points and other signs in lower positions.

The currency sign (¤) is a character used to denote an unspecified currency. It can be described as a circle the size of a lowercase character with four short radiating arms at 45° (NE), 135° (SE), 225°, (SW) and 315° (NW). It is raised slightly above the baseline. It is represented in Unicode as U+00A4¤CURRENCY SIGN, \textcurrency in LaTeX. The character is sometimes called scarab.

Related Research Articles

ISO/IEC 8859-1 character encoding

ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1", consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages. It is the basis for most popular 8-bit character sets and the first block of characters in Unicode.

ISO/IEC 8859-15:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 15: Latin alphabet No. 9, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1999. It is informally referred to as Latin-9. It is similar to ISO 8859-1, and thus also intended for “Western European” languages, but replaces some less common symbols with the euro sign and some letters that were deemed necessary:

ISO/IEC 8859-3:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 3: Latin alphabet No. 3, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin-3 or South European. It was designed to cover Turkish, Maltese and Esperanto, though the introduction of ISO/IEC 8859-9 superseded it for Turkish. The encoding remains popular with users of Esperanto, though use is waning as application support for Unicode becomes more common.

Windows-1252 code page

Windows-1252 or CP-1252 is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and some other Western languages.

ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central or "Eastern European" languages that are written in the Latin script. Note that ISO/IEC 8859-2 is very different from code page 852 which is also referred to as "Latin-2" in Czech and Slovak regions. Code page 912 is an extension.

ISO/IEC 8859-4:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 4: Latin alphabet No. 4, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin-4 or North European. It was designed to cover Estonian, Latvian, Lithuanian, Greenlandic, and Sami. It has been largely superseded by ISO/IEC 8859-10 and Unicode. Microsoft has assigned code page 28594 a.k.a. Windows-28594 to ISO-8859-4 in Windows. IBM has assigned code page 914 to ISO 8859-4.

ISO/IEC 8859-5:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 5: Latin/Cyrillic alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin/Cyrillic. It was designed to cover languages using a Cyrillic alphabet such as Bulgarian, Belarusian, Russian, Serbian and Macedonian but was never widely used. It would also have been usable for Ukrainian in the Soviet Union from 1933–1990, but it is missing the Ukrainian letter ge, ґ, which is required in Ukrainian orthography before and since, and during that period outside Soviet Ukraine. As a result, IBM created Code page 1124.

ISO/IEC 8859-6:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 6: Latin/Arabic alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin/Arabic. It was designed to cover Arabic. Only nominal letters are encoded, no preshaped forms of the letters, so shaping processing is required for display. It does not include the extra letters needed to write most Arabic-script languages other than Arabic itself.

ISO/IEC 8859-7:2003, Information technology — 8-bit single-byte coded graphic character sets — Part 7: Latin/Greek alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin/Greek. It was designed to cover the modern Greek language. The original 1987 version of the standard had the same character assignments as the Greek national standard ELOT 928, published in 1986. The table in this article shows the updated 2003 version which adds three characters. Microsoft has assigned code page 28597 a.k.a. Windows-28597 to ISO-8859-7 in Windows. IBM has assigned code page 813 to ISO 8859-7.

ISO/IEC 8859-9:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 9: Latin alphabet No. 5, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1989. It is informally referred to as Latin-5 or Turkish. It was designed to cover the Turkish language, designed as being of more use than the ISO/IEC 8859-3 encoding. It is identical to ISO/IEC 8859-1 except for these six replacements of Icelandic characters with characters unique to the Turkish alphabet:

ISO/IEC 8859-10:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 10: Latin alphabet No. 6, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1992. It is informally referred to as Latin-6. It was designed to cover the Nordic languages, deemed of more use for them than ISO 8859-4.

ISO/IEC 8859-13:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 13: Latin alphabet No. 7, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1998. It is informally referred to as Latin-7 or Baltic Rim. It was designed to cover the Baltic languages, and added characters used in the Polish language missing from the earlier encodings ISO 8859-4 and ISO 8859-10. Unlike these two, it does not cover the Nordic languages. It is similar to the earlier-published Windows-1257.

ISO/IEC 8859-14:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 14: Latin alphabet No. 8 (Celtic), is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1998. It is informally referred to as Latin-8 or Celtic. It was designed to cover the Celtic languages, such as Irish, Manx, Scottish Gaelic, Welsh, Cornish, and Breton.

ISO/IEC 8859-16:2001, Information technology — 8-bit single-byte coded graphic character sets — Part 16: Latin alphabet No. 10, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 2001. It is informally referred to as Latin-10 or South-Eastern European. It was designed to cover Albanian, Croatian, Hungarian, Polish, Romanian, Serbian and Slovenian, but also French, German, Italian and Irish Gaelic.

Thai Industrial Standard 620-2533, commonly referred to as TIS-620, is the most common character set and character encoding for the Thai language. The standard is published by the Thai Industrial Standards Institute (TISI), an organ of the Ministry of Industry under the Royal Thai Government, and is the sole official standard for encoding Thai in Thailand. The descriptive name of the standard is "Standard for Thai Character Codes for Computers". "2533" refers to year 2533 of the Buddhist Era (1990), the year the present version of the standard was published; a previous revision, TIS 620-2529 (1986), is now obsolete.

Windows code pages are sets of characters or code pages used in Microsoft Windows from the 1980s and 1990s. Windows code pages were gradually superseded when Unicode was implemented in Windows, although they are still supported both within Windows and other platforms.

ISO-IR-153 is an 8-bit character set that covers the Russian and Bulgarian alphabets. Unlike the KOI encodings, this encoding lists the Cyrillic letters in their correct traditional order. This has become the basis for ISO/IEC 8859-5 and the Cyrillic Unicode block.

References

  1. van Kesteren, Anne. "9. Legacy single-byte encodings". Encoding Standard. WHATWG. Note: ISO-8859-8 and ISO-8859-8-I are distinct encoding names, because ISO-8859-8 has influence on the layout direction. And although historically this might have been the case for ISO-8859-6 and "ISO-8859-6-I" as well, that is no longer true.