ISO/IEC 8859

ISO 8859 encoding family
Standard: ISO/IEC 8859
Classification: 8-bit extended ASCII, ISO/IEC 4873 level 1
Extends: US-ASCII
Preceded by: ISO/IEC 646
Succeeded by: ISO/IEC 10646 (Unicode)
Other related encoding(s): ISO/IEC 10367, Windows-125x

ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12. [1] The ISO working group maintaining this series of standards has been disbanded.

ISO/IEC 8859 parts 1, 2, 3, and 4 were originally Ecma International standard ECMA-94.

Introduction

While the bit patterns of the 95 printable ASCII characters are sufficient to exchange information in modern English, most other languages that use Latin alphabets need additional symbols not covered by ASCII. ISO/IEC 8859 sought to remedy this problem by utilizing the eighth bit in an 8-bit byte to allow positions for another 96 printable characters. Early encodings were limited to 7 bits because of restrictions of some data transmission protocols, and partially for historical reasons. However, more characters were needed than could fit in a single 8-bit character encoding, so several mappings were developed, including at least ten suitable for various Latin alphabets.

The ISO/IEC 8859 standard parts only define printable characters, although they explicitly set apart the byte ranges 0x00–1F and 0x7F–9F as "combinations that do not represent graphic characters" (i.e. which are reserved for use as control characters) in accordance with ISO/IEC 4873; they were designed to be used in conjunction with a separate standard defining the control functions associated with these bytes, such as ISO 6429 or ISO 6630. [2] To this end a series of encodings registered with the IANA add the C0 control set (control characters mapped to bytes 0 to 31) from ISO 646 and the C1 control set (control characters mapped to bytes 128 to 159) from ISO 6429, resulting in full 8-bit character maps with most, if not all, bytes assigned. These sets have ISO-8859-n as their preferred MIME name or, in cases where a preferred MIME name is not specified, their canonical name. Many people use the terms ISO/IEC 8859-n and ISO-8859-n interchangeably. ISO/IEC 8859-11 did not get such a charset assigned, presumably because it was almost identical to TIS 620.
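The byte ranges described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the category labels are hypothetical, and the 0xA0–0xFF range is merely available for graphic characters, since individual parts leave some of those positions unassigned.

```python
def classify(byte: int) -> str:
    """Classify one 8-bit value by the ranges an ISO/IEC 8859 part assumes."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit value")
    if byte <= 0x1F:
        return "C0 control"      # 0x00-0x1F, defined by a separate standard
    if byte <= 0x7E:
        return "ASCII graphic"   # 0x20-0x7E, includes SPACE at 0x20
    if byte == 0x7F:
        return "DEL"             # also set apart from the graphic characters
    if byte <= 0x9F:
        return "C1 control"      # 0x80-0x9F, defined by a separate standard
    return "8859 graphic"        # 0xA0-0xFF, assigned differently by each part

# Count how many byte values fall into each category.
counts = {}
for b in range(256):
    kind = classify(b)
    counts[kind] = counts.get(kind, 0) + 1

print(counts)
```

The counts reproduce the figures quoted in the introduction: 95 printable ASCII positions and another 96 positions opened up by the eighth bit.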

Characters

The ISO/IEC 8859 standard is designed for reliable information exchange, not typography; the standard omits symbols needed for high-quality typography, such as optional ligatures, curly quotation marks, dashes, etc. As a result, high-quality typesetting systems often use proprietary or idiosyncratic extensions on top of the ASCII and ISO/IEC 8859 standards, or use Unicode instead.

An inexact rule based on practical experience states that if a character or symbol was not already part of a widely used data-processing character set and was also not usually provided on typewriter keyboards for a national language, it was not included. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages.

French did not get its œ and Œ ligatures because they could be typed as 'oe'. Likewise, Ÿ, needed for all-caps text, was dropped as well. [3] [4] [5] These three characters were later reintroduced, albeit at different code points, in ISO/IEC 8859-15 in 1999, which also introduced the new euro sign character €. Similarly, Dutch did not get the ĳ and Ĳ letters, because Dutch speakers had become used to typing these as two letters instead.
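The gap can be observed with any codec library that implements both parts. The following is a minimal sketch using Python's built-in `iso8859_1` and `iso8859_15` codecs; the helper name `try_encode` is hypothetical.

```python
# Characters missing from ISO/IEC 8859-1 but present in ISO/IEC 8859-15.
SAMPLES = {
    "oe ligature (small)": "\u0153",    # œ
    "OE ligature (capital)": "\u0152",  # Œ
    "Y with diaeresis (capital)": "\u0178",  # Ÿ
    "euro sign": "\u20ac",              # €
}

def try_encode(ch: str, codec: str):
    """Return the encoded byte string, or None if the codec lacks the character."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

for name, ch in SAMPLES.items():
    latin1 = try_encode(ch, "iso8859_1")
    latin9 = try_encode(ch, "iso8859_15")
    print(f"{name}: Latin-1 -> {latin1}, Latin-9 -> {latin9}")
```

Each character fails to encode in Latin-1 and maps to a single byte in Latin-9.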

Romanian did not initially get its Ș/ș and Ț/ț (with comma) letters, because these letters were initially unified with Ş/ş and Ţ/ţ (with cedilla) by the Unicode Consortium, considering the shapes with comma beneath to be glyph variants of the shapes with cedilla. However, the letters with explicit comma below were later added to the Unicode standard and are also in ISO/IEC 8859-16.
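The distinction between the cedilla and comma-below forms matters when converting text. As a rough illustration using Python's built-in codecs (the variable names are hypothetical), the same byte value decodes to the cedilla letter under ISO/IEC 8859-2 and to the comma-below letter under ISO/IEC 8859-16:

```python
import unicodedata

S_CEDILLA = "\u015f"  # ş, the form available in ISO/IEC 8859-2
S_COMMA = "\u0219"    # ș, the form available in ISO/IEC 8859-16

# Both letters occupy the same byte position in their respective parts.
byte_in_part2 = S_CEDILLA.encode("iso8859_2")
byte_in_part16 = S_COMMA.encode("iso8859_16")
print(byte_in_part2, byte_in_part16)

# Decoding that byte gives a different Unicode character depending on the part.
for codec in ("iso8859_2", "iso8859_16"):
    ch = byte_in_part2.decode(codec)
    print(codec, f"U+{ord(ch):04X}", unicodedata.name(ch))

# The comma-below letter cannot be written in ISO/IEC 8859-2 at all.
try:
    S_COMMA.encode("iso8859_2")
    comma_fits_in_part2 = True
except UnicodeEncodeError:
    comma_fits_in_part2 = False
print(comma_fits_in_part2)
```

Text stored as ISO/IEC 8859-2 therefore yields the cedilla forms when converted to Unicode, even if the author intended the comma-below letters.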

Most of the ISO/IEC 8859 encodings provide diacritic marks required for various European languages using the Latin script. Others provide non-Latin alphabets: Greek, Cyrillic, Hebrew, Arabic and Thai. Most of the encodings contain only spacing characters, although the Thai, Hebrew, and Arabic ones do also contain combining characters.

The standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points. Although it uses Latin-based characters, Vietnamese does not fit into 96 positions (without using combining diacritics such as in Windows-1258) either. Each Japanese syllabic alphabet (hiragana or katakana, see Kana) would fit, as in JIS X 0201, but like several other alphabets of the world they are not encoded in the ISO/IEC 8859 system.

The parts of ISO/IEC 8859

ISO/IEC 8859 is divided into the following parts:

Each entry gives the part, its name, its revisions, other standards (where listed), and a description.

Part 1: Latin-1, Western European
Revisions: 1987, 1998. Other standards: ECMA-94 (1985, 1986).
Perhaps the most widely used part of ISO/IEC 8859, covering most Western European languages: Danish (partial), [nb 1] Dutch (partial), [nb 2] English, Faeroese, Finnish (partial), [nb 3] French (partial), [nb 3] German, Icelandic, Irish, Italian, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Catalan, and Swedish. Languages from other parts of the world are also covered, including: Eastern European Albanian, Southeast Asian Indonesian, as well as the African languages Afrikaans and Swahili.
A modification of DEC MCS; the first (1985) standard version at the ECMA level lacked the times sign and division obelus, which were added the next year. The missing euro sign and capital Ÿ are in the revised version ISO/IEC 8859-15 (see below). The corresponding IANA character set is ISO-8859-1.

Part 2: Latin-2, Central European
Revisions: 1987, 1999. Other standards: ECMA-94 (1986). [nb 4]
Supports those Central and Eastern European languages that use the Latin alphabet, including Bosnian, Polish, Croatian, Czech, Slovak, Slovene, Serbian, and Hungarian. The missing euro sign can be found in version ISO/IEC 8859-16.

Part 3: Latin-3, South European
Revisions: 1988, 1999.
Turkish, Maltese, and Esperanto. Largely superseded by ISO/IEC 8859-9 for Turkish.

Part 4: Latin-4, North European
Revisions: 1988, 1998.
Estonian, Latvian, Lithuanian, Greenlandic, and Sami.

Part 5: Latin/Cyrillic
Revisions: 1988, 1999. Other standards: ECMA-113 (1988, 1999). [nb 5]
Covers mostly Slavic languages that use a Cyrillic alphabet, including Belarusian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian (partial). [nb 6]

Part 6: Latin/Arabic
Revisions: 1987, 1999.
Covers the most common Arabic language characters. Does not support other languages using the Arabic script. Needs to be BiDi and cursive joining processed for display.

Part 7: Latin/Greek
Revisions: 1987, 2003.
Covers the modern Greek language (monotonic orthography). Can also be used for Ancient Greek written without accents or in monotonic orthography, but lacks the diacritics for polytonic orthography. These were introduced with Unicode. Updated 2003 to add the euro sign, drachma sign and spacing ypogegrammeni.

Part 8: Latin/Hebrew
Revisions: 1988, 1999.
Covers the modern Hebrew alphabet as used in Israel. In practice two different encodings exist, logical order (needs to be BiDi processed for display) and visual (left-to-right) order (in effect, after bidi processing and line breaking). Updated 1999 to add LRM and RLM. Updated at national standard level in 2002 to add euro and shekel signs and more bidirectional format effectors; the 2002 additions were never incorporated back into the ISO standard version.

Part 9: Latin-5, Turkish
Revisions: 1989, 1999.
Largely the same as ISO/IEC 8859-1, replacing the rarely used Icelandic letters with Turkish ones.

Part 10: Latin-6, Nordic
Revisions: 1992, 1998. Other standards: ECMA-144 (1990, 1992, 2000).
A rearrangement of Latin-4. Considered more useful for Nordic languages. Baltic languages use Latin-4 more.

Part 11: Latin/Thai
Revisions: 2001. Other standards: TIS-620 (1986, 1990).
Contains characters needed for the Thai language. First revision established in 1986 at national standard level as TIS 620. Elevated to ISO standard status as a part of ISO 8859 in 2001, with the addition of a non-breaking space.

Part 12: Latin/Devanagari
Revisions: N/A. Other standards: none.
The work in making a part of 8859 for Devanagari was officially abandoned in 1997. ISCII and Unicode/ISO/IEC 10646 cover Devanagari.

Part 13: Latin-7, Baltic Rim
Revisions: 1998. Other standards: none.
Added some characters for Baltic languages which were missing from Latin-4 and Latin-6. Related to the earlier-published [nb 7] Windows-1257.

Part 14: Latin-8, Celtic
Revisions: 1998. Other standards: none.
Covers Celtic languages such as Gaelic and the Breton language. Welsh letters correspond to the earlier (1994) ISO-IR-182.

Part 15: Latin-9
Revisions: 1999. Other standards: none.
A revision of 8859-1 that removes some little-used symbols, replacing them with the euro sign and the letters Š, š, Ž, ž, Œ, œ, and Ÿ, which completes the coverage of French, Finnish and Estonian.

Part 16: Latin-10, South-Eastern European
Revisions: 2001. Other standards: SR 14111 (1998).
Intended for Albanian, Croatian, Hungarian, Italian, Polish, Romanian and Slovene, but also Finnish, French, German and Irish Gaelic (new orthography). The focus lies more on letters than symbols. The generic currency sign is replaced with the euro sign.
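Several parts are described above as revisions of Part 1 (Part 9 swaps in Turkish letters, Part 15 swaps in the euro sign and a handful of letters). One way to see exactly which positions change is to compare the upper halves of two code tables. The sketch below relies on Python's built-in codecs; the helper names `high_half` and `differences` are hypothetical.

```python
def high_half(codec: str) -> dict:
    """Map each byte 0xA0-0xFF to its character, or None if unassigned."""
    table = {}
    for b in range(0xA0, 0x100):
        try:
            table[b] = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            table[b] = None  # position left unassigned by this part
    return table

def differences(codec_a: str, codec_b: str) -> dict:
    """Return {position: (char_in_a, char_in_b)} where the two parts differ."""
    a, b = high_half(codec_a), high_half(codec_b)
    return {pos: (a[pos], b[pos]) for pos in a if a[pos] != b[pos]}

diff_turkish = differences("iso8859_1", "iso8859_9")
diff_latin9 = differences("iso8859_1", "iso8859_15")

for pos, (old, new) in sorted(diff_turkish.items()):
    print(f"0x{pos:02X}: {old} -> {new}")
print(len(diff_latin9), "positions differ between Part 1 and Part 15")
```

The same helper can be pointed at any other pair of parts, for example Parts 4 and 10, to see how much of a rearrangement one is of the other.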

Each part of ISO/IEC 8859 is designed to support languages that often borrow from each other, so the characters needed by each language are usually accommodated by a single part. However, there are some characters and language combinations that are not accommodated without transcriptions. Efforts were made to make conversions as smooth as possible. For example, German has all of its seven special characters at the same positions in all Latin variants (1–4, 9, 10, 13–16), and in many positions the characters only differ in the diacritics between the sets. In particular, variants 1–4 were designed jointly, and have the property that every encoded character appears either at a given position or not at all.
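The claim about German can be checked mechanically. The following sketch, which assumes Python's built-in `iso8859_N` codecs faithfully implement each part, encodes the seven German special characters in every Latin variant and confirms that the resulting bytes are identical:

```python
# The Latin variants listed above (parts 1-4, 9, 10, 13-16).
LATIN_PARTS = [1, 2, 3, 4, 9, 10, 13, 14, 15, 16]

# The seven German special characters: Ä Ö Ü ä ö ü ß.
GERMAN = "\u00c4\u00d6\u00dc\u00e4\u00f6\u00fc\u00df"

# Encode the same string in each part and collect the byte strings.
encodings = {n: GERMAN.encode(f"iso8859_{n}") for n in LATIN_PARTS}

distinct = set(encodings.values())
print(distinct)
print("identical in all Latin parts:", len(distinct) == 1)
```

Because the bytes agree, German text can be relabelled from one Latin part to another without transcoding, which is the kind of smooth conversion the design aimed for.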

Table

Comparison of the various parts (1–16) of ISO/IEC 8859
Binary Oct Dec Hex 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16
1010 0000 240 160 A0 Non-breaking space (NBSP)
1010 0001 241 161 A1 ¡ Ą Ħ Ą Ё     ¡ Ą ¡ Ą
1010 0010 242 162 A2 ¢ ˘ ĸ Ђ   ¢ Ē ¢ ¢ ą
1010 0011 243 163 A3 £ Ł £ Ŗ Ѓ   £ Ģ £ Ł
1010 0100 244 164 A4 ¤ Є ¤ ¤ Ī ¤ Ċ
1010 0101 245 165 A5 ¥ Ľ   Ĩ Ѕ   ¥ Ĩ ċ ¥
1010 0110 246 166 A6 ¦ Ś Ĥ Ļ І   ¦ Ķ ¦ Š
1010 0111 247 167 A7 § Ї   § §
1010 1000 250 168 A8 ¨ Ј   ¨ Ļ Ø š
1010 1001 251 169 A9 © Š İ Š Љ   © Đ ©
1010 1010 252 170 AA ª Ş Ē Њ   ͺ × ª Š Ŗ ª Ș
1010 1011 253 171 AB « Ť Ğ Ģ Ћ   « Ŧ « «
1010 1100 254 172 AC ¬ Ź Ĵ Ŧ Ќ ، ¬ Ž ¬ ¬ Ź
1010 1101 255 173 AD Soft hyphen (SHY) SHY
1010 1110 256 174 AE ® Ž   Ž Ў    ® Ū ® ź
1010 1111 257 175 AF ¯ Ż ¯ Џ   ¯ Ŋ Æ Ÿ ¯ Ż
1011 0000 260 176 B0 ° А   ° ° °
1011 0001 261 177 B1 ± ą ħ ą Б   ± ą ± ±
1011 0010 262 178 B2 ² ˛ ² ˛ В   ² ē ² Ġ ² Č
1011 0011 263 179 B3 ³ ł ³ ŗ Г   ³ ģ ³ ġ ³ ł
1011 0100 264 180 B4 ´ Д   ΄ ´ ī Ž
1011 0101 265 181 B5 µ ľ µ ĩ Е   ΅ µ ĩ µ µ
1011 0110 266 182 B6 ś ĥ ļ Ж   Ά ķ
1011 0111 267 183 B7 · ˇ · ˇ З   · · ·
1011 1000 270 184 B8 ¸ И   Έ ¸ ļ ø ž
1011 1001 271 185 B9 ¹ š ı š Й   Ή ¹ đ ¹ ¹ č
1011 1010 272 186 BA º ş ē К   Ί ÷ º š ŗ º ș
1011 1011 273 187 BB » ť ğ ģ Л ؛ » ŧ » »
1011 1100 274 188 BC ¼ ź ĵ ŧ М   Ό ¼ ž ¼ Œ
1011 1101 275 189 BD ½ ˝ ½ Ŋ Н  ½ ½ œ
1011 1110 276 190 BE ¾ ž   ž О   Ύ ¾ ū ¾ Ÿ
1011 1111 277 191 BF ¿ ż ŋ П ؟ Ώ   ¿ ŋ æ ¿ ż
1100 0000 300 192 C0 À Ŕ À Ā Р   ΐ   À Ā Ą À
1100 0001 301 193 C1 Á С ء Α   Á Į Á
1100 0010 302 194 C2 Â Т آ Β   Â Ā Â
1100 0011 303 195 C3 Ã Ă   Ã У أ Γ   Ã Ć Ã Ă
1100 0100 304 196 C4 Ä Ф ؤ Δ   Ä Ä
1100 0101 305 197 C5 Å Ĺ Ċ Å Х إ Ε   Å Å Ć
1100 0110 306 198 C6 Æ Ć Ĉ Æ Ц ئ Ζ   Æ Ę Æ
1100 0111 307 199 C7 Ç Į Ч ا Η   Ç Į Ē Ç
1100 1000 310 200 C8 È Č È Č Ш ب Θ   È Č Č È
1100 1001 311 201 C9 É Щ ة Ι   É É
1100 1010 312 202 CA Ê Ę Ê Ę Ъ ت Κ   Ê Ę Ź Ê
1100 1011 313 203 CB Ë Ы ث Λ   Ë Ė Ë
1100 1100 314 204 CC Ì Ě Ì Ė Ь ج Μ   Ì Ė Ģ Ì
1100 1101 315 205 CD Í Э ح Ν   Í Ķ Í
1100 1110 316 206 CE Î Ю خ Ξ  Î Ī Î
1100 1111 317 207 CF Ï Ď Ï Ī Я د Ο  Ï Ļ Ï
Binary Oct Dec Hex 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16
1101 0000 320 208 D0 Ð Đ   Đ а ذ Π   Ğ Ð Š Ŵ Ð
1101 0001 321 209 D1 Ñ Ń Ñ Ņ б ر Ρ  Ñ Ņ Ń Ñ Ń
1101 0010 322 210 D2 Ò Ň Ò Ō в ز   Ò Ō Ņ Ò
1101 0011 323 211 D3 Ó Ķ г س Σ  Ó Ó
1101 0100 324 212 D4 Ô д ش Τ  Ô Ō Ô
1101 0101 325 213 D5 Õ Ő Ġ Õ е ص Υ  Õ Õ Ő
1101 0110 326 214 D6 Ö ж ض Φ  Ö Ö
1101 0111 327 215 D7 × з ط Χ  × Ũ × × Ś
1101 1000 330 216 D8 Ø Ř Ĝ Ø и ظ Ψ  Ø Ų Ø Ű
1101 1001 331 217 D9 Ù Ů Ù Ų й ع Ω  Ù Ų Ł Ù
1101 1010 332 218 DA Ú к غ Ϊ  Ú Ś Ú
1101 1011 333 219 DB Û Ű Û л   Ϋ  Û  Ū Û
1101 1100 334 220 DC Ü м   ά  Ü Ü
1101 1101 335 221 DD Ý Ŭ Ũ н   έ   İ Ý  Ż Ý Ę
1101 1110 336 222 DE Þ Ţ Ŝ Ū о   ή   Ş Þ   Ž Ŷ Þ Ț
1101 1111 337 223 DF ß п   ί ß ฿ ß
1110 0000 340 224 E0 à ŕ à ā р ـ ΰ א à ā ą à
1110 0001 341 225 E1 á с ف α ב á į á
1110 0010 342 226 E2 â т ق β ג â ā â
1110 0011 343 227 E3 ã ă  ã у ك γ ד ã ć ã ă
1110 0100 344 228 E4 ä ф ل δ ה ä ä
1110 0101 345 229 E5 å ĺ ċ å х م ε ו å å ć
1110 0110 346 230 E6 æ ć ĉ æ ц ن ζ ז æ ę æ
1110 0111 347 231 E7 ç į ч ه η ח ç į ē ç
1110 1000 350 232 E8 è č è č ш و θ ט è č č è
1110 1001 351 233 E9 é щ ى ι י é é
1110 1010 352 234 EA ê ę ê ę ъ ي κ ך ê ę ź ê
1110 1011 353 235 EB ë ы ً λ כ ë ė ë
1110 1100 354 236 EC ì ě ì ė ь ٌ μ ל ì ė ģ ì
1110 1101 355 237 ED í э ٍ ν ם í ķ í
1110 1110 356 238 EE î ю َ ξ מ î ī î
1110 1111 357 239 EF ï ď ï ī я ُ ο ן ï ļ ï
1111 0000 360 240 F0 ð đ   đ ِ π נ ğ ð š ŵ ð đ
1111 0001 361 241 F1 ñ ń ñ ņ ё ّ ρ ס ñ ņ ń ñ ń
1111 0010 362 242 F2 ò ň ò ō ђ ْ ς ע ò ō ņ ò
1111 0011 363 243 F3 ó ķ ѓ   σ ף ó ó
1111 0100 364 244 F4 ô є   τ פ ô ō ô
1111 0101 365 245 F5 õ ő ġ õ ѕ   υ ץ õ õ ő
1111 0110 366 246 F6 ö і   φ צ ö ö
1111 0111 367 247 F7 ÷ ї   χ ק ÷ ũ ÷ ÷ ś
1111 1000 370 248 F8 ø ř ĝ ø ј   ψ ר ø ų ø ű
1111 1001 371 249 F9 ù ů ù ų љ   ω ש ù ų ł ù
1111 1010 372 250 FA ú њ   ϊ ת ú ś ú
1111 1011 373 251 FB û ű û ћ   ϋ  û ū û
1111 1100 374 252 FC ü ќ   ό  ü ü
1111 1101 375 253 FD ý ŭ ũ §  ύ LRM ı ý  ż ý ę
1111 1110 376 254 FE þ ţ ŝ ū ў   ώ RLM ş þ   ž ŷ þ ț
1111 1111 377 255 FF ÿ ˙ џ    ÿ ĸ   ÿ
Binary Oct Dec Hex 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16

  unassigned code points.
  new additions in ISO/IEC 8859-7:2003 and ISO/IEC 8859-8:1999 versions, previously unassigned.

Relationship to Unicode and the UCS

Since 1991, the Unicode Consortium has been working with ISO and IEC to develop the Unicode Standard and ISO/IEC 10646: the Universal Character Set (UCS) in tandem. Newer editions of ISO/IEC 8859 express characters in terms of their Unicode/UCS names and the U+nnnn notation, effectively causing each part of ISO/IEC 8859 to be a Unicode/UCS character encoding scheme that maps a very small subset of the UCS to single 8-bit bytes. The first 256 characters in Unicode and the UCS are identical to those in ISO/IEC 8859-1 (Latin-1).
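The identity between Latin-1 and the first 256 Unicode code points makes decoding trivial: each byte value is the code point. A minimal Python check follows. It assumes the built-in `iso8859_1` codec, which, like the IANA charset ISO-8859-1, also maps the C0 and C1 control positions.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decode them as Latin-1 and collect the resulting Unicode code points.
text = all_bytes.decode("iso8859_1")
code_points = [ord(ch) for ch in text]

# Each byte value b decodes to the code point U+00bb with the same number.
is_identity = code_points == list(range(256))
print(is_identity)
print(f"byte 0xE9 -> U+{ord(bytes([0xE9]).decode('iso8859_1')):04X}")
```

No other part of ISO/IEC 8859 has this property, since their upper halves map to code points outside U+00A0–U+00FF.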

Single-byte character sets including the parts of ISO/IEC 8859 and derivatives of them were favoured throughout the 1990s, having the advantages of being well-established and more easily implemented in software: the equation of one byte to one character is simple and adequate for most single-language applications, and there are no combining characters or variant forms. As Unicode-enabled operating systems became more widespread, ISO/IEC 8859 and other legacy encodings became less popular. While remnants of ISO 8859 and single-byte character models remain entrenched in many operating systems, programming languages, data storage systems, networking applications, display hardware, and end-user application software, most modern computing applications use Unicode internally, and rely on conversion tables to map to and from other encodings, when necessary.
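The conversion step mentioned above usually amounts to decoding legacy bytes into Unicode text and re-encoding them. The sketch below illustrates this with Python's built-in codecs; the function name `transcode` and the sample data are hypothetical.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert bytes from a legacy encoding to another via Unicode text."""
    return data.decode(source).encode(target)

# Four bytes of ISO/IEC 8859-2 text (the Polish city name "Łódź" in lowercase).
legacy = b"\xb3\xf3d\xbc"

as_text = legacy.decode("iso8859_2")
as_utf8 = transcode(legacy, "iso8859_2")
print(as_text, as_utf8)

# Decoding with the wrong part silently produces different characters,
# because single-byte encodings carry no indication of which part was used.
wrong = legacy.decode("iso8859_1")
print(wrong)
```

The one-byte-per-character property holds only on the legacy side: after conversion to UTF-8 the three non-ASCII letters each occupy two bytes.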

Current status

The ISO/IEC 8859 standard was maintained by ISO/IEC Joint Technical Committee 1, Subcommittee 2, Working Group 3 (ISO/IEC JTC 1/SC 2/WG 3). In June 2004, WG 3 disbanded, and maintenance duties were transferred to SC 2. The standard is not currently being updated, as the Subcommittee's only remaining working group, WG 2, is concentrating on development of Unicode's Universal Coded Character Set.

The WHATWG Encoding Standard, which specifies the character encodings that are permitted in HTML5 and that compliant browsers must support, [7] includes most parts of ISO/IEC 8859, [8] except for parts 1, 9 and 11, which are instead interpreted as Windows-1252, Windows-1254 and Windows-874 respectively. [9] Authors of new pages and the designers of new protocols are instructed to use UTF-8 instead. [9]
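The practical effect of treating Part 1 as Windows-1252 shows up in the 0x80–0x9F range, which ISO/IEC 8859 reserves for control characters. The following sketch uses Python's `iso8859_1` and `cp1252` codecs as stand-ins for the two interpretations; it does not reproduce the WHATWG decoder itself.

```python
# Bytes in the 0x80-0x9F range, as often found in pages labelled ISO-8859-1.
data = b"\x80 \x93quoted\x94"

# Strict Latin-1 reading: the high bytes become invisible C1 control characters.
as_latin1 = data.decode("iso8859_1")

# Windows-1252 reading: the same bytes become a euro sign and curly quotes.
as_cp1252 = data.decode("cp1252")

print(repr(as_latin1))
print(as_cp1252)

# From 0xA0 upward the two encodings agree.
upper = bytes(range(0xA0, 0x100))
same_upper_range = upper.decode("iso8859_1") == upper.decode("cp1252")
print(same_upper_range)
```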

Notes

  1. Missing several accented vowels including Ǿ and ǿ. These can be replaced with non-accented vowels at the cost of increased ambiguity.
  2. Only the Ĳ/ĳ (letter IJ) is missing, which is usually represented as the two letters IJ.
  3. Missing characters are in ISO/IEC 8859-15.
  4. The 1985 edition includes only a version of ISO-8859-1.
  5. The 1986 edition defines KOI8-E, which is an entirely different encoding.
  6. 8859-5 misses the Ґ/ґ letter, which was reintroduced into the Ukrainian alphabet in 1990.
  7. Published 1995, registered 1996. [6]

References

  1. Chaudhuri, Arindam; Mandaviya, Krupa; Badelia, Pratixa; Ghosh, Soumya K. (2016-12-24), "Optical Character Recognition Systems for French Language", Optical Character Recognition Systems for Different Languages with Soft Computing, Cham: Springer International Publishing, pp. 109–136, doi:10.1007/978-3-319-50252-6_5, ISBN 978-3-319-50251-9, retrieved 2023-12-04
  2. ISO/IEC JTC 1/SC 2/WG 3 (1998-02-12). Final Text of DIS 8859-1, 8-bit single-byte coded graphic character sets—Part 1: Latin alphabet No.1 (PDF). ISO/IEC FDIS 8859-1:1998; JTC1/SC2/N2988; WG3/N411. This set of coded graphic characters may be regarded as a version of an 8-bit code according to ISO/IEC 2022 or ISO/IEC 4873 at level 1. […] The shaded positions in the code table correspond to bit combinations that do not represent graphic characters. Their use is outside the scope of ISO/IEC 8859; it is specified in other International Standards, for example ISO/IEC 6429.
  3. Haralambous, Yannis (September 2007). Fonts & Encodings. Translated by Horne, P. Scott (1st ed.). Sebastopol, California, USA: O'Reilly Media, Inc. pp. 37–38. ISBN 978-0-596-10242-5. […] According to an urban legend, the French delegate was out sick the day when the standard came up for a vote and had to have his Belgian counterpart act as his proxy. In fact, the French delegate was an engineer, who was convinced that this ligature was useless, and the Swiss and German representatives pressed hard to have the mathematical symbols × and ÷ included at the positions where Œ and œ would logically appear. […]
  4. André, Jacques (2003-10-15) [2003-10-02]. André, Bernard; Baron, Georges-Louis; Bruillard, Éric (eds.). "Histoire d'Œ, histoire d'@ des rumeurs typographiques et de leurs enseignements". Traitement de Texte et Production de Documents INRP/GEDIAPS (in French): 19–34. Archived from the original on 2016-12-08. Retrieved 2016-12-09.
  5. André, Jacques (November 1996). "ISO Latin-1, norme de codage des caractères européens? trois caractères français en sont absents!" (PDF). Cahiers GUTenberg (in French) (25): 65–77. Archived from the original (PDF) on 2008-11-30.
  6. Lazhintseva, Katya (1996-05-03). "Registration of new MIME charset: Windows-1257". IANA.
  7. "8.2.2.3. Character encodings". HTML 5.1 2nd Edition. W3C. User agents must support the encodings defined in the WHATWG Encoding standard, including, but not limited to […]
  8. van Kesteren, Anne. "Legacy single-byte encodings". Encoding Standard. WHATWG.
  9. van Kesteren, Anne. "Names and labels". Encoding Standard. WHATWG.