Code page 852

OEM 852 (DOS-Latin 2)
MIME / IANA: IBM852
Alias(es): cp852, 852, csPCp852 [1]
Language(s): Serbo-Croatian, Slovene, Czech, Slovak, Polish, Romanian, Hungarian
Classification: OEM code page, extended ASCII
Based on: OEM 850 (DOS-Latin 1), OEM 437 (OEM-US)
Transforms / Encodes: ISO/IEC 8859-2 (reordered)

Code page 852 (CCSID 852), also known as CP 852, IBM 00852, OEM 852 (Latin II) [2] [3] and MS-DOS Latin 2, [4] is a code page used under DOS to write Central European languages that use the Latin script (such as Serbo-Croatian, Czech, Hungarian, Polish, Romanian or Slovene). [5]
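As a rough illustration, Python's standard library includes a codec for this code page under the name `cp852`, and its alias table appears to accept the IANA name and the aliases listed above. The sketch below resolves those names and encodes one sample word; the word is chosen only as an example.

```python
import codecs

# Names taken from the infobox; Python normalises case before lookup.
NAMES = ["IBM852", "cp852", "852", "csPCp852"]

# Each name should resolve to the same underlying codec.
resolved = {name: codecs.lookup(name).name for name in NAMES}

# Encode a Polish word; every character becomes exactly one byte.
sample = "Łódź"
encoded = sample.encode("cp852")

print(resolved)
print(encoded.hex(" "))
```

Decoding the resulting bytes with the same codec returns the original string, since the mapping is one byte per character in both directions.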


CCSID 9044 is the euro currency update of code page/CCSID 852. [6] In that update, byte AA encodes € instead of ¬. [7] [8]
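Python does not appear to ship a codec for CCSID 9044, so the following is only a minimal sketch of how the euro update could be decoded. It assumes, as stated above, that byte AA is the only difference from code page 852. The helper name `decode_ccsid_9044` is hypothetical.

```python
EURO_BYTE = 0xAA  # position given in the text above

# Start from the plain code page 852 mapping, one character per byte value.
_TABLE = [bytes([b]).decode("cp852") for b in range(256)]
_TABLE[EURO_BYTE] = "\u20ac"  # euro sign takes the place of the not sign

def decode_ccsid_9044(data: bytes) -> str:
    """Hypothetical helper: decode bytes with the euro-updated mapping."""
    return "".join(_TABLE[b] for b in data)

print(decode_ccsid_9044(b"100 \xaa"))
```

Building a 256-entry lookup list keeps the substitution explicit and leaves every other byte on the standard code page 852 mapping.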

Code page 852 (DOS Latin 2) is laid out very differently from ISO/IEC 8859-2 (ISO Latin-2), although both are informally referred to as "Latin-2" in different language regions. [9] However, all printable characters from ISO 8859-2 are included, in a different arrangement that preserves a subset of the box-drawing characters of the original DOS code page 437 while sacrificing others (those that combine single and double lines) to make room for more letters with diacritics. Code page 850, the equivalent for ISO 8859-1, takes the same approach.
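Both claims in the preceding paragraph can be checked informally with Python's bundled codecs (`iso8859_2`, `cp437`, `cp852`). This is a sketch that relies on those codec tables being accurate. The helper names `upper_half` and `box_drawing` are introduced here for illustration only.

```python
import unicodedata

def upper_half(codec: str) -> str:
    """Decode bytes 0x80-0xFF with the given codec."""
    return bytes(range(0x80, 0x100)).decode(codec)

def box_drawing(chars: str) -> set:
    """Keep only characters from the Unicode Box Drawing block."""
    return {c for c in chars if 0x2500 <= ord(c) <= 0x257F}

# 1. Every ISO 8859-2 character in 0xA0-0xFF should be encodable in cp852.
latin2 = bytes(range(0xA0, 0x100)).decode("iso8859_2")
transcoded = latin2.encode("cp852")  # raises UnicodeEncodeError if one is missing

# 2. Box-drawing characters present in cp437 but absent from cp852.
dropped = box_drawing(upper_half("cp437")) - box_drawing(upper_half("cp852"))
for ch in sorted(dropped):
    print(ch, unicodedata.name(ch))
```

The Unicode names printed by the loop make it easy to see which kind of box-drawing character was given up.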

This reduced box-drawing support caused display glitches in DOS applications that made use of the box-drawing characters to display a GUI-like surface in text mode (e.g. Norton Commander). Several local, more language-specific encodings were invented to avoid the problem, for example the Kamenický encoding for Czech and Slovak [10] or the Mazovia encoding for Polish.
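The kind of glitch described above can be reproduced with a small, invented frame: bytes that a program wrote for code page 437 are reinterpreted under code page 852. The frame is illustrative and not taken from any particular application.

```python
# A small frame as a code page 437 program might emit it:
# single vertical lines joined to a double top edge.
frame_437 = "╒══╕\n│  │"
raw = frame_437.encode("cp437")

# The same bytes interpreted by a display using code page 852.
shown_852 = raw.decode("cp852")
print(shown_852)
```

The plain single and double lines survive, while the mixed corner pieces turn into accented letters, which is why frames looked broken on screen.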

Character set

The following table shows code page 852. [2] [11] Each cell shows the character at that byte value; the row label gives the high hexadecimal digit and the column label the low one. Only the second half of the table (128–255) is shown, the first half (0–127) being the same as code page 437.

Code page 852 [4] [7] [8] [12]
   0 1 2 3 4 5 6 7 8 9 A B C D E F
8x Ç ü é â ä ů ć ç ł ë Ő ő î Ź Ä Ć
9x É Ĺ ĺ ô ö Ľ ľ Ś ś Ö Ü Ť ť Ł × č
Ax á í ó ú Ą ą Ž ž Ę ę ¬ ź Č ş « »
Bx ░ ▒ ▓ │ ┤ Á Â Ě Ş ╣ ║ ╗ ╝ Ż ż ┐
Cx └ ┴ ┬ ├ ─ ┼ Ă ă ╚ ╔ ╩ ╦ ╠ ═ ╬ ¤
Dx đ Đ Ď Ë ď Ň Í Î ě ┘ ┌ █ ▄ Ţ Ů ▀
Ex Ó ß Ô Ń ń ň Š š Ŕ Ú ŕ Ű ý Ý ţ ´
Fx SHY ˝ ˛ ˇ ˘ § ÷ ¸ ° ¨ ˙ ű Ř ř ■ NBSP
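The upper half of the table can also be regenerated from a codec rather than copied by hand. The sketch below uses Python's `cp852` codec, on the assumption that it matches the mappings cited above. The function name `cp852_rows` is hypothetical.

```python
# Printable stand-ins for the two invisible characters.
INVISIBLE = {"\u00ad": "SHY", "\u00a0": "NBSP"}

def cp852_rows():
    """Yield (label, cells) for rows 8x to Fx of code page 852."""
    for high in range(0x8, 0x10):
        row = bytes(range(high * 16, high * 16 + 16)).decode("cp852")
        cells = [INVISIBLE.get(c, c) for c in row]
        yield f"{high:X}x", cells

for label, cells in cp852_rows():
    print(label, " ".join(cells))
```

Each row holds sixteen cells, so a cell's byte value is the row digit followed by its column index in hexadecimal.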

References

  1. Character Sets, Internet Assigned Numbers Authority (IANA), 2018-12-12
  2. "OEM 852". Go Global Developer Center. Microsoft. Retrieved 11 Nov 2011.
  3. "Code Pages Supported by Windows: OEM Code Pages". Go Global Developer Center. Microsoft. Archived from the original on 2 November 2011. Retrieved 11 Oct 2011.
  4. "Code Page 852 DOS Latin 2". Developing International Software. Microsoft. Retrieved 11 Nov 2011.
  5. "CCSID 852 information document". Archived from the original on 2016-03-27.
  6. "CCSID 9044 information document". Archived from the original on 2016-03-27.
  7. Code Page CPGID 00852 (PDF), IBM [permanent dead link]
  8. Code Page CPGID 00852 (TXT), IBM
  9. "The Czech and Slovak Character Encoding Mess Explained". luki.sdf-eu.org. Retrieved 2022-02-27.
  10. The Czech and Slovak Character Encoding Mess Explained / Kamenicky
  11. "cp852_DOSLatin2 to Unicode table" (TXT). The Unicode Consortium. Retrieved 11 Nov 2011.
  12. International Components for Unicode (ICU), ibm-852_P100-1995.ucm, 2002-12-03