KOI8-R

Last updated
KOI8-R
Alias(es)cp878 (code page 878)
Language(s) Russian, Bulgarian
Classification8-bit KOI, extended ASCII
Extends KOI8-B
Based on KOI-8
Other related encoding(s) KOI8-U, KOI8-RU

KOI8-R (RFC 1489) is an 8-bit character encoding, derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses a Cyrillic alphabet. KOI8-R was based on Russian Morse code, which was created from a phonetic version of Latin Morse code. As a result, Russian Cyrillic letters are in pseudo-Roman order rather than the normal Cyrillic alphabetical order. Although this may seem unnatural, if the 8th bit is stripped, the text is partially readable in ASCII and may convert to syntactically correct KOI7. For example, "Русский Текст" in KOI8-R becomes rUSSKIJ tEKST ("Russian Text").

Contents

KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian : Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit". In Microsoft Windows, KOI8-R is assigned the code page number 20866. In IBM, KOI8-R is assigned code page 878. [1] [2] KOI8-R also happens to cover Bulgarian, but has not been used for that purpose since CP1251 was accepted. The use of these older code pages is being replaced with Unicode as a more common way to represent Cyrillic together with other languages.

Unicode is preferred to KOI-8 and its variants or other Cyrillic encodings in modern applications, especially on the Internet, making UTF-8 the dominant encoding for web pages. (For further discussion of Unicode's complete coverage, of 436 Cyrillic letters/code points, including for Old Cyrillic, and how single-byte character encodings, such as Windows-1251 and KOI8 variants, cannot provide this, see Cyrillic script in Unicode.)

Character set

The following table shows the KOI8-R encoding. Each character is shown with its equivalent Unicode code point.

KOI8-R [3] [4] [5] [6]
0123456789ABCDEF
0x
1x
2x  SP   ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~
8x
2500

2502

250C

2510

2514

2518

251C

2524

252C

2534

253C

2580

2584

2588

258C

2590
9x
2591

2592

2593

2320

25A0

2219

221A

2248

2264

2265
NBSP
2321
°
00B0
²
00B2
·
00B7
÷
00F7
Ax
2550

2551

2552
ё
0451

2553

2554

2555

2556

2557

2558

2559

255A

255B

255C

255D

255E
Bx
255F

2560

2561
Ё
0401

2562

2563

2564

2565

2566

2567

2568

2569

256A

256B

256C
©
00A9
Cx ю
044E
а
0430
б
0431
ц
0446
д
0434
е
0435
ф
0444
г
0433
х
0445
и
0438
й
0439
к
043A
л
043B
м
043C
н
043D
о
043E
Dx п
043F
я
044F
р
0440
с
0441
т
0442
у
0443
ж
0436
в
0432
ь
044C
ы
044B
з
0437
ш
0448
э
044D
щ
0449
ч
0447
ъ
044A
Ex Ю
042E
А
0410
Б
0411
Ц
0426
Д
0414
Е
0415
Ф
0424
Г
0413
Х
0425
И
0418
Й
0419
К
041A
Л
041B
М
041C
Н
041D
О
041E
Fx П
041F
Я
042F
Р
0420
С
0421
Т
0422
У
0423
Ж
0416
В
0412
Ь
042C
Ы
042B
З
0417
Ш
0428
Э
042D
Щ
0429
Ч
0427
Ъ
042A

See also

Related Research Articles

ISO/IEC 8859-3:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 3: Latin alphabet No. 3, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin-3 or South European. It was designed to cover Turkish, Maltese and Esperanto, though the introduction of ISO/IEC 8859-9 superseded it for Turkish. The encoding was popular for users of Esperanto, but fell out of use as application support for Unicode became more common.

ISO/IEC 8859-11:2001, Information technology — 8-bit single-byte coded graphic character sets — Part 11: Latin/Thai alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 2001. It is informally referred to as Latin/Thai. It is nearly identical to the national Thai standard TIS-620 (1990). The sole difference is that ISO/IEC 8859-11 allocates non-breaking space to code 0xA0, while TIS-620 leaves it undefined.

The Multinational Character Set is a character encoding created in 1983 by Digital Equipment Corporation (DEC) for use in the popular VT220 terminal. It was an 8-bit extension of ASCII that added accented characters, currency symbols, and other character glyphs missing from 7-bit ASCII. It is only one of the code pages implemented for the VT220 National Replacement Character Set (NRCS). MCS is registered as IBM code page/CCSID 1100 since 1992. Depending on associated sorting Oracle calls it WE8DEC, N8DEC, DK8DEC, S8DEC, or SF8DEC.

ISO/IEC 8859-5:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 5: Latin/Cyrillic alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin/Cyrillic. It was designed to cover languages using a Cyrillic alphabet such as Bulgarian, Belarusian, Russian, Serbian and Macedonian but was never widely used. It would also have been usable for Ukrainian in the Soviet Union from 1933 to 1990, but it is missing the Ukrainian letter ge, ґ, which is required in Ukrainian orthography before and since, and during that period outside Soviet Ukraine. As a result, IBM created Code page 1124.

ISO/IEC 8859-7:2003, Information technology — 8-bit single-byte coded graphic character sets — Part 7: Latin/Greek alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin/Greek. It was designed to cover the modern Greek language. The original 1987 version of the standard had the same character assignments as the Greek national standard ELOT 928, published in 1986. The table in this article shows the updated 2003 version which adds three characters. Microsoft has assigned code page 28597 a.k.a. Windows-28597 to ISO-8859-7 in Windows. IBM has assigned code page 813 to ISO 8859-7. (IBM CCSID 813 is the original encoding. CCSID 4909 adds the euro sign. CCSID 9005 further adds the drachma sign and ypogegrammeni.)

KOI8-U is an 8-bit character encoding, designed to cover Ukrainian, which uses a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight box drawing characters with four Ukrainian letters Ґ, Є, І, and Ї in both upper case and lower case.

Windows-1251 is an 8-bit character encoding, designed to cover languages that use the Cyrillic script such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Macedonian and other languages.

KOI (КОИ) is a family of several code pages for the Cyrillic script. The name stands for Kod obmena informatsiey which means "Code for Information Interchange".

Code page 855

Code page 855 is a code page used under DOS to write Cyrillic script.

Code page 866

Code page 866 is a code page used under DOS and OS/2 in Russia to write Cyrillic script. It is based on the "alternative code page" developed in 1984 in IHNA AS USSR and published in 1986 by a research group at the Academy of Science of the USSR. The code page was widely used during the DOS era because it preserves all of the pseudographic symbols of code page 437 and maintains alphabetic order of Cyrillic letters. Initially, this encoding was only available in the Russian version of MS-DOS 4.01 (1990) and since MS-DOS 6.22 in any language version.

Windows-1255 is a code page used under Microsoft Windows to write Hebrew. It is an almost compatible superset of ISO-8859-8 – most of the symbols are in the same positions, but Windows-1255 adds vowel-points and other signs in lower positions.

Windows-1257 is an 8-bit, single-byte extended ASCII code page used to support the Estonian, Latvian and Lithuanian languages under Microsoft Windows. In Lithuania, it is standardised as LST 1590-3, alongside a modified variant named LST 1590-4.

Mac OS Cyrillic is a character encoding used on Apple Macintosh computers to represent texts in the Cyrillic script.

Windows code pages are sets of characters or code pages used in Microsoft Windows from the 1980s and 1990s. Windows code pages were gradually superseded when Unicode was implemented in Windows, although they are still supported both within Windows and other platforms, and still apply when Alt code shortcuts are used.

Code page 895 is a 7-bit character set and is Japan's national ISO 646 variant. It is the Roman set of the JIS X 0201 Japanese Standard and is variously called Japan 7-Bit Latin, JISCII, JIS Roman, JIS C6220-1969-ro, ISO646-JP or Japanese-Roman. Its ISO-IR registration number is 14.

Code page 1287, also known as CP1287, DEC Greek (8-bit) and EL8DEC, is one of the code pages implemented for the VT220 terminals. It supports the Greek language.

Code page 1288, also known as CP1288, DEC Turkish (8-bit) and TR8DEC, is one of the code pages implemented for the VT220 terminals. It supports the Turkish language.

KOI8-RU is an 8-bit character encoding, designed to cover Russian, Ukrainian, and Belarusian which use a Cyrillic alphabet. It is closely related to KOI8-R, which covers Russian and Bulgarian, but replaces ten box drawing characters with five Ukrainian and Belarusian letters Ґ, Є, І, Ї, and Ў in both upper case and lower case. It is even more closely related to KOI8-U, which does not include Ў but otherwise makes the same replacements. The additional letter allocations are matched by KOI8-E, except for Ґ which is added to KOI8-F.

KOI8-F or KOI8 Unified is an 8-bit character set. It was designed by Peter Cassetta of Fingertip Software as an attempt to support all the encoded letters from both KOI8-E (ISO-IR-111) and KOI8-RU, along with some of the pseudographics from KOI8-R, with some additional punctuation in the remaining space, sourced partly from Windows-1251. This encoding was only used in the software of that company.

ISO-IR-111 or KOI8-E is an 8-bit character set. It is a multinational extension of KOI-8 for Belarusian, Macedonian, Serbian, and Ukrainian. The name "ISO-IR-111" refers to its registration number in the ISO-IR registry, and denotes it as a set usable with ISO/IEC 2022.

References

  1. "SBCS code page information - CPGID: 00878 / Name: Russian internet koi8-r". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. C-H 3-3220-050. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
  2. "CCSID information document; CCSID 878; KOI8-R CYRILLIC". IBM . Retrieved 2017-02-18.
  3. Richter, Helmut (2016-01-04) [1999-08-18]. "KOI8-R.TXT". 2.0. Retrieved 2016-12-09.
  4. Code Page CPGID 00878 (pdf) (PDF), IBM
  5. Code Page CPGID 00878 (txt), IBM
  6. International Components for Unicode (ICU), ibm-878_P100-1996.ucm, 2002-12-03

Further reading