YUSCII

YUSCII encoding family
MIME / IANA: Latin: JUS_I.B1.002; Serbian Cyrillic: JUS_I.B1.003-serb; Macedonian: JUS_I.B1.003-mac [1]
Alias(es): Latin: ISO 646-YU, CROSCII, SLOSCII; Serbian: SRPSCII; Macedonian: MAKSCII
Language(s): Serbo-Croatian, Slovenian, Macedonian
Standard: Latin: JUS I.B1.002; Serbian Cyrillic: JUS I.B1.003; Macedonian: JUS I.B1.004
Classification: 7-bit encoding (Latin: ISO 646)
Succeeded by: Latin: ISO 8859-2, Windows-1250; Cyrillic: ISO 8859-5, Windows-1251
Other related encoding(s): KOI-7

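YUSCII is an informal name for several JUS standards for 7-bit character encoding. These include JUS I.B1.002 (ISO-IR-141) for the Serbo-Croatian and Slovenian Latin alphabet,[2] JUS I.B1.003 (ISO-IR-146) for the Serbo-Croatian Cyrillic alphabet[3] and JUS I.B1.004 (ISO-IR-147) for the Macedonian Cyrillic alphabet.[4]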

The encodings are based on ISO 646, the 7-bit Latin-script character encoding standard, and were used in Yugoslavia before the widespread adoption of the later CP 852, ISO 8859-2/8859-5, Windows-1250/1251 and Unicode standards. The name is modelled on ASCII, with the first word "American" replaced by "Yugoslav": "Yugoslav Standard Code for Information Interchange". Individual standards are also sometimes called by local names: SLOSCII, CROSCII or SRPSCII for JUS I.B1.002, SRPSCII for JUS I.B1.003 and MAKSCII for JUS I.B1.004.

JUS I.B1.002 is a national variant of ISO 646: it is identical to basic ASCII except that less frequently used symbols are replaced with the letters of Gaj's Latin alphabet that ASCII lacks (Č, Ć, Đ, Š, Ž and their lowercase forms). The Cyrillic standards additionally replace the Latin letters with the corresponding Cyrillic letters. Љ (lj), Њ (nj), Џ (dž) and Ѕ (dz), which correspond to Latin digraphs, are mapped onto Latin letters that are not used in Serbian or Macedonian (Q, W, X, Y). In the Macedonian standard, Ѓ and Ќ take the positions of the Serbian letters Ђ and Ћ.
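
The mapping scheme described above can be sketched as a small lookup-based decoder. This is an illustrative sketch, not a reference implementation: the names `decode_yuscii`, `LATIN`, `SERBIAN` and `MACEDONIAN` are hypothetical, and the tables are transcribed from the code page table in this article (which shows Ѕ at 0x59 for both Cyrillic variants). The registered ISO-IR-146 and ISO-IR-147 tables remain the authoritative source.

```python
# Sketch: decode 7-bit YUSCII bytes into Unicode text.
# Tables are transcribed from this article's code page table; all names
# here are illustrative, not taken from any existing library.

# JUS I.B1.002 (Latin): only ten positions differ from ASCII.
LATIN = {
    0x40: "Ž", 0x5B: "Š", 0x5C: "Đ", 0x5D: "Ć", 0x5E: "Č",
    0x60: "ž", 0x7B: "š", 0x7C: "đ", 0x7D: "ć", 0x7E: "č",
}

# JUS I.B1.003 (Serbian Cyrillic): 0x40..0x5E and 0x60..0x7E become Cyrillic.
CYR_UPPER = "ЖАБЦДЕФГХИЈКЛМНОПЉРСТУВЊЏЅЗШЂЋЧ"
CYR_LOWER = "жабцдефгхијклмнопљрстувњџѕзшђћч"
SERBIAN = {0x40 + i: ch for i, ch in enumerate(CYR_UPPER)}
SERBIAN.update({0x60 + i: ch for i, ch in enumerate(CYR_LOWER)})

# JUS I.B1.004 (Macedonian): as Serbian, but Ѓ/ѓ and Ќ/ќ take the
# positions of Ђ/ђ and Ћ/ћ.
MACEDONIAN = dict(SERBIAN)
MACEDONIAN.update({0x5C: "Ѓ", 0x5D: "Ќ", 0x7C: "ѓ", 0x7D: "ќ"})

TABLES = {"latin": LATIN, "serbian": SERBIAN, "macedonian": MACEDONIAN}


def decode_yuscii(data: bytes, variant: str = "latin") -> str:
    """Map each 7-bit byte through the chosen table, falling back to ASCII."""
    table = TABLES[variant]
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError(f"byte 0x{byte:02X} is not a 7-bit code")
        chars.append(table.get(byte, chr(byte)))
    return "".join(chars)


print(decode_yuscii(b"^a~ak"))              # Latin: Čačak
print(decode_yuscii(b"Beograd", "serbian"))  # Serbian Cyrillic: Београд
```

The same byte can therefore mean different letters depending on which standard the reader assumes: `]` (0x5D) decodes to Ć, Ћ or Ќ in the Latin, Serbian and Macedonian variants respectively.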

YUSCII was originally developed for teleprinters, but it also spread to computer use. This was widely considered a bad idea among software developers, whose source code needed the original ASCII characters such as {, [, }, ], ^, ~, | and \ (an issue partly addressed by trigraphs in C). On the other hand, YUSCII text remains comparatively readable even where the encoding is not supported, much like the Russian KOI-7. Numerous attempts to replace it with something better failed because of limited support. Eventually, Microsoft's introduction of code pages, the appearance of Unicode and the availability of suitable fonts brought about the certain, though slow, end of YUSCII.[citation needed]
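
Trigraphs let C source avoid the bracket and symbol characters that YUSCII reuses. C defines nine trigraphs: ??= for #, which YUSCII keeps, plus the eight used below. The fix is only partial, because YUSCII also reuses @ and ` (0x40 and 0x60), which have no trigraph. The hypothetical helper `to_trigraphs` is a naive sketch: it does not escape any `??` sequences already present in the input, which a real tool would have to handle.

```python
# Sketch: rewrite C source so it avoids characters that JUS I.B1.002 reuses.
# Trigraph spellings are those defined by the C standard; the helper name
# is illustrative only.
TRIGRAPHS = {
    "[": "??(", "]": "??)", "{": "??<", "}": "??>",
    "\\": "??/", "^": "??'", "|": "??!", "~": "??-",
}
# "@" and "`" are also reused by YUSCII, but C has no trigraph for them.
NO_TRIGRAPH = {"@", "`"}


def to_trigraphs(source: str) -> str:
    """Replace each YUSCII-conflicting character with its C trigraph."""
    out = []
    for ch in source:
        if ch in NO_TRIGRAPH:
            raise ValueError(f"{ch!r} has no C trigraph")
        out.append(TRIGRAPHS.get(ch, ch))
    return "".join(out)


print(to_trigraphs("int a[2] = {1, ~0};"))
# prints: int a??(2??) = ??<1, ??-0??>;
```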

Codepage layout

Code points remain largely the same as in ASCII to maintain maximum compatibility. The following table shows the allocation of character codes in YUSCII, with both Latin and Cyrillic glyphs; where a cell lists three characters, the third is the Macedonian letter:

YUSCII [2] [3] [4]
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x Ž/Ж A/А B/Б C/Ц D/Д E/Е F/Ф G/Г H/Х I/И J/Ј K/К L/Л M/М N/Н O/О
5x P/П Q/Љ R/Р S/С T/Т U/У V/В W/Њ X/Џ Y/Ѕ Z/З Š/Ш Đ/Ђ/Ѓ Ć/Ћ/Ќ Č/Ч _
6x ž/ж a/а b/б c/ц d/д e/е f/ф g/г h/х i/и j/ј k/к l/л m/м n/н o/о
7x p/п q/љ r/р s/с t/т u/у v/в w/њ x/џ y/ѕ z/з š/ш đ/ђ/ѓ ć/ћ/ќ č/ч DEL
Note: in the Latin set (JUS I.B1.002), the characters at 0x40, 0x5B to 0x5E, 0x60 and 0x7B to 0x7E differ from ASCII.
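
The readability of YUSCII on systems without support follows from this table: letters without diacritics keep their ASCII codes, so only the few accented letters turn into punctuation. The sketch below, with hypothetical names `LATIN_TO_BYTE` and `encode_latin`, encodes Latin text as JUS I.B1.002 and then views the bytes as plain ASCII.

```python
# Sketch: encode Latin text as JUS I.B1.002 and view the bytes as plain
# ASCII. The table is transcribed from the code page table above;
# LATIN_TO_BYTE and encode_latin are illustrative names.
LATIN_TO_BYTE = {
    "Ž": 0x40, "Š": 0x5B, "Đ": 0x5C, "Ć": 0x5D, "Č": 0x5E,
    "ž": 0x60, "š": 0x7B, "đ": 0x7C, "ć": 0x7D, "č": 0x7E,
}
REUSED = set(LATIN_TO_BYTE.values())


def encode_latin(text: str) -> bytes:
    """Encode text as JUS I.B1.002, rejecting unrepresentable characters."""
    out = bytearray()
    for ch in text:
        if ch in LATIN_TO_BYTE:
            out.append(LATIN_TO_BYTE[ch])
        elif ord(ch) < 0x80 and ord(ch) not in REUSED:
            out.append(ord(ch))  # unchanged ASCII position
        else:
            # e.g. "[" or "{": their codes now mean Š and š
            raise ValueError(f"{ch!r} cannot be represented in JUS I.B1.002")
    return bytes(out)


raw = encode_latin("Đurđevdan u Čačku")
# An ASCII-only device prints the replaced letters as punctuation:
print(raw.decode("ascii"))  # \ur|evdan u ^a~ku
```

The output is still largely legible to a reader who knows the language, which is the property the article compares to KOI-7.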

World System Teletext

YUSCII should not be confused with the G0 Latin set for Serbian, Croatian and Slovene,[5] or the G0 Cyrillic set for Serbian,[6] defined by World System Teletext. Like YUSCII, these are based on ASCII and, where possible, place corresponding Serbian Latin and Cyrillic letters at the same positions. However, they assign the letters differently and are therefore not compatible with YUSCII. The Macedonian letters Ќ and Ѓ are also given positions of their own rather than sharing those of their Serbian counterparts Ћ and Ђ, while the lowercase form of Џ and the Macedonian letter Ѕ are not supported.[note 1] The WST G0 sets are shown below for reference.

World System Teletext G0 sets for Latin [5] and Cyrillic [6] script Serbian, Croatian and Slovene
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2x SP ! " # Ë/$ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x Č/Ч A/А B/Б C/Ц D/Д E/Е F/Ф G/Г H/Х I/И J/Ј K/К L/Л M/М N/Н O/О
5x P/П Q/Ќ R/Р S/С T/Т U/У V/В W/Ѓ X/Љ Y/Њ Z/З Ć/Ћ Ž/Ж Đ/Ђ Š/Ш ë/Џ
6x č/ч a/а b/б c/ц d/д e/е f/ф g/г h/х i/и j/ј k/к l/л m/м n/н o/о
7x p/п q/ќ r/р s/с t/т u/у v/в w/ѓ x/љ y/њ z/з ć/ћ ž/ж đ/ђ š/ш
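Note: the positions at which these sets differ from YUSCII in at least one script are 0x24, 0x40, 0x51, 0x57 to 0x59, 0x5B to 0x60, 0x71, 0x77 to 0x79 and 0x7B to 0x7E.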
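
Which positions the WST G0 sets move relative to YUSCII can be checked mechanically. The sketch below transcribes rows 4x to 7x and position 0x24 from the two tables in this article into dictionaries (the helper names `rows` and `differing` are hypothetical) and compares the Latin and Cyrillic layers separately. It covers only what the article's tables show, so 0x7F is left out.

```python
# Sketch: list code positions where the World System Teletext G0 sets
# disagree with YUSCII. Rows are transcribed from the two tables in this
# article (0x40..0x5F and 0x60..0x7E); 0x24 is added separately.


def rows(upper: str, lower: str, dollar: str) -> dict:
    """Build a position -> character map from two transcribed table rows."""
    if len(upper) != 32 or len(lower) != 31:
        raise ValueError("expected 32 characters for 0x40-0x5F, 31 for 0x60-0x7E")
    table = {0x24: dollar}
    table.update({0x40 + i: ch for i, ch in enumerate(upper)})
    table.update({0x60 + i: ch for i, ch in enumerate(lower)})
    return table


YUSCII_LATIN = rows("ŽABCDEFGHIJKLMNOPQRSTUVWXYZŠĐĆČ_",
                    "žabcdefghijklmnopqrstuvwxyzšđćč", "$")
WST_LATIN = rows("ČABCDEFGHIJKLMNOPQRSTUVWXYZĆŽĐŠë",
                 "čabcdefghijklmnopqrstuvwxyzćžđš", "Ë")
YUSCII_CYR = rows("ЖАБЦДЕФГХИЈКЛМНОПЉРСТУВЊЏЅЗШЂЋЧ_",
                  "жабцдефгхијклмнопљрстувњџѕзшђћч", "$")
WST_CYR = rows("ЧАБЦДЕФГХИЈКЛМНОПЌРСТУВЃЉЊЗЋЖЂШЏ",
               "чабцдефгхијклмнопќрстувѓљњзћжђш", "$")


def differing(a: dict, b: dict) -> set:
    """Positions at which two tables assign different characters."""
    return {pos for pos in a if a[pos] != b[pos]}


diff = differing(YUSCII_LATIN, WST_LATIN) | differing(YUSCII_CYR, WST_CYR)
print(" ".join(f"0x{pos:02X}" for pos in sorted(diff)))
```

Letters A to Z and their Cyrillic counterparts mostly stay put; the disagreements cluster in the positions each standard borrowed from ASCII punctuation and in the slots used for Љ, Њ, Џ and Ѕ.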

Footnotes

  1. The Teletext G1 set for use with Cyrillic, listed in section 15.6.7, table 41 of the standard, contains a subset of Latin letters, mostly those without Cyrillic homoglyphs in the G0 sets. These include S, which has the same shape as the Macedonian letter Ѕ.

References

  1. "Character Sets". IANA. 2018-12-12.
  2. Federal Institution for Standardization (1987-11-01). ISO-IR-141: Serbocroatian and Slovenian Latin Alphabet (PDF). ITSCJ/IPSJ.
  3. Federal Institution for Standardization (1988-10-01). ISO-IR-146: Serbocroatian Cyrillic Alphabet (PDF). ITSCJ/IPSJ.
  4. Federal Institution for Standardization (1988-10-01). ISO-IR-147: Macedonian Cyrillic Alphabet (PDF). ITSCJ/IPSJ.
  5. "15.6.2 Latin National Option Sub-Sets, Table 36". ETS 300 706: Enhanced Teletext specification (PDF). European Telecommunications Standards Institute (ETSI). p. 115.
  6. "15.6.4 Cyrillic G0 Set - Option 1 - Serbian/Croatian, Table 38". ETS 300 706: Enhanced Teletext specification (PDF). European Telecommunications Standards Institute (ETSI). p. 117.