GOST 10859

GOST 10859 (1964) is a Soviet Union standard that defined how to encode data on punched cards. It allowed a variable word size of 4 to 7 bits, depending on the type of data being encoded, and supported only uppercase letters.

The character set includes the non-ASCII decimal exponent symbol ⏨, which was used to express real numbers in scientific notation. For example: 6.0221415⏨23.
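As a minimal sketch of how this notation maps onto modern floating-point literals, the Python snippet below converts a string that uses ⏨ (U+23E8 DECIMAL EXPONENT SYMBOL) into a float. The function name `parse_decimal_exponent` is hypothetical. Reading a bare exponent such as ⏨3 as 1⏨3 is an assumption borrowed from ALGOL 60 number syntax, not something the standard is known to state.

```python
DECIMAL_EXPONENT = "\u23E8"  # ⏨ DECIMAL EXPONENT SYMBOL


def parse_decimal_exponent(text: str) -> float:
    """Convert a number written with the ⏨ symbol into a Python float.

    Hypothetical helper for illustration only.
    """
    mantissa, symbol, exponent = text.partition(DECIMAL_EXPONENT)
    if not symbol:
        # No exponent part: an ordinary decimal number.
        return float(text)
    # Assumption: an empty mantissa means 1, as in ALGOL 60 (⏨3 == 1000).
    return float(f"{mantissa or '1'}e{exponent}")


print(parse_decimal_exponent("6.0221415⏨23"))
```

Python's own `float()` only accepts `e` or `E` as the exponent marker, so the sketch simply substitutes one for the other.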

The character was also part of the ALGOL programming language specifications and was incorporated into ALCOR, the German character encoding standard of the time. GOST 10859 also included numerous other, mostly non-ASCII, characters and symbols useful to ALGOL programmers, e.g. ∨, ∧, ⊃, ≡, ¬, ≠, ↑, ↓, ×, ÷, ≤, ≥, °, &, ∅ (compare with the ALGOL operators).

Character sets

GOST 10859 4-bit code: Binary-coded decimal
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x 0 1 2 3 4 5 6 7 8 9 + - / , . DEL
GOST 10859 5-bit code: with BCD & mathematical operators
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x 0 1 2 3 4 5 6 7 8 9 + - / , . SP
1x ( ) × = ; [ ] * < > DEL
GOST 10859 6-bit code: with only Cyrillic upper-case letters
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x 0 1 2 3 4 5 6 7 8 9 + - / , . SP
1x ( ) × = ; [ ] * < > :
2x А Б В Г Д Е Ж З И Й К Л М Н О П
3x Р С Т У Ф Х Ц Ч Ш Щ Ы Ь Э Ю Я DEL
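To illustrate how such a code table is applied, here is a hedged Python sketch that decodes a sequence of 6-bit codes using only rows 0x, 2x and 3x of the Cyrillic table above. Row 1x is omitted, and any code without an entry decodes to U+FFFD. The names `GOST6_CYRILLIC` and `decode_gost6` are hypothetical, and silently skipping DEL (0x3F) is an assumption about how a reader would treat it, not a rule taken from the standard.

```python
# Rows 0x, 2x and 3x of the 6-bit Cyrillic table; row 1x is left out.
ROW_0 = list("0123456789+-/,.") + [" "]  # codes 0x00-0x0F (0x0F is SP)
# U+0410..U+042F without U+042A (Ъ) gives А..Я in the table's order.
CYRILLIC = [chr(cp) for cp in range(0x0410, 0x0430) if cp != 0x042A]
DEL = 0x3F

GOST6_CYRILLIC = {}
for base, cells in ((0x00, ROW_0), (0x20, CYRILLIC)):
    for offset, ch in enumerate(cells):
        GOST6_CYRILLIC[base + offset] = ch


def decode_gost6(codes, unknown="\uFFFD"):
    """Decode 6-bit codes to text (illustrative sketch only)."""
    out = []
    for code in codes:
        if code == DEL:
            continue  # assumption: DEL marks an erased position
        out.append(GOST6_CYRILLIC.get(code, unknown))
    return "".join(out)


print(decode_gost6([0x2C, 0x28, 0x30]))  # the three letters М, И, Р
```

Because the Cyrillic letters occupy one contiguous run of codes (0x20 to 0x3E), the table can be generated from Unicode code points instead of being typed out by hand.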
GOST 10859 7-bit code: Cyrillic and Latin upper-case letters
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x 0 1 2 3 4 5 6 7 8 9 + - / , . SP
1x ( ) × = ; [ ] * < > :
2x А Б В Г Д Е Ж З И Й К Л М Н О П
3x Р С Т У Ф Х Ц Ч Ш Щ Ы Ь Э Ю Я D
4x F G I J L N Q R S U V W Z
5x ¬ ÷ % | _ ! " Ъ ° '
6x ? ±
7x DEL
Cyrillic and Latin letters with identical (A, B, C, E, H, K, M, O, P, T, X) and similar (Y/У) glyphs were unified, so each such pair shares a single code.
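The unification can be sketched in Python as a lookup in which a Latin letter and its Cyrillic look-alike resolve to the same 7-bit code. The code positions are read off rows 2x to 4x of the table above, assuming that D sits at 0x3F and that row 4x starts at 0x40 with F. The names `ENCODE` and `encode_letters` are hypothetical.

```python
# Cyrillic А..Я without Ъ, in table order: codes 0x20-0x3E.
CYRILLIC = [chr(cp) for cp in range(0x0410, 0x0430) if cp != 0x042A]
# Latin letters with no Cyrillic look-alike: codes 0x3F-0x4C (assumed layout).
LATIN_ONLY = "DFGIJLNQRSUVWZ"
# Latin letter -> look-alike Cyrillic letter that carries the shared code.
UNIFIED = {
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041D", "K": "\u041A", "M": "\u041C", "O": "\u041E",
    "P": "\u0420", "T": "\u0422", "X": "\u0425", "Y": "\u0423",
}

ENCODE = {ch: 0x20 + i for i, ch in enumerate(CYRILLIC)}
ENCODE.update({ch: 0x3F + i for i, ch in enumerate(LATIN_ONLY)})
ENCODE.update({latin: ENCODE[cyr] for latin, cyr in UNIFIED.items()})


def encode_letters(text):
    """Map upper-case Latin or Cyrillic letters to 7-bit codes (sketch)."""
    return [ENCODE[ch] for ch in text]


print([hex(code) for code in encode_letters("ALGOL")])
# ['0x20', '0x44', '0x41', '0x2e', '0x44']
```

One consequence of this design is that decoding is ambiguous: code 0x20 alone cannot tell a reader whether Latin A or Cyrillic А was meant, which is why the sketch only encodes.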
GOST 10859 6-bit code: with only Latin upper-case letters
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x 0 1 2 3 4 5 6 7 8 9 + - / , . SP
1x ( ) × = ; [ ] * < > :
2x A B C D E F G H I J K L M N O P
3x Q R S T U V W X Y Z ¬ ÷ DEL
