Code page 936 (IBM)

Last updated
IBM-936
Alias(es)SHIFTGB [1]
Language(s) Simplified Chinese
Created by IBM
Current statusDeprecated
Transforms / Encodes GB 2312
Succeeded by IBM-1381
Other related encoding(s) Shift JIS

IBM code page 936 is a character encoding for Simplified Chinese including 1880 user-defined characters (UDC), which was superseded in 1993. It is a combination of the single-byte Code page 903 and the double-byte Code page 928. [2] [3] Code page 946 uses the same double-byte component, but an extended single-byte component (Code page 1042). [2] [4]

Contents

IBM code page 936 should not be confused with the identically numbered Windows code page, which is a variant of the GBK encoding; [2] GBK is called Code page 1386 by IBM. While GBK is a superset of the EUC-CN encoding of GB 2312, IBM-936 uses a different coded form of GB 2312, more closely resembling the relationship of Shift JIS to JIS X 0208.

History

Except for Shift JIS itself, the similarly structured code pages for other CJK locales were phased out between 1992 and 2016. IBM CJK Code Page Numbers.svg
Except for Shift JIS itself, the similarly structured code pages for other CJK locales were phased out between 1992 and 2016.

The encoding was in use mainly during the 1980s and early 1990s. While the original IBM PC (IBM 5150) lacked functionality for processing data in CJK languages, the IBM 5550 possessed such functionality, and was available in models supporting Japanese, Korean, Traditional Chinese or Simplified Chinese. Code page 936 for Simplified Chinese accompanied code page 932 (Shift JIS) for Japanese, code page 934 for Korean and code page 938 for Traditional Chinese.

The last revision of IBM-928/936/946 was documented in 1992, and it was superseded in 1993 by the EUC-CN-based code pages 1380 through 1383; code page 1380 encodes the same characters as code page 928, but in a different layout. [5] As of 1998, "some older Chinese packages" still included an algorithm for converting between IBM-936 and other encodings of GB 2312. [1]

Status

Although chart definitions for Code page 1380 (the document C-H 3-3220-130 1993-11) are provided online by IBM, IBM does not similarly provide the chart definition for the older Code page 928 (the document C-H 3-3220-130 1992-11, i.e. an earlier revision of the same specification). [5] [6] International Components for Unicode (ICU) does not include an IBM-936 or IBM-946 codec, and uses the Windows code page for the "cp936" label. [7] The ICU project does possess mapping data for IBM-946, which it makes publicly available, [8] but does not ship it with ICU.

Structure

Code page 928, the double byte component, includes 9,355 characters as double-byte sequences starting with 0x81 through 0xAC and 0xF0 through 0xFA. [9]

The 0x81–AC lead byte range is used for GB 2312 characters: lead bytes 0x81–87 were used for non-hanzi, 0x88–9C are used for level 1 hanzi and 0x9C–AC are used for level 2 hanzi. [1] [5] [8] Like Shift JIS, trail (second) bytes are in the range 0x40–FC excluding 0x7F, allowing two GB 2312 rows to be encoded per lead byte; [8] unlike Shift JIS, the bytes 0xA0–AC are not excluded from the lead byte range, [5] [8] since JIS X 0201 compatibility was not required. The 0xF0–FA lead byte range is used for IBM extensions: 0xF0 through 0xF9 are used for user-defined characters, and 0xFA is used for additional non-hanzi. [5]

Related Research Articles

Big-5 or Big5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for traditional Chinese characters.

In computing, Chinese character encodings can be used to represent text written in the CJK languages—Chinese, Japanese, Korean—and (rarely) obsolete Vietnamese, all of which use Chinese characters. Several general-purpose character encodings accommodate Chinese characters, and some of them were developed specifically for Chinese.

ISO/IEC 8859-7:2003, Information technology — 8-bit single-byte coded graphic character sets — Part 7: Latin/Greek alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin/Greek. It was designed to cover the modern Greek language. The original 1987 version of the standard had the same character assignments as the Greek national standard ELOT 928, published in 1986. The table in this article shows the updated 2003 version which adds three characters. Microsoft has assigned code page 28597 a.k.a. Windows-28597 to ISO-8859-7 in Windows. IBM has assigned code page 813 to ISO 8859-7. (IBM CCSID 813 is the original encoding. CCSID 4909 adds the euro sign. CCSID 9005 further adds the drachma sign and ypogegrammeni.)

Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese (characters).

GB/T 2312-1980 is a key official character set of the People's Republic of China, used for Simplified Chinese characters. GB2312 is the registered internet name for EUC-CN, which is its usual encoded form. GB refers to the Guobiao standards (国家标准), whereas the T suffix denotes a non-mandatory standard.

Windows-1257 is an 8-bit, single-byte extended ASCII code page used to support the Estonian, Latvian and Lithuanian languages under Microsoft Windows. In Lithuania, it is standardised as LST 1590-3, alongside a modified variant named LST 1590-4.

<span class="mw-page-title-main">GBK (character encoding)</span> Simplified Chinese character encoding

GBK is an extension of the GB 2312 character set for Simplified Chinese characters, used in the People's Republic of China. It includes all unified CJK characters found in GB 13000.1-93, i.e. ISO/IEC 10646:1993, or Unicode 1.1. Since its initial release in 1993, GBK has been extended by Microsoft in Code page 936/1386, which was then extended into GBK 1.0. GBK is also the IANA-registered internet name for the Microsoft mapping, which differs from other implementations primarily by the single-byte euro sign at 0x80.

Windows code page 936, is Microsoft's legacy (pre-Unicode) character encoding for representing simplified Chinese text on computers. It is one of the four Windows DBCSs for East Asian languages, accompanying code pages 932 (Japanese), 949 (Korean) and 950. It is a variant of the Mainland Chinese Guójiā Biāozhǔn Kuòzhǎn (GBK) encoding, and roughly corresponds to IBM code page 1386.

<span class="mw-page-title-main">Code page 950</span> Windows code page for Traditional Chinese, based on Big5

Code page 950 is the code page used on Microsoft Windows for Traditional Chinese. It is Microsoft's implementation of the de facto standard Big5 character encoding. The code page is not registered with IANA, and hence, it is not a standard to communicate information over the internet, although it is usually labelled simply as big5, including by Microsoft library functions.

IBM code page 932 is one of IBM's extensions of Shift JIS. The coded character sets are JIS X 0201:1976, JIS X 0208:1983, IBM extensions and IBM extensions for IBM 1880 UDC. It is the combination of the single-byte Code page 897 and the double-byte Code page 301. Code page 301 is designed to encode the same repertoire as IBM Japanese DBCS-Host.

<span class="mw-page-title-main">Unified Hangul Code</span> Windows character encoding for Korean

Unified Hangul Code (UHC), or Extended Wansung, also known under Microsoft Windows as Code Page 949, is the Microsoft Windows code page for the Korean language. It is an extension of Wansung Code to include all 11172 non-partial Hangul syllables present in Johab. This corresponds to the pre-composed syllables available in Unicode 2.0 and later.

<span class="mw-page-title-main">JIS X 0201</span> Japanese single byte character encoding

JIS X 0201, a Japanese Industrial Standard developed in 1969, was the first Japanese electronic character set to become widely used. The character set was initially known as JIS C 6220 before the JIS category reform. Its two forms were a 7-bit encoding or an 8-bit encoding, although the 8-bit form was dominant until Unicode replaced it. The full name of this standard is 7-bit and 8-bit coded character sets for information interchange (7ビット及び8ビットの情報交換用符号化文字集合).

JIS X 0212 is a Japanese Industrial Standard defining a coded character set for encoding supplementary characters for use in Japanese. This standard is intended to supplement JIS X 0208. It is numbered 953 or 5049 as an IBM code page.

In mathematics, the radical symbol, radical sign, root symbol, radix, or surd is a symbol for the square root or higher-order root of a number. The square root of a number x is written as

The CCITT Chinese Primary Set is a multi-byte graphic character set for Chinese communications created for the Consultative Committee on International Telephone and Telegraph (CCITT) in 1992. It is defined in ITU T.101, annex C, which codifies Data Syntax 2 Videotex. It is registered with the ISO-IR registry for use with ISO/IEC 2022 as ISO-IR-165, and encodable in the ISO-2022-CN-EXT code version.

Microsoft Windows code page 932, also called Windows-31J amongst other names, is the Microsoft Windows code page for the Japanese language, which is an extended variant of the Shift JIS Japanese character encoding. It contains standard 7-bit ASCII codes, and Japanese characters are indicated by the high bit of the first byte being set to 1. Some code points in this page require a second byte, so characters use either 8 or 16 bits for encoding.

Code page 942 is one of IBM's extensions of Shift JIS. The coded character sets are JIS X 0201, JIS X 0208, IBM extensions for IBM 1880 UDC and IBM extensions. It is the combination of the single-byte Code page 1041 and the double-byte Code page 301.

<span class="mw-page-title-main">Code page 949 (IBM)</span>

IBM code page 949 (IBM-949) is a character encoding which has been used by IBM to represent Korean language text on computers. It is a variable-width encoding which represents the characters from the Wansung code defined by the South Korean standard KS X 1001 in a format compatible with EUC-KR, but adds IBM extensions for additional hanja, additional precomposed Hangul syllables, and user-defined characters.

Code page 903 is encoded for use as the single byte component of certain simplified Chinese character encodings. It is used in China. Despite this, it follows ISO 646-JP / the Roman half of JIS X 0201, in that it replaces the ASCII backslash 0x5C with the yen/yuan sign. It also uses the same C0 replacement graphics as code page 897. When combined with the double-byte Code page 928, it forms the two code-sets of IBM code page 936.

Several mutually incompatible versions of the Extended Binary Coded Decimal Interchange Code (EBCDIC) have been used to represent the Japanese language on computers, including variants defined by Hitachi, Fujitsu, IBM and others. Some are variable-width encodings, employing locking shift codes to switch between single-byte and double-byte modes. Unlike other EBCDIC locales, the lowercase basic Latin letters are often not preserved in their usual locations.

References

  1. 1 2 3 Leisher, Mark (2008) [1998-03-06]. "SHIFTGB.TXT: Shifted GB2312.1980. Generated from an algorithm provided with some older Chinese packages". Department of Mathematical Sciences, New Mexico State University. Archived from the original on 2023-01-20.
  2. 1 2 3 Lunde, Ken (2009). "Chapter 4: Encoding Methods (§ Code Pages)". CJKV Information Processing (2nd ed.). Sebastopol, California: O'Reilly Media. pp. 278–282. ISBN   978-0-596-51447-1.
  3. "CCSID 936". IBM. Archived from the original on 2016-03-27.
  4. "CCSID 946". IBM. Archived from the original on 2016-03-26.
  5. 1 2 3 4 5 "Table 1: Registration of GCSGID and CPGID for the IBM CH-S Graphic Character Set". C-H 3-3220-130 1993-11: IBM Simplified Chinese Graphic Character Set (PDF). 1993. p. 6.
  6. "Code page 928 information document". Archived from the original on 2016-03-17.
  7. "windows-936-2000 (alias cp936)". ICU Demonstration - Converter Explorer. International Components for Unicode.
  8. 1 2 3 4 "ibm-946_P100-1995". International Components for Unicode Data Repository. Unicode Consortium, IBM.
  9. "CCSID 928 information document". Archived from the original on 2016-03-26.