JIS X 0212

Language(s): Intended to be used alongside JIS X 0208 for Japanese support. Does not substantially support any language on its own.
Standard: JIS X 0212:1990
Current status: Unihan source. The coded character set itself is not as widely supported as JIS X 0208, but it is sometimes used in EUC-JP. [1]
Classification
Extensions
  • IBM code page 953
  • OSF extensions
  • JIS X 0212/0213 hybrid plane 2
Encoding formats
Succeeded by: JIS X 0213
Other related encoding(s): Intended to supplement: JIS X 0208
Other supplementary ISO 2022 CJK DBCSes: KS X 1002

JIS X 0212 is a Japanese Industrial Standard defining a coded character set for encoding supplementary characters for use in Japanese. This standard is intended to supplement JIS X 0208 (Code page 952). It is numbered 953 or 5049 as an IBM code page (see below).

It is one of the source standards for Unicode's CJK Unified Ideographs.

History

In 1990 the Japanese Standards Association (JSA) released a supplementary character set standard: JIS X 0212-1990 Code of the Supplementary Japanese Graphic Character Set for Information Interchange (情報交換用漢字符号-補助漢字, Jōhō Kōkan'yō Kanji Fugō - Hojo Kanji). This standard was intended to build upon the range of characters available in the main JIS X 0208 character set, and to address shortcomings in the coverage of that set.

Features

Euler diagram comparing repertoires of JIS X 0208, JIS X 0212, JIS X 0213, Windows-31J, the Microsoft standard repertoire and Unicode.

The standard specified 6,067 characters, comprising:

Encodings

The following encodings or encapsulations are used to enable JIS X 0212 characters to be used in files, etc.

No encapsulation of JIS X 0212 characters in the popular Shift JIS encoding is possible, as Shift JIS does not have sufficient unallocated code space for the characters.
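
As a minimal sketch of one such encapsulation: in EUC-JP, JIS X 0212 occupies code set 3, where each character is written as the single-shift byte 0x8F followed by its row and cell numbers, each offset by 0xA0. The helper names `kuten_to_euc_jp` and `euc_jp_to_kuten` below are hypothetical and introduced only for illustration. They perform the byte arithmetic alone and do not check whether a given row and cell is actually assigned in the standard.

```python
SS3 = 0x8F  # single shift three: introduces a code set 3 character in EUC-JP


def kuten_to_euc_jp(row: int, cell: int) -> bytes:
    """Return the three-byte EUC-JP form of a JIS X 0212 row/cell position."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in the range 1..94")
    # Each coordinate is offset by 0xA0, giving bytes in 0xA1..0xFE.
    return bytes([SS3, 0xA0 + row, 0xA0 + cell])


def euc_jp_to_kuten(seq: bytes) -> tuple[int, int]:
    """Recover the row/cell position from a three-byte code set 3 sequence."""
    if len(seq) != 3 or seq[0] != SS3:
        raise ValueError("not a three-byte sequence introduced by 0x8F")
    row, cell = seq[1] - 0xA0, seq[2] - 0xA0
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("trailing bytes must be in the range 0xA1..0xFE")
    return row, cell


if __name__ == "__main__":
    encoded = kuten_to_euc_jp(16, 1)
    print(encoded.hex(" "))
    print(euc_jp_to_kuten(encoded))
```

The same row and cell numbers offset by 0x20 instead of 0xA0 give the 7-bit form of the code, which is why a single table of positions can serve more than one encapsulation.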

Implementations

Encoding of JIS X 0212 in conformant EUC-JP (left) and Windows code page 20932 (right).
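
The conformant EUC-JP layout can be illustrated with a small lead-byte scanner. This is a sketch under stated assumptions, not a full decoder: it assumes conformant EUC-JP input (not Windows code page 20932, which lays out JIS X 0212 differently), the function name `count_jis_x_0212` is hypothetical, and only the three-byte sequences introduced by 0x8F are validated.

```python
def count_jis_x_0212(data: bytes) -> int:
    """Count three-byte code set 3 (JIS X 0212) sequences in EUC-JP bytes."""
    count = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead == 0x8F:
            # Code set 3: 0x8F, then two bytes in 0xA1..0xFE (JIS X 0212).
            trail = data[i + 1:i + 3]
            if len(trail) != 2 or not all(0xA1 <= t <= 0xFE for t in trail):
                raise ValueError(f"malformed JIS X 0212 sequence at offset {i}")
            count += 1
            i += 3
        elif lead == 0x8E:
            # Code set 2: 0x8E, then one byte (half-width katakana).
            i += 2
        elif 0xA1 <= lead <= 0xFE:
            # Code set 1: two bytes (JIS X 0208); trail byte not checked here.
            i += 2
        else:
            # Code set 0: a single ASCII or control byte.
            i += 1
    return count


if __name__ == "__main__":
    sample = b"A" + b"\xb0\xa1" + b"\x8f\xb0\xa1" + b"\x8e\xb1"
    print(count_jis_x_0212(sample))
```

A scanner like this is enough to tell whether a file depends on JIS X 0212 support at all, which matters when the target environment only handles the JIS X 0208 portion of EUC-JP.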

JIS X 0212 is called Code page 953 by IBM, which includes vendor extensions. [2] [3] [4] The alternative CCSID 5049 excludes these extensions. [5]

Because JIS X 0212 characters cannot be encoded in Shift JIS, the coding system that has traditionally dominated Japanese information processing, the character set has seen few practical implementations. As mentioned above, it can be encoded in EUC-JP, which is commonly used on Unix and Linux systems, and it is here that most implementations have occurred.

Many web browsers, such as the Netscape/Mozilla/Firefox family and Opera, and related applications such as Mozilla Thunderbird support the display of JIS X 0212 characters in EUC-JP encoding; Internet Explorer, however, has no support for JIS X 0212 characters. Modern terminal emulators, such as GNOME Terminal, also support JIS X 0212 characters.

Applications which support JIS X 0212 in the EUC coding include:

JIS X 0212 and Unicode

The kanji in JIS X 0212 were taken as one of the sources for the Han unification that led to the unified set of CJK characters in the initial ISO 10646/Unicode standard. All 5,801 kanji were incorporated.

The future

Apart from the applications mentioned above, the JIS X 0212 standard has effectively fallen out of use. Of its kanji, 2,743 were included in the later JIS X 0213 standard. In the longer term, its main contribution will probably be the 5,801 kanji that were incorporated into Unicode.

References

  1. van Kesteren, Anne. "5. Indexes (§ Index jis0212)". Encoding Standard. WHATWG.
  2. "Code page 953 information document". Archived from the original on 2016-03-17.
  3. "CCSID 953 information document". Archived from the original on 2016-03-28.
  4. "Code Page CPGID 00953" (PDF). IBM.
  5. "CCSID 5049 information document". Archived from the original on 2016-03-27.