JIS X 0213

Language(s): Japanese, English, Ainu, Russian (partial support: Greek, Chinese)
Standard: JIS X 0213
Classification: ISO 2022, DBCS, CJK encoding
Extends: JIS X 0208
Encoding formats: Shift_JIS-2004, ISO-2022-JP-2004, EUC-JIS-2004
Preceded by: JIS X 0208, JIS X 0212
Euler diagram comparing repertoires of JIS X 0208, JIS X 0212, JIS X 0213, Windows-31J, the Microsoft standard repertoire and Unicode

JIS X 0213 is a Japanese Industrial Standard defining coded character sets for encoding the characters used in Japan. It extends JIS X 0208. The first version was published in 2000, and the standard was revised in 2004 (JIS2004) and 2012. [1][2][3][4] As well as adding special characters and characters with diacritical marks, it adds 3,695 kanji to those of JIS X 0208. The full name of the standard is 7-bit and 8-bit double byte coded extended KANJI sets for information interchange (7ビット及び8ビットの2バイト情報交換用符号化拡張漢字集合, Nana-Bitto Oyobi Hachi-Bitto no Ni-Baito Jōhō Kōkan'yō Fugōka Kakuchō Kanji Shūgō).

JIS X 0213 has two "planes", each a 94×94 character table. Plane 1 is a superset of JIS X 0208 and contains the level 1 to level 3 kanji sets together with non-kanji characters: hiragana, katakana (including letters used to write the Ainu language), the Latin, Greek and Cyrillic alphabets, digits, symbols and so on. Plane 2 contains only the level 4 kanji set. The total number of defined characters is 11,233, and each can be encoded in two bytes.
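The size of this code space can be checked with a little arithmetic. The sketch below is illustrative only: the function name `kuten_to_7bit` is hypothetical, and the 0x21 to 0x7E byte range is the usual ISO 2022 convention for a 94×94 set, assumed here rather than taken from the text above.

```python
ROWS = CELLS = 94      # each plane is a 94x94 table
PLANES = 2
DEFINED = 11_233       # defined characters, as stated in the text


def kuten_to_7bit(ku: int, ten: int) -> bytes:
    """Map a row (ku) and a cell (ten), both 1..94, to a 7-bit byte pair.

    Assumes the ISO 2022 convention that a 94-set occupies 0x21..0x7E.
    """
    if not (1 <= ku <= ROWS and 1 <= ten <= CELLS):
        raise ValueError("ku and ten must both be in 1..94")
    return bytes([0x20 + ku, 0x20 + ten])


capacity = PLANES * ROWS * CELLS
print(capacity)            # 17672 code positions in total
print(capacity - DEFINED)  # 6439 positions left undefined
print(kuten_to_7bit(1, 1).hex(), kuten_to_7bit(94, 94).hex())  # 2121 7e7e
```

Only about two thirds of the available positions are assigned, which is why both planes contain unassigned rows.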

This standard largely replaced the rarely used JIS X 0212-1990 "supplementary" standard, which included 5,801 kanji and 266 non-kanji. Of the additional 3,695 kanji in JIS X 0213, all but 952 were already in JIS X 0212.

JIS X 0213 defines several 7-bit and 8-bit encodings, including EUC-JIS-2004, ISO-2022-JP-2004 and Shift_JIS-2004. It also defines, for each character, the mapping from these encodings to ISO/IEC 10646 (Unicode).
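As a rough illustration of how one character looks in each of these encodings, the following sketch uses the `euc_jis_2004`, `shift_jis_2004` and `iso2022_jp_2004` codecs from the Python standard library. The helper `kuten_to_euc` is a hypothetical name, and it assumes the usual EUC layout: each of row and cell is offset by 0xA0, and plane 2 is prefixed with the single-shift byte 0x8F. The sample position 1-16-1 (plane 1, row 16, cell 1) is the kanji 亜.

```python
def kuten_to_euc(plane: int, ku: int, ten: int) -> bytes:
    """Build EUC-JIS-2004 bytes from a plane-row-cell position.

    Assumed layout: plane 1 is two bytes (0xA0 + ku, 0xA0 + ten);
    plane 2 is the same pair preceded by 0x8F.
    """
    if plane not in (1, 2) or not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError("expected plane 1 or 2 and ku/ten in 1..94")
    pair = bytes([0xA0 + ku, 0xA0 + ten])
    return pair if plane == 1 else b"\x8f" + pair


# Decode one position, then re-encode it in all three encodings.
ch = kuten_to_euc(1, 16, 1).decode("euc_jis_2004")
encoded = {
    name: ch.encode(name)
    for name in ("euc_jis_2004", "shift_jis_2004", "iso2022_jp_2004")
}
for name, data in encoded.items():
    print(f"{name:16} {data.hex()}")
```

The ISO-2022-JP-2004 output is longer than the other two because it carries escape sequences that switch character sets, whereas the EUC and Shift_JIS forms are plain byte pairs.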

Unicode version 3.2 incorporated all characters of JIS X 0213 except those that can be represented using combining characters. Because about 300 of the kanji are in Unicode Plane 2, implementations that support only the Basic Multilingual Plane cannot handle every JIS X 0213 character. For most applications, however, this is not an issue.
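Both cases can be observed with Python's `euc_jis_2004` codec. In this sketch the helper `describe` is a hypothetical name, and the two sample characters are chosen here for illustration: the kanji U+20B9F, which lies outside the Basic Multilingual Plane, and the kana at position 1-4-87, which has no single Unicode code point and decodes to a base letter plus a combining mark.

```python
def describe(text: str) -> dict:
    """Summarise the Unicode code points that make up one decoded character."""
    cps = [ord(c) for c in text]
    return {
        "code_points": [f"U+{cp:04X}" for cp in cps],
        "outside_bmp": any(cp > 0xFFFF for cp in cps),
        "is_sequence": len(cps) > 1,
    }


# A kanji outside the BMP; encoding raises UnicodeEncodeError if the
# character is not in the codec's repertoire.
kanji = "\U00020B9F"
kanji_bytes = kanji.encode("euc_jis_2004")

# Position 1-4-87 in EUC form (0xA0 + 4, 0xA0 + 87).
kana = b"\xa4\xf7".decode("euc_jis_2004")

print(describe(kanji))
print(describe(kana))
```

Software that stores text as fixed 16-bit units would fail on the first character, and software that assumes one code point per character would miscount the second.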

Glyph variants changed by the 2004 edition.

The 2004 edition of JIS X 0213 changed the recommended renderings of 168 kanji. [5] Ten additional kanji were added in JIS X 0213:2004. [6]

References

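  1. "日本工業標準調査会:データベース-JIS詳細表示" [Japanese Industrial Standards Committee: Database, JIS detail view] (in Japanese). 2012-02-20. Retrieved 2015-03-15.
  2. "日本工業標準調査会:データベース-JIS規格詳細表示" [Japanese Industrial Standards Committee: Database, JIS standard detail view] (in Japanese). 2000-01-20. Retrieved 2015-03-15.
  3. "日本工業標準調査会:データベース-JIS規格詳細表示" [Japanese Industrial Standards Committee: Database, JIS standard detail view] (in Japanese). 2004-02-20. Retrieved 2015-03-15.
  4. "日本工業標準調査会:データベース-JIS規格詳細表示" [Japanese Industrial Standards Committee: Database, JIS standard detail view] (in Japanese). 2008-10-01. Retrieved 2015-03-15.
  5. http://kakijun.jp/main/jis2004.html (in Japanese).
  6. Lunde, Ken (2014-04-07). "JIS X 0212 versus JIS X 0213". CJK Type Blog. Adobe Inc. Archived from the original on 2021-11-04. Retrieved 2021-11-04.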