International Ideographs Core

Last updated

International Ideographs Core (IICore) is a subset of up to ten thousand CJK Unified Ideographs characters, which can be implemented on devices with limited memories and capability that make it not feasible to implement the full ISO 10646/Unicode standard [1] .

Contents

History

The IICore subset was initially raised in the 21st meeting of the Ideographic Rapporteur Group (IRG) in Guilin during 17th-20 November in 2003, and is subsequently passed in the group's 22nd meeting in Chengdu in May 2004.

See also

Related Research Articles

<span class="mw-page-title-main">Unicode</span> Character encoding standard

Unicode, formally The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium designed to support the use of text written in all of the world's major writing systems. Version 15 of the standard defines 149186 characters and 161 scripts used in various ordinary, literary, academic, and technical contexts. Many common characters, including numerals, punctuation, and other symbols, are unified within the standard and are not treated as specific to any given writing system. Unicode encodes thousands of emoji, with the continued development thereof conducted by the Consortium as a part of the standard. Moreover, the widespread adoption of Unicode was in large part responsible for the initial popularization of emoji outside of Japan. Unicode is ultimately capable of encoding more than 1.1 million characters.

In internationalization, CJK characters is a collective term for the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include Hànzì in Chinese, Kanji and Kana in Japanese, and Hanja and Hangul in Korean. Vietnamese can be included, making the abbreviation CJKV, as Vietnamese historically used Chinese characters in which they were known as chữ Hán and chữ Nôm in Vietnamese.

UTF-32 (32-bit Unicode Transformation Format) is a fixed-length encoding used to encode Unicode code points that uses exactly 32 bits (four bytes) per code point (but a number of leading bits must be zero as there are far fewer than 232 Unicode code points, needing actually only 21 bits). UTF-32 is a fixed-length encoding, in contrast to all other Unicode transformation formats, which are variable-length encodings. Each 32-bit value in UTF-32 represents one Unicode code point and is exactly equal to that code point's numerical value.

Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a feature shared in common by written Chinese (hanzi), Japanese (kanji), Korean (hanja) and Vietnamese.

ISO/IEC 2022Information technology—Character code structure and extension techniques, is an ISO/IEC standard in the field of character encoding. Originating in 1971, it was most recently revised in 1994.

The Hong Kong Supplementary Character Set is a set of Chinese characters – 4,702 in total in the initial release—used in Cantonese, as well as when writing the names of some places in Hong Kong.

<span class="mw-page-title-main">ArmSCII</span> Set of obsolete single-byte character encodings

ArmSCII or ARMSCII is a set of obsolete single-byte character encodings for the Armenian alphabet defined by Armenian national standard 166–9. ArmSCII is an acronym for Armenian Standard Code for Information Interchange, similar to ASCII for the American standard. It has been superseded by the Unicode standard.

The Ideographic Research Group (IRG), formerly called the Ideographic Rapporteur Group, is a subgroup of Working Group 2 (WG2) of ISO/IEC JTC 1/SC 2, the subcommittee of the Joint Technical Committee of ISO and IEC which is responsible for developing standards within the field of coded character sets. IRG is composed of experts from China, Japan, South Korea, Vietnam and other countries and regions that use Han characters, as well as experts representing the Unicode Consortium. The group is responsible for coordinating the addition of new CJK unified ideographs to the Universal Multiple-Octet Coded Character Set and the Unicode Standard. The group meets twice a year for 4-5 days each time, and reports its activity to the subsequent meeting of WG2.

The Chinese character description languages are several proposed languages to most accurately and completely describe Chinese characters and information such as their list of components, list of strokes, their order, and the location of each of them on a background empty square. They are designed to overcome the inherent lack of information within a bitmap description. This enriched information can be used to identify variants of characters that are unified into one code point by Unicode and ISO/IEC 10646, as well as to provide an alternative form of representation for rare characters that do not yet have a standardized encoding in Unicode or ISO/IEC 10646. Many aim to work for Kaishu style and Song style, as well as to provide the character's internal structure which can be used for easier look-up of a character by indexing the character's internal make-up and cross-referencing among similar characters.

A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet. Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters. This article lists some widely used Unicode fonts that support a comparatively large number and broad range of Unicode characters.

The Chinese, Japanese and Korean (CJK) scripts share a common background, collectively known as CJK characters. During the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Unicode 15.0, Unicode defines a total of 97,058 characters.

JIS X 0212 is a Japanese Industrial Standard defining a coded character set for encoding supplementary characters for use in Japanese. This standard is intended to supplement JIS X 0208. It is numbered 953 or 5049 as an IBM code page.

KPS 9566 is a North Korean standard specifying a character encoding for the Chosŏn'gŭl (Hangul) writing system used for the Korean language. The edition of 1997 specified an ISO 2022-compliant 94×94 two-byte coded character set. Subsequent editions have added additional encoded characters outside of the 94×94 plane, in a manner comparable to UHC or GBK.

The Universal Coded Character Set is a standard set of characters defined by the international standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS), which is the basis of many character encodings, improving as characters from previously unrepresented typing systems are added.

CJK Symbols and Punctuation is a Unicode block containing symbols and punctuation used for writing the Chinese, Japanese and Korean languages. It also contains one Chinese character.

Hangul Syllables is a Unicode block containing precomposed Hangul syllable blocks for modern Korean. The syllables can be directly mapped by algorithm to sequences of two or three characters in the Hangul Jamo Unicode block:

Enclosed CJK Letters and Months is a Unicode block containing circled and parenthesized Katakana, Hangul, and CJK ideographs. Also included in the block are miscellaneous glyphs that would more likely fit in CJK Compatibility or Enclosed Alphanumerics: a few unit abbreviations, circled numbers from 21 to 50, and circled multiples of 10 from 10 to 80 enclosed in black squares.

ISO/IEC JTC 1/SC 2 Coded character sets is a standardization subcommittee of the Joint Technical Committee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), that develops and facilitates standards within the field of coded character sets. The international secretariat of ISO/IEC JTC 1/SC 2 is the Japanese Industrial Standards Committee (JISC), located in Japan. SC 2 is responsible for the development of the Universal Coded Character Set which is the international standard corresponding to the Unicode Standard.

Hangul, Hangul Supplementary-A, and Hangul Supplementary-B were character blocks that existed in Unicode 1.0 and 1.1, and ISO/IEC 10646-1:1993. These blocks encoded precomposed modern Hangul syllables. These three Unicode 1.x blocks were deleted and superseded by the new Hangul Syllables block (U+AC00–U+D7AF) in Unicode 2.0 and ISO/IEC 10646-1:1993 Amd. 5 (1998), and are now occupied by CJK Unified Ideographs Extension A and Yijing Hexagram Symbols. Moving or removing existing characters has been prohibited by the Unicode Stability Policy for all versions following Unicode 2.0, so the Hangul Syllables block introduced in Unicode 2.0 is immutable.

References

  1. "OGCIO : What is the ISO 10646 International Standard". www.ogcio.gov.hk.