Ideographic Research Group

The Ideographic Research Group (IRG), formerly called the Ideographic Rapporteur Group, is a subgroup of Working Group 2 (WG2) of ISO/IEC JTC 1 Subcommittee 2 (SC2), the committee responsible for developing the Universal Coded Character Set (ISO/IEC 10646). IRG is tasked with preparing and reviewing sets of CJK unified ideographs for eventual inclusion in both ISO/IEC 10646 and the Unicode Standard. [1] [2] The IRG is composed of representatives of national standards bodies from China, Japan, South Korea, Vietnam, and other regions that have historically used Chinese characters, as well as experts from liaison organizations such as the SAT Daizōkyō Text Database Committee (SAT), the Taipei Computer Association (TCA), and the Unicode Technical Committee (UTC). The group holds two meetings a year, each lasting four to five days, and reports its activities to its parent group, ISO/IEC JTC 1/SC 2/WG 2.

History

Ken Lunde, IRG convenor since June 2024

The precursor to the IRG was the CJK Joint Research Group (CJK-JRG), established in 1990. In May 1993, this group was re-established as the Ideographic Rapporteur Group (IRG) as a subgroup of WG2. [3] [4] In June 2019, the subgroup acquired its current name. [2]

The first IRG rapporteur was Kato Shigenobu (加藤重信), from 1993 to 1994, followed by Kido Akio (木戸彰夫) from 1994 to 1995. [4] From 1995 to 2004, the IRG rapporteur was Zhang Zhoucai (张轴材), who had been convenor and chief editor of CJK-JRG from 1990 to 1993. From 2004 to 2018, the IRG rapporteur was Hong Kong Polytechnic University professor Lu Qin (陸勤). [1] [5] In June 2018 the title of "rapporteur" was changed to "convenor", and Lu Qin continued as IRG convenor for another six years. [6] Since June 2024, the IRG convenor has been Ken Lunde. [7]

Overview

IRG is responsible for reviewing proposals to add new CJK unified ideographs to the Universal Coded Character Set (ISO/IEC 10646), and equivalently to the Unicode Standard, and for submitting consolidated proposals for sets of unified ideographs to WG2. These proposals are then processed for encoding in the respective standards by SC2 and the Unicode Technical Committee. [8] [9] National and liaison bodies that have been represented in IRG include China, Hong Kong, Macau, Japan (no longer active), North Korea (no longer active), South Korea, Singapore (no longer active), the Taipei Computer Association (TCA), the United Kingdom, Vietnam, and the Unicode Technical Committee (UTC).

As of Unicode version 15.1, the IRG has been responsible for submitting the following blocks of CJK unified and compatibility ideographs for encoding: [10]

Since 2015, proposed characters submitted by IRG member bodies have been processed in batches called "IRG Working Sets". Each working set undergoes several years of review by IRG experts before official submission of the working set to WG2 as a new block. Once accepted by WG2, the proposed block is processed according to the individual procedures followed by ISO/IEC JTC1 SC2 and the Unicode Technical Committee (UTC). In the case of SC2, this involves balloting of ISO member bodies. [11] The following working sets have been processed by IRG:

WS2015. 5,547 submitted characters, which resulted in 4,939 characters encoded in CJK Unified Ideographs Extension G (Unicode version 13.0, March 2020).

WS2017. 5,027 submitted characters, which resulted in 4,192 characters encoded in CJK Unified Ideographs Extension H (Unicode version 15.0, September 2022).

WS2021. 4,951 submitted characters, which may result in up to 4,302 characters to be encoded in CJK Unified Ideographs Extension J in a future version of Unicode. [14]

WS2024. A total of 4,674 characters were submitted for Working Set 2024 in July 2024 by China, Republic of Korea, SAT, TCA, United Kingdom, UTC, and Vietnam. [4]
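
The working-set figures above imply that a noticeable share of submitted characters does not survive review. The following is a minimal sketch of that arithmetic in Python, using only the counts quoted above. The names `WORKING_SETS` and `encoded_share` are illustrative and not part of any IRG document, and the WS2021 figure is an upper bound rather than a final count.

```python
# Submitted vs. encoded counts, copied from the working-set list above.
# WS2021's second figure is an upper bound ("up to 4,302"), not a final count.
# WS2024 is omitted because no encoded count exists yet.
WORKING_SETS = {
    "WS2015": (5547, 4939),  # Extension G
    "WS2017": (5027, 4192),  # Extension H
    "WS2021": (4951, 4302),  # Extension J (projected maximum)
}


def encoded_share(submitted: int, encoded: int) -> float:
    """Return the fraction of submitted characters that were encoded."""
    return encoded / submitted


for name, (submitted, encoded) in WORKING_SETS.items():
    dropped = submitted - encoded
    share = encoded_share(submitted, encoded)
    print(f"{name}: {encoded}/{submitted} encoded ({share:.1%}), {dropped} not encoded")
```

On these figures, roughly 89% of WS2015 and 83% of WS2017 submissions were encoded, and at most about 87% of WS2021 submissions would be. The figures do not say why the remaining characters were not encoded.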

Related Research Articles

Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a feature shared in common by written Chinese (hanzi), Japanese (kanji), Korean (hanja) and Vietnamese.

Mojikyō: character encoding scheme

Mojikyō, also known by its full name Konjaku Mojikyō, is a character encoding scheme created to provide a complete index of characters used in the Chinese, Japanese, Korean, Vietnamese Chữ Nôm and other historical Chinese logographic writing systems. The Mojikyō Institute, which published the character set, also published computer software and TrueType fonts to accompany it. The Mojikyō Institute, chaired by Tadahisa Ishikawa (石川忠久), originally had its character set and related software and data redistributed on CD-ROMs sold in Kinokuniya stores.

Biangbiang noodles: type of Chinese noodles

Biangbiang noodles, alternatively known as youpo chemian in Chinese, are a type of Chinese noodle originating from Shaanxi cuisine. The noodles, touted as one of the "eight curiosities" of Shaanxi (陕西八大怪), are described as being like a belt, owing to their thickness and length.

A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet. Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters. This article lists some widely used Unicode fonts that support a comparatively large number and broad range of Unicode characters.

The Chinese, Japanese and Korean (CJK) scripts share a common background, and their characters are collectively known as CJK characters. During the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Unicode 15.1, Unicode defines a total of 97,680 CJK unified ideographs.

Ken Lunde

Ken Roger Lunde is an American specialist in information processing for East Asian languages.

KPS 9566 is a North Korean standard specifying a character encoding for the Chosŏn'gŭl (Hangul) writing system used for the Korean language. The edition of 1997 specified an ISO 2022-compliant 94×94 two-byte coded character set. Subsequent editions have added additional encoded characters outside of the 94×94 plane, in a manner comparable to UHC or GBK.

The Universal Coded Character Set is a standard set of characters defined by the international standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS), which is the basis of many character encodings and which grows as characters from previously unrepresented writing systems are added.

Taito (kanji): kanji character

Taito, daito, or otodo is a kokuji written with 84 strokes, making it the most graphically complex CJK character (a term covering Chinese characters and their derivatives used in written Chinese, Japanese, and Korean). This rare and complex character graphically places the 36-stroke tai, meaning "cloudy", above the 48-stroke "appearance of a dragon in flight".

Tatsuo Kobayashi is a Japanese web architect who specializes in international standardization.

CJK Unified Ideographs Extension B is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese submitted to the Ideographic Research Group between 1998 and 2000, plus seven gongche characters for kunqu added in Unicode 13.0, and two characters for the Macao Supplementary Character Set added in Unicode 14.0.

CJK Unified Ideographs Extension C is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese submitted to the Ideographic Research Group between 2002 and 2006, plus five "urgently needed" characters added in Unicode versions 14.0 and 15.0, some of which had previously been mistakenly unified with other characters.

CJK Unified Ideographs Extension D is a Unicode block containing uncommon CJK ideographs for Chinese, Japanese, Korean, and Vietnamese, some of which are in current use. Much smaller than most Unicode blocks for CJK unified ideographs, Extension D consists of characters which were submitted to the Ideographic Research Group as "urgently needed characters" between 2006 and 2009. Characters submitted during the same period which were needed less urgently were included in CJK Unified Ideographs Extension E instead.

CJK Compatibility Ideographs is a Unicode block created to contain mostly Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings. However, it also contains 12 unified ideographs sourced from Japanese character sets from IBM.

CJK Unified Ideographs Extension E is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese submitted to the Ideographic Research Group between 2006 and 2013, excluding the characters submitted as "urgently needed" between 2006 and 2009, which were included in CJK Unified Ideographs Extension D.

CJK Unified Ideographs Extension F is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese, as well as more than a thousand Sawndip characters for writing the Zhuang language, which were submitted to the Ideographic Research Group between 2012 and 2015.

International Ideographs Core (IICore) is a subset of up to ten thousand CJK Unified Ideographs characters, intended for devices whose limited memory and capability make it infeasible to implement the full ISO/IEC 10646/Unicode standard.

CJK Unified Ideographs Extension G is a Unicode block containing rare and historic CJK Unified Ideographs for Chinese, Japanese, Korean, and Vietnamese which were submitted to the Ideographic Research Group during 2015. It is the first block to be allocated to the Tertiary Ideographic Plane.

CJK Unified Ideographs Extension H is a Unicode block containing rare and historic CJK Unified Ideographs for Chinese, Japanese, Korean, Sawndip, and Vietnamese submitted to the Ideographic Research Group during 2017.

CJK Unified Ideographs Extension I is a Unicode block comprising CJK Unified Ideographs included in drafts of an amendment to China's GB 18030 standard circulated in 2022 and 2023, which were fast-tracked into Unicode in 2023.
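
The block descriptions above can be made concrete with a small lookup from a character to its CJK unified ideograph block. The following is a sketch in Python. The code point ranges are not stated in the text above and are taken from the Unicode Character Database (Blocks.txt) as of Unicode 15.1, and the names `CJK_UNIFIED_BLOCKS`, `cjk_block`, and `plane` are illustrative.

```python
from typing import Optional

# (block name, first code point, last code point), listed in code point order.
# Ranges are an assumption drawn from Blocks.txt (Unicode 15.1), not from this text.
CJK_UNIFIED_BLOCKS = [
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F),
    ("CJK Unified Ideographs Extension D", 0x2B740, 0x2B81F),
    ("CJK Unified Ideographs Extension E", 0x2B820, 0x2CEAF),
    ("CJK Unified Ideographs Extension F", 0x2CEB0, 0x2EBEF),
    ("CJK Unified Ideographs Extension I", 0x2EBF0, 0x2EE5F),
    ("CJK Unified Ideographs Extension G", 0x30000, 0x3134F),
    ("CJK Unified Ideographs Extension H", 0x31350, 0x323AF),
]


def cjk_block(ch: str) -> Optional[str]:
    """Return the CJK unified ideograph block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in CJK_UNIFIED_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def plane(ch: str) -> int:
    """Return the Unicode plane number (0 to 16) of a single character."""
    return ord(ch) >> 16


# Plane 3 is the Tertiary Ideographic Plane, where Extension G begins.
first_ext_g = chr(0x30000)
print(cjk_block(first_ext_g), plane(first_ext_g))
```

A block range covers reserved positions as well as assigned characters, so a match here only says which block a code point falls in, not that a character is actually encoded at that position.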

References

  1. "ISO/IEC JTC1/SC2/WG2/IRG: Ideographic Rapporteur Group".
  2. "Resolutions of the 24th ISO/IEC JTC 1/SC 2 Plenary Meeting, Redmond, WA, US, 2019-06-17 and 21". ISO/IEC JTC 1/SC 2. 24 June 2019. Retrieved 24 June 2019.
  3. The Unicode Consortium (2021). "Han Unification History: Ideographic Rapporteur Group". The Unicode Standard, Version 14.0.0 (PDF). The Unicode Consortium. p. 987. ISBN 978-1-936213-29-0.
  4. "Ideographic Research Group (ISO/IEC JTC 1/SC 2/WG 2/IRG)". Retrieved 29 July 2024.
  5. "LU, Qin(Lu Chin)". Archived from the original on 22 September 2020. Retrieved 24 June 2019.
  6. "Resolutions of the 23rd ISO/IEC JTC 1/SC 2 Plenary Meeting, London, UK, 2018-06-18, 22". ISO/IEC JTC 1/SC 2. 28 June 2018. Retrieved 24 June 2019.
  7. "Recommendations from WG 2 meeting 71" (PDF). 14 June 2024. Retrieved 20 July 2024.
  8. "Unicode Standard Annex #45: U-source Ideographs". The Unicode Standard. Unicode Consortium.
  9. "Appendix E: Han Unification History" (PDF). The Unicode Standard. Unicode Consortium. September 2021.
  10. "Ideographic Rapporteur Group". Office of the Government Chief Information Officer.
  11. "FAQ - Chinese and Japanese".
  12. "IRG2133: IRG 2015 Collection Version 1.1 attributes". Retrieved 8 May 2024.
  13. "IRG Working Set 2017 - Index of Characters". Retrieved 8 May 2024.
  14. "IRGN2678: WS 2021 V7.0". Retrieved 8 May 2024.
  15. "IRG Working Set 2021 - Index of Characters". Retrieved 8 May 2024.

IRG Working Sets

IRG Working Document Series (IWDS)