Cangjie input method

Coding of "倉頡輸入法" (i.e. Cangjie method) in traditional Chinese characters

Last updated January 13, 2026


The auxiliary shapes of each Cangjie radical have changed slightly across different versions of the Cangjie method. This is one reason why different versions of the Cangjie method are not fully compatible.

Chu Bong-Foo also gave some of the letters alternative names based on their characteristics, as mnemonics. The names form a rhyme to help learners memorize the letters, with each group forming one line:^[5]

Original keys	Mnemonics
日月金木水火土	日月金木水火土
竹戈十大中一弓	斜點交叉縱橫鈎
人心手口	人心手口
尸廿山女田卜	側並仰紐方卜
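Because the radicals are assigned to the letter keys in the order shown, starting at A and skipping X (the 難 key used for parts that cannot be decomposed), turning a code written in radicals into the keys actually pressed is a simple lookup. A minimal Python sketch; the names RADICALS, LETTERS, KEY_OF and to_keys are hypothetical:

```python
# The 24 radicals in key order, grouped as in the mnemonic rhyme above.
RADICALS = "日月金木水火土" "竹戈十大中一弓" "人心手口" "尸廿山女田卜"

# Letter keys A to Y with X left out, because X is the 難 ("difficult") key.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWY"
assert len(RADICALS) == len(LETTERS) == 24

KEY_OF = dict(zip(RADICALS, LETTERS))
KEY_OF["難"] = "X"  # the key used for hard-to-decompose parts


def to_keys(radicals: str) -> str:
    """Translate a Cangjie code written as radicals into QWERTY letters."""
    return "".join(KEY_OF[r] for r in radicals)


print(to_keys("十田十"))      # 車 -> JWJ
print(to_keys("卜口竹竹戈"))  # 謝 -> YRHHI
```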

Keyboard layout

Basic rules

There are several general decomposition rules (拆字規則) that define how to analyze a character to arrive at a Cangjie code, as follows:^[6]

Order of decomposition – left to right, top to bottom, and outside to inside.
Geometrically unconnected forms (compounds) – identify the components and break up the character, e.g. 想→相+心.
- First component (字首) – usually the upper-most or the left-most part according to rule (1), order of decomposition; here 相.
- The body (字身) – everything except the first component; here 心.
Number of codes – take at most 5 codes.
- For geometrically connected forms, take at most 4 codes.
- For geometrically unconnected forms (compounds), take at most 5 codes: 2 from the first component and 3 from the body.
  - If the first component has more than 2 codes, take the first and the last.
  - If the body has more than 3 codes, consider breaking it up further.
    - If it can be broken up into second and third components,
      - take the first and last codes from the second component and the last code from the third.
    - If it cannot be broken up further, take the first, second and last codes.
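The code-count rules can be sketched in Python. This is a minimal illustration under stated assumptions: each component is given as its full code in QWERTY letters; for a connected form longer than 4 codes, the sketch keeps the first, second, third and last codes (an assumption, since the list above gives only the limit); and a split body follows the 謝 worked example below, taking the first and last codes of the second component and the last code of the third. The full codes used in the usage lines, 言 = YMMR and 寸 = DI, are assumed rather than taken from this article; 身 = HXH comes from the table of forms that cannot be decomposed. All function names are hypothetical.

```python
def connected_codes(codes: str) -> str:
    """Geometrically connected form: keep at most 4 codes.
    Assumption: when longer, keep the first, second, third and last codes."""
    if len(codes) <= 4:
        return codes
    return codes[:3] + codes[-1]


def first_and_last(codes: str) -> str:
    """Keep a component whole if it has at most 2 codes, else its first and last."""
    return codes if len(codes) <= 2 else codes[0] + codes[-1]


def compound_codes(head: str, body: str = "", second: str = "", third: str = "") -> str:
    """Geometrically unconnected form: at most 2 codes from the first
    component and 3 from the body. Pass `body` when the body is taken as
    one unit, or `second` and `third` when it is broken up further."""
    result = first_and_last(head)
    if second and third:
        # Split body, as in the 謝 example: first and last codes of the
        # second component, then the last code of the third.
        return result + first_and_last(second) + third[-1]
    if len(body) <= 3:
        return result + body
    # Body that cannot be broken up: first, second and last codes.
    return result + body[0] + body[1] + body[-1]


print(connected_codes("JWJ"))                               # 車 -> JWJ
print(compound_codes("YMMR", second="HXH", third="DI"))     # 謝 -> YRHHI
print(compound_codes("IV", second="HXH", third="DI"))       # 谢 -> IVHHI
```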

The rules are subject to various principles:^[7]

Conciseness (精簡) – if multiple ways of decomposition are possible, the shorter decomposition is considered to be correct.
Completeness (完整) – if multiple ways of decomposition with the same length of code are possible, the one that identifies a more complex form first is correct.
Reflection of the form of the radical (字型特徵) – the decomposition should reflect the shape of the radical, meaning (a) using the same code twice or more should be avoided if possible, and (b) the shape of the character should not be "cut" at a corner in the form.
Omission of codes (省略)
- Partial omission (部分省略) – when the number of codes in a complete decomposition exceeds the permitted number of codes, the extra codes are ignored.
- Omission in enclosed forms (包含省略) – when the part of the character being decomposed is an enclosed form, only the shape of the enclosure is decomposed; the enclosed forms are omitted.
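The first two principles can be read as a ranking over alternative decompositions of the same character. A rough Python sketch, assuming each candidate is a list of per-component code strings; it uses the number of codes in the first component as a stand-in for "a more complex form", which is an assumption, since the principles judge complexity by shape and shape is not modeled here. The name choose_decomposition is hypothetical, and the letters in the usage line are placeholders rather than real Cangjie codes.

```python
from typing import List, Sequence

# One candidate decomposition: a code string per component, in decomposition order.
Decomposition = List[str]


def total_codes(candidate: Decomposition) -> int:
    """Number of codes the candidate uses in total."""
    return sum(len(part) for part in candidate)


def choose_decomposition(candidates: Sequence[Decomposition]) -> Decomposition:
    """Conciseness: the candidate with the fewest codes wins.
    Completeness (approximated): on a tie, prefer the candidate whose first
    component has more codes, as a proxy for a more complex first form."""
    return min(candidates, key=lambda c: (total_codes(c), -len(c[0])))


# Placeholder codes: two 3-code candidates tie, the 4-code one loses.
print(choose_decomposition([["A", "BC"], ["AB", "C"], ["A", "B", "C", "D"]]))
# -> ['AB', 'C']
```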

Examples

Typing Chinese with Cangjie input method version 5

Typing Chinese with Cangjie input method on an Android device

車; chē; 'vehicle'
- This character is geometrically connected, consisting of a single vertical structure, so we take the first, second, and last Cangjie codes from top to bottom.
- The Cangjie code is thus 十田十 (JWJ); in this example, each code appears in the basic shape of its radical.
謝; xiè; 'to thank', 'to wither'
- This character consists of geometrically unconnected parts arranged horizontally. For the initial decomposition, we treat it as two parts, 言 and 射.
- The first part, 言, is geometrically unconnected from top to bottom; we take the first (亠, auxiliary shape of 卜 Y) and last parts (口, basic shape of 口 R) and arrive at 卜口 (YR).
- The second part is again geometrically unconnected, arranged horizontally. The two parts are 身 and 寸.
  - For the first part of this second part, 身, we take the first and last codes. Both are slants and therefore H; the first and last codes are thus 竹竹 (HH).
  - For the second part of the original second part, 寸, we take only the last code. 寸 is geometrically unconnected and consists of two parts: the outer form and the dot in the middle. The dot is I, so the last code is 戈 (I).
- The Cangjie code is thus 卜口 (YR) 竹竹 (HH) 戈 (I), or 卜口竹竹戈 (YRHHI).
谢 (simplified version of 謝)
- This example is identical to the example just above, except that the first part is 讠; the first and last codes are 戈 (I) and 女 (V).
- Repeating the same steps as in the above example, we get 戈女 (IV) 竹竹 (HH) 戈 (I), or 戈女竹竹戈 (IVHHI).

Exceptions

Some forms are always decomposed in the same way, whether the rules say they should be decomposed this way or not. The number of such exceptions is small:

Form	Version 2	Version 3	Version 5
門 (door)	日弓 (AN)
目 (eye)	月山 (BU)
鬼 (ghost)	竹戈 (HI)	竹戈 (HI) or HUI	—
几 (small table)	竹山 (HU)	竹弓 (HN)
贏 (win)	—	卜口月月弓 (YRBBN)	卜弓月山金 (YNBUC)
虍 (tiger [radical])	卜心 (YP)
亡 on top of 口 (吂)	卜口 (YR)		卜女口 (YVR)
隹 (fowl)	人土 (OG)
气 (air [radical])	人山 (OU)	人弓 (ON)	人一弓 (OMN)
畿 minus the 田	女戈 (VI)
鬥 (compete)	中弓 (LN)
阝 (mound or city radical)	弓中 (NL)

Some forms cannot be decomposed. They are represented by an X, which is the 難 key on a Cangjie keyboard.^[8]

Form	Fixed decomposition (v5)
臼	竹難 (HX)
與	竹難卜金 (HXYC)
興	竹難月金 (HXBC)
盥	竹難月廿 (HXBT)
姊	女中難竹 (VLXH)
齊	卜難 (YX)
兼	廿難金 (TXC)
鹿	戈難心 (IXP)
身	竹難竹 (HXH)
卍	弓難 (NX)
黽	口難山 (RXU)
龜	弓難山 (NXU)
廌	戈難火 (IXF)
慶	戈難水 (IXE)
淵	水中難中 (ELXL)
肅	中難 (LX)

Early development

Initially, the Cangjie input method was not intended to produce a character in any character set. Instead, it was part of an integrated system consisting of the Cangjie input rules and a Cangjie controller board. This controller board contains character generator firmware, which dynamically generates Chinese characters from Cangjie codes when characters are output, using the hi-res graphics mode of the Apple II. In the preface of the Cangjie user's manual, Chu Bong-Foo wrote in 1982:

[in translation]
In terms of output: The output and input, in fact, [form] an integrated whole; there is no reason that [they should be] dogmatically separated into two different facilities.… This is in fact necessary.…

Demonstration of character generator Mingzhu's capability to generate characters according to the codes. The first character is 𮨻 (⿰飠它), which denotes a kind of soup in Xuzhou cuisine.

In this early system, when the user typed "yk" to get the character 文, for example, the Cangjie code was not converted to any character encoding; the string "yk" itself was stored. The Cangjie code for each character (a string of 1 to 5 lowercase letters plus a space) was the encoding of that particular character.
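The storage format described above can be illustrated by splitting such a text back into per-character codes. A minimal Python sketch, assuming lowercase ASCII letters with exactly one space after each code, as described; the function name split_codes is hypothetical, and no other details of the original Apple II storage are modeled.

```python
import re
from typing import List

# Assumed format: each character is stored as its lowercase Cangjie code
# (1 to 5 letters) followed by a single space.
TOKEN = re.compile(r"([a-z]{1,5}) ")


def split_codes(stored: str) -> List[str]:
    """Split a stored text into the Cangjie codes of its characters."""
    if not re.fullmatch(r"(?:[a-z]{1,5} )*", stored):
        raise ValueError("expected 1-5 letter codes, each followed by a space")
    return TOKEN.findall(stored)


print(split_codes("yk jwj "))  # 文 and 車 -> ['yk', 'jwj']
```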

A particular "feature" of this early system was that, if one sent random lowercase words to it, the character generator would attempt to construct Chinese characters according to the Cangjie decomposition rules, sometimes causing strange, unknown characters to appear. This unintended feature, "automatic generation of characters", is described in the manual and was responsible for producing more than 10,000 of the 15,000 characters that the system could handle. The name Cangjie, evocative of the creation of new characters, was indeed apt for this early version of Cangjie.

The presence of the integrated character generator also explains the historical necessity for the existence of the "X" key, which is used for the disambiguation of decomposition collisions: because characters are "chosen" when the codes are "output", every character that can be displayed must in fact have a unique Cangjie decomposition. It would not make sense—nor would it be practical—for the system to provide a choice of candidate characters when a random text file is displayed, as the user would not know which of the candidates is correct.

Issues

Steep learning curve

Cangjie was designed to be an easy-to-use system to help promote the use of Chinese computing. However, many users find Cangjie difficult to learn and use, with many of the difficulties caused by poor instruction.^{[ citation needed ]}

In order to input using Cangjie, knowledge of both the names of the radicals as well as their auxiliary shapes is required. It is common to find tables of the Cangjie radicals with their auxiliary shapes taped onto the monitors of computer users.
One must also be familiar with the decomposition rules; without that knowledge, typing the intended characters becomes more difficult.
The user cannot type a character that they have forgotten how to write (a problem with all non-phonetic based input methods).

With enough practice, users can overcome the above problems. Typical touch-typists using Cangjie can type Chinese at 25 characters per minute (cpm) or better, even if they have difficulty remembering the list of auxiliary shapes or the decomposition rules. Experienced Cangjie typists can reportedly attain typing speeds from 60 cpm to over 200 cpm.^{[ citation needed ]}

According to Chen Minzheng, when he taught at Longtian Elementary School in Taitung in 1990, the children's average typing speed was 90 words per minute, and some children exceeded 130 words per minute.^[9]^{[ better source needed ]}

Limitations in implementation

The decomposition of a character depends on a predefined set of "standard shapes" (標準字形). However, because many variations of Cangjie exist in different countries, the standard shape of a certain character in Cangjie is not always the one the user has learnt. Learning Cangjie therefore entails learning not only Cangjie itself but also unfamiliar standard shapes for some characters. The Cangjie input method editor (IME) does not handle mistakes in decomposition except by informing the user (usually by beeping) that there is a mistake. In addition, although Cangjie was originally designed to assign different codes to different variants of a character, implementations do not always follow this. For example, in the Cangjie provided on Windows, the code for 產 is YHHQM, which corresponds not to the shape of this character but to another variant, 産. This problem results from the implementation of Cangjie on Windows. In the original Cangjie, 產 is YKMHM (the first part is 文) while 産 is YHHQM (the first part is 产).

Punctuation marks are not geometrically decomposed; instead, they are given predefined codes that begin with ZX followed by a string of three letters related to the order of the characters in the Big5 encoding. (This set of codes was added to Cangjie in the traditional Chinese version of Windows 95; on Windows 3.1, Cangjie had no codes for punctuation marks.) Typing punctuation marks in Cangjie thus becomes a frustrating exercise involving either memorization or pick-and-peck. On modern systems, this is mitigated by an on-screen virtual keyboard (on Windows, opened by pressing Ctrl + Alt + comma).

Common errors are not accepted as alternative codes. For example, if one types YSH following stroke order instead of decomposing 方 from top to bottom into YHS, Cangjie does not return the character 方 as a choice.

Since Cangjie requires all 26 keys of the QWERTY keyboard, it cannot be used to input Chinese characters on feature phones, which have only a 12-key keypad. Alternative input methods, such as Zhuyin, 5-stroke (or 9-stroke by Motorola), and the Q9 input method, are used instead.

Versions

The Cangjie input method is commonly said to have gone through five generations (commonly referred to as "versions" in English), each of which is slightly incompatible with the others. Currently, version 3 is the most common and is supported natively by Microsoft Windows. Version 5, supported by the Free Cangjie IME and formerly the only version supported by SCIM, is used by a significant minority; it is supported by iOS and, since Windows Vista, by Microsoft Windows. Earlier versions of Windows require the HKSCS update to support Cangjie Version 5.^[10]

The early Cangjie system supported by the Zero One card on the Apple II was Version 2; Version 1 was never released.

The Cangjie input method supported on the classic Mac OS resembles both Version 3 and Version 5.

Version 5, like the original Cangjie input method, was created directly by Chu. He had hoped that the release of Version 5, originally slated to be Version 6, would bring an end to the "more than ten versions of Cangjie input method" (slightly incompatible versions created by different vendors).

Version 6 has not yet been released to the public, but is being used to create a database which can accurately store every historical Chinese text.

Variations

Most modern implementations of Cangjie input method editors (IME) provide various convenient features:

Some IMEs list all characters whose codes begin with the code you have typed. For example, if you type A, the system gives you all characters whose Cangjie code begins with A, so that you can select the correct character if it is on the screen; if you type another A, the list is shortened to give all characters whose code begins with AA. Examples of such implementations include the IME in Mac OS X, and the Smart Common Input Method (SCIM).
Some IMEs provide one or more wildcard keys, usually but not always * and/or ?, that allow the user to omit part(s) of the Cangjie code; the system will display a list of matching characters for the user to choose. Examples include the X Window Chinese INput XIM server (xcin), the Smart Common Input Method (SCIM), and the IME of the Founder Group (Peking University) typesetting systems. Microsoft Windows's standard "Changjie" IME allows * to substitute for in-between codes (effectively reducing it to Simplified Cangjie entries), while the "New Changjie" IME allows * as a wildcard anywhere except in the first position.
Some IMEs provide an "abbreviation" feature, where impossible Cangjie codes are interpreted as abbreviations for the Cangjie codes of more than one character. This allows more characters to be input with fewer keys. An example is the Smart Common Input Method (SCIM).
Some IMEs provide an "association" (聯想 lianxiang) feature, where the system anticipates what you are going to type next, and provides you with a list of characters or even phrases associated with what the user has typed. An example is the Microsoft "Changjie" IME.
Some IMEs present the list of candidate characters differently, depending on the frequency of character use (how often that character has been typed by the user). An example is the Cangjie IME in the NJStar Chinese word processor.
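Prefix listing and wildcard matching both amount to filtering a table of codes. A minimal Python sketch over a hypothetical mini-table built from codes given in this article; it uses the standard library's shell-style fnmatch matching as a stand-in for an IME's wildcard rules, so here * may appear anywhere, whereas the IMEs described above restrict where it can go. The names CODES, by_prefix and by_wildcard are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical mini code table; the codes come from examples in this article.
CODES = {
    "車": "JWJ", "謝": "YRHHI", "谢": "IVHHI", "身": "HXH",
    "文": "YK", "方": "YHS", "齊": "YX", "臼": "HX",
}


def by_prefix(prefix: str) -> List[str]:
    """Candidates whose code starts with what has been typed so far."""
    return [ch for ch, code in CODES.items() if code.startswith(prefix)]


def by_wildcard(pattern: str) -> List[str]:
    """Candidates whose code matches a pattern in which * stands for any
    run of codes (shell-style matching, used here as a stand-in)."""
    return [ch for ch, code in CODES.items() if fnmatchcase(code, pattern)]


print(by_prefix("Y"))       # ['謝', '文', '方', '齊']
print(by_prefix("YR"))      # ['謝']
print(by_wildcard("H*H"))   # ['身']
print(by_wildcard("*HHI"))  # ['謝', '谢']
```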

Besides the wildcard key, many of these features are convenient for casual users but unsuitable for touch-typists because they make the Cangjie IME unpredictable.

There have also been various attempts to "simplify" Cangjie one way or another:

Simplified Cangjie, also known as Quick (簡易; jiǎnyì or 速成; sùchéng), has the same radicals, auxiliary shapes, decomposition rules, and short list of exceptions as Cangjie, but when the full Cangjie code has more than two codes, only the first and last are used.
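The Simplified Cangjie rule reduces to a one-line transformation of the full code. A minimal Python sketch; the function name quick_code is hypothetical, and the full codes in the usage loop are the examples given in this article.

```python
def quick_code(cangjie: str) -> str:
    """Simplified (Quick) Cangjie: codes of 1 or 2 letters are kept as they
    are; longer codes are reduced to their first and last letters."""
    if len(cangjie) <= 2:
        return cangjie
    return cangjie[0] + cangjie[-1]


for char, code in [("謝", "YRHHI"), ("車", "JWJ"), ("身", "HXH"), ("文", "YK")]:
    print(char, code, "->", quick_code(code))
```

For example, 謝 (YRHHI) becomes YI and 身 (HXH) becomes HH, while 文 (YK) is unchanged.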

Applications

Many researchers have discussed ways to decompose Chinese characters into their major components, and have tried to build applications based on such decomposition systems. This line of work is sometimes referred to as the study of the Genes of Chinese Characters. Cangjie codes offer a basis for such an endeavour. Academia Sinica in Taiwan^[11] and Jiaotong University in Shanghai^[12] have similar projects as well.

One direct application of the use of decomposed characters is the possibility of computing the similarities between different Chinese characters.^[13] The Cangjie input method offers a good starting point for this kind of application. By relaxing the limit of five codes for each Chinese character and adopting more detailed Cangjie codes, visually similar characters can be found by computation. Integrating this with pronunciation information enables computer-assisted learning of Chinese characters.^[14]
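As one simple illustration of computing similarity from decompositions (not the method of the cited studies), the Levenshtein edit distance between two full Cangjie codes counts how many codes must be inserted, deleted or substituted to turn one into the other. A minimal Python sketch; the function name edit_distance is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two Cangjie code strings."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


print(edit_distance("YRHHI", "IVHHI"))  # 謝 vs 谢 -> 2
print(edit_distance("YRHHI", "JWJ"))    # 謝 vs 車 -> 5
```

謝 and its simplified form 谢 differ only in their first component, so their codes are two edits apart, while 謝 and 車 share no codes at all.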

References

Citations

[1] A spelling used as a filename on the ETen Chinese System.
[2] Chu, Chyi-Hwa (朱麒華) (1 February 2012). "教育科技的專利與普及" [Patents and popularization of educational technology]. National Academy for Educational Research e-Newsletter (in Chinese). Archived from the original on 25 August 2022. Retrieved 14 December 2022.
[3] Chu Bong-foo (朱邦復). "智慧之旅" [Journey of wisdom]. 開放文學 [Open Literature] (in Traditional Chinese). Archived from the original on 19 October 2017. Retrieved 8 June 2017.
[4] "倉頡輸入法/輔助字形 - 维基教科书，自由的教学读本" [Cangjie input method/Auxiliary shapes - Wikibooks]. zh.wikibooks.org (in Chinese). Retrieved 2024-12-06.
[5] 零壹科技股份有限公司 (1984), p. 13.
[6] 零壹科技股份有限公司 (1984), p. 2.
[7] 零壹科技股份有限公司 (1984), pp. 2–3.
[8] "倉頡取碼規則及方法" [Cangjie code retrieval rules and methods]. Friends of Cangjie (in Chinese). 1997–2002. Archived from the original on 1 January 2019. Retrieved 2 October 2020.
[9] https://www.chinesecj.com/forum/forum.php?mod=attachment&aid=MTIwNnw1MjMxNmQwMXwxNjg2OTYyNTE4fDB8MTUwMjQ%3D, p. 58.
[10] "FAQ: How to enable Cantonese characters and Unicode CKJ extensions in Windows :: Pinyin Joe". www.pinyinjoe.com. Retrieved 2025-04-26.
[11] "漢字構形資料庫" [Chinese Character Configuration Database]. Chinese Document Processing Lab (in Chinese). 2013. Archived from the original on 27 July 2020. Retrieved 2 October 2020.
[12] 上海交通大學漢字編碼組 [Shanghai Jiao Tong University Chinese Character Coding Group]; 上海漢語拼音文字研究組 [Shanghai Chinese Pinyin Script Research Group] (eds.) (1988). 漢字信息字典 [Dictionary of Chinese Character Information]. 北京市科學出版社 [Science Press, Beijing].
[13] 宋柔; 林民; 葛詩利 (2008). "漢字字形計算及其在校對系統中的應用" [Computation of Chinese character forms and its application in proofreading systems]. 小型微型計算機系統 [Journal of Chinese Computer Systems]. 29 (10): 1964–1968.
[14] Liu, Chao-Lin; Lai, Min-Hua; Tien, Kan-Wen; Chuang, Yi-Hsuan; Wu, Shih-Hung; Lee, Chia-Ying (2011). "Visually and phonologically similar characters in incorrect Chinese words: Analyses, identification, and applications". ACM Transactions on Asian Language Information Processing. 10 (2): 1–39. doi:10.1145/1967293.1967297. S2CID 7288710.

Sources

零壹科技股份有限公司 (1984). 倉頡第三代中文字母輸入法: 倉頡字母, 部首, 注音三用檢字對照[The Third Generation Cangjie Chinese Input Method: Cangjie, radicals, and Zhuyin Dictionary].
Part of the information from this article comes from the equivalent Chinese-language Wikipedia article
The decomposition rules come from the "Friend of Cangjie — Malaysia" web site at http://www.chinesecj.com/ The site also gives the typing speed of experienced typists and provides software for version 5 of the Cangjie method for Microsoft Windows.
It might be difficult to find specific references to the "not error-forgiving" property of Cangjie. The table at https://web.archive.org/web/20050206223713/http://www.array.com.tw/keytool/compete.htm is one external reference that states this fact.
Input.foruto.com has a brief history of the Cangjie input method as seen by that article's author. Versions 1 and 2 are clearly identified in the article.
Cbflabs.com contains a number of articles written by Chu Bong-Foo, with references not only to the Cangjie input method, but also Chinese language computing in general. Versions 5 and 6 (now referred to as 5) of the Cangjie input method are clearly identified.

External links

Online Cangjie Input Method 網上倉頡輸入法
Chinese Character Database: With Word-formations Phonologically Disambiguated According to the Cantonese Dialect at The Chinese University of Hong Kong Research Centre for Humanities Computing: A Chinese character database covering the entire set of Big-5 Chinese characters (5401 Level 1 and 7652 Level 2 Hanzi) as well as 7 additional ETen Hanzi. Cangjie input codes are shown for each character in the database. Note: The Hong Kong Supplementary Character Set (HKSCS - 2001) is not included in this database.
Mingzhu generator (in Chinese): Chu Bong-Foo's page. Includes the executable, source code and instructions. Mingzhu is a Cangjie character generator that runs on MS Windows.
Friend of the Cangjie: a Cangjie reference, from which Cangjie 5 for various operating systems and Cangjie's supplementary code lists for inputting Simplified characters can be downloaded.
CjExplorer: a tool for learning Cangjie. The Cangjie code for a highlighted Chinese character will be displayed when the tool is running.
Overview of the Cang-Jie Method: a resource for English speakers to learn the rules and method of Cangjie

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.


Group	Key	Radical (meaning)	Examples
Philosophical group	A	日 sun	：日月：中土日：日山：日竹月山
	B	月 moon	：月一十：月月月：月月心水：卜月竹土
	C	金 gold	：金卜廿山：金弓中竹：卜中弓金：田金
	D	木 wood	：木人人：田木：木竹：心木
	E	水 water	：戈一水：水戈：水金口：戈十水
	F	火 fire	：竹木火：日口火：女火女戈火：一火
	G	土 earth	：土卜人：一土月：土口：木土廿戈
Stroke group	H	竹 bamboo (斜 slant)	：竹日弓日：竹日：弓竹尸：竹人日山
	I	戈 dagger axe (點 dot)	：竹手戈：戈弓人：戈月金弓：土戈
	J	十 ten (交 cruciform)	：十口：卜十大尸十：十女：廿十一一
	K	大 big (叉 cross)	：大大大大：大口：卜大：大一人月
	L	中 centre (縱 vertical)	：人中：弓中：中土日：中戈十十
	M	一 one (橫 horizontal)	：日一：尸一尸戈一：一竹日火：竹金一
	N	弓 bow (鈎 hook)	：弓卜女戈：一土中弓：弓竹尸：弓日山：弓人竹廿人
Body parts group	O	人 person	：女戈人：弓日心人：水人田卜：人一一：戈弓人
	P	心 heart	：田心：心竹日：廿金心：十大心：心廿：人戈心：一口心口山
	Q	手 hand	：人一口手：手一弓：弓弓手人：竹手月山：人手
	R	口 mouth	：口弓人：一口：尸口口口：十口中口
Character shapes group	S	尸 corpse	：尸人：尸山：尸一口：尸中尸中：尸十
	T	廿 twenty	：廿一：廿日：廿日十：卜心廿一：月廿：卜廿
	U	山 mountain	：人山：月山：弓木山：廿山月
	V	女 woman	：戈竹一女：一女弓一：竹難女卜女：手一女
	W	田 field	：十田十：田戈口一：田十
	Y	卜 fortune telling	：弓戈卜：卜戈竹山：一中月卜：卜女女女
Collision/difficult key*	X	難 difficult	(1) disambiguation of Cangjie code decomposition collisions; (2) code for a "difficult-to-decompose" part
Special character key*	Z	重 collision	This key is used for entering special characters and has no meaning on its own. In most cases, combining it with other keys produces Chinese punctuation marks (such as 。, 、, 「」, 『』). Note: Some variants use Z as a collision key instead of X. In those systems, Z has the name "collision" (重) and X has the name "difficult" (難); however, the use of Z as a collision key is neither part of the original Cangjie nor used in current mainstream implementations. In other variants, Z may have the name "user-defined" (造) or some other name.
Wildcard	Shift + 8 (*)	Wildcard	It can replace any keys between the first and the last. It is useful when only the first and last codes are known. For example, entering 竹*竹 returns 身, 物, 秒 and 第, among others (in this case, the output is identical to that of Simplified Cangjie).