Cangjie input method


[in translation]
In terms of output: The output and input, in fact, [form] an integrated whole; there is no reason that [they should be] dogmatically separated into two different facilities.… This is in fact necessary.…

In this early system, when the user types "yk", for example, to get the Chinese character 文, the Cangjie codes are not converted to any character encoding; the actual string "yk" is stored. The Cangjie code for each character (a string of 1 to 5 lowercase letters plus a space) was the encoding of that character.
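To make this storage model concrete, here is a minimal Python sketch. It assumes text is stored as space-terminated lowercase Cangjie codes and decoded through a lookup table. The names `CODE_TABLE` and `decode_codes` are hypothetical, and the table is a stand-in: the original system drove a character generator that built glyphs from the codes rather than consulting a fixed table. The four entries follow standard Cangjie codes.

```python
# Hypothetical code table: lowercase Cangjie code -> character.
# It stands in for the early system's character generator; the entries
# follow standard Cangjie codes but are not that system's actual data.
CODE_TABLE = {
    "yk": "文",
    "ab": "明",
    "a": "日",
    "b": "月",
}


def decode_codes(stored: str) -> str:
    """Decode text stored as space-terminated Cangjie codes into characters."""
    chars = []
    for code in stored.split(" "):
        if not code:
            continue  # skip the empty piece left after the final space
        # Codes missing from the table are shown as "?" in this sketch.
        chars.append(CODE_TABLE.get(code, "?"))
    return "".join(chars)


print(decode_codes("yk ab "))  # prints 文明
```

Because the stored form is the code itself, the same bytes can be read back as plain lowercase text; only the decoding step turns them into characters.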

Demonstration of character generator Mingzhu's capability to generate characters according to the codes. The first character is 𮨻 (⿰飠它), which denotes a kind of soup in Xuzhou cuisine.

A particular "feature" of this early system is that, if one sends random lowercase words to it, the character generator will attempt to construct Chinese characters according to the Cangjie decomposition rules, sometimes causing strange, unknown characters to appear. This unintended feature, "automatic generation of characters", is described in the manual and is responsible for producing more than 10,000 of the 15,000 characters that the system can handle. The name Cangjie, evocative of the creation of new characters, was indeed apt for this early version of Cangjie.

The presence of the integrated character generator also explains the historical necessity for the "X" key, which is used to disambiguate decomposition collisions: because characters are "chosen" when the codes are "output", every character that can be displayed must have a unique Cangjie decomposition. It would be neither sensible nor practical for the system to offer a choice of candidate characters when displaying an arbitrary text file, because the user would not know which candidate was intended.
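The uniqueness requirement can be checked mechanically: group a code table by code and report any code shared by more than one character. The sketch below is a hypothetical illustration with placeholder character names (`char_a` and so on); it only finds collisions and does not model how the X key is actually applied to resolve them.

```python
from collections import defaultdict


def find_collisions(table: dict[str, str]) -> dict[str, list[str]]:
    """Return every code shared by more than one character.

    `table` maps character -> Cangjie code. In a system that picks the
    character at output time, each reported code would need to be
    disambiguated so that decoding stays unambiguous.
    """
    by_code: dict[str, list[str]] = defaultdict(list)
    for char, code in table.items():
        by_code[code].append(char)
    return {code: chars for code, chars in by_code.items() if len(chars) > 1}


# Placeholder entries, not real Cangjie data.
example = {"char_a": "abc", "char_b": "abc", "char_c": "abd"}
print(find_collisions(example))  # prints {'abc': ['char_a', 'char_b']}
```

An empty result means every code decodes to exactly one character, which is the property the early output-driven design depended on.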

Issues

Cangjie was designed to be an easy-to-use system that would help promote Chinese computing. However, many users find Cangjie difficult to learn and use, and many of these difficulties are attributed to poor instruction.[citation needed]

Perceived difficulties

  • In order to input using Cangjie, knowledge of both the names of the radicals as well as their auxiliary shapes is required. It is common to find tables of the Cangjie radicals with their auxiliary shapes taped onto the monitors of computer users.
  • One must also be familiar with the decomposition rules, lack of knowledge of which results in increased difficulty in typing the intended characters.
  • The user cannot type a character that they have forgotten how to write (a problem with all non-phonetic based input methods).

With enough practice, users can overcome the above problems. Typical touch-typists can type Chinese at 25 characters per minute (cpm) or better using Cangjie, even if they have difficulty remembering the list of auxiliary shapes or the decomposition rules. Experienced Cangjie typists can reportedly reach speeds from 60 cpm to over 200 cpm.[citation needed]

According to Chen Minzheng, when he taught at Longtian Elementary School in Taitung in 1990, the children's average typing speed was 90 words per minute, and some children exceeded 130 words per minute.[5][better source needed]

Limitations in implementation

The decomposition of a character depends on a predefined set of "standard shapes" (標準字形). However, as many variations of Cangjie exist in different countries, the standard shape of a certain character in Cangjie is not always the one the user has learned. Learning Cangjie therefore entails learning not only Cangjie itself but also unfamiliar standard shapes for some characters. The Cangjie input method editor (IME) does not handle mistakes in decomposition except by informing the user (usually by beeping) that there is a mistake. Moreover, although Cangjie was originally designed to assign different codes to different variants of a character, implementations do not always follow this. For example, in the Cangjie provided on Windows, the code for  is YHHQM, which corresponds not to the shape of this character but to another variant, . This problem stems from the implementation of Cangjie on Windows. In the original Cangjie,  should be YKMHM (the first part is ) while  is YHHQM (the first part is ).

Punctuation marks are not decomposed geometrically; instead, they are given predefined codes that begin with ZX followed by three letters related to the ordering of the characters in the Big5 code. (This set of codes was added to Cangjie in the traditional Chinese version of Windows 95; on Windows 3.1, Cangjie had no codes for punctuation marks.) Typing punctuation marks in Cangjie thus becomes a frustrating exercise involving either memorization or pick-and-peck. Modern systems address this with an on-screen virtual keyboard (on Windows, activated by pressing Ctrl+Alt+comma).

Common mistakes are not accepted as alternative codes. For example, if one decomposes a character by stroke order as YSH instead of from top to bottom as YHS, Cangjie does not offer the character as a candidate.
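The IME behaviour described above, rejecting an incorrect decomposition instead of offering alternatives, amounts to an exact-match lookup. The sketch below uses a hypothetical two-entry toy table (明 = AB, 林 = DD, which follow standard Cangjie codes); real IMEs use their own data structures and larger tables.

```python
# Toy table: uppercase Cangjie code -> candidate characters.
# Hypothetical illustration only; not any real IME's data.
TOY_TABLE: dict[str, list[str]] = {
    "AB": ["明"],  # 日 + 月
    "DD": ["林"],  # 木 + 木
}


def candidates(code: str) -> list[str]:
    """Return candidates for an exact code match.

    No reordered or "nearby" codes are tried, so a wrong decomposition
    yields an empty list; this is where an IME would typically beep.
    """
    return list(TOY_TABLE.get(code.upper(), []))


print(candidates("ab"))  # prints ['明']
print(candidates("BA"))  # prints [] (reversed order is not an alternative code)
```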

Since Cangjie requires all 26 keys of the QWERTY keyboard, it cannot be used to input Chinese characters on feature phones, which have only a 12-key keypad. Alternative input methods, such as Zhuyin, 5-stroke (or 9-stroke by Motorola), and the Q9 input method, are used instead.

Versions

The Cangjie input method is said to have gone through five generations (usually called "versions" in English), each slightly incompatible with the others. Version 3 (第三代倉頡) is currently the most common and is supported natively by Microsoft Windows. Version 5 (第五代倉頡) is used by a significant minority; it is supported by the Free Cangjie IME and by iOS, and was previously the only Cangjie version supported by SCIM.

The early Cangjie system supported by the Zero One card on the Apple II was Version 2; Version 1 was never released.

The Cangjie input method supported on the classic Mac OS resembles both Version 3 and Version 5.

Version 5, like the original Cangjie input method, was created directly by Chu. He had hoped that the release of Version 5, originally slated to be Version 6, would bring an end to the "more than ten versions of Cangjie input method" (slightly incompatible versions created by different vendors).

Version 6 has not yet been released to the public, but is being used to create a database which can accurately store every historical Chinese text.

Variants

Most modern implementations of Cangjie input method editors (IME) provide various convenient features:

Besides the wildcard key, many of these features are convenient for casual users but unsuitable for touch-typists because they make the Cangjie IME unpredictable.

There have also been various attempts to "simplify" Cangjie one way or another:

Applications

Many researchers have discussed ways to decompose Chinese characters into their major components and have tried to build applications based on such decomposition systems. This line of work can be referred to as the study of the Genes of Chinese Characters. Cangjie codes offer a basis for such an endeavour; Academia Sinica in Taiwan[6] and Jiaotong University in Shanghai[7] have similar projects.

One direct application of the use of decomposed characters is the possibility of computing the similarities between different Chinese characters. [8] The Cangjie input method offers a good starting point for this kind of application. By relaxing the limit of five codes for each Chinese character and adopting more detailed Cangjie codes, visually similar characters can be found by computation. Integrating this with pronunciation information enables computer-assisted learning of Chinese characters. [9]
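As a rough illustration of computing similarity from codes, the sketch below ranks characters by the Levenshtein edit distance between their Cangjie code strings. This is a deliberately simplified stand-in: it uses ordinary five-key-limited codes rather than the more detailed codes mentioned above, and it is not the method of the cited works. The codes in `CODES` (日 A, 明 AB, 木 D, 林 DD, 森 DDD) are standard Cangjie codes used only as sample data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Sample data: standard Cangjie codes for a few characters.
CODES = {"日": "A", "明": "AB", "木": "D", "林": "DD", "森": "DDD"}


def most_similar(char: str, codes: dict[str, str]) -> list[tuple[str, int]]:
    """Rank the other characters by code edit distance (ties by code point)."""
    target = codes[char]
    ranked = [(other, levenshtein(target, code))
              for other, code in codes.items() if other != char]
    return sorted(ranked, key=lambda pair: (pair[1], pair[0]))


print(most_similar("林", CODES))
# prints [('木', 1), ('森', 1), ('日', 2), ('明', 2)]
```

With only a handful of coarse codes, the distance mostly reflects shared components; finer-grained codes, as the paragraph above suggests, would be needed to separate visually similar characters reliably.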

See also

Notes


References

  1. A spelling used as filename on ETen Chinese System.
  2. Chu, Chyi-Hwa (朱麒華) (1 February 2012). "教育科技的專利與普及". National Academy for Educational Research e-Newsletter (in Chinese). Archived from the original on 25 August 2022. Retrieved 14 December 2022.
  3. Chu Bong-foo (朱邦復). "智慧之旅". 開放文學 (in Traditional Chinese). Archived from the original on 19 October 2017. Retrieved 8 June 2017.
  4. "倉頡取碼規則及方法" [Cangjie code retrieval rules and methods]. Friends of Cangjie (in Chinese). 1997–2002. Archived from the original on 1 January 2019. Retrieved 2 October 2020.
  5. Forum attachment, chinesecj.com, p. 58. https://www.chinesecj.com/forum/forum.php?mod=attachment&aid=MTIwNnw1MjMxNmQwMXwxNjg2OTYyNTE4fDB8MTUwMjQ%3D
  6. "漢字構形資料庫" [Chinese Character Configuration Database]. Chinese Document Processing Lab (in Chinese). 2013. Archived from the original on 27 July 2020. Retrieved 2 October 2020.
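  7. Shanghai Jiao Tong University Chinese Character Coding Group; Shanghai Chinese Pinyin Script Research Group, eds. (1988). 漢字信息字典 [Chinese Character Information Dictionary] (in Chinese). Beijing: Science Press.
  8. Song, Rou; Lin, Min; Ge, Shili (2008). "漢字字形計算及其在校對系統中的應用" [Computation of Chinese character forms and its application in proofreading systems]. 小型微型計算機系統 [Journal of Chinese Computer Systems] (in Chinese). 29 (10): 1964–1968.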
  9. Liu, Chao-Lin; Lai, Min-Hua; Tien, Kan-Wen; Chuang, Yi-Hsuan; Wu, Shih-Hung; Lee, Chia-Ying (2011). "Visually and phonologically similar characters in incorrect Chinese words: Analyses, identification, and applications". ACM Transactions on Asian Language Information Processing. 10 (2): 1–39. doi:10.1145/1967293.1967297. S2CID 7288710.