Several input methods allow the use of Chinese characters with computers. Most allow selection of characters based either on their pronunciation or their graphical shape. Phonetic input methods are easier to learn but are less efficient, while graphical methods allow faster input, but have a steep learning curve.
Other methods allow users to write characters directly via touchscreens, such as those found on mobile phones and tablet computers.
Chinese input methods predate the computer. One early attempt was the electromechanical Chinese typewriter Ming kwai (Chinese: 明快; pinyin: míngkuài; Wade–Giles: ming-k'uai), invented in the 1940s by Lin Yutang, a prominent Chinese writer. It assigned thirty base shapes or strokes to different keys and adopted a new way of categorizing Chinese characters. But the typewriter was never produced commercially, and Lin soon found himself deeply in debt. [2]
Before the 1980s, Chinese publishers hired teams of workers who selected a few thousand type pieces from an enormous Chinese character set. Chinese government agencies entered characters using a long, complicated list of Chinese telegraph codes, which assigned a different number to each character. During the early computer era, Chinese characters were categorized by their radicals or Pinyin romanization, but results were less than satisfactory.
From the 1970s to the 1980s, large keyboards with thousands of keys were used to input Chinese. Each key was mapped to several Chinese characters. To type a character, one pressed the character key and then a selection key. [3] There were also experimental "radical keyboards" with dozens to several hundred keys. Chinese characters were decomposed into "radicals", each of which was represented by a key. [1] [4] [5] Unwieldy and difficult to use, these keyboards became obsolete after the introduction of the Cangjie input method, the first method to use only the standard keyboard and make Chinese touch typing possible. [5]
Chu Bong-Foo invented the widely used Cangjie input method in 1976. It assigns different "roots" to each key on a standard computer keyboard. With this method, for example, the character 日 is assigned to the A key, and 月 is assigned to B. Typing them together will result in the character 明 ("bright").
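The root-to-key idea described above can be sketched as a pair of lookup tables. This is only a toy illustration: the 日/A, 月/B and 明 entries come from the example in the text, while the table layout and the function names `roots_for` and `compose` are hypothetical and do not reflect how any real Cangjie implementation stores its data.

```python
from typing import List, Optional

# Key -> root, taken from the example in the text (a real table covers every letter key).
ROOTS = {"A": "日", "B": "月"}

# Key sequence -> character. Only these three entries are grounded in the text.
CODES = {"A": "日", "B": "月", "AB": "明"}


def roots_for(keys: str) -> List[str]:
    """Return the root assigned to each key in the sequence."""
    return [ROOTS[k] for k in keys.upper()]


def compose(keys: str) -> Optional[str]:
    """Return the character for a key sequence, or None if the toy table lacks it."""
    return CODES.get(keys.upper())


if __name__ == "__main__":
    print(roots_for("ab"), "->", compose("ab"))
```

Because the lookup is keyed on the whole sequence, order matters: in this toy table "AB" yields a character while "BA" yields nothing.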
Despite its steeper learning curve, this method remains popular in Chinese communities that use traditional Chinese characters, such as Hong Kong and Taiwan. It allows very precise input, so users who are familiar with its fairly complicated rules can type quickly and efficiently. It was the first method that allowed users to enter more than a hundred Chinese characters per minute. Its popularity is also helped by its omnipresence on traditional Chinese computer systems: Chu gave up the patent in 1982, stating that the method should be a shared cultural asset, so developers of Chinese systems can adopt it freely and users do not face the hassle of it being absent on devices with Chinese support. [6] [7] Cangjie input programs supporting a large CJK character set have been developed. [8] [9] [10]
All methods have their strengths and weaknesses. The pinyin method can be learned rapidly but its maximum input rate is limited. The Wubi method takes longer to learn, but expert typists can enter text much more rapidly with it than with phonetic methods. However, Wubi is proprietary, and a version of it became freely available only after its inventor lost a patent lawsuit in 1997. [11]
Due to these complexities, there is no "standard" method.
In mainland China, pinyin methods such as Sogou Pinyin and Google Pinyin are the most popular. In Taiwan, Cangjie, Dayi, Boshiamy, and bopomofo predominate. In Hong Kong and Macau, Cangjie is most often taught in schools, while a few schools teach the CKC Chinese Input System. [12]
Other methods include handwriting recognition, OCR and speech recognition. The computer itself must first be "trained" before handwriting or speech recognition is used; that is, the new user enters the system in a special "learning mode" so that the system can learn to identify their handwriting or speech patterns. The latter two methods are used less frequently than keyboard-based input methods and suffer from relatively high error rates, especially when used without proper "training", though many users accept higher error rates as a trade-off.
The user enters pronunciations that are converted into relevant Chinese characters. The user must select the desired character from homophones, which are common in Chinese. Modern systems, such as Sogou Pinyin and Google Pinyin, predict the desired characters based on context and user preferences. For example, if one enters the sounds jicheng, the software will type 繼承 (to inherit), but if jichengche is entered, 計程車 (taxi) will appear.
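The jicheng / jichengche example can be sketched with a greedy longest-match lookup. This is a deliberately simplified stand-in: the text does not say how Sogou Pinyin or Google Pinyin rank candidates, and real systems are assumed to use much richer context and user-preference models. The dictionary holds only the two entries from the text, and the names `PHRASES` and `convert` are hypothetical.

```python
# Pinyin string -> ranked candidate list. Only the two entries from the text are included.
PHRASES = {
    "jicheng": ["繼承"],       # "to inherit"
    "jichengche": ["計程車"],  # "taxi"
}


def convert(pinyin: str) -> str:
    """Greedily replace the longest known pinyin prefix with its top candidate."""
    out = []
    i = 0
    while i < len(pinyin):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(pinyin), i, -1):
            candidates = PHRASES.get(pinyin[i:j])
            if candidates:
                out.append(candidates[0])
                i = j
                break
        else:
            # No entry starts here: pass the letter through unchanged.
            out.append(pinyin[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(convert("jicheng"))
    print(convert("jichengche"))
```

Preferring the longest match is what lets the extra syllable "che" change the whole result instead of appending to the shorter conversion.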
Various Chinese dialects complicate the system. Phonetic methods are mainly based on standard pinyin, Zhuyin/Bopomofo, and Jyutping in China, Taiwan, and Hong Kong, respectively. Input methods based on other varieties of Chinese, like Hakka or Minnan, also exist.
While the phonetic system is easy to learn, choosing appropriate Chinese characters slows typing speed. Most users report a typing speed of fifty characters per minute, though some reach over one hundred per minute. [13] With some phonetic IMEs (Input Method Editors), in addition to predictive input based on previous conversions, users can create custom dictionary entries for frequently used characters and phrases, potentially lowering the number of keystrokes required to evoke them.
Shuangpin (双拼; 雙拼), literally dual spell, is a stenographical phonetic input method based on hanyu pinyin that reduces the number of keystrokes for one Chinese character to two by distributing every vowel and consonant composed of more than one letter to a specific key. In most Shuangpin layout schemes such as Xiaohe, Microsoft 2003 and Ziranma, the most frequently used vowels are placed on the middle layer, reducing the risk of repetitive strain injury.
Shuangpin is supported by many pinyin input programs, including QQ, Microsoft Bing Pinyin, Sogou Pinyin and Google Pinyin.
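The two-keystroke idea behind Shuangpin can be sketched as follows: a syllable is split into an initial and a final, and any part written with more than one letter is mapped to a single key. The key assignments below are illustrative assumptions only and should not be read as an exact copy of Xiaohe, Microsoft 2003, Ziranma or any other published layout. The function name `encode_syllable` is hypothetical, and syllables without an initial consonant are not handled.

```python
# Multi-letter initials -> one key (illustrative assignments, not a published layout).
INITIAL_KEYS = {"zh": "v", "ch": "i", "sh": "u"}

# Multi-letter finals -> one key (illustrative assignments, not a published layout).
FINAL_KEYS = {"uang": "l", "ang": "h", "ong": "s", "in": "b", "ai": "d"}


def encode_syllable(syllable: str) -> str:
    """Encode one pinyin syllable (initial + final) as exactly two keystrokes."""
    for initial in ("zh", "ch", "sh"):
        if syllable.startswith(initial):
            final = syllable[len(initial):]
            break
    else:
        # Single-letter initial: everything after the first letter is the final.
        initial, final = syllable[0], syllable[1:]

    first = INITIAL_KEYS.get(initial, initial)
    second = FINAL_KEYS.get(final, final)
    if len(second) != 1:
        raise ValueError(f"final {final!r} is not in the toy table")
    return first + second


if __name__ == "__main__":
    for s in ("shuang", "pin", "ma"):
        print(s, "->", encode_syllable(s))
```

With this toy table, the six letters of "shuang" shrink to two keystrokes, while a syllable such as "ma" is already two letters and is typed unchanged.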
The Wubizixing input method, often abbreviated to simply Wubi or Wubi Xing, is a Chinese character input method primarily for inputting simplified Chinese and traditional Chinese text on a computer. Wubi should not be confused with the Wubihua (五笔画) method, which is a different input method that shares the categorization into five types of strokes.
The Linguistic Society of Hong Kong Cantonese Romanization Scheme, also known as Jyutping, is a romanisation system for Cantonese developed in 1993 by the Linguistic Society of Hong Kong (LSHK).
The Cangjie input method is a system for entering Chinese characters into a computer using a standard computer keyboard. In filenames and elsewhere, the name Cangjie is sometimes abbreviated as cj.
The four-corner method or four-corner system is a character-input method used for encoding Chinese characters into either a computer or a manual typewriter, using four or five numerical digits per character.
An input method is an operating system component or program that enables users to generate characters not natively available on their input devices by using sequences of characters that are available to them. Using an input method is usually necessary for languages that have more graphemes than there are keys on the keyboard.
Simplified Cangjie, known as Quick or Sucheng, is a stroke-based keyboard input method derived from the Cangjie IME but simplified with selection lists. Unlike full Cangjie, the user enters only the first and last keystrokes used in the Cangjie system, and then chooses the desired character from a pop-up list of candidate Chinese characters. This method is popular in Taiwan and Hong Kong, the latter in particular.
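The first-and-last-keystroke rule can be sketched by deriving a short code from each full Cangjie code and grouping characters that collide. The full codes in the table are assumptions supplied for illustration (in particular the code given for 晴); the helper names `quick_code`, `build_index` and `quick_candidates` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Character -> full Cangjie code. Treat these entries as illustrative assumptions.
CANGJIE = {"明": "AB", "晴": "AQMB", "日": "A", "月": "B"}


def quick_code(full_code: str) -> str:
    """Keep only the first and last keystrokes of a full Cangjie code."""
    if len(full_code) <= 2:
        return full_code
    return full_code[0] + full_code[-1]


def build_index(table: Dict[str, str]) -> Dict[str, List[str]]:
    """Group characters by their shortened code, preserving table order."""
    index = defaultdict(list)
    for char, code in table.items():
        index[quick_code(code)].append(char)
    return dict(index)


INDEX = build_index(CANGJIE)


def quick_candidates(keys: str) -> List[str]:
    """Return the candidate list shown to the user for the typed keys."""
    return INDEX.get(keys.upper(), [])


if __name__ == "__main__":
    print(quick_candidates("ab"))
```

Dropping the middle keystrokes makes codes shorter but less distinctive, which is why several characters can share one code and a selection list becomes necessary.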
The Stroke Count Method, Wubihua method, Stroke input method or Bihua IME is a relatively simple Chinese input method for writing text on a computer or a mobile phone. It is based on the stroke order of a word, not pronunciation. It uses five or six buttons, and is often placed on a numerical keypad. Although it is possible to input Traditional Chinese characters with this method, this method is often associated with Simplified Chinese characters. The Wubihua method should not be confused with the Wubi method.
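A stroke-based lookup of this kind can be sketched as prefix matching over stroke sequences. The numbering used here (1 horizontal, 2 vertical, 3 left-falling, 4 dot or right-falling, 5 turning) is an assumed keypad convention, and the small character table and the name `stroke_candidates` are hypothetical illustrations rather than data from any particular IME.

```python
from typing import List

# Assumed keypad convention for the five stroke types.
STROKE_KEYS = {"1": "横", "2": "竖", "3": "撇", "4": "点/捺", "5": "折"}

# Character -> stroke sequence in writing order (small illustrative sample).
STROKE_CODES = {
    "十": "12",
    "人": "34",
    "大": "134",
    "口": "251",
    "木": "1234",
}


def stroke_candidates(typed: str) -> List[str]:
    """Return characters whose stroke sequence starts with the typed keys."""
    if not typed or any(key not in STROKE_KEYS for key in typed):
        raise ValueError("input must be a non-empty string of digits 1-5")
    matches = [(code, ch) for ch, code in STROKE_CODES.items() if code.startswith(typed)]
    # Shorter sequences first, so characters that are nearly complete rank higher.
    matches.sort(key=lambda pair: (len(pair[0]), pair[0]))
    return [ch for _, ch in matches]


if __name__ == "__main__":
    print(stroke_candidates("1"))
    print(stroke_candidates("12"))
```

Each additional keystroke narrows the candidate list, so the user can stop typing as soon as the desired character appears.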
Dayi is a system for entering Chinese characters on a standard QWERTY keyboard using a set of 46 character components. A character is built by combining up to four of the 46 characters, using a system similar to that of Cangjie, but is decomposed in stroke order instead of in geometric shape in Cangjie.
Chu Bong-Foo is the inventor of Tsang-chieh (Cangjie), a widely used Chinese input method. His renowned input method, created in 1976 and released into the public domain in 1982, has sped up the computerization of Chinese society. Chu spent his childhood in Taiwan, and has worked in Brazil, the United States, Taiwan, Shenzhen and Macau.
The pinyin method refers to a family of input methods based on the pinyin method of romanization.
OpenVanilla (OV) is a free, open-source text-entry and processing architecture. It includes a collection of popular input methods and text processing filters, serving as a bridge between input methods and the operating system. It was originally designed to offer a better text-entry experience and alternative input methods that are not found in Apple's built-in set or that better suit the needs of Windows "switchers". However, the developers have since worked on a Microsoft Windows port and a bridge between OV and SCIM on the X Window System. The macOS version is compatible with Mac OS X 10.3 (Panther) and Mac OS X 10.4 (Tiger). OV's input methods can also be used through SCIM on Linux or FreeBSD. An experimental Win32 Unicode version is also available.
Google Pinyin IME is a discontinued input method developed by Google China Labs. The tool was made publicly available on April 4, 2007. Aside from Pinyin input, it also includes stroke count method input. As of March 2019, Google Pinyin has been discontinued and the download page tools
Bopomofo, also called zhuyin or occasionally zhuyin fuhao, is a transliteration system for Standard Chinese and other Sinitic languages. It is commonly used in Taiwan. It consists of 37 characters and five tone marks, which together can transcribe all possible sounds in Mandarin Chinese.
Wang Yongmin is a Chinese programmer, who developed Wubi, a very fast input method for entering Chinese characters using a standard Latin keyboard. Currently he is the president of Wangma, a Beijing-based software development company.
The Intelligent Input Bus is an input method (IM) framework for multilingual input in Unix-like operating-systems. The name "Bus" comes from its bus-like architecture.
A keyboard layout is any specific physical, visual, or functional arrangement of the keys, legends, or key-meaning associations (respectively) of a computer keyboard, mobile phone, or other computer-controlled typographic keyboard.
Bengali input methods refer to different systems developed to type the characters of the Bengali script for Bengali language and others, using a typewriter or a computer keyboard.
The Biaoxingma Input Method, often abbreviated to Biaoxingma, is a shape-based Chinese character input method invented in the 1980s by Chen Aiwen, an overseas Chinese scholar living in France. Because its way of splitting Chinese characters is intuitive and has theoretical support in the study of Chinese characters, it attracted widespread attention when it was first introduced and was listed as a key project in the China Torch Project. However, it never matched the influence of the Wubi and Zhengma methods in terms of popularization and commercialization.
Chinese character IT is the information technology for computer processing of Chinese characters. While the English writing system uses a few dozen different characters, the Chinese language needs a much larger character set. There are over ten thousand characters in the Xinhua Dictionary. In the Unicode multilingual character set of 149,813 characters, 98,682 are Chinese. This makes computer processing of Chinese characters more demanding than that of most other writing systems.
Chinese computational linguistics is the scientific study and information processing of the Chinese language by means of computers. The purpose is to obtain a better understanding of how the language works and to bring more convenience to language applications. The term Chinese computational linguistics is often employed interchangeably with Chinese information processing, though the former may sound more theoretical while the latter more technical.