Stroke-based sorting, also called stroke-based ordering or stroke-based order, is one of the five sorting methods frequently used in modern Chinese dictionaries, the others being radical-based sorting, pinyin-based sorting, bopomofo and the four-corner method. [1] In addition to functioning as an independent sorting method, stroke-based sorting is often employed to support the other methods. [2] For example, in Xinhua Dictionary (新华字典), Xiandai Hanyu Cidian (现代汉语词典) and Oxford Chinese Dictionary, [3] stroke-based sorting is used to sort homophones in Pinyin sorting, while in radical-based sorting it helps to sort the radical list, the characters under a common radical, and the list of characters that are difficult to look up by radical.
In stroke-based sorting, Chinese characters are ordered by features of their strokes, such as stroke counts, stroke forms, stroke orders, stroke combinations and stroke positions. [4]
Stroke-count sorting arranges characters in ascending order of their stroke counts. A character with fewer strokes is put before one with more strokes. For example, the different characters in "漢字筆畫, 汉字笔画" (Chinese character strokes) are sorted into "汉(5)字(6)画(8)笔(10)[筆(12)畫(12)]漢(14)", where stroke counts are given in brackets. (Note that 筆 and 畫 both have 12 strokes, so their relative order cannot be determined by stroke-count sorting.)
Stroke-count sorting was first used in Zihui to arrange the radicals and the characters under each radical when the dictionary was published in 1615. [5] It was also used in Kangxi Chinese Character Dictionary when the dictionary was first compiled in the 1710s. [5]
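The ordering described above can be sketched in a few lines of Python. This is only an illustration: the `STROKE_COUNT` table is a hypothetical lookup holding just the seven counts quoted in the example, and a real implementation would need a complete stroke-count table. Because Python's `sorted` is stable, 筆 and 畫 simply keep their input order, which reflects the fact that stroke count alone does not decide between them.

```python
# Hypothetical lookup table: stroke counts taken from the example in the text.
STROKE_COUNT = {"汉": 5, "字": 6, "画": 8, "笔": 10, "筆": 12, "畫": 12, "漢": 14}


def sort_by_stroke_count(text):
    """Return the distinct characters of `text` in ascending stroke-count order."""
    distinct = list(dict.fromkeys(text))  # drop repeats, keep first-seen order
    # sorted() is stable: characters with equal counts keep their input order.
    return sorted(distinct, key=lambda ch: STROKE_COUNT[ch])


result = "".join(sort_by_stroke_count("漢字筆畫汉字笔画"))
print(result)
```

Swapping 筆 and 畫 in the input swaps them in the output as well, since the key contains nothing that could separate them.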
Stroke-count-stroke-order sorting combines stroke-count sorting with stroke-order sorting. Characters are first arranged by stroke count in ascending order. Stroke-order sorting is then used to order characters with the same number of strokes: the characters are first arranged by their first strokes according to an order of stroke form groups, such as “heng (横, ㇐), shu (竖, ㇑), pie (撇, ㇓), dian (点, ㇔), zhe (折, ㇕)”, or “dian (点), heng (横), shu (竖), pie (撇), zhe (折)”. If the first strokes of two characters belong to the same group, the characters are compared by their second strokes in the same way, and so on.
In the stroke-count example above, 筆 and 畫 both have 12 strokes. 筆 starts with the stroke "㇓" of the pie (撇) group, and 畫 starts with "㇕" of the zhe (折) group. Pie comes before zhe in the group order, so 筆 comes before 畫. Hence the different characters in "汉字笔画, 漢字筆畫" are finally sorted into "汉(5)字(6)画(8)笔(10)筆(12㇓)畫(12㇕)漢(14)", where each character has a unique position.
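The two-level comparison can be sketched as a compound sort key: stroke count first, then the sequence of stroke-group ranks. The sketch below assumes the heng, shu, pie, dian, zhe group order quoted above. The `CHAR_INFO` table is hypothetical and deliberately truncated: it records only each character's first stroke group, which is enough to separate the characters in this small example but not in general.

```python
# Group order assumed here: heng, shu, pie, dian, zhe (one of the orders quoted above).
GROUP_RANK = {name: i for i, name in enumerate(["heng", "shu", "pie", "dian", "zhe"])}

# Hypothetical, truncated table: (stroke count, stroke groups in writing order).
# Only the first stroke group of each character is listed, for illustration.
CHAR_INFO = {
    "汉": (5, ["dian"]),
    "字": (6, ["dian"]),
    "画": (8, ["heng"]),
    "笔": (10, ["pie"]),
    "筆": (12, ["pie"]),
    "畫": (12, ["zhe"]),
    "漢": (14, ["dian"]),
}


def count_then_order_key(ch):
    """Sort key: stroke count first, then the ranks of the stroke groups."""
    count, groups = CHAR_INFO[ch]
    return (count, [GROUP_RANK[g] for g in groups])


ordered = "".join(sorted("畫筆漢画笔字汉", key=count_then_order_key))
print(ordered)
```

Here 畫 precedes 筆 in the input, yet 筆 comes out first, because the tie on 12 strokes is broken by pie ranking before zhe.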
Stroke-count-stroke-order sorting was used in Xinhua Dictionary and Xiandai Hanyu Cidian before the national standard for stroke-based sorting was released in 1999.
The Standard of GB13000.1 Character Set Chinese Character Order (Stroke-Based Order) (GB13000.1字符集汉字字序(笔画序)规范) [6] is a standard for sorting Chinese characters by strokes, released by the National Language Commission of China in 1999. It is an enhanced version of the traditional stroke-count–stroke-order sorting.
According to this standard,
This standard has been employed by the new editions of Xinhua Dictionary [7] and Xiandai Hanyu Cidian. [8]
YES is a simplified stroke-based sorting method that requires neither stroke counting nor stroke grouping, without compromise in accuracy. In brief, YES arranges Chinese characters according to their stroke orders and an "alphabet" of 30 strokes:
㇐ ㇕ ㇅ ㇎ ㇡ ㇋ ㇊ ㇍ ㇈ ㇆ ㇇ ㇌ ㇀ ㇑ ㇗ ㇞ ㇉ ㄣ ㇙ ㇄ ㇟ ㇚ ㇓ ㇜ ㇛ ㇢ ㇔ ㇏ ㇂
built on the basis of Unicode CJK strokes. [9] [10]
To compare the sort order of two characters, one expands each character into a string of strokes and compares the strings using the order of the 30 strokes, much as one orders two words in a dictionary using the order of the letters. Equivalently, one first checks whether the first strokes differ (for example, because 汉 starts with ㇔ and 笔 starts with ㇓, 笔 sorts before 汉); if they are identical, one moves on to the second stroke (for example, 汉 expands to ㇔㇔... and 字 expands to ㇔㇑..., hence 字 sorts before 汉).
The YES order of the different characters in "汉字笔画, 漢字筆畫" is "画畫筆笔字漢汉", where each character has a unique position.
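The comparison amounts to an ordinary lexicographic sort under a custom alphabet, and can be sketched as follows. The alphabet string copies the strokes exactly as they are listed above. The `STROKES` table is hypothetical and holds only the leading strokes mentioned in this article (truncated prefixes, not full stroke orders), so 笔 and 漢, which need longer expansions to be separated from their neighbours, are left out.

```python
# Stroke alphabet copied from the list above, in the order given there.
YES_ALPHABET = "㇐ ㇕ ㇅ ㇎ ㇡ ㇋ ㇊ ㇍ ㇈ ㇆ ㇇ ㇌ ㇀ ㇑ ㇗ ㇞ ㇉ ㄣ ㇙ ㇄ ㇟ ㇚ ㇓ ㇜ ㇛ ㇢ ㇔ ㇏ ㇂".split()
RANK = {stroke: i for i, stroke in enumerate(YES_ALPHABET)}

# Hypothetical, truncated expansions: only the leading strokes named in the article.
STROKES = {
    "画": "㇐",
    "畫": "㇕",
    "筆": "㇓",
    "字": "㇔㇑",
    "汉": "㇔㇔",
}


def yes_key(ch):
    """Map a character to the list of alphabet ranks of its strokes."""
    return [RANK[s] for s in STROKES[ch]]


# Lists of ranks compare element by element, like words in a dictionary.
yes_sorted = "".join(sorted("汉字筆畫画", key=yes_key))
print(yes_sorted)
```

With full stroke expansions in `STROKES`, the same key function would order any set of characters without counting strokes or assigning them to groups.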
YES sorting has been applied to the indexing of all the characters in Xinhua Zidian and Xiandai Hanyu Cidian. [10]
All of the aforementioned examples describe the sorting of single characters. To sort two words that consist of multiple characters:
This method is used in the YES-CEDICT Chinese Dictionary, using YES for character comparison. [11]
The four-corner method or four-corner system is a character-input method used for encoding Chinese characters into either a computer or a manual typewriter, using four or five numerical digits per character.
The Stroke Count Method, Wubihua method, Stroke input method or Bihua IME is a relatively simple Chinese input method for writing text on a computer or a mobile phone. It is based on the stroke order of a word, not pronunciation. It uses five or six buttons, and is often placed on a numerical keypad. Although it is possible to input Traditional Chinese characters with this method, this method is often associated with Simplified Chinese characters. The Wubihua method should not be confused with the Wubi method.
The Kangxi radicals, also known as Zihui radicals, are a set of 214 radicals that were collated in the 18th-century Kangxi Dictionary to aid categorization of Chinese characters. They are primarily sorted by stroke count. They are the most popular system of radicals for dictionaries that order characters by radical and stroke count. They are encoded in Unicode alongside other CJK characters, under the block "Kangxi radicals", while graphical variants are included in the block "CJK Radicals Supplement".
Xiandai Hanyu Cidian, also known as A Dictionary of Current Chinese or Contemporary Chinese Dictionary, is an important one-volume dictionary of Standard Mandarin Chinese published by the Commercial Press, now in its 7th (2016) edition. It was originally edited by Lü Shuxiang and Ding Shengshu as a reference work on modern Standard Mandarin Chinese. Compilation started in 1958 and trial editions were issued in 1960 and 1965, with a number of copies printed in 1973 for internal circulation and comments, but due to the Cultural Revolution the final draft was not completed until the end of 1977, and the first formal edition was not published until December 1978. It was the first People's Republic of China dictionary to be arranged according to Hanyu Pinyin, the phonetic standard for Standard Mandarin Chinese, with explanatory notes in simplified Chinese. The second through seventh editions were published in 1983, 1996, 2002, 2005, 2012 and 2016 respectively.
The Table of Indexing Chinese Character Components is a lexicographic tool used to order the Chinese characters in mainland China. The specification is also known as GF 0011-2009.
Xiandai Hanyu Guifan Cidian is a dictionary of Standard Chinese created as part of a proposal in the Eighth Five-year Plan of China. It is similar to Xiandai Hanyu Cidian, but with notable divergences. The third edition has entries for 12,000 characters and 72,000 words, with over 80,000 example usages.
Modern Chinese characters are the Chinese characters used in modern languages, including Chinese, Japanese, Korean and Vietnamese. Chinese characters are composed of components, which are in turn composed of strokes. The 100 most frequently-used characters cover over 40% of modern Chinese texts. The 1000 most frequently-used characters cover approximately 90% of the texts. There are a variety of novel aspects of modern Chinese characters, including that of orthography, phonology, and semantics, as well as matters of collation and organization and statistical analysis, computer processing, and pedagogy.
Pinyin alphabetical order, also called Pinyin-based order, or Pinyin order in short, is a sound-based Chinese character sorting method which has been used for arrangement of entries in Xinhua Dictionary, Xiandai Hanyu Cidian, Oxford Chinese Dictionary and many other modern dictionaries. In this method, Chinese characters are arranged according to the order of the Latin alphabet adopted in "Chinese Pinyin Scheme".
The GB stroke-based order, full name GB13000.1 Character Set Chinese Character Order, is a standard released by the State Language Commission of China in 1999. It is the current national standard for stroke-based sorting, and has been applied to the arrangement of the List of Commonly Used Standard Chinese Characters (通用规范汉字表), and the new versions of Xinhua Zidian and Xiandai Hanyu Cidian, etc.
Chinese character order, or Chinese character indexing, Chinese character collation and Chinese character sorting, is the way in which a Chinese character set is sorted into a sequence for the convenience of information retrieval. It may also refer to the sequence so produced.
The YES stroke alphabetical order (一二三漢字筆順排檢法), also called YES stroke-order sorting, briefly YES order or YES sorting, is a Chinese character sorting method based on a stroke alphabet and stroke orders. It is a simplified stroke-based sorting method free of stroke counting and grouping.
Strokes are the smallest structural units making up written Chinese characters. In the act of writing, a stroke is defined as a movement of a writing instrument on a writing material surface, or the trace left on the surface from a discrete application of the writing implement. The modern sense of discretized strokes first came into being with the clerical script during the Han dynasty. In the regular script that emerged during the Tang dynasty—the most recent major style, highly studied for its aesthetics in East Asian calligraphy—individual strokes are discrete and highly regularized. By contrast, the ancient seal script has line terminals within characters that are often unclear, making them non-trivial to count.
Stroke number, or stroke count, is the number of strokes of a Chinese character. It may also refer to the number of different strokes in a Chinese character set. Stroke number plays an important role in Chinese character sorting, teaching and computer information processing.
Chinese character forms are the shapes and structures of Chinese characters. They are the physical carriers of written Chinese.
A Chinese character set is a group of Chinese characters. Since the size of a set is the number of elements in it, an introduction to Chinese character sets will also introduce the Chinese character numbers in them.
Stroke Orders of the Commonly Used Standard Chinese Characters is a language standard jointly published by the Ministry of Education and the National Language Commission of China in November 2020.
CJK Unified Ideographs (YES order) is a list of CJK Unified Ideographs sorted in YES order, a simpler alternative to the traditional Radical order employed in CJK Unified Ideographs (Unicode block), List of CJK Unified Ideographs, part 1, part 2, part 3, part 4.
Stroke orders of CJK Unified Ideographs (YES order) is a list of stroke orders of the CJK Unified Ideographs sorted in YES order, a simpler alternative to the traditional Radical order employed in CJK Unified Ideographs (Unicode block), List of CJK Unified Ideographs, part 1, part 2, part 3, part 4.
Stroke Order Standard of GB 13000.1 Character Set, full name GB 13000.1 Character Set Chinese Character Stroke Order Standard, is a Chinese national standard on the order of strokes in writing Chinese characters. It has stipulated the stroke orders of 20,902 CJK Unified Ideographs. This standard was promulgated by the State Language Commission on October 1, 1999 and implemented on January 1, 2000. It is applicable to Chinese character information processing, dictionary compilation, Chinese character teaching and research, etc.
Pianpangs are components in Chinese character internal structures. A compound character is normally divided into two pianpangs according to their relationship in sounds and meanings. Originally, the left side component of the character was called pian, and the right side pang. Nowadays, it is customary to refer to the left and right, upper and lower, outer and inner parts of compound characters as pianpangs.