Stroke-based sorting

Stroke-based sorting, also called stroke-based ordering or stroke-based order, is one of the five sorting methods frequently used in modern Chinese dictionaries, the others being radical-based sorting, pinyin-based sorting, bopomofo and the four-corner method. [1] In addition to functioning as an independent sorting method, stroke-based sorting is often employed to support the other methods. [2] For example, in Xinhua Dictionary (新华字典), Xiandai Hanyu Cidian (现代汉语词典) and Oxford Chinese Dictionary, [3] stroke-based sorting is used to order homophones within the pinyin-based arrangement, while within the radical-based arrangement it is used to order the radical list, the characters under a common radical, and the list of characters that are difficult to look up by radical.

In stroke-based sorting, Chinese characters are ordered by different features of strokes, including stroke counts, stroke forms, stroke orders, stroke combinations, stroke positions, etc. [4]

Stroke-count sorting

This method arranges characters in ascending order of their stroke counts: a character with fewer strokes is placed before one with more strokes. For example, the distinct characters in "漢字筆畫, 汉字笔画" (Chinese character strokes) are sorted into "汉(5)字(6)画(8)笔(10)[筆(12)畫(12)]漢(14)", where the stroke counts are given in brackets. Note that 筆 and 畫 both have 12 strokes, so their relative order cannot be determined by stroke-count sorting alone.
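As a minimal sketch of stroke-count sorting, the following Python snippet uses only the stroke counts quoted in the example above. The table name `STROKE_COUNT` and the function name are illustrative assumptions; a real system would load a complete stroke-count table.

```python
# Stroke counts taken from the example in the text; this tiny table is
# only illustrative, not a complete data source.
STROKE_COUNT = {"汉": 5, "字": 6, "画": 8, "笔": 10, "筆": 12, "畫": 12, "漢": 14}


def sort_by_stroke_count(chars):
    """Return the characters in ascending order of stroke count.

    Python's sort is stable, so characters with equal counts (筆 and 畫)
    simply keep their input order; the method itself does not rank them.
    """
    return sorted(chars, key=lambda ch: STROKE_COUNT[ch])


result = "".join(sort_by_stroke_count("漢字筆畫汉笔画"))
print(result)
```

Because the key is the stroke count alone, the position of 筆 relative to 畫 depends on the input order, which is exactly the ambiguity the text points out.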

Stroke-count sorting was first used in Zihui, published in 1615, to arrange the radicals and the characters under each radical. [5] It was also used in the Kangxi Chinese Character Dictionary when that dictionary was first compiled in the 1710s. [5]

Stroke-count-stroke-order sorting

This is a combination of stroke-count sorting and stroke-order sorting. Characters are first arranged by stroke count in ascending order. Stroke-order sorting is then used to order characters with the same number of strokes. These characters are first arranged by their first strokes according to an order of stroke form groups, such as “heng (横, ㇐), shu (竖, ㇑), pie (撇, ㇓), dian (点, ㇔), zhe (折, ㇕)”, or “dian (点), heng (横), shu (竖), pie (撇), zhe (折)”. If the first strokes of two characters belong to the same group, the characters are sorted by their second strokes in the same way, and so on.

In the example from the previous section, both 筆 and 畫 have 12 strokes. 筆 starts with the stroke "㇓" of the pie (撇) group and 畫 starts with "㇕" of the zhe (折) group; since pie precedes zhe in the group order, 筆 comes before 畫. Hence the distinct characters in "汉字笔画, 漢字筆畫" are finally sorted into "汉(5)字(6)画(8)笔(10)筆(12㇓)畫(12㇕)漢(14)", where each character occupies a unique position.
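The two-level comparison can be sketched in Python as a composite sort key. The snippet assumes the group order heng, shu, pie, dian, zhe, and it records only the first stroke of each character, which is enough to break the one tie in this example. The names `GROUP_ORDER`, `CHAR_DATA` and `sort_key` are hypothetical.

```python
# One of the group orders mentioned in the text: heng, shu, pie, dian, zhe.
GROUP_ORDER = ["heng", "shu", "pie", "dian", "zhe"]
GROUP_RANK = {name: rank for rank, name in enumerate(GROUP_ORDER)}

# (stroke count, stroke-group sequence). The sequences are truncated to the
# first stroke for brevity; a full table would list every stroke.
CHAR_DATA = {
    "汉": (5, ["dian"]),
    "字": (6, ["dian"]),
    "画": (8, ["heng"]),
    "笔": (10, ["pie"]),
    "筆": (12, ["pie"]),
    "畫": (12, ["zhe"]),
    "漢": (14, ["dian"]),
}


def sort_key(ch):
    """Key: stroke count first, then stroke groups compared stroke by stroke."""
    count, groups = CHAR_DATA[ch]
    return (count, [GROUP_RANK[g] for g in groups])


# 畫 is deliberately placed before 筆 in the input to show the tie-break.
ordered = "".join(sorted("漢字畫筆汉笔画", key=sort_key))
print(ordered)
```

Lists of ranks compare lexicographically in Python, which mirrors the "first stroke, then second stroke, and so on" rule.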

Stroke-count-stroke-order sorting was used in Xinhua Dictionary and Xiandai Hanyu Cidian before the national standard for stroke-based sorting was released in 1999.

GB stroke-based order

The Standard of GB13000.1 Character Set Chinese Character Order (Stroke-Based Order) (GB13000.1字符集汉字字序(笔画序)规范) [6] is a standard released by the National Language Commission of China in 1999 for sorting Chinese characters by strokes. It is an enhanced version of the traditional stroke-count-stroke-order sorting.

According to this standard,

  1. Two characters are first sorted by stroke counts.
  2. If they have the same stroke count, they are sorted by stroke order (using the five families of heng, shu, pie, dian and zhe).
  3. If they also have the same stroke order, they are sorted by the primary-secondary stroke order.
    • For example, 子 and 孑 each have three strokes and are written, in stroke-order, ㇐㇚㇐ and ㇐㇚㇀. ㇐ and ㇀ both belong to the heng family, so there is a tie under (2). Under (3), ㇐ is considered a primary stroke and sorts before the secondary stroke ㇀. As a result, 子 sorts before 孑.
  4. If two characters are of the same stroke count, stroke order and primary-secondary stroke, then sort them according to their modes of stroke combination. Stroke separation comes before stroke connection, and connection comes before stroke intersection.
    • For example, 八, 人, 乂 all have 2 strokes in the order of ㇓㇏. They sort in the order of 八, 人, 乂, because 八 has separated strokes, 人 has a simple connection, and 乂 has an intersection.
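The four rules can be read as a four-part sort key. The Python sketch below encodes only the examples given above. The stroke-to-family table, the treatment of every unlisted stroke as primary, and the labels for the combination modes are simplifying assumptions made for illustration and are not taken from the standard itself.

```python
# Rule 2: stroke family rank (heng, shu, pie, dian, zhe). Only the strokes
# needed for the examples are listed; the assignments are assumptions.
FAMILY = {"㇐": 0, "㇀": 0, "㇑": 1, "㇚": 1, "㇓": 2, "㇔": 3, "㇏": 3}

# Rule 3: secondary strokes sort after the primary stroke of their family.
# Strokes not listed here are treated as primary (rank 0) in this sketch.
SECONDARY = {"㇀": 1}

# Rule 4: separation < connection < intersection.
COMBINATION = {"separate": 0, "connected": 1, "intersecting": 2}

# Stroke sequences and combination modes as described in the text. The mode
# for 子 and 孑 is not given there and is never reached, so it is left None.
CHARS = {
    "八": ("㇓㇏", "separate"),
    "人": ("㇓㇏", "connected"),
    "乂": ("㇓㇏", "intersecting"),
    "子": ("㇐㇚㇐", None),
    "孑": ("㇐㇚㇀", None),
}


def gb_key(ch):
    """Build the (count, families, primary/secondary, combination) key."""
    strokes, mode = CHARS[ch]
    return (
        len(strokes),                            # rule 1: stroke count
        [FAMILY[s] for s in strokes],            # rule 2: stroke order
        [SECONDARY.get(s, 0) for s in strokes],  # rule 3: primary/secondary
        COMBINATION.get(mode, 0),                # rule 4: combination mode
    )


gb_sorted = "".join(sorted("乂孑人子八", key=gb_key))
print(gb_sorted)
```

Each later component of the tuple is consulted only when all earlier components are equal, which matches the order in which the rules are applied.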

This standard has been employed by the new editions of Xinhua Dictionary [7] and Xiandai Hanyu Cidian. [8]

YES sorting

YES is a simplified stroke-based sorting method free of stroke counting and grouping, without compromise in accuracy. Briefly, YES arranges Chinese characters according to their stroke orders and an "alphabet" of 30 strokes:

㇐ ㇕ ㇅ ㇎ ㇡ ㇋ ㇊ ㇍ ㇈ ㇆ ㇇ ㇌ [stroke "hpj", image only] ㇀ ㇑ ㇗ ㇞ ㇉ ㄣ ㇙ ㇄ ㇟ ㇚ ㇓ ㇜ ㇛ ㇢ ㇔ ㇏ ㇂

built on the basis of Unicode CJK strokes. [9] [10]

To compare the sort order of two characters, one expands each character into a string of strokes and compares the strings using the sort order of the 30 strokes, much as one compares two words in a dictionary using the order of the letters. Equivalently, one first checks whether the first strokes alone decide the order (for example, because 汉 starts with ㇔ and 笔 starts with ㇓, 笔 sorts before 汉); if the first strokes are identical, one moves on to the second strokes (for example, 汉 expands to ㇔㇔... and 字 expands to ㇔㇑..., hence 字 sorts before 汉), and so on.

The YES order of the distinct characters in "汉字笔画, 漢字筆畫" is "画畫筆笔字漢汉", where each character occupies a unique position.
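The comparison works like ordinary dictionary ordering over a custom alphabet, which the following Python sketch illustrates. It covers only five of the characters and uses only the leading strokes needed to separate them, so the stroke lists are deliberately truncated. The token "hpj" stands in for the one stroke that has no character in the list above, and all names are hypothetical.

```python
# The 30-stroke YES alphabet in order; "hpj" is a placeholder token for the
# stroke that appears only as an image in the list above.
YES_ALPHABET = (
    "㇐ ㇕ ㇅ ㇎ ㇡ ㇋ ㇊ ㇍ ㇈ ㇆ ㇇ ㇌ hpj ㇀ ㇑ "
    "㇗ ㇞ ㇉ ㄣ ㇙ ㇄ ㇟ ㇚ ㇓ ㇜ ㇛ ㇢ ㇔ ㇏ ㇂"
).split()
STROKE_RANK = {stroke: rank for rank, stroke in enumerate(YES_ALPHABET)}

# Leading strokes only (truncated); a real table would hold the complete
# stroke order of every character.
STROKE_PREFIX = {
    "画": ["㇐"],
    "畫": ["㇕"],
    "笔": ["㇓"],
    "字": ["㇔", "㇑"],
    "汉": ["㇔", "㇔"],
}


def yes_key(ch):
    """Map a character to the alphabet ranks of its strokes."""
    return [STROKE_RANK[s] for s in STROKE_PREFIX[ch]]


yes_sorted = "".join(sorted("汉字笔画畫", key=yes_key))
print(yes_sorted)
```

No stroke count appears in the key: the order comes entirely from comparing stroke sequences against the alphabet.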

YES sorting has been applied to the indexing of all the characters in Xinhua Zidian and Xiandai Hanyu Cidian. [10]

Word-sorting

All of the preceding examples describe the sorting of single characters. To sort two words that consist of multiple characters:

This method is used in the YES-CEDICT Chinese Dictionary, using YES for character comparison. [11]

References

  1. Su, Peicheng (苏培成) (2014). 现代汉字学纲要 (Essentials of Modern Chinese Characters) (in Chinese) (3rd ed.). Beijing: Commercial Press. pp. 189–207. ISBN 978-7-100-10440-1.
  2. Wang, Ning (王寧,鄒曉麗) (2003). 工具書 (Reference Books) (in Chinese). Hong Kong: 和平圖書有限公司. pp. 23–25. ISBN 962-238-363-7.
  3. Kleeman, Julie (and Harry Yu) (2010). Oxford Chinese Dictionary (牛津英漢-漢英詞典). Oxford: Oxford University Press. ISBN 978-0-19-920761-9.
  4. Su 2014, pp. 205–207.
  5. Su 2014, p. 187.
  6. National Language Commission of China (October 1, 1999). GB13000.1字符集汉字字序(笔画序)规范 (Standard of GB13000.1 Character Set Chinese Character Order (Stroke-Based Order)) (PDF) (in Chinese). Shanghai Education Press. ISBN 7-5320-6674-6.
  7. Language Institute, Chinese Academy of Social Sciences (2020). 新华字典 (Xinhua Dictionary) (in Chinese) (12th ed.). Beijing: Commercial Press. ISBN 978-7-100-17093-2.
  8. Language Institute, Chinese Academy of Social Sciences (2016). 现代汉语词典 (Modern Chinese Dictionary) (in Chinese) (7th ed.). Beijing: Commercial Press. ISBN 978-7-100-12450-8.
  9. "Unicode CJK Strokes" (PDF). The Unicode Standard. Retrieved 2023-06-21.
  10. Zhang, Xiaoheng et al. (张小衡, 李笑通) (2013). 一二三笔顺检字手册 (Handbook of the YES Sorting Method) (in Chinese). Beijing: 语文出版社 (The Language Press). ISBN 978-7-80241-670-3.
  11. Zhang, X. (Li, X. and Lin, S.) (2015b). "A Brief Introduction to the YES-CEDICT Chinese Dictionary (《一二三汉英大词典》简介)". The Journal of Modernization of Chinese Language Education (中文教学现代化学报). 4 (2015) (1): 27–31.