CEDICT


The CEDICT project was started by Paul Denisowski in 1997 and is maintained by a team on mdbg.net under the name CC-CEDICT. Its aim is to provide a complete Chinese-to-English dictionary, with pinyin pronunciations for the Chinese characters.


Content

CEDICT is a plain text file; other programs (or simply a text editor such as Notepad, or a search tool such as egrep) are needed to search and display it. Several other Chinese–English projects use its data. The Unihan Database uses CEDICT data for most of its information about character compounds, but this is auxiliary and is explicitly not part of the main Unicode database. [1]

Features:

The basic format of a CEDICT entry is:

Traditional Simplified [pin1 yin1] /American English equivalent 1/equivalent 2/

For example:

漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/
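Because every entry follows the same space-and-slash layout, a single regular expression is enough to split a line into its fields. The sketch below assumes one entry per line and treats lines starting with "#" (CC-CEDICT distributions begin with such header lines) as comments; the `Entry` class and `parse_line` function are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# "Traditional Simplified [pinyin] /definition 1/definition 2/"
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


@dataclass
class Entry:
    traditional: str
    simplified: str
    pinyin: str           # tone-numbered, e.g. "han4 zi4"
    definitions: list[str]


def parse_line(line: str) -> Entry | None:
    """Parse one CEDICT line; return None for comments or malformed lines."""
    if line.startswith("#"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    trad, simp, pinyin, defs = match.groups()
    # The greedy (.*) stops at the final "/", so splitting on "/" yields
    # one string per English equivalent (plus extras such as "CL:個|个").
    return Entry(trad, simp, pinyin, defs.split("/"))
```

Trailing newlines are tolerated by the `\s*$` at the end of the pattern, so lines read straight from a file can be passed in unchanged.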

Example of a simple egrep search:

$ egrep -i 有勇無謀 cedict.txt
有勇無謀 有勇无谋 [you3 yong3 wu2 mou2] /bold but not very astute/
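The same kind of lookup can be done without egrep. The sketch below is a rough equivalent of `egrep -i`, with one simplification: it matches a fixed, case-insensitive string rather than a regular expression. Skipping "#" header lines and the name `search` are assumptions made for illustration.

```python
from typing import Iterable, Iterator


def search(lines: Iterable[str], term: str) -> Iterator[str]:
    """Yield entry lines that contain `term`, ignoring case.

    Unlike egrep, `term` is a plain substring, not a regular expression.
    """
    needle = term.casefold()
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        if needle in line.casefold():
            yield line.rstrip("\n")
```

Since a text file object iterates line by line, `search(open("cedict.txt", encoding="utf-8"), "有勇無謀")` streams through the dictionary without loading it all into memory.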

History

Year  Event
1991  The EDICT Japanese dictionary project was started by Jim Breen.
1997  The CEDICT project was started by Paul Denisowski, on the model of EDICT. It was later continued by Erik Peterson.
2007  MDBG started a new project, CC-CEDICT, which continues the CEDICT project under a new license, the Creative Commons Attribution-Share Alike 3.0 License, allowing more projects to use it. [3] In addition, a workflow was set up to streamline submitting, reviewing and processing new entries.

CEDICT has served as a model for several other projects:

Related Research Articles

Hanyu Pinyin, or simply pinyin, is the most common romanization system for Standard Chinese. In official documents, it is referred to as the Chinese Phonetic Alphabet. It is the official system in China and Singapore and is used by the United Nations. It has become the usual way to transliterate Standard Chinese in most regions, though it is less ubiquitous in Taiwan. It is used to teach Standard Chinese, normally written with Chinese characters, to students already familiar with the Latin alphabet. The system uses diacritics to indicate the four tones of Standard Chinese, though these are often omitted in various contexts, such as when spelling Chinese names in non-Chinese texts, or when writing non-Chinese words in Chinese-language texts. Pinyin is also used by various computer input methods and to order entries in some Chinese dictionaries. The word Hànyǔ literally means 'Han language' (that is, the Chinese language), while pīnyīn (拼音) literally means 'spelled sounds'.
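CEDICT writes tones as trailing digits (pin1 yin1) rather than diacritics, so displaying an entry usually involves converting one form to the other. The sketch below assumes CEDICT's conventions of "u:" for ü and 5 for the neutral tone, and uses the common placement rule: a or e takes the mark, o takes it in "ou", otherwise the last vowel does. The function names are hypothetical.

```python
# Marked forms of each vowel for tones 1-4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def syllable_to_marked(syllable: str) -> str:
    """Convert one numbered syllable (e.g. "han4") to tone-marked pinyin."""
    if not syllable or not syllable[-1].isdigit():
        return syllable
    tone = int(syllable[-1])
    body = syllable[:-1].replace("u:", "ü").replace("U:", "Ü")
    if tone not in (1, 2, 3, 4):
        return body  # neutral tone (5) carries no mark
    lower = body.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("o")
    else:
        # Otherwise mark the last vowel (handles "iu" and "ui").
        idx = max(lower.rfind(v) for v in "iouü")
        if idx == -1:
            return body  # no vowel to mark
    marked = TONE_MARKS[body[idx].lower()][tone - 1]
    if body[idx].isupper():
        marked = marked.upper()
    return body[:idx] + marked + body[idx + 1:]


def numbered_to_marked(pinyin: str) -> str:
    """Convert a space-separated CEDICT pinyin string, syllable by syllable."""
    return " ".join(syllable_to_marked(s) for s in pinyin.split())
```

For instance, `numbered_to_marked("you3 yong3 wu2 mou2")` gives "yǒu yǒng wú móu". The reverse direction (diacritics to numbers) is often needed by input methods and is not covered here.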

Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a feature shared in common by written Chinese (hanzi), Japanese (kanji), Korean (hanja) and Vietnamese.

Simplified characters are one of two standardized character sets widely used to write contemporary Chinese languages, along with traditional characters. Their officialization during the 20th century was part of an initiative by the People's Republic of China to promote literacy, and their use in ordinary circumstances on the mainland has been encouraged by the Chinese government since the 1950s. They are the official forms used in mainland China, Malaysia and Singapore, while traditional characters are officially used in Hong Kong, Macau, and Taiwan.

Jyutping: Romanization scheme for Cantonese

The Linguistic Society of Hong Kong Cantonese Romanization Scheme, also known as Jyutping, is a romanisation system for Cantonese developed in 1993 by the Linguistic Society of Hong Kong (LSHK).

Guangyun: Chinese rime dictionary compiled during the Song dynasty

The Guangyun is a Chinese rime dictionary that was compiled from 1007 to 1008 under the patronage of Emperor Zhenzong of Song. Its full name was Dà Sòng chóngxiū guǎngyùn. Chen Pengnian and Qiu Yong (邱雍) were the chief editors.

Guangdong Romanization refers to the four romanization schemes published by the Guangdong Provincial Education Department in 1960 for transliterating Cantonese, Teochew, Hakka and Hainanese. The schemes utilized similar elements with some differences in order to adapt to their respective spoken varieties.

Yi script: Script used to write the Yi languages

The Yi scripts are two scripts used to write the Yi languages: Classical Yi and the later Yi syllabary. The script is historically known in Chinese as Cuan Wen or Wei Shu, among various other names (夷字、倮語、倮倮文、畢摩文), including "tadpole writing" (蝌蚪文).

Chinese dictionary

A Chinese dictionary is a reference work for the Chinese language. There are two main types of Chinese dictionaries: zidian, which list individual Chinese characters and their definitions, and cidian, which list words and short phrases along with their meanings. Because written Chinese consists of tens of thousands of characters, over time editors of Chinese dictionaries have developed a number of ways to organize them for convenient reference.

The Yale romanization of Cantonese was developed by Gerard P. Kok for the textbook Speak Cantonese, which he wrote with Parker Po-fei Huang; it initially circulated in looseleaf form in 1952 and was published in 1958. Unlike the Yale romanization of Mandarin, it is still widely used in books and dictionaries, especially for foreign learners of Cantonese. It shares some similarities with Hanyu Pinyin in that unvoiced, unaspirated consonants are represented by letters traditionally used in English and most other European languages to represent voiced sounds. For example, [p] is represented as b in Yale, whereas its aspirated counterpart, [pʰ], is represented as p. Students attending The Chinese University of Hong Kong's New-Asia Yale-in-China Chinese Language Center are taught using Yale romanization.

Chinese telegraph code: Character encoding for messages with Chinese characters

The Chinese telegraph code, Chinese telegraphic code, or Chinese commercial code is a four-digit decimal code for electrically telegraphing messages written with Chinese characters.
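Encoding with such a code is a table lookup: each character becomes a four-digit group, and decoding reverses the table. In the sketch below the two sample entries (中 → 0022, 文 → 2429) follow a commonly quoted example and are meant only as an illustration; a real codebook covers thousands of characters and should be checked against an actual code table. The names `SAMPLE_CODEBOOK`, `encode` and `decode` are hypothetical.

```python
from __future__ import annotations

# Illustrative sample only; a full codebook maps thousands of characters.
SAMPLE_CODEBOOK = {"中": "0022", "文": "2429"}


def encode(text: str, codebook: dict[str, str]) -> str:
    """Replace each character with its four-digit code, space-separated.

    Raises KeyError for characters missing from the codebook.
    """
    return " ".join(codebook[ch] for ch in text)


def decode(codes: str, codebook: dict[str, str]) -> str:
    """Map space-separated four-digit groups back to characters."""
    reverse = {code: ch for ch, code in codebook.items()}
    return "".join(reverse[group] for group in codes.split())
```

Keeping every code exactly four decimal digits is what let operators transmit the groups with ordinary numeric telegraph equipment.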

Bopomofo (ㄅㄆㄇㄈ), also called Zhuyin and occasionally Mandarin Phonetic Symbols, is a Chinese transliteration and writing system for Mandarin Chinese and other related languages and dialects. Most commonly used for Taiwanese Mandarin, it may also be used to transcribe other varieties of Chinese, particularly other Mandarin dialects, as well as Taiwanese Hokkien. Consisting of 37 characters and five tone marks, it can transcribe all possible sounds in Mandarin.

The CCITT Chinese Primary Set is a multi-byte graphic character set for Chinese communications created for the Consultative Committee on International Telephone and Telegraph (CCITT) in 1992. It is defined in ITU T.101, annex C, which codifies Data Syntax 2 Videotex. It is registered with the ISO-IR registry for use with ISO/IEC 2022 as ISO-IR-165, and encodable in the ISO-2022-CN-EXT code version.

Radical 100: Chinese character radical

Radical 100 or radical life (生部), meaning "life", is one of the 23 Kangxi radicals composed of 5 strokes.
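Unicode encodes the 214 Kangxi radicals as separate symbols in the Kangxi Radicals block, starting at U+2F00 with radical 1, so radical n sits at U+2F00 + n − 1. The sketch below uses that offset and Python's standard `unicodedata` module; the helper name `kangxi_radical` is hypothetical.

```python
import unicodedata

KANGXI_RADICALS_START = 0x2F00  # U+2F00 is radical 1 ("one")


def kangxi_radical(number: int) -> str:
    """Return the Kangxi Radicals block character for radical 1-214."""
    if not 1 <= number <= 214:
        raise ValueError("Kangxi radical numbers run from 1 to 214")
    return chr(KANGXI_RADICALS_START + number - 1)


radical_life = kangxi_radical(100)              # U+2F63
ordinary = unicodedata.normalize("NFKC", radical_life)
```

The radical symbol and the ordinary ideograph 生 are distinct code points; NFKC normalization folds the radical symbol into the ideograph, which matters when searching text that may contain either form.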

The Table of Indexing Chinese Character Components is a lexicographic tool used to order the Chinese characters in mainland China. The specification is also known as GF 0011-2009.

JMdict is a large machine-readable multilingual Japanese dictionary. As of March 2023, it contains Japanese–English translations for around 199,000 entries, representing 282,000 unique headword-reading combinations. The dictionary files are free to use with attribution and have been widely adopted on the Internet and are used in many computer and smartphone applications. The project is considered a standard Japanese–English reference on the Internet and is used by the Unihan Database and several other Japanese–English projects.

CFDICT: Chinese–French dictionary

The CFDICT project was started by David Houstin in 2010 and is maintained by a team on Chine Informations. Its aim is to provide a complete Chinese-to-French dictionary, with pinyin pronunciations for the Chinese characters.

Liding, sometimes lixie, is the practice of rewriting ancient Chinese character forms in clerical or regular script. Liding is often used in Chinese textual studies.

Chinese character order, or Chinese character indexing, Chinese character collation and Chinese character sorting, is the way in which a Chinese character set is sorted into a sequence for the convenience of information retrieval. It may also refer to the sequence so produced. English dictionaries and indexes are normally arranged in alphabetical order for quick lookup. But Chinese is written in tens of thousands of different characters, not just dozens of letters in an alphabet, and that makes the sorting job much more challenging.
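One of several orderings used for Chinese (others rely on radicals and stroke counts) is by romanization. The sketch below orders CEDICT-style entries by their toneless pinyin letters and then by tone number, with tone 5 (neutral) last. This particular tie-breaking scheme is an assumption for illustration, not a standard collation; the function name `pinyin_sort_key` is hypothetical.

```python
from __future__ import annotations


def pinyin_sort_key(pinyin: str) -> tuple[tuple[str, ...], tuple[int, ...]]:
    """Sort key for tone-numbered pinyin: letters first, then tones."""
    syllables = pinyin.lower().split()
    letters = tuple(s.rstrip("012345") for s in syllables)
    # Syllables without a digit are treated as neutral tone (5).
    tones = tuple(int(s[-1]) if s[-1].isdigit() else 5 for s in syllables)
    return letters, tones


entries = [("汉字", "han4 zi4"), ("好", "hao3"), ("汉语", "han4 yu3"), ("还", "hai2")]
ordered = sorted(entries, key=lambda e: pinyin_sort_key(e[1]))
```

Here `ordered` lists 还, 汉语, 汉字, 好: hai sorts before han, and the two han words are separated by their second syllable. Homophones such as ma1, ma2, ma3 and ma5 fall back to tone order.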

Chinese character IT is the information technology for computer processing of Chinese characters. While the English writing system uses a few dozen different characters, the Chinese language needs a much larger character set: the Xinhua Dictionary alone contains over ten thousand characters. Of the 149,813 characters in the Unicode multilingual character set, 98,682 are Chinese. This makes computer processing of Chinese characters more demanding than for most other languages, and raises special issues in the computer input, internal encoding and output of Chinese characters.
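The internal-encoding issue shows up as soon as the same text is stored under different encodings. The sketch below uses Python's built-in codecs to compare byte counts for 汉字: UTF-8 needs three bytes for each of these characters, while UTF-16 and the legacy mainland GB2312 encoding each need two. The helper name `encoded_sizes` is hypothetical.

```python
from __future__ import annotations


def encoded_sizes(text: str,
                  codecs: tuple[str, ...] = ("utf-8", "utf-16-le", "gb2312")
                  ) -> dict[str, int]:
    """Return the number of bytes `text` occupies under each codec.

    Characters outside a codec's repertoire raise UnicodeEncodeError.
    """
    return {codec: len(text.encode(codec)) for codec in codecs}


sizes = encoded_sizes("汉字")
```

ASCII letters, by contrast, take a single byte in both UTF-8 and GB2312, so the relative cost of each encoding depends on how much of the text is Chinese.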

A Chinese character set is a group of Chinese characters. Because the size of a set is the number of characters it contains, describing a Chinese character set also involves stating how many characters it includes.

References

  1. "Unihan Database Lookup". unicode.org.
  2. "MDBG English to Chinese dictionary". www.mdbg.net.
  3. The original CEDICT license was for non-commercial use only, and did not allow entries to be added without permission.
  4. "CC-Canto - A Cantonese dictionary for everyone". cantonese.org.
  5. writecantonese8.wordpress.com. http://writecantonese8.wordpress.com/2012/02/04/cantonese-cedict-project/ : "Later, I was guided to merge data from Cantonese Stardict, which is an electronic version of “A Dictionary of Cantonese Slang”, into Cantonese CEDICT".
  6. "StarDict". Stardict.sourceforge.net. Retrieved 18 November 2011.