Chinese language


汉语; 漢語; Hànyǔ or 中文; Zhōngwén
Hànyǔ written in traditional (top) and simplified (middle) forms, Zhōngwén (bottom)
Native to: the Sinophone world (Mainland China, Taiwan, Singapore)
Native speakers: 1.35 billion (2017–2022) [1]
Recognised minority language in: Russia
Language codes
ISO 639-1: zh
ISO 639-2: chi (B), zho (T)
ISO 639-3: zho (inclusive code)
Individual codes:
cdo    Eastern Min
cjy    Jinyu
cmn    Mandarin
cpx    Pu-Xian Min
czh    Huizhou
czo    Central Min
gan    Gan
hak    Hakka
hsn    Xiang
mnp    Northern Min
nan    Southern Min
wuu    Wu
yue    Yue
csp   Southern Pinghua
cnp   Northern Pinghua
och    Old Chinese
ltc   Late Middle Chinese
lzh    Classical Chinese
Glottolog sini1245
Linguasphere 79-AAA
Map of the Chinese-speaking world
  Regions with a native Chinese-speaking majority.
  Regions with significant Chinese-speaking minorities.
  Regions where Chinese is not native but an official or educational language.

The sinologist Jerry Norman has estimated that there are hundreds of mutually unintelligible varieties of Chinese. [42] These varieties form a dialect continuum, in which differences in speech generally become more pronounced as distances increase, though the rate of change varies immensely. Generally, mountainous South China exhibits more linguistic diversity than the North China Plain. Until the late 20th century, Chinese emigrants to Southeast Asia and North America came from southeast coastal areas, where Min, Hakka, and Yue dialects are spoken. Specifically, most Chinese immigrants to North America until the mid-20th century spoke Taishanese, a variety of Yue from a small coastal area around Taishan, Guangdong. [43]

In parts of South China, the dialect of a major city may be only marginally intelligible to its neighbours. For example, Wuzhou and Taishan are located approximately 260 km (160 mi) and 190 km (120 mi) away from Guangzhou respectively. However, the Yue variety spoken in Wuzhou is more similar to the Guangzhou dialect than Taishanese is—while Wuzhou is located directly upstream from Guangzhou on the Pearl River, Taishan is to Guangzhou's southwest, with the two cities separated by several river valleys. [44] In parts of Fujian, the speech of some neighbouring counties or villages is mutually unintelligible. [45]


Range of dialect groups in China proper and Taiwan according to the Language Atlas of China

Local varieties of Chinese are conventionally classified into seven dialect groups (Mandarin, Wu, Gan, Xiang, Min, Hakka, and Yue), largely based on the different evolution of Middle Chinese voiced initials. [47] [48]

Proportions of first-language speakers [6]

   Mandarin (65.7%)
   Min (6.2%)
   Wu (6.1%)
   Yue (5.6%)
   Jin (5.2%)
   Gan (3.9%)
   Hakka (3.5%)
   Xiang (3.0%)
   Huizhou (0.3%)
   Pinghua, others (0.6%)

The classification of Li Rong, which is used in the Language Atlas of China (1987), distinguishes three further groups: [46] [49]

  • Jin, previously included in Mandarin.
  • Huizhou, previously included in Wu.
  • Pinghua, previously included in Yue.

Some varieties remain unclassified, including the Danzhou dialect on Hainan, Waxianghua spoken in western Hunan, and Shaozhou Tuhua spoken in northern Guangdong. [50]

Standard Chinese

Standard Chinese is the standard language of China (where it is called 普通话; pǔtōnghuà) and Taiwan, and one of the four official languages of Singapore (where it is called either 华语; 華語; Huáyǔ or 汉语; 漢語; Hànyǔ). Standard Chinese is based on the Beijing dialect of Mandarin. The governments of both China and Taiwan intend for speakers of all Chinese speech varieties to use it as a common language of communication. Therefore, it is used in government agencies, in the media, and as a language of instruction in schools.

Diglossia is common among Chinese speakers. For example, a Shanghai resident may speak both Standard Chinese and Shanghainese; if they grew up elsewhere, they are also likely fluent in the dialect of their home region. In addition to Standard Chinese, a majority of Taiwanese people also speak Taiwanese Hokkien (also called 台語; 'Taiwanese' [51] [52] ), Hakka, or an Austronesian language. [53] A speaker in Taiwan may mix pronunciations and vocabulary from Standard Chinese and other languages of Taiwan in everyday speech. [54] In part due to traditional cultural ties with Guangdong, Cantonese is used as an everyday language in Hong Kong and Macau.


The designation of various Chinese branches remains controversial. Some linguists and most ordinary Chinese people consider all the spoken varieties as one single language, as speakers share a common national identity and a common written form. [55] Others instead argue that it is inappropriate to refer to major branches of Chinese such as Mandarin, Wu and so on as "dialects" because the mutual unintelligibility between them is too great. [56] [57] However, calling major Chinese branches "languages" would also be wrong under the same criterion, since a branch such as Wu itself contains many mutually unintelligible varieties and could not properly be called a single language. [42]

Others point out that linguists often set mutual intelligibility aside when varieties are each intelligible with a central variety (i.e. a prestige variety, such as Standard Mandarin), since the issue requires careful handling when mutual intelligibility is inconsistent with language identity. [58]

The Chinese government's official Chinese designation for the major branches of Chinese is 方言; fāngyán; 'regional speech', whereas the more closely related varieties within these are called 地点方言; 地點方言; dìdiǎn fāngyán; 'local speech'. [59]

Because of the difficulties involved in determining the difference between language and dialect, other terms have been proposed. These include topolect, [60] lect, [61] vernacular, [62] regional, [59] and variety. [63] [64]


A man speaking Mandarin with a Malaysian accent

Syllables in the Chinese languages have some unique characteristics. They are tightly related to the morphology and also to the characters of the writing system; and phonologically they are structured according to fixed rules.

The structure of each syllable consists of a nucleus that has a vowel (which can be a monophthong, diphthong, or even a triphthong in certain varieties), preceded by an onset (a single consonant, or consonant + glide; a zero onset is also possible), and followed (optionally) by a coda consonant; a syllable also carries a tone. There are some instances where a vowel is not used as a nucleus. An example of this is in Cantonese, where the nasal sonorant consonants /m/ and /ŋ/ can stand alone as their own syllable.

In Mandarin much more than in other spoken varieties, most syllables tend to be open syllables, meaning they have no coda (assuming that a final glide is not analyzed as a coda), but syllables that do have codas are restricted to nasals /m/, /n/, /ŋ/, the retroflex approximant /ɻ/, and voiceless stops /p/, /t/, /k/, or /ʔ/. Some varieties allow most of these codas, whereas others, such as Standard Chinese, are limited to only /n/, /ŋ/, and /ɻ/.

The number of sounds in the different spoken dialects varies, but in general there has been a tendency toward a reduction in sounds from Middle Chinese. The Mandarin dialects in particular have experienced a dramatic decrease in sounds and so have far more polysyllabic words than most other spoken varieties. As a result, the total number of syllables in some varieties is only about a thousand, including tonal variation, which is only about an eighth as many as in English. [lower-alpha 7]


All varieties of spoken Chinese use tones to distinguish words. [65] A few dialects of north China may have as few as three tones, while some dialects in south China have up to 6 or 12 tones, depending on how one counts. One exception is Shanghainese, which has reduced the set of tones to a two-toned pitch accent system much like that of modern Japanese.

A very common example used to illustrate the use of tones in Chinese is the application of the four tones of Standard Chinese, along with the neutral tone, to the syllable ma. The tones are exemplified by the following five Chinese words:

The syllable ma with each of the primary tones in Standard Chinese
Examples of the Standard Mandarin tones
Character | Gloss | Pinyin | Pitch contour
妈; 媽 | 'mother' | mā | high, level
麻 | 'hemp' | má | high, rising
马; 馬 | 'horse' | mǎ | low falling, then rising
骂; 罵 | 'scold' | mà | high falling
吗; 嗎 | INTR.PTC | ma | (varies) [lower-alpha 8]
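
The link between pinyin tone marks and the four tones of the syllable ma can be sketched in a few lines of Python. This is only an illustration, not a full pinyin parser: the function name `pinyin_tone` is hypothetical, and labelling the neutral tone as 5 is a numbering convention assumed here rather than something stated in the text.

```python
import unicodedata

# Combining diacritics used by Hanyu Pinyin, mapped to Standard Chinese tone numbers.
TONE_MARKS = {
    "\u0304": 1,  # macron: first tone
    "\u0301": 2,  # acute accent: second tone
    "\u030c": 3,  # caron: third tone
    "\u0300": 4,  # grave accent: fourth tone
}


def pinyin_tone(syllable: str) -> int:
    """Return the tone number of one pinyin syllable (5 = neutral, by assumption)."""
    # NFD normalization splits a letter such as 'ǎ' into 'a' plus a combining caron.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return 5  # no tone mark found: treat as the neutral tone


for s in ["mā", "má", "mǎ", "mà", "ma"]:
    print(s, pinyin_tone(s))
```

Decomposing with NFD first means the same lookup works for any vowel that carries the mark, not just a.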

In contrast, Standard Cantonese has six tones. Historically, finals that end in a stop consonant were considered to be "checked tones" and thus counted separately for a total of nine tones. However, they are considered to be duplicates in modern linguistics and are no longer counted as such: [66]

Examples of the Standard Cantonese tones
Character | Gloss | Jyutping | Yale | Pitch contour
诗; 詩 | 'poem' | si1 | sī | high, level; high, falling
史 | 'history' | si2 | sí | high, rising
弒 | 'assassinate' | si3 | si | mid, level
时; 時 | 'time' | si4 | sìh | low, falling
市 | 'market' | si5 | síh | low, rising
是 | 'yes' | si6 | sih | low, level
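
The difference between the six-tone and the older nine-tone count can be made concrete with a small sketch. It assumes Jyutping input with a trailing tone digit, and it assumes the common convention of numbering the checked tones 7, 8 and 9 (corresponding to modern tones 1, 3 and 6); the text above does not spell that numbering out, and the function name `tone_numbers` is hypothetical.

```python
# Assumed traditional numbering for "checked" syllables (those ending in a stop):
# modern tones 1, 3 and 6 were counted separately as tones 7, 8 and 9.
CHECKED_TONE_NUMBER = {1: 7, 3: 8, 6: 9}
STOP_CODAS = ("p", "t", "k")


def tone_numbers(jyutping: str) -> tuple[int, int]:
    """Return (six-tone number, traditional nine-tone number) for one syllable."""
    base, tone = jyutping[:-1], int(jyutping[-1])
    if base.endswith(STOP_CODAS):
        # Checked syllable: the older count treats its tone as a separate category.
        return tone, CHECKED_TONE_NUMBER.get(tone, tone)
    return tone, tone


for syllable in ["si1", "si4", "sik1", "sat6"]:
    print(syllable, tone_numbers(syllable))
```

Only syllables ending in a stop consonant are renumbered, which is why the modern analysis can treat the extra three tones as duplicates.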


Chinese is often described as a 'monosyllabic' language. However, this is only partially correct. It is largely accurate when describing Old and Middle Chinese; in Classical Chinese, around 90% of words consist of a single character that corresponds one-to-one with a morpheme, the smallest unit of meaning in a language. In modern varieties, it usually remains the case that morphemes are monosyllabic. In contrast, English has many multi-syllable morphemes, both bound and free, such as 'seven', 'elephant', 'para-' and '-able'. Some of the more conservative modern varieties, usually found in the south, have largely monosyllabic words, especially with basic vocabulary. However, most nouns, adjectives and verbs in modern Mandarin are disyllabic. A significant cause of this is phonological attrition: sound changes over time have steadily reduced the number of possible syllables in the language's inventory. In modern Mandarin, there are only around 1,200 possible syllables, including the tonal distinctions, compared with about 5,000 in Vietnamese (still a largely monosyllabic language), and over 8,000 in English. [lower-alpha 7]

Most modern varieties tend to form new words through polysyllabic compounds. In some cases, monosyllabic words have become disyllabic without compounding and are written with different characters, as in 窟窿; kūlong from 孔; kǒng; this is especially common in Jin varieties. This phonological collapse has led to a corresponding increase in the number of homophones. As an example, the small Langenscheidt Pocket Chinese Dictionary [67] lists six words that are commonly pronounced as shí in Standard Chinese:

Character | Gloss | MC [lower-alpha 9] | Cantonese
实; 實 | 'actual' | zyit | sat6
识; 識 | 'recognize' | dzyek | sik1
时; 時 | 'time' | dzyi | si4

In modern spoken Mandarin, however, tremendous ambiguity would result if all of these words could be used as-is. The 20th-century Yuen Ren Chao poem Lion-Eating Poet in the Stone Den exploits this, consisting of 92 characters all pronounced shi. As such, most of these words have been replaced in speech, if not in writing, with less ambiguous disyllabic compounds. Only the first one, 十 ('ten'), normally appears in monosyllabic form in spoken Mandarin; the rest are normally used in the polysyllabic forms of

实际; 實際; shíjì; 'actual-connection'
认识; 認識; rènshi; 'recognize-know'
石头; 石頭; shítou; 'stone-head'
时间; 時間; shíjiān; 'time-interval'

respectively. In each, the homophone was disambiguated by addition of another morpheme, typically either a near-synonym or some sort of generic word (e.g. 'head', 'thing'), the purpose of which is to indicate which of the possible meanings of the other, homophonic syllable is specifically meant.

However, when one of the above words forms part of a compound, the disambiguating syllable is generally dropped and the resulting word is still disyllabic. For example, 石; shí alone, and not 石头; 石頭; shítou, appears in compounds meaning 'stone', such as 石膏; shígāo; 'plaster', 石灰; shíhuī; 'lime', 石窟; shíkū; 'grotto', 石英; shíyīng; 'quartz', and 石油; shíyóu; 'petroleum'. Although many single-syllable morphemes (字; zì) can stand alone as individual words, they more often than not form multi-syllable compounds known as 词; 詞; cí, which more closely resemble the traditional Western notion of a word. A Chinese cí can consist of more than one character–morpheme, usually two, but there can be three or more.

Examples of Chinese words of more than two syllables include 汉堡包; 漢堡包; hànbǎobāo; 'hamburger', 守门员; 守門員; shǒuményuán; 'goalkeeper', and 电子邮件; 電子郵件; diànzǐyóujiàn; 'e-mail'.

All varieties of modern Chinese are analytic languages: they depend on syntax (word order and sentence structure), rather than inflectional morphology (changes in the form of a word), to indicate a word's function within a sentence. [68] In other words, Chinese has very few grammatical inflections: it possesses no tenses, no voices, no grammatical number, [lower-alpha 10] and only a few articles. [lower-alpha 11] Chinese varieties make heavy use of grammatical particles to indicate aspect and mood. In Mandarin, this involves the use of particles such as 了; le; 'PFV', 还; 還; hái; 'still', and 已经; 已經; yǐjīng; 'already'.

Chinese has a subject–verb–object word order, and like many other languages of East Asia, makes frequent use of the topic–comment construction to form sentences. Chinese also has an extensive system of classifiers and measure words, another trait shared with neighboring languages such as Japanese and Korean. Other notable grammatical features common to all the spoken varieties of Chinese include the use of serial verb construction, pronoun dropping and the related subject dropping. Although the grammars of the spoken varieties share many traits, they do possess differences.


The entire Chinese character corpus since antiquity comprises well over 50,000 characters, of which only roughly 10,000 are in use and only about 3,000 are frequently used in Chinese media and newspapers. [69] However, Chinese characters should not be confused with Chinese words. Because most Chinese words are made up of two or more characters, there are many more Chinese words than characters. A more accurate equivalent for a Chinese character is the morpheme, as characters represent the smallest grammatical units with individual meanings in the Chinese language.

Estimates of the total number of Chinese words and lexicalized phrases vary greatly. The Hanyu Da Zidian, a compendium of Chinese characters, includes 54,678 head entries for characters, including oracle bone versions. The Zhonghua Zihai (1994) contains 85,568 head entries for character definitions, and is the largest reference work based purely on characters and their literary variants. The CC-CEDICT project (2010) contains 97,404 contemporary entries including idioms, technology terms and names of political figures, businesses and products. The 2009 version of the Webster's Digital Chinese Dictionary (WDCD), [70] based on CC-CEDICT, contains over 84,000 entries.

The most comprehensive pure linguistic Chinese-language dictionary, the 12-volume Hanyu Da Cidian, records more than 23,000 head Chinese characters and gives over 370,000 definitions. The 1999 revised Cihai, a multi-volume encyclopedic dictionary reference work, gives 122,836 vocabulary entry definitions under 19,485 Chinese characters, including proper names, phrases and common zoological, geographical, sociological, scientific and technical terms.

The 2016 edition of Xiandai Hanyu Cidian, an authoritative one-volume dictionary on modern standard Chinese language as used in mainland China, has 13,000 head characters and defines 70,000 words.


Like many other languages, Chinese has absorbed a sizable number of loanwords from other cultures. Most Chinese words are formed out of native Chinese morphemes, including words describing imported objects and ideas. However, direct phonetic borrowing of foreign words has gone on since ancient times.

Some early Indo-European loanwords in Chinese have been proposed, notably 'honey' (蜜; mì), 'lion' (狮; 獅; shī), and perhaps 'horse' (马; 馬; mǎ), 'pig' (猪; 豬; zhū), 'dog' (犬; quǎn), and 'goose' (鹅; 鵝; é). [71] Ancient words borrowed from along the Silk Road during the Old Chinese period include 'grape' (葡萄; pútáo), 'pomegranate' (石榴; shíliú), and 'lion' (狮子; 獅子; shīzi). Some words were borrowed from Buddhist scriptures, including 'Buddha' (佛; Fó) and 'bodhisattva' (菩萨; 菩薩; Púsà). Other words came from nomadic peoples to the north, such as 'hutong' (胡同). Words borrowed from the peoples along the Silk Road, such as 'grape' (葡萄), generally have Persian etymologies. Buddhist terminology is generally derived from Sanskrit or Pali, the liturgical languages of northern India. Words borrowed from the nomadic tribes of the Gobi, Mongolian or northeast regions generally have Altaic etymologies, such as 琵琶 (pípá), the Chinese lute, or 'cheese or yogurt' (酪; lào), but from exactly which source is not always clear. [72]

Modern borrowings

Modern neologisms are primarily translated into Chinese in one of three ways: free translation (calques), phonetic translation (by sound), or a combination of the two. Today, it is much more common to use existing Chinese morphemes to coin new words to represent imported concepts, such as technical expressions and international scientific vocabulary, wherein the Latin and Greek components are usually converted one-for-one into the corresponding Chinese characters. The word 'telephone' was initially loaned phonetically as 德律风; 德律風 (délǜfēng; Shanghainese télífon [təlɪfoŋ]). This word was widely used in Shanghai during the 1920s, but the later 电话; 電話 (diànhuà; 'electric speech'), built out of native Chinese morphemes, became prevalent. Other examples include

电视; 電視 (diànshì; 'electric vision'): 'television'
电脑; 電腦 (diànnǎo; 'electric brain'): 'computer'
手机; 手機 (shǒujī; 'hand machine'): 'mobile phone'
蓝牙; 藍牙 (lányá; 'blue tooth'): 'Bluetooth'
网志; 網誌 (wǎngzhì; 'internet logbook') [lower-alpha 12]: 'blog'

Occasionally, compromises between the transliteration and translation approaches become accepted, such as 汉堡包; 漢堡包 (hànbǎobāo; 'hamburger') from 汉堡; 'Hamburg' + 包 ('bun'). Sometimes translations are designed so that they sound like the original while incorporating Chinese morphemes (phono-semantic matching), such as 马利奥; 馬利奧 (Mǎlì'ào) for the video game character 'Mario'. This is often done for commercial purposes, for example 奔腾; 奔騰 (bēnténg; 'dashing-leaping') for 'Pentium' and 赛百味; 賽百味 (Sàibǎiwèi; 'better-than hundred tastes') for 'Subway'.

Foreign words, mainly proper nouns, continue to enter the Chinese language by transcription according to their pronunciations. This is done by employing Chinese characters with similar pronunciations. For example, 'Israel' becomes 以色列 (Yǐsèliè), and 'Paris' becomes 巴黎 (Bālí). A rather small number of direct transliterations have survived as common words, including 沙发; 沙發 (shāfā; 'sofa'), 马达; 馬達 (mǎdá; 'motor'), 幽默 (yōumò; 'humor'), 逻辑; 邏輯 (luóji, luójí; 'logic'), 时髦; 時髦 (shímáo; 'smart (fashionable)'), and 歇斯底里 (xiēsīdǐlǐ; 'hysterics'). The bulk of these words were originally coined in Shanghai during the early 20th century, and later loaned from there into Mandarin, hence their Mandarin pronunciations occasionally being quite divergent from the English. For example, in Shanghainese 沙发; 沙發 (sofa) and 马达; 馬達 ('motor') sound more like their English counterparts. Cantonese differs from Mandarin with some transliterations, such as 梳化 (so1 faa3,2; 'sofa') and 摩打 (mo1 daa2; 'motor').

Western foreign words representing Western concepts have influenced Chinese since the 20th century through transcription. From French, 芭蕾 (bālěi) and 香槟; 香檳 (xiāngbīn) were borrowed for 'ballet' and 'champagne' respectively; 咖啡 (kāfēi) was borrowed from Italian caffè 'coffee'. The influence of English is particularly pronounced: from the early 20th century, many English words were borrowed into Shanghainese, such as 高尔夫; 高爾夫 (gāo'ěrfū; 'golf') and the aforementioned 沙发; 沙發 (shāfā; 'sofa'). Later, American soft power gave rise to 迪斯科 (dísīkē; 'disco'), 可乐; 可樂 (kělè; 'cola'), and 迷你 (mínǐ; 'miniskirt'). Contemporary colloquial Cantonese has distinct loanwords from English, such as 卡通 (kaa1 tung1; 'cartoon'), 基佬 (gei1 lou2; 'gay people'), 的士 (dik1 si6,2; 'taxi'), and 巴士 (baa1 si6,2; 'bus'). With the rising popularity of the Internet, there is a current vogue in China for coining English transliterations, for example, 粉丝; 粉絲 (fěnsī; 'fans'), 黑客 (hēikè; 'hacker'), and 博客 (bókè; 'blog'). In Taiwan, some of these transliterations are different, such as 駭客 (hàikè; 'hacker') and 部落格 (bùluògé; 'interconnected tribes') for 'blog'.

Another result of English influence on Chinese is the appearance of so-called 字母词; 字母詞 (zìmǔcí; 'lettered words') spelled with letters from the English alphabet. These have appeared in colloquial usage, as well as in magazines and newspapers, and on websites and television:

三G手机 ('third generation of cell phones'): 三 (sān; 'three') + G ('generation') + 手机 (shǒujī; 'cell phone')
IT界 ('IT circles'): IT + 界 (jiè; 'industry')
CIF价 ('Cost, Insurance, Freight'): CIF + 价 (jià; 'price')
e家庭: e ('electronic') + 家庭 (jiātíng; 'home')
W时代 ('wireless era'): W ('wireless') + 时代 (shídài; 'era')
TV族: TV ('television') + 族 (zú; 'clan')

Since the 20th century, another source of words has been kanji: Japan re-molded European concepts and inventions into 和製漢語 (wasei-kango; 'Japanese-made Chinese'), and many of these words have been re-loaned into modern Chinese. Other terms were coined by the Japanese by giving new senses to existing Chinese terms or by referring to expressions used in classical Chinese literature. For example, 经济; 經濟; jīngjì (経済, keizai in Japanese), which in the original Chinese meant 'the workings of the state', narrowed to 'economy' in Japanese; this narrowed definition was then reimported into Chinese. As a result, these terms are virtually indistinguishable from native Chinese words: indeed, there is some dispute over some of these terms as to whether the Japanese or Chinese coined them first. Through this loaning, Chinese, Korean, Japanese, and Vietnamese share a corpus of linguistic terms describing modern terminology, paralleling the similar corpus of terms built from Greco-Latin and shared among European languages.

Writing system

"Preface to the Poems Composed at the Orchid Pavilion" by Wang Xizhi, written in semi-cursive style XingshuLantingxv.jpg
"Preface to the Poems Composed at the Orchid Pavilion" by Wang Xizhi, written in semi-cursive style

Chinese orthography centers on Chinese characters, which are written within imaginary square blocks. These are traditionally arranged in vertical columns, read from top to bottom down a column and right to left across columns. Since the 20th century, the alternative arrangement in horizontal rows, read from left to right within a row and from top to bottom across rows (as in English and other Western writing systems), has become more popular. [73] Chinese characters denote morphemes independent of phonetic variation in different languages. Thus the character 一 ('one') is pronounced yī in Standard Chinese, yat1 in Cantonese and it in Hokkien, a form of Min.

Most modern written Chinese is in the form of written vernacular Chinese, based on spoken Standard Chinese, regardless of dialectal background. Written vernacular Chinese largely replaced Literary Chinese in the early 20th century as the country's standard written language. [74] However, vocabularies from different Chinese-speaking areas have diverged, and the divergence can be observed in written Chinese. [75]

Due to the divergence of variants, there are a number of unique morphemes that are not found in Standard Chinese. Characters rarely used in Standard Chinese have also been created or inherited from archaic literary standards to represent these unique morphemes. For example, characters like 冇 and 係 are actively used in Cantonese and Hakka, while being archaic or unused in standard written Chinese. The most prominent example of a non-Standard Chinese orthography is Written Cantonese, which is used in tabloids and on the internet among Cantonese speakers in Hong Kong and elsewhere. [76]

Chinese had no uniform system of phonetic transcription until the mid-20th century, although enunciation patterns were recorded in early rime books and dictionaries. Early Indian translators, working in Sanskrit and Pali, were the first to attempt to describe the sounds and enunciation patterns of Chinese in a foreign language. After the 15th century, the efforts of Jesuits and Western court missionaries resulted in some Latin character transcription/writing systems, based on various variants of Chinese languages. Some of these Latin character based systems are still being used to write various Chinese variants in the modern era. [77]

In Hunan, women in certain areas write their local Chinese language variant in Nüshu, a syllabary derived from Chinese characters. The Dungan language, considered by many a dialect of Mandarin, is nowadays written in Cyrillic, and was previously written in the Arabic script. The Dungan people are primarily Muslim and live mainly in Kazakhstan, Kyrgyzstan, and Russia; many Hui people, living mainly in China, also speak the language.

Chinese characters

永 (yǒng) is often used to illustrate the eight basic types of strokes of Chinese characters

Each Chinese character represents a monosyllabic Chinese word or morpheme. In 100 CE, the famed Han dynasty scholar Xu Shen classified characters into six categories: pictographs, simple ideographs, compound ideographs, phonetic loans, phonetic compounds and derivative characters. Only 4% were categorized as pictographs, including many of the simplest characters, such as 人 (rén; 'human'), 日 (rì; 'Sun'), 山 (shān; 'mountain'), and 水 (shuǐ; 'water'). Between 80% and 90% were classified as phonetic compounds such as 沖 (chōng; 'pour'), combining a phonetic component 中 (zhōng) with a semantic component of the radical 氵, a reduced form of 水; 'water'. Almost all characters created since have been made using this format. The 18th-century Kangxi Dictionary classified characters under a now-common set of 214 radicals.

Modern characters are styled after the regular script. Various other written styles are also used in Chinese calligraphy, including seal script, cursive script and clerical script. Calligraphy artists can write in Traditional and Simplified characters, but they tend to use Traditional characters for traditional art.

There are currently two systems for Chinese characters. Traditional characters, used in Hong Kong, Taiwan, Macau, and many overseas Chinese-speaking communities, largely take their form from received character forms dating back to the late Han dynasty and standardized during the Ming. Simplified characters, introduced by the PRC in 1954 to promote mass literacy, simplify most complex traditional glyphs to fewer strokes, many to common cursive shorthand variants. Singapore, which has a large Chinese community, was the second nation to officially adopt simplified characters, and they have also become the de facto standard for younger ethnic Chinese in Malaysia.

The Internet provides practice reading each of these systems, and most Chinese readers are capable of, if not necessarily comfortable with, reading the alternative system through experience and guesswork. [78]

A well-educated Chinese reader today recognizes approximately 4,000 to 6,000 characters; approximately 3,000 characters are required to read a mainland newspaper. The PRC defines literacy amongst workers as a knowledge of 2,000 characters, though this would be only functional literacy. School-children typically learn around 2,000 characters whereas scholars may memorize up to 10,000. [79] A large unabridged dictionary like the Kangxi dictionary contains over 40,000 characters, including obscure, variant, rare, and archaic characters; fewer than a quarter of these characters are now commonly used.


国语; 國語; Guóyǔ; 'National language' written in traditional and simplified forms, followed by various romanizations

Romanization is the process of transcribing a language into the Latin script. There are many systems of romanization for the Chinese varieties, due to the lack of a native phonetic transcription until modern times. Chinese is first known to have been written in Latin characters by Western Christian missionaries in the 16th century.

Today the most common romanization standard for Standard Mandarin is Hanyu Pinyin, introduced in 1956 by the PRC, and later adopted by Singapore and Taiwan. Pinyin is almost universally employed now for teaching standard spoken Chinese in schools and universities across the Americas, Australia, and Europe. Chinese parents also use Pinyin to teach their children the sounds and tones of new words. In school books that teach Chinese, the pinyin romanization is often shown below a picture of the thing the word represents, with the Chinese character alongside.

The second-most common romanization system, Wade–Giles, was invented by Thomas Wade in 1859 and modified by Herbert Giles in 1892. Because this system approximates the phonology of Mandarin Chinese using English consonants and vowels (it is largely an anglicization), it may be particularly helpful for beginner Chinese speakers of an English-speaking background. Wade–Giles was found in academic use in the United States, particularly before the 1980s, and until 2009 was widely used in Taiwan.

When used within European texts, the tone transcriptions in both pinyin and Wade–Giles are often left out for simplicity; Wade–Giles's extensive use of apostrophes is also usually omitted. Thus, most Western readers will be much more familiar with Beijing than they will be with Běijīng (pinyin), and with Taipei than T'ai2-pei3 (Wade–Giles). This simplification presents syllables as homophones when they are not, and therefore exaggerates the number of homophones by almost a factor of four.

For comparison:

Comparison of Mandarin romanizations
Characters | Wade–Giles | Pinyin | English name
中国; 中國 | Chung1-kuo2 | Zhōngguó | China
台湾; 台灣 | T'ai2-wan1 | Táiwān | Taiwan
北京 | Pei3-ching1 | Běijīng | Beijing
台北; 臺北 | T'ai2-pei3 | Táiběi | Taipei
孫文 | Sun1-wên2 | Sūn Wén | Sun Yat-sen
毛泽东; 毛澤東 | Mao2 Tse2-tung1 | Máo Zédōng | Mao Zedong
蒋介石; 蔣介石 | Chiang3 Chieh4-shih2 | Jiǎng Jièshí | Chiang Kai-shek
孔子 | K'ung3 Tsu3 | Kǒngzǐ | Confucius
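
The loss of information when tone marks are dropped, described before the comparison table, can be illustrated with a short sketch. It is a minimal example that assumes the input is Hanyu Pinyin written with the four standard tone diacritics; the helper name `strip_tone_marks` is hypothetical.

```python
import unicodedata

# The four combining marks pinyin uses for tones: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone_marks(pinyin: str) -> str:
    """Remove pinyin tone diacritics while keeping other marks, such as the dots on ü."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that 'u' plus a combining diaeresis becomes 'ü' again.
    return unicodedata.normalize("NFC", kept)


print(strip_tone_marks("Běijīng"))
# Four distinct toned syllables collapse into a single toneless spelling.
print({strip_tone_marks(s) for s in ["mā", "má", "mǎ", "mà"]})
```

Four syllables that differ only in tone become one spelling, which is the source of the apparent fourfold increase in homophones.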

Other systems include Gwoyeu Romatzyh, the French EFEO, the Yale system (invented for use by US troops during World War II), as well as distinct systems for the phonetic requirements of Cantonese, Min Nan, Hakka, and other varieties.

Other phonetic transcriptions

Chinese varieties have been phonetically transcribed into many other writing systems over the centuries. The 'Phags-pa script, for example, has been very helpful in reconstructing the pronunciations of premodern forms of Chinese. Bopomofo (or zhuyin) is a semi-syllabary that is still widely used in Taiwan to aid standard pronunciation. There are also at least two systems of cyrillization for Chinese. The most widespread is the Palladius system.

As a foreign language

Yang Lingfu, former curator of the National Museum of China, giving Chinese language instruction at the Civil Affairs Staging Area in 1945

With the growing importance and influence of China's economy globally, Standard Chinese instruction has been gaining popularity in schools throughout East Asia, Southeast Asia, and the Western world. [80]

Besides Mandarin, Cantonese is the only other Chinese language that is widely taught as a foreign language, largely due to the economic and cultural influence of Hong Kong and its widespread usage among significant Overseas Chinese communities. [81]

In 1991, 2,000 foreign learners took China's official Chinese Proficiency Test, the Hanyu Shuiping Kaoshi (HSK), which is comparable to the English Cambridge Certificate. By 2005 the number of candidates had risen sharply to 117,660, [82] and by 2010 it had reached 750,000. [83]



  1. The colloquial layers of many varieties, particularly Min varieties, reflect features that predate Middle Chinese. [2] [3]
  2. De facto spoken language—while no specific variety of Chinese is official in Hong Kong and Macau, Cantonese is the predominant spoken form and the de facto regional spoken standard. The Hong Kong government promotes trilingualism between Cantonese, Mandarin, and English; while the Macau government promotes quadrilingualism between Cantonese, Mandarin, Portuguese, and English, especially in public education.
  3. National Commission on Language and Script Work
  4. Especially when distinguished from other languages of China
  5. "Chinese" refers collectively to the various language varieties that have descended from Old Chinese: native speakers often consider these to be "dialects" of a single language—though the Chinese term 方言; fāngyán; 'dialect' does not carry the precise connotations of "dialect" in English—while linguists typically analyze them as separate languages. See Dialect continuum and Varieties of Chinese for details.
  6. Various examples include:
    • David Crystal, The Cambridge Encyclopedia of Language (Cambridge: Cambridge University Press, 1987), p. 312. "The mutual unintelligibility of the varieties is the main ground for referring to them as separate languages."
    • Charles N. Li, Sandra A. Thompson. Mandarin Chinese: A Functional Reference Grammar (1989), p. 2. "The Chinese language family is genetically classified as an independent branch of the Sino-Tibetan language family."
    • Norman (1988), p. 1. "[...] the modern Chinese dialects are really more like a family of languages [...]"
    • DeFrancis (1984), p. 56. "To call Chinese a single language composed of dialects with varying degrees of difference is to mislead by minimizing disparities that according to Chao are as great as those between English and Dutch. To call Chinese a family of languages is to suggest extralinguistic differences that in fact do not exist and to overlook the unique linguistic situation that exists in China."
    Linguists in China often use a formulation introduced by Fu Maoji in the Encyclopedia of China : 《汉语在语言系属分类中相当于一个语族的地位。》; "In language classification, Chinese has a status equivalent to a language family." [5]
  7. DeFrancis (1984), p. 42 counts Chinese as having 1,277 tonal syllables, and about 398 to 418 if tones are disregarded; he cites Jespersen, Otto (1928) Monosyllabism in English; London, p. 15 for a count of over 8,000 syllables for English.
  8. See neutral tone.
  9. Using Baxter's transcription for Middle Chinese
  10. There are plural markers in the language, such as 们; 們; men, used with personal pronouns.
  11. A distinction is made between 他; 'he' and 她; 'she' in writing, but this was only introduced in the 20th century; both characters remain exactly homophonous.
  12. Hong Kong and Macau Cantonese.




  1. Chinese at Ethnologue (27th ed., 2024)
    Eastern Min at Ethnologue (27th ed., 2024)
    Jinyu at Ethnologue (27th ed., 2024)
    Mandarin at Ethnologue (27th ed., 2024)
    Pu-Xian Min at Ethnologue (27th ed., 2024)
    Huizhou at Ethnologue (27th ed., 2024)
    Central Min at Ethnologue (27th ed., 2024)
    (Additional references under 'Language codes' in the information box)
  2. Norman (1988), pp. 211–214.
  3. Pulleyblank (1984), p. 3.
  4. "Summary by language size". Ethnologue. 3 October 2018. Retrieved 7 March 2021.
  5. Mair (1991), pp. 10, 21.
  6. Chinese Academy of Social Sciences (2012), pp. 3, 125.
  7. Norman (1988), pp. 12–13.
  8. Handel (2008), pp. 422, 434–436.
  9. Handel (2008), p. 426.
  10. Handel (2008), p. 431.
  11. Norman (1988), pp. 183–185.
  12. Schüssler (2007), p. 1.
  13. Baxter (1992), pp. 2–3.
  14. Norman (1988), pp. 42–45.
  15. Baxter (1992), p. 177.
  16. Baxter (1992), pp. 181–183.
  17. Schüssler (2007), p. 12.
  18. Baxter (1992), pp. 14–15.
  19. Ramsey (1987), p. 125.
  20. Norman (1988), pp. 34–42.
  21. Norman (1988), p. 24.
  22. Norman (1988), p. 48.
  23. Norman (1988), pp. 48–49.
  24. Norman (1988), pp. 49–51.
  25. Norman (1988), pp. 133, 247.
  26. Norman (1988), p. 136.
  27. Coblin (2000), pp. 549–550.
  28. Coblin (2000), pp. 540–541.
  29. Ramsey (1987), pp. 3–15.
  30. Norman (1988), p. 133.
  31. Zhang & Yang (2004).
  32. Sohn & Lee (2003), p. 23.
  33. Miller (1967), pp. 29–30.
  34. Kornicki (2011), pp. 75–77.
  35. Kornicki (2011), p. 67.
  36. Miyake (2004), pp. 98–99.
  37. Shibatani (1990), pp. 120–121.
  38. Sohn (2001), p. 89.
  39. Shibatani (1990), p. 146.
  40. Wilkinson (2000), p. 43.
  41. Shibatani (1990), p. 143.
  42. Norman (2003), p. 72.
  43. Norman (1988), pp. 189–191; Ramsey (1987), p. 98.
  44. Ramsey (1987), p. 23.
  45. Norman (1988), p. 188.
  46. Wurm et al. (1987).
  47. Norman (1988), p. 181.
  48. Kurpaska (2010), pp. 53–55.
  49. Kurpaska (2010), pp. 55–56.
  50. Kurpaska (2010), pp. 72–73.
  51. 何, 信翰 (10 August 2019). "自由廣場》Taigi與台語". 自由時報. Retrieved 11 July 2021.
  52. 李, 淑鳳 (1 March 2010). "台、華語接觸所引起的台語語音的變化趨勢". 台語研究. 2 (1): 56–71. Retrieved 11 July 2021.
  53. Klöter, Henning (2004). "Language Policy in the KMT and DPP eras". China Perspectives. 56. ISSN 1996-4617. Retrieved 30 May 2015.
  54. Kuo, Yun-Hsuan (2005). New dialect formation: the case of Taiwanese Mandarin (PhD). University of Essex. Retrieved 26 June 2015.
  55. Baxter (1992), pp. 7–8.
  56. DeFrancis (1984), pp. 55–57.
  57. Thomason (1988), pp. 27–28.
  58. Campbell (2008).
  59. DeFrancis (1984), p. 57.
  60. Mair (1991), p. 7.
  61. Bailey (1973:11), cited in Groves (2010:531).
  62. Haugen (1966), p. 927.
  63. Hudson (1996), p. 22.
  64. Mair (1991), p. 17.
  65. Norman (1988), p. 52.
  66. Matthews & Yip (1994), pp. 20–22.
  67. Terrell, Peter, ed. (2005). Langenscheidt Pocket Chinese Dictionary. Berlin and Munich: Langenscheidt KG. ISBN 978-1-58573-057-5.
  68. Norman (1988), p. 10.
  69. "Languages - Real Chinese - Mini-guides - Chinese characters". BBC.
  70. Timothy Uy and Jim Hsia, Editors, Webster's Digital Chinese Dictionary – Advanced Reference Edition, July 2009
    • Egerod, Søren Christian. "Chinese languages". Encyclopædia Britannica. Old Chinese vocabulary already contained many words not generally occurring in the other Sino-Tibetan languages. The words for 'honey' and 'lion', and probably also 'horse', 'dog', and 'goose', are connected with Indo-European and were acquired through trade and early contacts. (The nearest known Indo-European languages were Tocharian and Sogdian, a middle Iranian language.) A number of words have Austroasiatic cognates and point to early contacts with the ancestral language of Muong–Vietnamese and Mon–Khmer.
    • Ulenbrook, Jan (1967), Einige Übereinstimmungen zwischen dem Chinesischen und dem Indogermanischen (in German) proposes 57 items.
    • Chang, Tsung-tung (1988). "Indo-European Vocabulary in Old Chinese" (PDF). Sino-Platonic Papers.
  71. Kane (2006), p. 161.
  72. "Requirements for Chinese Text Layout" 中文排版需求.
  73. Huang Hua (黃華). 白話為何在五四時期「活」起來了? (PDF) (in Chinese). Chinese University of Hong Kong. Archived (PDF) from the original on 10 October 2022.
  74. 粵普之爭 為你中文解毒 (in Chinese).
  75. 粤语:中国最强方言是如何炼成的_私家历史_澎湃新闻. The Paper澎湃新闻.
  76. 陳宇碩. 白話字滄桑. The New Messenger新使者雜誌 (in Chinese).
  77. 全球華文網-華文世界,數位之最 (in Chinese).
  78. Zimmermann, Basile (2010). "Redesigning Culture: Chinese Characters in Alphabet-Encoded Networks". Design and Culture. 2 (1): 27–43. doi:10.2752/175470710X12593419555126. S2CID 53981784.
  79. "How hard is it to learn Chinese?". BBC News. 17 January 2006. Retrieved 28 April 2010.
  80. Wakefield, John C., Cantonese as a Second Language: Issues, Experiences and Suggestions for Teaching and Learning (Routledge Studies in Applied Linguistics), Routledge, New York City, 2019, p. 45.
  81. (in Chinese) "汉语水平考试中心:2005年外国考生总人数近12万", Xinhua News Agency, 16 January 2006.
  82. Liu lili (27 June 2011). "Chinese language proficiency test becoming popular in Mexico". Archived from the original on 29 June 2011. Retrieved 12 September 2013.


Further reading