List of languages by number of native speakers

Current distribution of human language families

This article ranks human languages by their number of native speakers.


However, all such rankings should be used with caution, because it is not possible to devise a coherent set of linguistic criteria for distinguishing languages in a dialect continuum. [1] For example, a language is often defined as a set of varieties that are mutually intelligible, but independent national standard languages may be considered to be separate languages even though they are largely mutually intelligible, as in the case of Danish and Norwegian. [2] Conversely, many commonly accepted languages, including German, Italian and even English, encompass varieties that are not mutually intelligible. [1] While Arabic is sometimes considered a single language centred on Modern Standard Arabic, other authors describe its mutually unintelligible varieties as separate languages. [3] Similarly, Chinese is sometimes viewed as a single language because of a shared culture and common literary language. [4] It is also common to describe various Chinese dialect groups, such as Mandarin, Wu and Yue, as languages, even though each of these groups contains many mutually unintelligible varieties. [5]

There are also difficulties in obtaining reliable counts of speakers, which vary over time because of population change and language shift. In some areas, there is no reliable census data, the data is not current, or the census may not record languages spoken, or record them ambiguously. Sometimes speaker populations are exaggerated for political reasons, or speakers of minority languages may be under-reported in favour of a national language. [6]

Top languages by population

Ethnologue (2019, 22nd edition)

The following languages are listed as having at least 10 million first-language speakers in the 2019 edition of Ethnologue, a language reference published by SIL International, which is based in the United States. [7]

Languages with at least 10 million first-language speakers [7]

Rank | Language | Speakers (millions) | % of world population (March 2019) [8] | Language family
1 | Mandarin Chinese | 918 | 11.922 | Sino-Tibetan
2 | Spanish | 480 | 5.994 | Indo-European
3 | English | 379 | 4.922 | Indo-European
4 | Hindi (Sanskritised Hindustani) [9] | 341 | 4.429 | Indo-European
5 | Bengali | 228 | 2.961 | Indo-European
6 | Portuguese | 221 | 2.870 | Indo-European
7 | Russian | 154 | 2.000 | Indo-European
8 | Japanese | 128 | 1.662 | Japonic
9 | Western Punjabi [10] | 92.7 | 1.204 | Indo-European
10 | Marathi | 83.1 | 1.079 | Indo-European
11 | Telugu | 82.0 | 1.065 | Dravidian
12 | Wu Chinese | 81.4 | 1.057 | Sino-Tibetan
13 | Turkish | 79.4 | 1.031 | Turkic
14 | Korean | 77.3 | 1.004 | Koreanic (language isolate)
15 | French | 77.2 | 1.003 | Indo-European
16 | German | 76.1 | 0.988 | Indo-European
17 | Vietnamese | 76.0 | 0.987 | Austroasiatic
18 | Tamil | 75.0 | 0.974 | Dravidian
19 | Yue Chinese | 73.1 | 0.949 | Sino-Tibetan
20 | Urdu (Persianised Hindustani) [9] | 68.6 | 0.891 | Indo-European
21 | Javanese | 68.3 | 0.887 | Austronesian
22 | Italian | 64.8 | 0.842 | Indo-European
23 | Egyptian Arabic | 64.6 | 0.839 | Afroasiatic
24 | Gujarati | 56.4 | 0.732 | Indo-European
25 | Iranian Persian | 52.8 | 0.686 | Indo-European
26 | Bhojpuri | 52.2 | 0.678 | Indo-European
27 | Min Nan Chinese | 50.1 | 0.651 | Sino-Tibetan
28 | Hakka Chinese | 48.2 | 0.626 | Sino-Tibetan
29 | Jin Chinese | 46.9 | 0.609 | Sino-Tibetan
30 | Hausa | 43.9 | 0.570 | Afroasiatic
31 | Kannada | 43.6 | 0.566 | Dravidian
32 | Indonesian (Indonesian Malay) | 43.4 | 0.564 | Austronesian
33 | Polish | 39.7 | 0.516 | Indo-European
34 | Yoruba | 37.8 | 0.491 | Niger–Congo
35 | Xiang Chinese | 37.3 | 0.484 | Sino-Tibetan
36 | Malayalam | 37.1 | 0.482 | Dravidian
37 | Odia | 34.5 | 0.448 | Indo-European
38 | Maithili | 33.9 | 0.440 | Indo-European
39 | Burmese | 32.9 | 0.427 | Sino-Tibetan
40 | Eastern Punjabi [10] | 32.6 | 0.423 | Indo-European
41 | Sunda | 32.4 | 0.421 | Austronesian
42 | Sudanese Arabic | 31.9 | 0.414 | Afroasiatic
43 | Algerian Arabic | 29.4 | 0.382 | Afroasiatic
44 | Moroccan Arabic | 27.5 | 0.357 | Afroasiatic
45 | Ukrainian | 27.3 | 0.355 | Indo-European
46 | Igbo | 27.0 | 0.351 | Niger–Congo
47 | Northern Uzbek | 25.1 | 0.326 | Turkic
48 | Sindhi | 24.6 | 0.319 | Indo-European
49 | North Levantine Arabic | 24.6 | 0.319 | Afroasiatic
50 | Romanian | 24.3 | 0.316 | Indo-European
51 | Tagalog | 23.6 | 0.306 | Austronesian
52 | Dutch | 23.1 | 0.300 | Indo-European
53 | Saʽidi Arabic | 22.4 | 0.291 | Afroasiatic
54 | Gan Chinese | 22.1 | 0.287 | Sino-Tibetan
55 | Amharic | 21.9 | 0.284 | Afroasiatic
56 | Northern Pashto | 20.9 | 0.271 | Indo-European
57 | Magahi | 20.7 | 0.269 | Indo-European
58 | Thai | 20.7 | 0.269 | Kra–Dai
59 | Saraiki | 20.0 | 0.260 | Indo-European
60 | Khmer | 16.6 | 0.216 | Austroasiatic
61 | Chhattisgarhi | 16.3 | 0.212 | Indo-European
62 | Somali | 16.2 | 0.210 | Afroasiatic
63 | Malay (Malaysian Malay) | 16.1 | 0.209 | Austronesian
64 | Cebuano | 15.9 | 0.206 | Austronesian
65 | Nepali | 15.8 | 0.205 | Indo-European
66 | Mesopotamian Arabic | 15.7 | 0.204 | Afroasiatic
67 | Assamese | 15.3 | 0.199 | Indo-European
68 | Sinhalese | 15.3 | 0.199 | Indo-European
69 | Northern Kurdish | 14.6 | 0.190 | Indo-European
70 | Hejazi Arabic | 14.5 | 0.188 | Afroasiatic
71 | Nigerian Fulfulde | 14.5 | 0.188 | Niger–Congo
72 | Bavarian | 14.1 | 0.183 | Indo-European
73 | South Azerbaijani | 13.8 | 0.179 | Turkic
74 | Greek | 13.1 | 0.170 | Indo-European
75 | Chittagonian | 13.0 | 0.169 | Indo-European
76 | Kazakh | 12.9 | 0.168 | Turkic
77 | Deccan | 12.8 | 0.166 | Indo-European
78 | Hungarian | 12.6 | 0.164 | Uralic
79 | Kinyarwanda | 12.1 | 0.157 | Niger–Congo
80 | Zulu | 12.1 | 0.157 | Niger–Congo
81 | South Levantine Arabic | 11.6 | 0.151 | Afroasiatic
82 | Tunisian Arabic | 11.6 | 0.151 | Afroasiatic
83 | Sanaani Spoken Arabic | 11.4 | 0.148 | Afroasiatic
84 | Min Bei Chinese | 11.0 | 0.143 | Sino-Tibetan
85 | Southern Pashto | 10.9 | 0.142 | Indo-European
86 | Rundi | 10.8 | 0.140 | Niger–Congo
87 | Czech | 10.7 | 0.139 | Indo-European
88 | Taʽizzi-Adeni Arabic | 10.5 | 0.136 | Afroasiatic
89 | Uyghur | 10.4 | 0.135 | Turkic
90 | Min Dong Chinese | 10.3 | 0.134 | Sino-Tibetan
91 | Sylheti | 10.3 | 0.134 | Indo-European
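
The percentage column in this table appears to be the speaker count divided by a world population of about 7.7 billion, the figure named in reference [8]. The sketch below illustrates that arithmetic under this assumption. The constant `WORLD_POPULATION_MILLIONS` and the function `share_of_world` are hypothetical names introduced here, not part of the source.

```python
# Assumption: world population of 7.7 billion (7,700 million), as named in reference [8].
WORLD_POPULATION_MILLIONS = 7700.0


def share_of_world(speakers_millions, world_millions=WORLD_POPULATION_MILLIONS):
    """Return speakers as a percentage of world population, rounded to 3 decimals."""
    return round(speakers_millions / world_millions * 100, 3)


# Speaker counts (in millions) copied from the Ethnologue table.
for name, speakers in [("Mandarin Chinese", 918), ("English", 379), ("Russian", 154)]:
    print(f"{name}: {share_of_world(speakers):.3f}")
```

Under this assumption the three rows shown reproduce the listed values (11.922, 4.922 and 2.000). Not every row does: 480 / 7,700 gives 6.234 for Spanish, while the table lists 5.994, so the formula should be treated as a plausible reading rather than a confirmed method.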

Nationalencyklopedin (2010)

The following table contains the top 100 languages by estimated number of native speakers in the 2007 edition of the Swedish encyclopedia Nationalencyklopedin. As census methods in different countries vary to a considerable extent, and given that some countries do not record language in their censuses, any list of languages by native speakers, or total speakers, is effectively based on estimates. Updated estimates from 2010 are also provided. [11]

The top eleven languages have additional figures from the 2010 edition of the Nationalencyklopedin. Numbers above 95 million are rounded off to the nearest 5 million.
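
The rounding rule stated above can be sketched as follows. This is only an illustration: the function name `round_speakers` and the sample inputs are hypothetical, and the source does not say how values at or below 95 million are rounded, so the sketch leaves them unchanged.

```python
import math


def round_speakers(millions, threshold=95, step=5):
    """Round counts above `threshold` to the nearest multiple of `step`.

    Values at or below the threshold are returned unchanged (an assumption;
    the source does not describe how they are rounded).
    """
    if millions > threshold:
        # Add 0.5 and floor so that halves round up, avoiding banker's rounding.
        return int(math.floor(millions / step + 0.5)) * step
    return millions


# Hypothetical raw estimates, in millions.
for raw in (937, 953, 82):
    print(raw, "->", round_speakers(raw))
```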

Top languages by population per Nationalencyklopedin

Rank | Language | Native speakers in millions, 2007 (2010) | Percentage of world population
1 | Mandarin (entire branch) | 935 (955) | 14.1%
2 | Spanish | 390 (405) | 5.85%
3 | English | 365 (360) | 5.52%
4 | Hindi [note 1] | 295 (310) | 4.46%
5 | Arabic | 280 (295) | 4.23%
6 | Portuguese | 205 (215) | 3.08%
7 | Bengali | 200 (205) | 3.05%
8 | Russian | 160 (155) | 2.42%
9 | Japanese | 125 (125) | 1.92%
10 | Punjabi | 95 (100) | 1.44%
11 | German | 92 (95) | 1.39%
12 | Javanese | 82 | 1.25%
13 | Wu (inc. Shanghainese) | 80 | 1.20%
14 | Malay (inc. Indonesian and Malaysian) | 77 | 1.16%
15 | Telugu | 76 | 1.15%
16 | Vietnamese | 76 | 1.14%
17 | Korean | 76 | 1.14%
18 | French | 75 | 1.12%
19 | Marathi | 73 | 1.10%
20 | Tamil | 70 | 1.06%
21 | Urdu | 66 | 0.99%
22 | Turkish | 63 | 0.95%
23 | Italian | 59 | 0.90%
24 | Yue (inc. Cantonese) | 59 | 0.89%
25 | Thai | 56 | 0.85%
26 | Gujarati | 49 | 0.74%
27 | Jin | 48 | 0.72%
28 | Southern Min (inc. Hokkien and Teochew) | 47 | 0.71%
29 | Persian | 45 | 0.68%
30 | Polish | 40 | 0.61%
31 | Pashto | 39 | 0.58%
32 | Kannada | 38 | 0.58%
33 | Xiang | 38 | 0.58%
34 | Malayalam | 38 | 0.57%
35 | Sundanese | 38 | 0.57%
36 | Hausa | 34 | 0.52%
37 | Odia (Oriya) | 33 | 0.50%
38 | Burmese | 33 | 0.50%
39 | Hakka | 31 | 0.46%
40 | Ukrainian | 30 | 0.46%
41 | Bhojpuri | 29 [note 2] | 0.43%
42 | Tagalog (Filipino) | 28 | 0.42%
43 | Yoruba | 28 | 0.42%
44 | Maithili | 27 [note 2] | 0.41%
45 | Uzbek | 26 | 0.39%
46 | Sindhi | 26 | 0.39%
47 | Amharic | 25 | 0.37%
48 | Fula | 24 | 0.37%
49 | Romanian | 24 | 0.37%
50 | Oromo | 24 | 0.36%
51 | Igbo | 24 | 0.36%
52 | Azerbaijani | 23 | 0.34%
53 | Awadhi | 22 [note 2] | 0.33%
54 | Gan | 22 | 0.33%
55 | Cebuano (Visayan) | 21 | 0.32%
56 | Dutch | 21 | 0.32%
57 | Kurdish | 21 | 0.31%
58 | Serbo-Croatian | 19 | 0.28%
59 | Malagasy | 18 | 0.28%
60 | Saraiki | 17 [note 3] | 0.26%
61 | Nepali | 17 | 0.25%
62 | Sinhala | 16 | 0.25%
63 | Chittagonian | 16 | 0.24%
64 | Zhuang | 16 | 0.24%
65 | Khmer | 16 | 0.24%
66 | Turkmen | 16 | 0.24%
67 | Assamese | 15 | 0.23%
68 | Madurese | 15 | 0.23%
69 | Somali | 15 | 0.22%
70 | Marwari | 14 [note 2] | 0.21%
71 | Magahi | 14 [note 2] | 0.21%
72 | Haryanvi | 14 [note 2] | 0.21%
73 | Hungarian | 13 | 0.19%
74 | Chhattisgarhi | 12 [note 2] | 0.19%
75 | Greek | 12 | 0.18%
76 | Chewa | 12 | 0.17%
77 | Deccan | 11 | 0.17%
78 | Akan | 11 | 0.17%
79 | Kazakh | 11 | 0.17%
80 | Northern Min [disputed] | 10.9 | 0.16%
81 | Sylheti | 10.7 | 0.16%
82 | Zulu | 10.4 | 0.16%
83 | Czech | 10.0 | 0.15%
84 | Kinyarwanda | 9.8 | 0.15%
85 | Dhundhari | 9.6 [note 2] | 0.15%
86 | Haitian Creole | 9.6 | 0.15%
87 | Eastern Min (inc. Fuzhou dialect) | 9.5 | 0.14%
88 | Ilocano | 9.1 | 0.14%
89 | Quechua | 8.9 | 0.13%
90 | Kirundi | 8.8 | 0.13%
91 | Swedish | 8.7 | 0.13%
92 | Hmong | 8.4 | 0.13%
93 | Shona | 8.3 | 0.13%
94 | Uyghur | 8.2 | 0.12%
95 | Hiligaynon/Ilonggo (Visayan) | 8.2 | 0.12%
96 | Mossi | 7.6 | 0.11%
97 | Xhosa | 7.6 | 0.11%
98 | Belarusian | 7.6 [note 4] | 0.11%
99 | Balochi | 7.6 | 0.11%
100 | Konkani | 7.4 | 0.11%

  1. Refers only to Modern Standard Hindi here. The Census of India defines Hindi on a loose and broad basis. It does not include the entire Hindustani language, only the Hindi register of it. In addition to Standard Hindi, it incorporates a set of other Indo-Aryan languages written in Devanagari script, including Awadhi, Bhojpuri, Haryanvi and Dhundhari, under the Hindi group, which had more than 422 million native speakers as of 2001. [12] However, the census also acknowledges Standard Hindi, the above-mentioned languages and others as separate mother tongues of the Hindi language and provides individual figures for all these languages. [12]
  2. This is only a fraction of total speakers; others are counted under "Hindi" as they regard their language as a Hindi dialect.
  3. Numbers may also be counted in Punjabi above.
  4. Only half this many use Belarusian as their home language.


  1. Paolillo, John C.; Das, Anupam (31 March 2006). "Evaluating language statistics: the Ethnologue and beyond" (PDF). UNESCO Institute of Statistics. pp. 3–5. Retrieved 17 November 2018.
  2. Chambers, J.K.; Trudgill, Peter (1998). Dialectology (2nd ed.). Cambridge University Press. ISBN 978-0-521-59646-6.
  3. Kaye, Alan S.; Rosenhouse, Judith (1997). "Arabic Dialects and Maltese". In Hetzron, Robert (ed.). The Semitic Languages. Routledge. pp. 263–311. ISBN 978-0-415-05767-7.
  4. Norman, Jerry (1988). Chinese. Cambridge University Press. p. 2. ISBN 978-0-521-29653-3.
  5. Norman, Jerry (2003). "The Chinese dialects: phonology". In Thurgood, Graham; LaPolla, Randy J. (eds.). The Sino-Tibetan languages. Routledge. pp. 72–83. ISBN 978-0-7007-1129-1.
  6. Crystal, David (1988). The Cambridge Encyclopedia of Language. Cambridge University Press. pp. 286–287. ISBN 978-0-521-26438-9.
  7. "Summary by language size". Ethnologue. Retrieved 12 March 2019. For items below #26, see individual Ethnologue entry for each language.
  8. "World Population Clock: 7.7 Billion People (2019) - Worldometers". Retrieved 31 March 2019.
  9. Hindi and Urdu are often classified as standardized registers of a single Hindustani language.
  10. Defined at the national border rather than by language.
  11. Mikael Parkvall, "Världens 100 största språk 2007" (The World's 100 Largest Languages in 2007), in Nationalencyklopedin. Asterisks mark the 2010 estimates for the top dozen languages.
  12. Abstract of speakers' strength of languages and mother tongues – 2000, Census of India, 2001.
  13. Summary by language size