List of languages by total number of speakers

Principal language families of the world (and in some cases geographic groups of families). For greater detail, see Distribution of languages in the world.

This is a list of languages by total number of speakers.

It is difficult to define what constitutes a language as opposed to a dialect. For example, Chinese and Arabic are sometimes considered single languages, but each includes several mutually unintelligible varieties, and so they are sometimes considered language families instead. Conversely, colloquial registers of Hindi and Urdu are almost completely mutually intelligible, and are sometimes classified as one language, Hindustani. Rankings of languages by number of speakers should therefore be used with caution, because it is not possible to devise a coherent set of linguistic criteria for distinguishing languages in a dialect continuum. [1]

There is no single criterion for how much knowledge is sufficient to be counted as a second-language speaker. For example, English has about 450 million native speakers but, depending on the criterion chosen, can be said to have as many as two billion speakers. [2]

There are also difficulties in obtaining reliable counts of speakers, which vary over time because of population change and language shift. In some areas, there is no reliable census data, the data is not current, or the census either does not record the languages spoken or records them ambiguously. Sometimes speaker populations are exaggerated for political reasons, or speakers of minority languages may be underreported in favor of a national language. [3]

Ethnologue (2024)

Ethnologue lists the following languages as having 50 million or more total speakers. [4] This section does not include entries that Ethnologue identifies as macrolanguages encompassing several varieties, such as Arabic, Lahnda, Persian, Malay, Pashto, and Chinese.

Most spoken languages, Ethnologue, 2024 [4]

| Language | Family | Branch | First-language (L1) speakers | Second-language (L2) speakers | Total speakers (L1+L2) |
|---|---|---|---|---|---|
| English (excl. creole languages) | Indo-European | Germanic | 380 million | 1.135 billion | 1.515 billion |
| Mandarin Chinese (incl. Standard Chinese, but excl. other varieties) | Sino-Tibetan | Sinitic | 941 million | 199 million | 1.140 billion |
| Hindi (excl. Urdu) | Indo-European | Indo-Aryan | 345 million | 264 million | 609 million |
| Spanish (excl. creole languages) | Indo-European | Romance | 486 million | 74 million | 560 million |
| Modern Standard Arabic (excl. dialects) | Afro-Asiatic | Semitic | 0 [note 1] | 332 million | 332 million |
| French (excl. creole languages) | Indo-European | Romance | 74 million | 238 million | 312 million |
| Bengali | Indo-European | Indo-Aryan | 237 million | 41 million | 278 million |
| Portuguese (excl. creole languages) | Indo-European | Romance | 236 million | 27 million | 264 million |
| Russian | Indo-European | Balto-Slavic | 148 million | 108 million | 255 million |
| Urdu (excl. Hindi) | Indo-European | Indo-Aryan | 70 million | 168 million | 238 million |
| Indonesian (excl. other Malay) | Austronesian | Malayo-Polynesian | 44 million | 155 million | 199 million |
| Standard German | Indo-European | Germanic | 76 million | 58 million | 134 million |
| Japanese | Japonic | | 123 million | <1 million | 123 million |
| Nigerian Pidgin | English Creole | Krio | 5 million | 116 million | 121 million |
| Egyptian Arabic (excl. other Arabic dialects) | Afro-Asiatic | Semitic | 78 million | 25 million | 103 million |
| Marathi | Indo-European | Indo-Aryan | 83 million | 16 million | 99 million |
| Telugu | Dravidian | South-Central | 83 million | 13 million | 96 million |
| Turkish | Turkic | Oghuz | 84 million | 6 million | 90 million |
| Hausa | Afro-Asiatic | Chadic | 54 million | 34 million | 88 million |
| Tamil | Dravidian | Southern | 79 million | 8 million | 87 million |
| Yue Chinese (incl. Cantonese) | Sino-Tibetan | Sinitic | 86 million | 1 million | 87 million |
| Swahili | Niger–Congo | Bantu | 3 million | 83 million | 87 million |
| Vietnamese | Austroasiatic | Vietic | 85 million | 1 million | 86 million |
| Wu Chinese (incl. Shanghainese) | Sino-Tibetan | Sinitic | 83 million | <1 million | 83 million |
| Tagalog [note 2] | Austronesian | Malayo-Polynesian | 29 million | 54 million | 83 million |
| Western Punjabi (excl. Eastern Punjabi) | Indo-European | Indo-Aryan | | | 82 million |
| Korean | Koreanic | | 82 million | <1 million | 81 million |
| Iranian Persian (excl. other Persian dialects) | Indo-European | Iranian | 62 million | 17 million | 78 million |
| Javanese | Austronesian | Malayo-Polynesian | | | 68 million |
| Italian | Indo-European | Romance | 64 million | 3 million | 67 million |
| Gujarati | Indo-European | Indo-Aryan | 58 million | 5 million | 63 million |
| Thai | Kra–Dai | Zhuang–Tai | 21 million | 40 million | 61 million |
| Amharic | Afro-Asiatic | Semitic | 35 million | 25 million | 60 million |
| Kannada | Dravidian | Southern | 44 million | 15 million | 59 million |
| Levantine Arabic (excl. other Arabic dialects) | Afro-Asiatic | Semitic | 51 million | 2 million | 54 million |
| Bhojpuri | Indo-European | Indo-Aryan | 53 million | <1 million | 53 million |
| Min Nan Chinese (incl. Hokkien) | Sino-Tibetan | Sinitic | 51 million | <1 million | 51 million |
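
In the Ethnologue table above, the total is the sum of L1 and L2 speakers, but the listed figures do not always add up exactly (for example, Russian: 148 + 108 = 256, listed as 255). A plausible reason is that each figure is rounded separately, though that explanation is an assumption and is not stated in the source. The following minimal Python sketch checks a handful of rows copied from the table; the names `ROWS`, `rounding_gap`, and `check_rows`, and the 1-million tolerance, are hypothetical choices made for this illustration.

```python
# Illustrative sketch only: a few rows copied from the Ethnologue table,
# expressed in millions of speakers. Names and tolerance are assumptions.
ROWS = [
    # (language, L1, L2, listed total)
    ("English", 380, 1135, 1515),
    ("Hindi", 345, 264, 609),
    ("Portuguese", 236, 27, 264),
    ("Russian", 148, 108, 255),
    ("Swahili", 3, 83, 87),
]


def rounding_gap(l1, l2, total):
    """Return the listed total minus (L1 + L2), in millions."""
    return total - (l1 + l2)


def check_rows(rows, tolerance=1):
    """Return {language: gap} for rows whose listed total differs from L1 + L2.

    Raises ValueError if any gap exceeds the tolerance, since a larger gap
    would suggest a transcription error rather than rounding.
    """
    gaps = {}
    for language, l1, l2, total in rows:
        gap = rounding_gap(l1, l2, total)
        if abs(gap) > tolerance:
            raise ValueError(f"{language}: gap of {gap} million exceeds tolerance")
        if gap != 0:
            gaps[language] = gap
    return gaps


if __name__ == "__main__":
    for language, gap in check_rows(ROWS).items():
        print(f"{language}: listed total differs from L1 + L2 by {gap:+d} million")
```

Rows with a "<1 million" entry or a missing cell are left out of the sketch because they cannot be summed exactly.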

The World Factbook (2022)

The World Factbook, produced by the Central Intelligence Agency (CIA), estimates the ten most spoken languages (L1 + L2) in 2022 as follows: [6]

Most spoken languages, CIA, 2022 [6]

| Language | Percentage of world population (2022) |
|---|---|
| English | 18.8% |
| Mandarin Chinese | 13.8% |
| Hindi | 7.5% |
| Spanish | 6.9% |
| French | 3.4% |
| Arabic | 3.4% |
| Bengali | 3.4% |
| Russian | 3.2% |
| Portuguese | 3.2% |
| Urdu | 2.9% |

Explanatory notes

  1. Modern Standard Arabic (MSA) is not an L1. Arabic speakers first learn their respective local dialect. MSA is acquired through formal education. [5]
  2. Tagalog and Filipino are defined as two different languages in the ISO 639 standard. Ethnologue considers that Filipino is a standardized variety of the Tagalog language with no speakers.

References

  1. Paolillo, John C.; Das, Anupam (31 March 2006). "Evaluating language statistics: the Ethnologue and beyond" (PDF). UNESCO Institute of Statistics. pp. 3–5. Retrieved 17 November 2018.
  2. Crystal, David (March 2008). "Two thousand million?". English Today. 24: 3–6. doi:10.1017/S0266078408000023. S2CID 145597019.
  3. Crystal, David (1988). The Cambridge Encyclopedia of Language. Cambridge University Press. pp. 286–287. ISBN 978-0-521-26438-9.
  4. "What are the top 200 most spoken languages?". Ethnologue. 2024. Retrieved 2024-08-15.
  5. Arabic, Standard at Ethnologue (27th ed., 2024)
  6. "Most spoken languages in the World". The World Factbook. CIA. Retrieved 2022-01-01.