List of languages by number of native speakers

Current distribution of human language families

Human languages ranked by their number of native speakers are as follows. All such rankings should be used with caution, because it is not possible to devise a coherent set of linguistic criteria for distinguishing languages in a dialect continuum. [1] For example, a language is often defined as a set of mutually intelligible varieties, but independent national standard languages may be considered separate languages even though they are largely mutually intelligible, as in the case of Danish and Norwegian. [2] Conversely, many commonly accepted languages, including German, Italian and even English, encompass varieties that are not mutually intelligible. [1] While Arabic is sometimes considered a single language centred on Modern Standard Arabic, other authors consider its mutually unintelligible varieties separate languages. [3] Similarly, Chinese is sometimes viewed as a single language because of a shared culture and common literary language. [4] It is also common to describe various Chinese dialect groups, such as Mandarin, Wu and Yue, as languages, even though each of these groups contains many mutually unintelligible varieties. [5]

There are also difficulties in obtaining reliable counts of speakers, which vary over time because of population change and language shift. In some areas there is no reliable census data, the data is out of date, or the census either does not record the languages spoken or records them ambiguously. Sometimes speaker populations are exaggerated for political reasons, or speakers of minority languages may be underreported in favour of a national language. [6]

Top languages by population

Ethnologue (2024)

The following languages are listed as having at least 50 million first-language speakers in the 27th edition of Ethnologue, published in 2024. [7] This section does not include entries that Ethnologue identifies as macrolanguages encompassing all their respective varieties, such as Arabic, Lahnda, Persian, Malay, Pashto, and Chinese.

Languages with at least 50 million first-language speakers [7]

Language          Native speakers (in millions)  Language family  Branch
Mandarin Chinese  941                            Sino-Tibetan     Sinitic
Spanish           486                            Indo-European    Romance
English           380                            Indo-European    Germanic
Hindi             345                            Indo-European    Indo-Aryan
Bengali           237                            Indo-European    Indo-Aryan
Portuguese        236                            Indo-European    Romance
Russian           148                            Indo-European    Balto-Slavic
Japanese          123                            Japonic          Japanese
Yue Chinese       86                             Sino-Tibetan     Sinitic
Vietnamese        85                             Austroasiatic    Vietic
Turkish           84                             Turkic           Oghuz
Wu Chinese        83                             Sino-Tibetan     Sinitic
Marathi           83                             Indo-European    Indo-Aryan
Telugu            83                             Dravidian        South-Central
Western Punjabi   82                             Indo-European    Indo-Aryan
Korean            81                             Koreanic
Tamil             79                             Dravidian        South
Egyptian Arabic   78                             Afroasiatic      Semitic
Standard German   76                             Indo-European    Germanic
French            74                             Indo-European    Romance
Urdu              70                             Indo-European    Indo-Aryan
Javanese          68                             Austronesian     Malayo-Polynesian
Italian           64                             Indo-European    Romance
Iranian Persian   62                             Indo-European    Iranian
Gujarati          58                             Indo-European    Indo-Aryan
Hausa             54                             Afroasiatic      Chadic
Bhojpuri          53                             Indo-European    Indo-Aryan
Levantine Arabic  51                             Afroasiatic      Semitic
Southern Min      51                             Sino-Tibetan     Sinitic
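
The Ethnologue figures can also be grouped by language family. The following is only a minimal Python sketch of that arithmetic: the names `ROWS` and `speakers_by_family` are hypothetical, the data are copied from the table, and the totals cover only the listed languages with at least 50 million native speakers, so they understate each family's full speaker population.

```python
from collections import defaultdict

# (language, native speakers in millions, language family), copied from the
# Ethnologue (2024) table; only languages with at least 50 million speakers.
ROWS = [
    ("Mandarin Chinese", 941, "Sino-Tibetan"),
    ("Spanish", 486, "Indo-European"),
    ("English", 380, "Indo-European"),
    ("Hindi", 345, "Indo-European"),
    ("Bengali", 237, "Indo-European"),
    ("Portuguese", 236, "Indo-European"),
    ("Russian", 148, "Indo-European"),
    ("Japanese", 123, "Japonic"),
    ("Yue Chinese", 86, "Sino-Tibetan"),
    ("Vietnamese", 85, "Austroasiatic"),
    ("Turkish", 84, "Turkic"),
    ("Wu Chinese", 83, "Sino-Tibetan"),
    ("Marathi", 83, "Indo-European"),
    ("Telugu", 83, "Dravidian"),
    ("Western Punjabi", 82, "Indo-European"),
    ("Korean", 81, "Koreanic"),
    ("Tamil", 79, "Dravidian"),
    ("Egyptian Arabic", 78, "Afroasiatic"),
    ("Standard German", 76, "Indo-European"),
    ("French", 74, "Indo-European"),
    ("Urdu", 70, "Indo-European"),
    ("Javanese", 68, "Austronesian"),
    ("Italian", 64, "Indo-European"),
    ("Iranian Persian", 62, "Indo-European"),
    ("Gujarati", 58, "Indo-European"),
    ("Hausa", 54, "Afroasiatic"),
    ("Bhojpuri", 53, "Indo-European"),
    ("Levantine Arabic", 51, "Afroasiatic"),
    ("Southern Min", 51, "Sino-Tibetan"),
]


def speakers_by_family(rows):
    """Sum native speakers (in millions) per family, largest total first."""
    totals = defaultdict(int)
    for _language, millions, family in rows:
        totals[family] += millions
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for family, millions in speakers_by_family(ROWS).items():
        print(f"{family}: {millions} million")
```

Because the grouping depends entirely on which varieties are counted as separate languages, a different treatment of Arabic or Chinese would change these sums without any change in the underlying populations.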

CIA World Factbook (2018 estimates)

According to the CIA World Factbook, the most-spoken first languages in 2018 were: [8]

Top first languages by population per CIA [8]

Rank  Language          Percentage of world population (2018)
1     Mandarin Chinese  12.3%
2     Spanish           6.0%
3     English           5.1%
3     Arabic            5.1%
5     Hindi             3.5%
6     Bengali           3.3%
7     Portuguese        3.0%
8     Russian           2.1%
9     Japanese          1.7%
10    Western Punjabi   1.3%
11    Javanese          1.1%
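
The rank column gives English and Arabic the same rank and then skips rank 4, which matches what is usually called standard competition ranking. The following is only a minimal Python sketch of that scheme applied to the percentages listed here; `CIA_SHARES` and `competition_ranks` are hypothetical names, and the source does not state how the ranks were actually assigned.

```python
# (language, percentage of world population in 2018), copied from the CIA table.
CIA_SHARES = [
    ("Mandarin Chinese", 12.3),
    ("Spanish", 6.0),
    ("English", 5.1),
    ("Arabic", 5.1),
    ("Hindi", 3.5),
    ("Bengali", 3.3),
    ("Portuguese", 3.0),
    ("Russian", 2.1),
    ("Japanese", 1.7),
    ("Western Punjabi", 1.3),
    ("Javanese", 1.1),
]


def competition_ranks(shares):
    """Rank by descending share; ties share a rank and the next rank is skipped."""
    # sorted() is stable, so tied entries keep their original relative order.
    ordered = sorted(shares, key=lambda item: item[1], reverse=True)
    ranked = []
    for position, (language, share) in enumerate(ordered, start=1):
        if ranked and share == ranked[-1][2]:
            rank = ranked[-1][0]  # same share as the previous entry: reuse its rank
        else:
            rank = position       # otherwise the rank is the 1-based position
        ranked.append((rank, language, share))
    return ranked


if __name__ == "__main__":
    for rank, language, share in competition_ranks(CIA_SHARES):
        print(f"{rank:>2}  {language:<17} {share}%")
```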

Related Research Articles

Arabic: Semitic language and lingua franca of the Arab world

Arabic is a Central Semitic language of the Afroasiatic language family spoken primarily in the Arab world. The ISO assigns language codes to 32 varieties of Arabic, including its standard form of Literary Arabic, known as Modern Standard Arabic, which is derived from Classical Arabic. This distinction exists primarily among Western linguists; Arabic speakers themselves generally do not distinguish between Modern Standard Arabic and Classical Arabic, but rather refer to both as al-ʿarabiyyatu l-fuṣḥā or simply al-fuṣḥā (اَلْفُصْحَىٰ).

Chinese language: National language of China

Chinese is a group of languages spoken natively by the ethnic Han Chinese majority and many minority ethnic groups in China. Approximately 1.35 billion people, or 17% of the global population, speak a variety of Chinese as their first language.

Dialect refers to two distinctly different types of linguistic relationships.

Ethnologue: Languages of the World is an annual reference publication in print and online that provides statistics and other information on the living languages of the world. It is the world's most comprehensive catalogue of languages. It was first issued in 1951, and is now published by SIL International, an American evangelical Christian non-profit organization.

Hakka Chinese: Sinitic language originating in southern China

Hakka forms a language group of varieties of Chinese, spoken natively by the Hakka people in parts of Southern China and some diaspora areas of Taiwan, Southeast Asia and in overseas Chinese communities around the world.

Arvanitika, also known as Arvanitic, is the variety of Albanian traditionally spoken by the Arvanites, a population group in Greece. Arvanitika was brought to southern Greece during the late Middle Ages by Albanian settlers who moved south from their homeland in present-day Albania in several waves. The dialect preserves elements of medieval Albanian, while also being significantly influenced by the Greek language. Arvanitika is today endangered, as its speakers have been shifting to the use of Greek and most younger members of the community no longer speak it.

Southern Min: Branch of the Min Chinese languages

Southern Min, Minnan or Banlam, is a group of linguistically similar and historically related Chinese languages that form a branch of Min Chinese spoken in Fujian, most of Taiwan, Eastern Guangdong, Hainan, and Southern Zhejiang. Southern Min dialects are also spoken by descendants of emigrants from these areas in diaspora, most notably in Southeast Asia, such as Singapore, Malaysia, the Philippines, Indonesia, Brunei, Southern Thailand, Myanmar, Cambodia, Southern and Central Vietnam, San Francisco, Los Angeles and New York City. Minnan is the most widely-spoken branch of Min, with approximately 48 million speakers as of 2017–2018.

Varieties of Chinese: Family of local language varieties

There are hundreds of local Chinese language varieties forming a branch of the Sino-Tibetan language family, many of which are not mutually intelligible. Variation is particularly strong in the more mountainous southeast part of mainland China. The varieties are typically classified into several groups: Mandarin, Wu, Min, Xiang, Gan, Jin, Hakka and Yue, though some varieties remain unclassified. These groups are neither clades nor individual languages defined by mutual intelligibility, but reflect common phonological developments from Middle Chinese.

A dialect continuum or dialect chain is a series of language varieties spoken across some geographical area such that neighboring varieties are mutually intelligible, but the differences accumulate over distance so that widely separated varieties may not be. This is a typical occurrence with widely spread languages and language families around the world, when these languages did not spread recently. Some prominent examples include the Indo-Aryan languages across large parts of India, varieties of Arabic across north Africa and southwest Asia, the Turkic languages, the Chinese languages or dialects, and parts of the Romance, Germanic and Slavic families in Europe. Terms used in older literature include dialect area and L-complex.

Marwari language: Indo-Aryan language

Marwari is a language within the Rajasthani language family of the Indo-Aryan languages. Marwari and its closely related varieties like Dhundhari, Shekhawati and Mewari form a part of the broader Marwari language family. It is spoken in the Indian state of Rajasthan, as well as the neighbouring states of Gujarat and Haryana, some adjacent areas in eastern parts of Pakistan, and some migrant communities in Nepal. There are two dozen varieties of Marwari. Marwari is also referred to as simply Rajasthani.

Literary language is the form (register) of a language used when writing in a formal, academic, or particularly polite tone; when speaking or writing in such a tone, it can also be known as formal language. It may be the standardized variety of a language. It can sometimes differ noticeably from the various spoken lects, but the difference between literary and non-literary forms is greater in some languages than in others. If there is a strong divergence between a written form and the spoken vernacular, the language is said to exhibit diglossia.

Mutual intelligibility: Closeness of linguistic varieties

In linguistics, mutual intelligibility is a relationship between languages or dialects in which speakers of different but related varieties can readily understand each other without prior familiarity or special effort. It is sometimes used as an important criterion for distinguishing languages from dialects, although sociolinguistic factors are often also used.

A pluricentric language or polycentric language is a language with several codified standard forms, often corresponding to different countries. Many examples of such languages can be found worldwide among the most-spoken languages, including but not limited to Chinese in mainland China, Taiwan and Singapore; English in the United States, United Kingdom, Canada, Australia, New Zealand, Ireland, South Africa, India, and elsewhere; and French in France, Canada, and elsewhere. The converse case is a monocentric language, which has only one formally standardized version. Examples include Japanese and Russian. In some cases, the different standards of a pluricentric language may be elaborated to appear as separate languages, e.g. Malaysian and Indonesian, Hindi and Urdu, while Serbo-Croatian is in an earlier stage of that process.

Autonomy and heteronomy are complementary attributes of a language variety describing its functional relationship with related varieties. The concepts were introduced by William A. Stewart in 1968, and provide a way of distinguishing a language from a dialect.

Linguistic demography is the statistical study of languages among all populations. Estimating the number of speakers of a given language is not straightforward, and various estimates may diverge considerably. This is first of all due to the question of defining "language" vs. "dialect". Identification of varieties as a single language or as distinct languages is often based on ethnic, cultural, or political considerations rather than mutual intelligibility. The second difficulty is multilingualism, complicating the definition of "native language". Finally, in many countries, insufficient census data add to the difficulties.

The Fangyan is a Chinese dictionary compiled in the early 1st century CE by the poet and philosopher Yang Xiong. It was the first Chinese dictionary to include significant regional vocabulary, and is considered the "most significant lexicographic work" of its era. His dictionary's preface explains how he spent 27 years amassing and collating the dictionary. Yang collected regionalisms from many sources, particularly the 'light carriage' surveys made during the Zhou and Qin dynasties, where imperial emissaries were sent into the countryside annually to record folk songs and idioms from across China, reaching as far north as Korea.

Central Asian Arabic: Endangered Semitic language of Central Asia

Central Asian Arabic or Jugari Arabic refers to a set of four closely-related varieties of Arabic currently facing extinction and spoken predominantly by Arab communities living in portions of Central Asia. These varieties are Bactrian Arabic, Bukhara Arabic, Qashqa Darya Arabic, and Khorasani Arabic.

Rawang, also known as Krangku, Kiutze (Qiuze), and Ch’opa, is a Sino-Tibetan language of India and Burma. Rawang has a high degree of internal diversity, and some varieties are not mutually intelligible. Most, however, understand Mutwang (Matwang), the standard dialect, and basis of written Rawang.

References

  1. Paolillo, John C.; Das, Anupam (31 March 2006). "Evaluating language statistics: the Ethnologue and beyond" (PDF). UNESCO Institute of Statistics. pp. 3–5. Archived (PDF) from the original on 10 January 2017. Retrieved 17 November 2018.
  2. Chambers, J.K.; Trudgill, Peter (1998). Dialectology (2nd ed.). Cambridge University Press. ISBN 978-0-521-59646-6.
  3. Kaye, Alan S.; Rosenhouse, Judith (1997). "Arabic Dialects and Maltese". In Hetzron, Robert (ed.). The Semitic Languages. Routledge. pp. 263–311. ISBN 978-0-415-05767-7.
  4. Norman, Jerry (1988). Chinese. Cambridge University Press. p. 2. ISBN 978-0-521-29653-3.
  5. Norman, Jerry (2003). "The Chinese dialects: phonology". In Thurgood, Graham; LaPolla, Randy J. (eds.). The Sino-Tibetan languages. Routledge. pp. 72–83. ISBN 978-0-7007-1129-1.
  6. Crystal, David (1988). The Cambridge Encyclopedia of Language. Cambridge University Press. pp. 286–287. ISBN 978-0-521-26438-9.
  7. Statistics, in Eberhard, David M.; Simons, Gary F.; Fennig, Charles D., eds. (2024). Ethnologue: Languages of the World (27th ed.). Dallas, Texas: SIL International.
  8. "The World Factbook. People and Society. Languages". The World Factbook. Central Intelligence Agency. 29 November 2023. Retrieved 30 November 2023.