Human languages ranked by their number of native speakers are as follows. All such rankings should be used with caution, because it is not possible to devise a coherent set of linguistic criteria for distinguishing languages in a dialect continuum. [1] For example, a language is often defined as a set of mutually intelligible varieties, but independent national standard languages may be considered separate languages even though they are largely mutually intelligible, as in the case of Danish and Norwegian. [2] Conversely, many commonly accepted languages, including German, Italian and English, encompass varieties that are not mutually intelligible. [1] While Arabic is sometimes considered a single language centred on Modern Standard Arabic, other authors consider its mutually unintelligible varieties separate languages. [3] Similarly, Chinese is sometimes viewed as a single language because of a shared culture and common literary language. [4] It is also common to describe various Chinese dialect groups, such as Mandarin, Wu and Yue, as languages, even though each of these groups contains many mutually unintelligible varieties. [5]
There are also difficulties in obtaining reliable counts of speakers, which vary over time because of population change and language shift. In some areas, reliable census data is unavailable or out of date, or the census does not record the languages spoken, or records them ambiguously. Sometimes speaker populations are exaggerated for political reasons, or speakers of minority languages may be underreported in favour of a national language. [6]
The following languages are listed as having at least 50 million first-language speakers in the 27th edition of Ethnologue published in 2024. [7] This section does not include entries that Ethnologue identifies as macrolanguages encompassing all their respective varieties, such as Arabic, Lahnda, Persian, Malay, Pashto, and Chinese.
According to the CIA World Factbook, the most-spoken first languages in 2018 were: [8]
Rank | Language | Percentage of world population (2018) |
---|---|---|
1 | Mandarin Chinese | 12.3% |
2 | Spanish | 6.0% |
3 | English | 5.1% |
3 | Arabic | 5.1% |
5 | Hindi | 3.5% |
6 | Bengali | 3.3% |
7 | Portuguese | 3.0% |
8 | Russian | 2.1% |
9 | Japanese | 1.7% |
10 | Western Punjabi | 1.3% |
11 | Javanese | 1.1% |
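The ranks in this table skip from 3 to 5 because English and Arabic share third place. The source does not name its tie-handling convention, but the ranks shown are consistent with "standard competition ranking" (1, 2, 3, 3, 5, ...), in which tied entries share a rank and the next entry's rank equals its 1-based position in the sorted list. The following minimal Python sketch reproduces the Rank column from the percentages under that assumption; the function name `competition_rank` is illustrative only.

```python
# Shares of world population (%) as listed in the CIA World Factbook table above.
shares = [
    ("Mandarin Chinese", 12.3),
    ("Spanish", 6.0),
    ("English", 5.1),
    ("Arabic", 5.1),
    ("Hindi", 3.5),
    ("Bengali", 3.3),
    ("Portuguese", 3.0),
    ("Russian", 2.1),
    ("Japanese", 1.7),
    ("Western Punjabi", 1.3),
    ("Javanese", 1.1),
]


def competition_rank(items):
    """Assign standard competition ("1224") ranks, highest value first.

    Tied values share a rank; the next distinct value gets a rank equal to
    its 1-based position, so the rank after a two-way tie at 3 is 5.
    Ties are detected by exact equality, which is safe here because tied
    shares come from identical literals.
    """
    # sorted() is stable, so tied entries keep their original order.
    ordered = sorted(items, key=lambda item: item[1], reverse=True)
    ranked = []
    for position, (name, value) in enumerate(ordered, start=1):
        if ranked and value == ranked[-1][2]:
            rank = ranked[-1][0]  # tie: reuse the previous entry's rank
        else:
            rank = position       # no tie: rank equals sorted position
        ranked.append((rank, name, value))
    return ranked


for rank, name, value in competition_rank(shares):
    print(f"{rank:>2} | {name} | {value}%")
```

Other conventions, such as dense ranking (1, 2, 3, 3, 4, ...), would number the same data differently, which is one reason rankings from different sources are not always directly comparable.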
Arabic is a Central Semitic language of the Afroasiatic language family spoken primarily in the Arab world. The ISO assigns language codes to 32 varieties of Arabic, including its standard form of Literary Arabic, known as Modern Standard Arabic, which is derived from Classical Arabic. This distinction exists primarily among Western linguists; Arabic speakers themselves generally do not distinguish between Modern Standard Arabic and Classical Arabic, but rather refer to both as al-ʿarabiyyatu l-fuṣḥā or simply al-fuṣḥā (اَلْفُصْحَىٰ).
Chinese is a group of languages spoken natively by the ethnic Han Chinese majority and many minority ethnic groups in China, as well as by various communities of the Chinese diaspora. Approximately 1.35 billion people, or 17% of the global population, speak a variety of Chinese as their first language.
A dialect is a variety of language spoken by a particular group of people. It can also refer to a language subordinate in status to a dominant language, and is sometimes used to mean a vernacular language.
Ethnologue: Languages of the World is an annual reference publication in print and online that provides statistics and other information on the living languages of the world. It is the world's most comprehensive catalogue of languages. It was first issued in 1951, and is now published by SIL International, an American evangelical Christian non-profit organization.
Hakka forms a language group of varieties of Chinese, spoken natively by the Hakka people in parts of Southern China, Taiwan, some diaspora areas of Southeast Asia and in overseas Chinese communities around the world.
Yue is a branch of the Sinitic languages primarily spoken in Southern China, particularly in the provinces of Guangdong and Guangxi.
Southern Min, Minnan or Banlam, is a group of linguistically similar and historically related Chinese languages that form a branch of Min Chinese spoken in Fujian, most of Taiwan, Eastern Guangdong, Hainan, and Southern Zhejiang. Southern Min dialects are also spoken by descendants of emigrants from these areas in diaspora, most notably in Southeast Asia, such as Singapore, Malaysia, the Philippines, Indonesia, Brunei, Southern Thailand, Myanmar, Cambodia, Southern and Central Vietnam, as well as major cities in the United States, including San Francisco, Los Angeles and New York City. Minnan is the most widely spoken branch of Min, with approximately 48 million speakers as of 2017–2018.
There are hundreds of local Chinese language varieties forming a branch of the Sino-Tibetan language family, many of which are not mutually intelligible. Variation is particularly strong in the more mountainous southeast part of mainland China. The varieties are typically classified into several groups: Mandarin, Wu, Min, Xiang, Gan, Jin, Hakka and Yue, though some varieties remain unclassified. These groups are neither clades nor individual languages defined by mutual intelligibility, but reflect common phonological developments from Middle Chinese.
A dialect continuum or dialect chain is a series of language varieties spoken across some geographical area such that neighboring varieties are mutually intelligible, but the differences accumulate over distance so that widely separated varieties may not be. This is typical of widely spread languages and language families around the world whose spread was not recent. Some prominent examples include the Indo-Aryan languages across large parts of India, varieties of Arabic across north Africa and southwest Asia, the Turkic languages, the varieties of Chinese, and parts of the Romance, Germanic and Slavic families in Europe. Terms used in older literature include dialect area and L-complex.
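A dialect continuum shows why mutual intelligibility cannot, by itself, divide varieties into languages: the relation is not transitive, so it does not group varieties into non-overlapping sets. The Python sketch below is a deliberately simplified toy model, not real linguistic data. The variety names "A" to "E", their positions on a line, and the threshold `THRESHOLD` are all hypothetical. Real continua are not straight lines with a sharp intelligibility cutoff. In the model, neighbors within one unit of each other are intelligible, and the helper `find_transitivity_violation` (also hypothetical) searches for three varieties where x understands y and y understands z, but x does not understand z.

```python
# Hypothetical toy continuum: five varieties spaced one unit apart on a line.
positions = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}

# Assumed cutoff: varieties at most this far apart count as mutually intelligible.
THRESHOLD = 1


def intelligible(x, y):
    """Toy mutual intelligibility: symmetric, reflexive, distance-based."""
    return abs(positions[x] - positions[y]) <= THRESHOLD


def find_transitivity_violation(varieties):
    """Return the first (x, y, z) with x~y and y~z but not x~z, or None.

    A returned triple shows the relation is not transitive, so it is not
    an equivalence relation and cannot partition varieties into languages.
    """
    for x in varieties:
        for y in varieties:
            for z in varieties:
                if intelligible(x, y) and intelligible(y, z) and not intelligible(x, z):
                    return (x, y, z)
    return None


names = list(positions)
print(find_transitivity_violation(names))  # prints ('A', 'B', 'C')
```

Every adjacent pair in this chain is intelligible, yet the endpoints "A" and "E" are not. Any boundary drawn to split the chain into "languages" must therefore separate some pair of mutually intelligible neighbors. Such a boundary has to rest on other criteria, which is the difficulty the introduction to this list describes.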
Marwari is a language within the Rajasthani language family of the Indo-Aryan languages. Marwari and its closely related varieties like Dhundhari, Shekhawati and Mewari form a part of the broader Marwari language family. It is spoken in the Indian state of Rajasthan, as well as the neighbouring states of Gujarat and Haryana, some adjacent areas in eastern parts of Pakistan, and some migrant communities in Nepal. There are two dozen varieties of Marwari.
In linguistics, mutual intelligibility is a relationship between different but related language varieties in which speakers of the different varieties can readily understand each other without prior familiarity or special effort. Mutual intelligibility is sometimes used to distinguish languages from dialects, although sociolinguistic factors are often also used.
A pluricentric language or polycentric language is a language with several codified standard forms, often corresponding to different countries. Many examples of such languages can be found worldwide among the most-spoken languages, including but not limited to Chinese in the People's Republic of China, Taiwan and Singapore; English in the United States, United Kingdom, Canada, Australia, New Zealand, Ireland, South Africa, India, and elsewhere; and French in France, Canada, and elsewhere. The converse case is a monocentric language, which has only one formally standardized version. Examples include Japanese and Russian. In some cases, the different standards of a pluricentric language may be elaborated to appear as separate languages, e.g. Malaysian and Indonesian, Hindi and Urdu, while Serbo-Croatian is in an earlier stage of that process.
Autonomy and heteronomy are complementary attributes of a language variety describing its functional relationship with related varieties. The concepts were introduced by William A. Stewart in 1968, and provide a way of distinguishing a language from a dialect.
Linguistic demography is the statistical study of languages among all populations. Estimating the number of speakers of a given language is not straightforward, and various estimates may diverge considerably. This is first of all due to the question of defining "language" vs. "dialect". Identification of varieties as a single language or as distinct languages is often based on ethnic, cultural, or political considerations rather than mutual intelligibility. The second difficulty is multilingualism, complicating the definition of "native language". Finally, in many countries, insufficient census data add to the difficulties.
Northeastern Neo-Aramaic (NENA) is a grouping of related dialects of Neo-Aramaic spoken before World War I as a vernacular language by Jews and Assyrian Christians between the Tigris and Lake Urmia, stretching north to Lake Van and southwards to Mosul and Kirkuk. As a result of the Assyrian genocide, Christian speakers were forced out of the area that is now Turkey, and in the early 1950s most Jewish speakers moved to Israel. The Kurdish-Turkish conflict resulted in further dislocations of speaker populations. As of the 1990s, the NENA group had an estimated number of fluent speakers among the Assyrians just below 500,000, spread throughout the Middle East and the Assyrian diaspora. In 2007, linguist Geoffrey Khan wrote that many dialects were nearing extinction with fluent speakers difficult to find.
The Fangyan is a Chinese dictionary compiled in the early 1st century CE by the poet and philosopher Yang Xiong. It was the first Chinese dictionary to include significant regional vocabulary, and is considered the "most significant lexicographic work" of its era. The dictionary's preface explains how Yang spent 27 years amassing and collating it. Yang collected regionalisms from many sources, particularly the 'light carriage' surveys made during the Zhou and Qin dynasties, where imperial emissaries were sent into the countryside annually to record folk songs and idioms from across China, reaching as far north as Korea.
Central Asian Arabic or Jugari Arabic refers to a set of four closely related varieties of Arabic currently facing extinction and spoken predominantly by Arab communities living in portions of Central Asia. These varieties are Bactrian Arabic, Bukhara Arabic, Qashqa Darya Arabic, and Khorasani Arabic.
Rawang, also known as Krangku, Kiutze (Qiuze), and Ch’opa, is a Sino-Tibetan language of India and Burma. Rawang has a high degree of internal diversity, and some varieties are not mutually intelligible. Most speakers, however, understand Mutwang (Matwang), the standard dialect and the basis of written Rawang.