This is a list of languages by total number of speakers.
It is difficult to define what constitutes a language as opposed to a dialect. For example, Chinese and Arabic are sometimes considered single languages, but each includes several mutually unintelligible varieties, and so they are sometimes considered language families instead. Conversely, colloquial registers of Hindi and Urdu are almost completely mutually intelligible, and are sometimes classified as one language, Hindustani. Such rankings should be used with caution, because it is not possible to devise a coherent set of linguistic criteria for distinguishing languages in a dialect continuum. [1]
There is no single criterion for how much knowledge is sufficient to be counted as a second-language speaker. For example, English has about 450 million native speakers but, depending on the criterion chosen, can be said to have as many as two billion speakers. [2]
There are also difficulties in obtaining reliable counts of speakers, which vary over time because of population change and language shift. In some areas, census data are unreliable or out of date, or the census may not record the languages spoken, or may record them ambiguously. Sometimes speaker populations are exaggerated for political reasons, or speakers of minority languages may be underreported in favor of a national language. [3]
Ethnologue lists the following languages as having 50 million or more total speakers. [4] This section does not include entries that Ethnologue identifies as macrolanguages encompassing several varieties, such as Arabic, Lahnda, Persian, Malay, Pashto, and Chinese.
Language | Family | Branch | First-language (L1) speakers | Second-language (L2) speakers | Total speakers (L1+L2) |
---|---|---|---|---|---|
English (excl. creole languages) | Indo-European | Germanic | 380 million | 1.135 billion | 1.515 billion |
Mandarin Chinese (incl. Standard Chinese, but excl. other varieties) | Sino-Tibetan | Sinitic | 941 million | 199 million | 1.140 billion |
Hindi (excl. Urdu) | Indo-European | Indo-Aryan | 345 million | 264 million | 609 million |
Spanish (excl. creole languages) | Indo-European | Romance | 486 million | 74 million | 560 million |
Modern Standard Arabic (excl. dialects) | Afro-Asiatic | Semitic | 0 [a] | 332 million | 332 million |
French (excl. creole languages) | Indo-European | Romance | 74 million | 238 million | 312 million |
Bengali | Indo-European | Indo-Aryan | 237 million | 41 million | 278 million |
Portuguese (excl. creole languages) | Indo-European | Romance | 236 million | 27 million | 264 million |
Russian | Indo-European | Balto-Slavic | 148 million | 108 million | 255 million |
Urdu (excl. Hindi) | Indo-European | Indo-Aryan | 70 million | 168 million | 238 million |
Indonesian (excl. other Malay) | Austronesian | Malayo-Polynesian | 44 million | 155 million | 199 million |
Standard German | Indo-European | Germanic | 76 million | 58 million | 134 million |
Japanese | Japonic | — | 123 million | <1 million | 123 million |
Nigerian Pidgin | English Creole | Krio | 5 million | 116 million | 121 million |
Egyptian Arabic (excl. other Arabic dialects) | Afro-Asiatic | Semitic | 78 million | 25 million | 103 million |
Marathi | Indo-European | Indo-Aryan | 83 million | 16 million | 99 million |
Telugu | Dravidian | South-Central | 83 million | 13 million | 96 million |
Turkish | Turkic | Oghuz | 84 million | 6 million | 90 million |
Hausa | Afro-Asiatic | Chadic | 54 million | 34 million | 88 million |
Tamil | Dravidian | Southern | 79 million | 8 million | 87 million |
Yue Chinese (incl. Cantonese) | Sino-Tibetan | Sinitic | 86 million | 1 million | 87 million |
Swahili | Niger–Congo | Bantu | 3 million | 83 million | 87 million |
Vietnamese | Austroasiatic | Vietic | 85 million | 1 million | 86 million |
Wu Chinese (incl. Shanghainese) | Sino-Tibetan | Sinitic | 83 million | <1 million | 83 million |
Tagalog [b] | Austronesian | Malayo-Polynesian | 29 million | 54 million | 83 million |
Western Punjabi (excl. Eastern Punjabi) | Indo-European | Indo-Aryan | — | — | 82 million |
Korean | Koreanic | — | 81 million | <1 million | 81 million |
Iranian Persian (excl. other Persian dialects) | Indo-European | Iranian | 62 million | 17 million | 78 million |
Javanese | Austronesian | Malayo-Polynesian | — | — | 68 million |
Italian | Indo-European | Romance | 64 million | 3 million | 67 million |
Gujarati | Indo-European | Indo-Aryan | 58 million | 5 million | 63 million |
Thai | Kra–Dai | Zhuang–Tai | 21 million | 40 million | 61 million |
Amharic | Afro-Asiatic | Semitic | 35 million | 25 million | 60 million |
Kannada | Dravidian | Southern | 44 million | 15 million | 59 million |
Levantine Arabic (excl. other Arabic dialects) | Afro-Asiatic | Semitic | 51 million | 2 million | 54 million |
Bhojpuri | Indo-European | Indo-Aryan | 53 million | <1 million | 53 million |
Min Nan Chinese (incl. Hokkien) | Sino-Tibetan | Sinitic | 51 million | <1 million | 51 million |
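The "Total speakers (L1+L2)" column is the sum of the two preceding columns, but a few listed totals differ from L1 + L2 by 1 million (for example Portuguese: 236 + 27 = 263, listed as 264). A plausible explanation, assumed here rather than stated by the source, is that each figure was rounded separately from more precise underlying counts. The sketch below recomputes the sums for a handful of rows copied from the table. The names `ROWS`, `check_totals`, and `rounding_gaps` are illustrative only. Rows with "—" or "<1 million" entries are left out because they cannot be summed exactly.

```python
# A few rows copied from the Ethnologue table, in millions:
# (language, L1 speakers, L2 speakers, listed total).
# Assumption: gaps of 1 million come from independent rounding.
ROWS = [
    ("English", 380, 1135, 1515),
    ("Mandarin Chinese", 941, 199, 1140),
    ("Hindi", 345, 264, 609),
    ("Spanish", 486, 74, 560),
    ("Portuguese", 236, 27, 264),
    ("Russian", 148, 108, 255),
    ("Swahili", 3, 83, 87),
    ("Iranian Persian", 62, 17, 78),
    ("Levantine Arabic", 51, 2, 54),
]


def check_totals(rows, tolerance=1):
    """Return (language, L1 + L2, listed total) for rows whose listed
    total differs from L1 + L2 by more than `tolerance` million."""
    mismatches = []
    for language, l1, l2, listed_total in rows:
        computed = l1 + l2
        if abs(computed - listed_total) > tolerance:
            mismatches.append((language, computed, listed_total))
    return mismatches


def rounding_gaps(rows):
    """Return (language, L1 + L2, listed total) for every row where
    the computed sum and the listed total disagree at all."""
    return [
        (language, l1 + l2, total)
        for language, l1, l2, total in rows
        if l1 + l2 != total
    ]


if __name__ == "__main__":
    print(check_totals(ROWS))  # [] : every gap is within 1 million
    for language, computed, listed in rounding_gaps(ROWS):
        print(f"{language}: L1 + L2 = {computed}, listed total = {listed}")
```

With a tolerance of 1 million no row is flagged, while `rounding_gaps` lists the five rows whose totals differ by exactly 1 million, which is consistent with (though not proof of) rounding.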
The World Factbook, produced by the Central Intelligence Agency (CIA), estimates the ten most spoken languages (L1 + L2) in 2022 as follows: [6]
Language | Percentage of world population (2022) |
---|---|
English | 18.8% |
Mandarin Chinese | 13.8% |
Hindi | 7.5% |
Spanish | 6.9% |
French | 3.4% |
Arabic | 3.4% |
Bengali | 3.4% |
Russian | 3.2% |
Portuguese | 3.2% |
Urdu | 2.9% |
Arabic is a Central Semitic language of the Afroasiatic language family spoken primarily in the Arab world. The ISO assigns language codes to 32 varieties of Arabic, including its standard form of Literary Arabic, known as Modern Standard Arabic, which is derived from Classical Arabic. This distinction exists primarily among Western linguists; Arabic speakers themselves generally do not distinguish between Modern Standard Arabic and Classical Arabic, but rather refer to both as al-ʿarabiyyatu l-fuṣḥā or simply al-fuṣḥā (اَلْفُصْحَىٰ).
A dialect is a variety of language spoken by a particular group of people. It can also refer to a language subordinate in status to a dominant language, and is sometimes used to mean a vernacular language.
There are over 250 languages indigenous to Europe, and most belong to the Indo-European language family. Out of a total European population of 744 million as of 2018, some 94% are native speakers of an Indo-European language. The three largest branches of the Indo-European language family in Europe are Romance, Germanic, and Slavic; they have more than 200 million speakers each, and together account for close to 90% of Europeans.
Kazakh is a Turkic language of the Kipchak branch spoken in Central Asia by Kazakhs. It is closely related to Nogai, Kyrgyz and Karakalpak. It is the official language of Kazakhstan, and has official status in the Altai Republic of Russia. It is also a significant minority language in the Ili Kazakh Autonomous Prefecture in Xinjiang, China, and in the Bayan-Ölgii Province of western Mongolia. The language is also spoken by many ethnic Kazakhs throughout the former Soviet Union, Germany, and Turkey.
Cebuano is an Austronesian language spoken in the southern Philippines by the Cebuano people, and by other ethnic groups as a second language. It is natively, though informally, called by its generic name Bisayâ or Binisayâ, and is sometimes referred to in English sources as Cebuan. It is spoken by the Visayan ethnolinguistic groups native to the islands of Cebu, Bohol, Siquijor, the eastern half of Negros, the western half of Leyte, and the northern coastal areas of Northern Mindanao and the eastern part of Zamboanga del Norte due to Spanish settlements during the 18th century. In modern times, it has also spread to the Davao Region, Cotabato, Camiguin, parts of the Dinagat Islands, and the lowland regions of Caraga, often displacing native languages in those areas.
The English-speaking world comprises the 88 countries and territories in which English is an official, administrative, or cultural language. In the early 2000s, between one and two billion people spoke English, making it the largest language by number of speakers, the third-largest language by number of native speakers, and the most widespread language geographically. The countries in which English is the native language of most people are sometimes termed the Anglosphere. Speakers of English are called Anglophones.
There are some 130 to 195 languages spoken in the Philippines, depending on the method of classification. Almost all are Malayo-Polynesian languages native to the archipelago. A number of Spanish-influenced creole varieties, generally called Chavacano, along with some local varieties of Chinese, are also spoken in certain communities. The 1987 constitution designates Filipino, a de facto standardized version of Tagalog, as the national language and an official language along with English. Filipino is regulated by the Commission on the Filipino Language and serves as a lingua franca among Filipinos of various ethnolinguistic backgrounds.
Literary language is the form (register) of a language used when writing in a formal, academic, or particularly polite tone; when speaking or writing in such a tone, it can also be known as formal language. It may be the standardized variety of a language. It can sometimes differ noticeably from the various spoken lects, but the difference between literary and non-literary forms is greater in some languages than in others. If there is a strong divergence between a written form and the spoken vernacular, the language is said to exhibit diglossia.
In linguistics, mutual intelligibility is a relationship between different but related language varieties in which speakers of the different varieties can readily understand each other without prior familiarity or special effort. Mutual intelligibility is sometimes used to distinguish languages from dialects, although sociolinguistic factors are often also used.
A pluricentric language or polycentric language is a language with several codified standard forms, often corresponding to different countries. Many examples of such languages can be found worldwide among the most-spoken languages, including but not limited to Chinese in the People's Republic of China, Taiwan and Singapore; English in the United States, United Kingdom, Canada, Australia, New Zealand, Ireland, South Africa, India, and elsewhere; and French in France, Canada, and elsewhere. The converse case is a monocentric language, which has only one formally standardized version. Examples include Japanese and Russian. In some cases, the different standards of a pluricentric language may be elaborated to appear as separate languages, e.g. Malaysian and Indonesian, Hindi and Urdu, while Serbo-Croatian is in an earlier stage of that process.
Modern Standard Arabic (MSA) or Modern Written Arabic (MWA) is the variety of standardized, literary Arabic that developed in the Arab world in the late 19th and early 20th centuries, and in some usages also the variety of spoken Arabic that approximates this written standard. MSA is the language used in literature, academia, print and mass media, law and legislation, though it is generally not spoken as a first language, similar to Contemporary Latin. It is a pluricentric standard language taught throughout the Arab world in formal education, differing significantly from many vernacular varieties of Arabic that are commonly spoken as mother tongues in the area; these are only partially mutually intelligible with both MSA and with each other depending on their proximity in the Arabic dialect continuum.
Linguistic demography is the statistical study of languages among all populations. Estimating the number of speakers of a given language is not straightforward, and various estimates may diverge considerably. This is primarily due to the difficulty of defining "language" versus "dialect". Identification of varieties as a single language or as distinct languages is often based on ethnic, cultural, or political considerations rather than mutual intelligibility. The second difficulty is multilingualism, which complicates the definition of "native language". Finally, in many countries, insufficient census data add to the difficulties.
The main languages spoken in Eritrea are Tigrinya, Tigre, Kunama, Bilen, Nara, Saho, Afar, and Beja. The country's working languages are Tigrinya, Arabic, English, and formerly Italian.
Afghanistan is a linguistically diverse nation, with upwards of 40 distinct languages. However, Dari and Pashto are two of the most prominent languages in the country, and have shared official status under various governments of Afghanistan. Dari, as a shared language between multiple ethnic groups in the country, has served as a historical lingua franca between different linguistic groups in the region and is the most widely understood language in the country.
The official language of Greece is Greek, spoken by 99% of the population. In addition, a number of non-official, minority languages and some Greek dialects are spoken as well. The most common foreign languages learned by Greeks are English, German, French and Italian.
Kuwaiti is a Gulf Arabic dialect spoken in Kuwait. Kuwaiti Arabic shares many phonetic features unique to Gulf dialects spoken in the Arabian Peninsula. Due to Kuwait's soap opera industry, knowledge of Kuwaiti Arabic has spread throughout the Arabic-speaking world and become recognizable even to people in countries such as Tunisia and Jordan.
Arabic, particularly the Moroccan Arabic dialect, is the most widely spoken language in Morocco, but a number of regional and foreign languages are also spoken. The official languages of Morocco are Modern Standard Arabic and Standard Moroccan Berber. Moroccan Arabic is by far the primary spoken vernacular and lingua franca, whereas Berber languages serve as vernaculars for significant portions of the country. According to the 2024 Moroccan census, 92.7% of the population spoke Arabic, whereas 24.8% spoke Berber languages.
Varieties of Arabic are the linguistic systems that Arabic speakers speak natively. Arabic is a Semitic language within the Afroasiatic family that originated in the Arabian Peninsula. There is considerable variation from region to region, with degrees of mutual intelligibility that often correlate with geographical distance; some varieties are mutually unintelligible. Many aspects of the variability attested in these modern variants can be found in the ancient Arabic dialects of the peninsula. Likewise, many of the features that characterize the various modern variants can be attributed to the original settler dialects as well as to local native languages and dialects. Some organizations, such as SIL International, consider these approximately 30 different varieties to be separate languages, while others, such as the Library of Congress, consider them all to be dialects of Arabic.
Jordanian Arabic is a dialect continuum of mutually intelligible varieties of Arabic spoken in Jordan.