Languages used on the Internet

About half of the homepages of the most visited websites on the World Wide Web are in English, with varying amounts of information available in many other languages. [1] [2] Other top languages are Chinese, Spanish, Russian, Persian, French, German and Japanese. [1] [3]

Of the more than 7,000 existing languages, only a few hundred are recognized as being in use for Web pages on the World Wide Web. [4]

Languages used

There is debate over the most-used languages on the Internet. A 2009 UNESCO report that monitored the languages of websites for 12 years, from 1996 to 2008, found a steady year-on-year decline in the percentage of webpages in English, from 75 percent in 1998 to 45 percent in 2005. [2] The authors found that English remained at 45 percent of content from 2005 to the end of the study, but believed this was due to search engines' bias toward indexing English-language content rather than a true stabilization of the percentage of content in English on the World Wide Web. [2]

The number of non-English web pages is rapidly expanding. The use of English online increased by around 281 percent from 2001 to 2011, a lower rate of growth than that of Spanish (743 percent), Chinese (1,277 percent), Russian (1,826 percent) or Arabic (2,501 percent) over the same period. [5]
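A percentage increase of p corresponds to multiplying the starting level by 1 + p/100, which makes the growth figures above easier to compare. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative and not taken from the cited source.

```python
# Growth from 2001 to 2011, in percent, as quoted in the paragraph above.
growth_percent = {
    "English": 281,
    "Spanish": 743,
    "Chinese": 1277,
    "Russian": 1826,
    "Arabic": 2501,
}


def growth_factor(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplication factor."""
    return 1 + percent_increase / 100


factors = {lang: growth_factor(p) for lang, p in growth_percent.items()}

# List the languages from slowest to fastest growth.
for lang, factor in sorted(factors.items(), key=lambda item: item[1]):
    print(f"{lang}: roughly {factor:.2f} times its 2001 level")
```

Read this way, a 281 percent increase means English use in 2011 was a little under four times its 2001 level, while Arabic use was about 26 times its 2001 level.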

According to a 2000 study, the international auxiliary language Esperanto ranked 40th among all languages in search engine queries and 27th among languages that use the Latin script. [6]

Usage statistics of content languages for websites

W3Techs estimated percentages of the top 10 million websites on the World Wide Web using various content languages as of 22 November 2024: [1]

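| Rank | Language | 15 May 2023 | 14 November 2024 |
|---|---|---|---|
| 1 | English | 55.5% | 49.4% |
| 2 | Spanish | 5.0% | 6.0% |
| 3 | Russian | 4.9% | 3.9% |
| 4 | German | 4.3% | 5.6% |
| 5 | French | 4.4% | 4.4% |
| 6 | Japanese | 3.7% | 5.0% |
| 7 | Portuguese | 2.4% | 3.8% |
| 8 | Turkish | 2.3% | 1.8% |
| 9 | Italian | 1.9% | 2.7% |
| 10 | Persian | 1.8% | 1.3% |
| 11 | Dutch | 1.5% | 2.1% |
| 12 | Polish | 1.4% | 1.8% |
| 13 | Chinese | 1.4% | 1.2% |
| 14 | Vietnamese | 1.3% | 1.1% |
| 15 | Indonesian | 0.7% | 1.2% |
| 16 | Czech | 0.7% | 1.0% |
| 17 | Korean | 0.7% | 0.8% |
| 18 | Ukrainian | 0.6% | 0.6% |
| 19 | Arabic | 0.7% | 0.6% |
| 20 | Greek | 0.5% | 0.5% |
| 21 | Hebrew | 0.5% | 0.4% |
| 22 | Swedish | 0.5% | 0.5% |
| 23 | Romanian | 0.4% | 0.5% |
| 24 | Hungarian | 0.4% | 0.6% |
| 25 | Thai | 0.4% | 0.4% |
| 26 | Danish | 0.3% | 0.4% |
| 27 | Slovak | 0.3% | 0.4% |
| 28 | Finnish | 0.3% | 0.4% |
| 29 | Bulgarian | 0.2% | 0.3% |
| 30 | Serbian | 0.3% | 0.2% |
| 31 | Norwegian | 0.1% | 0.1% |
| 32 | Lithuanian | 0.1% | 0.2% |
| 33 | Slovenian | 0.1% | 0.1% |
| 34 | Catalan | 0.1% | 0.1% |
| 35 | Estonian | 0.1% | 0.1% |
| 36 | Latvian | 0.1% | 0.1% |
| 37 | Bokmål | 0.1% | 0.2% |
| 38 | Croatian | 0.2% | 0.2% |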

All other languages are used in less than 0.1% of websites. Even including all languages, percentages may not sum to 100% because some websites contain multiple content languages.
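The note about percentages not summing to 100% follows from counting each site once per language it contains. The sketch below illustrates the effect with a small invented sample; the site names, language sets, and function name are hypothetical and do not reflect how W3Techs actually computes its figures.

```python
from collections import Counter

# Hypothetical sample: each site maps to the set of content languages found on it.
sites = {
    "site-a.example": {"English"},
    "site-b.example": {"English", "Spanish"},
    "site-c.example": {"Russian"},
    "site-d.example": {"English", "French"},
}


def language_shares(site_languages):
    """Return, for each language, the percentage of sites that use it."""
    counts = Counter(lang for langs in site_languages.values() for lang in langs)
    total_sites = len(site_languages)
    return {lang: 100 * n / total_sites for lang, n in counts.items()}


shares = language_shares(sites)
total = sum(shares.values())
print(shares)
print(f"Sum of shares: {total:.1f}%")  # exceeds 100% because two sites are bilingual
```

In this toy sample English appears on three of four sites (75%), and the three other languages on one site each (25%), so the shares add up to 150%.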

The figures from the W3Techs study are based on the one million most visited websites (i.e., approximately 0.27 percent of all websites according to December 2011 figures) as ranked by Alexa.com, and language is identified using only the home page of the sites in most cases (e.g., all of Wikipedia is based on the language detection of http://www.wikipedia.org). [7] As a consequence, the figures show a significantly higher percentage for many languages (especially for English) as compared to the figures for all websites. [8] For all websites, estimates are between 20 and 50% for English. [9] [2] [10] [11]
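The sample size and the quoted fraction together imply a rough total number of websites. This is a back-of-the-envelope sketch of that arithmetic only; the variable names are illustrative, and the result is no more precise than the "approximately 0.27 percent" it is derived from.

```python
sample_size = 1_000_000       # sites in the sample, per the paragraph above
sample_fraction = 0.27 / 100  # "approximately 0.27 percent" of all websites

# If the sample is 0.27% of the whole, the whole is sample_size / 0.0027.
implied_total = sample_size / sample_fraction
print(f"Implied total: about {implied_total / 1e6:.0f} million websites")
```

The estimate comes out at roughly 370 million websites for December 2011, which shows how small a slice of the Web the ranking covers.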

Most used scripts on the Internet

| # | Script | % |
|---|---|---|
| 1 | Latin | 84.7% |
| 2 | Cyrillic | 5.1% |
| 3 | Kana/Kanji | 4.9% |
| 4 | Arabic | 1.9% |
| 5 | Hanzi | 1.3% |
| 6 | Hangul | 0.8% |
| 7 | Greek | 0.5% |
| 8 | Hebrew | 0.4% |
| 9 | Thai | 0.4% |
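One simple way to assign a piece of text to one of the scripts listed above is to look at the Unicode names of its letters. The sketch below uses Python's standard `unicodedata` module; the prefix-to-script mapping and the `dominant_script` function are assumptions made for illustration and are not the method behind the figures in the table. In particular, CJK ideographs are shared by Chinese and Japanese, so labelling them "Hanzi" is a simplification.

```python
import unicodedata
from collections import Counter

# Illustrative mapping from the first word of a Unicode character name to the
# script labels used in the table above. This mapping is an assumption.
NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "HIRAGANA": "Kana/Kanji",
    "KATAKANA": "Kana/Kanji",
    "ARABIC": "Arabic",
    "CJK": "Hanzi",  # simplification: Japanese kanji share these code points
    "HANGUL": "Hangul",
    "GREEK": "Greek",
    "HEBREW": "Hebrew",
    "THAI": "Thai",
}


def dominant_script(text: str) -> str:
    """Return the script that accounts for the most letters in `text`."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        prefix = unicodedata.name(ch, "").split(" ")[0]
        script = NAME_PREFIX_TO_SCRIPT.get(prefix)
        if script:
            counts[script] += 1
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


for sample in ["Hello world", "Привет мир", "안녕하세요", "Γειά σου"]:
    print(sample, "->", dominant_script(sample))
```

A production-quality classifier would more likely rely on the Unicode Script property than on character names, but the name-based approach needs only the standard library.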

Content languages on YouTube

Of the top 250 YouTube channels, 66% of the content is in English, 15% in Spanish, 7% in Portuguese, 5% in Hindi, and 2% in Korean, while other languages make up 5%, [12] although other sources point to different percentages. [13] YouTube is available in over 80 languages with more than a hundred different local versions. [14] Of those popular YouTube channels that posted a video in the first week of 2019, just over half contained some content in a language other than English. [15]

Internet users by language

InternetWorldStats estimates of the number of Internet users by language as of March 31, 2020: [16] [17] [18] [19] [20] [21] [22]

| Rank | Language | Internet users | Percentage |
|---|---|---|---|
| 1 | English | 1,186,451,052 | 25.9% |
| 2 | Chinese | 888,453,068 | 19.4% |
| 3 | Spanish | 363,684,593 | 7.9% |
| 4 | Arabic | 237,418,349 | 5.2% |
| 5 | Indonesian | 198,029,815 | 4.3% |
| 6 | Portuguese | 171,750,818 | 3.7% |
| 7 | French | 144,695,288 | 3.3% |
| 8 | Japanese | 118,626,672 | 2.6% |
| 9 | Russian | 116,353,942 | 2.5% |
| 10 | German | 92,525,427 | 2.0% |
| 1–10 | Top 10 languages | 3,525,027,347 | 76.9% |
| | Others | 1,060,551,371 | 23.1% |
| | Total | 4,585,578,718 | 100% |
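Each percentage in the table above is a language's user count divided by the total number of Internet users. The sketch below reproduces that calculation for a few rows copied from the table; the `share` helper and variable names are illustrative.

```python
# Figures copied from the table above (Internet users as of March 31, 2020).
users = {
    "English": 1_186_451_052,
    "Chinese": 888_453_068,
    "Spanish": 363_684_593,
}
top10_users = 3_525_027_347
other_users = 1_060_551_371
total_users = 4_585_578_718


def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


for language, count in users.items():
    print(f"{language}: {share(count, total_users)}%")

# The "Top 10 languages" and "Others" rows together account for every user.
print(top10_users + other_users == total_users)
print(f"Top 10: {share(top10_users, total_users)}%")
```

The same helper can be applied to any other row to check it against the published percentage.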

Wikipedia page views by language

Figure: Most popular edition of Wikipedia by country as of Dec 2022. In greyed-out countries, the "national-language" edition is usually the most popular, but there are exceptions.

Figure: Most viewed editions of Wikipedia over time. The ranking reflects the most recent month in the data (Sep 2024).

Figure: Most edited editions of Wikipedia over time. The ranking reflects the most recent month in the data (Sep 2024).

The Wikimedia Analytics API provides the most recent data on page views and page edits, among other statistics, for all language editions of Wikipedia.

| Rank | Language of Wikipedia edition | Average daily page views by humans (from 10/8/2023 to 10/8/2024) |
|---|---|---|
| 1 | English | 253,610,218 |
| 2 | Japanese | 29,741,657 |
| 3 | Russian | 29,008,708 |
| 4 | Spanish | 27,436,473 |
| 5 | German | 26,790,751 |
| 6 | French | 22,913,851 |
| 7 | Italian | 15,306,223 |
| 8 | Chinese | 14,975,873 |
| 9 | Persian | 8,148,931 |
| 10 | Portuguese | 7,813,004 |
| 11 | Polish | 7,151,202 |
| 12 | Arabic | 7,135,389 |
| 13 | Turkish | 4,825,138 |
| 14 | Indonesian | 3,976,393 |
| 15 | Dutch | 3,934,187 |
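Averages like those in the table above can in principle be recomputed from the Wikimedia Analytics API. The sketch below assumes the public REST endpoint for aggregate page views, with `user` as the agent type for human traffic and timestamps in `YYYYMMDDHH` form; the URL layout and response shape are written from memory of that API and should be checked against its documentation. The dates, the `User-Agent` string, and the sample payload are invented for illustration.

```python
import json
from urllib.request import Request, urlopen

# Assumed base URL of the aggregate page-view endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate"


def pageviews_url(project, start, end, agent="user",
                  access="all-access", granularity="daily"):
    """Build the request URL for one language edition and date range."""
    return f"{BASE}/{project}/{access}/{agent}/{granularity}/{start}/{end}"


def average_daily_views(payload):
    """Average the 'views' field over the items of a parsed response."""
    views = [item["views"] for item in payload["items"]]
    return sum(views) / len(views)


def fetch_average(project, start, end):
    """Download the data and return the average (requires network access)."""
    request = Request(
        pageviews_url(project, start, end),
        headers={"User-Agent": "language-stats-example/0.1"},  # hypothetical
    )
    with urlopen(request) as response:
        return average_daily_views(json.load(response))


# Offline demonstration with a made-up payload in the assumed response shape.
sample_payload = {"items": [{"views": 100}, {"views": 200}, {"views": 300}]}
print(pageviews_url("en.wikipedia.org", "2024010100", "2024010300"))
print(average_daily_views(sample_payload))
```

Keeping URL construction and averaging separate from the network call makes both easy to check without contacting the service.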

Related Research Articles

Internet: Global system of connected computer networks

The Internet is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a network of networks that consists of private, public, academic, business, and government networks of local to global scope, linked by a broad array of electronic, wireless, and optical networking technologies. The Internet carries a vast range of information resources and services, such as the interlinked hypertext documents and applications of the World Wide Web (WWW), electronic mail, internet telephony, and file sharing.

Internet slang: Slang languages used by different people on the Internet

Internet slang is a non-standard or unofficial form of language used by people on the Internet to communicate with one another. An example of Internet slang is "lol", meaning "laugh out loud". Since Internet slang is constantly changing, it is difficult to provide a standardized definition. However, it can be understood to be any type of slang that Internet users have popularized and, in many cases, coined. Such terms often originate to save keystrokes or to compensate for small character limits. Many people use the same abbreviations in texting, instant messaging, and social networking websites. Acronyms, keyboard symbols, and abbreviations are common types of Internet slang. New dialects of slang, such as leet or Lolspeak, develop as ingroup Internet memes rather than time savers. Many people also use Internet slang in face-to-face, real-life communication.

Web browser: Software used to access websites

A web browser is an application for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on a range of devices, including desktops, laptops, tablets, and smartphones. By 2020, an estimated 4.9 billion people had used a browser. The most-used browser is Google Chrome, with a 67% global market share on all devices, followed by Safari with 18%.

Website: Set of related web pages served from a single domain

A website is one or more web pages and related content that is identified by a common domain name and published on at least one web server. Websites are typically dedicated to a particular topic or purpose, such as news, education, commerce, entertainment, or social media. Hyperlinking between web pages guides the navigation of the site, which often starts with a home page. The most-visited sites are Google, YouTube, and Facebook.

This is a glossary of Internet terminology: words pertaining to Internet technology, a subset of computer science.

Internet Explorer 6: Web browser for Windows released in 2001

Microsoft Internet Explorer 6 (IE6) is a web browser developed by Microsoft for Windows operating systems. Released on August 24, 2001, it is the sixth, and by now discontinued, version of Internet Explorer and the successor to Internet Explorer 5. It does not support earlier versions.

The English language is sometimes described as the lingua franca of computing. In comparison to other sciences, where Latin and Greek are often the principal sources of vocabulary, computer science borrows more extensively from English. In the past, due to the technical limitations of early computers, and the lack of international standardization on the Internet, computer users were limited to using English and the Latin alphabet. However, this historical limitation is less present today, due to innovations in internet infrastructure and increases in computer speed. Most software products are localized in numerous languages and the invention of the Unicode character encoding has resolved problems with non-Latin alphabets. Some limitations have changed since June 2003 such as with domain names, which previously allowed only ASCII characters.

User-generated content: Online content created by users

User-generated content (UGC), alternatively known as user-created content (UCC), emerged from the rise of intelligent web services that allow everyday users to create content, such as images, videos, audio, text, testimonials, and software, and to interact with other users. Online content aggregation platforms such as social media, discussion forums and wikis, by their interactive and social nature, no longer produce multimedia content but provide tools to produce, collaborate on, and share a variety of content, which can affect the attitudes and behaviors of the audience in various ways. This transforms the role of consumers from passive spectators to active participants.

Mobile web: Mobile browser-based World Wide Web services

The mobile web comprises mobile browser-based World Wide Web services accessed from handheld mobile devices, such as smartphones or feature phones, through a mobile or other wireless network.

jQuery is a JavaScript library designed to simplify HTML DOM tree traversal and manipulation, as well as event handling, CSS animations, and Ajax. It is free, open-source software using the permissive MIT License. As of August 2022, jQuery is used by 77% of the 10 million most popular websites. Web analysis indicates that it is the most widely deployed JavaScript library by a large margin, having at least three to four times more usage than any other JavaScript library.

Internet censorship: Legal control of the internet

Internet censorship is the legal control or suppression of what can be accessed, published, or viewed on the Internet. Censorship is most often applied to specific internet domains but exceptionally may extend to all Internet resources located outside the jurisdiction of the censoring state. Internet censorship may also put restrictions on what information can be made internet accessible. Organizations providing internet access – such as schools and libraries – may choose to preclude access to material that they consider undesirable, offensive, age-inappropriate or even illegal, and regard this as ethical behavior rather than censorship. Individuals and organizations may engage in self-censorship of material they publish, for moral, religious, or business reasons, to conform to societal norms, political views, due to intimidation, or out of fear of legal or other consequences.

The usage share of an operating system is the percentage of computers running that operating system (OS). These statistics are estimates as wide scale OS usage data is difficult to obtain and measure. Reliable primary sources are limited and data collection methodology is not formally agreed. Currently devices connected to the internet allow for web data collection to approximately measure OS usage.

Language preservation: Efforts to save endangered languages

Language preservation is the preservation of endangered or dead languages. With language death, studies in linguistics, anthropology, prehistory and psychology lose diversity. As history is remembered with the help of historic preservation, language preservation maintains dying or dead languages for future studies in such fields. Organizations such as 7000 Languages and the Living Tongues Institute for Endangered Languages document and teach endangered languages as a way of preserving languages. Sometimes parts of languages are preserved in museums, such as tablets containing Cuneiform writing from Mesopotamia. Additionally, dictionaries have been published to help keep record of languages, such as the Kalapuya dictionary published by the Siletz tribe in Oregon.

Internet linguistics: Domain of linguistics

Internet linguistics is a domain of linguistics advocated by the English linguist David Crystal. It studies new language styles and forms that have arisen under the influence of the Internet and of other new media, such as Short Message Service (SMS) text messaging. Since the beginning of human–computer interaction (HCI), leading to computer-mediated communication (CMC) and Internet-mediated communication (IMC), experts such as Gretchen McCulloch have acknowledged that linguistics has a contributing role in it, in terms of web interface and usability. Studying the emerging language on the Internet can help improve conceptual organization, translation and web usability. Such study aims to benefit both linguists and web users.

Niconico, Inc. is a Japanese video sharing service based in Tokyo, Japan. "Niconico" or "nikoniko" is the Japanese ideophone for smiling. As of 2021, Niconico is the 34th most-visited website in Japan, according to Alexa Internet.

Web page: Content provided by a website

A web page is a document on the Web that is accessed in a web browser. A website typically consists of many web pages linked together under a common domain name. The term "web page" is therefore a metaphor of paper pages bound together into a book.

Runet: Russian-language community on the internet

The Russian Internet, or Runet, is the part of the Internet that uses the Russian language, including the Russian-language community on the Internet and websites. Geographically, it reaches all continents, including Antarctica, but it is mostly based in Russia.

Media pluralism: Plurality of voices, opinions, and analyses in media systems

Media pluralism is the state of having a plurality of voices, opinions, and analyses in media systems, or the coexistence of different and diverse types of media and media support.

A number of text encoding standards have historically been used on the World Wide Web, though UTF-8 is now dominant in all countries, with use at 95% or, usually, higher for every language. The same encodings, and historically many more, are used in local files. Exact measurements of the prevalence of each are not possible for privacy reasons, but fairly accurate estimates are available for public websites, and those statistics may reflect use in local files. Attempts at measuring encoding popularity may use counts of (web) documents, or counts weighted by the actual use or visibility of those documents.
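The difference between counting documents and weighting them by use can be shown with a small example. The sketch below uses invented documents and visit counts; the function names are illustrative and do not correspond to any particular measurement service.

```python
from collections import defaultdict

# Hypothetical documents as (declared encoding, monthly visits); numbers are invented.
documents = [
    ("utf-8", 9000),
    ("utf-8", 500),
    ("windows-1251", 400),
    ("shift_jis", 100),
]


def share_by_document_count(docs):
    """Percentage of documents that use each encoding."""
    counts = defaultdict(int)
    for encoding, _visits in docs:
        counts[encoding] += 1
    return {enc: 100 * n / len(docs) for enc, n in counts.items()}


def share_by_visits(docs):
    """Percentage of visits that go to documents in each encoding."""
    weights = defaultdict(int)
    for encoding, visits in docs:
        weights[encoding] += visits
    total = sum(weights.values())
    return {enc: 100 * w / total for enc, w in weights.items()}


by_count = share_by_document_count(documents)
by_visits = share_by_visits(documents)
print("By document count:", by_count)
print("Weighted by visits:", by_visits)
```

In this toy data UTF-8 accounts for half of the documents but 95% of the visits, which is why the two approaches can report quite different figures.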

YouTube Shorts: Sharing platform within YouTube since 2020-21

YouTube Shorts is the short-form section of the American online video-sharing platform YouTube. Shorts focuses on vertical videos that are shorter than 180 seconds and offers various features for user interaction. As of May 2024, Shorts have collectively earned over 5 trillion views since the platform was made available to the general public on July 13, 2021, including views that pre-date the YouTube Shorts feature. Creators earn money based on the number of views they receive, or through ad revenue. The increased popularity of YouTube Shorts has led to concerns about addiction among teenagers.

References

  1. Pimienta, Daniel; Prado, Daniel; Blanco, Álvaro (2009). "Twelve years of measuring linguistic diversity in the Internet: balance and perspectives". United Nations Educational, Scientific and Cultural Organization. Archived from the original on 3 April 2015. Retrieved 24 March 2015.
  2. Language Online
  3. "What continents have the most indigenous languages?". Ethnologue. 3 May 2019. Archived from the original on 27 December 2019. Retrieved 27 December 2019.
  4. Rotaru, Alexandru. "The foreign language Internet is good for business". Archived from the original on 7 April 2013. Retrieved 21 June 2011.
  5. Grefenstette, Gregory; Nioche, Julien. "Estimation of English and non-English Language Use on the WWW". Proceedings of RIAO'2000, "Content-Based Multimedia Information Access", Paris, April 12–14, 2000, pp. 237–246. Archived 10 April 2018 at the Wayback Machine.
  6. "Technologies Overview". W3Techs. Retrieved 24 March 2015.
  7. Pimienta, Daniel (June 2017). "An alternative approach to produce indicators of languages in the Internet". Archived 31 August 2017 at the Wayback Machine.
  8. Vannini, Laurent; Le Crosnier, Hervé (March 2012). "NET.LANG: Towards a multilingual cyberspace". Net.lang: réussir le cyberespace multilingue. Caen: C&F éd. ISBN 978-2-915825-08-4. Archived from the original on 4 March 2016 via Maaya Network.
  9. Pimienta, Daniel (2022). "Resource: Indicators on the Presence of Languages in Internet". Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages. Marseille, France: European Language Resources Association: 83–91.
  10. Pimienta, Daniel; Blanco, Álvaro; de Oliveira, Gilvan Müller (2023). "The method behind the unprecedented production of indicators of the presence of languages in the Internet". Frontiers in Research Metrics and Analytics. 8. doi:10.3389/frma.2023.1149347. ISSN 2504-0537. PMC 10233101. PMID 37273659.
  11. Yang, Brian (2019). "6 Common Features Of Top 250 YouTube Channels". Twinword, Inc. Retrieved 19 September 2021.
  12. Spicer, Alan (24 November 2020). "Top Languages on YouTube [All The Stats!][Dominate YouTube with Multiple Languages]". Alan Spicer - YouTube Certified Expert. Retrieved 9 April 2023.
  13. GMI Blogger (18 April 2022). "YouTube User Statistics 2022". Global Media Insight - Dubai Digital Interactive Agency. Archived from the original on 27 April 2022. Retrieved 2 May 2022.
  14. van Kessel, Patrick; Toor, Skye; Smith, Aaron (25 July 2019). "Popular YouTube channels produced a vast amount of content, much of it in languages other than English". Pew Research Center. Retrieved 2 May 2022.
  15. "Top Ten Internet Languages in The World - Internet Statistics". 7 September 2019. Archived from the original on 7 September 2019. Retrieved 4 July 2024.
  16. "Internet: most common languages online by users 2017". Statista. Retrieved 4 July 2024.
  17. Schäferhoff, Nick (24 July 2023). "Most Used Languages on the Internet (Which to Add to Your Site?)". TranslatePress. Retrieved 4 July 2024.
  18. Berta, Natalie (23 May 2022). "What Are the Most Used Languages on the Internet & Why?". MosaLingua. Retrieved 4 July 2024.
  19. Admin (26 May 2022). "Top 10 Languages Used On the Internet And Why?". Tridindia. Retrieved 4 July 2024.
  20. "10 Most Common Languages Used On The Internet For 2024". www.marstranslation.com. Retrieved 4 July 2024.
  21. "Langues dans Internet". www.axl.cefan.ulaval.ca. Retrieved 4 July 2024.