English in computing

The English language is sometimes described as the lingua franca of computing. In comparison to other sciences, where Latin and Greek are often the principal sources of vocabulary, computer science borrows more extensively from English. In the past, due to the technical limitations of early computers and the lack of international standardization on the Internet, computer users were largely limited to English and the Latin alphabet. This historical limitation is less present today, owing to innovations in internet infrastructure and increases in computer speed. Most software products are localized in numerous languages, and the Unicode character encoding has resolved most problems with non-Latin alphabets. Some limitations were lifted only recently, such as with domain names, which previously allowed only ASCII characters.

English is seen as having this role due to the prominence of the United States and the United Kingdom, both English-speaking countries, in the development and popularization of computer systems, computer networks, software and information technology.

History

Computer science ultimately has a mathematical foundation, which was laid by non-English-speaking cultures. The first mathematically literate societies in the Ancient Near East recorded methods for solving mathematical problems in steps. [1] The word 'algorithm' comes from the name of al-Khwārizmī, a famous medieval Arabic mathematician who contributed to the spread of Hindu-Arabic numerals. [2] The first systematic treatment of binary numbers was completed by Leibniz, [3] a German mathematician. Leibniz wrote his treatise on the topic in French, the lingua franca of science at the time, [4] and innovations in what is now called computer hardware also occurred outside of an English tradition, with Pascal inventing the first mechanical calculator and Leibniz improving it. [5]

Interest in building computing machines first emerged in the 19th century, with the coming of the Second Industrial Revolution. Computing in an English-language tradition began in this era with Charles Babbage's conceptualization of the Difference Engine and the Analytical Engine, George Boole's work on logic, and Herman Hollerith's invention of the tabulating machine for use in the 1890 United States census. [6] At the time, Britain enjoyed near-complete hegemonic power in the West at the height of the Pax Britannica, and America was experiencing an economic and demographic boom. By the interwar period in the early 20th century, the most important mathematics related to the development of computing was being done in English, which was also beginning to become the new lingua franca of science. [7]

Influence on other languages

The computing terminology of many languages borrows from English. Some language communities actively resist this trend, and in other cases English is used extensively and more directly. This section gives some examples of the use of English loans in other languages and mentions any notable differences.

Bulgarian

Both English and Russian have had influence over Bulgarian computing vocabulary. In many cases, however, the borrowed word is translated into Bulgarian rather than transcribed phonetically from English. Combined with the use of Cyrillic this can make it difficult to recognize loanwords. For example, the Bulgarian term for motherboard is дънна платка (IPA: [ˈdɤnnaˈplatka] ), literally "bottom board".

Faroese

The Faroese language has a sparse scientific vocabulary based on the language itself. Many Faroese scientific words are borrowed or modified versions of equivalents in other languages, especially the Nordic languages and English. The vocabulary is constantly evolving: new words often die out, and only a few survive and become widely used. Examples of successful words include "telda" (computer), "kurla" (at sign) and "ambætari" (server). [8]

French

In French, there are some generally accepted English loanwords, but there is also a distinct effort to avoid them. In France, the Académie française is responsible for the standardisation of the language and often coins new technological terms. Some of them are accepted in practice, but oftentimes the English loans remain predominant. In Quebec, the Office québécois de la langue française has a similar function.

German

In German, English words are very often used as well:

Japanese

Japanese uses the katakana syllabary for foreign loanwords, a wide variety of which are in use today. English computing terms remain prevalent in modern Japanese vocabulary.

Using a keyboard layout suitable for romanized input, a user can type in the Latin script to produce Japanese text, including hiragana, katakana, and kanji.

When writing Japanese on a computer keyboard, the text is usually input in roman transcription, optionally according to Hepburn, Kunrei, or Nippon romanization; the common Japanese word processing programs allow for all three. Long vowels are input according to how they are written in kana; for example, a long o is input as ou, instead of an o with a circumflex or macron (ô or ō). As letters are keyed in, they are automatically converted, as specified, into either hiragana or katakana. These kana phrases are in turn converted, as desired, into kanji. [10]
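The two-stage conversion described above (romanized input to kana, then kana to kanji) can be illustrated with a minimal Python sketch. The tables `ROMAJI_TO_HIRAGANA` and `KANA_TO_KANJI` and both function names are hypothetical and cover only a handful of syllables and words; a real input method uses far larger dictionaries, handles ambiguous input such as "n", and lets the user choose among kanji candidates.

```python
# Tiny, illustrative romaji table (an assumption, not a complete romanization scheme).
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "ta": "た", "to": "と", "ni": "に", "ho": "ほ", "n": "ん",
    "kyo": "きょ",
}

# Hypothetical kana-to-kanji dictionary with a single candidate per reading.
KANA_TO_KANJI = {
    "とうきょう": "東京",
    "にほん": "日本",
}


def romaji_to_hiragana(text: str) -> str:
    """Convert romanized input to hiragana by greedy longest-match lookup."""
    out = []
    i = 0
    while i < len(text):
        for length in (3, 2, 1):          # try the longest syllable first
            chunk = text[i:i + length]
            if chunk in ROMAJI_TO_HIRAGANA:
                out.append(ROMAJI_TO_HIRAGANA[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])           # pass unknown characters through unchanged
            i += 1
    return "".join(out)


def kana_to_kanji(kana: str) -> str:
    """Replace a kana phrase with its kanji form when the dictionary knows it."""
    return KANA_TO_KANJI.get(kana, kana)


if __name__ == "__main__":
    kana = romaji_to_hiragana("toukyou")  # long o typed as "ou", as in kana spelling
    print(kana, kana_to_kanji(kana))
```

Note that the long vowel is typed as "ou" and mapped syllable by syllable, which mirrors the kana spelling rather than a macron-based transcription.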

Icelandic

The Icelandic language has its own vocabulary of scientific terms. Still, English loans exist, and are mostly used in casual conversation, whereas the Icelandic words might be longer or not as widespread.

Norwegian

English words are commonly used for computing terms in all Scandinavian languages.

nouns: mail (referring to e-mail), software, blogg (from "blog"), spam

verbs: å boote, å spamme, å blogge

Polish

Polish terminology derived from English:

Russian

Spanish

Because of the English influence on the software industry and the internet, the Spanish used in computing in Latin America has borrowed significantly from the English lexicon.

Frequently untranslated, and their Spanish equivalent
Not translated
Undecided

Many computing terms in Spanish share a common root with their English counterpart. In these cases, both terms are understood, but the Spanish is preferred for formal use:

Character encoding

Early computer software and hardware had very little support for character sets other than the Latin alphabet. As a result, it was difficult or impossible to represent languages based on other scripts. The ASCII character encoding, created in the 1960s, defines only 128 characters in a 7-bit format. With additional software it was possible to support some other languages, for instance those written in the Cyrillic alphabet. However, languages with logographic or otherwise large scripts, such as Chinese or Japanese, need more characters than the 256 that an 8-bit character encoding can represent. Some computers created in the former USSR had native support for the Cyrillic alphabet.

The widespread adoption of Unicode, and UTF-8 on the web, resolved most of these historical limitations. ASCII remains the de facto standard for command interpreters, programming languages and text-based communication protocols, but it is slowly dying out.
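The limits of 7-bit and 8-bit encodings, and the way UTF-8 lifts them while staying compatible with ASCII, can be seen in a short Python sketch. The helper `try_encode` is a hypothetical name introduced here for illustration; the codec names `ascii`, `koi8_r` (an 8-bit Cyrillic encoding) and `utf-8` are standard Python codecs.

```python
from typing import Optional


def try_encode(text: str, encoding: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the encoding cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    samples = ["hello", "Привет", "日本語"]
    for text in samples:
        for encoding in ("ascii", "koi8_r", "utf-8"):
            encoded = try_encode(text, encoding)
            if encoded is None:
                print(f"{text!r} cannot be represented in {encoding}")
            else:
                print(f"{text!r} in {encoding}: {len(encoded)} bytes")
```

In this sketch, ASCII handles only the English sample, the 8-bit Cyrillic code page adds Russian at one byte per letter but still cannot represent Japanese, and UTF-8 encodes all three. Pure ASCII text produces identical bytes under ASCII and UTF-8.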

Programming language

The syntax of most programming languages uses English keywords, so it could be argued that some knowledge of English is required in order to use them. Some studies have found that programmers who are not native English speakers self-report English as their biggest obstacle to programming proficiency. [12] However, all programming languages belong to the class of formal languages, and they are very different from any natural language, including English.
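As a small illustration of the point about English keywords, the following Python sketch lists a few of the language's reserved words and defines a function whose names are Russian. The identifiers `площадь` ("area"), `ширина` ("width") and `высота` ("height") are arbitrary examples chosen here; Python 3 accepts non-ASCII identifiers, but the keywords `def` and `return` stay fixed English-derived tokens.

```python
import keyword

# Reserved words are fixed by the language and derived from English.
sample_keywords = [k for k in ("def", "return", "while", "import") if keyword.iskeyword(k)]


# Identifiers, by contrast, may be written in other scripts.
def площадь(ширина, высота):
    """Return the area of a rectangle (names are Russian for area, width, height)."""
    return ширина * высота


if __name__ == "__main__":
    print(sample_keywords)
    print(площадь(3, 4))
```

The keywords behave as opaque formal tokens: a programmer can name everything else in another language, yet cannot replace `def` or `return`.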

Some examples of non-English programming languages:

Communication protocols

Many application protocols use text strings for requests and parameters, rather than the binary values commonly used in lower layer protocols. The request strings are generally based on English words, although in some cases the strings are contractions or acronyms of English expressions, which can render them somewhat cryptic to anyone not familiar with the protocol, whatever their proficiency in English. Nevertheless, the use of word-like strings is a convenient mnemonic device that allows a person skilled in the art (and with sufficient knowledge of English) to execute the protocol manually from a keyboard, usually for the purpose of finding a problem with the service.

Examples:

It is notable that response codes, that is, the strings sent back by the recipient of a request, are typically numeric: for instance, in HTTP (and some borrowed by other protocols)

This is because response codes also need to convey unambiguous information, but can have various nuances that the requester may optionally use to vary its subsequent actions. To convey all such "sub-codes" with alphabetic words would be unwieldy, and negate the advantage of using pseudo-English words. Since responses are usually generated by software they do not need to be mnemonic. Numeric codes are also more easily analyzed and categorized when they are processed by software, instead of a human testing the protocol by manual input.
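The contrast between word-like requests and numeric responses can be sketched in Python using HTTP as the example. The function names `build_request` and `parse_status_line` and the host `example.com` are assumptions made for illustration; the grouping of status codes by their first digit follows the standard HTTP status classes.

```python
# Standard HTTP status classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def build_request(method: str, path: str, host: str) -> str:
    """Build a minimal HTTP/1.1 request from English-like tokens."""
    return f"{method} {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def parse_status_line(line: str):
    """Split a status line into (code, class name, reason phrase)."""
    _version, code_text, *rest = line.strip().split(" ", 2)
    code = int(code_text)
    reason = rest[0] if rest else ""
    return code, STATUS_CLASSES.get(code // 100, "unknown"), reason


if __name__ == "__main__":
    print(build_request("GET", "/index.html", "example.com"))
    print(parse_status_line("HTTP/1.1 404 Not Found"))
```

Software on the requesting side can branch on the integer code or its class alone; the English reason phrase is there only for a human reading the exchange.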

Localization

BIOS

Many personal computers have a BIOS chip that displays text in English at boot time.

Keyboard shortcut

Keyboard shortcuts are usually defined in terms of English keywords, such as CTRL+F for "find".

English on the World Wide Web

English is the largest language on the World Wide Web, with 27% of internet users.

English speakers

Web user percentages usually focus on raw comparisons of the first language of those who access the web. Just as important is a consideration of second- and foreign-language users; i.e., the first language of a user does not necessarily reflect which language he or she regularly employs when using the web.

Native speakers

English-language users appear to be a plurality of web users, consistently cited as around one-third of the overall (near one billion). This reflects the relative affluence of English-speaking countries and high Internet penetration rates in them. This lead may be eroding due mainly to a rapid increase of Chinese users. [13]

First-language users among other relatively affluent countries appear generally stable, the two largest being German and Japanese, which each have between 5% and 10% of the overall share.

World Wide Web content

One widely quoted figure for the amount of web content in English is 80%. [14] Other sources show figures five to fifteen points lower, though still well over 50%. [15] [16] [17] There are two notable facts about these percentages:

The English web content is greater than the number of first-language English users by as much as 2 to 1.[ citation needed ]

Given the enormous lead it already enjoys and its increasing use as a lingua franca in other spheres, English web content may continue to dominate even as English first-language Internet users decline. This is a classic positive feedback loop: new Internet users find it helpful to learn English and employ it online, thus reinforcing the language's prestige and forcing subsequent new users to learn English as well.

Certain other factors (some predating the medium's appearance) have propelled English into a majority web-content position. Most notable in this regard is the tendency for researchers and professionals to publish in English to ensure maximum exposure. The largest database of medical bibliographical information, for example, shows English was the majority language choice for the past forty years and its share has continually increased over the same period. [18]

The fact that non-Anglophones regularly publish in English only reinforces the language's dominance. English has a rich technical vocabulary (largely because native and non-native speakers alike use it to communicate technical ideas) and many IT and technical professionals use English regardless of country of origin (Linus Torvalds, for instance, comments his code in English, despite being from Finland and having Swedish as his first language).

Notes

  1. Chabert, Jean-Luc (1994). A History of Algorithms. Paris: Springer. p. 7.
  2. O'Regan, Gerard (2021). A Brief History of Computing. Cham, Switzerland: Springer. p. 29.
  3. O'Regan, Gerard (2021). A Brief History of Computing. Cham, Switzerland: Springer. p. 38.
  4. Weber, George (2003). "Top Languages". www.andaman.org. Archived from the original on March 12, 2008.
  5. O'Regan, Gerard (2021). A Brief History of Computing. Cham, Switzerland: Springer. p. 36.
  6. O'Regan, Gerard (2021). A Brief History of Computing. Cham, Switzerland: Springer. pp. 35–88.
  7. Kaplan, Robert (2001). The Dominance of English as a Language of Science. Berlin, New York: Mouton De Gruyter. p. 9.
  8. "List of Faroese-English-Danish IT words". Archived from the original on May 31, 2013. Retrieved June 29, 2010.
  9. "Questions de langue" on the Académie Française's website
  10. "Romanization systems". www.hadamitzky.de. Retrieved May 15, 2019.
  11. "dżojstik". Słownik języka polskiego. Polish Scientific Publishers PWN. Retrieved September 26, 2012.
  12. Ben Idris, Mrwan; Ammar, Hany (March 2018). "The Correlation between Arabic Student's English Proficiency and Their Computer Programming Ability at the University Level". International Journal of Managing Public Sector Information and Communication Technologies. 9: 01–10. doi:10.5121/ijmpict.2018.9101.
  13. Johnson, Bobbie (August 16, 2005). "English grip on internet being eroded". Onlineblog. Guardian Unlimited. Archived from the original on September 10, 2005.
  14. "What percentage of the internet is in English?". English English. Archived from the original on January 28, 2020.
  15. "Usage of content languages for websites". W3Techs. Archived from the original on May 30, 2017. Retrieved December 30, 2011.
  16. "VeriSign Announces Plan to Further Enhance .com and .net Global Internet Constellation Sites with Regional Resolution Servers". VeriSign. April 6, 2005. Archived from the original on December 24, 2005. Retrieved January 31, 2014.
  17. Bowen, Ted Smalley (November 21, 2001). "English could snowball on Net". Technology Research News. Archived from the original on September 24, 2023.
  18. Loria, Alvar; Arroyo, Pedro (July 2005). "Language and country preponderance trends in MEDLINE and its causes". J Med Libr Assoc. 93 (3): 381–385. PMC 1175804. PMID 16059428.
