Language code

A language code is a code that assigns letters or numbers as identifiers or classifiers for languages. These codes may be used to organize library collections or presentations of data, to choose the correct localizations and translations in computing, and as a shorthand designation for longer forms of language names.

Difficulties of classification

Language code schemes attempt to classify the complex world of human languages, dialects, and variants. Most schemes make some compromises between being general and being complete enough to support specific dialects.

For example, Spanish is spoken in over 20 countries in North America, Central America, South America, the Caribbean, and Europe. Spanish spoken in Mexico differs slightly from Spanish spoken in Peru, and different regions of Mexico have slightly different dialects and accents of their own. A language code scheme might group all of these as "Spanish" for choosing a keyboard layout, group most of them as "Spanish" for general usage, or separate each dialect to allow region-specific variation.
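The trade-off between general and specific codes can be sketched in code. The following Python example is a minimal illustration under stated assumptions, not part of any standard library: the function name `pick_language` and the sets of supported tags are hypothetical, and the tags are written in the hyphen-separated IETF style (for example, `es-MX` for Spanish as used in Mexico). The function drops trailing subtags from the requested tag until it finds one the application supports, loosely following the idea behind the lookup scheme in RFC 4647 while omitting its special cases.

```python
def pick_language(requested: str, supported: set[str], default: str = "en") -> str:
    """Return the most specific supported tag for `requested`.

    Hypothetical helper: trailing subtags are removed one at a time
    ("es-MX" -> "es") until a supported tag is found. Comparison is
    case-insensitive. If nothing matches, `default` is returned.
    """
    # Map lowercased tags back to the spelling the application uses.
    by_lower = {tag.lower(): tag for tag in supported}
    parts = requested.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in by_lower:
            return by_lower[candidate]
        parts.pop()  # drop the most specific remaining subtag
    return default


# An application that only distinguishes "Spanish" in general:
print(pick_language("es-MX", {"en", "es"}))
# An application that carries region-specific variants:
print(pick_language("es-MX", {"en", "es", "es-MX"}))
```

An application that groups every variety under `es` and one that separates `es-MX` can share the same lookup; only the set of supported tags changes.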

Common schemes

List of some common language code schemes
Scheme | Notes | Examples for English | Examples for Spanish
Glottolog codes: Created for minority languages as a scientific alternative to the industrial ISO 639‑3 standard. The codes intentionally do not resemble abbreviations.
Examples for English:
  • stan1293 – standard English
  • macr1271 – macro-English (Modern English, incl. creoles)
  • midd1317 – Middle English
  • merc1242 – Mercian (Middle to Modern English)
  • olde1238 – Old English
  • angl1265 – Anglian (Old to Modern English, incl. Scots)
Examples for Spanish:
  • stan1288 – standard Spanish
  • olds1249 – Old Spanish
  • cast1243 – Castilic (Old to Modern Spanish, incl. Extremaduran and creoles)

IETF language tag: An IETF best practice, specified by BCP 47, [1] for language tags that are easy for computers to parse. The tag system is extensible to region, dialect, and private designations. It references ISO 639, ISO 3166 and ISO 15924.
Examples for English:
  • en – English, as shortest ISO 639 code.
  • en-US – English as used in the United States (US is the ISO 3166‑1 country code for the United States) [2]
Examples for Spanish:
  • es – Spanish, as shortest ISO 639 code.
  • es-419 – Spanish appropriate for the Latin America and Caribbean region, using the UN M.49 region code

ISO 639‑1: Two-letter code system made official in 2002, containing 136 codes at the time. Many systems use two-letter ISO 639‑1 codes supplemented by three-letter ISO 639‑2 codes when no two-letter code is applicable. There are 183 two-letter codes registered as of June 2021. See: List of ISO 639 language codes
Examples for English:
  • en – English
Examples for Spanish:
  • es – Spanish

ISO 639‑2: Three-letter system of 464 codes. See: List of ISO 639-2 codes
Examples for English:
  • eng – English (three-letter code)
  • enm – Middle English, c. 1100–1500
  • ang – Old English, c. 450–1100
  • cpe – other English-based creoles and pidgins
Examples for Spanish:
  • spa – Spanish

ISO 639‑3: An extension of ISO 639‑2 to cover all known languages, living or dead, spoken or written, in 7,589 entries. See: List of ISO 639-3 codes
Examples for English:
  • eng – English (three-letter code)
  • enm – Middle English, c. 1100–1500
  • aig – Antigua and Barbuda Creole English
  • ang – Old English, c. 450–1100
  • svc – Vincentian Creole English
Examples for Spanish:
  • spa – Spanish
  • spq – Spanish, Loreto-Ucayali
  • ssp – Spanish Sign Language

Linguasphere Register code-system: A code system of two digits plus one to six letters, published in 2000, [3] containing over 32,000 codes within 10 sectors of reference and covering the world's languages and speech communities. The hierarchy of the Linguasphere Register code-system can also be browsed online at hortensj-garden.org. [4]
Examples for English, within the hierarchy of the Linguasphere Register code-system:
  • 5= Indo-European phylosector
  • 52= Germanic phylozone
  • 52-A Germanic set
  • 52-AB English + Anglo-Creole chain
  • 52-ABA English net
  • 52-ABA-c Global English (outer unit)
    52-ABA-ca to 52-ABA-cwe (186 varieties)
Compare: 52-ABA-a Scots + Northumbrian outer unit & 52-ABA-b "Anglo-English" outer unit (= South Great Britain traditional varieties + Old Anglo-Irish)
Examples for Spanish, within the hierarchy of the Linguasphere Register code-system:
  • 5= Indo-European phylosector
  • 51= Romanic phylozone
  • 51-A Romance set
  • 51-AA Romance chain
  • 51-AAA West Romance net
  • 51-AAA-b Español/Castellano (outer unit)
    51-AAA-ba to 51-AAA-bkk (58 varieties)
Compare: 51-AAA-a Português + Galego outer unit & 51-AAA-c Astur + Leonés outer unit, etc.

SIL codes (10th–14th editions): Codes created for use in the Ethnologue, a publication of SIL International that lists language statistics. The publication now uses ISO 639‑3 codes.
Example for English:
  • ENG
Example for Spanish:
  • SPN

Verbix language codes: Constructed codes starting with old SIL codes and adding more information. [5]
Example for English:
  • ENG
Example for Spanish:
  • SPN
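As a rough illustration of how the IETF language tags listed above decompose into subtags, the following Python sketch splits a tag into language, script, and region. It assumes a deliberately simplified form of the BCP 47 grammar: it ignores extended language subtags, variants, extensions, private-use subtags, and grandfathered tags, and it does not check subtags against any registry. The names `SimpleTag` and `parse_simple_tag` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SimpleTag(NamedTuple):
    language: str           # ISO 639 code, e.g. "en" or "spa"
    script: Optional[str]   # ISO 15924 code, e.g. "Latn"
    region: Optional[str]   # ISO 3166-1 code ("US") or UN M.49 code ("419")


# Simplified pattern (an assumption, not the full BCP 47 grammar):
# language[-script][-region]
_TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_simple_tag(tag: str) -> SimpleTag:
    """Split a basic language tag into its subtags, normalising case."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return SimpleTag(
        language=match.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )


for example in ("en", "en-US", "es", "es-419"):
    print(example, parse_simple_tag(example))
```

With this sketch, `en-US` yields language `en` and region `US`, while `es-419` yields language `es` and the UN M.49 region `419`. A Glottolog code such as `stan1293` is rejected with a `ValueError`, since it does not follow this tag pattern.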


References

  1. "Information on BCP 47". RFC Editor.
  2. Best Current Practice 47 – Tags for Identifying Languages. IETF.
  3. "The Linguasphere Register in PDF". l’Observatoire linguistique (Linguasphere Observatory). Archived from the original on 27 April 2015. Retrieved 20 April 2015.
  4. "Linguasphere Register hierarchy". Retrieved 8 June 2016.
  5. Verbix language codes. Verbix. Archived 2009-04-01 at the Wayback Machine.