Language code

Last updated December 29, 2023

A language code is a code that assigns letters or numbers as identifiers or classifiers for languages. These codes may be used to organize library collections or presentations of data, to choose the correct localizations and translations in computing, and as a shorthand designation for longer forms of language names.

Difficulties of classification

Language code schemes attempt to classify the complex world of human languages, dialects, and variants. Most schemes compromise between being general and being detailed enough to distinguish specific dialects.

For example, Spanish is spoken in over 20 countries in North America, Central America, South America, the Caribbean, and Europe. The Spanish spoken in Mexico differs slightly from the Spanish spoken in Peru, and different regions of Mexico have their own dialects and accents. A language code scheme might group all of these as "Spanish" for choosing a keyboard layout, group most of them as "Spanish" for general usage, or separate each dialect to allow region-specific variation.
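One common way software handles this trade-off is to fall back from a specific tag to a more general one. The following is a minimal sketch, assuming resources are keyed by IETF language tags such as `es-MX`: it drops subtags from the right until it finds a match. This is similar in spirit to the "lookup" matching scheme of RFC 4647, but omits some of its rules (for example, the handling of single-character extension subtags). The tables `KEYBOARD_LAYOUTS` and `TRANSLATIONS` and the function `lookup` are hypothetical illustrations.

```python
from __future__ import annotations

# Hypothetical resource tables keyed by IETF language tag.
KEYBOARD_LAYOUTS = {"es": "Spanish keyboard layout"}
TRANSLATIONS = {
    "es": "Spanish (general)",
    "es-MX": "Spanish (Mexico)",
    "es-PE": "Spanish (Peru)",
}


def lookup(tag: str, available: dict[str, str]) -> str | None:
    """Return the most specific resource available for `tag`.

    Subtags are dropped from the right ("es-MX" -> "es") until a key
    matches. Comparison is case-insensitive because language tags are.
    """
    by_lower = {key.lower(): value for key, value in available.items()}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()  # fall back to a more general tag
    return None  # no match even for the bare language subtag


if __name__ == "__main__":
    print(lookup("es-MX", KEYBOARD_LAYOUTS))  # one layout shared by all Spanish
    print(lookup("es-MX", TRANSLATIONS))      # region-specific translation
    print(lookup("es-AR", TRANSLATIONS))      # no Argentine entry: general Spanish
```

With these tables, a user tagged `es-MX` gets the single shared Spanish keyboard layout but a Mexico-specific translation, while `es-AR`, which has no regional entry, falls back to general Spanish. One resource is grouped across all varieties while another is split by region, which is the kind of choice described above.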

Common schemes

List of some common language code schemes
Scheme	Notes	Examples for English	Examples for Spanish
Glottolog codes	Created for minority languages as a scientific alternative to the industrial ISO 639‑3 standard. The codes intentionally do not resemble abbreviations.	stan1293 – standard English; macr1271 – macro-English (Modern English, incl. creoles); midd1317 – Middle English; merc1242 – Mercian (Middle to Modern English); olde1238 – Old English; angl1265 – Anglian (Old to Modern English, incl. Scots)	stan1288 – standard Spanish; olds1249 – Old Spanish; cast1243 – Castilic (Old to Modern Spanish, incl. Extremaduran and creoles)
IETF language tag	An IETF best practice, specified by BCP 47,^[1] for language tags that are easy for computers to parse. The tag system is extensible to region, dialect, and private designations. It references ISO 639, ISO 3166 and ISO 15924.	en – English, using the shortest ISO 639 code; en-US – English as used in the United States (US is the ISO 3166‑1 country code for the United States). Source: IETF memo^[2]	es – Spanish, using the shortest ISO 639 code; es-419 – Spanish appropriate for the Latin America and Caribbean region, using the UN M.49 region code
ISO 639‑1	Two-letter code system made official in 2002, containing 136 codes at the time. Many systems use two-letter ISO 639‑1 codes, supplemented by three-letter ISO 639‑2 codes when no two-letter code is applicable. There are 183 two-letter codes registered as of June 2021. See: List of ISO 639 language codes	en – English	es – Spanish
ISO 639‑2	Three-letter system of 464 codes. See: List of ISO 639-2 codes	eng – English; enm – Middle English, c. 1100–1500; ang – Old English, c. 450–1100; cpe – other English-based creoles and pidgins	spa – Spanish
ISO 639‑3	An extension of ISO 639‑2 intended to cover all known languages, living or dead, spoken or written, in 7,589 entries. See: List of ISO 639-3 codes	eng – English; enm – Middle English, c. 1100–1500; aig – Antigua and Barbuda Creole English; ang – Old English, c. 450–1100; svc – Vincentian Creole English	spa – Spanish; spq – Spanish, Loreto-Ucayali; ssp – Spanish Sign Language
Linguasphere Register code-system	A code system of two digits plus one to six letters, published in 2000^[3] and containing over 32,000 codes within 10 sectors of reference, covering the world's languages and speech communities. The hierarchy of the code system can also be browsed online at hortensj-garden.org.^[4]	Within the hierarchy of the Linguasphere Register code-system: 5= Indo-European phylosector; 52= Germanic phylozone; 52-A Germanic set; 52-AB English + Anglo-Creole chain; 52-ABA English net; 52-ABA-c Global English (outer unit); 52-ABA-ca to 52-ABA-cwe (186 varieties). Compare: 52-ABA-a Scots + Northumbrian outer unit and 52-ABA-b "Anglo-English" outer unit (= South Great Britain traditional varieties + Old Anglo-Irish)	Within the hierarchy of the Linguasphere Register code-system: 5= Indo-European phylosector; 51= Romanic phylozone; 51-A Romance set; 51-AA Romance chain; 51-AAA West Romance net; 51-AAA-b Español/Castellano (outer unit); 51-AAA-ba to 51-AAA-bkk (58 varieties). Compare: 51-AAA-a Português + Galego outer unit and 51-AAA-c Astur + Leonés outer unit, etc.
SIL codes (10th–14th editions)	Codes created for use in the Ethnologue, a publication of SIL International that lists language statistics. The publication now uses ISO 639‑3 codes.	ENG	SPN
Verbix language codes	Constructed codes that start from the old SIL codes and add more information.^[5]	ENG	SPN
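An IETF language tag is built from subtags taken from the standards it references: a language subtag from ISO 639, an optional script subtag from ISO 15924, an optional region subtag from ISO 3166‑1 or UN M.49, and optional variant subtags. The following is a minimal sketch of splitting a tag into those parts. It is not a full BCP 47 validator: it assumes a simplified grammar (no extended language subtags, extensions, private-use subtags or grandfathered tags) and does not check subtags against the IANA Language Subtag Registry. The names `LanguageTag` and `parse_tag` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Simplified subtag shapes (input is lowercased before matching).
LANGUAGE = re.compile(r"[a-z]{2,3}")                     # ISO 639, e.g. "en", "spa"
SCRIPT = re.compile(r"[a-z]{4}")                         # ISO 15924, e.g. "latn"
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")                # ISO 3166-1 or UN M.49
VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")  # e.g. "1901"


@dataclass
class LanguageTag:
    language: str
    script: str | None = None
    region: str | None = None
    variants: list[str] = field(default_factory=list)


def parse_tag(tag: str) -> LanguageTag:
    """Split a simplified IETF language tag into its main subtags."""
    subtags = tag.lower().split("-")  # tags are case-insensitive
    if not LANGUAGE.fullmatch(subtags[0]):
        raise ValueError(f"unsupported language subtag in {tag!r}")
    result = LanguageTag(language=subtags[0])
    rest = subtags[1:]
    if rest and SCRIPT.fullmatch(rest[0]):
        result.script = rest.pop(0).title()  # conventional case, e.g. "Latn"
    if rest and REGION.fullmatch(rest[0]):
        result.region = rest.pop(0).upper()  # conventional case, e.g. "US"
    while rest and VARIANT.fullmatch(rest[0]):
        result.variants.append(rest.pop(0))
    if rest:
        # Extensions, private use (e.g. "x-..."), etc. are out of scope here.
        raise ValueError(f"unsupported subtags {rest!r} in {tag!r}")
    return result


if __name__ == "__main__":
    for example in ["en", "en-US", "es-419", "sr-Latn-RS", "de-CH-1901"]:
        print(example, parse_tag(example))
```

For example, `parse_tag("sr-Latn-RS")` yields language `sr`, script `Latn` and region `RS`, and `parse_tag("es-419")` yields the UN M.49 region `419`. Normalizing case to the conventional forms (lowercase language, title-case script, uppercase region) is a presentation choice; matching itself is case-insensitive.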

References

[1] "Information on BCP 47 » RFC Editor".
[2] Best Current Practice 47 – Tags for Identifying Languages, IETF.
[3] "The Linguasphere Register in PDF". l’Observatoire linguistique (Linguasphere Observatory). Archived from the original on 27 April 2015. Retrieved 20 April 2015.
[4] "Linguasphere Register hierarchy". Retrieved 8 June 2016.
[5] Verbix language codes, Verbix. Archived 2009-04-01 at the Wayback Machine.

External links

List of usual language codes and its variants
Language Tags in HTML and XML
Language Identifiers in the Markup Context

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
