Linguistic distance


Linguistic distance is the measure of how different one language (or dialect) is from another. [1] [2] Although they lack a uniform approach to quantifying linguistic distance between languages, linguists apply the concept to a variety of linguistic contexts, such as second-language acquisition, historical linguistics, language-based conflicts, and the effects of language differences on trade. [3] [4] [5] [6] [7] [8] [9]


Measures

Lexicostatistics

The measures proposed for linguistic distance reflect varying understandings of the term itself. One approach is based on mutual intelligibility, i.e. the ability of speakers of one language to understand the other language. On this view, the greater the linguistic distance, the lower the level of mutual intelligibility. [10]

Because cognate words play an important role in mutual intelligibility between languages, they figure prominently in such analyses. The higher the percentage of cognate (as opposed to non-cognate) words two languages share, the lower their linguistic distance. Likewise, the greater the degree of grammatical relatedness (i.e. the cognates mean roughly similar things) and lexical relatedness (i.e. the cognates are easily discernible as related words), the lower the linguistic distance. [10]

For example, the Hindustani word pānch is grammatically identical and lexically similar (but not identical) to its Punjabi and Persian cognate panj, and grammatically identical but lexically dissimilar to Greek pent- [11] and English five. By contrast, English dish and German Tisch 'table' are lexically (phonologically) similar but grammatically (semantically) dissimilar. Cognates in related languages can even be identical in form yet semantically distinct, such as caldo and largo, which mean 'hot' and 'wide' respectively in Italian but 'broth, soup' and 'long' in Spanish.

Distances between languages can be calculated statistically by comparing each language's body of words, an approach called lexicostatistics. In technical terms, what is calculated is the Levenshtein distance. [10] On this basis, one study compared Afrikaans and West Frisian with Dutch to see which was closer to Dutch. It determined that Dutch and Afrikaans (mutual distance of 20.9%) were considerably closer than Dutch and West Frisian (mutual distance of 34.2%). [10]
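As a rough illustration of the Levenshtein-based calculation mentioned above, the following Python sketch computes the edit distance between two word forms and averages a length-normalized version over a list of word pairs. It is a minimal sketch under stated assumptions, not the procedure of the cited study: the function names are hypothetical, the words are compared as plain spelled strings rather than phonetic transcriptions, and dividing by the length of the longer word is just one common way to normalize.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_distance(pairs) -> float:
    """Average normalized distance over (word_in_language_1, word_in_language_2) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one word pair")
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)


# Word pairs taken from the examples in the text, in simplified spelling.
print(levenshtein("panch", "panj"))   # 2: substitute c -> j, delete h
print(levenshtein("dish", "tisch"))   # 2: substitute d -> t, insert c
print(mean_distance([("panch", "panj"), ("caldo", "caldo")]))
```

A measure of this kind captures only similarity of form: the identical pair caldo/caldo scores a distance of zero even though the Italian and Spanish words differ in meaning, which is why the text treats lexical and grammatical relatedness separately.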

However, lexicostatistical methods, which are based on retentions from a common proto-language rather than on innovations, are problematic for a number of reasons, and some linguists argue that they cannot be relied upon for tracing a phylogenetic tree (for example, the highest retention rates can sometimes be found at the opposite, peripheral ends of a language family). [12] Unusual innovativeness or conservativeness of a language can distort linguistic distance and the assumed separation date, as with the Romani language and the East Baltic languages respectively. [12] On the one hand, continued adjacency of closely related languages after their separation can make some loanwords 'invisible' (indistinguishable from cognates; see etymological nativization), so that from a lexicostatistical point of view these languages appear less distant than they actually are (examples being the Finnic and Saami languages). [12] On the other hand, strong foreign influence on languages spreading far from their homeland can make them share fewer inherited words than they ought to (examples being Hungarian and the Samoyedic languages in the East Uralic branch). [12]

Other internal aspects

Besides cognates, other aspects that are often measured are similarities of syntax and written forms. [13]

To overcome the aforementioned problems of lexicostatistical methods, Donald Ringe, Tandy Warnow and Luay Nakhleh developed a complex phylogenetic method relying on phonological and morphological innovations in the 2000s. [12]

Language learning

A 2005 paper by economists Barry Chiswick and Paul Miller proposed a metric for linguistic distance based on empirical observations of how rapidly speakers of a given language gained proficiency in another when immersed in a society that overwhelmingly communicated in the latter language. The study examined the speed of English language acquisition among immigrants of various linguistic backgrounds in the United States and Canada. [13]

References

  1. Colin Renfrew; April M. S. McMahon; Robert Lawrence Trask (2000), Time depth in historical linguistics, McDonald Institute for Archaeological Research, ISBN 978-1-902937-06-9, ... The term 'linguistic distance' is often used to refer to the degree of similarity/difference between any two language varieties ...
  2. Li Wei (2000), The bilingualism reader, Psychology Press, ISBN 978-0-415-21336-3, ... linguistic distance is a notion which still remains problematic (for a discussion, see Hinskens, 1988), it does seem possible to place languages along a continuum based on formal characteristics such as the number of cognates in languages or sets of shared syntactic characteristics ...
  3. Michael H. Long (15 July 2009), The Handbook of Language Teaching, John Wiley and Sons, ISBN 978-1-4051-5489-5, ... findings from work on linguistic transfer, typology and 'linguistic distance' ... two related issues arise in these studies: typological distance/phylogenetic relatedness and transfer ... Spanish-Basque bilinguals learning English demonstrated a stronger influence from Spanish, typologically a closer language ...
  4. Terry Crowley; Claire Bowern (4 March 2010), An Introduction to Historical Linguistics, Oxford University Press US, 2009, ISBN 978-0-19-536554-2, ... Methods that hypothesize relationships in this way are called distance-based methods because they infer the historical relationships from the linguistic distance between languages. Lexicostatistics is a commonly used distance-based ...
  5. North-western European language evolution: NOWELE, Issues 27-29, Odense University Press, 1996, ISBN 9788778381842, ... The main reason for the rapid language shift is said to be the lack of linguistic 'distance' between the two codes (both of them being Germanic and therefore genetically closely related) ...
  6. Marshall B. Reinsdorf; Matthew Jon Slaughter (1 August 2009), International trade in services and intangibles in the era of globalization, University of Chicago Press, ISBN 978-0-226-70959-8, ... We measure cultural trade costs between the United States and its trading partners using indicators of the linguistic distance between English and other countries' primary languages ...
  7. Jeffrey A. Frankel; Ernesto Stein; Shang-Jin Wei (1997), Regional trading blocs in the world economic system, Peterson Institute, ISBN 978-0-88132-202-6, ... The implication is that two countries sharing linguistic/colonial links tend to trade roughly 55 percent more than they would ... a new measure of linguistic distance that is a continuous scalar rather than a discrete dummy variable ...
  8. William Hernandez Requejo; John L. Graham (4 March 2008), Global negotiation: the new rules, Macmillan, ISBN 978-1-4039-8493-7, ... Linguistic distance has been shown to be an important factor in determining the amount of trade between countries ... 'wider' language differences increases transaction costs and makes trade and negotiations less efficient ...
  9. Jyotirindra Dasgupta, University of California, Berkeley. Center for South and Southeast Asia Studies (1 January 1970), Language conflict and national development: group politics and national language policy in India, University of California Press, ISBN 978-0-520-01590-6, ... The linguistic distance between East and West Pakistan has therefore tended to increase ...
  10. Jan D. ten Thije; Ludger Zeevaert (1 January 2007), Receptive multilingualism: linguistic analyses, language policies, and didactic concepts, John Benjamins Publishing Company, ISBN 978-90-272-1926-8, ... Assuming that intelligibility is inversely related to linguistic distance ... the content words the percentage of cognates (related directly or via a synonym) ... lexical relatedness ... grammatical relatedness ...
  11. List of Greek and Latin roots in English#P
  12. Häkkinen, Jaakko (23 September 2012). "Problems in the method and interpretations of the computational phylogenetics based on linguistic data An example of wishful thinking" (PDF). elisanet.fi. Archived (PDF) from the original on 1 September 2013. Retrieved 18 November 2022.
  13. Chiswick, B. R.; Miller, P. W. (2005). "Linguistic Distance: A Quantitative Measure of the Distance Between English and Other Languages". Journal of Multilingual and Multicultural Development. 26: 1–11. doi:10.1080/14790710508668395. hdl:10419/20510. S2CID 145544574. ... vocabulary, grammar, written form, syntax and myriad other statistics ... this scalar measure of "linguistic distance" is demonstrated through an analysis of the determinants of English language proficiency among immigrants ...