Lexical similarity

In linguistics, lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) would mean a total overlap between vocabularies, whereas 0 means there are no common words.

There are different ways to define lexical similarity, and the results vary accordingly. For example, Ethnologue's method of calculation consists of comparing a regionally standardized wordlist (comparable to the Swadesh list) and counting those forms that show similarity in both form and meaning. Using such a method, English was evaluated to have a lexical similarity of 60% with German and 27% with French.
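
The counting procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ethnologue's actual procedure: the function names, the five-item toy wordlist, and the use of `difflib.SequenceMatcher` with a 0.5 threshold as a stand-in for a linguist's judgment of "similar in form" are all invented for this example.

```python
from difflib import SequenceMatcher


def forms_similar(a: str, b: str, threshold: float = 0.5) -> bool:
    """Crude string-based stand-in for a judgment of similarity in form.

    The threshold is an arbitrary assumption, not a published value.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def lexical_similarity(wordlist_a: dict[str, str], wordlist_b: dict[str, str]) -> float:
    """Fraction of shared meanings whose forms are judged similar."""
    shared = wordlist_a.keys() & wordlist_b.keys()
    if not shared:
        raise ValueError("the wordlists share no meanings")
    matches = sum(forms_similar(wordlist_a[m], wordlist_b[m]) for m in shared)
    return matches / len(shared)


# Toy wordlists keyed by meaning (hypothetical, far shorter than a real list).
english = {"water": "water", "hand": "hand", "night": "night", "dog": "dog", "tree": "tree"}
german = {"water": "Wasser", "hand": "Hand", "night": "Nacht", "dog": "Hund", "tree": "Baum"}

score = lexical_similarity(english, german)
```

The score obtained this way depends entirely on the wordlist and on how "similar in form" is decided, which is one reason different methods yield different figures.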

Lexical similarity can be used to evaluate the degree of genetic relationship between two languages. Percentages higher than 85% usually indicate that the two languages being compared are likely to be related dialects. [1]

Lexical similarity is only one indication of the mutual intelligibility of two languages, since the latter also depends on the degree of phonetic, morphological, and syntactic similarity. Variations due to differing wordlists also affect the result. For example, lexical similarity between French and English is considerable in lexical fields relating to culture, whereas their similarity is smaller as far as basic (function) words are concerned. Unlike mutual intelligibility, lexical similarity can only be symmetrical.


East Asian languages

Korean contains words borrowed from Chinese, known as Sino-Korean vocabulary, as well as new Korean words created from Chinese characters and words borrowed from Sino-Japanese vocabulary. According to the Standard Korean Language Dictionary published by the National Institute of Korean Language (NIKL), Sino-Korean represents approximately 57% of the Korean vocabulary. [2]

As for Japanese, it has been estimated that about 60% of the words contained in modern Japanese dictionaries are Sino-Japanese, [3] and that about 18–20% of words used in common speech are Sino-Japanese, as measured by the National Institute for Japanese Language in its study of language use in NHK broadcasts from April to June 1989. [4] The usage of such Sino-Japanese words also increases in formal or literary contexts, and in expressions of abstract or complex ideas. [4]


Despite the borrowing of many Chinese words into Japanese and Korean, speakers of the three languages do not have enough mutual intelligibility to communicate with each other. Japanese and Korean are not tonal languages, whereas Chinese languages are, which means that in Chinese the proper tone of a word matters for communication as well as the proper pronunciation of its syllables. When Chinese characters (Hanzi) are used for writing in Korean (where they are called "Hanja") and in Japanese (where they are called "Kanji"), a few words in a sentence can sometimes be understood, but an entire sentence is highly unlikely to be understood even in writing. Japanese and Korean also have their own writing systems, which differ from Hanzi, so entire sentences are unlikely to be written fully in borrowed Chinese characters.

Indo-European languages


A 1949 study by Mario Pei analyzed the degree to which languages have diverged from their parent language (for the Romance languages, from Latin) by comparing phonology, inflection, discourse, syntax, vocabulary, and intonation. It expressed the result as a percentage for each language, where a higher percentage indicates a greater distance from Latin. [5]



The table below shows some lexical similarity values for pairs of selected Romance, Germanic, and Slavic languages, as collected and published by Ethnologue. [6]

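Lexical similarity coefficients (rows: Language 1; columns: Language 2)

| Lang. code | Language 1 | Italian (ita) | Spanish (spa) | Portuguese (por) | French (fra) | Romanian (ron) | Catalan (cat) | Romansh (roh) | Sardinian (srd) | English (eng) | German (deu) | Russian (rus) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ita | Italian | 1 | 0.82 | 0.80 | 0.89 | 0.77 | 0.87 | 0.78 | 0.85 | - | - | - |
| spa | Spanish | 0.82 | 1 | 0.89 | 0.75 | 0.71 | 0.85 | 0.74 | 0.76 | - | - | - |
| por | Portuguese | 0.80 | 0.89 | 1 | 0.75 | 0.72 | 0.85 | 0.74 | 0.76 | 0.20 | - | - |
| fra | French | 0.89 | 0.75 | 0.75 | 1 | 0.75 | - | 0.78 | 0.80 | 0.27 | 0.29 | - |
| ron | Romanian | 0.77 | 0.71 | 0.72 | 0.75 | 1 | 0.73 | 0.72 | 0.74 | - | - | - |
| cat | Catalan | 0.87 | 0.85 | 0.85 | - | 0.73 | 1 | 0.76 | 0.75 | - | - | - |
| roh | Romansh | 0.78 | 0.74 | 0.74 | 0.78 | 0.72 | 0.76 | 1 | 0.74 | - | - | - |
| srd | Sardinian | 0.85 | 0.76 | 0.76 | 0.80 | 0.74 | 0.75 | 0.74 | 1 | - | - | - |
| eng | English | - | - | 0.20 | 0.27 | - | - | - | - | 1 | 0.60 | 0.24 |
| deu | German | - | - | - | 0.29 | - | - | - | - | 0.60 | 1 | - |
| rus | Russian | - | - | - | - | - | - | - | - | 0.24 | - | 1 |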
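
Because lexical similarity is symmetrical, a table like the one above can be stored as unordered language pairs rather than as a full square matrix. The following is a minimal sketch of that idea: the names `SIMILARITY` and `lookup` are hypothetical, and only a handful of coefficients are copied from the table, keyed by the three-letter language codes it uses.

```python
from typing import Optional

# A few coefficients copied from the table, keyed by unordered code pairs.
# frozenset makes ("ita", "fra") and ("fra", "ita") the same key.
SIMILARITY = {
    frozenset({"ita", "fra"}): 0.89,
    frozenset({"spa", "por"}): 0.89,
    frozenset({"eng", "deu"}): 0.60,
    frozenset({"eng", "fra"}): 0.27,
    frozenset({"eng", "rus"}): 0.24,
}


def lookup(lang1: str, lang2: str) -> Optional[float]:
    """Return the stored coefficient for a pair, or None if none is stored.

    A language compared with itself is 1.0 by definition; a pair marked
    "-" in the table (no published value) is simply absent here.
    """
    if lang1 == lang2:
        return 1.0
    return SIMILARITY.get(frozenset({lang1, lang2}))


print(lookup("fra", "ita"))  # same value as lookup("ita", "fra")
print(lookup("deu", "rus"))  # None: the table gives no value for this pair
```

Storing each pair once guarantees that the two directions can never disagree, which mirrors the symmetry of the measure itself.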

References

Notes

  1. "Methodology". Ethnologue. 2024-02-21. Retrieved 2024-05-31.
  2. Choo, Miho; O'Grady, William (1996). Handbook of Korean Vocabulary: An Approach to Word Recognition and Comprehension. University of Hawaii Press. p. ix. ISBN 0824818156.
  3. Shibatani, Masayoshi (1990). The Languages of Japan (Section 7.2 "Loan words", p. 142). Cambridge University Press. ISBN 0-521-36918-5.
  4. 国立国語研究所『テレビ放送の語彙調査I』(平成7年,秀英出版) Kokuritsu Kokugo Kenkyuujo, "Terebi Hoosoo no Goi Choosa 1" (1995, Shuuei Publishing).
  5. Pei, Mario (1949). Story of Language. Lippincott. ISBN 03-9700-400-1.
  6. See, for instance, lexical similarity data for French, German, English.
  7. Bolognesi, Roberto; Heeringa, Wilbert (2005). Sardegna fra tante lingue, p. 123. Condaghes (PDF). Archived from the original (PDF) on 2014-02-11. Retrieved 2017-04-14.
  8. Finkenstaedt, Thomas; Wolff, Dieter (1973). Ordered profusion; studies in dictionaries and the English lexicon. C. Winter. ISBN 3-533-02253-6.
  9. Williams, Joseph M. Origins of the English Language. Amazon.com. Retrieved 2010-04-21.
  10. Nation, I.S.P. (2001). Learning Vocabulary in Another Language. Cambridge University Press. p. 477. ISBN 0-521-80498-1.