Leipzig–Jakarta list

Last updated May 31, 2024

The Leipzig–Jakarta list of 100 words is used by linguists to test the degree of chronological separation of languages by comparing words that are resistant to borrowing. The Leipzig–Jakarta list became available in 2009.^[1] The word list is named after the cities of Leipzig, Germany, and Jakarta, Indonesia, the places where the list was conceived and created.

In the 1950s, the linguist Morris Swadesh published a list of 200 words, now called the Swadesh list, said to comprise the 200 lexical concepts found in all languages that were least likely to be borrowed from other languages. Swadesh later whittled his list down to 100 items. According to Martin Haspelmath and Uri Tadmor, however, the Swadesh list was based mainly on intuition.^[2] Originally, the words in the Swadesh lists were chosen for their universal, culturally independent availability in as many languages as possible, regardless of their "stability". Nevertheless, the stability of the resulting list of "universal" vocabulary under language change, and its potential use in glottochronology, have been analyzed by numerous authors, including Marisa Lohr (1999, 2000).^[3]

More recent lists of a similar kind, such as the Dolgopolsky list (1964) and the Leipzig–Jakarta list, are based on systematic data from many different languages, but they are not yet as widely known or as widely used as the Swadesh list. Although Swadesh was one of the pioneers of glottochronology and lexicostatistics, his theories were often controversial, and some have been deprecated by later linguists.^[4]

The Loanword Typology Project, with its World Loanword Database (WOLD), published by the Max Planck Digital Library, was established to address the Swadesh list's reliance on intuition. Experts on 41 languages from across the world were given a uniform vocabulary list and asked to provide the words for each item in the language on which they were an expert, as well as information on how strong the evidence was that each word was borrowed.^[5] The 100 concepts that were found in most languages and were most resistant to borrowing formed the Leipzig–Jakarta list. Only 62 items appear on both the Leipzig–Jakarta list and the 100-word Swadesh list, so the two lists differ by 38%.
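The selection idea described above (keep concepts attested in most languages, then prefer those with the least evidence of borrowing) can be illustrated with a minimal sketch. Everything in it is hypothetical: the toy data, the language labels, the numeric evidence weights, the `min_coverage` threshold and the function name `rank_concepts` are assumptions made for illustration, and the published procedure is more elaborate than this.

```python
from statistics import mean

# Hypothetical toy data: concept -> {language: borrowing evidence in [0, 1]}.
# 0.0 means no evidence of borrowing, 1.0 means clearly borrowed. A language
# missing from a concept's dict has no word recorded for that concept.
TOY_DATA = {
    "fire": {"lang_a": 0.0, "lang_b": 0.0, "lang_c": 0.0},
    "house": {"lang_a": 0.0, "lang_b": 0.5, "lang_c": 0.0},
    "salt": {"lang_a": 0.5, "lang_b": 0.0},
    "tea": {"lang_a": 1.0, "lang_b": 1.0, "lang_c": 1.0},
}
N_LANGUAGES = 3


def rank_concepts(data, n_languages, min_coverage=0.6):
    """Return concepts sorted from most to least borrowing-resistant.

    Concepts attested in fewer than `min_coverage` of the languages are
    dropped; ties are broken by coverage, then alphabetically.
    """
    scored = []
    for concept, evidence in data.items():
        coverage = len(evidence) / n_languages
        if coverage < min_coverage:
            continue  # attested in too few languages
        resistance = 1.0 - mean(evidence.values())
        scored.append((resistance, coverage, concept))
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [concept for _, _, concept in scored]


ranking = rank_concepts(TOY_DATA, N_LANGUAGES)
print(ranking)
```

Taking the first 100 entries of such a ranking over a full concept inventory would give a list in the spirit of the one described here, though not necessarily the same items.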

A quarter of the words in the Leipzig–Jakarta list are human body parts: mouth, eye, leg/foot, navel, liver, knee, etc.^[6] Six animal words appear on the list: fish, bird, dog, louse, ant and fly – animal species that are present wherever humans are as well.^[2]

The items house, name, rope and to tie are products of human culture, but are probably found in all present-day human societies. Haspelmath and Tadmor drew the conclusion that "rope is the most basic of human tools and tying is the most basic technology".^[2]

List

Lexical items in the Leipzig–Jakarta list are ranked by semantic stability, i.e. by how unlikely a word is to be replaced by another word as a language evolves.^[7]^[8] The two rightmost columns indicate inclusion in the 100-word and 207-word Swadesh lists.^[9]

Rank	Word meaning	100-word Swadesh list	207-word Swadesh list
1	fire	✔	✔
2	nose	✔	✔
3	to go
4	water	✔	✔
5	mouth	✔	✔
6	tongue	✔	✔
7	blood	✔	✔
8	bone	✔	✔
9	2nd-person singular pronoun (you)	✔	✔
10	root	✔	✔
11	to come	✔	✔
12	breast	✔	✔
13	rain	✔	✔
14	1st-person singular pronoun (I/me)	✔	✔
15	name	✔	✔
16	louse	✔	✔
17	wing		✔
18	flesh/meat	✔	✔
19	arm/hand	✔	✔
20	fly		✔
21	night	✔	✔
22	ear	✔	✔
23	neck	✔	✔
24	far		✔
25	to do/make
26	house
27	stone/rock	✔	✔
28	bitter
29	to say	✔	✔
30	tooth	✔	✔
31	hair	✔	✔
32	big	✔	✔
33	one	✔	✔
34	who?	✔	✔
35	3rd-person singular pronoun (he/she/it/him/her)
36	to hit/beat		✔
37	leg/foot	✔	✔
38	horn	✔	✔
39	this	✔	✔
40	fish	✔	✔
41	yesterday
42	to drink	✔	✔
43	black	✔	✔
44	navel
45	to stand	✔	✔
46	to bite	✔	✔
47	back		✔
48	wind		✔
49	smoke	✔	✔
50	what?	✔	✔
51	child (kin term)		✔
52	egg	✔	✔
53	to give	✔	✔
54	new	✔	✔
55	to burn (intr.)	✔	✔
56	not	✔	✔
57	good	✔	✔
58	to know	✔	✔
59	knee	✔	✔
60	sand	✔	✔
61	to laugh		✔
62	to hear	✔	✔
63	soil	✔	✔
64	leaf	✔	✔
65	red	✔	✔
66	liver	✔	✔
67	to hide
68	skin/hide	✔	✔
69	to suck		✔
70	to carry
71	ant
72	heavy		✔
73	to take
74	old		✔
75	to eat	✔	✔
76	thigh
77	thick		✔
78	long	✔	✔
79	to blow		✔
80	wood
81	to run
82	to fall		✔
83	eye	✔	✔
84	ash	✔	✔
85	tail	✔	✔
86	dog	✔	✔
87	to cry/weep
88	to tie		✔
89	to see	✔	✔
90	sweet
91	rope		✔
92	shade/shadow
93	bird	✔	✔
94	salt		✔
95	small	✔	✔
96	wide		✔
97	star	✔	✔
98	in		✔
99	hard
100	to crush/grind

Other differences with the Swadesh list

Items on the 100-word Swadesh list but not on the Leipzig–Jakarta list:^[9]

all
bark
belly
cloud
cold
die
dry
feather
fingernail
fly (verb)
full
grease
green
head
heart
hot
kill
lie
man
many
moon
mountain
path
person
round
seed
sit
sleep
sun
swim
that
tree
two
walk
we
white
woman
yellow
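
The figures quoted for the two lists can be cross-checked against the table above. The following is a minimal sketch, assuming the ranks were transcribed correctly by hand from the rows whose "100-word Swadesh list" column is empty; the variable names are illustrative only.

```python
# Ranks in the Leipzig–Jakarta table whose "100-word Swadesh list" column
# is empty (transcribed by hand from the table in this article).
LJ_RANKS_NOT_IN_SWADESH_100 = {
    3, 17, 20, 24, 25, 26, 28, 35, 36, 41, 44, 47, 48, 51, 61, 67, 69, 70,
    71, 72, 73, 74, 76, 77, 79, 80, 81, 82, 87, 88, 90, 91, 92, 94, 96, 98,
    99, 100,
}

# Items listed in this section: on the 100-word Swadesh list only.
SWADESH_100_ONLY = [
    "all", "bark", "belly", "cloud", "cold", "die", "dry", "feather",
    "fingernail", "fly (verb)", "full", "grease", "green", "head", "heart",
    "hot", "kill", "lie", "man", "many", "moon", "mountain", "path",
    "person", "round", "seed", "sit", "sleep", "sun", "swim", "that",
    "tree", "two", "walk", "we", "white", "woman", "yellow",
]

LIST_SIZE = 100  # both lists contain 100 items

# Items shared by the two lists, counted from the Leipzig–Jakarta side.
shared = LIST_SIZE - len(LJ_RANKS_NOT_IN_SWADESH_100)

# Share of each list that is absent from the other, as a percentage.
difference_pct = 100 * (LIST_SIZE - shared) / LIST_SIZE

# The Swadesh-only items must account for the rest of the Swadesh list.
consistent = shared + len(SWADESH_100_ONLY) == LIST_SIZE

print(shared, difference_pct, consistent)
```

Counting from either side gives the same result: the shared items plus the Swadesh-only items listed here account for all 100 entries of the Swadesh list.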

References

[1] Jeanette Sakel; Daniel L. Everett (2012). Linguistic Fieldwork: A Student Guide. Cambridge University Press. p. 116. ISBN 978-1-107-37702-8.
[2] Haspelmath & Tadmor, p. 72.
[3] Marisa Lohr (2000), "New Approaches to Lexicostatistics and Glottochronology", in C. Renfrew, A. McMahon and L. Trask, eds., Time Depth in Historical Linguistics, Vol. 1, pp. 209–223.
[4] Ruhlen, Merritt (1994). The Origin of Language: Tracing the Evolution of the Mother Tongue. Stanford: Stanford University Press.
[5] "The World Loanword Database (WOLD)". wold.clld.org. Retrieved February 24, 2019.
[6] Haspelmath & Tadmor, p. 71.
[7] The Leipzig-Jakarta List of Basic Vocabulary. Source: Haspelmath, Martin and Uri Tadmor (eds.), 2009. Loanwords in the World's Languages: A Comparative Handbook. Berlin and New York: Mouton de Gruyter.
[8] Tadmor, Uri, Martin Haspelmath, and Bradley Taylor (2010). "Borrowability and the notion of basic vocabulary". Diachronica 27 (2): 226–246. doi:10.1075/dia.27.2.04tad
[9] Haspelmath & Tadmor, p. 74.

Loanwords in the World's Languages: A Comparative Handbook, Martin Haspelmath and Uri Tadmor (editors), 2009, de Gruyter Publishing

External links

The Leipzig-Jakarta list on Concepticon

