Lexibank

Producer Max Planck Institute for Evolutionary Anthropology (Germany)
Languages English
Access
Cost Free
Coverage
Disciplines Linguistics, lexicography

Lexibank is a linguistics database managed by the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany. [1] The database consists of over 100 standardized wordlists (datasets) that are independently curated. [2] [3]

Description

Lexibank datasets are presented in the Cross-Linguistic Data Format (CLDF). [4]
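
As a rough illustration of what a CLDF wordlist looks like to a program, the following Python sketch parses a small form table and groups the forms by language. The column names (`ID`, `Language_ID`, `Parameter_ID`, `Form`, `Segments`) follow the CLDF wordlist conventions as understood here; the sample rows, the language identifiers, and the helper names `read_forms` and `forms_by_language` are invented for illustration and are not taken from any Lexibank dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample imitating a CLDF form table (forms.csv); values are invented.
SAMPLE_FORMS = """ID,Language_ID,Parameter_ID,Form,Segments
1,lang_a,hand,mano,m a n o
2,lang_a,water,agua,a g u a
3,lang_b,hand,main,m e n
"""

def read_forms(text):
    """Parse CLDF-style form rows, splitting the Segments column on whitespace."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Segments"] = row["Segments"].split()
        rows.append(row)
    return rows

def forms_by_language(rows):
    """Group forms as {Language_ID: {Parameter_ID: Form}}."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row["Language_ID"]][row["Parameter_ID"]] = row["Form"]
    return dict(grouped)

forms = read_forms(SAMPLE_FORMS)
wordlists = forms_by_language(forms)
print(wordlists["lang_a"]["hand"])
```

For a real dataset, the same functions could be pointed at the contents of the dataset's form table instead of the inline sample.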

Lexibank automatically computes phonological and lexical features from the wordlists. [2]
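
To give a sense of what such computed features might look like, the sketch below derives one phonological feature (the set of distinct segments per language) and one lexical feature (whether two concepts share the same form) from a toy wordlist. This is not Lexibank's actual pipeline: the data, the concept labels, and the function names `segment_inventory` and `colexifies` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy wordlist: (language, concept, space-separated segments). Invented data.
FORMS = [
    ("lang_a", "ARM", "r u k a"),
    ("lang_a", "HAND", "r u k a"),
    ("lang_a", "WATER", "v o d a"),
    ("lang_b", "ARM", "a r m"),
    ("lang_b", "HAND", "h a n d"),
]

def segment_inventory(forms, language):
    """Return the set of distinct sound segments attested for one language."""
    return {
        seg
        for lang, _, segs in forms
        if lang == language
        for seg in segs.split()
    }

def colexifies(forms, language, concept_a, concept_b):
    """Return True if the language has an identical form for both concepts."""
    by_concept = defaultdict(set)
    for lang, concept, segs in forms:
        if lang == language:
            by_concept[concept].add(segs)
    return bool(by_concept[concept_a] & by_concept[concept_b])

print(len(segment_inventory(FORMS, "lang_a")))
print(colexifies(FORMS, "lang_a", "ARM", "HAND"))
```

Both functions work only on segmented forms, which is one reason a standardized segmentation across datasets matters for automatic feature computation.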

The datasets are publicly accessible: they are archived at Zenodo [5] and hosted on GitHub. [6] Lexibank is also part of the Cross-Linguistic Linked Data project. All of the datasets are released under the CC BY 4.0 license.

Applications of the database include historical linguistics and comparative phonology.

List of datasets

The following is a list of Lexibank (version 0.2) datasets as of 17 June 2022. [7]

ID Languages Zenodo Citation
aaleykusunda Kusunda 5115947 Uday Raj Aaley and Timotheus A. Bodt (2020): New Kusunda data: A list of 250 concepts. Computer-Assisted Language Comparison in Practice 3.4 (08/04/2020), URL: https://calc.hypotheses.org/2414.
abrahammonpa Monpa 5115885 Abraham, Binny, Kara Sako, Elina Kinny, and Isapdaile Zeliang (2018): Sociolinguistic Research among Selected Groups in Western Arunachal Pradesh: Highlighting Monpa. Dallas: SIL International.
allenbai Bai 5115649 Allen, Bryan (2007): Bai Dialect Survey. Dallas: SIL International.
backstromnorthernpakistan Northern Pakistan 5116054 Backstrom, Peter C. and Radloff, Carla F. (1992): Sociolinguistic Survey of Northern Pakistan, Volume 2. Languages of Northern Areas. Islamabad: National Institute of Pakistan Studies.
bantubvd Bantu 5115982 Simon Greenhill and Russell Gray, 2015. Bantu Basic Vocabulary Database .
bdpa 5116087 List, Johann-Mattis and Jelena Prokić. (2014). A benchmark database of phonetic alignments in historical linguistics and dialectology. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC), 26-31 May 2014, Reykjavik. 288-294.
beidasinitic Sinitic 5119295 Běijīng Dàxué 北京大学 (1964): Hànyǔ fāngyán cíhuì 汉语方言词汇 [Chinese dialect vocabularies]. Beijing: Wenzi Gaige.
birchallchapacuran Chapacuran 5119306 Birchall J, Dunn M, & Greenhill SJ. 2016. A Combined Comparative and Phylogenetic Analysis of the Chapacuran Language Family. International Journal of American Linguistics 82(3). 255–284.
blustaustronesian Austronesian 5137392 Greenhill, SJ; Blust, R and Gray, RD (2008): The Austronesian Basic Vocabulary Database: From bioinformatics to lexomics. Evolutionary Bioinformatics. 4. 271-283.
bodtkhobwa Kho-Bwa 5119330 Bodt, Timotheus Adrianus and List, Johann-Mattis (2019): Testing the predictive strength of the comparative method: An ongoing experiment on unattested words in Western Kho-Bwa languages. Papers in Historical Phonology 4.1: 22-44.
bowernpny Pama-Nyungan 5119341 Bowern, Claire, & Atkinson, Quentin. (2012). Computational Phylogenetics and the Internal Structure of Pama-Nyungan: Dataset [Data set]. Language. doi : 10.1353/lan.2012.0081
cals Turkic and Indo-European 5121189 Mennecier, P., Nerbonne, J., Heyer, E., & Manni, F. (2016). A Central Asian Language Survey, Language Dynamics and Change, 6(1), 57-98. doi : 10.1163/22105832-00601015
carvalhopurus Purus 5121195 de Carvalho, F. O. (2021): A comparative reconstruction of Proto-Purus (Arawakan) segmental phonology. IJAL. 87.1. 49-108.
castrosui Sui 5121213 Castro, Andy and Pan, Xingwen (2015): Sui dialect research. SIL: Guiyang.
castroyi Yi 5121214 Castro, Andy; Crook, Brian; Flaming, Royce (2010): A sociolinguistic survey of Kua-nsi and related Yi varieties in Heqing county, Yunnan province, China. SIL Electronic Survey Reports 2010-001. Dallas: SIL International.
castrozhuang Zhuang 5121215 Castro, Andy; Hansen, Bruce (2010): Hongshui He Zhuang dialect intelligibility survey. Dallas: SIL International.
chaconarawakan Arawakan 5118556 Chacon, Thiago C. (2017): Arawakan and Tukanoan contacts in Northwest Amazonia prehistory. PAPIA 27(2). 237-265.
chaconbaniwa Baniwa 5118605 Chacon, T. C.; Gonçalves, A. G.; and da Silva, L. F (2019): A diversidade linguística Aruák no Alto Rio Negro em gravações da década de 1950 [The diversity of Arawakan languages from the upper Rio Negro in recordings from the 1950s]. Forma y Función, 32.2, 41-67. doi : 10.15446/fyf.v32n2.80814
chaconcolumbian Colombian 5118763 Chacon, Thiago C. (2017): Arawakan and Tukanoan contacts in Northwest Amazonia prehistory. PAPIA 27(2). 237-265.
chacontukanoan Tukanoan 5118723 T. Chacon. (2014). A revised proposal of Proto-Tukanoan consonants and Tukanoan family classification. International Journal of American Linguistics 80.3, pp. 275–322. doi: 10.1086/676393
chenhmongmien Hmong-Mien 5118744 Chén, Qíguāng 陳其光 (2012): Miáoyáo yǔwén 苗瑤语文 [Miao and Yao language]. Zhōngyāng Mínzú Dàxué 中央民族大学 [China Minzu University Press].
chindialectsurvey Chin 5121280 Language and Social Development Organization (2019): Chin dialect data collection. Yangon: LSDO.
chingelong Gelong 5121324 Chin, Andy C. (2015): The Gelong Language in the Multilingual Hub of Hainan. Bulletin of Chinese Linguistics. 8. 140-156.
clarkkimmun Kim Mun 5121482 Clark, E. R. (2008). A phonological analysis and comparison of two Kim Mun varieties in Laos and Vietnam. Payap University: Chiang Mai.
clics1 5121530 List, Johann-Mattis, Thomas Mayer, Anselm Terhalle, and Matthias Urban (2014). CLICS: Database of Cross-Linguistic Colexifications. Marburg: Forschungszentrum Deutscher Sprachatlas (Version 1.0).
constenlachibchan Chibchan 5121347 Umaña, Adolfo Constenla. 2005. ¿Existe relación genealógica entre las lenguas misumalpas y las chibchenses?. Estudios de Lingüística Chibcha.
davletshinaztecan Aztecan 5121382 Davletshin, Albert (2012): Proto-Uto-Aztecans on their way to the Proto-Aztecan homeland: linguistic evidence. Journal of Language Relationship. 8. 1. 75-92.
deepadungpalaung Palaung 5121402 Deepadung, Sujaritlak; Buakaw, Supakit; and Rattanapitak, Ampica (2015): A lexical comparison of the Palaung dialects spoken in China, Myanmar, and Thailand. Mon-Khmer Studies 44. 19-38.
diacl 5121561 Carling, Gerd (ed.) 2017. Diachronic Atlas of Comparative Linguistics Online. Lund: Lund University. (URL: https://diacl.ht.lu.se/). Accessed on: 2019-02-07.
dravlex Dravidian 5121580 Kolipakam, Vishnupriya, Michael Dunn, Fiona M. Jordan & Annemarie Verkerk. (2018). DravLex: A Dravidian lexical database. Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
dunnaslian Aslian 5121613 Dunn, Michael, Nicole Kruspe, and Niclas Burenhult. 2013. "Time and Place in the Prehistory of the Aslian Languages." Human Biology 85: 383–400.
dunnielex Indo-European 5121651 Dunn, Michael (2012): Indo-European Lexical Cognacy Database. Max Planck Institute for Psycholinguistics: Nijmegen.
duonglachi Lachi 5121663 Duong, Thu Hang and Nguyen, Thu Quynh and Nguyen, Van Loi (2021): The Language of the La Chí People in Bản Díu Commune, Xín Mần District, Hà Giang Province, Vietnam. In: Studies in the Anthropology of Language in Mainland Southeast Asia. Ed. by N. J. Enfield, Jack Sidnell, and Charles H. P. Zuckermann. University of Hawaii Press: Honolulu. 124-138
felekesemitic Semitic 5126691 Feleke, Tekabe Legesse (2021): Ethiosemitic languages: classifications and classification determinants. Ampersand. 2021. doi : 10.1016/j.amper.2021.100074
galuciotupi Tupian 5121724 Galucio, Ana Vilacy, Meira, Sérgio, Birchall, Joshua, Moore, Denny, Gabas Júnior, Nilson, Drude, Sebastian, Storto, Luciana, Picanço, Gessiane, & Rodrigues, Carmen Reis. (2015). Genealogical relations and lexical distances within the Tupian linguistic family. Boletim do Museu Paraense Emílio Goeldi. Ciências Humanas, 10(2), 229-274. doi : 10.1590/1981-81222015000200004
gaotb Tibeto-Burman 5121776 Gao, Tianjun (2020): Reconstruction and analysis of phylogenetic network on Tibeto-Burman languages in China. Journal of Chinese Linguistics, 48:1, 257-293.
gerarditupi Tupi–Guarani 5127906 Ferraz Gerardi, Fabrício and Reichert, Stanislav (2020) The Tupí-Guaraní Language Family: A Phylogenetic Classification. To appear in Diachronica.
halenepal Nepal 5121540 Hale, Austin (1973): Clause, sentences, and discourse patterns in selected languages of Nepal. Kathmandu: Institute of Nepal and Asiatic Studies.
hantganbangime Bangime 5126441 Hantgan, Abbie and List, Johann-Mattis (2018): Bangime. Secret language, language isolate, or language island? Journal of Language Contact.
hattorijaponic Japonic 5126845 Hattori, S. (1973): Japanese dialects. In: Diachronic, areal and typological linguistics. Edited by H. M. Hoenigswald and R. H. Langacre. 368-400.
houchinese Sinitic 5126858 Hóu, J. (2004): Xiàndài Hànyǔ fāngyán yīnkù 现代汉语方言音库 [Phonological database of Chinese dialects]. Shànghǎi: Shànghǎi Jiàoyù.
hsiuhmongmien Hmong-Mien 5126451 Hsiu, Andrew (2015): The classification of Na Meo, a Hmong-Mien language of Vietnam. Handout prepared for SEALS 25 (Chiang Mai, 2015/05/27-29).
hubercolumbian Colombian 5121219 Huber, R. Q. and Reed, R. B. 1992. Vocabulario comparativo: palabras selectas de lenguas indígenas de Colombia [Comparative vocabulary. Selected words from the indigenous languages of Colombia]. Santafé de Bogota: Asociación Instituto Lingüístico de Verano.
huntergatherer 5126741 Bowern, Claire, Patience Epps, Jane Hill, and Patrick McConvell. Hunter-Gatherer Language Database. https://huntergatherer.la.utexas.edu/ Accessed 2021-04-27.
ids 5126899 Key, Mary Ritchie & Comrie, Bernard (eds.) 2015. The Intercontinental Dictionary Series . Leipzig: Max Planck Institute for Evolutionary Anthropology.
ivanisuansu Suansu 5126966 Ivani, J. K. (2019): A first overview on Suansu, a Tibeto-Burman language from Northeastern India. Talk, held at the 29th conference of the Southeast Asian Linguistic Society (27-29 May, Tokyo). https://zenodo.org/record/3383006
johanssonsoundsymbolic 5127131 Erben Johansson, N., Anikin, A., Carling, G., & Holmer, A. (2020). The typology of sound symbolism: Defining macro-concepts via their semantic and phonetic features, Linguistic Typology, 24(2), 253-310. doi : 10.1515/lingty-2020-2034
joophonosemantic 5137230 Joo, I. (2020). Phonosemantic biases found in Leipzig-Jakarta lists of 66 languages. Linguistic Typology, 24(1), 1–12. doi : 10.1515/lingty-2019-0030
kesslersignificance 5127775 Kessler, B. (2001): The Significance of Wordlists. CSLI: Stanford.
kleinewillinghoeferbikwinjen Bikwin-Jen 5127404 Kleinewillinghöfer, Ulrich (2015). Bikwin-Jen Group. https://www.blogs.uni-mainz.de/fb07-adamawa/adamawa-languages/bikwin-jen-group/. Accessed on: 2020-04-15.
kraftchadic Chadic 5121222 Kraft, Charles H. 1981. Chadic wordlists. Berlin: Dietrich Reimer.
leejaponic Japonic 5126801 Lee, Sean, Hasegawa, Toshikazu (2011). Bayesian phylogenetic analysis supports an agricultural origin of Japonic languages. Proceedings of the Royal Society B: Biological Sciences, 278(1725), 3662–3669. doi : 10.1098/rspb.2011.0518
leeainu Ainu 5126890 Lee Sean, Hasegawa Toshikazu (2013). Evolution of the Ainu Language in Space and Time. PLoS ONE 8(4): e62243. doi : 10.1371/journal.pone.0062243
bremerberta Berta 5126757 Bremer, Nate D. (2016): A Sociolinguistic Survey of Six Berta Speech Varieties in Ethiopia. SIL Electronic Survey Reports 2016-007. Dallas: SIL International.
leekoreanic Koreanic 5126904 Lee, Sean (2015). A Sketch of Language History in the Korean Peninsula. PLoS ONE 10(5): e0128448. doi : 10.1371/journal.pone.0128448
lieberherrkhobwa Kho-Bwa 5127687 Lieberherr, Ismail and Bodt, Timotheus Adrianus (2017): Sub-grouping Kho-Bwa based on shared core vocabulary. Himalayan Linguistics 16(2). 26-63. URL: https://escholarship.org/uc/item/4t27h5fg
lindseyende Ende 5127829 Kate Lynn Lindsey and Bernard Comrie. 2020. Ende (Papua New Guinea) dictionary. In: Key, Mary Ritchie & Comrie, Bernard (eds.) The Intercontinental Dictionary Series . Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://ids.clld.org/)
listsamplesize 5128050 List, Johann-Mattis (2014): Investigating the impact of sample size on cognate detection. Journal of Language Relationship. 11. 91-102. doi : 10.31826/jlr-2014-110111
liusinitic Sinitic 5131413 Líu, L.; Wáng, H.; Bǎi, Y. (2007): Xiàndài Hànyǔ fāngyán héxīncí, tèzhēng cíjí 现代汉语方言核心词·特征词集 [Collection of basic vocabulary words and characteristic dialect words in modern Chinese dialects]. Nánjīng: Fènghuáng.
lundgrenomagoa Proto-Omagua-Kokama-Tupinambá 5128097 Lundgren, Olof (2020): A phonological reconstruction of Proto-Omagua-Kokama-Tupinambá. Master's thesis. Lund: Lund University.
mannburmish Burmish 5131419 Mann, Noel W. 1998. A phonological reconstruction of Proto Northern Burmic. (PhD Thesis).
marrisonnaga Naga 5121317 Marrison, Geoffrey Edward (1967): The classification of the Naga languages of North-East India. London: School of Oriental and African Studies.
mcelhanonhuon Huon 5127348 McElhanon, K.A. 1967. Preliminary Observations on Huon Peninsula Languages. Oceanic Linguistics. 6, 1-45.
mitterhoferbena Bena 5121327 Mitterhofer, Bernadette. 2013. Lessons from a dialect survey of Bena: Analyzing wordlists. SIL International.
naganorgyalrongic rGyalrongic 5126458 Nagano, Yasuhiko and Prins, Marielle (2013): rGyalrongic Languages Database. Osaka: National Museum of Ethnology.
nagarajakhasian Khasian 5131421 Nagaraja KS, Sidwell P & Greenhill SJ. 2013. A Lexicostatistical Study of the Khasian Languages: Khasi, Pnar, Lyngngam, and War. Mon-Khmer Studies Journal, 42, 1-11.
northeuralex 5121268 Dellert, J., Daneyko, T., Münch, A. et al (2020). NorthEuraLex (Version 0.9). Lang Resources and Evaluation. doi : 10.1007/s10579-019-09480-6
peirosaustroasiatic Austroasiatic 5127536 Peiros, I. I. (2004): Генетическая классификация австроазиатских языков / Genetičeskaja klassifikacija avstroaziatskix jazykov [Genetic classification of Austro-Asiatic languages]. Russian State University for the Humanities, Russian State University for the Humanities, Moscow.
pharaocoracholaztecan Proto-Corachol, Proto-Náhuatl 5136882 Pharao Hansen, Magnus (2020): ¿Familia o vecinos? Investigando la relación entre el proto-náhuatl y el proto-corachol [Family or neighbors? Investigating the relation between Proto-Náhuatl and Proto-Corachol]. In: Lenguas yutoaztecas: historia, estructuras y contacto lingüístico. Homenaje a Karen Dakin. Rosa Yañez (ed.) Guadalajara: Universidad de Guadalajara.
polyglottaafricana 5136890 Koelle, Sigismund W. (1854). Polyglotta Africana or Comparative Vocabulary of Nearly Three Hundred Words and Phrases in more than One Hundred Distinct African Languages. London: Church Missionary House.
ratcliffearabic Arabic 5136898 Ratcliffe, Robert R. (2021): The glottometrics of Arabic. Language Dynamics and Change. 2021. doi : 10.1163/22105832-01001100
robinsonap Alor-Pantar 5121340 Robinson, Laura C. and Holton, Gary (2012): Internal Classification of the Alor-Pantar Language Family Using Computational Methods Applied to the Lexicon. Language Dynamics and Change 2.2. 123-149.
saenkoromance Romance 5136900 Saenko, M. (2015): Annotated Swadesh wordlists for the Romance group (Indo-European family). In: Starostin GS, editor. The Global Lexicostatistical Database. RGU; 2015. http://starling.rinet.ru/new100/tuj.xls
sagartst Sino-Tibetan 5121409 Laurent Sagart, Jacques, Guillaume, Yunfan Lai, and Johann-Mattis List (2019): Sino-Tibetan Database of Lexical Cognates. Jena: Max Planck Institute for the Science of Human History.
satterthwaitetb Tibeto-Burman 5136997 Satterthwaite-Phillips, Damian (2011) Phylogenetic inference of the Tibeto-Burman languages or on the usefulness of lexicostatistics (and "megalo"-comparison) for the subgrouping of Tibeto-Burman. Stanford: Stanford University.
savelyevturkic Turkic 5137274 Savelyev, Alexander and Robbeets, Martine (2020): Bayesian phylolinguistics infers the internal structure and the time-depth of the Turkic language family. Journal of Language Evolution 5.1. 39-53.
servamalagasy Malagasy 5137040 Serva M., Pasquini M. (2020): Dialects of Madagascar, PLoS ONE 15(10).
sidwellbahnaric Bahnaric 5137055 Sidwell, Paul. 2015. Austroasiatic dataset for phylogenetic analysis: 2015 version. Mon-Khmer Studies (Notes, Reviews, Data-Papers) 44. lxviii-ccclvii.
simsrma Rma 5166593 Sims, Nathanial A. (2020): Reconsidering the diachrony of tone in Rma. Journal of the Southeast Asian Linguistics Society 13.1. 53-85.
sohartmannchin Chin 5121813 So-Hartmann, Helga (1988): Notes on the Southern Chin Languages. Linguistics of the Tibeto-Burman Area 11.2: 98-119.
starostinpie Proto-Indo-European 5137281 Starostin, S. A. (2005): Indo-European files in DBF/VAR. Moscow.
suntb Tibeto-Burman 5121515 Sūn, Hóngkāi 孙宏开 (1991): Zangmianyu yuyin he cihui 藏缅语音和词汇 [Tibeto-Burman phonology and lexicon]. Beijing: Chinese Social Sciences Press.
syrjaenenuralic Uralic 5137236 Syrjänen, K.; Honkola, T.; Korhonen, K.; Lehtinen, J.; Vesakoski, O. & Wahlberg, N. Shedding more light on language classification using basic vocabularies and phylogenetic methods. Diachronica, 2013, 30, 323-352
tls Bantu 5121819 Nurse, Derek and Gérard Philippson (1975). The Tanzanian Language Survey. Department of Foreign Languages and Linguistics of the University of Dar es Salaam: Dar es Salaam.
tppsr
transnewguineaorg Trans-New Guinea 5141620 Greenhill, Simon J. (2015): TransNewGuinea.org: An Online Database of New Guinea Languages. PLoS ONE 10.10: e0141563.
tuled Tupian Ferraz Gerardi, Fabrício & Reichert, Stanislav & Aragon, Carolina. (2021) TuLeD (Tupían Lexical Database): Introducing a database of a South American language family. Language Resources and Evaluation. doi : 10.1007/s10579-020-09521-5
visserkalamang Kalamang 5139559 Eline Visser. 2021. Kalamang dictionary. In: Key, Mary Ritchie & Comrie, Bernard (eds.) The Intercontinental Dictionary Series . Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at https://ids.clld.org/)
walworthpolynesian Polynesian 5126932 Walworth, Mary. (2018). Polynesian Segmented Data (Version 1) [Data set]. Zenodo. doi : 10.5281/zenodo.1689909
wangbai Bai 5137407 Wang, Feng (2004): Language contact and language comparison. The case of Bai. PhD thesis. Hong Kong: City University of Hong Kong.
wangbcd Sinitic 5136930 Wang, F. 2004. BCD: basic words of Chinese dialects. Unpublished dataset. [Digital version in: List, J.-M. (2015): Network perspectives on Chinese dialect history. Bulletin of Chinese Linguistics 8. 42-67.]
wichmannmixezoquean Mixe-Zoquean 5126948 Cysouw, M., Wichmann, S., & Kamholz, D. (2006). A critique of the separation base method for genealogical subgrouping, with data from Mixe-Zoquean. Journal of Quantitative Linguistics, 13(2-3), 225–264. doi : 10.1080/09296170600850759
wold 5139859 Haspelmath, Martin & Tadmor, Uri (eds.) 2009. World Loanword Database. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at https://wold.clld.org/)
yanglalo Lalo 5121829 Yang, Cathryn (2011): Lalo regional varieties: Phylogeny, dialectometry and sociolinguistics. Bundoora: La Trobe University.
yangyi Yi 5167277 Yang, Cathryn (2021): The phonetic tone change *high > rising: Evidence from the Ngwi dialect laboratory.
yuchinese Sinitic 5139881 Hsiao-jung Yu and Yifan Wang. 2021. Mandarin Chinese dictionary. In: Key, Mary Ritchie & Comrie, Bernard (eds.) The Intercontinental Dictionary Series . Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at https://ids.clld.org/)
zgraggenmadang Madang 5121535 Z'graggen, J A. (1980) A comparative word list of the Northern Adelbert Range Languages, Madang Province, Papua New Guinea. Canberra: Pacific Linguistics.
zhaobai Bai 5136947 Zhao, Yanzhen (2006): Zhàozhuāng Báiyǔ miáoxiě yánjiū 趙莊白語描寫研究 [Investigations of Zhaozhuang Bai]. Běijīng: Zhōngyāng Mínzú Dàxué.
zhivlovobugrian Ob-Ugrian 5137439 Zhivlov, M. (2011): Annotated Swadesh wordlists for the Ob-Ugrian group (Uralic family). The Global Lexicostatistical Database. Moscow: RGGU.
zhoubizic Bizic 5140129 Zhou, Yulou (2020): Proto-Bizic. A study of Tujia historical phonology. Bachelor Thesis. Stanford University.
logos 5141379 List, Johann-Mattis, Thomas Mayer, Anselm Terhalle, and Matthias Urban (2014). CLICS: Database of Cross-Linguistic Colexifications. Marburg: Forschungszentrum Deutscher Sprachatlas (Version 1.0).
utoaztecan Uto-Aztecan 5173799 Greenhill, Simon J., Hannah J. Haynie, Robert M. Ross, Angela M. Chira, List, Johann-Mattis, Lyle Campbell, Carlos A. Botero, and Russell D. Gray (2021): A recent northern origin for the Uto-Aztecan language family. Leipzig: Max Planck Institute for Evolutionary Anthropology.
abvdoceanic Oceanic 5206553 Greenhill, S.J., Blust. R, & Gray, R.D. (2008). The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics. Evolutionary Bioinformatics, 4:271-283.
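
The Zenodo column above holds bare record numbers. Assuming these follow the usual Zenodo pattern, in which a record number maps to a DOI of the form 10.5281/zenodo.<number> and a landing page under https://zenodo.org/record/, a short Python sketch can turn a number from the table into links. The helper name `zenodo_links` is hypothetical, and whether every number in the list resolves this way is an assumption rather than something verified here.

```python
ZENODO_DOI_PREFIX = "10.5281/zenodo."
ZENODO_RECORD_URL = "https://zenodo.org/record/"

def zenodo_links(record_id):
    """Build the assumed DOI and landing-page URL for a Zenodo record number."""
    record = str(record_id).strip()
    if not record.isdigit():
        raise ValueError(f"not a Zenodo record number: {record_id!r}")
    return {
        "doi": ZENODO_DOI_PREFIX + record,
        "url": ZENODO_RECORD_URL + record,
    }

# Record number taken from the aaleykusunda row of the list.
links = zenodo_links(5115947)
print(links["doi"])
print(links["url"])
```

Rows without a record number, such as tppsr and tuled in this version of the list, would need to be skipped before calling the helper.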

References

  1. "Shedding light on linguistic diversity and its evolution: Linguists and computer scientists collaborate to publish a large global Open Access lexical database". ScienceDaily. 2022-07-22. Retrieved 2022-07-22.
  2. List, Johann-Mattis; Forkel, Robert; Greenhill, Simon J.; Rzymski, Christoph; Englisch, Johannes; Gray, Russell D. (2022-06-16). "Lexibank, a public repository of standardized wordlists with computed phonological and lexical features". Scientific Data. 9 (1): 1–16. doi:10.1038/s41597-022-01432-0. ISSN 2052-4463. PMC 9203750. S2CID 239629792.
  3. List, Johann-Mattis; Forkel, Robert; Greenhill, Simon J.; Rzymski, Christoph; Englisch, Johannes; Gray, Russell D. (2021-09-02), Lexibank: A public repository of standardized wordlists with computed phonological and lexical features, Research Square, doi:10.21203/rs.3.rs-870835/v1, S2CID 239629792
  4. Forkel, R. et al. Cross-Linguistic Data Formats, advancing data sharing and reuse in comparative linguistics. Sci. Data. 5:180205 doi:10.1038/sdata.2018.205 (2018).
  5. Forkel, Robert; Greenhill, Simon J.; Rzymski, Christoph; Englisch, Johannes; Gray, Russell D. (2021). Lexibank: A publicly available repository of standardized lexical datasets with automatically computed phonological and lexical features for more than 2000 language varieties. doi:10.5281/ZENODO.5227817. Retrieved 2022-06-17.
  6. "lexibank". GitHub. Retrieved 2022-06-17.
  7. "lexibank-analysed/lexibank.csv (v0.2)". GitHub. Retrieved 2022-06-17.