International Computer Archive of Modern and Medieval English

The International Computer Archive of Modern and Medieval English (ICAME) is an international group of linguists and data scientists working in corpus linguistics to digitise English texts. [1] The organisation was founded in Oslo, Norway, in 1977 as the International Computer Archive of Modern English and was later given its current name. [2]

The portal to ICAME's materials is hosted at the University of Bergen, where the organisation states its aim: to "collect and distribute information on English language material available for computer processing and on linguistic research to compile an archive of English text corpora in machine-readable form, and to make material available to research institutions." [3] For modern scholars, including both "descriptive and more theoretically-minded linguists", creating computer corpora (collections of texts in machine-readable form) is the most accessible way to study transcribed spoken language as well as various genres of written text. [4]

ICAME hosts academic conferences on corpus-linguistic studies of historical change and contemporary grammatical description in English, and it makes corpora of different varieties of English available to scholars, starting with editions of the 1960s Brown Corpus. Its first conference was held in Bergen, Norway, in 1979, and scholars interested in corpus linguistics went on to meet each spring in various European and English-speaking countries. The compilation and distribution of corpora fostered at these meetings played a key role in establishing corpus linguistics as a field in the 20th century, a precursor to current big data analytics. Summarising the field, Kennedy's Introduction to Corpus Linguistics notes that "for corpus linguists with an interest in the description of English, the International Computer Archive of Modern and Medieval English has been the major resource". [5] ICAME's influence on the field is also described in Corpus Linguistics Twenty-five Years On, a volume edited by Facchinetti. [6]

One influential resource that ICAME made available was a CD containing 20 corpora. These included corpora of different varieties of English (such as the Australian Corpus of English, the Wellington Corpus of Spoken New Zealand English, the Kolhapur Corpus of Indian English, the Bergen Corpus of London Teenage Language (COLT), the Helsinki Corpus of Older Scots, and the East African component of the International Corpus of English), as well as versions of the Brown Corpus and the Lancaster-Oslo/Bergen (LOB) Corpus tagged for part of speech. [7]

ICAME also publishes an annual journal, the ICAME Journal (formerly ICAME News), [8] which contains articles, conference reports, reviews and notices related to corpus linguistics. [9] As of 2015, the editors of the ICAME Journal were Merja Kytö and Anna-Brita Stenström. [10]

References

  1. Corpus Linguistics and Beyond: Proceedings of the Seventh International Conference on English Language Research on Computerized Corpora. Vol. 59. Rodopi. 1987. p. vi. ISBN 978-9-062-03569-4.
  2. Kennedy, Graeme (19 September 2014). An Introduction to Corpus Linguistics. Routledge. p. 85. ISBN 978-1-317-89258-8.
  3. "ICAME". Retrieved March 28, 2015.
  4. Johansson, Stig (1994). "ICAME-Quo Vadis? Reflections on the use of computer corpora in linguistics". Computers and the Humanities. 28 (4–5): 243–252. doi:10.1007/BF01830271. S2CID 20568137.
  5. Kennedy, Graeme (2014). An Introduction to Corpus Linguistics. Routledge. ch. 2.
  6. Facchinetti, Roberta, ed. (2007). Corpus Linguistics Twenty-Five Years On. Brill / Rodopi.
  7. Hofland, K.; et al. (1999). ICAME collection of English language corpora [CD].
  8. "degruyter ICAME supplement" (PDF). Retrieved March 28, 2015.
  9. "The LinguistList--ICAME Journal". Archived from the original on October 26, 2007. Retrieved March 28, 2015.
  10. "ICAME Journal". Retrieved March 28, 2015.

Further reading