Survey of English Usage

The Survey of English Usage was the first research centre in Europe to carry out research with corpora. The Survey is based in the Department of English Language and Literature at University College London.

History

The Survey of English Usage was founded as the Survey of Spoken English at Durham University in 1959 by Randolph Quirk, and moved with him to University College London in 1960. [1] Many well-known linguists have carried out research at the Survey, including Bas Aarts, Valerie Adams, John Algeo, Dwight Bolinger, Noël Burton-Roberts, David Crystal, Derek Davy, Jan Firbas, Sidney Greenbaum, Liliane Haegeman, Robert Ilson, Ruth Kempson, Geoffrey Leech, Jan Rusiecki, Jan Svartvik, and Joe Taglicht. As of 2016, the director was Bas Aarts. [2]

The original Survey Corpus predated modern computing. It was recorded on reel-to-reel tapes, transcribed on paper, stored in filing cabinets, and indexed on paper cards. The transcriptions were marked up with a detailed system of prosodic and paralinguistic annotation developed by Crystal and Quirk (1964). [3] Sets of paper cards were manually annotated for grammatical structures and filed by category, so that, for example, all noun phrases could be found in the noun phrase filing cabinet. Naturally, searching the corpus required a visit to the Survey.

This corpus is now more widely known as the London-Lund Corpus (LLC), because colleagues in Lund, Sweden, were responsible for computerising it. Thirty-four of the spoken texts were published in book form as Svartvik and Quirk (1980), [4] and the corpus served as the basis for the well-known A Comprehensive Grammar of the English Language (Quirk et al. 1985). [5]

Current research

Constructing corpora

In 1988 Sidney Greenbaum proposed a new project, the International Corpus of English (ICE). ICE was conceived as an international project, carried out at research centres around the world, to compile corpora of English in countries where English was the first or an official second language. Each component would contain a balanced sample of one million words of spoken and written English, so that the components could be compared in a wide variety of ways. The ICE project is still continuing at centres around the world.

ICE-GB, the British component of ICE, was compiled at the Survey. It was annotated in great detail, including a full grammatical analysis (parse) of every sentence in the corpus. ICE-GB was first released in 1998, together with ICECUP, software for searching and exploring the parsed corpus. Release 2 of ICE-GB is available on CD.

Besides comparing varieties of English, many researchers are interested in how language develops and changes over time. A recent project at the Survey parsed a large (400,000-word) selection of the spoken part of the LLC in a manner directly comparable with ICE-GB, forming a new 800,000-word diachronic corpus, the Diachronic Corpus of Present-Day Spoken English (DCPSE). DCPSE is available on CD from the Survey.

Together, these two corpora form the world's largest collection of parsed, corrected, orthographically transcribed spoken English, with over one million words of spoken English in this form.

Exploring corpora

Parsed corpora are large databases of detailed grammatical tree structures. Building large collections of valuable linguistic data creates a pressing need for methods and tools that help researchers and other users make the most of them. So, in parallel with parsing natural language data, the Survey team have researched and developed software tools to help linguists use these corpora. The ICECUP research platform searches parsed corpora using an intuitive grammatical query representation called Fuzzy Tree Fragments (FTFs).
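
The general idea of querying a parsed corpus with a partial tree pattern can be illustrated with a small Python sketch. It is not ICECUP's implementation, nor the FTF notation itself: the classes `Node` and `Pattern`, the functions `matches`, `search` and `words`, and the category labels are all hypothetical. The only "fuzzy" relation modelled is that the pattern's children must match children of a tree node in the same left-to-right order, with gaps allowed between them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """A parse-tree node: a category label, child nodes, and an optional word (leaves)."""
    label: str
    children: list[Node] = field(default_factory=list)
    word: Optional[str] = None


@dataclass
class Pattern:
    """A query fragment; label=None acts as a wildcard for any category."""
    label: Optional[str] = None
    children: list[Pattern] = field(default_factory=list)


def matches(pattern: Pattern, node: Node) -> bool:
    """True if `pattern` matches with its root at `node`.

    Pattern children must match children of `node` in the same order,
    but need not be adjacent (a loose "eventually follows" relation).
    """
    if pattern.label is not None and pattern.label != node.label:
        return False
    return _match_in_order(pattern.children, node.children)


def _match_in_order(pats: list[Pattern], kids: list[Node]) -> bool:
    if not pats:
        return True
    # Try each child as the match for the first pattern child, then match
    # the remaining pattern children against the children to its right.
    for i, kid in enumerate(kids):
        if matches(pats[0], kid) and _match_in_order(pats[1:], kids[i + 1:]):
            return True
    return False


def search(pattern: Pattern, tree: Node) -> Iterator[Node]:
    """Yield every node in `tree`, in pre-order, at which `pattern` matches."""
    if matches(pattern, tree):
        yield tree
    for child in tree.children:
        yield from search(pattern, child)


def words(node: Node) -> str:
    """Join the words under `node`, left to right."""
    if node.word is not None:
        return node.word
    return " ".join(words(c) for c in node.children)


# A toy tree for "the old survey recorded some speakers" (illustrative, not ICE-GB data).
tree = Node("CL", [
    Node("NP", [Node("DT", word="the"), Node("AJ", word="old"), Node("N", word="survey")]),
    Node("VP", [Node("V", word="recorded")]),
    Node("NP", [Node("DT", word="some"), Node("N", word="speakers")]),
])

# Any NP containing a determiner followed, not necessarily immediately, by a noun.
np_query = Pattern("NP", [Pattern("DT"), Pattern("N")])
print([words(n) for n in search(np_query, tree)])  # ['the old survey', 'some speakers']
```

Because the matcher allows gaps, the query also finds "the old survey", where an adjective intervenes between determiner and noun. A fuller query language would presumably also distinguish immediate from non-immediate links and parent from ancestor relations, and a tool working over a million words would need indexing rather than visiting every node; both are left out of this sketch.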

Linguistic research with corpora

As well as distributing corpora and tools to the corpus linguistics research community, the Survey carries out research into the English language. Recent projects include research on the English Noun Phrase, Subordination in Spoken and Written English, and the English Verb Phrase. The Survey also supports PhD students carrying out research on English language corpora.

References

  1. Harte, Negley, North, John, and Brewis, Georgina (2018). The World of UCL. UCL Press. pp. 239–240. doi:10.14324/111.9781787352933. ISBN 9781787352933.
  2. "Survey Staff". University College London. Retrieved 14 November 2016.
  3. Crystal, David, and Quirk, Randolph (1964). Systems of Prosodic and Paralinguistic Features in English. The Hague: Mouton.
  4. Svartvik, Jan, and Quirk, Randolph (eds.) (1980). A Corpus of English Conversation. Lund: CWK Gleerup.
  5. Quirk, Randolph, Greenbaum, Sidney, Leech, Geoffrey, and Svartvik, Jan (1985). A Comprehensive Grammar of the English Language. London: Longman.