Bank of English

The Bank of English (BoE) is a representative subset of the 4.5-billion-word COBUILD corpus, a collection of English texts. These are mainly British in origin, but content from North America, Australia, New Zealand, South Africa and other Commonwealth countries is also included.

Contents

The majority of the texts are from written English, collected from websites, newspapers, magazines and books. There is also a large component of spoken data using material from radio, TV and informal conversations. The Bank of English totals 650 million running words. [1] Copies of the corpus are held both at HarperCollins Publishers and the University of Birmingham. The version at Birmingham can be accessed for academic research.

The Bank of English forms part of the Collins Word Web together with the French, German and Spanish corpora.

Related Research Articles

Dictionary: Collection of words and their meanings

A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged alphabetically, which may include information on definitions, usage, etymologies, pronunciations, translation, etc. It is a lexicographical reference that shows inter-relationships among the data.

Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference.

A monolingual learner's dictionary (MLD) is designed to meet the reference needs of people learning a foreign language. MLDs are based on the premise that language-learners should progress from a bilingual dictionary to a monolingual one as they become more proficient in their target language, but that general-purpose dictionaries (aimed at native speakers) are inappropriate for their needs. Dictionaries for learners include information on grammar, usage, common errors, collocation, and pragmatics, which is largely missing from standard dictionaries, because native speakers tend to know these aspects of language intuitively. And while the definitions in standard dictionaries are often written in difficult language, those in an MLD use a simple and accessible defining vocabulary.

COBUILD, an acronym for Collins Birmingham University International Language Database, is a British research facility set up at the University of Birmingham in 1980 and funded by Collins publishers.

Brown Corpus: Data set of American English in 1961

The Brown University Standard Corpus of Present-Day American English is an electronic collection of text samples of American English, the first major structured corpus of varied genres. This corpus first set the bar for the scientific study of the frequency and distribution of word categories in everyday language use. Compiled by Henry Kučera and W. Nelson Francis at Brown University, in Rhode Island, it is a general language corpus containing 500 samples of English, totaling roughly one million words, compiled from works published in the United States in 1961.

John McHardy Sinclair was a Professor of Modern English Language at Birmingham University from 1965 to 2000. He pioneered work in corpus linguistics, discourse analysis, lexicography, and language teaching.

The Oxford English Corpus (OEC) is a text corpus of 21st-century English, used by the makers of the Oxford English Dictionary and by Oxford University Press' language research programme. It is the largest corpus of its kind, containing nearly 2.1 billion words. It includes language from the UK, the United States, Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore, and South Africa. The text is mainly collected from web pages; some printed texts, such as academic journals, have been collected to supplement particular subject areas. The sources are writings of all sorts, from "literary novels and specialist journals to everyday newspapers and magazines and from Hansard to the language of blogs, emails, and social media". This may be contrasted with similar databases that sample only a specific kind of writing. The corpus is generally available only to researchers at Oxford University Press, but other researchers who can demonstrate a strong need may apply for access.

Advanced learner's dictionary: Type of monolingual learner's dictionary

The advanced learner's dictionary is the most common type of monolingual learner's dictionary, that is, a dictionary written in one language only, for someone who is learning a foreign language. It differs from a bilingual or translation dictionary, a standard dictionary written for native speakers, or a children's dictionary. Its definitions are usually built on a restricted defining vocabulary. "Advanced" usually refers to learners with a proficiency level of B2 or above according to the Common European Framework. Basic learner's dictionaries also exist.

The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time. It is used in corpus linguistics for analysis of corpora.

Patrick Hanks is an English lexicographer, corpus linguist, and onomastician. He has edited dictionaries of general language, as well as dictionaries of personal names.

Data-driven learning (DDL) is an approach to foreign language learning. Whereas most language learning is guided by teachers and textbooks, data-driven learning treats language as data and students as researchers undertaking guided discovery tasks. Underpinning this pedagogical approach is the data-information-knowledge paradigm. It is informed by a pattern-based approach to grammar and vocabulary, and a lexicogrammatical approach to language in general. Thus the basic task in DDL is to identify patterns at all levels of language. From their findings, foreign language students can see how an aspect of language is typically used, which in turn informs how they can use it in their own speaking and writing. Learning how to frame language questions and how to use the resources to obtain and interpret data is fundamental to learner autonomy. When students arrive at their own conclusions through such procedures, they use their higher-order thinking skills and create knowledge.

The Corpus of Contemporary American English (COCA) is a one-billion-word corpus of contemporary American English. It was created by Mark Davies, retired professor of corpus linguistics at Brigham Young University (BYU).

Collins English Dictionary: Printed and online dictionary of English, published by HarperCollins in Glasgow

The Collins English Dictionary is a printed and online dictionary of English. It is published by HarperCollins in Glasgow.

The lexical approach is a method of teaching foreign languages described by Michael Lewis in the early 1990s. The basic concept on which this approach rests is the idea that an important part of learning a language consists of being able to understand and produce lexical phrases as chunks. Students taught in this way learn to perceive patterns of language (grammar) and have meaningful set uses of words at their disposal. In 2000, Norbert Schmitt, an American linguist and a Professor of Applied Linguistics at the University of Nottingham in the United Kingdom, contributed to a learning theory supporting the lexical approach; he stated that "the mind stores and processes these [lexical] chunks as individual wholes." The brain's short-term capacity is much more limited than its long-term capacity, so it is much more efficient for the brain to retrieve a lexical chunk as a single piece of information than to retrieve each word separately.

Michael Hoey (linguist): British linguist (1948–2021)

Michael Hoey was a British linguist and Baines Professor of English Language. He lectured in applied linguistics in over 40 countries.

Pattern Grammar is a model for describing the syntactic environments of individual lexical items, derived from studying their occurrences in authentic linguistic corpora. It was developed by Hunston, Francis, and Manning as part of the COBUILD project. It is a highly informal account that suggests a linear view of grammar.

Collins COBUILD Advanced Dictionary

The Collins COBUILD Advanced Dictionary (CCAD) from HarperCollins, first published in 1987, is a dictionary that distinguished itself by providing definitions in full sentences, rather than excerpted phrases. Example sentences are given for almost every meaning of every word, drawn from a large corpus of actual usage.

Susan Elizabeth Hunston is a British linguist. She received her PhD in English under the supervision of Michael Hoey at the University of Birmingham in 1989. She does research in the areas of corpus linguistics and applied linguistics. She is one of the primary developers of the Pattern Grammar model of linguistic analysis, which is a way of describing the syntactic environments of individual words, based on studying their occurrences in large sets of authentic examples, i.e. language corpora. The Pattern Grammar model was developed as part of the COBUILD project, where Hunston worked for several years as a senior grammarian for the Collins Cobuild English Dictionary.

CLOC was a first-generation, general-purpose text analyzer program. It was produced at the University of Birmingham and could produce concordances as well as word lists and collocational analysis of text. First-generation concordancers were typically held on a mainframe computer and used at a single site; individual research teams would build their own concordancer and use it on the data they had access to locally, and any further analysis was done by separate programs.
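The three kinds of output mentioned for CLOC (word lists, concordances and collocational analysis) can be illustrated with a minimal modern sketch. This is not CLOC's actual implementation, which is not described here; the function names, the simple regular-expression tokenizer and the sample sentence are all assumptions made purely for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (naive, illustrative)."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()

def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def collocates(tokens, keyword, window=2):
    """Count the words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Hypothetical sample text, invented for this example.
sample = "The bank raised rates. The river bank flooded. A bank of words."
tokens = tokenize(sample)

for line in concordance(tokens, "bank", width=1):
    print(line)
```

Real concordancers add sorting of context lines, statistical association measures for collocates, and indexing so that very large corpora can be searched efficiently; none of that is attempted in this sketch.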

References