Cambridge English Corpus

The Cambridge English Corpus (CEC), formerly the Cambridge International Corpus (CIC), is a multi-billion-word corpus of the English language containing both written and spoken data. It draws on a number of sources and covers both British and American English. The CEC also contains the Cambridge Learner Corpus, a 40-million-word corpus made up of English exam responses written by English language learners.

The Cambridge English Corpus is used to inform Cambridge University Press English Language Teaching publications as well as for research in corpus linguistics. Access is currently restricted to authors and researchers working on projects and publications for Cambridge University Press, and researchers at Cambridge English Language Assessment. [1]

It contains instances of modern written English, taken from newspapers, magazines, novels, letters, emails, textbooks, websites, and many other sources. Its spoken data is taken from many sources, including everyday conversations, telephone calls, radio broadcasts, presentations, speeches, meetings, TV programmes and lectures.

Cambridge Learner Corpus

The Cambridge Learner Corpus (CLC) is a collection of exam scripts written by students learning English, built in collaboration with Cambridge English Language Assessment. The CLC contains scripts from over 180,000 students, from around 200 countries, speaking 138 different first languages and is growing all the time. [2] The exams currently included are:

A unique feature of the Cambridge Learner Corpus is its error coding system. Language specialists identify and annotate errors in the exam scripts. This means that the Corpus can be used to find out about the frequency of different types of errors, the contexts that the errors are made in and the student groups that find particular language areas difficult. [3]
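As a rough illustration of the kind of query that error annotation makes possible, the sketch below tallies error codes overall and by first-language group. The record layout, the error code names and the sample data are all invented for this example; they are assumptions and do not reflect the actual CLC coding scheme or its contents.

```python
from collections import Counter, defaultdict

# Hypothetical annotated errors: (first_language, exam_level, error_code).
# The codes and records are invented for illustration only.
ANNOTATIONS = [
    ("Spanish", "B2", "spelling"),
    ("Spanish", "B2", "verb_tense"),
    ("Spanish", "B1", "spelling"),
    ("Japanese", "B2", "article"),
    ("Japanese", "B2", "article"),
    ("Japanese", "B1", "verb_tense"),
]


def error_frequencies(annotations):
    """Count how often each error code occurs across all records."""
    return Counter(code for _, _, code in annotations)


def errors_by_group(annotations):
    """Count error codes separately for each first-language group."""
    groups = defaultdict(Counter)
    for first_language, _, code in annotations:
        groups[first_language][code] += 1
    return dict(groups)


overall = error_frequencies(ANNOTATIONS)
by_language = errors_by_group(ANNOTATIONS)

print(overall.most_common())
for language, counts in sorted(by_language.items()):
    # Show the single most frequent error code for each group.
    print(language, counts.most_common(1))
```

Grouping by exam level instead of first language would follow the same pattern, using the second field of each record as the key.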

Authors of Cambridge English Language Teaching resources can use this information to target common errors – for example, the Cambridge Advanced Learner’s Dictionary contains ‘Common mistake’ features which highlight frequent learner errors.

Conversely, the error coding system also reveals what students can achieve at each level. This is central to the work of English Profile, a collaborative programme to enhance the learning, teaching and assessment of English worldwide. [4] The founding partners are Cambridge University Press, Cambridge English Language Assessment, the University of Cambridge, the University of Bedfordshire, the British Council and English UK. [5] The project’s aim is to describe what learners know and can do in English at each level of the Common European Framework of Reference (CEFR). [6]

Specialized corpora

The Cambridge English Corpus contains a number of specialized corpora:

Cambridge Business English Corpus

The Cambridge Business English Corpus is a large collection of British and American business language, including reports and documents, books relating to different aspects of business, and the business sections from many national newspapers.

The Cambridge Business English Corpus also includes the Cambridge and Nottingham Spoken Business English Corpus (CANBEC), the result of a joint project between Cambridge University Press and the University of Nottingham. This is a collection of recordings of English from companies of all sizes, ranging from big multinational companies to small partnerships. It contains formal and informal meetings, presentations, telephone conversations, lunchtime conversations, and spoken language from other business situations.

Cambridge Legal English Corpus

The Cambridge Legal English Corpus contains books, journals and newspaper articles relating to the law and legal processes.

Cambridge Financial English Corpus

The Cambridge Financial English Corpus contains texts relating to economics and finance, including leading financial magazines and newspapers.

Cambridge Academic English Corpus

The Cambridge Academic English Corpus contains written and spoken academic language at undergraduate and post-graduate level from a range of US and UK institutions, including lectures, seminars, student presentations, journals, essays and text books.

CANCODE

The Cambridge and Nottingham Corpus of Discourse in English (CANCODE) is a collection of spoken English recorded at hundreds of locations across the British Isles in a wide variety of situations (e.g. casual conversation, socialising, finding out information, and discussions). The CANCODE corpus is the result of a joint project between Cambridge University Press and the University of Nottingham.

The CANCODE corpus contains about five million words and is a rich resource for researchers of spoken English. However, the data has some limitations. Most speakers knew they were being recorded and were chatting in informal situations, such as relaxing at home, with others of fairly equal social status. As a result, the interactions are generally consensual and collaborative, and the corpus contains little evidence of conflict or adversarial exchanges. [7]

Cambridge-Cornell Corpus of Spoken North American English

The Cambridge University Press/Cornell Corpus is a large collection of informal, highly interactive, multiparty conversations among family and friends in North America. The Cambridge-Cornell corpus is the result of a joint project between Cambridge University Press and Cornell University.

CAMSNAE

The Cambridge Corpus of Spoken North American English (CAMSNAE) is a large collection of spoken American English. It includes recordings of people going about their everyday life – at work, at home with their families, going shopping, having meals, etc.

References

  1. Cambridge International Corpus, http://www.cambridge.org/us/esl/catalog/subject/custom/item3637700/Cambridge-International-Corpus-Cambridge-International-Corpus/?site_locale=en_US
  2. Cambridge Learner Corpus, http://www.cambridge.org/us/esl/catalog/subject/custom/item3646603/Cambridge-International-Corpus-Cambridge-Learner-Corpus/?site_locale=en_US
  3. Diane Nicholls, http://ucrel.lancs.ac.uk/publications/CL2003/papers/nicholls.pdf
  4. English Profile project, http://www.englishprofile.org/index.php?option=com_content&view=article&id=11&Itemid=2 Archived 2011-09-14 at the Wayback Machine
  5. English Profile, http://www.englishprofile.org/index.php?option=com_content&view=article&id=24&Itemid=22 Archived 2011-05-07 at the Wayback Machine
  6. Council of Europe, CEFR levels. Archived from the original on 30 October 2009. Retrieved 5 November 2009.
  7. Carter (2004) Language and Creativity: The Art of Common Talk. London: Routledge.