General Service List


The General Service List (GSL) is a list of roughly 2,000 words published by Michael West in 1953. [1] The words were selected to represent the most frequent words of English and were taken from a corpus of written English. The target audience was English language learners and ESL teachers. To maximize the utility of the list, some frequent words that overlapped broadly in meaning with words already on the list were omitted. In the original publication the relative frequencies of various senses of the words were also included.


Details

The list is important because a person who knows all the words on the list and their word families would understand approximately 90–95 per cent of colloquial speech and 80–85 per cent of common written texts. The list consists only of headwords. The word "be", for example, is high on the list, and the list assumes that the learner is fluent in all of its forms, such as am, is, are, was, were, being, and been.
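Coverage figures like these are usually computed as the share of running words (tokens) in a text whose headword is on the list. The sketch below is a minimal illustration under stated assumptions: the `HEADWORDS` set and the `FORM_TO_HEADWORD` table are hypothetical stand-ins for the real GSL and a full table of word forms, and the tokenizer simply treats each run of letters as a word.

```python
import re
from collections.abc import Iterable

# Hypothetical toy data: a few headwords plus a small form-to-headword table.
# A real coverage study would use the full GSL and a complete lemma table.
HEADWORDS = {"be", "the", "cat", "on", "mat", "a"}
FORM_TO_HEADWORD = {
    "am": "be", "is": "be", "are": "be", "was": "be",
    "were": "be", "being": "be", "been": "be",
    "cats": "cat", "mats": "mat",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def coverage(tokens: Iterable[str], headwords: set[str]) -> float:
    """Return the fraction of running tokens whose headword is on the list."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    # Map each form to its headword (unknown forms map to themselves).
    known = sum(1 for t in tokens if FORM_TO_HEADWORD.get(t, t) in headwords)
    return known / len(tokens)
```

For example, `coverage(tokenize("The cat was on the mat; the dog was not."), HEADWORDS)` returns 0.8: eight of the ten tokens map to a listed headword, while "dog" and "not" do not.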

Researchers have expressed doubts about the adequacy of the GSL because of its age and because the words beyond the first 1,000 on the list provide relatively little additional coverage. [2] Engels was particularly critical of the limited vocabulary chosen by West (1953). While he agreed that the first 1,000 words of the GSL were good selections because of their high frequency and wide range, he argued that the words beyond the first 1,000 could not be considered general service words, because their range and frequency were too low to justify their inclusion. Later research by Billuroğlu and Neufeld (2005) found that the GSL was in need of minor revision, but that its headwords still provide approximately 80% text coverage in written English. The research showed that the GSL contains a small number of archaic terms, such as shilling, while excluding words that have gained currency since the first half of the twentieth century, such as plastic, television, battery, okay, victim, and drug.

The GSL evolved over several decades before West published it in 1953. It is not based solely on frequency: it also includes groups of words selected on a semantic basis. [3] Various versions circulate on the Internet, and there have been attempts to improve it. [4]

There are two major updates of the GSL:

  1. the New General Service List (new-GSL) by Brezina and Gablasova, originally published in Applied Linguistics in 2013. This word list is based on an analysis of four language corpora with a combined size of over 12 billion words. [5]
  2. the New General Service List (NGSL), published in March 2013 by Browne, Culligan and Phillips. The NGSL was based on a 273-million-word subsection of the 2-billion-word Cambridge English Corpus. Preliminary results suggest that the new list provides substantially higher coverage than the original GSL while using fewer words. [6]

Some ESL dictionaries use the General Service List as the basis of their controlled defining vocabulary. In the Longman Dictionary of Contemporary English, each definition is written using the 2,000-word Longman Defining Vocabulary, which is based on the GSL. [7]


Notes

  1. West, M. (1953). A General Service List of English Words. London: Longmans, Green and Co.
  2. Engels, 1968
  3. Nation & Waring, 2004; Dickins
  4. Bauman & Culligan, 1995
  5. Brezina, V. & Gablasova, D. (2015). "Is There a Core General Vocabulary? Introducing the New General Service List". Applied Linguistics, 36(1), 1–22.
  6. Browne, C. (2013, July). "The New General Service List: Celebrating 60 Years of Vocabulary Learning". The Language Teacher, 34(7), 13–15.
  7. Bullock, D. (2011). "NSM + LDOCE: A Non-Circular Dictionary of English". International Journal of Lexicography, 24(2), 226–240.

Related Research Articles

Basic English is a controlled language based on standard English, but with a greatly simplified vocabulary and grammar. It was created by the linguist and philosopher Charles Kay Ogden as an international auxiliary language, and as an aid for teaching English as a second language. It was presented in Ogden's 1930 book Basic English: A General Introduction with Rules and Grammar.

A defining vocabulary is a list of words used by lexicographers to write dictionary definitions. The underlying principle goes back to Samuel Johnson's notion that words should be defined using 'terms less abstruse than that which is to be explained', and a defining vocabulary provides the lexicographer with a restricted list of high-frequency words which can be used for producing simple definitions of any word in the dictionary.
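The principle can be illustrated with a small checker that lists the words of a draft definition that fall outside the defining vocabulary. This is a minimal sketch, not the method of any particular dictionary: `DEFINING_VOCABULARY` is a hypothetical twelve-word list, and the check matches exact word forms only, so a practical tool would lemmatise the definition first.

```python
import re

# Hypothetical miniature defining vocabulary; real ones, such as the
# Longman Defining Vocabulary, contain about 2,000 words.
DEFINING_VOCABULARY = {
    "a", "an", "small", "animal", "that", "is",
    "kept", "as", "pet", "with", "soft", "fur",
}


def undefined_words(definition: str, vocabulary: set[str]) -> list[str]:
    """Return words of the definition missing from the vocabulary, in first-seen order."""
    missing: list[str] = []
    for word in re.findall(r"[a-z]+", definition.lower()):
        # Exact-form match only: "pets" would be flagged even though "pet" is listed.
        if word not in vocabulary and word not in missing:
            missing.append(word)
    return missing
```

For example, `undefined_words("A small domesticated animal with soft fur", DEFINING_VOCABULARY)` returns `["domesticated"]`, signalling that the lexicographer should rephrase that part of the definition in simpler terms.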

A vocabulary is a set of words, typically the set in a language or the set known to an individual. The word vocabulary originated from the Latin vocabulum, meaning "a word, name". It forms an essential component of language and communication, helping convey thoughts, ideas, emotions, and information. Vocabulary can be oral, written, or signed and can be categorized into two main types: active vocabulary and passive vocabulary. An individual's vocabulary continually evolves through various methods, including direct instruction, independent reading, and natural language exposure, but it can also shrink due to forgetting, trauma, or disease. Furthermore, vocabulary is a significant focus of study across various disciplines, like linguistics, education, psychology, and artificial intelligence. Vocabulary is not limited to single words; it also encompasses multi-word units known as collocations, idioms, and other types of phraseology. Acquiring an adequate vocabulary is one of the largest challenges in learning a second language.

Brown Corpus

The Brown University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples of American English, the first major structured corpus of varied genres. This corpus first set the bar for the scientific study of the frequency and distribution of word categories in everyday language use. Compiled by Henry Kučera and W. Nelson Francis at Brown University, in Rhode Island, it is a general language corpus containing 500 samples of English, totaling roughly one million words, compiled from works published in the United States in 1961.

A document-term matrix is a mathematical matrix that describes the frequency of terms that occur in each document in a collection. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. This matrix is a specific instance of a document-feature matrix, where "features" may refer to other properties of a document besides terms. It is also common to encounter the transpose, or term-document matrix, where documents are the columns and terms are the rows. Such matrices are widely used in natural language processing and computational text analysis.
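A document-term matrix can be built by counting the terms in each document and laying the counts out against a shared, sorted vocabulary. This is a minimal sketch, assuming the documents arrive already tokenized as lists of strings; `document_term_matrix` and `transpose` are illustrative names, not a library API.

```python
from collections import Counter


def document_term_matrix(documents: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and a matrix with one row per document, one column per term."""
    terms = sorted({term for doc in documents for term in doc})
    counts = [Counter(doc) for doc in documents]
    # Counter returns 0 for absent terms, so every row has one entry per term.
    matrix = [[c[term] for term in terms] for c in counts]
    return terms, matrix


def transpose(matrix: list[list[int]]) -> list[list[int]]:
    """Turn a document-term matrix into a term-document matrix (and vice versa)."""
    return [list(column) for column in zip(*matrix)]
```

For the two documents `["the", "cat", "sat"]` and `["the", "dog", "sat", "sat"]`, the vocabulary is `["cat", "dog", "sat", "the"]` and the rows are `[1, 0, 1, 1]` and `[0, 1, 2, 1]`; transposing gives one row per term instead. In practice, scikit-learn's `CountVectorizer` produces the same documents-by-terms layout as a sparse matrix.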

In corpus linguistics, a key word is a word that occurs in a text more often than would be expected by chance alone. Key words are identified by a statistical test that compares the word frequencies in a text with their expected frequencies derived from a much larger corpus, which acts as a reference for general language use. Keyness is then the quality a word or phrase has of being "key" in its context.
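The paragraph does not name a particular test; one widely used choice in corpus linguistics is the log-likelihood (G²) statistic, sketched here for a single word. The inputs are the word's frequency in the text, the size of the text in tokens, and the same two figures for the reference corpus; the function name `log_likelihood` is illustrative, and both corpus sizes are assumed to be positive.

```python
import math


def log_likelihood(freq_text: int, size_text: int, freq_ref: int, size_ref: int) -> float:
    """Log-likelihood (G2) keyness of one word: a text compared with a reference corpus."""
    total_freq = freq_text + freq_ref
    total_size = size_text + size_ref
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_text = size_text * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_text, expected_text), (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

A word with the same relative frequency in both corpora scores 0. A word that occurs 50 times in a 10,000-word text but only 500 times in a 1,000,000-word reference corpus scores far above 3.84, the critical value for p < 0.05 with one degree of freedom. The statistic does not by itself show whether a word is over- or under-used, so the two relative frequencies should also be compared.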

In morphology and lexicography, a lemma is the canonical form, dictionary form, or citation form of a set of word forms. In English, for example, break, breaks, broke, broken and breaking are forms of the same lexeme, with break as the lemma by which they are indexed. Lexeme, in this context, refers to the set of all the inflected or alternating forms in the paradigm of a single word, and lemma refers to the particular form that is chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish, and Russian. The process of determining the lemma for a given word form is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary.
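In its simplest form, lemmatisation can be sketched as a lookup from word forms to lemmas. This is a toy illustration: `LEMMA_TABLE` is a hypothetical hand-written table covering only the forms of break and be, whereas practical lemmatisers combine much larger resources with rules and part-of-speech information.

```python
# Hypothetical lookup table mapping inflected forms to their lemma.
LEMMA_TABLE = {
    "breaks": "break", "broke": "break", "broken": "break", "breaking": "break",
    "am": "be", "is": "be", "are": "be", "was": "be",
    "were": "be", "being": "be", "been": "be",
}


def lemmatise(word_form: str) -> str:
    """Return the lemma of a word form, or the lower-cased form itself if it is not in the table."""
    form = word_form.lower()
    return LEMMA_TABLE.get(form, form)
```

For example, `[lemmatise(w) for w in ["Broke", "breaking", "were", "cat"]]` gives `["break", "break", "be", "cat"]`. A pure table lookup cannot resolve ambiguous forms: "saw" may be the past tense of see or the noun saw, which is one reason real lemmatisers take part of speech into account.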

Japanese dictionaries have a history that began over 1300 years ago when Japanese Buddhist priests, who wanted to understand Chinese sutras, adapted Chinese character dictionaries. Present-day Japanese lexicographers are exploring computerized editing and electronic dictionaries. According to Nakao Keisuke (中尾啓介):

It has often been said that dictionary publishing in Japan is active and prosperous, that Japanese people are well provided for with reference tools, and that lexicography here, in practice as well as in research, has produced a number of valuable reference books together with voluminous academic studies. (1998:35)

Studies that estimate and rank the most common words in English do so by analysing large collections of English texts. Perhaps the most comprehensive such analysis was conducted on the Oxford English Corpus (OEC), a very large corpus of English text.

The Academic Word List (AWL) is a word list of 570 English word families which appear with great frequency in a broad range of academic texts. The target readership is English as a second or foreign language students intending to enter English-medium higher education, and teachers of such students. The AWL was developed by Averil Coxhead at the School of Linguistics and Applied Language Studies at Victoria University of Wellington, New Zealand. Her list replaced the previously widely used University Word List, developed by Xue and Nation in 1984. The words included in the AWL were selected based on their range, frequency, and dispersion, and were divided into ten sublists in decreasing order of frequency: Sublists One to Nine each contain 60 word families and Sublist Ten contains 30. The AWL excludes words that appear on the General Service List. Many words in the AWL are general vocabulary not restricted to an academic domain, such as the words area, approach, create, similar, and occur, found in Sublist One, and the AWL accounts for only a small percentage of the actual word occurrences in academic texts.

Longman Dictionary of Contemporary English

The Longman Dictionary of Contemporary English (LDOCE), first published by Longman in 1978, is an advanced learner's dictionary whose definitions are written in a restricted vocabulary so that non-native speakers of English can understand them easily. It is available in four configurations:

A conceptual dictionary is a dictionary that groups words by concept or semantic relation instead of arranging them in alphabetical order. Examples of conceptual dictionaries are picture dictionaries, thesauri, and visual dictionaries. Onelook.com and Diccionario Ideológico de la Lengua Española are specific online and print examples.

A word list is a list of the words of a language found in a given text corpus, compiled to support vocabulary acquisition. A lexicon sorted by frequency "provides a rational basis for making sure that learners get the best return for their vocabulary learning effort", but such lists are mainly intended for course writers rather than directly for learners. Frequency lists are also made for lexicographical purposes, serving as a checklist to ensure that common words are not left out. Major pitfalls include the content of the corpus, its register, and the definition of "word". Word counting is a thousand years old, and large analyses were still being done by hand in the mid-20th century, but the electronic processing of large natural-language corpora, such as movie subtitles, has accelerated the field.
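A basic frequency list can be produced by tokenizing a corpus and counting word forms. This minimal sketch makes the "definition of word" problem explicit: its regular expression, `WORD_PATTERN`, treats a run of letters, optionally with one internal apostrophe, as a word, and the function counts word forms rather than lemmas or word families. Both choices are assumptions that a real study would need to justify.

```python
import re
from collections import Counter
from typing import Optional

# Assumed definition of "word": letters, optionally with one internal
# apostrophe (so "don't" is one token). Other definitions give other lists.
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def frequency_list(text: str, top_n: Optional[int] = None) -> list[tuple[str, int]]:
    """Rank word forms by frequency, most frequent first; ties keep first-seen order."""
    tokens = WORD_PATTERN.findall(text.lower())
    # most_common(None) returns every form; most_common(n) returns the top n.
    return Counter(tokens).most_common(top_n)
```

For example, `frequency_list("The cat and the dog and the bird", 2)` returns `[("the", 3), ("and", 2)]`. Switching to lemma or family counts would mean mapping each token through a lemma table before counting.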

Ian Stephen Paul Nation is an internationally recognized scholar in the field of linguistics and teaching methodology.

Macmillan English Dictionary for Advanced Learners, also known as MEDAL, is an advanced learner's dictionary first published in 2002 by Macmillan Education. It shares most of the features of this type of dictionary: it provides definitions in simple language, using a controlled defining vocabulary; most words have example sentences to illustrate how they are typically used; and information is given about how words combine grammatically or in collocations. MEDAL also introduced a number of innovations. These include:

A word family is the base form of a word plus its inflected forms and derived forms made with suffixes and prefixes, plus its cognates, i.e. all words that have a common etymological origin, some of which even native speakers do not recognize as being related. In the English language, inflectional affixes include third person -s, verbal -ed and -ing, plural -s, possessive -s, comparative -er and superlative -est. Derivational affixes include -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize/-ise, -ment, in-. The idea is that a base word and its inflected forms share the same core meaning and can be considered learned words if a learner knows both the base word and the affix. Bauer and Nation proposed seven levels of affixes based on their frequency in English. Research has shown that knowledge of word families can help learners derive related words through affixation and can reduce the time needed to derive and recognize such words.
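Grouping word forms into families can be roughly approximated by stripping affixes until a known base word remains. This is a deliberately naive sketch, not Bauer and Nation's procedure: the `PREFIXES` and `SUFFIXES` tuples are hypothetical short lists drawn from the examples above, at most one prefix and one suffix are stripped, spelling changes are ignored (so "happiness" is not linked to "happy"), and false matches are possible.

```python
from typing import Optional

# Hypothetical, heavily simplified affix lists based on the examples above.
PREFIXES = ("un", "non", "in")
SUFFIXES = ("ness", "less", "able", "ful", "ly", "ing", "ed", "er", "est", "s")


def family_head(word: str, base_words: set[str]) -> Optional[str]:
    """Return the base word whose family `word` appears to belong to, or None."""
    word = word.lower()
    # The empty string comes first, so an exact match with a base word wins.
    for prefix in ("",) + PREFIXES:
        for suffix in ("",) + SUFFIXES:
            if word.startswith(prefix) and word.endswith(suffix):
                stem = word[len(prefix):len(word) - len(suffix)]
                if stem in base_words:
                    return stem
    return None
```

With the base words `{"kind", "help", "use"}`, "unkindness" maps to "kind", "helpless" to "help", and "uses" to "use". Handling spelling changes and grading affixes by level, as in Bauer and Nation's scheme, would require considerably more linguistic knowledge than this sketch encodes.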

Norbert Schmitt is an American applied linguist and Emeritus Professor of Applied Linguistics at the University of Nottingham in the United Kingdom. He is known for his work on second-language vocabulary acquisition and second-language vocabulary teaching. He has published numerous books and papers on vocabulary acquisition.

Reta Vortaro

Reta Vortaro is a general-purpose multilingual Esperanto dictionary for the Internet. Each of the dictionary's headwords is defined in Esperanto, along with additional information, such as example sentences, to help distinguish the subtle shades of meaning that each particular word form may have.

The New General Service List (NGSL) is a list of 2,809 words (lemmas) claimed to be a list of words that second language learners of the English language are most likely to meet in their daily lives. It was published by Dr. Charles Browne, Dr. Brent Culligan and Joseph Phillips in March 2013 and updated in 2016 and 2023.

Averil Jean Coxhead is a New Zealand academic, and is a full professor at Victoria University of Wellington, specialising in applied linguistics. She is known for creating the Academic Word List, which is a list of 570 English word families that appear with great frequency in a broad range of academic texts. She has also created wordlists for other uses, such as rugby terms for referees and players, and building terms for Tongan tradespeople.
