Geoffrey Sampson | |
---|---|
Born | 1944 Broxbourne, Hertfordshire |
Nationality | British |
Known for | The 'Language Instinct' Debate |
Scientific career | |
Fields | Linguistics, Computing, Economics |
Institutions |
Geoffrey Sampson (born 1944) is Professor of Natural Language Computing in the Department of Informatics, University of Sussex. [1] He produces annotation standards for compiling corpora (databases) of ordinary usage of the English language. [1] His work has been applied in automatic language-understanding software, and in writing-skills training. [1] He has also analysed Ronald Coase's "theory of the firm" and the economic and political implications of e-business. [1]
Sampson is a Fellow of the Royal Society of Arts, the British Computer Society and the Higher Education Academy. [2] He is also a Chartered Information Technology Professional. [2] He holds three MA degrees, one each from Cambridge, Yale and Oxford. [2] After graduating from St. John's College, Cambridge, he went on to Yale, conducting research in the Linguistics and Engineering & Applied Science departments. [2] He was awarded a doctorate by Cambridge under the special regulations; [2] his published work was deemed to comprise "a significant contribution to scholarship". [3]
His academic career has included work in Asian languages, linguistics and computing, with side interests in philosophy, and political and economic thought. He lectured at the London School of Economics, the University of Lancaster and the University of Leeds before moving to Sussex in 1991. [2]
Sampson is widely known for academic papers criticising the linguistic nativist movement, including the arguments of proponents such as Noam Chomsky, Jerry Fodor and Steven Pinker. Sampson critically engaged with Pinker's 1994 book The Language Instinct in his own book The 'Language Instinct' Debate, the first edition of which, published in 1997, was entitled Educating Eve.
Sampson is politically active and was elected to Wealden District Council in 2001, serving until 2002 with the local Conservative Party branch. He resigned this position after he was criticised by Labour Party and Liberal Democrat ministers and councillors for publishing on his website an article, There's Nothing Wrong With Racism (Except the Name), containing a number of racist claims. The outcome was subsequently endorsed by Conservative Central Office as "in the best interests of all concerned ... the Conservative party is opposed to all forms of racial discrimination". [4] Some time later he left the Conservative Party and in 2006 joined the United Kingdom Independence Party. [5]
Corpus linguistics is an empirical method for the study of language by way of a text corpus. Corpora are balanced, often stratified collections of authentic, "real world", text of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections.
In linguistics and natural language processing, a corpus or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated.
Word-sense disambiguation is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious.
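As an illustration of what identifying a sense from context can mean computationally, the following is a minimal sketch of a gloss-overlap heuristic in the spirit of the simplified Lesk approach. The `SENSES` inventory, its glosses, the stopword list and the function names are hypothetical and chosen only for this example; practical systems draw on full lexical databases and considerably richer models.

```python
# Hypothetical toy sense inventory: each sense of "bank" has a short gloss.
SENSES = {
    "bank": {
        "financial": "an institution that accepts deposits and lends money",
        "river": "the sloping land beside a river or stream of water",
    }
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "at", "the", "of", "or", "that", "to", "in", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(word, context):
    """Return the sense label whose gloss shares the most words with the context.

    Ties are resolved in favour of the sense listed first in SENSES.
    """
    context_words = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "She sat on the bank of the river and watched the water"))
print(disambiguate("bank", "He went to the bank to deposit money"))
```

The heuristic only counts exact word matches, so "deposit" in a context does not match "deposits" in a gloss; this is one reason real systems add stemming or lemmatisation.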
The Brown University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples of American English, the first major structured corpus of varied genres. This corpus first set the bar for the scientific study of the frequency and distribution of word categories in everyday language use. Compiled by Henry Kučera and W. Nelson Francis at Brown University, in Rhode Island, it is a general language corpus containing 500 samples of English, totaling roughly one million words, compiled from works published in the United States in 1961.
The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently, the ANC includes a range of genres, including emerging genres such as email, tweets, and web data that are not included in earlier corpora such as the British National Corpus. It is annotated for part of speech and lemma, shallow parse, and named entities.
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.
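To make the idea of annotated sentence structure concrete, the sketch below reads one parsed sentence written in a labelled-bracket notation and recovers its words. The bracket format (in the style used by some widely known treebanks), the example sentence and the function names `parse_tree` and `leaves` are assumptions made for illustration; individual treebanks define their own formats and tag sets.

```python
def parse_tree(text):
    """Parse a labelled-bracket string into nested (label, children) tuples.

    Children are either further (label, children) tuples or plain word strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]          # category label, e.g. S, NP, VBD
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                     # consume the closing ")"
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("unexpected material after the tree")
    return tree


def leaves(tree):
    """Return the words of the sentence in left-to-right order."""
    _label, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


tree = parse_tree("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
print(tree[0])       # label of the root constituent
print(leaves(tree))  # words of the sentence
```

Storing each constituent as a label with an ordered list of children keeps the structure easy to traverse, for example to count how often a given category occurs across a corpus.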
Geoffrey Neil Leech FBA was a specialist in English language and linguistics. He was the author, co-author, or editor of more than 30 books and more than 120 published papers. His main academic interests were English grammar, corpus linguistics, stylistics, pragmatics, and semantics.
The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time. It is used in corpus linguistics for analysis of corpora.
Stefan Th. Gries is Professor of Linguistics in the Department of Linguistics at the University of California, Santa Barbara (UCSB), Honorary Liebig-Professor of the Justus-Liebig-Universität Giessen, and since 1 April 2018 also Chair of English Linguistics in the Department of English at the Justus-Liebig-Universität Giessen.
The International Corpus of English (ICE) is a set of text corpora representing varieties of English from around the world. Over twenty countries or groups of countries where English is the first language or an official second language are included.
Educating Eve: The 'Language Instinct' Debate is a book by Geoffrey Sampson, providing arguments against Noam Chomsky's theory of a human instinct for (first) language acquisition. Sampson explains the original title of the book as a deliberate allusion to Educating Rita (1980), and uses the plot of that play to illustrate his argument. Sampson's book is a response to Steven Pinker's The Language Instinct specifically and Chomskyan linguistic nativism broadly.
The Constituent Likelihood Automatic Word-tagging System (CLAWS) is a program that performs part-of-speech tagging. It was developed in the 1980s at Lancaster University by the University Centre for Computer Corpus Research on Language. It has an overall accuracy rate of 96–97% with the latest version (CLAWS4) tagging around 100 million words of the British National Corpus.
The Quranic Arabic Corpus is an annotated linguistic resource consisting of 77,430 words of Quranic Arabic. The project aims to provide morphological and syntactic annotations for researchers wanting to study the language of the Quran.
The knowledge acquisition bottleneck is perhaps the major impediment to solving the word-sense disambiguation (WSD) problem. Unsupervised learning methods rely on knowledge about word senses, which is barely formulated in dictionaries and lexical databases. Supervised learning methods depend heavily on the existence of manually annotated examples for every word sense, a requirement that has so far been met only for a handful of words for testing purposes, as is done in the Senseval exercises.
Mark E. Davies is an American linguist. He specializes in corpus linguistics and language variation and change. He is the creator of most of the text corpora from English-Corpora.org as well as the Corpus del español and the Corpus do português. He has also created large datasets of word frequency, collocates, and n-grams data, which have been used by many large companies in the fields of technology and also language learning.
The Spoken English Corpus (SEC) is a speech corpus collection of recordings of spoken British English compiled during 1984–1987. The corpus manual can be found on ICAME.
Longman Grammar of Spoken and Written English (LGSWE) is a descriptive grammar of English written by Douglas Biber, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan, first published by Longman in 1999. It is an authoritative description of modern English, a successor to A Comprehensive Grammar of the English Language (ComGEL) published in 1985 and a predecessor of the Cambridge Grammar of the English Language (CamGEL) published in 2002. The authors and some reviewers consider it a complement to ComGEL rather than a replacement for it, since it follows, with few exceptions, ComGEL's grammatical framework and concepts. One of LGSWE's authors, Geoffrey Leech, is also a co-author of ComGEL.
Adam Kilgarriff was a corpus linguist, lexicographer, and co-author of Sketch Engine.
Paul Baker is a British professor and linguist at the Department of Linguistics and English Language of Lancaster University, United Kingdom. His research focuses on corpus linguistics, critical discourse analysis, corpus-assisted discourse studies, and language and identity. He is known for his research on the language of Polari. He is a Fellow of the Academy of Social Sciences and a Fellow of the Royal Society of Arts.