Geoffrey Sampson

Geoffrey Sampson
Born: 1944, Broxbourne, Hertfordshire
Nationality: British
Known for: The 'Language Instinct' Debate
Scientific career
Fields: Linguistics, Computing, Economics

Geoffrey Sampson (born 1944) is Professor of Natural Language Computing in the Department of Informatics, University of Sussex. [1] He produces annotation standards for compiling corpora (databases) of ordinary usage of the English language. [1] His work has been applied in automatic language-understanding software, and in writing-skills training. [1] He has also analysed Ronald Coase's "theory of the firm" and the economic and political implications of e-business. [1]

Career

Sampson is a Fellow of the Royal Society of Arts, the British Computer Society and the Higher Education Academy. [2] He is also a Chartered Information Technology Professional. [2] He holds three MA degrees, one each from Cambridge, Yale and Oxford. [2] After graduating from St. John's he went on to Yale, conducting research in the Linguistics and Engineering & Applied Science departments. [2] He was awarded a doctorate by Cambridge under the special regulations; [2] his published work was deemed to comprise "a significant contribution to scholarship". [3]

His academic career has included work in Asian languages, linguistics and computing, with side interests in philosophy, and political and economic thought. He lectured at the London School of Economics, the University of Lancaster and the University of Leeds before moving to Sussex in 1991. [2]

Sampson is widely known for academic papers criticising the linguistic nativist movement, including the arguments of proponents such as Noam Chomsky, Jerry Fodor and Steven Pinker. He critically engaged with Pinker's 1994 book The Language Instinct in his own book The 'Language Instinct' Debate, the first edition of which, published in 1997, was entitled Educating Eve.

Political activities

Sampson is politically active and was elected to Wealden District Council in 2001 as a member of the local Conservative Party branch, serving until 2002. He resigned after Labour Party and Liberal Democrat ministers and councillors criticised him for publishing on his website an article, There's Nothing Wrong With Racism (Except the Name), containing a number of racist claims. Conservative Central Office subsequently endorsed the outcome as "in the best interests of all concerned ... the Conservative party is opposed to all forms of racial discrimination". [4] Some time later he left the Conservative Party, and in 2006 he joined the United Kingdom Independence Party. [5]

References

  1. Geoffrey Sampson, University of Sussex staff bio page.
  2. Geoffrey Sampson, personal website.
  3. PhD by Special Regulations, Board of Graduate Studies, Cambridge University.
  4. "Tory councillor forced to step down after racism row", Staff and agencies, The Guardian, 14 May 2002.
  5. Life, official website.