Geoffrey Sampson

Born	1944; Broxbourne, Hertfordshire
Nationality	British
Known for	The 'Language Instinct' Debate
Scientific career
Fields	Linguistics, Computing, Economics
Institutions	London School of Economics; University of Lancaster; University of Leeds; University of Sussex

Last updated March 31, 2024

Geoffrey Sampson (born 1944) is Professor of Natural Language Computing in the Department of Informatics, University of Sussex.^[1] He produces annotation standards for compiling corpora (databases) of ordinary usage of the English language.^[1] His work has been applied in automatic language-understanding software, and in writing-skills training.^[1] He has also analysed Ronald Coase's "theory of the firm" and the economic and political implications of e-business.^[1]

Career

Sampson is a Fellow of the Royal Society of Arts, the British Computer Society and the Higher Education Academy.^[2] He is also a Chartered Information Technology Professional.^[2] He holds three MA degrees, one each from Cambridge, Yale and Oxford.^[2] After graduating from St John's College, Cambridge, he went on to Yale, where he conducted research in the Linguistics and Engineering & Applied Science departments.^[2] Cambridge awarded him a doctorate under its special regulations,^[2] which require that the candidate's published work be deemed to comprise "a significant contribution to scholarship".^[3]

His academic career has included work in Asian languages, linguistics and computing, with side interests in philosophy, and political and economic thought. He lectured at the London School of Economics, the University of Lancaster and the University of Leeds before moving to Sussex in 1991.^[2]

Sampson is widely known for academic papers criticising the linguistic nativist movement, including the arguments of proponents such as Noam Chomsky, Jerry Fodor and Steven Pinker. He engaged critically with Pinker's 1994 book The Language Instinct in his own book The 'Language Instinct' Debate, the first edition of which, published in 1997, was entitled Educating Eve.

Political activities

Sampson is politically active. He was elected to Wealden District Council in 2001 and served until 2002 with the local Conservative Party branch. He resigned this position after he was criticised by Labour Party and Liberal Democrat ministers and councillors for publishing on his website an article, "There's Nothing Wrong With Racism (Except the Name)", containing a number of racist claims. The outcome was subsequently endorsed by Conservative Central Office as "in the best interests of all concerned ... the Conservative party is opposed to all forms of racial discrimination".^[4] Some time later he left the Conservative Party and in 2006 joined the United Kingdom Independence Party.^[5]

Selection of publications

Monographs

The Form of Language (Weidenfeld & Nicolson, 1975)
Liberty and Language (Oxford, 1979)
Making Sense (Oxford, 1980)
Schools of Linguistics: Competition and Evolution (Hutchinson, 1980)
Writing Systems (Anchor Brenton Ltd., 1985)
Educating Eve: The 'Language Instinct' Debate (Continuum, 1997)
Empirical Linguistics (Continuum, 2001)

Essays

"From central embedding to corpus linguistics" in Using Corpora for Language Research (Longman, 1996)

Articles

"What was transformational grammar?" Lingua 48 (1979): 355–78.
"Popperian language-acquisition undefeated". British Journal for the Philosophy of Science 31 (1980): 63–67.
"How fully does a machine-usable dictionary cover English text?". Digital Scholarship in the Humanities 4 (1989): 29–35. doi:10.1093/LLC/4.1.29. ISSN 0268-1145.
"Depth in English grammar". Journal of Linguistics 33 (1997): 131–51.
"Grammatical depth: a rejoinder". Computational Linguistics 25 (1999): xx–xx.
"Briefly noted – English for the computer: the SUSANNE corpus and analytic scheme". Computational Linguistics 28 (2002): xx–xx.
"Word frequency distributions". Computational Linguistics 28 (2002): xx–xx.
"The myth of diminishing firms". Communications of the ACM 46 (2003): xx–xx.
"A test of the leaf-ancestor metric for parse accuracy". Natural Language Engineering 9 (2003): xx–xx. [with Anna Babarczy]
"Definitional, personal, and mechanical constraints on part of speech annotation performance". Natural Language Engineering 12 (2006): xx–xx. [with Anna Babarczy and John Carroll (not John M Carroll)]

Reviews

Steven Pinker, Words and Rules for Times Higher Education Supplement 12 May (2000): 22–23.

References

[1] Geoffrey Sampson, University of Sussex staff bio page.
[2] Geoffrey Sampson, personal website.
[3] PhD by Special Regulations, Board of Graduate Studies, Cambridge University.
[4] "Tory councillor forced to step down after racism row", Staff and agencies, The Guardian, 14 May 2002.
[5] Life, official website.

External links

Geoffrey Sampson — staff bio page @ official University of Sussex website
Geoffrey Sampson — personal website



