Karen Spärck Jones | |
|---|---|
| Spärck Jones in 2002 | |
| Born | Karen Ida Boalth Spärck Jones 26 August 1935 Huddersfield, Yorkshire, England |
| Died | 4 April 2007 (aged 71) [1] Willingham, Cambridgeshire, England |
| Alma mater | University of Cambridge |
| Known for | Term frequency–inverse document frequency |
| Spouse | |
| Awards | |
| Scientific career | |
| Fields | |
| Institutions | University of Cambridge |
| Thesis | Synonymy and Semantic Classification (1964) |
| Doctoral advisor | Richard Braithwaite [1] |
| Website | cl |
Karen Ida Boalth Spärck Jones FBA (26 August 1935 – 4 April 2007) was a self-taught programmer and a pioneering British computer and information scientist responsible for the concept of inverse document frequency (IDF), a technique that underlies most modern search engines. [2] [3] [4] [5] [6] [7] She was an advocate for women in computer science, her slogan being, "Computing is too important to be left to men." [8] In 2019, The New York Times published her belated obituary in its series Overlooked, [9] [10] calling her "a pioneer of computer science for work combining statistics and linguistics, and an advocate for women in the field." [10] Since 2008, to recognise her achievements in the fields of information retrieval [11] [12] (IR) and natural language processing (NLP), the Karen Spärck Jones Award has been given annually for outstanding research in one or both of her fields. [13] [14] [15] [16]
Karen Ida Boalth Spärck Jones was born in Huddersfield, Yorkshire, England. Her parents were Alfred Owen Jones, a chemistry lecturer, and Ida Spärck, a Norwegian who worked for the Norwegian government while in exile in London during World War II. [17]
Spärck Jones was educated at a grammar school in Huddersfield and then from 1953 to 1956 at Girton College, Cambridge, studying history, with an additional final year in Moral Sciences (philosophy). While at Cambridge, Spärck Jones joined the Cambridge Language Research Unit (CLRU) and met its head, Margaret Masterman, who would inspire her to go into computer science. [10] While working at the CLRU, Spärck Jones began pursuing her PhD. At the time of submission, her PhD thesis was cast aside as uninspired and lacking original thought, but was later published in its entirety as a book. [18] She briefly became a school teacher [19] before moving into computer science. [20] Spärck Jones married fellow Cambridge computer scientist Roger Needham in 1958. [21] [10]
Spärck Jones's mother, Ida Spärck, had fled Norway on one of the last boats out after the German invasion in April 1940, going on to serve the Norwegian government in exile in London throughout the war. [21] This background of displacement and resilience shaped the household in which Spärck Jones grew up. She later kept her mother's Norwegian surname professionally after marrying, stating that "it maintains a permanent existence of your own." [10]
Spärck Jones described her entry into computing as almost accidental. She had been working as a schoolteacher when she began visiting the CLRU out of curiosity about her husband's work. It was Margaret Masterman — whom she later described as "a very strange and interesting woman" — who offered her a research position and drew her fully into the field. [10]
Spärck Jones worked at the Cambridge Language Research Unit from the late 1950s, [21] then at Cambridge University Computer Laboratory from 1974 until her retirement in 2002.
From 1999, she held the post of Professor of Computers and Information. [1] She had been given a permanent position only in 1993, and earlier in her career had been employed on a series of short-term contracts. [10] She continued to work in the Computer Laboratory until shortly before her death. Her publications include nine books and numerous papers. A full list of her publications is available from the Cambridge Computer Laboratory. [22]
Spärck Jones's main research interests, since the late 1950s, were natural language processing and information retrieval. In 1964, Spärck Jones published "Synonymy and Semantic Classification", which is now seen as a foundational work in the field of natural language processing. One of her most important contributions was the concept of inverse document frequency (IDF) weighting in information retrieval, which she introduced in a 1972 paper. [11] [23] IDF is used in most search engines today, usually as part of the term frequency–inverse document frequency (TF–IDF) weighting scheme. [24] In the 1980s, Spärck Jones began her work on early speech recognition systems. In 1982 she became involved in the Alvey Programme, [10] an initiative to encourage more computer science research across the country.
At the time Spärck Jones was working, most computer scientists were focused on making people adapt to machines — learning precise codes and commands to retrieve information. Spärck Jones was working in the opposite direction: teaching computers to understand human language as it is actually used. [10]
Her 1972 paper introduced the concept of inverse document frequency (IDF) by observing that not all words carry equal informational value. A word like "the" appears in virtually every document and tells a retrieval system almost nothing about what any specific document is about. A rare word like "photosynthesis," by contrast, is highly specific and informative. IDF assigns each word a statistical weight based on how rarely it occurs across a document collection — the rarer the word, the higher its weight. [11] When combined with term frequency (TF), which measures how often a word appears within a single document, the resulting TF–IDF score gives every word a relevance rating that can be used to rank documents in response to a search query. [25]
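The weighting described above can be illustrated with a minimal Python sketch. It assumes a common textbook formulation, IDF = log(N / n_t), where N is the number of documents and n_t the number of documents containing the term; this is not necessarily the exact formula of the 1972 paper. The function names (`idf`, `tf_idf`, `rank`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def idf(term, docs):
    """Inverse document frequency, assuming the form log(N / n_t)."""
    n_docs = len(docs)
    n_containing = sum(1 for doc in docs if term in doc)
    if n_containing == 0:
        return 0.0  # term never occurs; give it no weight
    return math.log(n_docs / n_containing)


def tf_idf(term, doc, docs):
    """Raw count of the term in one document, scaled by its IDF."""
    tf = Counter(doc)[term]
    return tf * idf(term, docs)


def rank(query, docs):
    """Return document indices ordered from highest to lowest total score."""
    scores = [sum(tf_idf(t, doc, docs) for t in query) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy collection: each document is a list of lowercase tokens.
docs = [
    "the plant uses photosynthesis to make sugar".split(),
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
]

# "the" occurs in every document, so its weight is log(3 / 3) = 0.0.
print(idf("the", docs))
# "photosynthesis" occurs in one document of three, so its weight is log(3).
print(idf("photosynthesis", docs))
# The query ranks the first document (index 0) highest.
print(rank(["the", "photosynthesis"], docs))
```

In this sketch a word that appears everywhere contributes nothing to the score, so the ranking is driven entirely by the rarer, more specific term. Practical systems usually add smoothing and length normalisation, which are omitted here.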
By 2007, Spärck Jones noted that "pretty much every web engine uses those principles." [10] Her colleague John Tait remarked that "a lot of the stuff she was working on until five or ten years ago seemed like mad nonsense, and now we take it for granted." [10] The 1972 paper remains among the most cited works in information retrieval research, with over 4,500 citations recorded in Google Scholar at the time of her death. [26]
The conceptual foundation of TF–IDF — that word meaning is statistical and contextual — has also informed later developments in machine learning and natural language processing, including transformer-based language models such as BERT. [27]
Spärck Jones spent the majority of her career at Cambridge on short-term contracts without permanent employment, a situation she attributed directly to gender. In her 2001 IEEE oral history interview she stated that Cambridge was "in many ways not user-friendly, in the sense of women-friendly." [19] She was frequently the only woman present in professional meetings throughout her career. [28]
She channelled this experience into active advocacy. She was a founding member of the women@cl network at Cambridge's Computer Laboratory, worked on outreach programmes aimed at encouraging girls into computing, and became widely known for her slogan: "Computing is too important to be left to men." [29] She was the first woman ever to receive the BCS Lovelace Medal. [30]
Spärck Jones died of cancer on 4 April 2007, at the age of 71. [10]
In 2008, the BCS Information Retrieval Specialist Group (BCS IRSG) in conjunction with the British Computer Society established an annual Karen Spärck Jones Award in her honour, to encourage and promote research that advances understanding of Natural Language Processing or Information Retrieval. [6] The Karen Spärck Jones lecture sponsored by BCS recognises the contribution that women have made to computing. [36]
In August 2017, the University of Huddersfield renamed one of its campus buildings in her honour. Formerly known as Canalside West, the Spärck Jones building houses the University's School of Computing and Engineering. [37]
When Spärck Jones died in 2007, The Times did not publish an obituary for her, despite having published one for her husband Roger Needham in 2003. [10] In 2019, The New York Times included her in its Overlooked series under the title "Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines." [10]
In 2024, the University of Cambridge and the UK Government's Department for Science, Innovation and Technology jointly launched the Spärck AI Scholarships to support the next generation of AI researchers, named in her honour. [38]