Karen Spärck Jones

Karen Spärck Jones
FBA
Spärck Jones in 2002
Born Karen Ida Boalth Spärck Jones
26 August 1935
Huddersfield, Yorkshire, England
Died 4 April 2007 (aged 71) [1]
Alma mater University of Cambridge
Known for Term frequency–inverse document frequency
Spouse Roger Needham (m. 1958; died 2003)
Awards
Scientific career
Fields
Institutions University of Cambridge
Thesis Synonymy and Semantic Classification  (1964)
Doctoral advisor Richard Braithwaite [1]
Website cl.cam.ac.uk/archive/ksj21/

Karen Ida Boalth Spärck Jones FBA (26 August 1935 – 4 April 2007) was a self-taught programmer and a pioneering British computer and information scientist responsible for the concept of inverse document frequency (IDF), a technique that underlies most modern search engines. [2] [3] [4] [5] [6] [7] She was an advocate for women in computer science, her slogan being "Computing is too important to be left to men." [8] In 2019, The New York Times published her belated obituary in its series Overlooked, [9] [10] calling her "a pioneer of computer science for work combining statistics and linguistics, and an advocate for women in the field." [10] Since 2008, the Karen Spärck Jones Award has been given annually for outstanding research in one or both of her fields, information retrieval (IR) [11] [12] and natural language processing (NLP). [13] [14] [15] [16]

Early life and education

Karen Ida Boalth Spärck Jones was born in Huddersfield, Yorkshire, England. Her parents were Alfred Owen Jones, a chemistry lecturer, and Ida Spärck, a Norwegian who worked for the Norwegian government while in exile in London during World War II. [17]

Spärck Jones was educated at a grammar school in Huddersfield and then, from 1953 to 1956, at Girton College, Cambridge, where she studied history, with an additional final year in Moral Sciences (philosophy). While at Cambridge, she joined the Cambridge Language Research Unit (CLRU) and met its head, Margaret Masterman, who would inspire her to go into computer science. [10] While working at the CLRU, Spärck Jones began pursuing her PhD. At the time of submission, her thesis was cast aside as uninspired and lacking original thought, but it was later published in its entirety as a book. [18] She briefly became a school teacher [19] before moving into computer science. [20] Spärck Jones married fellow Cambridge computer scientist Roger Needham in 1958. [21] [10]

Spärck Jones's mother, Ida Spärck, had fled Norway on one of the last boats out after the German invasion in April 1940, going on to serve the Norwegian government in exile in London throughout the war. [21] This background of displacement and resilience shaped the household in which Spärck Jones grew up. She later kept her mother's Norwegian surname professionally after marrying, stating that "it maintains a permanent existence of your own." [10]

Spärck Jones described her entry into computing as almost accidental. She had been working as a schoolteacher when she began visiting the CLRU out of curiosity about her husband's work. It was Margaret Masterman — whom she later described as "a very strange and interesting woman" — who offered her a research position and drew her fully into the field. [10]

Career

Spärck Jones worked at the Cambridge Language Research Unit from the late 1950s, [21] then at Cambridge University Computer Laboratory from 1974 until her retirement in 2002.

From 1999, she held the post of Professor of Computers and Information. [1] She had been given a permanent position only in 1993, and earlier in her career had been employed on a series of short-term contracts. [10] She continued to work in the Computer Laboratory until shortly before her death. Her publications include nine books and numerous papers. A full list of her publications is available from the Cambridge Computer Laboratory. [22]

Spärck Jones's main research interests, from the late 1950s onward, were natural language processing and information retrieval. In 1964, she completed her thesis, "Synonymy and Semantic Classification", which is now seen as a foundational work in the field of natural language processing. One of her most important contributions was the concept of inverse document frequency (IDF) weighting in information retrieval, which she introduced in a 1972 paper. [11] [23] IDF is used in most search engines today, usually as part of the term frequency–inverse document frequency (TF–IDF) weighting scheme. [24] In the 1980s, Spärck Jones began work on early speech recognition systems. In 1982 she became involved in the Alvey Programme, [10] an initiative to stimulate more computer science research across the country.

Significance of inverse document frequency

At the time Spärck Jones was working, most computer scientists were focused on making people adapt to machines — learning precise codes and commands to retrieve information. Spärck Jones was working in the opposite direction: teaching computers to understand human language as it is actually used. [10]

Her 1972 paper introduced the concept of inverse document frequency (IDF) by observing that not all words carry equal informational value. A word like "the" appears in virtually every document and tells a retrieval system almost nothing about what any specific document is about. A rare word like "photosynthesis," by contrast, is highly specific and informative. IDF assigns each word a statistical weight based on how rarely it occurs across a document collection — the rarer the word, the higher its weight. [11] When combined with term frequency (TF), which measures how often a word appears within a single document, the resulting TF–IDF score gives every word a relevance rating that can be used to rank documents in response to a search query. [25]
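The weighting described above can be illustrated with a minimal Python sketch. It assumes one common textbook variant, idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t; this is not necessarily the exact formulation of the 1972 paper, and production search engines typically add smoothing and length normalisation. The function names and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems do much more).
    return text.lower().split()


def inverse_document_frequency(docs):
    # idf(t) = log(N / df(t)): the rarer a term is across the collection,
    # the higher its weight. A term found in every document gets weight 0.
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(doc, idf):
    # Term frequency (raw count within one document) multiplied by IDF.
    counts = Counter(tokenize(doc))
    return {term: count * idf.get(term, 0.0) for term, count in counts.items()}


def rank(query, docs):
    # Score each document by summing the TF-IDF weights of the query terms,
    # then return document indices from best to worst match.
    idf = inverse_document_frequency(docs)
    query_terms = tokenize(query)
    scores = []
    for index, doc in enumerate(docs):
        weights = tf_idf(doc, idf)
        scores.append((sum(weights.get(t, 0.0) for t in query_terms), index))
    return [index for _, index in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical toy collection.
docs = [
    "the plant uses photosynthesis to make the sugar",
    "the cat sat on the mat",
    "the dog chased the cat",
]
idf = inverse_document_frequency(docs)
```

In this toy collection "the" occurs in every document and so receives a weight of zero, while "photosynthesis" occurs in only one and receives the highest weight; a query containing both words is therefore ranked almost entirely on the rarer term.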

By 2007, Spärck Jones noted that "pretty much every web engine uses those principles." [10] Her colleague John Tait remarked that "a lot of the stuff she was working on until five or ten years ago seemed like mad nonsense, and now we take it for granted." [10] The 1972 paper remains among the most cited works in information retrieval research, with over 4,500 citations recorded in Google Scholar at the time of her death. [26]

The conceptual foundation of TF–IDF — that word meaning is statistical and contextual — has also informed later developments in machine learning and natural language processing, including transformer-based language models such as BERT. [27]

Gender and advocacy

Spärck Jones spent the majority of her career at Cambridge on short-term contracts without permanent employment, a situation she attributed directly to gender. In her 2001 IEEE oral history interview she stated that Cambridge was "in many ways not user-friendly, in the sense of women-friendly." [19] She was frequently the only woman present in professional meetings throughout her career. [28]

She channelled this experience into active advocacy. She was a founding member of the women@cl network at Cambridge's Computer Laboratory, worked on outreach programmes aimed at encouraging girls into computing, and became widely known for her slogan: "Computing is too important to be left to men." [29] She was the first woman ever to receive the BCS Lovelace Medal. [30]

Honours and awards

These include:

Death and legacy

Spärck Jones died of cancer on 4 April 2007, at the age of 71. [10]

In 2008, the BCS Information Retrieval Specialist Group (BCS IRSG) in conjunction with the British Computer Society established an annual Karen Spärck Jones Award in her honour, to encourage and promote research that advances understanding of Natural Language Processing or Information Retrieval. [6] The Karen Spärck Jones lecture sponsored by BCS recognises the contribution that women have made to computing. [36]

In August 2017, the University of Huddersfield renamed one of its campus buildings in her honour. Formerly known as Canalside West, the Spärck Jones building houses the University's School of Computing and Engineering. [37]

When Spärck Jones died in 2007, The New York Times did not publish an obituary for her, despite having published one for her husband Roger Needham in 2003. [10] In 2019, the newspaper included her in its Overlooked series under the title "Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines." [10]

In 2024, the University of Cambridge and the UK Government's Department for Science, Innovation and Technology jointly launched the Spärck AI Scholarships to support the next generation of AI researchers, named in her honour. [38]

References

  1. "Jones, Karen Ida Boalth Spärck (1935–2007), Computer Scientist". Oxford Dictionary of National Biography (online ed.). Oxford University Press. doi:10.1093/ref:odnb/98729. (Subscription, Wikipedia Library access or UK public library membership required.)
  2. Video: Natural Language and the Information Layer, Karen Spärck Jones, March 2007
  3. University of Cambridge obituary
  4. Obituary, The Independent, 12 April 2007
  5. Robertson, S.; Tait, J. (2008). "Karen Spärck Jones". Journal of the American Society for Information Science and Technology. 59 (5): 852. doi:10.1002/asi.20784.
  6. "Karen Spärck Jones Award | BCS". www.bcs.org. Retrieved 21 January 2023.
  7. Sparck Jones, Karen (31 January 2005). "2002 ASIST Award of Merit: Karen Sparck Jones". Bulletin of the American Society for Information Science and Technology. 29 (3): 12–14. doi:10.1002/bult.274. Retrieved 12 March 2025.
  8. "Computing's too important to be left to men | BCS".
  9. Padnani, Amisha; Bennett, Jessica (8 March 2018). "Remarkable People We Overlooked in Our Obituaries". The New York Times. ISSN 0362-4331. Retrieved 7 December 2019.
  10. "Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines". The New York Times. 2 January 2019. ISSN 0362-4331. Retrieved 7 December 2019.
  11. Spärck Jones, K. (1972). "A Statistical Interpretation of Term Specificity and Its Application in Retrieval". Journal of Documentation. 28: 11–21. CiteSeerX 10.1.1.115.8343. doi:10.1108/eb026526. S2CID 2996187.
  12. Tait, John I., ed. (2005). Charting a New Course: Natural Language Processing and Information Retrieval, Essays in Honour of Karen Spärck Jones. The Kluwer International Series on Information Retrieval. Vol. 16. doi:10.1007/1-4020-3467-9. ISBN   978-1-4020-3343-8.
  13. Obituary [dead link], The Times, 22 June 2007 (subscription required)
  14. Computer Science, A Woman's Work, IEEE Spectrum, May 2007
  15. Thompson, Bill. "Karen Spärck Jones". A Stick a Dog and a Box With Something in It. Retrieved 1 August 2019. (originally published in The Times)
  16. Tait, J. I. (2007). "Karen Spärck Jones". Computational Linguistics. 33 (3): 289–291. doi:10.1162/coli.2007.33.3.289. S2CID 19790552.
  17. Pulman, S. G. (2011). "Karen Ida Boalth Spärck Jones 1935–2007" (PDF). Proceedings of the British Academy. 166 (IX).
  18. Robertson, S.; Tait, J. (2008). "Karen Spärck Jones". Journal of the American Society for Information Science and Technology. 59 (5): 852–854.
  19. Spärck Jones, Karen (10 April 2001). "Karen Spärck Jones, an oral history conducted in 2001". IEEE History Center (Interview). Interviewed by Janet Abbate. Piscataway, NJ.
  20. Karen Spärck Jones (1986). Synonymy and Semantic Classification (thesis published as a book). Edinburgh Information Technology series. Vol. 1. Edinburgh University Press. ISBN   9780852245170.
  21. Anon (2007). "Karen Spärck Jones, FBA Professor Emerita of Computers and Information Honorary Fellow of Wolfson College 26 August 1935 – 4 April 2007". University of Cambridge.
  22. "Karen Sparck Jones Publications".
  23. Spärck Jones, K. (1973). "Index term weighting". Information Storage and Retrieval. 9 (11): 619–633. doi:10.1016/0020-0271(73)90043-0.
  24. Maybury, M. T. (2005). "Karen Spärck Jones and Summarization". Charting a New Course: Natural Language Processing and Information Retrieval. The Kluwer International Series on Information Retrieval. Vol. 16. pp. 99–10. doi:10.1007/1-4020-3467-9_7. ISBN   978-1-4020-3343-8.
  25. "Women Who Changed Tech – Karen Spärck Jones". Extreme Networks. 25 May 2023. Retrieved 7 April 2026.
  26. Willett, P.; Robertson, S. (2007). "In memoriam: Karen Spärck Jones". Journal of Documentation. 63 (5). doi:10.1108/jd.2007.27863eaa.001.
  27. "Karen Spärck Jones: The Search Engineer Enabler". History of Data Science. 3 December 2021. Retrieved 7 April 2026.
  28. "Karen Spärck Jones: A pioneering Girtonian and interdisciplinary role model". Girton College, Cambridge. 9 June 2025. Retrieved 7 April 2026.
  29. "Computing's too important to be left to men". British Computer Society. Retrieved 7 April 2026.
  30. "Karen Spärck Jones: A pioneering Girtonian and interdisciplinary role model". Girton College, Cambridge. 9 June 2025. Retrieved 7 April 2026.
  31. "Gerard Salton Awards". Special Interest Group on Information Retrieval. Retrieved 2 April 2018.
  32. Anon (2022). "Elected AAAI Fellows". aaai.org.
  33. "Karen Spärck Jones". The Computer Laboratory, Cambridge University. March 2007. Retrieved 2 April 2018.
  34. "Karen Spärck Jones". The Daily Telegraph. 12 April 2007.
  35. "ACL Lifetime Achievement Award Recipients". ACL wiki. ACL. Retrieved 16 August 2014.
  36. "Karen Spärck Jones lecture". BCS Academy of Computing. British Computer Society. Retrieved 3 October 2013.
  37. "How to find us – University of Huddersfield". hud.ac.uk. Retrieved 20 September 2017.
  38. "Karen Spärck Jones: A pioneering Girtonian and interdisciplinary role model". Girton College, Cambridge. 9 June 2025. Retrieved 7 April 2026.