Martin Porter

Martin F. Porter is the inventor of the Porter Stemmer, [1] one of the most common algorithms for stemming English, [2] [3] and of the Snowball programming framework. His 1980 paper "An algorithm for suffix stripping", which proposed the stemming algorithm, had been cited over 8,000 times according to Google Scholar as of February 2012. [4]

The Muscat search engine grew out of research performed by Porter at the University of Cambridge and was commercialized in 1984 by Cambridge CD Publishing; it was subsequently sold to MAID, which became the Dialog Corporation. [5] Part of Dialog was spun off as BrightStation in 2000, [6] [7] which moved Open Muscat to a closed-source development model in 2001. [8] A group of developers led by Porter [9] then started Xapian, a project based on Open Muscat, and released the first official version on September 30, 2002. [10]

In 2000 he was awarded the Tony Kent Strix Award. [11]

Porter read mathematics at St John's College, Cambridge (1963–66) and went on to take a Diploma in Computer Science (1967) and a PhD at the Cambridge Computer Laboratory. He worked at the University of Leeds for a year before returning to Cambridge, where he worked at the Literary and Linguistic Computing Centre (1971–1974) and then at the Sedgwick Museum as a programmer (1974–1976). In 1977, he became the Director of the Museum Documentation Advisory Unit (MDA). [12]

Martin Porter is co-founder, with John Snyder, of the contextual targeting and content recommendation company Grapeshot. [13] Snyder is listed as CEO and Porter as Chief Scientist. Grapeshot took £250,000 in UK government subsidies and subsequently raised £16m from UK investors. [14] On May 15, 2018, Oracle Corporation completed its acquisition of Grapeshot.

Related Research Articles

Data structure: Particular way of storing and organizing data in a computer

In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data, i.e., it is an algebraic structure about data.

Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

Information science: Academic field concerned with collection and analysis of information

Information science is an academic field which is primarily concerned with analysis, collection, classification, manipulation, storage, retrieval, movement, dissemination, and protection of information. Practitioners within and outside the field study the application and the usage of knowledge in organizations in addition to the interaction between people, organizations, and any existing information systems with the aim of creating, replacing, improving, or understanding the information systems.

Westlaw: Online legal research service

Westlaw is an online legal research service and proprietary database for lawyers and legal professionals available in over 60 countries. Information resources on Westlaw include more than 40,000 databases of case law, state and federal statutes, administrative codes, newspaper and magazine articles, public records, law journals, law reviews, treatises, legal forms and other information resources.

The UKeiG Strix Award is an annual award for outstanding contributions to the field of information retrieval and is presented in memory of Dr Tony Kent, a past Fellow of the Institute of Information Scientists (IIS), who died in 1997. Tony Kent made a major contribution to the development of information science and information services both in the UK and internationally, particularly in the field of chemistry. The name 'Strix' was chosen to reflect Tony's interest in ornithology, and as the name of the last and most successful of the information retrieval packages that he created.

Lemmatization in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.

Stop words are the words in a stop list which are filtered out before or after processing of natural language data (text) because they are insignificant. There is no single universal list of stop words used by all natural language processing tools, nor any agreed upon rules for identifying stop words, and indeed not all tools even use such a list. Therefore, any group of words can be chosen as the stop words for a given purpose. The "general trend in [information retrieval] systems over time has been from standard use of quite large stop lists to very small stop lists to no stop list whatsoever".

In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases.

Microsoft Research (MSR) is the research subsidiary of Microsoft. It was created in 1991 by Richard Rashid, Bill Gates and Nathan Myhrvold with the intent to advance state-of-the-art computing and solve difficult world problems through technological innovation in collaboration with academic, government, and industry researchers. The Microsoft Research team has more than 1,000 computer scientists, physicists, engineers, and mathematicians, including Turing Award winners, Fields Medal winners, MacArthur Fellows, and Dijkstra Prize winners.

Susan Dumais: American computer scientist

Susan Dumais is an American computer scientist who is a leader in the field of information retrieval, and has been a significant contributor to Microsoft's search technologies. According to Mary Jane Irwin, who heads the Athena Lecture awards committee, "Her sustained contributions have shaped the thinking and direction of human-computer interaction and information retrieval."

Xapian is a free and open-source probabilistic information retrieval library, released under the GNU General Public License (GPL). It is a full-text search engine library for programmers.

Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.

Stephen Robertson is a British computer scientist. He is known for his work on probabilistic information retrieval together with Karen Spärck Jones and the Okapi BM25 weighting model.

W. Bruce Croft is a distinguished professor of computer science at the University of Massachusetts Amherst whose work focuses on information retrieval. He is the founder of the Center for Intelligent Information Retrieval and served as the editor-in-chief of ACM Transactions on Information Systems from 1995 to 2002. He was also a member of the National Research Council Computer Science and Telecommunications Board from 2000 to 2003. He was Chair of the UMass Amherst Computer Science Department from 2001 to 2007 and has been the Dean of the College of Information and Computer Sciences at the University of Massachusetts Amherst since 2015.

Learning to rank: Use of machine learning to rank items

Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.

Image editing: Processes of altering images, digital or traditional photos, adding, pasting, cutting words

Image editing encompasses the processes of altering images, whether they are digital photographs, traditional photo-chemical photographs, or illustrations. Traditional analog image editing is known as photo retouching, using tools such as an airbrush to modify photographs or editing illustrations with any traditional art medium. Graphic software programs, which can be broadly grouped into vector graphics editors, raster graphics editors, and 3D modelers, are the primary tools with which a user may manipulate, enhance, and transform images. Many image editing programs are also used to render or create computer art from scratch. The term "image editing" usually refers only to the editing of 2D images, not 3D ones.

Stemming: Process of reducing words to word stems

In linguistic morphology and information retrieval, stemming is the process of reducing inflected words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Algorithms for stemming have been studied in computer science since the 1960s. Many search engines treat words with the same stem as synonyms as a kind of query expansion, a process called conflation.
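The idea that related words only need to map to the same stem can be illustrated with a minimal Python sketch. This is a toy longest-suffix stripper written for illustration; it is not the Porter or Lovins algorithm, and the suffix list, the `toy_stem` and `conflate` names, and the minimum stem length are all assumptions made for the example.

```python
# Toy suffix stripper for illustration only. It is NOT the Porter or
# Lovins algorithm; the suffix list below is a hypothetical, tiny sample.
SUFFIXES = ("ations", "ation", "ings", "ions", "ing", "ion", "ed", "es", "s")


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    # Try longer suffixes first so that "ions" wins over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group words that share a toy stem, as a search engine might when expanding a query."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    print(conflate(["connect", "connected", "connecting", "connections", "connection"]))
    # The stem does not have to be a real word or a valid root.
    print(toy_stem("running"))
```

In this sketch all five forms of "connect" land in one group, while "running" is reduced to "runn", which is not a word but would still serve as a shared key for any other form that reduces to it. Real stemmers apply many more rules and conditions than this.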

Julie Beth Lovins was a computational linguist who published the Lovins stemming algorithm, a type of stemming algorithm for word matching, in 1968.

"The Lovins Stemmer is a single pass, context sensitive stemmer, which removes endings based on the longest-match principle. The stemmer was the first to be published and was extremely well developed considering the date of its release, having been the main influence on a large amount of the future work in the area." (Adam G., et al.)

Roger K. Summit is the founder of Dialog Information Services, and has been called the father of modern online search. He worked for Lockheed in the 1960s, was put in charge of its information retrieval lab, and from his work created a system that became known as Dialog and was spun off by Lockheed in the 1970s. Dialog is one of the leading professional online services, used by companies, law firms, governments and others as a key online research tool. Many feel that Dialog led the way to the Web's search engines and search today.

Maristella Agosti: Italian university professor

Maristella Agosti is an Italian researcher and professor. Her research covers retrieval, user engagement, databases, digital cultural heritage, and data engineering. She has published more than 200 papers covering these areas. She is also Professor in Computer Science at the University of Padua. She was granted the title of Professor Emeritus by Decree of the Italian Ministry of Education, University and Research. She is also a recipient of the Tony Kent Strix Award.

References

  1. Porter Stemming Algorithm
  2. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (2008). Introduction to Information Retrieval. Cambridge University Press.
  3. Daniel Jurafsky and James H. Martin (2009). Speech and Language Processing. Pearson, p. 102.
  4. Articles at Google Scholar, accessed 2012-02-09.
  5. Avi Rappoport, Search Tools Consulting. "Smartlogik Discover (APR) - SearchTools Report". Searchtools.com. Retrieved 2012-02-09.
  6. Rob Buckley (March 2001). "The Bayesian haze". infoconomy. Retrieved 2022-04-10.
  7. Paul Farrelly (2000-09-23). "Bright at the end of the tunnel". The Guardian. Retrieved 2022-04-10.
  8. "The Xapian Project: History". Retrieved 2022-04-10.
  9. Porter, Martin (March 30, 2006). "Lovins Revisited". In Tait, John (ed.). Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones. Amsterdam: Kluwer: Springer Science & Business Media. p. 61. ISBN 9781402034671.
  10. "Xapian Core NEWS". Retrieved 2022-04-10.
  11. UKeiG Tony Kent Strix Award. Archived 2014-09-25 at the Wayback Machine (Accessed Feb 2012)
  12. Museum, Vol XXX, n° 3/4, 1978, Museums and Computers p.224
  13. Grapeshot (Accessed Oct 2012)
  14. Parliamentary Review 2018 - Grapeshot