Cranfield experiments

The Cranfield experiments were a series of experimental studies in information retrieval conducted by Cyril W. Cleverdon at the College of Aeronautics, today known as Cranfield University, in the 1960s to evaluate the efficiency of indexing systems. [1] [2] [3] The experiments were divided into two main phases, neither of which was computerized. The entire collection of abstracts, resulting indexes and results was later distributed in electronic format and was widely used for decades.

In the first series of experiments, several existing indexing methods were compared to test their efficiency. The queries were generated by the authors of the papers in the collection and then translated into index lookups by experts in those systems. In this series, one method went from least efficient to most efficient after minor changes were made to the way the data was arranged on the index cards. The conclusion appeared to be that the underlying methodology mattered less than specific details of the implementation. This led to considerable debate about the methodology of the experiments.

These criticisms also led to the second series of experiments, now known as Cranfield 2. Cranfield 2 attempted to gain additional insight by reversing the methodology: where Cranfield 1 tested the ability of experts to find a specific resource by following the indexing system, Cranfield 2 instead studied the results of asking human-language questions and seeing whether the indexing system returned a relevant answer, regardless of whether it was the original target document. It too was the topic of considerable debate.

The Cranfield experiments were extremely influential in the information retrieval field, itself a subject of considerable interest in the post-World War II era when the quantity of scientific research was exploding. They were the topic of continual debate for years and led to several computer projects to test their results. Their influence was considerable over a forty-year period, before natural language indexes like those of modern web search engines became commonplace.

Background

The now-famous July 1945 article "As We May Think" by Vannevar Bush is often pointed to as the first complete description of the field that became information retrieval. The article describes a hypothetical machine known as "memex" that would hold all of mankind's knowledge in an indexed form that would allow it to be retrieved by anyone. [4]

In 1948, the Royal Society held the Scientific Information Conference that first explored some of these concepts on a formal basis. This led to a small number of experiments in the field in the UK, US, and the Netherlands. The only major effort to compare different systems was led by Gull using the collection of works from the Armed Forces Technical Information Agency, which had started as a collection of aeronautics reports captured in Germany at the end of World War II. Judging of the results was carried out by experts in the two systems, and they never agreed on whether various retrieved documents were relevant to the search, with each group rejecting over 30% of the results as wrong. Further testing was cancelled as there appeared to be no consensus. [5]

A second conference on the topic, the International Conference on Scientific Information, was held in Washington, DC in 1958, by which time computer development had reached the point where automatic index retrieval was possible. It was at this meeting that Cyril W. Cleverdon "got the bit between his teeth" and managed to arrange for funding from the US National Science Foundation to start what would later be known as Cranfield 1. [6]

Cranfield 1

The first series of experiments directly compared four indexing systems that represented significantly different conceptual underpinnings. The four systems were:

  1. the Universal Decimal Classification, a hierarchical system being widely introduced in libraries,
  2. the Alphabetical Subject Catalogue which alphabetized subject headings in classic library index card collections,
  3. the Faceted Classification Scheme which allows combinations of subjects to produce new subjects,
  4. and Mortimer Taube's Uniterm system of co-ordinate indexing where a reference may be found on any number of separate index cards. [6]
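
The mechanics of Taube's co-ordinate indexing can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical illustration of the idea rather than a reconstruction of the original card system: each single-word "uniterm" acts as a card listing the documents it appears on, and a query is answered by intersecting the cards for its terms.

```python
# Minimal sketch of Uniterm-style co-ordinate indexing (hypothetical toy data).
# Each single-word "uniterm" acts like an index card listing document numbers;
# a query is answered by intersecting the cards for all of its terms.
from collections import defaultdict

def build_uniterm_index(documents):
    """Map each term to the set of document ids containing it."""
    cards = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            cards[term].add(doc_id)
    return cards

def coordinate_search(cards, query_terms):
    """Return document ids that appear on every card named in the query."""
    postings = [cards.get(term.lower(), set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()

docs = {
    1: "boundary layer control on swept wings",
    2: "fatigue testing of landing gear struts",
    3: "boundary layer transition at supersonic speeds",
}
index = build_uniterm_index(docs)
print(coordinate_search(index, ["boundary", "layer"]))  # {1, 3}
```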

In the experiments, participants were asked to create indexes for a collection of aerospace-related documents, with each index prepared by an expert in that methodology. The authors of the original documents were then asked to prepare a set of search terms that should return their document. The indexing experts were then asked to translate the authors' search terms into queries for their own index. The queries were then run against the index to see whether it returned the target document. [6]

In these tests, all but the faceted system produced roughly equal numbers of "correct" results, while the faceted concept lagged. After studying these results, the researchers re-indexed the faceted system using a different format on the cards and re-ran the tests. In this series of tests, the faceted system was the clear winner. This suggested that the underlying theory behind a system was less important than the specifics of its implementation. [6]

The outcome of these experiments, published in 1962, generated enormous debate, both among the supporters of the various systems and among researchers who criticized the experiments as a whole. [7] Nevertheless, one conclusion appeared to be clearly supported: simple systems based on keywords appeared to work just as well as complex classificatory schemes. This is important, as the former are dramatically easier to implement. [8]

Cranfield 2

In the first series of experiments, experts in the use of the various techniques were tasked with both the creation of the index and its use against the sample queries. Each system had its own concept about how a query should be structured, which would today be known as a query language. Much of the criticism of the first experiments focused on whether the experiments were truly testing the systems, or the user's ability to translate the query into the query language. [6]

This led to the second series of experiments, Cranfield 2, which considered the question of converting the query into the query language. To do this, instead of treating the generation of the query as a black box, each step was broken down and examined. The outcome of this approach was revolutionary at the time: it suggested that the search terms be left in their original form, what would today be known as a natural language query. [6]

Another major change was how the results were judged. In the original tests, a success occurred only if the index returned the exact document that had been used to generate the search. However, this was not typical of an actual query; a user looking for information on aircraft landing gear might be happy with any of the collection's many papers on the topic, but Cranfield 1 would count such a result as a failure in spite of it returning relevant material. In the second series, the results were judged by third parties, who gave a qualitative answer on whether the query returned a relevant set of papers, as opposed to returning a specified original document. [7]
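
The practical effect of this change is that a result set can be scored against a set of judged-relevant documents rather than against a single target, which is what makes measures such as precision and recall meaningful. The following is a minimal sketch of that style of scoring, using hypothetical data rather than the actual Cranfield judgments.

```python
# Sketch of relevance-based scoring in the Cranfield 2 spirit (hypothetical data).
# Instead of checking for one target document, the retrieved set is compared
# against third-party relevance judgments for the query.

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against judged-relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: documents judged relevant by assessors vs. those returned.
judged_relevant = {12, 51, 486, 878}
returned = [12, 51, 100, 210]
print(precision_recall(returned, judged_relevant))  # (0.5, 0.5)
```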

Continued debate

The results of the two test series continued to be a subject of considerable debate for years. In particular, they led to a running debate between Cleverdon and Jason Farradane, one of the founders of the Institute of Information Scientists in 1958. Each would invariably appear at meetings where the other was presenting and then, during the question and answer period, explain why everything the other was doing was wrong. The debate has been characterized as "...fierce and unrelenting, sometimes well beyond the boundaries of civility." [7] This chorus was joined by Don R. Swanson in the US, who published a critique of the Cranfield experiments a few years later. [7]

In spite of these criticisms, Cranfield 2 set the bar by which many following experiments were judged. In particular, Cranfield 2's methodology, starting with natural language terms and judging the results by relevance, not exact matches, became almost universal in following experiments in spite of many objections. [7]

Influence

With the conclusion of Cranfield 2 in 1967, the entire corpus was published in a machine-readable form. [9] Today this is known as the Cranfield 1400, or some variation on that theme. The name refers to the number of documents in the collection, which consists of 1398 abstracts. The collection also includes 225 queries and the relevance judgments of all query–document pairs that resulted from the experimental runs. [10] The main database of abstracts is about 1.6 MB. [11]
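
Modern copies of the collection are commonly distributed in the SMART markup, in which each record begins with an ".I <id>" line and the abstract body follows a ".W" marker. The exact file names and field markers can vary between distributions, so the following Python reading sketch is an assumption about that common layout rather than a description of any particular copy.

```python
# Hedged sketch: reading abstracts from a SMART-format Cranfield file.
# Assumes records begin with ".I <id>" and the abstract text follows ".W";
# other field markers (e.g. ".T", ".A", ".B") are skipped. The file name
# below is illustrative and may differ in a given distribution.

def read_smart_abstracts(path):
    """Yield (doc_id, abstract_text) pairs from a SMART-format file."""
    doc_id, in_body, body = None, False, []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.startswith(".I"):
                if doc_id is not None:
                    yield doc_id, " ".join(body).strip()
                doc_id, in_body, body = int(line.split()[1]), False, []
            elif line.startswith(".W"):
                in_body = True
            elif line.startswith("."):
                in_body = False
            elif in_body:
                body.append(line.strip())
    if doc_id is not None:
        yield doc_id, " ".join(body).strip()

# docs = dict(read_smart_abstracts("cran.all.1400"))  # hypothetical file name
# print(len(docs))
```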

The experiments were carried out in an era when computers had a few kilobytes of main memory and network access to perhaps a few megabytes. For instance, the mid-range IBM System/360 Model 50 shipped with 64 to 512 kB of core memory [12] (tending toward the lower end) and its typical hard drive stored just over 80 MB. [13] As the capabilities of systems grew through the 1960s and 1970s, the Cranfield document collection became a major testbed corpus that was used repeatedly for many years. [14]

Today the collection is too small to use for practical testing beyond pilot experiments. Its place has mostly been taken by the TREC collection, which contains 1.89 million documents across a wider array of subjects, or the even more recent GOV2 collection of 25 million web pages. [10]

See also

Related Research Articles

Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

In general computing, a search engine is an information retrieval system designed to help find information stored on a computer system. It is an information retrieval software program that discovers, crawls, transforms, and stores information for retrieval and presentation in response to user queries. The search results are usually presented in a list and are commonly called hits. A search engine normally consists of four components, as follows: a search interface, a crawler, an indexer, and a database. The crawler traverses a document collection, deconstructs document text, and assigns surrogates for storage in the search engine index. Online search engines store images, link data and metadata for the document as well.

In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Relevance may include concerns such as timeliness, authority or novelty of the result.

The SMART Information Retrieval System is an information retrieval system developed at Cornell University in the 1960s. Many important concepts in information retrieval were developed as part of research on the SMART system, including the vector space model, relevance feedback, and Rocchio classification.

A document-term matrix is a mathematical matrix that describes the frequency of terms that occur in each document in a collection. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. This matrix is a specific instance of a document-feature matrix where "features" may refer to other properties of a document besides terms. It is also common to encounter the transpose, or term-document matrix, where documents are the columns and terms are the rows. They are useful in the field of natural language processing and computational text analysis.
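
As a small illustration, a document-term matrix for a toy corpus can be built by counting term occurrences per document; the sketch below uses hypothetical data and plain Python rather than any particular library.

```python
# Minimal sketch of a document-term matrix (hypothetical toy corpus):
# rows correspond to documents, columns to vocabulary terms, and each cell
# holds the raw count of that term in that document.
from collections import Counter

docs = ["wing flutter analysis", "flutter of thin wing panels", "panel buckling loads"]
tokenized = [d.lower().split() for d in docs]
vocab = sorted({term for tokens in tokenized for term in tokens})

matrix = [[Counter(tokens)[term] for term in vocab] for tokens in tokenized]

print(vocab)
for row in matrix:
    print(row)
```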

In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases.

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories.

The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a list of different information retrieval (IR) research areas, or tracks. It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity, and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies and to increase the speed of lab-to-product transfer of technology.

The Gerard Salton Award is presented by the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval (SIGIR) every three years to an individual who has made "significant, sustained and continuing contributions to research in information retrieval". SIGIR also co-sponsors the Vannevar Bush Award, for the best paper at the Joint Conference on Digital Libraries.

Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.

Knowledge organization (KO), organization of knowledge, organization of information, or information organization, is an intellectual discipline concerned with activities such as document description, indexing, and classification that serve to provide systems of representation and order for knowledge and information objects. According to The Organization of Information by Joudrey and Taylor, information organization:

examines the activities carried out and tools used by people who work in places that accumulate information resources for the use of humankind, both immediately and for posterity. It discusses the processes that are in place to make resources findable, whether someone is searching for a single known item or is browsing through hundreds of resources just hoping to discover something useful. Information organization supports a myriad of information-seeking scenarios.

Human–computer information retrieval (HCIR) is the study and engineering of information retrieval techniques that bring human intelligence into the search process. It combines the fields of human-computer interaction (HCI) and information retrieval (IR) and creates systems that improve search by taking into account the human context, or through a multi-step search process that provides the opportunity for human feedback.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

Ranking of query results is one of the fundamental problems in information retrieval (IR), the scientific/engineering discipline behind search engines. Given a query q and a collection D of documents that match the query, the problem is to rank, that is, sort, the documents in D according to some criterion so that the "best" results appear early in the result list displayed to the user. Ranking in terms of information retrieval is an important concept in computer science and is used in many different applications such as search engine queries and recommender systems. A majority of search engines use ranking algorithms to provide users with accurate and relevant results.
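
As a minimal, hypothetical example of the ranking step, matching documents can be scored and sorted best-first; production systems use far richer scoring functions such as TF-IDF or BM25, but the shape of the operation is the same: score each candidate, then sort descending.

```python
# Minimal ranking sketch (hypothetical data): score each document by the number
# of query terms it contains and return matching documents best-first.

def rank(documents, query):
    """Return document ids matching the query, ordered by descending score."""
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        score = len(query_terms & set(text.lower().split()))
        if score > 0:
            scored.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]

docs = {
    "d1": "supersonic wing flutter",
    "d2": "wing loading estimates",
    "d3": "landing gear fatigue",
}
print(rank(docs, "wing flutter"))  # ['d1', 'd2']
```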

Cyril Cleverdon was a British librarian and computer scientist who is best known for his work on the evaluation of information retrieval systems.

Lucene Geographic and Temporal (LGTE) is an information retrieval tool developed at the Technical University of Lisbon which can be used as a search engine or as an evaluation system for information retrieval techniques for research purposes. The first implementation powered by LGTE was the search engine of DIGMAP, a project co-funded by the community programme eContentplus between 2006 and 2008, which aimed to provide web services over old digitized maps from a group of partners across Europe, including several national libraries.

Jack Mills was a British librarian and classification researcher, who worked for more than sixty years in the study, teaching, development and promotion of library classification and information retrieval, principally as a major figure in the British school of facet analysis which builds on the traditions of Henry E. Bliss and S.R. Ranganathan.

ASLIB: The Association for Information Management was a British association of special libraries and information centres. It was founded in England in 1924 as the Association of Special Libraries and Information Bureaux. The organization ceased functioning as an independent organization in 2010, when it became a division of Emerald Group Publishing. Since 2015, ASLIB has existed only as Emerald's professional development arm.

Evaluation measures for an information retrieval (IR) system assess how well an index, search engine or database returns results from a collection of resources that satisfy a user's query. They are therefore fundamental to the success of information systems and digital platforms. The success of an IR system may be judged by a range of criteria including relevance, speed, user satisfaction, usability, efficiency and reliability. However, the most important factor in determining a system's effectiveness for users is the overall relevance of results retrieved in response to a query. Evaluation measures may be categorised in various ways including offline or online, user-based or system-based and include methods such as observed user behaviour, test collections, precision and recall, and scores from prepared benchmark test sets.

Uniterm is a subject indexing system introduced by Mortimer Taube in 1951. The name is a contraction of "unit" and "term", referring to its use of single words as the basis of the index, the "uniterms". Taube referred to the overall concept as "Coordinate Indexing", but today the entire concept is generally referred to as Uniterm as well.

References

Citations

  1. Cleverdon, C.W. (1960). "The Aslib Cranfield Research Project on the Comparative Efficiency of Indexing Systems". ASLIB Proceedings. Emerald. 12 (12): 421–431. doi:10.1108/eb049778. ISSN   0001-253X.
  2. Cleverdon, Cyril (1967). "The Cranfield Tests on Index Language Devices". ASLIB Proceedings. Emerald. 19 (6): 173–194. doi:10.1108/eb050097. ISSN   0001-253X.
  3. Cleverdon, C. W.; Keen, E. M. (1966). Factors determining the performance of indexing systems. Vol. 1: Design, Vol. 2: Results. Cranfield, UK: Aslib Cranfield Research Project.
  4. Buckland, Michael K. (May 1992). "Emanuel Goldberg, Electronic Document Retrieval, and Vannevar Bush's Memex". Journal of the American Society for Information Science. 43 (4): 284–94. doi:10.1002/(SICI)1097-4571(199205)43:4<284::AID-ASI3>3.0.CO;2-0.
  5. Gull, Cloyd (1 October 1956). "Seven years of work on the organization of materials in the special library". American Documentation. 7 (4): 320–329. doi:10.1002/asi.5090070408.
  6. Robertson 2008, p. 3.
  7. Robertson 2008, p. 4.
  8. Saracevic, Tefko (2016). The Notion of Relevance in Information Science. Morgan & Claypool. p. 13. ISBN   9781598297690.
  9. Robertson 2008, p. 7.
  10. Manning, Raghavan & Schütze 2008.
  11. CRANFIELD.
  12. IBM System/360 Model 50 Functional Characteristics (PDF). IBM. 1967. A22-6898-1.
  13. "IBM Archives: IBM 1302 disk storage unit". IBM. 2003-01-23. Retrieved 2011-07-20.
  14. Robertson 2008, pp. 5, 7.

Bibliography