Faceted search

Faceted search augments lexical search with a faceted navigation system, allowing users to narrow results by applying filters based on a faceted classification of the items. [1] It is a parametric search technique. [2] A faceted classification system classifies each information element along multiple explicit dimensions, called facets, so that the classifications can be accessed and ordered in multiple ways rather than in a single, predetermined taxonomic order. [1]

Facets correspond to properties of the information elements. They are often derived either by analyzing the text of an item with entity extraction techniques or from pre-existing database fields such as author, descriptor, language, and format. Thus, existing web pages, product descriptions, or online collections of articles can be augmented with navigational facets.
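As an illustration of how facets taken from existing fields can drive navigation, the following is a minimal sketch in Python. It is not drawn from any particular search system. The records, field names, and helper functions (`keyword_search`, `facet_counts`, `apply_filters`) are all hypothetical, and the lexical match is a naive substring test standing in for a real search engine.

```python
from collections import Counter

# Hypothetical catalog records; the field names are illustrative only.
ITEMS = [
    {"title": "Intro to Search", "author": "Lee", "language": "English", "format": "Book"},
    {"title": "Search Patterns", "author": "Kim", "language": "English", "format": "E-book"},
    {"title": "Recherche d'information", "author": "Lee", "language": "French", "format": "Book"},
    {"title": "Cooking Basics", "author": "Roy", "language": "English", "format": "Book"},
]

# Facets are assumed to map directly onto pre-existing record fields.
FACETS = ("author", "language", "format")


def keyword_search(items, query):
    """Naive lexical match: keep items whose title contains the query."""
    q = query.lower()
    return [item for item in items if q in item["title"].lower()]


def facet_counts(items, facets=FACETS):
    """For each facet, count how many items carry each value."""
    return {
        facet: Counter(item[facet] for item in items if facet in item)
        for facet in facets
    }


def apply_filters(items, filters):
    """Keep only items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in filters.items())
    ]


# Keyword search first, then facet counts over the result set,
# then narrowing by a facet value the user selected.
results = keyword_search(ITEMS, "search")
counts = facet_counts(results)
narrowed = apply_filters(results, {"format": "Book"})
```

In this sketch the counts are computed over the current result set rather than the whole collection, so the interface would only offer facet values that lead to at least one item. Production systems typically compute such counts inside the search index instead of in application code.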

Faceted search interfaces were first developed in academia by Ben Shneiderman, Steven Pollitt, Marti Hearst, and Gary Marchionini in the 1990s and 2000s. [3] [4] [5] [6] The best known of these efforts was the Flamenco research project at the University of California, Berkeley, led by Marti Hearst. [7] Commercial faceted search systems were developed concurrently, notably Endeca and Spotfire.

Within the academic community, faceted search has attracted interest primarily among library and information science researchers, and to some extent among computer science researchers specializing in information retrieval. [8]

Mass market use

Faceted search has become a popular technique in commercial search applications, particularly for online retailers and libraries. An increasing number of enterprise search vendors provide software for implementing faceted search applications.

Online retail catalogs were among the earliest applications of faceted search, reflecting both the faceted nature of product data (most products have a type, brand, price, etc.) and the ready availability of that data in retailers' existing information systems. Retailers started using faceted search in the early 2000s, in part due to published studies that evaluated the user search experience on popular sites. [9]

As of 2014, among the 50 largest US-based online retailers, 40% had implemented faceted search. [10] Examples include the filtering options that appear in the left column on amazon.com or Google Shopping after a keyword search has been performed.

Libraries and information science

In 1933, the noted librarian S. R. Ranganathan proposed a faceted classification system for library materials, known as colon classification. In the pre-computer era, it did not succeed in replacing the pre-coordinated Dewey Decimal Classification system. [11]

Modern online library catalogs, also known as online public access catalogs (OPAC), have increasingly adopted faceted search interfaces. Noted examples include the North Carolina State University library catalog (part of the Triangle Research Libraries Network) and the OCLC Open WorldCat system. The CiteSeerX project [12] at the Pennsylvania State University allows faceted search for academic documents and continues to expand into other facets such as table search.

References

  1. Tunkelang, Daniel (2009). "Faceted Search". Synthesis Lectures on Information Concepts, Retrieval, and Services. 1. Morgan & Claypool: 1–80. doi:10.2200/S00190ED1V01Y200904ICR005. S2CID 2430723.
  2. "Parametric Search, Faceted Search, and Taxonomies - New Idea Engineering". www.ideaeng.com. Retrieved 22 July 2022.
  3. Shneiderman, Ben (1994). "Dynamic queries for visual information seeking". IEEE Software. 11 (6): 70–77. doi:10.1109/52.329404. hdl:1903/388. S2CID 8021243.
  4. Pollitt, Steven; Smith, Martin; Treglown, Mark; Braekevelt, Patrick (1996). "View-based searching systems—progress towards effective disintermediation". Online Information 96 Proceedings: 433–441.
  5. Yee, Ka-Ping; Swearingen, Kirsten; Li, Kevin; Hearst, Marti (2003-04-05). "Faceted metadata for image search and browsing". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI '03. New York, NY, USA: Association for Computing Machinery. pp. 401–408. doi:10.1145/642611.642681. ISBN 978-1-58113-630-2. S2CID 367518.
  6. Marchionini, Gary; Brunk, Ben (Interaction Design Laboratory, University of North Carolina at Chapel Hill) (2003-01-03). Towards a General Relation Browser: A GUI for Information Architects. Texas Digital Library. OCLC 751844113.
  7. Flamenco project
  8. "SIGIR'2006 Workshop on Faceted Search - Call for Participation". Facetedsearch.googlepages.com. 2006-08-10. Archived from the original on 2009-09-19. Retrieved 2019-03-19.
  9. Nielsen Norman Group. "The State of Ecommerce Search". Nielsen Norman Group. Retrieved 2021-12-13. In our first study on ease of search experience for users, we concluded that '27% of task failures were a result of not being able to locate a suitable item on the site, even though all of our tasks were designed so there was always at least one item available.'
  10. Smashing Magazine: The Current State of E-Commerce Search Retrieved on 2014-08-27.
  11. Major classification systems: the Dewey Centennial. Forest Press. 2007-08-01. ISBN 978-0-87845-044-2. Retrieved 2019-03-19.
  12. CiteSeerX. Citeseerx.ist.psu.edu. Retrieved on 2013-07-21.