Collaborative search engine

Collaborative search engines (CSE) are Web search engines and enterprise searches within company intranets that let users combine their efforts in information retrieval (IR) activities and share information resources collaboratively using knowledge tags, and that allow experts to guide less experienced people through their searches. Collaboration partners do so by providing query terms, tagging collectively, adding comments or opinions, rating search results, and sharing the links clicked in former (successful) IR activities with users who have the same or a related information need.

Models of collaboration

Collaborative search engines can be classified along several dimensions: intent (explicit and implicit) and synchronization, [1] depth of mediation, [2] task vs. trait, [3] division of labor, and sharing of knowledge. [4]

Explicit vs. implicit collaboration

Implicit collaboration characterizes collaborative filtering and recommendation systems in which the system infers similar information needs. I-Spy, [5] Jumper 2.0, Seeks, the Community Search Assistant, [6] the CSE of Burghardt et al., [7] and the works of Longo et al. [8] [9] [10] are all examples of implicit collaboration. Systems in this category automatically identify similar users, queries and clicked links, and recommend related queries and links to searchers.
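The idea of inferring similar information needs can be illustrated with a minimal sketch. It assumes a hypothetical in-memory log of (query, clicked link) pairs and uses Jaccard overlap of query terms as the similarity measure; the function names, the threshold and the data are illustrative assumptions and are not taken from any of the systems cited above.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap between two collections of query terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(query, log, threshold=0.3, top_n=3):
    """Return related past queries and the most-clicked links for them.

    `log` is a hypothetical list of (query_string, clicked_url) pairs
    collected from earlier searchers.
    """
    terms = query.lower().split()
    related = []        # similar past queries, in first-seen order
    clicks = Counter()  # click counts over all similar past queries
    for past_query, url in log:
        if jaccard(terms, past_query.lower().split()) >= threshold:
            if past_query not in related and past_query.lower() != query.lower():
                related.append(past_query)
            clicks[url] += 1
    links = [url for url, _ in clicks.most_common(top_n)]
    return related, links


# Illustrative data only.
log = [
    ("python csv parsing", "https://example.org/csv"),
    ("parsing csv files python", "https://example.org/csv"),
    ("python csv parsing", "https://example.org/pandas"),
    ("holiday recipes", "https://example.org/recipes"),
]
related, links = recommend("python csv", log)
print(related)
print(links)
```

The searcher never states a shared goal here; the overlap between the new query and earlier ones is the only signal, which is what distinguishes this style from explicit collaboration.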

Explicit collaboration means that users share an agreed-upon information need and work together toward that goal. For example, in a chat-like application, query terms and links clicked are automatically exchanged. The most prominent example of this class is SearchTogether, [11] published in 2007. SearchTogether offers an interface that combines search results from standard search engines with a chat to exchange queries and links. PlayByPlay [12] goes a step further and supports general-purpose collaborative browsing tasks with instant messaging functionality. Reddy et al. [13] follow a similar approach and compare two implementations of their CSE, called MUSE and MUST. Reddy et al. focus on the role of communication required for efficient CSEs. Cerchiamo [2] supports explicit collaboration by allowing one person to concentrate on finding promising groups of documents while the other person makes in-depth judgments of relevance on the documents found by the first person.

Papagelis et al. [14] use the terms differently, however: they combine explicitly shared links and implicitly collected browsing histories of users into a hybrid CSE.

Community of practice

Recent work in collaborative filtering and information retrieval has shown that sharing of search experiences among users having similar interests, typically called a community of practice or community of interest, reduces the effort put in by a given user in retrieving the exact information of interest. [15]

Collaborative search deployed within a community of practice uses novel techniques for exploiting context during search by indexing and ranking search results based on the learned preferences of a community of users. [16] The users benefit by sharing information, experiences and awareness to personalize result lists so that they reflect the preferences of the community as a whole. The community is a group of users who share common interests or similar professions. The best-known example is the open-source project ApexKB (previously known as Jumper 2.0). [17]
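Ranking by the learned preferences of a community can be sketched as follows. The example assumes a simple per-community table of click counts and re-orders an underlying engine's results by the fraction of the community's clicks each result received for the same query. The class name, method names and data are hypothetical, and the scoring rule is an illustrative assumption rather than the method of any cited system.

```python
class CommunityRanker:
    """Re-rank results using click counts gathered from one community."""

    def __init__(self):
        # hits[query][url] = how often community members chose url for query
        self.hits = {}

    def record_click(self, query, url):
        per_query = self.hits.setdefault(query, {})
        per_query[url] = per_query.get(url, 0) + 1

    def relevance(self, query, url):
        """Fraction of the community's clicks for `query` that went to `url`."""
        per_query = self.hits.get(query, {})
        total = sum(per_query.values())
        return per_query.get(url, 0) / total if total else 0.0

    def rerank(self, query, results):
        # sorted() is stable, so results without community evidence
        # keep the order given by the underlying search engine.
        return sorted(results, key=lambda url: -self.relevance(query, url))


# Illustrative data only.
ranker = CommunityRanker()
ranker.record_click("jaguar", "https://example.org/jaguar-cars")
ranker.record_click("jaguar", "https://example.org/jaguar-cars")
ranker.record_click("jaguar", "https://example.org/jaguar-parts")

base_results = [
    "https://example.org/jaguar-animal",
    "https://example.org/jaguar-cars",
    "https://example.org/jaguar-parts",
]
reranked = ranker.rerank("jaguar", base_results)
print(reranked)
```

In this toy case a community of car enthusiasts pushes the car-related pages above the result the base engine ranked first, which is the kind of community-specific personalization the paragraph above describes.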

Depth of mediation

The depth of mediation refers to the degree to which the CSE mediates search. [2] SearchTogether [11] is an example of UI-level mediation: users exchange query results and judgments of relevance, but the system does not distinguish among users when they run queries. PlayByPlay [12] is another example of UI-level mediation, in which all users have full and equal access to the instant messaging functionality without coordination by the system. Cerchiamo [2] and recommendation systems such as I-Spy [5] keep track of each person's search activity independently and use that information to affect their search results. These are examples of deeper algorithmic mediation.

Task vs. trait

This model classifies people's membership in groups based on the task at hand vs. long-term interests; these may be correlated with explicit and implicit collaboration. [3]

Platforms and modalities

CSE systems began on the desktop, with the earliest ones being extensions or modifications of existing web browsers. GroupWeb [18] is a desktop web browser that offers a shared visual workspace for a group of users. SearchTogether [11] is a desktop application that combines search results from standard search engines with a chat interface for users to exchange queries and links. CoSense [19] supports sensemaking tasks in collaborative Web search by offering rich and interactive presentations of a group's search activities.

With the prevalence of mobile phones and tablets, CSEs are also taking advantage of these additional device modalities. CoSearch [20] is a system that supports co-located collaborative web search by leveraging extra mobile phones and mice. PlayByPlay [12] also supports collaborative browsing between mobile and desktop users.

Synchronous vs. asynchronous collaboration

The synchronous collaboration model enables different users to work toward the same goal simultaneously, with each user having access to the others' progress in real time. A typical example of the synchronous collaboration model is GroupWeb, [18] where users are made aware of what others are doing through features such as synchronous scrolling with pages, telepointers for enacting gestures, and group annotations that are attached to web pages.

Asynchronous collaboration models offer more flexibility as to when different users carry out their search processes, while reducing the cognitive effort later users need to consume and build upon earlier users' search results. SearchTogether, [11] for example, supports asynchronous collaboration by persisting previous users' chat logs, search queries, and web browsing histories so that later users can quickly bring themselves up to speed.
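A minimal sketch of the persistence idea behind asynchronous collaboration is shown below. It assumes a hypothetical session object that records queries, chat messages and visited pages as events, serializes them to JSON, and lets a later user rebuild a summary. The class, the event labels and the data are illustrative assumptions and do not describe how SearchTogether is implemented.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SearchSession:
    """Hypothetical shared session that outlives any single user's visit."""
    topic: str
    events: list = field(default_factory=list)

    def log(self, user, kind, value):
        # kind is one of "query", "chat" or "visit" (illustrative labels)
        self.events.append({"user": user, "kind": kind, "value": value})

    def dumps(self):
        """Serialize the session so it can be stored between visits."""
        return json.dumps(asdict(self))

    @classmethod
    def loads(cls, text):
        data = json.loads(text)
        return cls(topic=data["topic"], events=data["events"])

    def catch_up(self):
        """Group earlier activity by kind so a later user can review it."""
        summary = {"query": [], "chat": [], "visit": []}
        for event in self.events:
            summary[event["kind"]].append(event["value"])
        return summary


# First user works on the task and the session is stored.
session = SearchSession("trip planning")
session.log("alice", "query", "hotels lisbon")
session.log("alice", "visit", "https://example.org/lisbon-hotels")
session.log("alice", "chat", "The second result looks promising.")
saved = session.dumps()

# A later user restores the session and reviews what was done.
restored = SearchSession.loads(saved)
summary = restored.catch_up()
print(summary["query"])
```

Because the stored log is the only channel between the two users, everything the later user needs has to be captured at logging time.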

Applications of collaborative search engines

The applications of CSEs are well explored in both the academic community and industry. For example, GroupWeb [18] was used as a presentation tool for real-time distance education and conferences. ClassSearch [21] was deployed in middle-school classroom sessions to facilitate collaborative search activities and to study the space of co-located search pedagogies.

Privacy-aware collaborative search engines

Search terms and links clicked that are shared among users reveal their interests, habits, social relations and intentions. [22] In other words, CSEs put the privacy of their users at risk. Studies have shown that CSEs increase efficiency. [11] [23] [24] [25] Unfortunately, owing to the lack of privacy-enhancing technologies, a privacy-aware user who wants to benefit from a CSE has to disclose their entire search log. (Note that even when queries and links clicked are shared explicitly, the whole former log is disclosed to any user who joins a search session.) Thus, more sophisticated mechanisms that let users control at a finer-grained level which information is disclosed to whom are desirable.

As CSEs are a new technology just entering the market, identifying user privacy preferences and integrating privacy-enhancing technologies (PETs) into collaborative search are in conflict. On the one hand, PETs have to meet user preferences; on the other hand, one cannot identify these preferences without using a CSE, that is, without already having implemented PETs in a CSE. Today, the only work addressing this problem comes from Burghardt et al. [26] They implemented a CSE with experts from the information systems domain and derived the scope of possible privacy preferences in a user study with these experts. Results show that users define preferences referring to (i) their current context (e.g., being at work), (ii) the query content (e.g., users exclude topics from sharing), and (iii) time constraints (e.g., do not publish the query until X hours after it has been issued, do not store it longer than X days, do not share during working hours), and that users make intensive use of the option to (iv) distinguish between different social groups when sharing information. Further, users require (v) anonymization and (vi) define reciprocal constraints, i.e., constraints that refer to the behavior of other users, e.g., whether another user would have shared the same query in turn.
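To make preference types (i) to (iv) concrete, the following sketch checks whether a single query may be disclosed to a viewer. All names, fields and default values are hypothetical and chosen for illustration; this is not the mechanism implemented by Burghardt et al., and anonymization (v) and reciprocal constraints (vi) are deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class SharingPolicy:
    """Hypothetical per-user privacy preferences."""
    blocked_contexts: set = field(default_factory=set)  # (i) e.g. {"work"}
    blocked_topics: set = field(default_factory=set)    # (ii) terms never shared
    publish_delay: timedelta = timedelta(0)             # (iii) wait before sharing
    max_age: timedelta = timedelta(days=30)             # (iii) stop sharing after this
    allowed_groups: set = field(default_factory=set)    # (iv) groups allowed to see queries


@dataclass
class QueryRecord:
    text: str
    context: str
    issued_at: datetime


def may_disclose(record, policy, viewer_group, now):
    """Return True only if every preference in the policy permits sharing."""
    if record.context in policy.blocked_contexts:
        return False
    if policy.blocked_topics & set(record.text.lower().split()):
        return False
    age = now - record.issued_at
    if age < policy.publish_delay or age > policy.max_age:
        return False
    return viewer_group in policy.allowed_groups


# Illustrative data only.
policy = SharingPolicy(
    blocked_contexts={"work"},
    blocked_topics={"health"},
    publish_delay=timedelta(hours=2),
    allowed_groups={"colleagues"},
)
issued = datetime(2024, 1, 1, 12, 0)
record = QueryRecord("python csv parsing", "home", issued)

print(may_disclose(record, policy, "colleagues", issued + timedelta(hours=3)))
```

Each check can only veto disclosure, so a query is shared only when no preference objects. This default-deny style is one possible design choice for a privacy-aware CSE.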

References

  1. Golovchinsky Gene; Pickens Jeremy (2007), "Collaborative Exploratory Search" (PDF), Proceedings of HCIR 2007 Workshop
  2. Pickens Jeremy; Golovchinsky Gene; Shah Chirag; Qvarfordt Pernilla; Back Maribeth (2008), "Algorithmic mediation for collaborative exploratory search", SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 315–322, doi:10.1145/1390334.1390389, ISBN 9781605581644, S2CID 15704152
  3. Morris Meredith; Teevan Jaime (2008), "Understanding Groups' Properties as a Means of Improving Collaborative Search Systems" (PDF), 1st International Workshop on Collaborative Information Retrieval, held in conjunction with JCDL 2008
  4. Foley, Colum (2008). Division of Labour and Sharing of Knowledge for Synchronous Collaborative Information Retrieval (PDF) (PhD thesis). Dublin City University. Archived from the original (PDF) on 2011-07-16. Retrieved 2009-07-30.
  5. Barry Smyth; Evelyn Balfe; Peter Briggs; Maurice Coyle; Jill Freyne (2003), "Collaborative Web Search", IJCAI: 1417–1419
  6. Natalie S. Glance (2001), "Community search assistant", Workshop on AI for Web Search AAAI'02
  7. Thorben Burghardt; Erik Buchmann; Klemens Böhm (2008). "Discovering the Scope of Privacy Needs in Collaborative Search". 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. pp. 910–913. doi:10.1109/WIIAT.2008.165. ISBN 978-0-7695-3496-1. S2CID 15921662.
  8. Longo Luca; Barrett Stephen; Dondio Pierpaolo (2009), "Toward Social Search - From Explicit to Implicit Collaboration to Predict Users' Interests", Webist 2009 - Proceedings of the Fifth International Conference on Web Information Systems and Technologies, Lisbon, Portugal, March 23–26, 2009, 1: 693–696, ISBN 978-989-8111-81-4
  9. Longo Luca; Barrett Stephen; Dondio Pierpaolo (2010). "Enhancing Social Search: A Computational Collective Intelligence Model of Behavioural Traits, Trust and Time". Transactions on Computational Collective Intelligence II. Lecture Notes in Computer Science. Vol. 2. pp. 46–69. Bibcode:2010LNCS.6450...46L. doi:10.1007/978-3-642-17155-0_3. ISBN 978-3-642-17154-3.
  10. Longo Luca; Barrett Stephen; Dondio Pierpaolo (2009), "Information Foraging Theory as a Form of Collective Intelligence for Social Search", Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems, First International Conference, ICCCI 2009, Wroclaw, Poland, October 5–7, 2009. Proceedings, 1: 63–74, ISBN 978-3-642-04440-3
  11. Meredith Ringel Morris; Eric Horvitz (2007). "SearchTogether: An interface for collaborative web search". Proceedings of the 20th annual ACM symposium on User interface software and technology. pp. 3–12. doi:10.1145/1294211.1294215. ISBN 9781595936790. S2CID 10783726.
  12. Heather Wiltse; Jeffrey Nichols (2008). "CoSearch: A system for co-located collaborative web search". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI '08. pp. 1647–1656. doi:10.1145/1357054.1357311. ISBN 9781605580111. S2CID 9854331.
  13. Madhu C. Reddy; Bernhard J. Jansen; Rashmi Krishnappa (2008), "The Role of Communication in Collaborative Information Searching", ASTIS
  14. Athanasios Papagelis; Christos Zaroliagis (2007). "Author Index". Eighth Mexican International Conference on Current Trends in Computer Science (ENC 2007). pp. 88–98. doi:10.1109/ENC.2007.34. ISBN 978-0-7695-2899-1.
  15. Rohini U; Vamshi Ambati (2002), "A Collaborative Filtering based Re-ranking Strategy for Search in Digital Libraries" (PDF), ICADL2005: The 8th International Conference on Asian Digital Libraries
  16. Maurice Coyle & Barry Smyth (2008), Nejdl, Wolfgang; Kay, Judy; Pu, Pearl; et al. (eds.), Adaptive Hypermedia and Adaptive Web-Based Systems, Lecture Notes in Computer Science, vol. 5149/2008, pp. 103–112, CiteSeerX 10.1.1.153.7573, doi:10.1007/978-3-540-70987-9, ISBN 978-3-540-70984-8
  17. Jumper Networks Inc. (2010), "Jumper Networks Releases Jumper 2.0.1.5 Platform with New Community Search Features", Press Release, archived from the original on 2012-06-04, retrieved 2012-05-16
  18. Saul Greenberg; Mark Roseman (1996), "GroupWeb: A WWW Browser As Real Time Groupware", CHI, doi:10.1145/257089.257317, S2CID 30982523
  19. Sharoda A. Paul; Meredith Ringel Morris (2009), "CoSense: Enhancing Sensemaking for Collaborative Web Search", CHI, doi:10.1145/1518701.1518974, S2CID 10280059
  20. Saleema Amershi; Meredith Ringel Morris (2008), "CoSearch: A System for Co-located Collaborative Web Search", CHI, doi:10.1145/1357054.1357311, S2CID 9854331
  21. Neema Moraveji; Meredith Ringel Morris; Daniel Morris; Mary Czerwinski; Nathalie Henry Riche (2011), "ClassSearch: Facilitating the Development of Web Search Skills Through Social Learning", CHI, doi:10.1145/1978942.1979203, S2CID 6816313
  22. Data Protection Working Party (2008), "Article 29 EU Data Protection Working Party", EU
  23. Barry Smyth; Evelyn Balfe; Oisin Boydell; Keith Bradley; Peter Briggs; Maurice Coyle; Jill Freyne (2005), "A Live-User Evaluation of Collaborative Web Search", IJCAI
  24. Smyth, Barry & Balfe, Evelyn (2005), "Anonymous personalization in collaborative web search", Inf. Retr., 9 (2): 165–190, doi:10.1007/s10791-006-7148-z, S2CID 11659895
  25. Seikyung Jung; Juntae Kim; Herlocker, JL (2004), "Applying Collaborative Filtering for Efficient Document Search", Inf. Retr.: 640–643
  26. Thorben Burghardt; Erik Buchmann; Klemens Böhm; Chris Clifton (2008), "Collaborative Search And User Privacy: How Can They Be Reconciled?", CollaborateCom