Page Hunt is a game developed by Bing for investigating human research behavior. It is a so-called "game with a purpose", as it pursues additional goals: not only to provide entertainment but also to harness human computation for a specific research task. The term "games with a purpose" was coined by Luis von Ahn, inventor of CAPTCHA, co-organizer of the reCAPTCHA project, and inventor of the well-known ESP game. [1]
A game is a structured form of play, usually undertaken for enjoyment and sometimes used as an educational tool. Games are distinct from work, which is usually carried out for remuneration, and from art, which is more often an expression of aesthetic or ideological elements. However, the distinction is not clear-cut, and many games are also considered to be work or art.
Bing is a web search engine owned and operated by Microsoft. The service has its origins in Microsoft's previous search engines: MSN Search, Windows Live Search and later Live Search. Bing provides a variety of search services, including web, video, image and map search products. It is developed using ASP.NET.
Luis von Ahn is a Guatemalan entrepreneur and a Consulting Professor in the Computer Science Department at Carnegie Mellon University in Pittsburgh, Pennsylvania. He is known as one of the pioneers of crowdsourcing. He is the founder of the company reCAPTCHA, which was sold to Google in 2009, and the co-founder and CEO of Duolingo, a popular language-learning platform.
Page Hunt is only accessible through Internet Explorer, and requires Silverlight (freely downloadable from the Page Hunt website).
Internet Explorer is a series of graphical web browsers developed by Microsoft and included in the Microsoft Windows line of operating systems, starting in 1995. It was first released as part of the add-on package Plus! for Windows 95 that year. Later versions were available as free downloads, or in service packs, and included in the original equipment manufacturer (OEM) service releases of Windows 95 and later versions of Windows. The browser is discontinued, but still maintained.
Unlike the games of Luis von Ahn, Page Hunt is a single-player game. It does not support user registration (and hence does not rank players).
Shown a web page, the player must find the keyword or keywords that bring this page into the top 5 search results returned by Bing. The higher the page ranks within the first 5 results, the more points the player gets. Achieving this without frequent queries earns a bonus. The game lasts for 3 minutes.
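The scoring rule described above can be illustrated with a minimal sketch. The point values, the bonus size, and the query threshold below are hypothetical: the source states only that a higher rank within the top 5 earns more points and that using few queries earns a bonus. The names `find_rank` and `score_round` are likewise illustrative and are not taken from the game itself.

```python
from typing import Optional

# Hypothetical values: the source gives no actual numbers.
POINTS_BY_RANK = {1: 100, 2: 90, 3: 80, 4: 70, 5: 60}
FEW_QUERIES_BONUS = 20  # assumed size of the bonus
BONUS_QUERY_LIMIT = 2   # assumed meaning of "without frequent queries"


def find_rank(results: list[str], target_url: str, cutoff: int = 5) -> Optional[int]:
    """Return the 1-based rank of target_url within the first `cutoff` results, or None."""
    for rank, url in enumerate(results[:cutoff], start=1):
        if url == target_url:
            return rank
    return None


def score_round(results: list[str], target_url: str, queries_used: int) -> int:
    """Score one hunt: more points for a higher rank, plus a bonus for few queries."""
    rank = find_rank(results, target_url)
    if rank is None:
        return 0  # the page did not reach the top 5
    points = POINTS_BY_RANK[rank]
    if queries_used <= BONUS_QUERY_LIMIT:
        points += FEW_QUERIES_BONUS
    return points
```

For instance, with a result list `["a", "b", "c", "d", "e", "f"]`, hunting page `"b"` with a single query would score 110 under these assumed values, while page `"f"` would score nothing because it falls outside the top 5.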
An index term, subject term, subject heading, or descriptor, in information retrieval, is a term that captures the essence of the topic of a document. Index terms make up a controlled vocabulary for use in bibliographic records. They are an integral part of bibliographic control, which is the function by which libraries collect, organize and disseminate documents. They are used as keywords to retrieve documents in an information system, for instance, a catalog or a search engine. A popular form of keyword on the web is the tag, which is directly visible and can be assigned by non-experts. Index terms can consist of a word, phrase, or alphanumerical term. They are created by analyzing the document either manually with subject indexing or automatically with automatic indexing or more sophisticated methods of keyword extraction. Index terms can either come from a controlled vocabulary or be freely assigned.
The data gained using Page Hunt has several applications:
When the game was tested internally, the following results were gathered (as described in “Page Hunt: Improving search engines using human computation games” [2]): about 27% of the pages in the test database had 100% findability (meaning that everyone who was shown the page could bring it into the top 5 results), while almost the same share of pages (26%) were found by nobody. A relation between the length of a URL and the findability of a web page could also be postulated: the longer the URL of the web page, the harder it was to "hunt" it. The winning search queries were also analyzed and classified. The queries that contain:
Information retrieval (IR) is the activity of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.
In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Relevance may include concerns such as timeliness, authority or novelty of the result.
A backlink for a given web resource is a link from some other website to that web resource. A web resource may be a website, web page, or web directory.
The ESP game is a human-based computation game developed to address the problem of creating difficult metadata. The idea behind the game is to use the computational power of humans to perform a task that computers cannot by packaging the task as a game. It was originally conceived by Luis von Ahn of Carnegie Mellon University. Google bought a licence to create its own version of the game in 2006 in order to return better search results for its online images. The licence of the data acquired by von Ahn's ESP game, or the Google version, is not clear. Google's version was shut down on September 16, 2011, as part of the Google Labs closure.
Human-based computation (HBC), human-assisted computation, ubiquitous human computing or distributed thinking is a computer science technique in which a machine performs its function by outsourcing certain steps to humans, usually as microwork. This approach uses differences in abilities and alternative costs between humans and computer agents to achieve symbiotic human–computer interaction.
Exploratory search is a specialization of information exploration which represents the activities carried out by searchers who are:
Relevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user feedback, and to use information about whether or not those results are relevant to perform a new query. We can usefully distinguish between three types of feedback: explicit feedback, implicit feedback, and blind or "pseudo" feedback.
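One classic way to use explicit feedback is the Rocchio algorithm, which moves the query vector toward documents judged relevant and away from those judged non-relevant. The sketch below is a minimal illustration, assuming queries and documents are represented as term-to-weight dictionaries. The function name `rocchio` and the default weights (`alpha`, `beta`, `gamma`) are illustrative choices, not something the text above prescribes.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query (term -> weight) from explicit feedback.

    query:        term -> weight dict for the original query
    relevant:     list of term -> weight dicts judged relevant
    non_relevant: list of term -> weight dicts judged non-relevant
    """
    updated = Counter()
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Add the centroid of the relevant documents.
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)
    # Subtract the centroid of the non-relevant documents.
    for doc in non_relevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant)
    # Terms with non-positive weight are dropped from the new query.
    return {term: w for term, w in updated.items() if w > 0}
```

With the query `{"jaguar": 1.0}`, one relevant document about the animal and one non-relevant document about the car, the new query gains the term "cat" and does not pick up "car". The reweighted query would then be issued as the new query that the paragraph above describes.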
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process in the context of search engines designed to find web pages on the Internet is web indexing.
Query expansion (QE) is the process of reformulating a given query to improve retrieval performance in information retrieval operations, particularly in the context of query understanding. In the context of search engines, query expansion involves evaluating a user's input and expanding the search query to match additional documents. Query expansion involves techniques such as:
A web search query is a query based on a specific search term that a user enters into a web search engine to satisfy his or her information needs. Web search queries are distinctive in that they are often plain text or hypertext with optional search-directives. They vary greatly from standard query languages, which are governed by strict syntax rules as command languages with keyword or positional parameters.
Web query topic classification/categorization is a problem in information science. The task is to assign a Web search query to one or more predefined categories, based on its topics. The importance of query classification is underscored by many services provided by Web search. A direct application is to provide better search result pages for users with interests in different categories. For example, users issuing the Web query “apple” might expect to see Web pages related to the fruit, or they may prefer to see products or news related to the computer company. Online advertisement services can rely on the query classification results to promote different products more accurately. Search result pages can be grouped according to the categories predicted by a query classification algorithm. However, the computation of query classification is non-trivial. Unlike the texts in document classification tasks, queries submitted by Web search users are usually short and ambiguous, and their meanings evolve over time. Therefore, query topic classification is much more difficult than traditional document classification tasks.
A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.
A human-based computation game or game with a purpose (GWAP) is a human-based computation technique of outsourcing steps within a computational process to humans in an entertaining way (gamification).
An intelligent medical search engine is a vertical search engine that uses expert system technology to provide personalized medical information.
W. Bruce Croft is a distinguished professor of computer science at the University of Massachusetts Amherst whose work focuses on information retrieval. He is the founder of the Center for Intelligent Information Retrieval and served as the editor-in-chief of ACM Transactions on Information Systems from 1995 to 2002. He was also a member of the National Research Council Computer Science and Telecommunications Board from 2000 to 2003. Since 2015, he has been the Dean of the College of Information and Computer Sciences at the University of Massachusetts Amherst. He was Chair of the UMass Amherst Computer Science Department from 2001 to 2007.
Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The ranking model's purpose is to rank, i.e. produce a permutation of items in new, unseen lists in a way which is "similar" to rankings in the training data in some sense.
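To make the idea of a partial order induced by scores more concrete, the following sketch derives preference pairs from per-item labels and measures how closely a candidate ranking agrees with them. It is only an illustration of the training-data format and of one possible notion of "similar". The function names `preference_pairs` and `pairwise_agreement` are hypothetical, and no actual ranking model is trained here.

```python
from itertools import combinations


def preference_pairs(labels):
    """Turn item -> score labels into (better, worse) pairs; ties yield no pair."""
    pairs = []
    for a, b in combinations(labels, 2):
        if labels[a] > labels[b]:
            pairs.append((a, b))
        elif labels[b] > labels[a]:
            pairs.append((b, a))
    return pairs


def pairwise_agreement(ranking, labels):
    """Fraction of preference pairs that `ranking` places in the labeled order."""
    position = {item: index for index, item in enumerate(ranking)}
    pairs = preference_pairs(labels)
    if not pairs:
        return 1.0  # nothing to disagree with
    agreed = sum(1 for better, worse in pairs if position[better] < position[worse])
    return agreed / len(pairs)
```

Given labels `{"d1": 2, "d2": 1, "d3": 0}`, the ranking `["d1", "d2", "d3"]` agrees with every pair, while the reversed ranking agrees with none. A pairwise learning-to-rank method would, roughly speaking, try to produce rankings that score highly on a measure of this kind for unseen lists.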
Phetch is a game with a purpose intended to label images on the World Wide Web with descriptive captions suitable to assist sight-impaired readers. Approximately 75% of the images on the web do not have proper alt text labels, making them inaccessible through screen readers. The solution aimed at by Phetch is to label the images externally to the web page rather than depending on the web page author to create proper alt text for each image. Rather than paying people to do the mundane task of labeling images, Phetch aims to create a fun game that produces such descriptions as a side effect of play.
Vocabulary mismatch is a common phenomenon in the usage of natural languages, occurring when different people name the same thing or concept differently.
User intent, otherwise known as query intent or search intent, is the identification and categorisation of what a user online intended or wanted to find when they typed their search terms into an online web search engine for the purpose of search engine optimisation or conversion rate optimisation. Examples of user intent are fact-checking, comparison shopping or filling downtime.