ESP game

The ESP game (extrasensory perception game) is a human-based computation game developed to address the problem of creating image metadata, a task that is difficult for computers. The idea behind the game is to harness the computational power of humans to perform a task that computers cannot (originally, image recognition) by packaging the task as a game. It was conceived by Luis von Ahn of Carnegie Mellon University and first posted online in 2003. [1]

On the official website, there was a running count of "Labels collected since October 5, 2003", updated every 12 hours. They stated that "If the ESP game is played as much as other popular online games, we estimate that all the images on the Web can be labeled in a matter of weeks!" [2] 36 million labels had been collected as of May 2008. [3] The original paper (2004) reported that a pair of players can produce 3.89 ± 0.69 labels per minute. At this rate, 5,000 people continuously playing the game could provide one label per image indexed by Google (425 million) in 31 days. [1]
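The 31-day figure follows directly from the reported rate. A quick sanity check, treating 5,000 players as 2,500 simultaneous pairs:

```python
# Back-of-the-envelope check of the paper's throughput estimate.
LABELS_PER_PAIR_PER_MIN = 3.89   # reported rate for one pair of players
PLAYERS = 5_000                  # i.e. 2,500 simultaneous pairs
IMAGES = 425_000_000             # images indexed by Google at the time

pairs = PLAYERS // 2
labels_per_day = pairs * LABELS_PER_PAIR_PER_MIN * 60 * 24
days_needed = IMAGES / labels_per_day
print(round(days_needed, 1))  # -> 30.3, consistent with the paper's "31 days"
```

The estimate assigns one label per image; producing several labels per image scales the time accordingly.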

In late 2008, the game was rebranded as GWAP ("game with a purpose") and given a new user interface. Several other games created by Luis von Ahn, such as "Peekaboom" and "Phetch", were discontinued at that point. "Peekaboom" extends the ESP game by asking players to select the region of the image that corresponds to a label. "Squigl" asks players to trace the outline of an object in an image. "Matchin" asks players to pick the more beautiful of two images. [4] "Verbosity" collects common-sense facts from players. [5]

Google bought a license to create its own version of the game, Google Image Labeler, in 2006 in order to return better search results for its online images. [6] The licensing of the data acquired through Ahn's ESP game or the Google version remains unclear. Google's version was shut down on September 16, 2011, as part of the closure of Google Labs.

Most of the ESP dataset is not publicly available. The ImageNet paper reported that as of 2008, only 60K images and their labels could be accessed. [7]

Concept

Image recognition was historically a task that was difficult for computers to perform on their own. Humans are perfectly capable of it but are not necessarily willing. By packaging the recognition task as a game, people become more likely to participate. Data collected from players who were asked how much they enjoyed the game were overwhelmingly positive.

The applications of having so many labeled images are significant: for example, more accurate image searching, and better accessibility for visually impaired users, whose screen readers can speak an image's labels aloud. Partnering two people to label images makes it more likely that the entered words will be accurate. Since the only thing the two partners have in common is that they both see the same image, they must enter reasonable labels to have any chance of agreeing on one.

The ESP Game as currently implemented encourages players to assign "obvious" labels, which are the most likely to lead to an agreement with the partner. But such labels can often be deduced from the labels already present using an appropriate language model, and they therefore add little information to the system. A Microsoft research project assigns probabilities to the next label to be added; this model is then used in a program that plays the ESP game without looking at the image. [8]
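A minimal sketch of the underlying idea, using a toy co-occurrence model rather than anything from the Microsoft project (the training data and helper names here are hypothetical):

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical training data: label sets collected from earlier games.
label_sets = [
    {"dog", "grass", "green"},
    {"dog", "ball", "grass"},
    {"cat", "grass", "green"},
]

# Count how often label b appears alongside label a.
cooccur = defaultdict(Counter)
for labels in label_sets:
    for a, b in permutations(labels, 2):
        cooccur[a][b] += 1

def predict_next(existing_labels):
    """Guess the most likely next label given labels already agreed on."""
    scores = Counter()
    for known in existing_labels:
        for candidate, count in cooccur[known].items():
            if candidate not in existing_labels:
                scores[candidate] += count
    return scores.most_common(1)[0][0] if scores else None

print(predict_next({"dog"}))  # -> grass
```

A player (or bot) running such a model can agree on "obvious" labels without ever seeing the image, which is exactly why those labels carry little new information.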

The ESP game's authors presented evidence that the labels produced using the game were indeed useful descriptions of the images. Search results for randomly chosen keywords showed that the proportion of appropriate images retrieved using game-generated labels was extremely high. Further evaluation compared the labels generated by the game to labels written by participants who were asked to describe the same images.

Rules of the game

Once logged in, a user is automatically matched with a random partner. The partners do not know each other's identity and they cannot communicate. Once matched, they will both be shown the same image. Their task is to agree on a word that would be an appropriate label for the image. They both enter possible words, and once a word is entered by both partners (not necessarily at the same time), that word is agreed upon, and that word becomes a label for the image. Once they agree on a word, they are shown another image. They have two and a half minutes to label 15 images.
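The agreement rule above can be sketched as a simple matching procedure over each partner's stream of guesses. This is a toy model: the real game runs in real time over a network, and the interleaving here is only a crude stand-in for that.

```python
def play_round(guesses_a, guesses_b):
    """Return the first label both partners have entered, or None.

    guesses_a / guesses_b are each partner's words in typing order;
    agreement does not require entering the word at the same time.
    """
    seen_a, seen_b = set(), set()
    # Interleave the two guess streams as a crude model of real time.
    for i in range(max(len(guesses_a), len(guesses_b))):
        if i < len(guesses_a):
            seen_a.add(guesses_a[i])
            if guesses_a[i] in seen_b:
                return guesses_a[i]
        if i < len(guesses_b):
            seen_b.add(guesses_b[i])
            if guesses_b[i] in seen_a:
                return guesses_b[i]
    return None

print(play_round(["sky", "car", "dog"], ["dog", "blue"]))  # -> dog
```

Note that "dog" is agreed upon even though the partners typed it at different points in their streams.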

Both partners have the option to pass; that is, give up on an image. Once one partner passes, the other partner is shown a message that their partner wishes to pass. Both partners must pass for a new image to be shown.

Some images have "taboo" words, that is, words that cannot be entered as possible labels. These words are usually closely related to the image and make the game harder, as they prevent common words from being used to label it. Taboo words are obtained from the game itself. The first time an image is used in the game, it has no taboo words. If the image is used again, it has one taboo word: the word that resulted from the previous agreement. The next time the image is used, it has two taboo words, and so on. Taboo words are generated automatically by the system: once an image has been labeled enough times with the same word, that word becomes taboo, so that the image accumulates a variety of different labels.
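The taboo mechanism can be sketched as follows. The promotion threshold is a hypothetical parameter: the description above suggests a single agreement can suffice, and the exact count used by the real system is not stated.

```python
from collections import Counter

TABOO_THRESHOLD = 1  # hypothetical: agreements before a word becomes taboo

class ImageRecord:
    """Tracks agreed labels for one image and promotes frequent ones to taboo."""

    def __init__(self):
        self.agreements = Counter()
        self.taboo = set()

    def record_agreement(self, word):
        if word in self.taboo:
            raise ValueError("taboo words cannot be entered as labels")
        self.agreements[word] += 1
        if self.agreements[word] >= TABOO_THRESHOLD:
            # Future sessions must find a different label for this image.
            self.taboo.add(word)
```

Each time the image re-enters the game pool, its accumulated taboo set forces players toward progressively less obvious labels.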

Occasionally, the game will be played solo, without a human partner, with the ESP Game itself acting as the opponent and delivering a series of pre-determined labels to the single human player (which have been harvested from labels given to the image during the course of earlier games played by real humans). This is necessary if there are an odd number of people playing the game. [9]

The game has been used as an important example of a social machine with a purpose (a teleological social machine), providing an example of an intelligent system emerging from the interaction of human participants, in the book The Shortcut by Nello Cristianini, [10] where the intelligence of social media platforms is discussed.

Cheating

Ahn has described countermeasures that prevent players from "cheating" the game and introducing false data into the system. By giving players occasional test images for which common labels are already known, it is possible to check that players are answering honestly, and a player's guesses are only stored if they successfully label the test images. [9]

Furthermore, a label is only stored after a certain number of players (N) have agreed on it. At this point, the tabooed words for the image are deleted, and the image is returned to the game pool as if it were a fresh image. If X is the probability of a label being incorrect despite a player having successfully labelled the test images, then after N repetitions the probability of corruption is X^N, assuming that the repetitions are independent of each other. [9]
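In concrete terms, with a hypothetical per-agreement error rate, the corruption probability shrinks geometrically as the required number of independent agreements grows:

```python
# Each stored label must survive N independent agreements; if a single
# agreement is wrong with probability X, all N must be wrong for the
# final label to be corrupt, giving a corruption probability of X**N.
X = 0.05  # hypothetical per-agreement error rate
for N in (1, 2, 3):
    print(f"N={N}: corruption probability {X ** N:.6f}")
```

With X = 0.05, requiring just two independent agreements already drives the corruption probability down to 0.0025.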

Image selection

The choice of images used by the ESP game makes a difference in the player's experience. The game would be less entertaining if all the images were chosen from a single site and were all extremely similar.

The first run of the ESP game used a collection of 350,000 images chosen by the developers. Later versions selected images at random from the web, using a small amount of filtering. Such images are reintroduced into the game several times until they are fully labeled. [9] The random images were chosen using "Random Bounce Me", a website that selects a page at random from the Google database. "Random Bounce Me" was queried repeatedly, each time collecting all JPEG and GIF images in the random page, except for images that did not fit the criteria: blank images, images that consist of a single color, images that are smaller than 20 pixels on either dimension, and images with an aspect ratio greater than 4.5 or smaller than 1/4.5. This process was repeated until 350,000 images were collected. The images were then rescaled to fit the game's display. Fifteen different images from the 350,000 are chosen for each session of the game.
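The filtering criteria can be expressed as a simple predicate. The function and its inputs are hypothetical stand-ins for whatever the actual crawler computed from each downloaded JPEG or GIF:

```python
# A sketch of the crawl-time filter described above.
MIN_SIDE = 20     # minimum pixels on either dimension
MAX_ASPECT = 4.5  # widest allowed width:height ratio (and its inverse)

def acceptable(width, height, distinct_colors):
    """Apply the stated criteria to an image's basic properties."""
    if width < MIN_SIDE or height < MIN_SIDE:
        return False                      # too small on either dimension
    if distinct_colors <= 1:
        return False                      # blank or single-colour image
    aspect = width / height
    if aspect > MAX_ASPECT or aspect < 1 / MAX_ASPECT:
        return False                      # too elongated to display well
    return True

print(acceptable(200, 150, 4096))  # -> True
print(acceptable(500, 30, 4096))   # -> False: aspect ratio ~16.7 > 4.5
```

Pages from "Random Bounce Me" would be fetched repeatedly, with every image passing this predicate added to the pool until the 350,000-image target was reached.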


References

  1. von Ahn, Luis; Dabbish, Laura (2004-04-25). "Labeling images with a computer game". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM. pp. 319–326. doi:10.1145/985692.985733. ISBN 978-1-58113-702-6.
  2. "The ESP Game Project". 2003-10-20. Archived from the original on 2003-10-20. Retrieved 2024-11-13.
  3. "The ESP Game Project". 2008-05-09. Archived from the original on 2008-05-09. Retrieved 2024-11-13.
  4. "Solving the web's image problem". BBC News. 2008-05-14. Retrieved 2024-11-13.
  5. von Ahn, Luis; Kedia, Mihir; Blum, Manuel (2006-04-22). "Verbosity: A game for collecting common-sense facts". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM. pp. 75–78. doi:10.1145/1124772.1124784. ISBN 978-1-59593-372-0.
  6. "Solving the web's image problem". BBC News. 2008-05-14. Retrieved 2008-12-14.
  7. Deng, Jia; Dong, Wei; Socher, Richard; Li, Li-Jia; Kai Li; Li Fei-Fei (June 2009). "ImageNet: A large-scale hierarchical image database". 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. pp. 248–255. doi:10.1109/CVPR.2009.5206848. ISBN 978-1-4244-3992-8.
  8. Robertson, Stephen; Vojnovic, Milan; Weber, Ingmar (2009-04-04). "Rethinking the ESP game". CHI '09 Extended Abstracts on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery. pp. 3937–3942. doi:10.1145/1520340.1520597. ISBN 978-1-60558-247-4.
  9. GoogleTalksArchive (2012-08-22). Human Computation. Talk by Luis von Ahn on July 26, 2006. Retrieved 2024-11-13 via YouTube.
  10. Cristianini, Nello (2023). The Shortcut: Why Intelligent Machines Do Not Think Like Us. Boca Raton. ISBN 978-1-003-33581-8. OCLC 1352480147.