ESP game

The ESP game (extrasensory perception game) is a human-based computation game developed to address the problem of creating difficult metadata. The idea behind the game is to use the computational power of humans to perform a task that computers cannot (originally, image recognition) by packaging the task as a game. It was originally conceived by Luis von Ahn of Carnegie Mellon University. Google bought a license to create its own version of the game (Google Image Labeler) in 2006 in order to return better search results for its online images.[1] The licensing of the data acquired by von Ahn's ESP game, or by the Google version, is not clear. Google's version was shut down on September 16, 2011, as part of the closure of Google Labs.

Concept

Image recognition was historically a task that was difficult for computers to perform independently. Humans are perfectly capable of it, but are not necessarily willing. By framing the recognition task as a game, people become more likely to participate. Data collected from users who were asked how much they enjoyed playing the game was overwhelmingly positive.

The applications of having so many labeled images are significant: for example, more accurate image search, and better accessibility for visually impaired users, for whom an image's labels can be read aloud. Partnering two people to label images makes it more likely that the entered words are accurate. Since the only thing the two partners have in common is that they both see the same image, they must enter reasonable labels to have any chance of agreeing on one.

The ESP Game as currently implemented encourages players to assign "obvious" labels, which are the most likely to lead to an agreement with the partner. However, such labels can often be deduced from the labels already present using an appropriate language model, and therefore add little information to the system. A Microsoft Research project assigns probabilities to the next label to be added; this model is then used in a program that plays the ESP game without looking at the image.[2]
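A minimal sketch of this idea, assuming a simple label co-occurrence model rather than the actual Microsoft model: given the labels an image already has, candidate next labels can be scored by how often they co-occurred with those labels in previously labelled images, without ever looking at the image. All names and data below are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Build co-occurrence counts from previously labelled images (hypothetical data).
def build_cooccurrence(labelled_images):
    pair_counts = defaultdict(Counter)
    for labels in labelled_images:
        for a, b in combinations(set(labels), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    return pair_counts

def predict_next_label(existing_labels, pair_counts):
    # Score each candidate by how often it co-occurred with the labels
    # already attached to the image; no access to the image itself.
    scores = Counter()
    for label in existing_labels:
        scores.update(pair_counts.get(label, {}))
    for label in existing_labels:
        scores.pop(label, None)
    return scores.most_common(3)

corpus = [["dog", "grass", "park"], ["dog", "ball", "grass"], ["cat", "sofa"]]
pairs = build_cooccurrence(corpus)
print(predict_next_label(["dog"], pairs))   # e.g. [('grass', 2), ('park', 1), ('ball', 1)]
```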

The ESP game's authors presented evidence that the labels produced using the game are indeed useful descriptions of the images. Results of searching for randomly chosen keywords showed that the proportion of appropriate images retrieved using labels generated by the game is extremely high. Further evaluation compared the labels generated using the game with labels written by participants who were asked to describe the same images.

Rules of the game

Once logged in, a user is automatically matched with a random partner. The partners do not know each other's identity and cannot communicate. Once matched, both are shown the same image. Their task is to agree on a word that would be an appropriate label for it. They each enter possible words, and once a word has been entered by both partners (not necessarily at the same time), it is agreed upon and becomes a label for the image. Once they agree on a word, they are shown another image. They have two and a half minutes to label 15 images.
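A minimal sketch of the agreement rule just described; the class and method names are illustrative, not taken from the actual implementation. A word becomes the image's label as soon as both partners have typed it, in either order.

```python
class EspRound:
    """One image shared by two anonymous partners (illustrative sketch)."""

    def __init__(self, image_id, taboo_words=()):
        self.image_id = image_id
        self.taboo = {w.lower() for w in taboo_words}
        self.guesses = {1: set(), 2: set()}   # words typed by each player

    def submit(self, player, word):
        """Record a guess; return the label once both players have entered it."""
        word = word.strip().lower()
        if word in self.taboo:
            return None                       # taboo words cannot be entered
        self.guesses[player].add(word)
        other = 2 if player == 1 else 1
        if word in self.guesses[other]:
            return word                       # agreement: word becomes a label
        return None

game = EspRound("img-42", taboo_words=["car"])
game.submit(1, "road")
print(game.submit(2, "road"))   # road: the agreed label for this image
```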

Both partners have the option to pass; that is, give up on an image. Once one partner passes, the other partner is shown a message that their partner wishes to pass. Both partners must pass for a new image to be shown.

Some images have "taboo" words: words that cannot be entered as possible labels. These words are usually closely related to the image and make the game harder, since they prevent common words from being used to label the image. Taboo words are obtained from the game itself. The first time an image is used in the game, it has no taboo words. If the image is used again, it has one taboo word: the word that resulted from the previous agreement. The next time the image is used, it has two taboo words, and so on. The assignment of taboo words is done automatically by the system: once an image has been labeled enough times with the same word, that word becomes taboo, so that the image accumulates a variety of different labels.
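A sketch of how taboo words might accumulate from earlier agreements; the threshold is an assumption, since the text above says a word from a previous agreement becomes taboo but also that it must be agreed on "enough times".

```python
from collections import Counter

# Sketch of taboo-word accumulation: words agreed on for an image in earlier
# rounds become taboo the next time the image appears. A threshold of 1 matches
# the "previous agreement" description; it is kept configurable because the
# "enough times" wording suggests it may be higher in practice.
def taboo_words(agreed_labels, threshold=1):
    counts = Counter(agreed_labels)
    return {word for word, n in counts.items() if n >= threshold}

history = ["car", "car", "red"]           # agreed labels from earlier games
print(taboo_words(history))               # {'car', 'red'}
print(taboo_words(history, threshold=2))  # {'car'}
```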

Occasionally, the game is played solo, without a human partner: the ESP Game itself acts as the partner and delivers a series of pre-determined labels to the single human player, harvested from labels given to the image during earlier games played by real humans. This is necessary when an odd number of people are playing the game.[3]
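A minimal sketch of this single-player mode, under the assumption that the simulated partner simply replays guesses recorded from earlier human games; the data and function name are invented for illustration.

```python
# Sketch of the single-player mode: the simulated partner replays guesses
# harvested from earlier human games until the human matches one of them.
def play_solo_round(recorded_guesses, human_guesses):
    """Return the first human guess that matches a pre-recorded guess."""
    recorded = {g.lower() for g in recorded_guesses}
    for guess in human_guesses:
        if guess.lower() in recorded:
            return guess          # agreement with the simulated partner
    return None                   # no match: the round is passed or times out

print(play_solo_round(["dog", "grass"], ["animal", "dog"]))  # dog
```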

In late 2008, the game was rebranded as GWAP ("game with a purpose") with a new user interface. Some other games created by Luis von Ahn, such as "Peekaboom" and "Phetch", were discontinued at that point. In the book The Shortcut by Nello Cristianini, the game is used as an important example of a social machine with a purpose (a teleological social machine): an intelligent system emerging from the interaction of human participants.[4] The book also discusses the intelligence of social media platforms.

Cheating

Von Ahn has described countermeasures that prevent players from "cheating" the game and introducing false data into the system. By giving players occasional test images for which common labels are known, it is possible to check that players are answering honestly; a player's guesses are stored only if they successfully label the test images.[5]

Furthermore, a label is only stored after a certain number of players (N) have agreed on it. At that point, all of the taboo lists for the image are deleted and the image is returned to the game pool as if it were a fresh image. If X is the probability of a label being incorrect despite a player having successfully labelled the test images, then after N repetitions the probability of corruption is X^N, assuming that the repetitions are independent of each other.[5]
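A worked example of this bound with illustrative numbers (not taken from the source): if an agreed label is wrong with probability X = 0.1 even after the test-image check, requiring N = 3 independent agreements reduces the chance of storing a wrong label to 0.001.

```python
# Worked example of the corruption bound X**N (illustrative numbers only).
X = 0.1   # assumed probability that a single agreed label is wrong
N = 3     # independent agreements required before a label is stored
print(round(X ** N, 6))   # 0.001: probability the stored label is still wrong
```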

Image selection

The choice of images used by the ESP game makes a difference in the player's experience. The game would be less entertaining if all the images were chosen from a single site and were all extremely similar.

The first run of the ESP game used a collection of 350,000 images chosen by the developers. Later versions selected images at random from the web, with a small amount of filtering; such images are reintroduced into the game several times until they are fully labeled.[6] The random images were chosen using "Random Bounce Me", a website that selects a page at random from the Google database. "Random Bounce Me" was queried repeatedly, each time collecting all JPEG and GIF images on the random page, except for images that failed the filtering criteria: blank images, images consisting of a single color, images smaller than 20 pixels in either dimension, and images with an aspect ratio greater than 4.5 or smaller than 1/4.5. This process was repeated until 350,000 images had been collected. The images were then rescaled to fit the game's display. Fifteen different images from the 350,000 are chosen for each session of the game.
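A minimal sketch of these filtering criteria, assuming the Pillow imaging library; the helper name is illustrative, and the constants simply restate the thresholds given in the text.

```python
from PIL import Image   # Pillow; used only to read image size and pixel data

MIN_SIDE = 20        # reject images smaller than 20 pixels in either dimension
MAX_ASPECT = 4.5     # reject aspect ratios above 4.5 or below 1/4.5

def acceptable(path):
    """Apply the ESP game's reported filters to a candidate JPEG/GIF image."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    if w < MIN_SIDE or h < MIN_SIDE:
        return False                        # too small in either dimension
    aspect = w / h
    if aspect > MAX_ASPECT or aspect < 1 / MAX_ASPECT:
        return False                        # too elongated
    colors = img.getcolors(maxcolors=2)     # None if more than 2 distinct colors
    if colors is not None and len(colors) <= 1:
        return False                        # blank or single-color image
    return True
```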

Related Research Articles

In the field of artificial intelligence, the most difficult problems are informally known as AI-complete or AI-hard, implying that the difficulty of these computational problems, assuming intelligence is computational, is equivalent to that of solving the central artificial intelligence problem—making computers as intelligent as people, or strong AI. To call a problem AI-complete reflects an attitude that it would not be solved by a simple specific algorithm.

<span class="mw-page-title-main">Natural language processing</span> Field of linguistics and computer science

Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

<span class="mw-page-title-main">Fictionary</span>

Fictionary, also known as The Dictionary Game or simply Dictionary, is a word game in which players guess the definition of an obscure word. Each round consists of one player selecting and announcing a word from the dictionary, and other players composing a fake definition for it. The definitions, as well as the correct definition, are collected blindly by the selector and read aloud, and players vote on which definition they believe to be correct. Points are awarded for correct guesses, and for having a fake definition guessed by another player.

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious and automatic, but it can come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other language-processing tasks, such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.

A CAPTCHA is a type of challenge–response test used in computing to determine whether the user is human.

Taboo (game): 1989 word guessing party game

Taboo is a word-guessing party game published by Parker Brothers in 1989. The objective of the game is for a player to have their partners guess the word on the player's card without using the word itself or five additional words listed on the card.

Automatic summarization is the process of shortening a set of data computationally, to create a subset that represents the most important or relevant information within the original content. Artificial intelligence algorithms are commonly developed and employed to achieve this, specialized for different types of data.

Human-based computation (HBC), human-assisted computation, ubiquitous human computing or distributed thinking is a computer science technique in which a machine performs its function by outsourcing certain steps to humans, usually as microwork. This approach uses differences in abilities and alternative costs between humans and computer agents to achieve symbiotic human–computer interaction. For computationally difficult tasks such as image recognition, human-based computation plays a central role in training Deep Learning-based Artificial Intelligence systems. In this case, human-based computation has been referred to as human-aided artificial intelligence.

<span class="mw-page-title-main">Luis von Ahn</span> Guatemalan entrepreneur and computer scientist

Luis von Ahn is a German-Guatemalan entrepreneur and a consulting professor in the Computer Science Department at Carnegie Mellon University in Pittsburgh, Pennsylvania. He is known as one of the pioneers of crowdsourcing. He is the founder of the company reCAPTCHA, which was sold to Google in 2009, and the co-founder and CEO of Duolingo.

<span class="mw-page-title-main">Google Image Labeler</span> Feature in Google Images

Google Image Labeler is a feature, in the form of a game, of Google Images that allows the user to label random images to help improve the quality of Google's image search results. It was online from 2006 to 2011 and relaunched in 2016.

reCAPTCHA: CAPTCHA implementation owned by Google

reCAPTCHA is a CAPTCHA system owned by Google. It enables web hosts to distinguish between human and automated access to websites. The original version asked users to decipher hard to read text or match images. Version 2 also asked users to decipher text or match images if the analysis of cookies and canvas rendering suggested the page was being downloaded automatically. Since version 3, reCAPTCHA will never interrupt users and is intended to run automatically when users load pages or click buttons.

A human-based computation game or game with a purpose (GWAP) is a human-based computation technique of outsourcing steps within a computational process to humans in an entertaining way (gamification).

Page Hunt is a game developed by Bing for investigating human search behavior. It is a so-called "game with a purpose": it provides entertainment while also harnessing human computation for a specific research task. The term "games with a purpose" was coined by Luis von Ahn, inventor of CAPTCHA, co-organizer of the reCAPTCHA project, and inventor of the ESP game.

Phetch is a game with a purpose intended to label images on the internet with descriptive captions suitable to assist sight-impaired readers. Approximately 75% of the images on the web do not have proper ALT text labels, making them inaccessible through screen readers. Phetch's approach is to label images externally to the web page rather than depending on the page's author to write proper alt text for each image. Rather than paying people to do the mundane task of labeling images, Phetch aims to be a fun game that produces such descriptions as a side effect of play.

<span class="mw-page-title-main">Duolingo</span> American language-learning website and mobile app

Duolingo is an American educational technology company which produces learning apps and provides language certification.

Culturomics is a form of computational lexicology that studies human behavior and cultural trends through the quantitative analysis of digitized texts. Researchers data mine large digital archives to investigate cultural phenomena reflected in language and word usage. The term is an American neologism first described in a 2010 Science article called Quantitative Analysis of Culture Using Millions of Digitized Books, co-authored by Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden.

<span class="mw-page-title-main">Word embedding</span> Method in natural language processing

In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.

<span class="mw-page-title-main">Word2vec</span> Models used to produce word embeddings

Word2vec is a technique for natural language processing (NLP) published in 2013. The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. As the name implies, word2vec represents each distinct word with a particular list of numbers called a vector. The vectors are chosen carefully such that they capture the semantic and syntactic qualities of words; as such, a simple mathematical function can indicate the level of semantic similarity between the words represented by those vectors.

Crowdsource is a crowdsourcing platform developed by Google intended to improve a host of Google services through the user-facing training of different algorithms.

Bidirectional Encoder Representations from Transformers (BERT) is a family of language models introduced in 2018 by researchers at Google. A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in Natural Language Processing (NLP) experiments counting over 150 research publications analyzing and improving the model."

References

  1. "Solving the web's image problem". bbc. 2008-05-14. Retrieved 2008-12-14.
  2. "Rethinking the ESP Game". September 2009. p. 11.
  3. Google Tech Talk on Human Computation by creator Luis von Ahn
  4. Cristianini, Nello (2023). The shortcut: why intelligent machines do not think like us. Boca Raton. ISBN   978-1-003-33581-8. OCLC   1352480147.
  5. 1 2 Google Tech Talk on human computation by Luis von Ahn
  6. Luis von Ahn. "Human Computation". 2005