Amazon Mechanical Turk (MTurk) is a crowdsourcing website through which businesses can hire remotely located "crowdworkers" to perform discrete on-demand tasks that computers are currently unable to do as economically. It is owned by Amazon and operated under Amazon Web Services. [1] Employers, known as requesters, post jobs known as Human Intelligence Tasks (HITs), such as identifying specific content in an image or video, writing product descriptions, or answering survey questions. Workers, colloquially known as Turkers or crowdworkers, browse among existing jobs and complete them in exchange for a fee set by the requester. To place jobs, requesters use an open application programming interface (API) or the more limited MTurk Requester site. [2] As of April 2019, requesters could register from 49 approved countries. [3]
The service was conceived by Venky Harinarayan in a U.S. patent disclosure in 2001. [4] Amazon coined the term artificial artificial intelligence for processes that outsource parts of a computer program to humans, covering tasks that humans carry out much faster than computers. It has been claimed, though not definitively attributed, that Jeff Bezos was responsible for proposing the development of Amazon's Mechanical Turk to realize this process. [5]
The name Mechanical Turk was inspired by "The Turk", an 18th-century chess-playing automaton made by Wolfgang von Kempelen that toured Europe, and beat both Napoleon Bonaparte and Benjamin Franklin. It was later revealed that this "machine" was not an automaton, but a human chess master hidden in the cabinet beneath the board and controlling the movements of a humanoid dummy. Analogously, the Mechanical Turk online service uses remote human labor hidden behind a computer interface to help employers perform tasks that are not possible using a true machine.
MTurk launched publicly on November 2, 2005. Its user base grew quickly. In early- to mid-November 2005, there were tens of thousands of jobs, all uploaded to the system by Amazon itself for some of its internal tasks that required human intelligence. HIT types expanded to include transcribing, rating, image tagging, surveys, and writing.
In March 2007, there were reportedly more than 100,000 workers in over 100 countries. [6] This increased to over 500,000 registered workers from over 190 countries in January 2011. [7] That year, Techlist published an interactive map pinpointing the locations of 50,000 MTurk workers around the world. [8] By 2018, research demonstrated that while over 100,000 workers were available on the platform at any time, only around 2,000 were actively working. [9]
A user of Mechanical Turk can be either a "Worker" (contractor) or a "Requester" (employer). Workers have access to a dashboard that displays three sections: total earnings, HIT status, and HIT totals. Workers set their own hours and are not under any obligation to accept any particular task.
Amazon classifies Workers as contractors rather than employees and does not pay payroll taxes. Classifying Workers as contractors allows Amazon to avoid obligations such as minimum wage, overtime pay, and workers' compensation, a practice common among "gig economy" platforms. Workers are legally required to report their income as self-employment income.
In 2013, the average wage for the multiple microtasks assigned, if performed quickly, was about one dollar an hour, with each task averaging a few cents. [10] However, calculating average hourly earnings on a microtask site is extremely difficult, and several sources of data show average hourly earnings in the $5–$9 per hour range among a substantial number of Workers, [11] [12] [13] [14] while the most experienced, active, and proficient workers may earn over $20 per hour. [15]
Workers can have a postal address anywhere in the world. Payment for completing tasks can be redeemed on Amazon.com via gift certificate (the only payment option available to international workers, except those in India) or can be transferred to a Worker's U.S. bank account.
Requesters can require that Workers hold certain qualifications before engaging in a task, and they can establish a test designed to verify those qualifications. They can also accept or reject the result sent by the Worker, which affects the Worker's reputation. As of April 2019, Requesters paid Amazon a minimum 20% commission on the price of successfully completed jobs, with higher rates for additional services. [6] Requesters can use the Amazon Mechanical Turk API to programmatically integrate the results of the work directly into their business processes and systems. When employers set up a job, they must specify its parameters, such as the payment per task and any required qualifications, as well as the specific details about the job they want completed.
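The commission structure described above can be sketched as a small calculation. The 20% minimum rate comes from the text; the function name and the example reward values are illustrative, and actual commissions can be higher for additional services.

```python
def requester_cost(reward_per_hit, num_assignments, commission=0.20):
    """Estimate a requester's total cost: worker rewards plus Amazon's commission.

    commission=0.20 reflects the minimum 20% rate; Amazon charges more
    for additional services, so treat this as a lower bound.
    """
    rewards = reward_per_hit * num_assignments
    fee = rewards * commission
    return round(rewards + fee, 2)

# Example: 1,000 assignments at $0.05 each costs $50 in rewards plus a $10 fee.
print(requester_cost(0.05, 1000))  # 60.0
```

This kind of estimate is useful when budgeting a batch of HITs, since the commission applies on top of the per-task reward shown to workers.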
Workers have been primarily located in the United States since the platform's inception, [16] with demographics generally similar to those of the overall Internet population in the U.S. [17] Within the U.S., workers are fairly evenly spread across states, proportional to each state's share of the U.S. population. [18] As of 2019, between 15,000 and 30,000 people in the U.S. complete at least one HIT each month, and about 4,500 new people join MTurk each month. [19]
Cash payments for Indian workers were introduced in 2010, which shifted the demographics of the workforce, though workers remained primarily within the United States. [20] A website showing worker demographics in May 2015 indicated that 80% of workers were located in the United States, with the remaining 20% located elsewhere in the world, most of them in India. [21] In May 2019, approximately 60% were in the U.S. and 40% elsewhere (approximately 30% in India). [22] In early 2023, about 90% of workers were from the U.S., and about half of the remainder were from India. [23]
Since 2010, numerous researchers have explored the viability of Mechanical Turk for recruiting subjects for social science experiments. Researchers have generally found that while samples of respondents obtained through Mechanical Turk do not perfectly match all relevant characteristics of the U.S. population, they are also not wildly unrepresentative. [24] [25] As a result, thousands of papers that rely on data collected from Mechanical Turk workers are published each year, including hundreds in top-ranked academic journals.
A challenge in using MTurk for human-subject research has been maintaining data quality. A study published in 2021 found that the quality-control approaches researchers use (such as checking for bots, VPN users, or workers willing to submit dishonest responses) can meaningfully influence survey results; the authors demonstrated this by measuring the impact on three common behavioral and mental health screening tools. [26] Although managing data quality requires effort from researchers, a large body of research shows how to gather high-quality data from MTurk. [27] Because the cost of using MTurk is considerably lower than that of many other survey methods, many researchers continue to use it.
The general consensus among researchers is that the service works best for recruiting a diverse sample; it is less successful with studies that require more precisely defined populations or that require a representative sample of the population as a whole. [28] Many papers have been published on the demographics of the MTurk population. [18] [29] [30] MTurk workers tend to be younger, more educated, more liberal, and slightly less wealthy than the U.S. population overall. [31]
Supervised machine learning algorithms require large amounts of human-annotated data to be trained successfully. Machine learning researchers have hired Workers through Mechanical Turk to produce datasets such as SQuAD, a question answering dataset. [32]
Since 2007, the service has been used to search for prominent missing individuals. This use was first suggested during the search for James Kim, but his body was found before any technical progress was made. That summer, computer scientist Jim Gray disappeared on his yacht and Amazon's Werner Vogels, a personal friend, made arrangements for DigitalGlobe, which provides satellite data for Google Maps and Google Earth, to put recent photography of the Farallon Islands on Mechanical Turk. A front-page story on Digg attracted 12,000 searchers who worked with imaging professionals on the same data. The search was unsuccessful. [33]
In September 2007, a similar arrangement was repeated in the search for aviator Steve Fossett. Satellite data was divided into 85-square-metre (910 sq ft) sections, and Mechanical Turk users were asked to flag images with "foreign objects" that might be a crash site or other evidence that should be examined more closely. [34] This search was also unsuccessful. The satellite imagery was mostly within a 50-mile radius, [35] but the crash site was eventually found by hikers about a year later, 65 miles away. [36]
MTurk has also been used as a tool for artistic creation. One of the first artists to work with Mechanical Turk was xtine burrough, with The Mechanical Olympics (2008), [37] [38] Endless Om (2015), and Mediations on Digital Labor (2015). [39] Another work was artist Aaron Koblin's Ten Thousand Cents (2008), in which thousands of workers each drew a tiny fragment of a $100 bill for one cent apiece.
Programmers have developed browser extensions and scripts designed to simplify the process of completing jobs. Amazon has stated that it disapproves of scripts that completely automate the process and remove the human element, out of concern that the task completion process, such as answering a survey, could be gamed with random responses, rendering the collected data worthless. [40] Accounts using such automated bots have been banned. Third-party services also exist that extend MTurk's capabilities.
Amazon makes available an application programming interface (API) for the MTurk system. The MTurk API lets a programmer submit jobs, retrieve completed work, and approve or reject that work. [41] In 2017, Amazon launched support for the AWS Software Development Kits (SDKs), making nine SDKs available to MTurk users. MTurk is accessible via API from the following languages: Python, JavaScript, Java, .NET, Go, Ruby, PHP, and C++. [42] Web sites and web services can use the API to integrate MTurk work into other web applications, providing users with alternatives to the interface Amazon has built for these functions.
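The submit/retrieve/approve cycle described above can be sketched with the Python SDK (boto3), one of the SDKs mentioned in the text. The title, reward, and timing values below are illustrative, and the sketch targets the MTurk sandbox endpoint so no real money is spent; running the lifecycle function requires AWS credentials and the boto3 package.

```python
def build_hit_params(title, description, reward, question_xml,
                     max_assignments=3, duration_s=600, lifetime_s=86400):
    """Assemble keyword arguments for the MTurk CreateHIT operation.

    Pure Python: no AWS connection is needed to build or inspect these.
    """
    return {
        "Title": title,
        "Description": description,
        "Reward": f"{reward:.2f}",  # MTurk expects the reward as a string, e.g. "0.05"
        "Question": question_xml,
        "MaxAssignments": max_assignments,
        "AssignmentDurationInSeconds": duration_s,
        "LifetimeInSeconds": lifetime_s,
    }

def run_hit_lifecycle(params):
    """Submit a HIT, fetch submitted work, and approve it (needs AWS credentials)."""
    import boto3  # imported lazily so the sketch is readable without the SDK installed
    client = boto3.client(
        "mturk",
        endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
    )
    hit = client.create_hit(**params)
    hit_id = hit["HIT"]["HITId"]
    submitted = client.list_assignments_for_hit(
        HITId=hit_id, AssignmentStatuses=["Submitted"]
    )
    for assignment in submitted["Assignments"]:
        client.approve_assignment(AssignmentId=assignment["AssignmentId"])
    return hit_id
```

In practice a requester would poll for assignments until `MaxAssignments` have been submitted, and reject rather than approve work that fails validation, which is the reputation mechanism the article describes.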
Amazon Mechanical Turk provides a platform for processing images, a task well-suited to human intelligence. Requesters have created tasks that ask workers to label objects found in an image, select the most relevant picture in a group of pictures, screen inappropriate content, classify objects in satellite images, or digitize text from images such as scanned forms filled out by hand. [43]
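Because labeling tasks like those above are typically assigned to several workers, requesters commonly aggregate the redundant answers. A simple majority vote, sketched below under the assumption that each assignment yields one label per worker, is a common baseline.

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label among worker answers, with its vote share."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

# Four workers labeled the same image; three said "cat".
print(majority_label(["cat", "cat", "dog", "cat"]))  # ('cat', 0.75)
```

The vote share gives a rough confidence signal: low-agreement items can be re-posted for additional labels or reviewed by the requester directly.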
Companies with large online catalogues use Mechanical Turk to identify duplicates and verify details of item entries. For example: removing duplicates in yellow pages directory listings, checking restaurant details (e.g. phone number and hours), and finding contact information from web pages (e.g. author name and email). [10] [43]
The diversity and scale of the Mechanical Turk workforce allow information to be collected at a scale that would be difficult outside a crowd platform. Mechanical Turk allows Requesters to amass a large number of responses to various types of surveys, from basic demographics to academic research. Other uses include writing comments, descriptions, and blog entries for websites, and searching for data elements or specific fields in large government and legal documents. [43]
Companies use Mechanical Turk's crowd labor to understand and respond to different types of data. Common uses include editing and transcription of podcasts, translation, and matching search engine results. [10] [43]
The validity of research conducted with the Mechanical Turk worker pool has long been debated among experts. [44] This is largely because questions of validity [45] are complex: they involve not only questions of whether the research methods were appropriate and whether the study was well-executed, but also questions about the goal of the project, how the researchers used MTurk, who was sampled, and what conclusions were drawn.
Most experts agree that MTurk is better suited for some types of research than others. MTurk appears well-suited for questions that seek to understand whether two or more things are related to each other (called correlational research; e.g., are happy people more healthy?) and questions that attempt to show one thing causes another thing (experimental research; e.g., being happy makes people more healthy). Fortunately, these categories capture most of the research conducted by behavioral scientists, and most correlational and experimental findings found in nationally representative samples replicate on MTurk. [46]
The type of research that is not well-suited for MTurk is often called "descriptive research." Descriptive research seeks to describe how or what people think, feel, or do; one example is public opinion polling. MTurk is not well-suited to such research because it does not select a representative sample of the general population. Instead, MTurk provides a nonprobability convenience sample, drawn from whoever happens to be available rather than selected at random. Descriptive research is best conducted with a probability-based, representative sample of the population researchers want to understand. Compared to the general population, people on MTurk are younger, more highly educated, more liberal, and less religious. [47] [18] [30]
Mechanical Turk has been criticized by journalists and activists for its interactions with and use of labor. Computer scientist Jaron Lanier noted how the design of Mechanical Turk "allows you to think of the people as software components" in a way that conjures "a sense of magic, as if you can just pluck results out of the cloud at an incredibly low cost". [48] A similar point is made in the book Ghost Work by Mary L. Gray and Siddharth Suri. [49]
Critics of MTurk argue that workers are forced onto the site by precarious economic conditions and then exploited by requesters with low wages and a lack of power when disputes occur. Journalist Alana Semuels’s article "The Internet Is Enabling a New Kind of Poorly Paid Hell" in The Atlantic is typical of such criticisms of MTurk. [50]
Some academic papers have reported findings that support or serve as the basis for such common criticisms, [51] but others contradict them. [52] A recent academic commentary argued that, as a matter of ethics, study participants on sites like MTurk should be clearly warned about the circumstances in which they might later be denied payment, [53] even though such statements may not reduce the rate of careless responding. [54]
A paper published by a team at CloudResearch [14] shows that only about 7% of people on MTurk view completing HITs as something akin to a full-time job. Most people report that MTurk is a way to earn money during their leisure time or as a side gig. In 2019, the typical worker spent five to eight hours per week on the site and earned around $7 per hour. The sampled workers did not report widespread mistreatment at the hands of requesters; in fact, they reported trusting requesters more than employers outside of MTurk. Similar findings were presented in a review of MTurk by the Fair Crowd Work organization, a collective of crowd workers and unions. [55]
The minimum payment that Amazon allows for a task is one cent. Because tasks are typically simple and repetitive, the majority pay only a few cents, [56] though well-paying tasks also exist on the site.
Many criticisms of MTurk stem from the fact that a majority of tasks offer low wages. In addition, workers are considered independent contractors rather than employees, and in the United States independent contractors are not protected by the Fair Labor Standards Act or other legislation that protects workers' rights. Workers on MTurk must also compete with others for good HIT opportunities and spend uncompensated time searching for tasks and performing other unpaid actions.
The low payment offered for many tasks has fueled criticism of Mechanical Turk for exploiting and not compensating workers for the true value of the task they complete. [57] One study of 3.8 million tasks completed by 2,767 workers showed that "workers earned a median hourly wage of about $2 an hour" with 4% of workers earning more than $7.25 per hour. [58]
The Pew Research Center and the International Labour Office published data indicating people made around $5.00 per hour in 2015. [12] [59] A study focused on workers in the U.S. indicated average wages of at least $5.70 an hour, [60] and data from the CloudResearch study found average wages of about $6.61 per hour. [14] Some evidence suggests that very active and experienced people can earn $20 per hour or more. [61]
The Nation magazine reported in 2014 that some Requesters had taken advantage of Workers by having them do the tasks, then rejecting their submissions in order to avoid paying them. [62] Available data indicates that rejections are fairly rare. Workers report having a small minority of their HITs rejected, perhaps as low as 1%. [14]
In the Facebook–Cambridge Analytica data scandal, Mechanical Turk was one of the means of covertly gathering private information for a massive database. [63] The system paid people a dollar or two to install a Facebook-connected app and answer personal questions. Although the survey task appeared to be part of a demographic or psychological research project, its actual purpose was to induce workers to reveal personal information about their identities that Facebook and Mechanical Turk had not already collected.
Others have criticized the marketplace for not allowing workers to negotiate with employers. In response to criticisms of payment evasion and lack of representation, a group developed a third-party platform called Turkopticon, which allows workers to give feedback on their employers, helping workers avoid potentially unscrupulous jobs and recommend superior employers. [64] [65] Another platform, Dynamo, allows workers to gather anonymously and organize campaigns to improve their work environment, such as the Guidelines for Academic Requesters and the Dear Jeff Bezos Campaign. [66] [67] [68] [69] Amazon made it harder for workers to enroll in Dynamo by closing the requester account that provided workers with a code required for Dynamo membership. Workers have also created third-party plugins to identify higher-paying tasks, but Amazon updated its website to prevent these plugins from working. [70] Workers have complained that Amazon's payment system occasionally stops working. [70]
Mechanical Turk is comparable in some respects to the now-discontinued Google Answers service. However, Mechanical Turk is a more general marketplace that can potentially help distribute any kind of work task around the world. The Collaborative Human Interpreter (CHI) by Philipp Lenssen also suggested using distributed human intelligence to help computer programs perform tasks that computers cannot do well; MTurk could serve as the execution engine for the CHI.
In 2014, the Russian search company Yandex launched Toloka, a similar crowdsourcing platform. [71]
Social psychology is the scientific study of how thoughts, feelings, and behaviors are influenced by the actual, imagined, or implied presence of others. Social psychologists typically explain human behavior as a result of the relationship between mental states and social situations, studying the social conditions under which thoughts, feelings, and behaviors occur, and how these variables influence social interactions.
Survey methodology is "the study of survey methods". As a field of applied statistics concentrating on human-research surveys, survey methodology studies the sampling of individual units from a population and associated techniques of survey data collection, such as questionnaire construction and methods for improving the number and accuracy of responses to surveys. Survey methodology targets instruments or procedures that ask one or more questions that may or may not be answered.
The Hawthorne effect is a type of human behavior reactivity in which individuals modify an aspect of their behavior in response to their awareness of being observed. The effect was discovered in the context of research conducted at the Hawthorne Western Electric plant; however, some scholars think the descriptions are fictitious.
Freelance, freelancer, or freelance worker, are terms commonly used for a person who is self-employed and not necessarily committed to a particular employer long-term. Freelance workers are sometimes represented by a company or a temporary agency that resells freelance labor to clients; others work independently or use professional associations or websites to get work.
The overjustification effect occurs when an expected external incentive such as money or prizes decreases a person's intrinsic motivation to perform a task. Overjustification is an explanation for the phenomenon known as motivational "crowding out". The overall effect of offering a reward for a previously unrewarded activity is a shift to extrinsic motivation and the undermining of pre-existing intrinsic motivation. Once rewards are no longer offered, interest in the activity is lost; prior intrinsic motivation does not return, and extrinsic rewards must be continuously offered as motivation to sustain the activity.
Human-based computation (HBC), human-assisted computation, ubiquitous human computing, or distributed thinking is a computer science technique in which a machine performs its function by outsourcing certain steps to humans, usually as microwork. This approach uses differences in abilities and alternative costs between humans and computer agents to achieve symbiotic human–computer interaction. For computationally difficult tasks such as image recognition, human-based computation plays a central role in training deep-learning-based artificial intelligence systems. In this case, human-based computation has been referred to as human-aided artificial intelligence.
Crowdsourcing involves a large group of dispersed participants contributing or producing goods or services—including ideas, votes, micro-tasks, and finances—for payment or as volunteers. Contemporary crowdsourcing often involves digital platforms to attract and divide work between participants to achieve a cumulative result. Crowdsourcing is not limited to online activity, however, and there are various historical examples of crowdsourcing. The word crowdsourcing is a portmanteau of "crowd" and "outsourcing". In contrast to outsourcing, crowdsourcing usually involves less specific and more public groups of participants.
Workplace wellness, also known as corporate wellbeing outside the United States, is a broad term used to describe activities, programs, and/or organizational policies designed to support healthy behavior in the workplace. This often involves health education, medical screenings, weight management programs, and onsite fitness programs or facilities. It can also include flex-time for exercise, providing onsite kitchen and eating areas, offering healthy food options in vending machines, holding "walk and talk" meetings, and offering financial and other incentives for participation.
Crowdreviewing is the practice of gathering opinion or feedback from a large number of people, typically via the internet or an online community; a portmanteau of "crowd" and "reviews". Crowdreviewing is also often viewed as a form of crowd voting which occurs when a website gathers a large group's opinions and judgment. The concept is based on the principles of crowdsourcing and lets users submit online reviews to participate in building online metrics that measure performance. By harnessing social collaboration in the form of feedback individuals are generally able to form a more informed opinion.
An online panel is a group of selected research participants who have agreed to provide information at specified intervals over an extended period of time.
In research of human subjects, a survey is a list of questions aimed for extracting specific data from a particular group of people. Surveys may be conducted by phone, mail, via the internet, and also in person in public spaces. Surveys are used to gather or gain knowledge in fields such as social research and demography.
Figure Eight was a human-in-the-loop machine learning and artificial intelligence company based in San Francisco.
Microwork is a series of many small tasks which together comprise a large unified project, completed by many people over the Internet. Microwork is considered the smallest unit of work in a virtual assembly line. It is most often used to describe tasks for which no efficient algorithm has been devised and which require human intelligence to complete reliably. The term was coined in 2008 by Leila Chirayath Janah of Samasource.
Panagiotis G. Ipeirotis is a professor and George A. Kellner Faculty Fellow at the Department of Technology, Operations, and Statistics at Leonard N. Stern School of Business of New York University.
Ergonomics, also known as human factors or human factors engineering (HFE), is the application of psychological and physiological principles to the engineering and design of products, processes, and systems. Primary goals of human factors engineering are to reduce human error, increase productivity and system availability, and enhance safety, health and comfort with a specific focus on the interaction between the human and equipment.
The International Affective Picture System (IAPS) is a database of pictures designed to provide a standardized set of pictures for studying emotion and attention that has been widely used in psychological research. The IAPS was developed by the National Institute of Mental Health Center for Emotion and Attention at the University of Florida. In 2005, the IAPS comprised 956 color photographs, ranging from everyday objects and scenes (such as household furniture and landscapes) to extremely rare or exciting scenes (such as mutilated bodies and erotic nudes).
Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of it with informative tags. For example, a data label might indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, or whether a dot in an X-ray is a tumor.
Crowdsourced science refers to collaborative contributions of a large group of people to the different steps of the research process in science. In psychology, the nature and scope of the collaborations can vary in their application and in the benefits it offers.
Lilly Christine Irani is an Iranian-American academic whose research spans topics in computer science, communication studies, feminist studies, entrepreneurship, and microwork. She is an associate professor in the Department of Communication at the University of California, San Diego.