Appen declined to give an attributable comment.
“If we suspect a user has violated the User Agreement, Toloka will perform an identity check and request a photo ID and a photo of the user holding the ID,” Geo Dzhikaev, head of Toloka operations, says.
Driven by a global rush into AI, the global data labeling and collection industry is expected to grow to over $17.1 billion by 2030, according to Grand View Research, a market research and consulting company. Crowdsourcing platforms such as Toloka, Appen, Clickworker, Teemwork.AI, and OneForma connect millions of remote gig workers in the global south to tech companies located in Silicon Valley. Platforms post micro-tasks from their tech clients, which have included Amazon, Microsoft Azure, Salesforce, Google, Nvidia, Boeing, and Adobe. Many platforms also partner with Microsoft’s own data services platform, the Universal Human Relevance System (UHRS).
These workers are predominantly based in East Africa, Venezuela, Pakistan, India, and the Philippines—though there are even workers in refugee camps, who label, evaluate, and generate data. Workers are paid per task, with remuneration ranging from a cent to a few dollars—although the upper end is considered something of a rare gem, workers say. “The nature of the work often feels like digital servitude—but it’s a necessity for earning a livelihood,” says Hassan, who also now works for Clickworker and Appen.
Sometimes, workers are asked to upload audio, images, and videos, which contribute to the data sets used to train AI. Workers typically don’t know exactly how their submissions will be processed, but these can be pretty personal: On Clickworker’s worker jobs tab, one task states: “Show us you baby/child! Help to teach AI by taking 5 photos of your baby/child!” for €2 ($2.15). The next says: “Let your minor (aged 13-17) take part in an interesting selfie project!”
Some tasks involve content moderation—helping AI distinguish between innocent content and that which contains violence, hate speech, or adult imagery. Hassan shared screen recordings of tasks available the day he spoke with WIRED. One UHRS task asked him to identify “fuck,” “c**t,” “dick,” and “bitch” from a body of text. For Toloka, he was shown pages upon pages of partially naked bodies, including sexualized images, lingerie ads, an exposed sculpture, and even a nude body from a Renaissance-style painting. The task? Decipher the adult from the benign, to help the algorithm distinguish between salacious and permissible torsos.