The pursuit of building intelligent, superhuman machines is nothing new. One Jewish folktale from the early 1900s describes the creation of a golem, an inanimate humanoid imbued with life by Rabbi Loew of Prague to protect the local Jews from anti-Semitic attacks.
The story’s consequences are predictable: the golem runs amok and is ultimately undone by its creator. The tale is reminiscent of Mary Shelley’s Frankenstein, the modern novel that helped birth the science-fiction genre, and of the AI discourse in recent news cycles, which is growing ever more preoccupied with the dangers of rogue AI.
Today, real-world AI is less autonomous and more an assistive technology. Since about 2009, a boom in technical advances has been fueled by the voluminous data generated by our intensive use of connected devices and the internet, as well as the growing power of silicon chips. In particular, this has led to the rise of a subtype of AI known as machine learning, and its descendant deep learning: methods of teaching computer software to spot statistical correlations in enormous pools of data—be they words, images, code or numbers.
One way to spot patterns is to show AI models millions of labelled examples. This method requires humans to painstakingly label all of this data so it can be analyzed by computers. Without these labels, the algorithms that underpin self-driving cars or facial recognition remain blind: they cannot learn patterns.
The algorithms built in this way now augment or stand in for human judgement in areas as varied as medicine, criminal justice, social welfare, and mortgage and loan decisions. Generative AI, the latest iteration of AI software, can create words, code and images. This has transformed these systems into creative assistants, helping teachers, financial advisers, lawyers, artists and programmers to co-create original works.
To build AI, Silicon Valley’s most illustrious companies are fighting over the limited pool of computer-science talent in their backyard, paying hundreds of thousands of dollars to a newly minted Ph.D. But to train and deploy AI models using real-world data, these same companies have turned to the likes of Sama, with its veritable armies of low-wage workers who have basic digital literacy but no stable employment.
Sama isn’t the only service of its kind globally. Start-ups such as Scale AI, Appen, Hive Micro, iMerit and Mighty AI (now owned by Uber), and more traditional IT companies such as Accenture and Wipro are all part of this growing industry, estimated to be worth $17bn by 2030.
Because of the sheer volume of data that AI companies need labelled, most start-ups outsource their services to lower-income countries, where hundreds of workers like Ian and Benja are paid to sift through and interpret the data that trains AI systems.
Displaced Syrian doctors train medical software that helps diagnose prostate cancer in Britain. Out-of-work college graduates in recession-hit Venezuela categorize fashion products for e-commerce sites. Impoverished women in Metiabruz, a poor Muslim neighborhood of Kolkata, have labelled voice clips for Amazon’s Echo speaker. Their work lays bare a badly kept secret about so-called artificial intelligence systems—that the technology does not ‘learn’ independently, and that it needs humans, millions of them, to power it. Data workers are the invaluable human links in the global AI supply chain.
This workforce is largely fragmented and made up of the most precarious workers in society: disadvantaged youth, women with dependents, minorities, migrants and refugees. The stated goal of AI companies and the outsourcers they work with is to include these communities in the digital revolution, giving them stable and ethical employment despite their precarity. Yet, as I came to discover, data workers are as precarious as factory workers; their labor is largely ghost work, and they remain an undervalued bedrock of the AI industry.
As this community emerges from the shadows, journalists and academics are beginning to understand how these globally dispersed workers impact our daily lives. The wildly popular content generated by AI chatbots like ChatGPT, the content we scroll through on TikTok, Instagram and YouTube, the items we browse when shopping online, the vehicles we drive, even the food we eat: all of it is sorted, labelled and categorized with the help of data workers.
Milagros Miceli, an Argentinian researcher based in Berlin, studies the ethnography of data work in the developing world. When she started out, she couldn’t find anything about the lived experience of AI laborers, nothing about who these people actually were and what their work was like. “As a sociologist, I felt it was a big gap,” she says. “There are few who are putting a face to those people: who are they and how do they do their jobs, what do their work practices involve? And what are the labor conditions that they are subject to?”
Miceli was right—it was hard to find a company that would allow me access to its data laborers with minimal interference. Secrecy is often written into workers’ contracts in the form of non-disclosure agreements that forbid direct contact with clients and public disclosure of clients’ names; this is usually imposed by the clients rather than by the outsourcing companies. For instance, Meta, the owner of Facebook and a client of Sama, asks workers to sign non-disclosure agreements. Often, workers may not even know who their client is, what type of algorithmic system they are working on, or what their counterparts in other parts of the world are paid for the same job.
The arrangements of a company like Sama—low wages, secrecy, extraction of labor from vulnerable communities—veer towards inequality. After all, this is ultimately affordable labor. Providing employment to minorities and slum youth may be empowering and uplifting to a point, but these workers are also comparatively inexpensive, with almost no bargaining power, leverage or resources to rebel.
Even the objective of data-labelling work felt extractive: it trains AI systems that will eventually replace the very humans doing the training. But of the dozens of workers I spoke to over the course of two years, not one was aware of the implication of training their replacements: that they were being paid to hasten their own obsolescence.
“These people are so dependent on these jobs, that they become obedient to whatever the client says. They are prepared not to think about whether what they’re doing makes sense, or is ethically questionable, but trained to think simply of what the client may want,” Miceli told me. AI development is a booming business, and companies in the data-labelling industry are competing to be as inexpensive as possible, providing labor to massive corporations and flush start-ups for a few pennies per task.
“It needs to be said—the technology industry is growing and benefiting from this cheap labor.”
__________________________________
Excerpted from Code Dependent: Living in the Shadow of AI by Madhumita Murgia. Published by Henry Holt and Company, an imprint of Macmillan, Inc. Copyright © Madhumita Murgia 2024. All rights reserved.