Mark Zuckerberg Used “Pirated” Books to Train Meta’s AI Model

January 10, 2025

In the latest court filing, a group of authors accused Meta’s CEO, Mark Zuckerberg, of approving the use of “pirated” versions of copyright-protected books to train the company’s artificial intelligence models, despite internal concerns flagged by his executive team.

The filing ignites a fresh controversy for Mark Zuckerberg amid the ongoing debate around the ethics of AI training datasets. The legal challenge claims that Meta used the LibGen dataset (short for Library Genesis), a “shadow library” known for hosting millions of pirated books and articles.

Controversial AI Models: A New Challenge for Mark Zuckerberg

Authors including Sarah Silverman and Ta-Nehisi Coates argued that Meta’s reliance on such shadow libraries violates copyright laws and jeopardizes the livelihoods of creative professionals.

(Source: Court Filing)

According to the filing, internal communications within Meta reveal that Zuckerberg gave the green light for the use of the controversial dataset. A memo, referring to him by his initials, stated that the decision was escalated to “MZ” before being approved. This approval reportedly came despite warnings from Meta’s AI team about the potential risks of using a dataset “we know to be pirated.”

The lawsuit argues that this decision undermines Meta’s credibility in negotiations with regulators. It also raises concerns over the training of large language models (LLMs) such as LLaMA, which underpins Meta’s AI-powered chatbots.

Internal Meta communications included in the court documents show engineers expressing discomfort about accessing LibGen data. In one of the conversations, an engineer stated that downloading torrents from a corporate laptop “doesn’t feel right”.

LibGen has long been a controversial topic in the publishing world. The platform’s anonymous operators have been embroiled in multiple legal battles. A New York federal court previously ordered them to pay $30 million in damages to a group of publishers for copyright infringement.

Rajpalsinh Parmar

Rajpalsinh has been decoding the AI universe for three years, turning tech jargon into tales of wonder and possibility. With a knack for making the abstract tangible, he brings AI's potential to life for everyone.