NVIDIA Accused of Using Pirated Books for AI Training

NVIDIA executives allegedly authorized the use of millions of pirated books to train the company’s artificial intelligence models, according to an expanded class-action lawsuit cited in a report by TorrentFreak.

The amended complaint, filed last Friday by several authors, claims NVIDIA directly contacted the shadow library Anna’s Archive to obtain high-speed access to its collection of pirated books. The lawsuit significantly broadens earlier claims by adding more authors, books, AI models, and alleged sources of copyrighted material.

Internal Documents Cited

The authors, including Abdi Nazemian, cite internal NVIDIA emails and documents that they say show the company knowingly downloaded large volumes of copyrighted books. The complaint alleges that competitive pressure in the AI sector led NVIDIA to pursue pirated materials as training data.

According to the filing, a member of NVIDIA’s data strategy team contacted Anna’s Archive to explore what the library could offer for large-scale AI training. The complaint states that NVIDIA was “desperate for books” and sought to include Anna’s Archive content in the pre-training data for its large language models.

Alleged Warnings and Approval Process

The complaint claims Anna’s Archive informed NVIDIA that its collections were illegally obtained and maintained. The shadow library reportedly asked whether NVIDIA executives had internal authorization to proceed, saying it had previously wasted time on discussions with other AI companies.

According to the lawsuit, NVIDIA management gave its approval within a week of the initial contact, despite the warning that the materials were illegal. The complaint says Anna’s Archive then provided NVIDIA with access to millions of pirated books.

The complaint alleges that Anna’s Archive offered access to approximately 500 terabytes of data, including millions of books typically available only through the Internet Archive’s digital lending system, which has itself faced legal challenges. The filing does not state whether NVIDIA ultimately paid Anna’s Archive for the access.

Allegations Extend Beyond Anna’s Archive

The amended complaint also accuses NVIDIA of using additional pirated sources. Beyond the previously cited Books3 dataset, the authors allege that NVIDIA downloaded copyrighted material from LibGen, Sci-Hub, and Z-Library.

The lawsuit further claims that NVIDIA distributed scripts and tools to corporate customers that enabled automated downloads of “The Pile,” a dataset that includes the Books3 collection of pirated books. According to the authors, this allowed third parties to access copyrighted materials as part of AI development workflows.

Published by
Afaq Wajdan Malik