AI Training Dataset Sparks Copyright Infringement Claims Against Meta

Meta Platforms, the parent company of social media giants Facebook and Instagram, is facing a consolidated lawsuit from prominent authors including Sarah Silverman and Michael Chabon. The suit accuses Meta of copyright infringement over the unauthorized use of thousands of copyrighted books to train its artificial intelligence language model, Llama.

At the heart of the case is the allegation that Meta trained Llama on this contested dataset of copyrighted books without obtaining permission from the rights holders.

Meta reportedly used the dataset anyway, despite stern warnings from its own legal team about the risks of training AI on pirated books. That decision has further complicated the company's position in the infringement case.

Evidence from Chat Logs

Adding to the complexity of the case, chat logs have surfaced as evidence. In them, Meta-affiliated researcher Tim Dettmers and others discuss procuring the dataset. The conversations took place on a Discord server and shed light on how the copyrighted books used to train the AI language model were acquired.

Scroll to Top