Tag: AI training dataset

  • Three Authors File Lawsuit Against Nvidia in Unusual AI Copyright Dispute

    Three writers have sued chipmaker Nvidia, alleging that their copyrighted works were used without permission to train NeMo, Nvidia's AI platform. Brian Keene, Abdi Nazemian, and Stewart O’Nan say their books were part of a dataset of nearly 200,000 books used to train NeMo's language models.

    Lawsuit Highlights

    The authors claim that Nvidia's removal of the dataset in October 2023, following copyright infringement assertions, is an implicit admission of wrongdoing. The case could encourage similar lawsuits as AI technologies become more prevalent.

    In their class-action lawsuit filed in a San Francisco federal court, the trio seeks unspecified damages on behalf of US authors whose copyrighted material may have contributed to the training of NeMo's language models over the past three years. Specific works such as Keene’s “Ghost Walk” (2008), Nazemian’s “Like a Love Story” (2019), and O’Nan’s “Last Night at the Lobster” (2007) were highlighted in the lawsuit as examples of allegedly misappropriated content.

    Industry Implications

    The lawsuit adds Nvidia to a growing list of companies facing legal challenges from content creators and major media organizations such as the New York Times. The core issue is generative AI technology, which produces new content after learning from existing text, images, and audio.

    As of the most recent reports, Nvidia had not commented on the matter, and the authors' legal representatives had not responded to requests for additional details.

  • AI Training Dataset Sparks Copyright Infringement Claims Against Meta

    Meta Platforms, the parent company of Facebook and Instagram, is facing a consolidated lawsuit from notable authors including Sarah Silverman and Michael Chabon. The lawsuit accuses Meta of copyright infringement for the unauthorized use of thousands of copyrighted books to train its artificial intelligence language model, Llama.

    The crux of the accusation is that the dataset used to train Llama contained copyrighted books that Meta had no authorization to use.

    Meta’s own legal team reportedly issued stern warnings about the legal risks of using pirated books for AI training, but the company went ahead with the dataset anyway. That decision has further complicated Meta's position in the copyright infringement case.

    Evidence from Chat Logs

    Chat logs have also emerged as evidence in the case. They record discussions between Meta-affiliated researcher Tim Dettmers and others in which the procurement of the dataset is mentioned. The conversations took place in a Discord server and shed light on how the copyrighted books were acquired for training the AI language model.