Do IP Rights Matter to AI? Keep an Eye on NY Times v. Microsoft and 23 Other Future-Defining AI Disputes

Legal battles currently being fought over aspects of generative artificial intelligence are an indication of the complexity and uncertainty that businesses and society are facing.

For AI the future is now. Taking your eye off of the legal ball is not a good idea.

Suits involve not only content “scraping,” the broad collection of copyrighted content and data to train platforms like ChatGPT, but also such topics as rights to a person’s image or voice and the manipulation of a copyrighted image. Fake images are an issue, too – and they are getting mighty hard to detect without professional tools and training.

User prompts also serve to train highly valued AI platforms, even though few people or businesses are aware of it.

AI has thrown a spotlight on creating and inventing with AI, both generative and machine learning. For example, how much of a part does a human have to play in a new invention to make it original and a patent valid? The same is true for a musical composition, image or sound recording.

Publishers, authors, image creators and others are filing lawsuits against generative AI companies for using their data to train AI systems without their permission or compensation. Do we need a new definition of “fair use,” or do we simply need companies, especially large, solvent ones, to acknowledge that they require huge amounts of content and should be paying something to use it?

Once AI content licenses become more routine, and they will (see the six different types of music licenses), don’t expect that tracking invention usage will be too far behind. The technology to monitor the use of patents, copyrights and trademarks already exists. Few potential licensees are inclined to use it. Inventors, content creation workers, SMEs and investors, as well as many in the general public, have been sensitized to the reality that their content, decisions and data are in play, and they have financial value.

Unlike the Internet in the 1990s, the newly infringed are likely to be smarter about what gets used, by whom and for how much.

Flurry of Disputes

The rise of advanced generative AI has spawned a flurry of copyright litigation. BakerHostetler’s case tracker is very useful for getting a lay of the legal/business land, identifying the likely plaintiffs and defendants, and understanding what the disputes might mean for research as well as investment.

Almost all involve major content distributors or publishers – The New York Times, Concord Music Group, Daily News, Getty Images, Thomson Reuters. Many involve large tech companies that are heavily invested in AI – OpenAI, Anthropic, Microsoft, GitHub, Meta, Stability AI, Google, Nvidia.

BH monitors these cases in near real-time, providing case overviews, current statuses and key legal filings.

Significant AI cases and what they are about:

Alter v. OpenAI: What started as three separate cases brought by three different author groups has been consolidated into a single action against OpenAI and Microsoft. (This case includes Authors Guild and Basbanes).

Plaintiffs alleged that OpenAI and Microsoft are liable for copyright infringement arising from the use of plaintiffs’ works to train defendants’ AI models. The Tremblay plaintiffs (from In re OpenAI ChatGPT Litigation) have filed a motion to intervene in this case as the first-filed class action case. Nos. 1:23-cv-08292, 1:23-cv-10211, 1:24-cv-00084 (S.D.N.Y.)

Andersen v. Stability AI: Visual artists filed this putative class action, alleging direct and induced copyright infringement, DMCA violations, false endorsement and trade dress claims based on the creation and functionality of Stability AI’s Stable Diffusion and DreamStudio, Midjourney Inc.’s eponymous generative AI tool, and DeviantArt’s DreamUp. Each of the four defendant groups has a pending motion to dismiss plaintiffs’ first amended complaint. No. 3:23-cv-00201 (N.D. Cal.)

Concord Music Group, Inc. v. Anthropic PBC: Several large music publishers sued Anthropic for direct and secondary copyright infringement and DMCA § 1202(b) violations, alleging that Anthropic improperly created and used unauthorized copies of copyrighted lyrics to train Claude and removed copyright management information (CMI) from these copies.

Plaintiffs also filed a motion for a preliminary injunction to preclude Anthropic from creating or using unauthorized copies of those lyrics to train future AI models. The parties are concurrently briefing plaintiffs’ motion for a preliminary injunction and Anthropic’s motion to dismiss (or in the alternative, transfer). No. 3:23-cv-01092 (M.D. Tenn.)

Daily News v. Microsoft: Newspaper publishers sued Microsoft and OpenAI in the Southern District of New York for direct, vicarious and contributory copyright infringement, DMCA violations, common law unfair competition, trademark dilution, and dilution and injury to business reputation. We are awaiting defendants’ responses.

Doe v. GitHub, Inc.: Anonymous plaintiffs filed this putative class action, alleging that GitHub, Microsoft and OpenAI used plaintiffs’ copyrighted materials to create Codex and Copilot. The current causes of action include DMCA violations, breach of contract for open-source software licenses, and breach of contract for violating GitHub terms. The parties are currently briefing defendants’ motions to dismiss. No. 4:22-cv-06823 (N.D. Cal.)

Getty Images v. Stability AI: Getty Images filed this lawsuit accusing Stability AI of infringing more than 12 million photographs, their associated captions and metadata, in building and offering Stable Diffusion and DreamStudio.

This case also includes trademark infringement allegations arising from the accused technology’s ability to replicate Getty Images’ watermarks in the AI outputs. Parties are currently engaged in jurisdictional discovery related to defendants’ motion to transfer. No. 1:23-cv-0013 (D. Del.)

Huckabee v. Bloomberg: Mike Huckabee (former governor of Arkansas) and others filed a putative class action complaint against Bloomberg alleging that Bloomberg is liable for direct copyright infringement for its use of the Books3 dataset to train its LLM. Defendant’s motion to dismiss is due March 22, 2024, with the opposition and reply due April 19 and May 3. (formerly Huckabee v. Meta) No. 1:23-cv-09152 (S.D.N.Y.)

The Intercept Media and Raw Story Media v. OpenAI: In two nearly identical lawsuits, a trio of news organizations represented by the same firm alleged DMCA violations arising out of the alleged inclusion of plaintiffs’ works of journalism in the datasets used to train ChatGPT. Defendants’ responses to the complaints are due on April 29, 2024. Nos. 1:24-cv-01514, 1:24-cv-01515 (S.D.N.Y.)

Kadrey v. Meta: Some of the same plaintiffs from the OpenAI ChatGPT Litigation filed a similar complaint against Meta, alleging that Meta’s unauthorized copying of the plaintiffs’ books for purposes of training LLaMA models constitutes copyright infringement. Meta filed its answer to the first amended complaint on January 10. No. 3:23-cv-03417 (N.D. Cal.)

Leovy v. Google: Plaintiffs filed this putative class action arising from the scraping and use of personal data and copyrighted content to train Google’s AI products (including Bard). Plaintiffs allege direct infringement claims based on Bard being trained on copyrighted works and outputting derivatives of those works.

Google’s motion to dismiss the plaintiffs’ first amended complaint is pending. (formerly JL v. Alphabet) No. 3:23-cv-3440 (N.D. Cal.).

Nazemian v. NVIDIA Corporation: A group of authors filed this putative class action complaint against NVIDIA Corporation, alleging that NVIDIA copied the authors’ copyrighted books without their permission to train its LLM, Nemo Megatron-GPT. Plaintiffs alleged that “[w]henever an LLM generates text output in response to a user prompt, it is performing a computation . . . with the goal of imitating the protected expression ingested from the training dataset.”

New York Times v. Microsoft: The New York Times alleged that millions of its copyrighted works were used to create the LLMs of Microsoft’s Copilot (formerly Bing Chat) and OpenAI’s ChatGPT, and that these AI tools generate verbatim NYT content, closely summarize it, mimic its expressive style, and falsely attribute outputs to NYT.

We are awaiting the court’s ruling on Tremblay plaintiffs’ motion to intervene. Parties are briefing defendants’ motions to dismiss. No. 1:23-cv-11195 (S.D.N.Y.)

OpenAI ChatGPT Litigation: Three plaintiff groups, self-identified fiction and nonfiction authors, each filed a complaint in the Northern District of California against OpenAI, alleging copyright infringement, vicarious copyright infringement, DMCA violations and torts related to OpenAI’s GPT models and ChatGPT service. These cases — Tremblay v. OpenAI, Silverman v. OpenAI, and Chabon v. OpenAI — are now consolidated here.

On March 13, plaintiffs filed an amended complaint against all defendants. Nos. 3:23-cv-3223, 3:23-cv-03416, 3:23-cv-04625 (N.D. Cal.)

Thomson Reuters v. ROSS: Thomson Reuters sued ROSS Intelligence in May 2020, alleging that the AI/legal research company unlawfully copied content from Thomson Reuters’ legal research platform Westlaw for the purpose of training its AI-based platform.

On Sept. 25, the court denied both parties’ motions for summary judgment, leaving the issues of direct infringement and fair use for the jury to decide. Motions for summary judgment on defendant’s antitrust/anticompetition claims are pending. Trial is set for August 26, 2024. No. 1:20-cv-00613 (D. Del.)

There are at least a dozen other cases. ChatGPTiseatingtheworld.com lists the status of 24 AI litigations as of June 14.

What are some of the key copyright questions in these disputes?

Does training a model on copyrighted material require a license?
Does generative AI output infringe on copyright for the materials on which the model was trained?
Does generative AI violate restrictions on removing, altering or falsifying copyright management information?
Does generating work in the style of someone violate that person’s rights?

Patent questions include:

To what extent are AI-assisted inventions patentable?
If a response to an AI prompt is responsible for a significant part of an invention, should the prompt or response be entitled to some of the value?
Should the content or data that trained the model and helped to shape the invention be subject to licenses?

The Weaponization of Technology

Humans are not smart or fast enough to manage all that AI has wrought and will provide. I and others have maintained that we will need bots to manage other bots – to make sure they are doing what they are supposed to and that content and other providers are receiving fair value. If venture capitalists are not already on this, they will be very soon.

Image source: law.com