The New York Times sued OpenAI in December, arguing that the company used its articles without permission to train ChatGPT.
The case is now in the discovery phase, where both sides gather and exchange evidence before the trial. As part of that, OpenAI asked for more information about how the Times uses generative AI, including its use of tools from other companies, any AI tools it's developing for its reporting, and its views on the technology.
Judge Ona T. Wang rejected that request on Friday, calling it irrelevant. She then offered an analogy to explain her decision, comparing OpenAI to a video game manufacturer and the Times to a copyright holder.
If a copyright holder sued a video game manufacturer for copyright infringement, the copyright holder might be required to produce documents relating to their interactions with that video game manufacturer, but the video game manufacturer would not be entitled to wide-ranging discovery concerning the copyright holder's employees' gaming history, statements about video games generally, or even their licensing of different content to other video game manufacturers.
In the same case, legal filings revealed earlier this month that OpenAI engineers accidentally deleted evidence that Times lawyers had gathered from OpenAI's servers. Lawyers for the outlet spent over 150 hours searching through OpenAI's training data for instances of infringement and stored their findings on virtual machines the company created. The majority of the data has been recovered, and the Times lawyer said there is no reason to believe the deletion was "intentional."
The case is one of dozens of copyright cases filed against OpenAI, including by media organizations like the New York Daily News, the Denver Post, and The Intercept. Some of these cases have already been dismissed. Earlier this month, a federal judge dismissed cases from Raw Story and AlterNet because the outlets did not demonstrate "concrete" harm from OpenAI's actions.
OpenAI is also facing lawsuits from authors, including one involving comedian Sarah Silverman. Silverman and over a dozen authors filed an initial complaint against OpenAI in 2023, saying the tech company illegally used their books to train ChatGPT.
"Much of the material in OpenAI's training datasets, however, comes from copyrighted works — including books written by Plaintiffs — that were copied by OpenAI without consent, without credit, and without compensation," the complaint says.
OpenAI's website says the company develops ChatGPT and its other services using three sources: publicly available information online, information accessed by partnering with third parties, and information provided or generated by its users, researchers, or human trainers.
Silverman, who authored "The Bedwetter: Stories of Courage, Redemption, and Pee," discussed the ongoing legal dispute with actor Rob Lowe on his SiriusXM podcast. She said taking on OpenAI will be "tough."
"They are the richest entities in the world, and we live in a country where that's considered a person that can influence, practically create policy, let alone influence it," she said.
Some media organizations, including Axel Springer, the parent company of Business Insider, have chosen to partner with OpenAI, licensing their content in deals worth tens of millions of dollars.
OpenAI and the Times did not immediately respond to a request for comment from Business Insider.