In a landmark legal battle, The New York Times (NYT) has taken on OpenAI and Microsoft, alleging that the companies infringed its copyright by training their AI models on material wrongfully sourced from the publication’s archive.
Both sides have traded public statements on the dispute, with OpenAI calling the NYT’s claims “meritless” and the NYT countering that OpenAI’s usage of the material was “not fair use by any measure.”
The case has drawn attention from both AI and legal experts, who are closely watching how it could reshape the landscape of AI regulation and the rights of content creators.
Cointelegraph spoke with Bryan Sterba, partner at Lowenstein Sandler and member of the Lowenstein AI practice group, and Matthew Kohel, partner at Saul Ewing, to better understand the legal intricacies of the case.
Sterba notes that OpenAI is advocating for a broad interpretation of the ‘fair use’ defense, a position not entirely supported by existing laws, but deemed necessary for the advancement of generative AI.
He continued, saying it’s “basically a public policy argument” that OpenAI is framing around the ‘fair use’ defense, which has already been adopted in other countries to avoid stifling AI progress.
“While it’s always difficult to say with any certainty how a court will decide on a given issue, NYT has made a strong showing of the basic elements of an infringement claim.”
Kohel also commented that there is “undoubtedly” a lot potentially at stake in this lawsuit.
“The NYT is seeking billions of dollars in damages,” he said, “and alleges that OpenAI is providing its valuable content – which cannot be accessed without a paid subscription – for free.”
He believes that a ruling that OpenAI did not commit any infringement would mean that it and other providers of AI technologies can freely use and reproduce one of the “most valuable assets” of the NYT – its content.
Kohel stressed that at the moment there is no legal framework in place that specifically governs the use of training data for an AI model. As a result, content creators such as the NYT and authors like Sarah Silverman have filed suits relying on the Copyright Act to protect their intellectual property rights.
This could change, however, as United States lawmakers introduced the AI Foundation Model Transparency Act in December on behalf of the bipartisan Congressional Artificial Intelligence Caucus.
According to Kohel, if the act is passed, it would implicate the use and transparency of training data.
In its defense, OpenAI has said that by providing publishers with the option to “opt-out” of having their content used for data collection, it is doing the “right thing.”
Sterba commented on the move, saying:
“The opt-out concept will be cold comfort for NYT and other publishers, as they do not have any insight into what portions of their published copyrighted material have already been scraped by OpenAI.”
As the lawsuit unfolds, it brings to the forefront the evolving legal landscape surrounding AI for both developers and creators. Kohel stressed the importance of awareness for both parties.
“AI developers should understand that Congress and the White House, as shown by the Executive Order that President Biden issued in October 2023,” he said, “are taking a hard look at the various implications that AI models are having on society.”
This scrutiny extends beyond intellectual property rights to national security matters.
“Content creators should protect their interests by registering their works with the Copyright Office, because AI developers may end up having to pay them a licensing fee if they use their works to train their LLMs.”
The outcome of this lawsuit remains highly anticipated by industry insiders and is likely to influence future discussions on AI regulation, the balance between technological innovation and intellectual property rights, and the ethical considerations surrounding AI model training with publicly available data.