NYT vs OpenAI and Microsoft: Three Scenarios with a Big Impact on Our Digital Environment
It was bound to happen. On December 27, 2023, The New York Times sued OpenAI and Microsoft for mass copyright infringement related to how the companies used the publisher’s content as training data for their generative artificial intelligence systems. The lawsuit is the latest and the most high-profile case in a string of similar lawsuits that have been filed against AI companies recently.
According to the case documents, The New York Times wants OpenAI and Microsoft to stop using its content as training data and to delete the content from their systems, along with the models trained on it. This would not only require retraining systems such as ChatGPT with potentially poorer training data but also open the door for other copyright owners to demand the same of OpenAI and Microsoft.
The case is built on two arguments.
First, The New York Times argues that the defendants have already benefited commercially and aim to further benefit from derivative products unlawfully based on the copyrighted content. In other words, the financial interests of the company and its shareholders are being violated. Second, The New York Times argues that its capacity to fund high-quality journalism, and the future of journalism more generally, is at stake if technology companies are allowed to scrape and use copyrighted content under the fair use doctrine.
A lot is at stake in the lawsuit, which may turn out to be a watershed moment in the evolution of AI. Generative AI is as much about the data as it is about the algorithms.
In some ways, the situation is analogous to the rise of search engines twenty years ago. Search engines indexed everyone’s content and started selling advertisements next to the index, rerouting massive amounts of advertising money away from publishers. Yet, search engines by design pass visitors on to publishers’ websites. Generative AI tools are not built to do this, which is why they can have a potentially much more dramatic impact on publishers. However, this time publishers such as The New York Times are better prepared to protect their interests.
Below, I sketch three scenarios for how the lawsuit may unfold.
OpenAI and Microsoft win
This seems an unlikely outcome. People who have read through the case documents against OpenAI and Microsoft (including me) seem to think that the case is strong. I would also imagine that OpenAI and Microsoft will prefer, if possible, an out-of-court settlement to the risk of losing the case. Yet, when it comes to complex court cases, anything is possible. If OpenAI and Microsoft win the case, this would probably redefine the idea of ‘fair use’ of copyrighted content quite a bit. Systematically harvesting millions of copyrighted documents and using them to build a competing product cannot be fair use, or can it?
Yet, there is a caveat. Even if OpenAI and Microsoft were to win, no one can force copyright owners to make their content available to technology companies, and technology can be used to stop AI from scraping content. Some content platforms have already implemented such features, and a victory for the two tech companies could trigger an arms race to secure content ecosystems against unwanted scraping by the developers of AI.
Out-of-court settlement
This is the most likely outcome if The New York Times is more interested in its shareholders than in journalism as a public good. Indeed, given that The New York Times was in talks with OpenAI and Microsoft before suing them, a possible interpretation of the lawsuit is that it is a negotiation tactic. The New York Times will use it as leverage to reach a favorable agreement with the companies. This way, the company could protect its own capacity to produce high-quality journalism.
Other large corporations will then undoubtedly follow, using armies of lawyers to erect a complex web of contracts with AI companies to supply them with training data, while smaller players and individuals are left out in the cold. Axel Springer has already entered into such a relationship with OpenAI. All in all, I would not be surprised if big copyright holders find a way to retain the status quo by forcing OpenAI and Microsoft to share some of the revenues from products based on generative AI. Whether anything will trickle down to individual content producers remains to be seen (I would not hold my breath).
The New York Times wins
If The New York Times is interested in journalism as a public good, then it might see the lawsuit through and refuse an out-of-court settlement. If the publisher wins, this will not be the end of generative AI or Microsoft or even OpenAI. The companies will have to cough up some damages and maybe retrain their models and, most importantly, come up with solutions to source training data fairly. If technology companies can create highly sophisticated generative AI systems, they should be able to come up with solutions to distribute the value created by AI tools to those whose works are used as training data. A win for The New York Times could set the development of generative AI on a trajectory where technology companies have an incentive to do this.
Finally, OpenAI and Microsoft probably expected to be sued for their use of copyrighted material as training data, so it will be interesting to see what kind of approach they take to defend themselves. Is it going to be based on a ‘fair use’ argument, as the case documents assume, or on something else? One way or the other, The New York Times vs. OpenAI and Microsoft case will have a major impact on the development of generative AI and our future digital environment.