It is widely known that AI companies use web articles to train their models without compensating the creators or seeking permission. Publishers such as the New York Times, the Chicago Tribune and the Toronto Star have already filed lawsuits against the practice. Now another prominent organization has joined the legal fight.
TechCrunch has reported that Encyclopedia Britannica and its subsidiary Merriam-Webster have filed a lawsuit against OpenAI, alleging that the AI giant committed “massive copyright infringement” by scraping nearly 100,000 of their online articles without permission and using them to train its LLMs.
What is this lawsuit about?
Britannica claims that ChatGPT generates responses that replace its content, reducing web traffic and potential revenue. If users can ask ChatGPT a question and get an answer based on Britannica’s articles, there may be less incentive to visit the site directly.
The complaint also targets OpenAI’s use of Britannica content in ChatGPT’s retrieval-augmented generation (RAG) workflow, a process in which the AI searches the web for up-to-date information when answering questions. Britannica alleges that in doing so, ChatGPT reproduces its content in whole or in part.
Additionally, Britannica claims that OpenAI violates trademark law. The company argues that ChatGPT hallucinates information and then falsely attributes it to the publisher. According to Britannica, ChatGPT’s hallucinations “endanger the public’s continued access to high-quality and trustworthy online information.”
What will happen next?
That’s the big question. There is no solid precedent as to whether training an AI on copyrighted content constitutes copyright infringement. Anyone can tell you that it’s not right to use someone else’s work to train your model, but the law on this is unclear at best.
In a recent case involving Anthropic, a federal judge ruled that using copyrighted content as training data was transformative enough to be legal. However, the same judge found that Anthropic had illegally downloaded millions of books, leading to a $1.5 billion settlement with the affected authors.
As this issue continues to evolve, there is still work for lawmakers to do. The outcome of these cases will likely shape how AI companies can legally use web content in the future.




