- On June 24, 2026, a coalition of 398 newspaper titles across 47 states filed a federal copyright lawsuit against OpenAI and Microsoft, alleging that at least 6 million articles were scraped without authorization.
- The suit — the largest coordinated copyright action by regional news publishers on record — seeks statutory damages, a permanent injunction, restitution, and disgorgement of profits under the Copyright Act and the DMCA.
- Major publishers like News Corp have signed licensing deals worth tens of millions annually; local papers have been largely shut out of those negotiations while absorbing real revenue losses.
- AI content licensing deals grew from zero in 2022 to 28 in 2024, with 36 projected by end of 2026 — yet the average deal runs $24 million per year per publisher, a figure that simply doesn't scale to newsrooms of 12 people.
What Happened
398. That's the number of newspaper titles — spread across 47 states — that joined a coordinated federal copyright lawsuit against OpenAI and Microsoft on June 24, 2026, according to reporting aggregated by Google News and confirmed by Courthouse News Service. The complaint, filed in the U.S. District Court for the Southern District of New York under the name Richner Communications, Inc. v. Microsoft Corp., was organized through the newly formed Local Press Copyright Alliance (LPCA) and is represented by Platkin LLP, a firm founded in 2026 by former New Jersey Attorney General Matthew J. Platkin.
The core allegation is straightforward: defendants scraped at least 6 million articles from member publishers' websites — including content behind paywalls — without permission, bypassing both paywall protections and robots.txt directives. Plaintiffs are seeking statutory damages, a permanent injunction, restitution, and disgorgement of profits for alleged violations of both the Copyright Act and the Digital Millennium Copyright Act (DMCA).
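For readers unfamiliar with robots.txt, the following is a minimal Python sketch of what honoring those directives looks like from a crawler's side, using the standard library's `urllib.robotparser`. The robots.txt contents, the crawler name `ExampleAIBot`, and the example.com URLs are all hypothetical; nothing here describes how the defendants' crawlers actually behave. Note that robots.txt is a voluntary convention rather than a technical barrier, so a crawler that skips this check is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher site.
# "ExampleAIBot" and the example.com URLs are assumptions, not real names.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
article = "https://example.com/news/city-council-vote"
paywalled = "https://example.com/subscribers/investigation"

# A well-behaved crawler would run this check before every request.
for agent in ("ExampleAIBot", "OtherBot"):
    for url in (article, paywalled):
        print(agent, url, may_fetch(parser, agent, url))
```

In this sketch the publisher blocks the hypothetical AI crawler from the whole site and blocks every other crawler from the subscriber section only. Whether ignoring such rules creates legal liability is the kind of question the complaint raises, not something the code can answer.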
LPCA executive director Marisol Vega framed the stakes directly, stating that the coalition's members "are not billion-dollar corporations with armies of lawyers." That framing matters. This isn't The New York Times, which filed its own suit against OpenAI and Microsoft in December 2023 and, as of June 2026, is still in discovery, with a federal judge having ordered OpenAI to produce 20 million anonymized ChatGPT logs. This is the Akron Beacon Journal and the Quad-City Times: newsrooms that can't fund a multi-year litigation war.
OpenAI spokesperson Drew Pusateri responded that "our models empower innovation, are trained on publicly available data, and are grounded in fair use," adding a commitment to working with content creators on "new revenue models." Microsoft issued a similar statement about respecting copyright and building a "sustainable future for news in the AI era."
The Licensing Gap That Made This Inevitable
The lawsuit didn't emerge from thin air. It's the predictable endpoint of a two-tier copyright economy that AI companies built — and that their own deal-making put on public display.
As of June 27, 2026, the deal math is stark. Meta signed an arrangement with News Corp valued at up to $50 million per year. News Corp's separate deal with OpenAI is worth over $250 million across five years. These are the agreements that get announced in press releases and cited by AI executives as proof of good-faith engagement with journalism. Meanwhile, nearly 1 in 5 newspapers in the LPCA coalition has reduced staff or cut publication frequency since 2023 — a period that maps directly onto the rise of AI-driven search tools that surface synthesized answers rather than sending readers to source articles.
The year-by-year deal counts show how quickly licensing accelerated, and who was left out of it.
Chart: AI content licensing deal volume by year. The 2026 bar is a projection; all other bars reflect publicly tracked deals.
As of June 27, 2026, OpenAI leads with 24 publicly announced licensing agreements — nearly double the combined count of Microsoft and Meta. News and journalism accounts for 48 content licensing deals sector-wide, far ahead of music and audio (16) and images and video (12). Yet the publishers filing this suit are the ones who were never invited to those negotiations. The average deal, at $24 million per year per publisher, is a number that works for a multinational media conglomerate. It's an abstraction to a regional paper with two full-time reporters.
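The gap between the headline deals and the reported average can be made concrete with a little arithmetic. The Python sketch below annualizes the News Corp and OpenAI figure and compares it with the $24 million average, using only numbers cited in this article. The function and constant names are illustrative, and because the News Corp total is reported as "over $250 million," the result should be read as a lower bound.

```python
def annualized_value(total_musd: float, years: int) -> float:
    """Spread a multi-year deal total (in millions of USD) evenly per year."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_musd / years


# Figures as cited in this article, in millions of USD.
AVERAGE_DEAL_PER_YEAR = 24.0      # reported average per publisher per year
NEWS_CORP_OPENAI_TOTAL = 250.0    # reported as "over $250 million" (lower bound)
NEWS_CORP_OPENAI_YEARS = 5

news_corp_openai_per_year = annualized_value(
    NEWS_CORP_OPENAI_TOTAL, NEWS_CORP_OPENAI_YEARS
)
# How many times larger the annualized deal is than the reported average.
ratio_to_average = news_corp_openai_per_year / AVERAGE_DEAL_PER_YEAR

print(f"News Corp / OpenAI: at least ${news_corp_openai_per_year:.0f}M per year")
print(f"Multiple of the reported average: {ratio_to_average:.2f}x")
```

Even spreading is a simplifying assumption; the actual payment schedule is not described in the reporting cited here.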
The legal groundwork has also shifted in plaintiffs' favor. The February 2025 ruling in Thomson Reuters v. Ross, which involved a legal research tool rather than a generative model, found that AI training on copyrighted works is not automatically fair use, though legal observers note that its applicability to generative AI remains contested. Courthouse News Service reported that the LPCA complaint specifically flags paywall bypass and robots.txt violations, which could create liability under the DMCA independent of the fair-use argument. That's a meaningful tactical choice by Platkin LLP.
This isn't the only front. Encyclopedia Britannica and Merriam-Webster sued OpenAI in March 2026 for what they called "free riding" on trusted reference content. Denmark's DPCMO media body filed against OpenAI in February 2026. The pattern across Courthouse News Service, Press Gazette, and legal trade reporting points toward the same structural conclusion: the legal perimeter around AI training data is tightening from multiple directions simultaneously. This echoes the permission-layer dynamic that AI Trends examined around GPT-5.6 government approval, where authorization for AI capabilities is being formalized at regulatory, contractual, and judicial levels all at once.
What This Means for Teams Using AI Tools Today
For professionals who rely on AI platforms for research synthesis, content drafting, or news monitoring, this lawsuit signals a structural shift: the era of consequence-free scraping as an AI training strategy is ending, and the costs of that transition will land somewhere in your tool budget.
The AI datasets and licensing market stood at approximately $460 million in 2025, with projections pointing toward multi-billion dollar levels by 2030. That's not an investment opportunity — it's the emerging raw-material cost of building a competitive large language model. As licensing becomes legally enforced rather than optional, AI companies paying licensing fees will build those costs into enterprise plans first. Vendors that continue to rely on unlicensed scraping face injunctive exposure that could limit training sets or specific outputs.
Data freshness is a practical risk that often gets overlooked. If courts restrict training on news content without licensing, AI tools' ability to surface accurate, current reporting may degrade on platforms that don't have licensed feeds. That matters for any workflow that depends on AI-assisted research into recent events.
Three Questions Worth Asking Before This Resolves
First, where does your vendor's training data come from? OpenAI's 24 publicly announced deals can be found on its website and through trade reporting. Microsoft's Publisher Content Marketplace, announced in February 2026, offers a pay-per-use model for rights-cleared content. Vendors that cannot answer the data-provenance question clearly may carry more legal exposure, and therefore more product risk, than those with documented agreements. This is a vendor due-diligence question, not an abstract ethics one.
Second, does your workflow treat AI-generated summaries of news articles as a substitute for direct source access? If so, the legal and reputational surface area extends to your organization, not just the tool vendor. Enterprise AI providers will likely tighten output policies around news-derived content as litigation pressure increases. Building in direct-source verification steps now is less disruptive than retraining a team after a policy change forces it.
Third, is this case headed for a verdict or a settlement? Folha de São Paulo filed a lawsuit against OpenAI in August 2025 and settled it by May 2026 through a commercial licensing agreement. That outcome, litigation as negotiation leverage, may be the actual trajectory for the LPCA case. If 398 U.S. regional papers can extract even a fraction of the per-publisher rates that News Corp commands, the coalition will have accomplished something no individual newsroom could: establishing a market floor for local journalism content that AI companies currently price at zero.
Frequently Asked Questions
Can AI companies legally use copyrighted news articles for model training?
The legal question is unresolved in U.S. courts. OpenAI and Microsoft argue that training on publicly available web content constitutes fair use under copyright law. However, the February 2025 ruling in Thomson Reuters v. Ross found that AI training on copyrighted works is not automatically fair use — though courts have not yet applied that ruling directly to generative AI at scale. The LPCA lawsuit adds a DMCA layer by alleging paywall bypass and robots.txt circumvention, which may create a separate liability track independent of the fair-use debate.
What happens if OpenAI loses the local newspaper copyright lawsuit?
Potential outcomes include substantial statutory damages under the Copyright Act, a permanent injunction restricting future training on unlicensed news content, and disgorgement of profits — meaning OpenAI could be required to return earnings attributable to the infringing training data. A loss could accelerate mandatory licensing industry-wide and significantly raise the cost basis for training large language models. It could also trigger coordinated suits from other content categories currently underrepresented in licensing agreements.
How much do AI companies pay publishers for content licensing deals?
As of June 27, 2026, deal sizes vary widely. The average licensing arrangement is valued at approximately $24 million per year per publisher. At the high end, News Corp's deal with OpenAI exceeds $250 million over five years; Meta's arrangement with News Corp reaches up to $50 million annually. Smaller and regional publishers have generally not received formal licensing offers at any price — which is the central grievance of the 398-newspaper coalition filing this suit.
In my analysis, the most underreported dimension of this case is the settlement calculus. Former New Jersey AG Matthew Platkin called this "the largest legal effort led by local and regional newspapers" — and he's right on the count. But the more consequential outcome may not be a trial verdict. It may be a licensing template. If nearly 400 papers can collectively extract terms that approximate even a fraction of what major publishers command, they will have done what no single local newsroom could: forced a market price onto content that the AI industry has treated as free infrastructure. The legal proceedings are the mechanism. The market rate is the goal.
Disclaimer: This article is editorial commentary for informational purposes only and does not constitute legal or financial advice. Research based on publicly available sources current as of June 27, 2026.