Wikimedia Foundation Urges AI Giants to Pay for Wikipedia Data Access and Stop Scraping

The Wikimedia Foundation, the nonprofit organization responsible for hosting the world’s seventh-most visited website, Wikipedia, has issued a formal request to AI companies to cease unauthorized data scraping and begin compensating the organization for the use of its content.

In a recent blog post, the Foundation stated that AI models critically depend on high-quality, human-curated data, a resource that Wikipedia’s global volunteer editor network consistently provides across more than 300 languages, ensuring information remains well-sourced.

Despite being a free public resource, running Wikipedia is a costly operation. Audited figures show that the organization required $179 million to operate the platform during the 2023-2024 fiscal year, funding its mission primarily through direct public donations and operating without advertising revenue.

The rise of conversational AI tools, such as ChatGPT, is fundamentally altering user behavior. Instead of visiting Wikipedia for research, users are increasingly turning directly to AI models for answers, a shift that poses a significant financial threat to the platform.

This circumvention of the Wikipedia website means that users are less likely to encounter the donation requests typically placed at the top of the homepage, potentially leading to a decline in the financial support necessary to keep the massive service running.

To address this financial imbalance, Wikimedia is urging AI companies to subscribe to its Enterprise API. This paid service is intended to allow firms to access and utilize Wikipedia content at scale in a sustainable manner.

By adopting the Enterprise API, AI companies would not only avoid severely taxing Wikipedia’s servers but would also directly contribute to supporting the Foundation’s nonprofit mission, turning their data consumption into a revenue stream for the encyclopedia.

The Foundation’s move comes amidst a growing pushback from content creators and publishers against the unauthorized use of their material for training AI models. Several major publishers, including The New York Times and News Corp, are currently engaged in copyright infringement lawsuits against AI firms.

In contrast to litigation, other organizations, such as the Associated Press and Reuters, have opted to establish formal licensing deals with AI companies, demonstrating a varied approach across the industry to capitalizing on the AI boom. Google, notably, already has a commercial access deal with Wikimedia dating back to 2022.

The controversy highlights the immense financial growth of Big Tech in the AI era—with companies like Nvidia, Microsoft, and Alphabet recently reaching trillion-dollar valuations—and the content providers’ assertion that they should be compensated for the fundamental data that fuels this growth.

Wikipedia Demands Compensation from AI Companies for Data Usage