Starting September 15, 2026, Cloudflare will by default block web crawlers that combine traditional search, AI agent activity, and model training from accessing ad-supported pages.
The company wants AI providers to operate separate crawlers for search, agent use, and training so website owners can decide how each type accesses their content. Crawlers that do not clearly separate these purposes will face the most restrictive applicable setting.
Cloudflare will continue allowing search crawlers by default because they can send visitors to websites.
However, it will block crawlers classified as Training or Agent from pages displaying advertisements. Mixed-use crawlers combining these purposes will also be blocked unless the website owner changes the settings.
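The default described above can be read as a simple rule: a crawler is judged by the most restrictive of its declared purposes. The following is a minimal sketch of that rule, not Cloudflare's implementation. The names `Purpose` and `default_allows` are hypothetical, and the treatment of pages without ads is an assumption, since the announcement only describes blocking on pages that display advertisements.

```python
from enum import Enum


class Purpose(Enum):
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


# Purposes the article says are blocked by default on pages that show ads.
BLOCKED_ON_AD_PAGES = {Purpose.AGENT, Purpose.TRAINING}


def default_allows(purposes: set[Purpose], page_has_ads: bool) -> bool:
    """Return True if the described default would let the crawler fetch the page."""
    if not purposes:
        raise ValueError("a crawler must declare at least one purpose")
    if not page_has_ads:
        # Assumption: the default block only applies to ad-supported pages.
        return True
    # Most restrictive setting wins: one blocked purpose blocks the whole crawler.
    return not (purposes & BLOCKED_ON_AD_PAGES)


print(default_allows({Purpose.SEARCH}, page_has_ads=True))
print(default_allows({Purpose.SEARCH, Purpose.TRAINING}, page_has_ads=True))
```

Under this reading, a search-only crawler is allowed on an ad-supported page, while a crawler that mixes search with training is blocked until the website owner changes the setting.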
The new defaults will apply to new Cloudflare customers, new domains created by existing customers, and current free-tier customers who have not changed their settings before September 15.
Website owners will remain free to adjust the controls through their Cloudflare dashboard.
The change could limit how AI companies collect online material for training models and operating AI agents.
Cloudflare said many publishers want their content to remain visible in traditional and AI-powered search. However, businesses supported by advertising or subscriptions may not want their work used for AI training without permission or payment.
The company said bots should clearly disclose why they are visiting a website. A provider using content for search indexing, AI agents and model training should therefore run three separate crawlers instead of combining the activities under one bot.
Cloudflare referred to the “world’s largest search engine” without naming Google and claimed that the search engine can access approximately twice as much information as competing AI companies.
It argued that mixed-use systems can force publishers to choose between appearing in search results and allowing their material to support AI products.
Google has previously challenged that argument. It provides a control called Google-Extended that allows publishers to prevent their content from being used to train future Gemini models or support some Gemini and Vertex AI services without affecting inclusion in Google Search.
However, Google’s AI Overviews and AI Mode form part of Google Search and use the main Googlebot controls. Blocking Googlebot would also affect a website’s ability to appear in conventional search results.
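Google-Extended is a robots.txt user-agent token, so the distinction can be illustrated with a short robots.txt file. The sketch below is an illustration only: the robots.txt content and the example.com URL are assumptions, and Python's standard `urllib.robotparser` stands in for how a compliant crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended, leave everything else open.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # placeholder URL
extended_allowed = parser.can_fetch("Google-Extended", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print("Google-Extended allowed:", extended_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

In this file, the Google-Extended token is disallowed while Googlebot remains allowed. Because AI Overviews and AI Mode follow the Googlebot rules, a file like this does not opt a site out of those features.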
Cloudflare CEO Matthew Prince said automated systems and bots now account for most internet traffic, reaching that point earlier than expected.
He said the company must act more quickly to support a sustainable online ecosystem and encourage AI providers to state clearly whether their bots perform search, training or agent tasks.
Cloudflare has already introduced several tools that give publishers more control over AI access, including AI crawler blocking and Pay Per Crawl, which allows website owners to charge companies for scraping their pages.
Cloudflare is now developing Pay Per Crawl into a system called Pay Per Use.
Instead of paying whenever a crawler downloads a page, AI companies would compensate publishers when their material provides value, such as appearing in an AI-generated result or supplying premium information to an agent.
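The difference between the two billing models comes down to which event triggers a charge. The sketch below is purely illustrative: the `Event` type, the function names, and the prices are hypothetical and do not reflect any published Cloudflare rates or APIs.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # "fetch": page downloaded; "use": content shown in a result or given to an agent
    url: str


def pay_per_crawl(events: list[Event], price_per_fetch: int) -> int:
    """Charge once for every page download, whether or not the content is used."""
    return price_per_fetch * sum(1 for e in events if e.kind == "fetch")


def pay_per_use(events: list[Event], price_per_use: int) -> int:
    """Charge only when the content actually provides value."""
    return price_per_use * sum(1 for e in events if e.kind == "use")


# Hypothetical log: one page downloaded three times, used once.
log = [
    Event("fetch", "/story"),
    Event("fetch", "/story"),
    Event("fetch", "/story"),
    Event("use", "/story"),
]

# Prices are made-up integer units chosen only to show the arithmetic.
crawl_total = pay_per_crawl(log, price_per_fetch=2)
use_total = pay_per_use(log, price_per_use=5)
print(crawl_total, use_total)
```

With these made-up numbers, the crawl model bills three downloads and the use model bills a single use, which shows why repeated downloads matter under one model and not the other.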
Cloudflare is initially testing the system with Ceramic.ai and You.com.
Publishers participating in Ceramic’s system can receive payment when their content appears in its AI search results. You.com can pay for access to individual pieces of premium content when its agents need them. Other AI providers will be able to adapt the model to their own services.
Cloudflare said more than 50% of legitimate bot traffic involves repeatedly downloading pages that have not changed.
Reducing these unnecessary requests could save publishers bandwidth and lower the computing costs faced by AI companies. Cloudflare plans to use its network data to help crawlers determine when a page has changed and needs to be downloaded again.
The company said its new rules aim to let publishers remain visible in AI services without automatically giving away their content for training or agent use.