Cloudflare calls out Perplexity AI for stealthily scraping blocked sites
The dispute started Monday when Cloudflare accused Perplexity, an AI search engine, of ignoring robots.txt rules and scraping a test site that specifically blocked its crawlers. Cloudflare set up a fresh domain that denied Perplexity’s bots, but Perplexity still accessed the site and answered questions about it.
Cloudflare’s CEO Matthew Prince shared the findings on X, revealing that Perplexity used a generic browser user agent impersonating Chrome on macOS to bypass the restrictions.
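To make the mechanics concrete, here is a minimal Python sketch of why a robots.txt rule aimed at a named crawler does nothing against a client that presents a generic browser string. The robots.txt contents, the example.com URL, and the Chrome version in the user-agent string are illustrative assumptions, not the actual test setup Cloudflare used; the crawler names are the ones Perplexity publicly declares.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that wants to keep Perplexity's
# declared crawlers out while allowing everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

# A declared crawler name versus a generic desktop browser string
# (the exact Chrome version here is made up for illustration).
DECLARED_AGENT = "PerplexityBot"
GENERIC_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    # The declared crawler matches its own Disallow rule.
    print(is_allowed(ROBOTS_TXT, DECLARED_AGENT, url))  # False
    # The generic browser string falls through to the wildcard rule.
    print(is_allowed(ROBOTS_TXT, GENERIC_AGENT, url))   # True
```

Note that robots.txt is only consulted if the client chooses to consult it, and name-based rules only bind a client that identifies itself honestly. That is why the disagreement quickly moves on to network-level blocks and cryptographic identity.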
Prince wrote on X:
Some supposedly ‘reputable’ AI companies act more like North Korean hackers. Time to name, shame, and hard block them.
Soon after, defenders rallied behind Perplexity. They argued that AI assistants fetching websites on behalf of users should be treated like human visitors, not bots. On Hacker News, one commenter said:
If I as a human request a website, then I should be shown the content. Why would the LLM accessing the website on my behalf be in a different legal category as my Firefox web browser?
Perplexity responded with a blog post blaming a third-party service for the bot behavior and criticizing Cloudflare’s system for failing to differentiate AI helpers from malicious crawlers.
The blog stated:
The difference between automated crawling and user-driven fetching isn’t just technical — it’s about who gets to access information on the open web. This controversy reveals that Cloudflare’s systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats.
Cloudflare pushed back, highlighting that competitors like OpenAI don’t dodge crawling rules. Prince wrote:
OpenAI is an example of a leading AI company that follows these best practices. They respect robots.txt and do not try to evade either a robots.txt directive or a network level block. And ChatGPT Agent is signing http requests using the newly proposed open standard Web Bot Auth.
Web Bot Auth is a Cloudflare-backed industry effort for cryptographically verifying AI bot identities.
This clash taps into a bigger issue as automated traffic overtakes human traffic online. Imperva’s recent Bad Bot Report found that bots now account for more than 50% of internet traffic, with malicious bots alone making up 37%. Sites have long used CAPTCHAs and blockers to keep bad bots out, but AI assistants blur the line between human and bot visits.
The debate on X captured the contradictions perfectly:
“I WANT perplexity to visit any public content on my behalf when I give it a request/task!”
versus
“What if the site owners don’t want it? they just want you [to] directly visit the home, see their stuff”
Another user predicted:
“This is why I can’t see ‘agentic browsing’ really working — much harder problem than people think. Most website owners will just block.”
The storm shows how tangled AI’s role on the open web has become. Legitimate AI assistants or rogue scrapers? The answer could reshape how the internet functions.