AI Crawler Traffic Is Exploding: What Website Owners Must Do Now


Right now, OpenAI, Apple, Meta, Anthropic, Amazon, and Perplexity are crawling your website. Not occasionally. Continuously. They are ingesting your content, indexing it, and using it to power AI products that your users are increasingly turning to instead of searching the web directly.

Radware's telemetry puts that activity at around 30% of all legitimate automated traffic, a category that spans legitimate service bots, traditional crawlers, and AI crawlers. AI crawlers now generate traffic approaching that of traditional search bots like Googlebot and Bingbot combined. That's not a trend to watch. It's a shift that's already happened. Most organizations are absorbing this volume without the visibility to know who is crawling them, what those crawlers are after, or what it's costing them.

What the Data Actually Shows

Radware's telemetry across our protected assets this month shows AI crawler traffic nearly matching traffic from traditional search crawlers, itself a significant percentage increase over the same period last month. Across the three categories (legitimate bots, traditional crawlers, AI crawlers), it is worth noting that none of this is human traffic. The "legitimate" category doesn't represent actual users either; it covers recognized automated services such as Facebook crawlers, Zscaler, Pinterest bots, and Dynatrace. If your infrastructure or analytics strategy assumes the non-bot share of traffic is human, that assumption needs revisiting.
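
As a minimal sketch of that separation, the snippet below buckets a request into the three categories by user-agent token. The token lists are illustrative assumptions, not a complete inventory, and user agents alone are spoofable (more on that later).

    # Sketch: bucket a request into the three bot categories by user-agent token.
    # Token lists are illustrative assumptions, not a complete inventory.
    AI_CRAWLERS = ("gptbot", "oai-searchbot", "chatgpt-user", "applebot",
                   "meta-externalagent", "amazonbot", "perplexitybot", "claudebot")
    TRADITIONAL_CRAWLERS = ("googlebot", "bingbot", "duckduckbot", "yandexbot")
    LEGITIMATE_BOTS = ("facebookexternalhit", "pinterestbot", "zscaler", "dynatrace")

    def classify(user_agent: str) -> str:
        ua = user_agent.lower()
        if any(token in ua for token in AI_CRAWLERS):
            return "ai_crawler"
        if any(token in ua for token in TRADITIONAL_CRAWLERS):
            return "traditional_crawler"
        if any(token in ua for token in LEGITIMATE_BOTS):
            return "legitimate_bot"
        return "unclassified"  # not necessarily human -- just not a known bot

    print(classify("Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2)"))  # ai_crawler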

On top of this, bad bots continue to hit customer applications, and their numbers keep growing as bad-bot attacks become more sophisticated. That's a separate and significant problem. It's the AI crawlers' share (purposeful, growing, and concentrated) that demands a fundamentally different operational response.

(For a broader look at the crawler landscape and how we got here, Dhanesh Ramachandran's post on Understanding AI Crawlers is worth reading first.)

A Small Number of AI Platforms Are Driving Almost All of It

Of the AI crawler requests seen on customer applications during this month's measurement period, eight platforms account for nearly all of the volume (a user-agent matching sketch follows the list):

  • Apple Bot (27.47%)
  • GPTBot (23.71%)
  • OAI-SearchBot (17.13%)
  • ChatGPT-User (11.50%)
  • meta-externalagent (11.49%)
  • Anthropicbot (5.45%)
  • Amazonbot (2.71%)
  • PerplexityBot (0.48%)
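
For teams that want to spot these eight in their own logs, here's a hedged matching sketch. The substrings reflect commonly published user-agent tokens, but exact tokens vary by vendor and version (Anthropic's published crawler token, for instance, is ClaudeBot, while our telemetry labels it Anthropicbot), so verify against current vendor documentation.

    import re

    # Illustrative user-agent patterns for the eight platforms above.
    # Exact tokens vary by vendor and version; verify before relying on these.
    AI_BOT_PATTERNS = {
        "Apple Bot":          re.compile(r"applebot", re.I),
        "GPTBot":             re.compile(r"gptbot", re.I),
        "OAI-SearchBot":      re.compile(r"oai-searchbot", re.I),
        "ChatGPT-User":       re.compile(r"chatgpt-user", re.I),
        "meta-externalagent": re.compile(r"meta-externalagent", re.I),
        "Anthropic":          re.compile(r"claudebot|anthropic", re.I),
        "Amazonbot":          re.compile(r"amazonbot", re.I),
        "PerplexityBot":      re.compile(r"perplexitybot", re.I),
    }

    def match_ai_bot(user_agent: str):
        for name, pattern in AI_BOT_PATTERNS.items():
            if pattern.search(user_agent):
                return name
        return None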

Concentration matters. The AI ecosystem is not a fragmented long tail of scrapers. It's a handful of major platforms, each with distinct behaviors, operating continuously at scale. For infrastructure teams, that means you're increasingly in an ongoing relationship with these systems, one that requires deliberate management, not just passive tolerance.

The Behavior Problem Is Harder Than the Volume Problem

Volume, once understood, is manageable. Behavior is harder.

Legacy search bots were predictable: they crawled on schedules, respected robots.txt, and identified themselves honestly. AI-driven crawlers operate differently. Two distinct patterns are emerging:

Mass ingestion: high-volume sweeps designed to harvest training datasets. These can look like a sustained DDoS from a distributed IP range, except blocking them isn't straightforwardly the right call.

Real-time retrieval: on-demand fetches triggered by live user queries, typically in RAG-based AI systems. The bot is fetching your content right now because a user somewhere just asked a question your page might answer. Latency matters. Volume is bursty and unpredictable.

These two behaviors require very different responses, but they can arrive looking similar to the origin server. That's the core operational challenge.
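
One hedged heuristic for telling them apart: profile request cadence per client. A steady, sustained, high-volume cadence suggests mass ingestion; sparse, high-variance bursts suggest live retrieval. The thresholds below are placeholders to show the shape of the check, not recommendations.

    from statistics import mean, pstdev

    def cadence_profile(timestamps: list[float]) -> str:
        """Rough heuristic: classify a client's crawl pattern from request
        timestamps (in seconds). Thresholds are illustrative placeholders."""
        if len(timestamps) < 2:
            return "insufficient-data"
        ts = sorted(timestamps)
        rate = len(ts) / max(ts[-1] - ts[0], 1.0)           # requests per second
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        burstiness = pstdev(gaps) / max(mean(gaps), 1e-9)   # coefficient of variation
        if rate > 1.0 and burstiness < 1.0:
            return "mass-ingestion"       # steady, sustained sweep
        if burstiness >= 1.0:
            return "real-time-retrieval"  # sparse bursts tied to live queries
        return "ambiguous"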

There's a downstream problem too: analytics distortion. When AI crawlers and other automated traffic make up most of your request volume, your KPIs become unreliable. Conversion rates, bounce rates and session duration: all of these get skewed if AI crawler traffic isn't filtered cleanly at the source. Business decisions made on that data are made on noise.
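
The skew is easy to quantify with hypothetical numbers: if unfiltered crawlers account for half of measured sessions and convert at zero, your measured conversion rate is half the real one.

    # Hypothetical illustration of KPI skew from unfiltered crawler traffic.
    human_sessions, human_conversions = 10_000, 300   # true rate: 3.0%
    bot_sessions = 10_000                             # crawlers convert at 0%

    measured = human_conversions / (human_sessions + bot_sessions)
    actual = human_conversions / human_sessions
    print(f"measured conversion: {measured:.2%}, actual: {actual:.2%}")
    # measured conversion: 1.50%, actual: 3.00%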

Why robots.txt Isn't a Strategy Anymore

For years, the standard playbook was simple: maintain a robots.txt, run a basic allowlist, and let the major crawlers do their thing. That worked when the bot ecosystem was stable, and the major players were operating in good faith on a shared protocol.
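
That playbook amounted to a static file along these lines (illustrative):

    # The old playbook: one static robots.txt, good-faith compliance assumed.
    User-agent: Googlebot
    Allow: /

    User-agent: Bingbot
    Allow: /

    User-agent: *
    Disallow: /admin/
    Crawl-delay: 10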

That model has three problems today.

First, AI crawlers don't all respect robots.txt, and even when they do, compliance is voluntary. There's no enforcement mechanism.

Second, modern crawlers can distribute requests across global IP ranges, mimic browser behavior, and dynamically adjust crawling patterns. Static controls can't keep up.

Third, and most importantly, the question has changed. The old question was binary: allow or block. The new question is which crawlers, doing what, at what rate, against which assets. That requires classification, not gatekeeping.

A Framework for Managing AI Crawler Traffic

The organizations handling this well have moved from binary blocking to strategic orchestration. The goal isn't to wall off all AI automation, because blocking AI indexing bots has real costs for discoverability in AI-generated answers, which is increasingly how people find things. The goal is granular, intentional control. Three capabilities matter most:

Genuine behavioral visibility. User-agent strings are easily spoofed and increasingly unreliable. Effective classification requires going deeper: TLS fingerprints, rendering patterns, request timing, behavioral telemetry across hundreds of parameters, all to build a high-fidelity picture of what each request is doing, not just what it claims to be.
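
As one concrete example of looking past the user-agent string, a JA3-style TLS fingerprint hashes fields from the ClientHello; a request claiming to be Chrome but presenting a non-Chrome TLS stack is a strong spoofing signal. The sketch assumes the handshake fields have already been parsed out (real deployments extract them from a proxy or packet capture).

    from hashlib import md5

    def ja3_fingerprint(version: int, ciphers: list[int], extensions: list[int],
                        curves: list[int], point_formats: list[int]) -> str:
        """JA3-style hash of TLS ClientHello fields (assumes the fields are
        already parsed from the handshake)."""
        fields = [
            str(version),
            "-".join(map(str, ciphers)),
            "-".join(map(str, extensions)),
            "-".join(map(str, curves)),
            "-".join(map(str, point_formats)),
        ]
        return md5(",".join(fields).encode()).hexdigest()

    # A claimed "Chrome" request whose TLS fingerprint doesn't match Chrome's
    # known values is a strong spoofing signal.
    print(ja3_fingerprint(771, [4865, 4866], [0, 11, 10], [29, 23], [0]))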

Intent-based classification. Not all AI crawlers are equivalent. Training bots are resource-intensive and offer low immediate value to the site owner. Indexing bots (the ones powering AI search) have real discoverability implications and blocking them has real costs. Retrieval bots are time-sensitive and need low-latency handling. Treating these as a single category leads to bad decisions in every direction.
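
A sketch of what intent-differentiated policy can look like. The bot-to-intent assignments and rate numbers are assumptions chosen to show the shape of the policy, not recommendations; verify each bot's purpose against vendor documentation.

    # Illustrative intent classes and per-intent policies; assignments and
    # numbers are assumptions showing the shape of the policy, not advice.
    BOT_INTENT = {
        "GPTBot": "training",            # dataset harvesting
        "meta-externalagent": "training",
        "OAI-SearchBot": "indexing",     # powers AI search discoverability
        "Applebot": "indexing",
        "ChatGPT-User": "retrieval",     # live fetch for a user's question
        "PerplexityBot": "retrieval",
    }

    INTENT_POLICY = {
        "training":  {"action": "throttle", "max_rps": 1},
        "indexing":  {"action": "allow",    "max_rps": 10},
        "retrieval": {"action": "allow",    "max_rps": 20},  # latency-sensitive
    }

    def policy_for(bot: str) -> dict:
        return INTENT_POLICY.get(BOT_INTENT.get(bot, ""), {"action": "challenge"})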

Edge-based enforcement. Mitigation needs to happen before requests reach the origin. If your bot management runs at the application layer, you're already paying the cost of the traffic you're trying to block. Pushing classification and enforcement to the network edge preserves origin capacity for the traffic that matters.
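
At its simplest, edge enforcement is a per-crawler rate check evaluated before the request is proxied to origin. The sketch below is platform-agnostic Python (edge platforms differ in their APIs) using a token bucket, and it assumes the policy_for() helper from the previous sketch.

    import time

    # Platform-agnostic sketch of edge-side throttling: origin capacity is
    # spent only after the request passes the per-crawler rate check.
    class TokenBucket:
        def __init__(self, rate_per_s: float, burst: float):
            self.rate, self.capacity = rate_per_s, burst
            self.tokens, self.last = burst, time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    buckets: dict[str, TokenBucket] = {}

    def edge_decision(bot: str) -> int:
        policy = policy_for(bot)              # from the previous sketch
        if policy["action"] == "challenge":
            return 403
        bucket = buckets.setdefault(
            bot, TokenBucket(policy["max_rps"], burst=2 * policy["max_rps"]))
        return 200 if bucket.allow() else 429  # rejected without touching origin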

This is the capability set Radware's bot management platform is built around: enabling organizations to set differentiated policies for different AI crawler types, throttle what's costly, allow what's valuable, and protect high-value content without sacrificing their presence in the AI discovery ecosystem.

What This Means for How You Operate

If you're responsible for web infrastructure or application security, a few things are worth acting on now:

Get a clean traffic baseline. If you don't have a clear separation of AI crawlers from traditional crawlers and legitimate automated services in your analytics, your operational metrics are compromised. This is the starting point for everything else.
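
A baseline can start as a share-per-category report over raw access logs, reusing a classifier like the one sketched earlier. The parsing below assumes a combined-log-style format with the user agent as the last quoted field; adjust to your actual format.

    from collections import Counter

    # Sketch: per-category traffic share from an access log. Assumes the
    # classify() helper from the earlier sketch and a log format with the
    # user agent as the final quoted field.
    def baseline(log_path: str) -> None:
        counts = Counter()
        with open(log_path) as f:
            for line in f:
                user_agent = line.rsplit('"', 2)[-2] if '"' in line else ""
                counts[classify(user_agent)] += 1
        total = sum(counts.values()) or 1
        for category, n in counts.most_common():
            print(f"{category:>20}: {n / total:6.1%}")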

Map your crawlable assets. Not everything on your site should be equally available to AI systems. High-value proprietary content, pricing data, and competitive intelligence deserve different treatment from public marketing pages.

Revisit your crawler policies with intent in mind. Which AI systems do you actually want to index your content? Which are consuming resources without providing value? robots.txt is a starting point, not a policy.
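
Within those limits, robots.txt can at least express per-crawler intent. Tokens and paths below are illustrative, and compliance remains voluntary:

    # Intent-differentiated robots.txt (illustrative; compliance is voluntary).
    User-agent: GPTBot            # training: keep off high-value content
    Disallow: /pricing/
    Disallow: /research/

    User-agent: OAI-SearchBot     # indexing: discoverability in AI answers
    Allow: /

    User-agent: PerplexityBot     # retrieval
    Allow: /blog/
    Disallow: /internal/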

Prepare for retrieval-pattern traffic. If AI systems are fetching your content in real-time to answer user queries, your caching strategy and origin capacity need to account for that. The traffic won't arrive on schedule.
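
One way to absorb those bursts, assuming a CDN or edge cache sits in front of origin: serve crawler-reachable pages with cache headers that let the edge answer repeat fetches and revalidate in the background. Values are placeholders.

    # Sketch: cache headers that let an edge cache absorb bursty retrieval
    # fetches without each one reaching origin. Values are placeholders.
    def cache_headers_for(path: str) -> dict[str, str]:
        if path.startswith("/blog/"):
            return {
                "Cache-Control": "public, max-age=300, stale-while-revalidate=3600",
                "Vary": "Accept-Encoding",
            }
        return {"Cache-Control": "private, no-store"}  # keep sensitive paths uncached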

The Bigger Picture

Websites have always been destinations. They're increasingly becoming data sources, continuously ingested, processed, and synthesized by AI systems that never appear in your session analytics.

The organizations that adapt will control how their content is used, how their infrastructure is consumed, and how visible they are in AI-generated answers. Those that don't will find themselves both overloaded by traffic they didn't plan for and invisible in the places their users are actually looking.

The web is machine-driven. The question is whether you're managing those machines or just absorbing them.

To learn more about how Radware helps organizations detect, classify, and manage AI crawler traffic, get in touch or read our State of the Underground Ecosystem.

Anirudh K
