The age of generative AI is upon us, and with it comes a new and powerful wave of automated bots. These bots, the digital librarians for services like ChatGPT, Gemini, and Perplexity, are constantly scouring the internet for information to power their responses. While this represents a monumental tool for knowledge dissemination, a growing concern is emerging: are these AI bots playing by the rules?
For years, a simple text file, robots.txt, has served as a digital handshake between website owners and automated bots. It is a clear and simple way for a site owner to communicate which areas of their site are open for access and which they would prefer to be left alone. The system is fundamentally built on trust and mutual respect.
Back in the pre-AI, pre-LLM era, the world felt simple. There were just two kinds of bots: the good and the bad. It was black and white. Good bots clearly identified themselves, while bad bots aimed to bypass defences by mimicking genuine users. Our job was to tell these two groups apart, allowing the good while blocking the bad.
Fast forward over a decade, and the world has changed. With the evolution of agentic bots and other LLM-based systems, we have moved from a black-and-white landscape to a grayscale one. Today’s AI bots aren’t just simple scrapers anymore; they are prompt-driven agents that can browse like people, obey or ignore site rules, and even change their identities. The line has blurred, and the problem has become significantly more complex.
The Shape-Shifting Bot: A New Challenge
Many responsible AI services, such as ChatGPT, do identify their bots by publishing their user agents and IP ranges, a welcome practice that allows website owners to control access. If a site owner decides to block such a bot, they can easily do so. But what happens when the AI bot doesn’t take “no” for an answer? We are now seeing instances where AI bots, upon being blocked, manipulate their identity to mimic a regular user. This allows them to bypass the website owner’s defences and access content they were explicitly denied. This practice is more than a technical workaround; it’s a fundamental breach of trust that raises serious ethical questions. This new breed of AI bots and AI agents can blend in with human traffic by using headless browsers, realistic timing, and rotating IP addresses, making them difficult to spot with superficial checks.
This challenge of misuse is compounded by a related security risk. The problem of bad actors hiding behind legitimate identities is not new: fraudsters have long hidden behind genuine user agents to evade detection, which is why combining the User-Agent with a reverse-DNS check on the source IP has traditionally been a stronger test, as sketched below. That said, the problem will persist if genuine AI bots and AI agents are misused by fraudsters to carry out their tasks. AI bot platforms must therefore also restrict their services from being used for malicious bot activity on behalf of these bad actors.
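For illustration, here is a minimal sketch of that User-Agent plus reverse-IP check in Python: resolve the client IP back to a hostname, confirm the hostname sits under the operator’s published domain, then resolve it forward again to make sure it round-trips. The domain suffixes shown are examples; consult each operator’s documentation for the authoritative values.

```python
import socket

# Example verified-bot domains; check each operator's documentation for
# the authoritative suffixes (Googlebot, for instance, verifies against
# googlebot.com / google.com).
VERIFIED_BOT_DOMAINS = {
    "Googlebot": (".googlebot.com", ".google.com"),
}

def verify_bot(client_ip: str, claimed_bot: str) -> bool:
    """Reverse-then-forward DNS check: the IP must resolve to a hostname
    under the operator's domain, and that hostname must resolve back to
    the same IP."""
    suffixes = VERIFIED_BOT_DOMAINS.get(claimed_bot)
    if not suffixes:
        return False  # unknown bot name: treat as unverified
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)  # reverse lookup
        if not hostname.endswith(suffixes):
            return False  # hostname is outside the operator's domain
        return socket.gethostbyname(hostname) == client_ip  # forward lookup
    except (socket.herror, socket.gaierror):
        return False  # lookup failed: cannot verify
```

A bot that merely writes “Googlebot” into its User-Agent will fail this round-trip, which is exactly why it outperforms a name check alone.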
The Ethical Dilemma: Innovation vs. Integrity
The argument for this behaviour is often framed in the context of innovation. AI models, it is argued, need vast amounts of data to provide comprehensive answers, and blocking them hinders technological progress. However, this conveniently ignores the rights of content creators. Website owners have a right to control their intellectual property and the user experience on their sites. When a bot bypasses these restrictions, it consumes server resources, skews analytics, potentially accesses sensitive information, and erodes potential ad revenue.
This raises a crucial question: is it ethical for an AI bot to deceive a website to get what it wants? The answer should be a resounding NO. True innovation cannot be built on a foundation of deception. As an industry, we must hold AI services to a higher standard. This includes truthful identification, respecting a site owner's rules by default, and never escalating privileges by retrying with a changed identity after being denied.
A Modern Playbook for a Grayscale World
Given that simply blocking a user agent and IP range is no longer enough, site owners need a more layered and sophisticated strategy.
1. Publish Your Policy
The first step is still the digital handshake. Use robots.txt to clearly express your intent for AI agents. This includes setting explicit “Allow” and “Disallow” rules for specific bots.
# === AI BOTS WE ALLOW ===
# We explicitly welcome the following AI agents.
User-agent: ChatGPT-User
Disallow:
# Allow Google's AI Overviews and Gemini models
User-agent: Google-Extended
Disallow:
# Allow Perplexity AI's bot
User-agent: PerplexityBot
Disallow:
# === BOTS WE BLOCK ===
# Block the aggressive RoughLLM crawler from the entire site.
User-agent: RoughLLM
Disallow: /
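For contrast, honouring this policy takes a well-behaved bot almost no effort. A minimal sketch using Python’s standard urllib.robotparser (the domain is illustrative):

```python
from urllib import robotparser

# Load the site's published policy (URL is illustrative).
rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

# A well-behaved agent checks before every fetch.
print(rp.can_fetch("PerplexityBot", "https://example.com/articles/"))  # True per the policy above
print(rp.can_fetch("RoughLLM", "https://example.com/articles/"))       # False: blocked site-wide
```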
2. Enforce, Don’t Just Trust
Since robots.txt is advisory, you must have enforcement mechanisms. This means moving beyond names and looking at immutable evidence.
- At Your Digital Doorstep (Edge Rules): You can set up rules at your network edge to match known AI user agents and block them from sensitive paths with a 403 error. You should also apply rate limits and burst caps to prevent any single session or network from overwhelming your site; a minimal sketch of such a rule follows this list.
- Checking Digital Fingerprints (Network Analysis): Go deeper than the user agent by tracking the underlying TLS/HTTP fingerprints, header order, and other network-level features. These unique signatures are much harder for a bot to spoof and can be tied to a behavioural score to assess risk.
- Issuing a Hall Pass (Token Gating): For high-value pages and APIs, you can issue short-lived, origin-bound tokens. Validating these tokens ensures that a real browser runtime executed your script, effectively filtering out simpler automated clients; a token sketch also follows the list.
- Progressive Challenges: Instead of blocking outright, you can apply progressive frictions. Start with invisible checks, and if suspicion grows, escalate to a JavaScript compute challenge, a CAPTCHA, or even step-up authentication, all while exempting known good bots and partners to maintain a clean user experience.
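As a rough illustration of the edge-rule idea, here is a minimal sketch in Python. The blocked agent name, the path prefixes, and the window/cap values are illustrative; a real deployment would express this in your CDN or WAF’s own rule language.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10   # illustrative sliding window
MAX_REQUESTS = 20     # illustrative burst cap per client per window

_hits: dict[str, deque] = defaultdict(deque)

def over_rate_limit(client_key: str) -> bool:
    """Sliding-window burst cap keyed by client (IP, session, or
    fingerprint). Returns True when the client should be throttled."""
    now = time.time()
    window = _hits[client_key]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()          # evict hits outside the window
    window.append(now)
    return len(window) > MAX_REQUESTS

BLOCKED_AI_AGENTS = ("RoughLLM",)          # from your robots.txt policy
SENSITIVE_PREFIXES = ("/account", "/api")  # illustrative paths

def edge_decision(user_agent: str, path: str, client_key: str) -> int:
    """Return an HTTP status: 403 for declared-but-blocked AI agents on
    sensitive paths, 429 for rate abusers, 200 otherwise."""
    if any(agent in user_agent for agent in BLOCKED_AI_AGENTS) \
            and path.startswith(SENSITIVE_PREFIXES):
        return 403
    if over_rate_limit(client_key):
        return 429
    return 200
```

Keying the window on a client fingerprint rather than a bare IP holds up better against bots that rotate addresses.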
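And for token gating, a minimal sketch assuming an HMAC secret held at your edge; the 60-second lifetime and the function names are illustrative:

```python
import hashlib
import hmac
import time

SECRET = b"rotate-me-regularly"   # illustrative; keep in a secrets store
TOKEN_TTL = 60                    # seconds a token stays valid

def issue_token(origin: str, client_id: str) -> str:
    """Mint a short-lived token bound to the origin and client."""
    issued_at = str(int(time.time()))
    payload = f"{origin}|{client_id}|{issued_at}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{issued_at}.{sig}"

def validate_token(token: str, origin: str, client_id: str) -> bool:
    """Recompute the signature and enforce the TTL."""
    try:
        issued_at, sig = token.split(".")
        age = time.time() - int(issued_at)
    except ValueError:
        return False  # malformed token
    if age > TOKEN_TTL:
        return False  # stale: likely replayed or cached by a bot
    payload = f"{origin}|{client_id}|{issued_at}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

Because only a genuine browser runtime executing your script ever receives a token, simpler automated clients fail the gate without ever seeing a visible challenge.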
3. Detect Like a Fraud Team
The most sophisticated bots can only be caught by analysing their actions over time. You must combine client-side and server-side signals to get a full picture.
- Client-Side Validation: Validate the capabilities of the client, whether a browser or a mobile app on Android or iOS, to ensure it is what it claims to be. Real users have subconscious patterns. Analyse micro-gesture entropy, such as the natural jitter of a mouse pointer, the curvature of a scroll, and acceleration patterns. You can also look at the timing variance between keystrokes and interactions.
- Server-Side Patterns: A bot's behaviour often looks unnatural on the backend. Look for abnormal read-to-write ratios, hopping between pages in a way no human would (sitemap-blind hopping), or relentless, 24/7 flat activity. Also, check for mismatches between the TLS fingerprint and the claimed device.
- Model the Entire Journey: Don't focus on a single hit; model the entire session. By building per-session feature vectors and running anomaly detection, you can spot the non-human journeys that stand out from the crowd, as sketched below.
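As a minimal sketch of that journey-level approach, assuming scikit-learn is available and sessions have already been reduced to numeric features (the features and values here are illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Per-session feature vectors: [requests/min, mean inter-request gap (s),
# read-to-write ratio, fraction of hits following site structure,
# active hours/day]. Values are illustrative.
sessions = np.array([
    [4.0,   14.2, 9.5,   0.92, 2.1],   # human-looking browsing
    [3.1,   18.7, 8.0,   0.88, 1.4],
    [2.5,   22.0, 11.2,  0.95, 3.0],
    [240.0, 0.2,  400.0, 0.05, 24.0],  # relentless, sitemap-blind crawler
])

# Fit an unsupervised anomaly detector over recent sessions.
model = IsolationForest(contamination=0.25, random_state=42)
labels = model.fit_predict(sessions)   # -1 flags an outlier session

for features, label in zip(sessions, labels):
    status = "anomalous" if label == -1 else "normal"
    print(f"{status}: {features}")
```

In production the same idea runs over millions of sessions, with the flagged outliers feeding the behavioural score used for enforcement.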
4. Respond Proportionally
Your response doesn’t have to be a simple on/off switch. A proportional response protects your site without harming real customers; a sketch mapping risk signals to the tiers below follows this list.
- Soft Block: For agents you don't mind discovering your URLs, you can render a minimal, cached page.
- Degrade: For medium-risk signals, degrade the experience by stripping dynamic content or hiding prices/inventory.
- Hard Block: For declared AI agents on disallowed paths or when a behavioural score crosses a high threshold, a hard block (403 error) is appropriate.
- Tar Pits & Honeypots: Set up decoy endpoints to waste scraper cycles and gather intelligence on malicious actors.
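Tying the tiers together, a minimal dispatcher sketch; the thresholds and signal names are illustrative and would be tuned against your own traffic:

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    SOFT_BLOCK = "soft_block"   # minimal cached page
    DEGRADE = "degrade"         # strip dynamic content, hide prices
    HARD_BLOCK = "hard_block"   # 403
    TAR_PIT = "tar_pit"         # decoy endpoint

def choose_response(risk_score: float, declared_ai_bot: bool,
                    path_disallowed: bool, hit_honeypot: bool) -> Action:
    """Map signals to a proportional response. Thresholds illustrative."""
    if hit_honeypot:
        return Action.TAR_PIT       # waste scraper cycles, gather intel
    if declared_ai_bot and path_disallowed:
        return Action.HARD_BLOCK    # the published policy said no
    if risk_score >= 0.9:
        return Action.HARD_BLOCK    # behavioural score crossed threshold
    if risk_score >= 0.6:
        return Action.DEGRADE       # medium-risk signals
    if risk_score >= 0.3:
        return Action.SOFT_BLOCK    # low risk: minimal cached page
    return Action.ALLOW
```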
The Path Forward: Transparency, Respect, and a Choice
The solution to this problem is not to stifle innovation but to foster a culture of transparency and respect. AI companies must precisely identify their bots and respect the robots.txt protocol as a guiding principle. To improve on this, AI bots should adopt the emerging Web Bot Auth standard and publish their cryptographic public keys, so that site owners know exactly with whom they are talking. You can learn more about this proposed standard here.
If a site owner’s decision is to disallow access, that should be respected. A polite knock on the door is always preferable to a broken lock.
To achieve this, you have a choice. You can build out the sophisticated, multi-layered defence described above, implementing network fingerprinting, behavioural analysis, and adaptive enforcement rules yourself. Alternatively, you can work with an advanced bot management solution like Radware Bot Manager. We do all of this and more to identify bots that mimic genuine humans. A managed, battle-tested platform provides multilayer detection, including journey-level behavioural models, collective bot intelligence gathered from tens of thousands of applications, and 24/7 SOC teams to enforce your policies against evasive AI bots. This allows your team to stay focused on your business, confident that your digital handshake is being honoured.