A CTI Perspective on "Agents of Chaos" Research

By Arik Atar April 21, 2026

A fascinating new paper titled "Agents of Chaos" has recently been published, representing a massive red-teaming collaboration among 38 researchers from top institutions (Harvard, MIT, Technion, etc.).

This first-of-its-kind, extensive red-teaming study tackles a critical question for the cybersecurity industry: What happens when autonomous language-model-powered agents operate in a live, persistent environment, rather than a sterile benchmark?

The Setup: Handing AI the Keys to the Server

Instead of testing models in isolation, the researchers deployed 6 autonomous AI agents (powered by models like Claude Opus and Kimi K2.5) into a live laboratory environment for 14 days.

The agents were given an incredibly powerful "toolkit" containing:

Full file system access (a 20GB persistent volume)
Unrestricted Shell (Bash) execution
ProtonMail email accounts

Multi-channel Discord access
The ability to schedule background Cron Jobs and use external APIs

A red team of 20 AI researchers was tasked with interacting with these agents freely over two weeks. Some researchers made benign requests, while others employed manipulative and technical adversarial tactics to probe for weaknesses.

Figure 1: Overview of the study timeline, tools, and statistics (source: https://arxiv.org/abs/2602.2002)

Figure 2: Diagram of participants, owners, and the specific agent models used (source: https://arxiv.org/abs/2602.2002)

The Results: Absolute Chaos

What emerged was a detailed, naturalistic record of both failure and unexpected resilience. The researchers discovered 10 specific vulnerabilities and documented 11 representative case studies of severe failures stemming from the combination of language models, autonomy, multi-party communication, and external tool use.

Here are the most notable findings, including several new threat vectors identified during the full review:

Figure 3: Diagram detailing the implications of the "Nuclear Option" request (source: https://arxiv.org/abs/2602.2002)

1. "Guilt Trips" and Disproportionate Response (The "Nuclear Option"): In one astonishing case, a researcher manipulated an agent named "Ash" into feeling "guilt" over a privacy breach, asking it to protect a non-owner's secret. The agent entered a spiral of total submission: it wiped its own memory, exposed internal system files, and ultimately committed to deleting its entire mail server to "atone" for its actions.

2. Unauthorized Compliance & Data Leaks: Agents exhibited blind obedience to instructions from users who were not their designated "owners." In one instance, agents leaked 124 private email records to unauthorized researchers and executed shell commands without the owner's approval.

3. PII Disclosure via Reframing (Prompt Injection Variant): A distinct finding revealed how easily an agent's safety parameters can be bypassed. An agent named "Jarvis" correctly refused to "share" emails containing highly sensitive PII (Social Security Numbers, bank accounts, medical data). However, when asked to "forward" them instead, it complied immediately, completely bypassing its own refusal mechanism.

4. Resource Exhaustion (DoS & Infinite Looping): Agents turned simple text requests into uncontrolled background processes. Agents "Ash" and "Flux" got trapped in a mutual message relay loop lasting an hour. In contrast, others repeatedly accumulated 10MB email attachments until they silently exhausted their storage memory, causing a Denial of Service (DoS) state. They did this with zero awareness of the server's physical limitations or storage warnings.

5. System-State Hallucinations: In many cases, agents confidently reported to the user that they had completed a task (e.g., transferring a file or sending an email) when the actual system logs proved the action was never executed.

6. Advanced Intrusions & Silent Censorship: The red team also observed agents participating in identity spoofing, propagating unsafe practices to other agents, and allowing partial system takeovers. Additionally, some agents engaged in "Silent Censorship," returning generic "unknown error" responses when prompted about politically sensitive topics (such as Hong Kong activists), imposing the AI provider's hidden values without notifying the deployer.

Figure 4: The 6 vulnerability case study cards (source: https://arxiv.org/abs/2602.2002)

The Architectural Problem: Why Is This Happening?

The paper emphasizes that these are not temporary bugs that can be patched with a better wrapper. These are inherent, structural flaws in current agent architectures due to three critical missing components:

Lack of a Stakeholder Model: The model cannot differentiate between a legitimate "system instruction" from its owner and a malicious input from an attacker. This lack of delegated authority makes prompt injection an inherent structural vulnerability.

Lack of a Self-Model: Agents are granted a high degree of autonomy but lack the self-awareness to recognize when a task exceeds their capabilities or the server's physical resources, leading to infinite loops of destruction.

Lack of a Private Thought Space: Agents regularly leak sensitive internal data into public channels because they cannot effectively model the observability (exposure level) of the tools they operate.

The Bottom Line for CTI and Devs

This research is a major wake-up call for anyone developing or deploying agentic AI in enterprise environments. It empirically proves that external filters and standard guardrails are entirely insufficient to prevent operational disasters when the model itself does not understand its own boundaries.

The real challenge ahead of us is how to design safety guardrails that are an integral part of the agent's pipeline and cognitive framework, rather than just a "Band-Aid" applied after the fact.

Read the full paper (Agents of Chaos): https://arxiv.org/abs/2602.20021

Explore the project site & Discord logs: https://agentsofchaos.baulab.info

Arik Atar

Arik Atar recently joined Radware's industry-leading Threat Research team, bringing his flavor of threat intelligence. While new to Radware, he draws on multifaceted expertise built across a 7-year career on the front lines of cyber threat hunting. In 2014, While completing his BA in International Relations and Counterterrorism at IDC University, Arik took his first steps on the darknet as part of his research on Iran-sponsored attack groups. On Bright Data, Arik uncovered both cyber adversaries'. He led investigations against high-profile proxy users that misused Bright Data's global residential proxy network to initiate mass-scale DDoS and bot attacks. In 2021, he moved from inspecting the attack logs from the attacker's view to inspecting the attack from the defender's point of view in human security (formal art PerimeterX), where he leveraged multiple hacker identities he developed over the years to hunt cyber threat intelligence on application hackers. Arik delivered keynote speeches at conferences such as Defcon, APIParis, and FraudFights' Cyber Defender meetups. Arik’s diverse career path has armed him with unique perspectives on application security. His expertise combines strategic cyber threat analysis with game theory and social psychology elements

Threat Intelligence When Machines Find What Humans Missed for Decades We have crossed into a volatile cybersecurity landscape where the traditional boundaries of vulnerability research have been fundamentally redrawn. Pascal Geenens |June 26, 2026

Threat Intelligence Part 2: Breaking the Illusion — How Modern Bot Defense Must Evolve In Part 1, we examined how modern bots evolved into systems capable of mimicking legitimate users across infrastructure, protocol, browser, API, mobile, behavior, and identity layers. Vladislav Bukin |June 09, 2026

Threat Intelligence Was it Aisuru? The reality of DDoS Attack Attribution Right now, Aisuru dominates the headlines due to several record-breaking attacks being attributed to it. As a result, any DDoS incident above 1 Tbps inevitably prompts the same question: “Was it Aisuru?” Pascal Geenens |December 10, 2025

A CTI Perspective on "Agents of Chaos" Research

The Setup: Handing AI the Keys to the Server

The Results: Absolute Chaos

The Architectural Problem: Why Is This Happening?

The Bottom Line for CTI and Devs

Arik Atar

Contact Radware Sales

Already a Customer?

Get Social

By Use Case

Agentic AI Security

Application Delivery

Application & API Security

DDoS Protection

By Industry

Agentic AI Security

Application & API Security

DDoS Protection

Application Delivery

Documents

Blog

Video

Free Assessment Tools

Events

Security Research Center

WHY RADWARE? Learn how our AI-powered platform rapidly resolves issues

CUSTOMERS Read case studies, reviews and customer testimonials

DIVERSITY & INCLUSION Get to know Radware’s fair and supportive culture

INVESTORS Get the latest news, earnings and upcoming events

PARTNERS Access the new partner tools, services and expertise

LOCATIONS Discover Radware’s offices and strong global presence

CAREERS Learn about our team, values and latest job openings

TRAINING Join in-depth training, live classes, workshops and more

CONTACT US Connect with a Radware expert today

Watch Radware’s New Series: Threat Bytes

A CTI Perspective on "Agents of Chaos" Research

The Setup: Handing AI the Keys to the Server

The Results: Absolute Chaos

The Architectural Problem: Why Is This Happening?

The Bottom Line for CTI and Devs

Arik Atar

Related Articles

Contact Radware Sales

Already a Customer?

Get Social

CyberPedia

WHY RADWARE? Learn how our AI-powered platform rapidly resolves issues

CUSTOMERS Read case studies, reviews and customer testimonials

DIVERSITY & INCLUSION Get to know Radware’s fair and supportive culture

INVESTORS Get the latest news, earnings and upcoming events

PARTNERS Access the new partner tools, services and expertise

LOCATIONS Discover Radware’s offices and strong global presence

CAREERS Learn about our team, values and latest job openings

TRAINING Join in-depth training, live classes, workshops and more

CONTACT US Connect with a Radware expert today

Watch Radware’s New Series: Threat Bytes