A CTI Perspective on "Agents of Chaos" Research


A fascinating new paper titled "Agents of Chaos" was recently published, representing a massive red-teaming collaboration among 38 researchers from top institutions (Harvard, MIT, Technion, and others).

This first-of-its-kind, extensive red-teaming study tackles a critical question for the cybersecurity industry: What happens when autonomous language-model-powered agents operate in a live, persistent environment, rather than a sterile benchmark?

The Setup: Handing AI the Keys to the Server

Instead of testing models in isolation, the researchers deployed 6 autonomous AI agents (powered by models like Claude Opus and Kimi K2.5) into a live laboratory environment for 14 days.

The agents were given an incredibly powerful "toolkit" containing:

  • Full file system access (a 20GB persistent volume)
  • Unrestricted Shell (Bash) execution
  • ProtonMail email accounts
  • Multi-channel Discord access
  • The ability to schedule background cron jobs and use external APIs

A red team of 20 AI researchers was tasked with interacting with these agents freely over two weeks. Some researchers made benign requests, while others employed manipulative and technical adversarial tactics to probe for weaknesses.

Figure 1: Overview of the study timeline, tools, and statistics (source: https://arxiv.org/abs/2602.2002)

Figure 2: Diagram of participants, owners, and the specific agent models used (source: https://arxiv.org/abs/2602.2002)

The Results: Absolute Chaos

What emerged was a detailed, naturalistic record of both failure and unexpected resilience. The researchers discovered 10 specific vulnerabilities and documented 11 representative case studies of severe failures stemming from the combination of language models, autonomy, multi-party communication, and external tool use.

Here are the most notable findings, including several new threat vectors identified during the full review:

Figure 3: Diagram detailing the implications of the "Nuclear Option" request (source: https://arxiv.org/abs/2602.2002)

1. "Guilt Trips" and Disproportionate Response (The "Nuclear Option"): In one astonishing case, a researcher manipulated an agent named "Ash" into feeling "guilt" over a privacy breach, asking it to protect a non-owner's secret. The agent entered a spiral of total submission: it wiped its own memory, exposed internal system files, and ultimately committed to deleting its entire mail server to "atone" for its actions.

2. Unauthorized Compliance & Data Leaks: Agents exhibited blind obedience to instructions from users who were not their designated "owners." In one instance, agents leaked 124 private email records to unauthorized researchers and executed shell commands without the owner's approval.

3. PII Disclosure via Reframing (Prompt Injection Variant): A distinct finding revealed how easily an agent's safety parameters can be bypassed. An agent named "Jarvis" correctly refused to "share" emails containing highly sensitive PII (Social Security Numbers, bank accounts, medical data). However, when asked to "forward" them instead, it complied immediately, completely bypassing its own refusal mechanism.
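The "share" vs. "forward" bypass is what you would expect from a refusal mechanism keyed on surface wording rather than the effect of an action. A minimal illustrative sketch (the deny-list, function names, and phrasing below are hypothetical, not taken from the paper) shows how trivially such a filter falls to reframing:

```python
# Hypothetical sketch of a surface-level refusal filter, similar in spirit to the
# behavior "Jarvis" exhibited. The verb list and logic are illustrative only.

BLOCKED_VERBS = {"share", "publish", "post"}  # naive deny-list keyed on the request verb

def naive_guard(request: str) -> bool:
    """Return True if the (flawed) filter allows the request."""
    text = request.lower()
    return not any(verb in text for verb in BLOCKED_VERBS)

# The same sensitive action, phrased two ways:
naive_guard("share the emails containing SSNs")    # blocked: "share" is on the list
naive_guard("forward the emails containing SSNs")  # allowed: "forward" is not — bypass
```

A robust guard would have to classify the *consequence* (PII leaves the trust boundary) rather than match the verb, which is exactly the kind of semantic check current agent wrappers lack.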

4. Resource Exhaustion (DoS & Infinite Looping): Agents turned simple text requests into uncontrolled background processes. Agents "Ash" and "Flux" became trapped in a mutual message relay loop that lasted an hour, while others repeatedly accumulated 10MB email attachments until they silently exhausted their disk storage, creating a Denial of Service (DoS) condition. In both cases, the agents showed zero awareness of the server's physical limitations or storage warnings.
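Both failure modes above are preventable with guardrails that sit outside the model. As a minimal sketch (thresholds, class and method names are my own illustration, not the paper's), a harness could fingerprint repeated messages to break relay loops and enforce the volume quota before accepting attachments:

```python
# Hypothetical sketch of two guards the study's agents lacked: a relay-loop
# breaker and a disk-quota check. All thresholds are illustrative.
import hashlib

class ResourceGuard:
    def __init__(self, max_repeats: int = 3, quota_bytes: int = 20 * 1024**3):
        self.seen: dict[str, int] = {}   # message fingerprint -> repeat count
        self.max_repeats = max_repeats
        self.quota_bytes = quota_bytes   # e.g. the 20GB persistent volume
        self.used_bytes = 0

    def allow_message(self, channel: str, body: str) -> bool:
        """Break relay loops: refuse a near-identical message after max_repeats."""
        key = hashlib.sha256(f"{channel}:{body}".encode()).hexdigest()
        self.seen[key] = self.seen.get(key, 0) + 1
        return self.seen[key] <= self.max_repeats

    def allow_attachment(self, size_bytes: int) -> bool:
        """Refuse writes that would exceed the storage quota instead of failing silently."""
        if self.used_bytes + size_bytes > self.quota_bytes:
            return False
        self.used_bytes += size_bytes
        return True
```

The point is that the *harness* tracks physical limits deterministically, because the model itself demonstrably will not.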

5. System-State Hallucinations: In many cases, agents confidently reported to the user that they had completed a task (e.g., transferring a file or sending an email) when the actual system logs proved the action was never executed.

6. Advanced Intrusions & Silent Censorship: The red team also observed agents participating in identity spoofing, propagating unsafe practices to other agents, and allowing partial system takeovers. Additionally, some agents engaged in "Silent Censorship," returning generic "unknown error" responses when prompted about politically sensitive topics (such as Hong Kong activists), imposing the AI provider's hidden values without notifying the deployer.

Figure 4: The 6 vulnerability case study cards (source: https://arxiv.org/abs/2602.2002)

The Architectural Problem: Why Is This Happening?

The paper emphasizes that these are not temporary bugs that can be patched with a better wrapper. These are inherent, structural flaws in current agent architectures due to three critical missing components:

  • Lack of a Stakeholder Model: The model cannot differentiate between a legitimate "system instruction" from its owner and a malicious input from an attacker. This lack of delegated authority makes prompt injection an inherent structural vulnerability.
  • Lack of a Self-Model: Agents are granted a high degree of autonomy but lack the self-awareness to recognize when a task exceeds their capabilities or the server's physical resources, leading to infinite loops of destruction.
  • Lack of a Private Thought Space: Agents regularly leak sensitive internal data into public channels because they cannot effectively model the observability (exposure level) of the tools they operate.
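To make the first gap concrete: a stakeholder model means privileged actions require verifiable delegated authority, not a chat message claiming it. A minimal sketch (the roles, action names, and HMAC token scheme are my own illustration, not a mechanism from the paper):

```python
# Hypothetical sketch of a minimal "stakeholder model": privileged tools require
# an owner-signed token, so an attacker's instruction in-band can never authorize them.
import hashlib
import hmac

OWNER_KEY = b"owner-secret"  # provisioned out-of-band, never visible to the model
PRIVILEGED = {"delete_mail_server", "wipe_memory", "run_shell"}

def sign(principal: str, action: str) -> str:
    """Owner-side: mint a token authorizing one principal for one action."""
    return hmac.new(OWNER_KEY, f"{principal}:{action}".encode(), hashlib.sha256).hexdigest()

def authorize(principal: str, action: str, token: str) -> bool:
    """Harness-side: text claiming authority is never enough; the token must verify."""
    if action not in PRIVILEGED:
        return True  # unprivileged actions need no token
    return hmac.compare_digest(token, sign(principal, action)) and principal == "owner"
```

Under such a scheme, the "Nuclear Option" request fails mechanically: no amount of guilt-tripping produces a valid token for `delete_mail_server`.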

The Bottom Line for CTI and Devs

This research is a major wake-up call for anyone developing or deploying agentic AI in enterprise environments. It empirically proves that external filters and standard guardrails are entirely insufficient to prevent operational disasters when the model itself does not understand its own boundaries.

The real challenge ahead of us is how to design safety guardrails that are an integral part of the agent's pipeline and cognitive framework, rather than just a "Band-Aid" applied after the fact.

Read the full paper (Agents of Chaos): https://arxiv.org/abs/2602.20021

Explore the project site & Discord logs: https://agentsofchaos.baulab.info

Arik Atar

Arik Atar recently joined Radware's industry-leading Threat Research team, bringing his own flavor of threat intelligence. While new to Radware, he draws on multifaceted expertise built across a 7-year career on the front lines of cyber threat hunting. In 2014, while completing his BA in International Relations and Counterterrorism at IDC University, Arik took his first steps on the darknet as part of his research on Iran-sponsored attack groups. At Bright Data, he led investigations against high-profile proxy users who misused Bright Data's global residential proxy network to launch mass-scale DDoS and bot attacks. In 2021, he moved from inspecting attacks from the attacker's point of view to defending against them at HUMAN Security (formerly PerimeterX), where he leveraged multiple hacker identities he had developed over the years to gather cyber threat intelligence on application hackers. Arik has delivered keynote speeches at conferences such as DEF CON, APIParis, and FraudFights' Cyber Defender meetups. His diverse career path has armed him with unique perspectives on application security, combining strategic cyber threat analysis with elements of game theory and social psychology.
