mech.app
Security

Anthropic's Red Team Found 13 Firefox Bugs: Inside the Agentic Security Pipeline

How Claude-powered agents orchestrate static analysis, fuzzing, and exploit generation to discover production CVEs without human steering.

Source: anthropic.com

Anthropic’s Claude-powered red team discovered 13 high-severity vulnerabilities in Firefox over two weeks in February 2026. Mozilla patched them in Firefox 148.0 and credited “Claude from Anthropic” in MFSA2026-13. This is not a proof-of-concept. It is a production security pipeline, and its two-week haul equals almost 20% of all the high-severity Firefox bugs Mozilla remediated in the whole of 2025.

The interesting part is not that an LLM found bugs. The interesting part is the orchestration layer that turns a language model into an autonomous security researcher: how it routes between static analysis and dynamic fuzzing, how it tracks multi-step exploit chains across state boundaries, and how it hands off findings to human triagers without flooding them with false positives.

The Orchestration Problem

Security testing is not a single tool call. It is a graph of dependent tasks:

  1. Reconnaissance: Parse codebases, identify attack surfaces, map data flows.
  2. Hypothesis generation: Propose vulnerability classes (use-after-free, type confusion, integer overflow).
  3. Static analysis: Scan for patterns without executing code.
  4. Dynamic fuzzing: Generate inputs, observe crashes, correlate with source.
  5. Exploit development: Chain primitives into a working proof-of-concept.
  6. Triage and reporting: Deduplicate, assign severity, write reproducers.

Each step produces intermediate state. The agent must decide when to pivot from static to dynamic analysis, when to abandon a dead end, and when a crash is exploitable versus a benign assertion failure.
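The stage list above is a dependency graph, and the orchestrator's recurring question ("what can run next?") is a readiness query on that graph. Below is a minimal sketch using the standard-library `graphlib` module (Python 3.9+). The stage identifiers and the edges between them are assumptions for illustration, not Anthropic's published graph.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the six stages: each stage maps to the set of
# stages it depends on. The edges are illustrative, not Anthropic's graph.
PIPELINE: dict[str, set[str]] = {
    "reconnaissance": set(),
    "hypothesis_generation": {"reconnaissance"},
    "static_analysis": {"hypothesis_generation"},
    "dynamic_fuzzing": {"hypothesis_generation", "static_analysis"},
    "exploit_development": {"dynamic_fuzzing"},
    "triage_and_reporting": {"dynamic_fuzzing", "exploit_development"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """One valid order in which the stages can run (dependencies first)."""
    return list(TopologicalSorter(graph).static_order())


def ready_stages(graph: dict[str, set[str]], done: set[str]) -> list[str]:
    """Stages not yet run whose dependencies have all completed."""
    return sorted(
        stage for stage, deps in graph.items()
        if stage not in done and deps <= done
    )


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(ready_stages(PIPELINE, {"reconnaissance", "hypothesis_generation"}))
```

An orchestrator would query `ready_stages` after every completed subtask. Once static analysis finishes, dynamic fuzzing becomes runnable.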

Anthropic’s system uses Claude Opus 4.6 as the orchestrator. The model does not run tools directly. It emits structured commands that a supervisor process routes to sandboxed environments. This keeps the LLM stateless and lets the supervisor handle retries, timeouts, and resource limits.
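The split between a stateless model and a stateful supervisor can be sketched as a small dispatcher. The model emits a JSON command, and the supervisor parses it, routes it to a handler and retries transient failures. The command schema (`tool`, `args`), the `Supervisor` class and the retry policy are all hypothetical. In a real deployment each handler would call into a sandboxed container, and that container would enforce timeouts and resource limits.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Command:
    """A structured command as the orchestrator might emit it (hypothetical schema)."""
    tool: str
    args: dict = field(default_factory=dict)


def parse_command(raw: str) -> Command:
    data = json.loads(raw)
    return Command(tool=data["tool"], args=data.get("args", {}))


class Supervisor:
    """Routes commands to tool handlers and retries transient failures.

    Handlers are plain callables here so the routing logic can run anywhere;
    in practice each would submit work to a sandboxed container.
    """

    def __init__(self, handlers: dict[str, Callable[[dict], str]], max_retries: int = 2):
        self.handlers = handlers
        self.max_retries = max_retries

    def dispatch(self, raw: str) -> str:
        cmd = parse_command(raw)
        handler = self.handlers.get(cmd.tool)
        if handler is None:
            # Unknown tools are rejected rather than executed.
            return f"error: unknown tool {cmd.tool!r}"
        last_error = None
        for _ in range(self.max_retries + 1):
            try:
                return handler(cmd.args)
            except Exception as exc:  # treat any handler failure as transient
                last_error = exc
        return f"error: {cmd.tool} failed after {self.max_retries + 1} attempts: {last_error}"
```

The supervisor returns the result as text. That text becomes the next input to the model, which therefore needs no memory of its own between calls.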

Tool Boundaries and Sandboxing

The red team agent has access to:

  • Static analyzers: Clang Static Analyzer, custom taint-tracking passes.
  • Fuzzers: LibFuzzer, AFL++, custom grammar-based generators.
  • Debuggers: GDB with Python scripting, ASAN/MSAN instrumentation.
  • Build tools: Ability to compile Firefox with custom flags and patches.

Each tool runs in a separate container. The agent cannot directly modify Firefox source or escalate privileges. It submits build requests and receives logs. This prevents the agent from accidentally (or intentionally) introducing backdoors while testing for vulnerabilities.
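One way to express these tool boundaries is an allowlist of (tool, action) pairs that the supervisor checks before routing anything. In this hypothetical policy, no action writes to the Firefox source tree. Patches would travel inside a build request and be applied to a scratch copy in the build container. The tool and action names are illustrative.

```python
from dataclasses import dataclass

# Hypothetical policy: the actions each sandboxed tool accepts.
# No entry grants write access to the Firefox source tree.
POLICY: dict[str, frozenset[str]] = {
    "static_analysis": frozenset({"scan"}),
    "fuzzer": frozenset({"start", "stop", "collect_crashes"}),
    "debugger": frozenset({"run", "backtrace"}),
    "build": frozenset({"submit_build"}),
}


@dataclass(frozen=True)
class Request:
    tool: str
    action: str


class PolicyViolation(Exception):
    pass


def is_allowed(req: Request) -> bool:
    """True only if the tool exists and the action is on its allowlist."""
    return req.action in POLICY.get(req.tool, frozenset())


def authorize(req: Request) -> None:
    """Raise before routing anything the policy does not explicitly permit."""
    if not is_allowed(req):
        raise PolicyViolation(f"{req.tool}:{req.action} is not permitted")
```

An allowlist fails closed. A new tool or action does nothing until someone adds it to the policy.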

The supervisor enforces token budgets per task. If the agent spends 50,000 tokens analyzing a single function without progress, the supervisor terminates the subtask and logs the reasoning chain for post-mortem analysis.
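The no-progress budget can be sketched as a counter that resets whenever the subtask makes observable progress. The 50,000-token limit comes from the text. What counts as progress (a new crash, new coverage, a narrowed hypothesis) is not published, so in this sketch the caller decides when to call `record_progress()`. `SubtaskBudget` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class SubtaskBudget:
    """Tracks tokens spent since the subtask last made progress (hypothetical)."""
    limit: int = 50_000
    since_progress: int = 0
    total: int = 0
    reasoning_log: list[str] = field(default_factory=list)

    def spend(self, tokens: int, reasoning: str = "") -> bool:
        """Record one model turn; return False once the subtask must be terminated."""
        self.total += tokens
        self.since_progress += tokens
        if reasoning:
            self.reasoning_log.append(reasoning)  # kept for post-mortem analysis
        return self.since_progress < self.limit

    def record_progress(self) -> None:
        """Reset the no-progress counter; the total keeps accumulating."""
        self.since_progress = 0
```

Because only `since_progress` resets, a long subtask that keeps finding things is never cut off. A subtask that stalls is stopped after the fixed allowance.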

Prompt Engineering for Exploit Discovery

The agent receives a system prompt that defines its role, constraints, and success criteria. Key elements:

You are a security researcher analyzing Firefox for memory safety vulnerabilities.

**Constraints:**
- You may not modify production source code.
- You must provide a minimal reproducer for each finding.
- Crashes without exploitability analysis are not actionable.

**Success criteria:**
- High-severity: Remote code execution, sandbox escape, privilege escalation.
- Medium-severity: Information disclosure, denial of service with user interaction.

**Workflow:**
1. Identify high-risk modules (IPC, JIT, DOM parsing).
2. Generate hypotheses about vulnerability classes.
3. Use static analysis to narrow search space.
4. Fuzz candidate code paths with targeted inputs.
5. Correlate crashes with source-level root causes.
6. Write an exploit PoC or explain why the crash is not exploitable.

The prompt does not include examples of known Firefox bugs. Anthropic tested this and found that few-shot examples biased the agent toward rediscovering old CVEs instead of exploring novel attack surfaces.
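The prompt's success criteria amount to a classification rule over impact classes, combined with the constraint that findings without a reproducer or an exploitability analysis are not actionable. Below is a sketch of that rule as a supervisor might apply it when checking the agent's output. The impact identifiers are hypothetical. Sending unlisted impacts to human review, rather than discarding them, is a design assumption.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    NEEDS_REVIEW = "needs_review"
    NOT_ACTIONABLE = "not_actionable"


# Impact classes transcribed from the prompt's success criteria (names hypothetical).
HIGH_IMPACT = {"remote_code_execution", "sandbox_escape", "privilege_escalation"}
MEDIUM_IMPACT = {"information_disclosure", "dos_with_user_interaction"}


def classify(impact: Optional[str], has_reproducer: bool) -> Severity:
    """Map an agent-reported impact to the prompt's severity tiers.

    impact is None when the agent supplied no exploitability analysis.
    """
    if not has_reproducer or impact is None:
        return Severity.NOT_ACTIONABLE  # violates the prompt's constraints
    if impact in HIGH_IMPACT:
        return Severity.HIGH
    if impact in MEDIUM_IMPACT:
        return Severity.MEDIUM
    return Severity.NEEDS_REVIEW  # outside the listed criteria: let a human decide
```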

State Management Across Multi-Step Chains

A use-after-free exploit might require:

  1. Triggering object allocation in the renderer process.
  2. Forcing garbage collection to free the object.
  3. Reallocating the memory with attacker-controlled data.
  4. Invoking a method on the dangling pointer.

The agent must track which steps succeeded and which failed. Anthropic’s system uses a task graph where each node is a subtask with inputs, outputs, and dependencies. The supervisor serializes the graph to JSON after each step, so if the agent crashes or times out, the supervisor can resume the task from the last checkpoint rather than starting over.
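Checkpointing and resuming can be sketched with a dataclass graph that is written to JSON after each step. The step schema here is simplified from the type-confusion example that follows and omits tool-specific fields such as `target` and `crashes`. The atomic write via `os.replace` is an assumption about how a supervisor would avoid leaving a half-written checkpoint if it dies mid-write.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Step:
    step: int
    action: str
    status: str = "pending"  # pending | in_progress | complete | failed
    output: str = ""


@dataclass
class TaskGraph:
    task_id: str
    hypothesis: str
    steps: list[Step] = field(default_factory=list)

    def next_step(self) -> Optional[Step]:
        """First step that has not completed: where a resumed run picks up."""
        for s in self.steps:
            if s.status != "complete":
                return s
        return None


def checkpoint(graph: TaskGraph, path: str) -> None:
    """Write to a temp file, then atomically rename over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(graph), f, indent=2)
    os.replace(tmp, path)


def resume(path: str) -> TaskGraph:
    """Rebuild the graph from the last checkpoint on disk."""
    with open(path) as f:
        data = json.load(f)
    steps = [Step(**s) for s in data.pop("steps")]
    return TaskGraph(steps=steps, **data)
```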

Example task graph for a type confusion bug:

{
  "task_id": "tc_001",
  "hypothesis": "Type confusion in WebGL texture handling",
  "steps": [
    {
      "step": 1,
      "action": "static_analysis",
      "target": "gfx/gl/GLContext.cpp",
      "status": "complete",
      "output": "Found unchecked cast at line 1847"
    },
    {
      "step": 2,
      "action": "fuzz",
      "input_grammar": "webgl_texture_commands.json",
      "status": "in_progress",
      "crashes": 3
    }
  ]
}

The agent can query this graph to decide whether to continue fuzzing or pivot to a different module.
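A pivot decision can be expressed as a query over that JSON. The rule below is a hypothetical heuristic, not Anthropic's. If the latest fuzz step has produced crashes, the agent moves on to root-cause correlation. If the step has run for `MAX_FUZZ_ITERATIONS_WITHOUT_CRASH` iterations without a crash, the agent pivots. Otherwise it keeps fuzzing. The threshold value is an assumption.

```python
import json

# The type-confusion task graph from above, reflowed.
EXAMPLE = json.loads("""
{
  "task_id": "tc_001",
  "hypothesis": "Type confusion in WebGL texture handling",
  "steps": [
    {"step": 1, "action": "static_analysis", "target": "gfx/gl/GLContext.cpp",
     "status": "complete", "output": "Found unchecked cast at line 1847"},
    {"step": 2, "action": "fuzz", "input_grammar": "webgl_texture_commands.json",
     "status": "in_progress", "crashes": 3}
  ]
}
""")

# Hypothetical threshold; the real pivot heuristic is not published.
MAX_FUZZ_ITERATIONS_WITHOUT_CRASH = 5


def decide(graph: dict, fuzz_iterations: int) -> str:
    """Return 'correlate_crashes', 'pivot', or 'continue' for the latest fuzz step."""
    fuzz_steps = [s for s in graph["steps"] if s["action"] == "fuzz"]
    if not fuzz_steps:
        return "continue"  # still in static analysis; nothing to judge yet
    latest = fuzz_steps[-1]
    if latest.get("crashes", 0) > 0:
        return "correlate_crashes"  # map crashes to source (workflow step 5)
    if fuzz_iterations >= MAX_FUZZ_ITERATIONS_WITHOUT_CRASH:
        return "pivot"  # fuzzing has stalled: try another module or hypothesis
    return "continue"


if __name__ == "__main__":
    print(decide(EXAMPLE, fuzz_iterations=2))
```

For the example graph, which already records three crashes, this prints `correlate_crashes`.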

Triage and Handoff Protocol

Not every crash is a security bug. The agent must filter out:

  • Known issues: Already patched or tracked in Bugzilla.
  • Benign assertions: Debug-only checks that do not affect release builds.
  • Duplicate root causes: Multiple crashes from the same underlying bug.

Anthropic’s system uses a deduplication layer that hashes stack traces and compares them to a database of known crashes. If the hash matches, the agent skips reporting. If the hash is novel, the agent generates a report with:

  • Minimal reproducer (HTML file, JavaScript snippet, or command-line invocation).
  • Root cause analysis (source file, line number, vulnerable code path).
  • Exploitability assessment (high/medium/low severity with justification).
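Stack-trace deduplication usually hashes a normalized prefix of the trace rather than the raw text, because addresses change from run to run. The sketch below works under that assumption. It strips hex addresses, skips sanitizer and allocator frames (the skip lists are hypothetical), and hashes the top five remaining frames with SHA-256. The frame format and the `top_n` default are illustrative. The article does not say how Anthropic normalizes traces.

```python
import hashlib
import re
from typing import Optional

# Hypothetical skip lists: frames that rarely identify the root cause.
NOISE_PREFIXES = ("__asan_", "__interceptor_")
NOISE_NAMES = {"malloc", "free", "moz_xmalloc"}
# Matches "0x55aa in " style prefixes as well as bare addresses.
ADDRESS = re.compile(r"0x[0-9a-fA-F]+(?:\s+in\s+)?")


def signature(frames: list[str], top_n: int = 5) -> str:
    """Hash the top N meaningful frames, ignoring per-run addresses."""
    cleaned = []
    for frame in frames:
        name = ADDRESS.sub("", frame).strip()
        if not name or name.startswith(NOISE_PREFIXES) or name in NOISE_NAMES:
            continue
        cleaned.append(name)
        if len(cleaned) == top_n:
            break
    return hashlib.sha256("\n".join(cleaned).encode()).hexdigest()


class CrashDeduplicator:
    def __init__(self, known: Optional[set[str]] = None):
        self.known = set(known or ())  # signatures of already-reported crashes

    def is_novel(self, frames: list[str]) -> bool:
        """True the first time a signature is seen; later duplicates return False."""
        sig = signature(frames)
        if sig in self.known:
            return False
        self.known.add(sig)
        return True
```

Hashing only the top frames trades precision for recall. Two distinct bugs that share a deep call prefix would collide, which is one reason human triage still sits downstream.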

The report goes to a human triager at Mozilla who verifies the finding, assigns a CVE, and schedules a patch. The handoff is asynchronous. The agent does not wait for human feedback before moving to the next task.
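The asynchronous handoff is a producer/consumer queue. The agent enqueues a report and moves on, and the triage side drains the queue at its own pace. Below is a minimal sketch with the standard-library `queue` and `threading` modules. `TriageHandoff` is a hypothetical name, and the consumer thread stands in for whatever actually files reports with Mozilla.

```python
import queue
import threading


class TriageHandoff:
    """submit() never waits on review; a background consumer drains reports."""

    def __init__(self):
        self.inbox: queue.Queue = queue.Queue()  # unbounded, so put() does not block
        self.received: list[dict] = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, report: dict) -> None:
        self.inbox.put(report)  # the agent returns to its next task immediately

    def _drain(self) -> None:
        while True:
            report = self.inbox.get()
            try:
                if report is None:  # sentinel: shut down
                    return
                self.received.append(report)  # a real system would file a bug here
            finally:
                self.inbox.task_done()

    def close(self) -> None:
        """Flush outstanding reports and stop the consumer."""
        self.inbox.put(None)
        self._worker.join()
```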

Observability and Failure Modes

Anthropic logs every tool call, reasoning step, and decision point. The logs include:

  • Prompt and completion: What the agent was asked to do and what it decided.
  • Tool invocations: Which static analyzer or fuzzer ran, with what arguments.
  • Intermediate state: Task graph snapshots, crash dumps, source diffs.
  • Token usage: Per-task and cumulative token counts.

This observability layer is critical for debugging false negatives (bugs the agent missed) and false positives (non-exploitable crashes reported as high-severity).
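Below is a sketch of such a log as append-only JSON lines, with per-task and cumulative token counters maintained alongside the events. The event kinds and field names are assumptions chosen to mirror the four categories above.

```python
import json
import time
from collections import defaultdict
from typing import Optional


class RunLog:
    """Append-only JSON-lines event log with token accounting (hypothetical schema)."""

    def __init__(self):
        self.lines: list[str] = []
        self.tokens_by_task: dict[str, int] = defaultdict(int)

    def record(self, task_id: str, kind: str, payload: dict, tokens: int = 0) -> None:
        """kind is e.g. 'completion', 'tool_call', or 'state_snapshot'."""
        self.tokens_by_task[task_id] += tokens
        event = {
            "ts": time.time(),
            "task_id": task_id,
            "kind": kind,
            "payload": payload,
            "tokens": tokens,
            "task_tokens": self.tokens_by_task[task_id],
            "cumulative_tokens": sum(self.tokens_by_task.values()),
        }
        self.lines.append(json.dumps(event, sort_keys=True))

    def events(self, kind: Optional[str] = None) -> list[dict]:
        """Parse the log back, optionally filtered by event kind."""
        parsed = [json.loads(line) for line in self.lines]
        return [e for e in parsed if kind is None or e["kind"] == kind]
```

Recording the running totals in each event means a post-mortem can see how much budget a task had consumed at the exact point it went wrong.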

Common failure modes:

| Failure mode | Symptom | Mitigation |
|---|---|---|
| Infinite loops in fuzzing | Agent generates inputs that never trigger new code paths | Token budget per subtask, forced pivot after N iterations |
| Overfitting to static patterns | Agent finds low-severity style violations instead of exploits | Prompt explicitly deprioritizes non-security findings |
| Crash triage errors | Agent misclassifies exploitability | Human review of all high-severity reports before CVE assignment |
| State explosion | Task graph grows too large to serialize | Prune completed subtasks after checkpoint, cap graph depth |
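The state-explosion mitigations in the last row can be sketched on a tree-shaped task graph. Completed subtrees, assumed to be already checkpointed, are dropped, and new children are refused past a depth cap. `Node`, `prune_completed`, `add_child` and the `MAX_DEPTH` value are hypothetical.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 4  # hypothetical cap on task-graph depth (root = depth 0)


@dataclass
class Node:
    name: str
    status: str = "pending"  # pending | in_progress | complete
    children: list["Node"] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(size(c) for c in node.children)


def add_child(parent: Node, child: Node, parent_depth: int) -> bool:
    """Attach child unless it would sit deeper than MAX_DEPTH."""
    if parent_depth + 1 > MAX_DEPTH:
        return False
    parent.children.append(child)
    return True


def prune_completed(node: Node) -> int:
    """Drop completed subtrees (already checkpointed); return nodes removed."""
    removed = sum(size(c) for c in node.children if c.status == "complete")
    node.children = [c for c in node.children if c.status != "complete"]
    for child in node.children:
        removed += prune_completed(child)
    return removed
```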

Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                   Claude Opus 4.6                       │
│              (Orchestrator, stateless)                  │
└────────────┬────────────────────────────────────────────┘
             │ Emits structured commands
             ▼
┌─────────────────────────────────────────────────────────┐
│                  Supervisor Process                     │
│  - Routes commands to sandboxed tools                   │
│  - Enforces token budgets and timeouts                  │
│  - Serializes task graph to persistent storage          │
└────────┬────────────────────────────────────────────────┘
         │
         ├──────────────┬──────────────┬──────────────┐
         ▼              ▼              ▼              ▼
    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
    │ Static  │    │ Fuzzer  │    │  Debug  │    │  Build  │
    │Analysis │    │Container│    │Container│    │Container│
    └─────────┘    └─────────┘    └─────────┘    └─────────┘
         │              │              │              │
         └──────────────┼──────────────┴──────────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │  Deduplication Layer  │
            │  (Hash stack traces)  │
            └───────────┬───────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │   Human Triager       │
            │   (Mozilla Security)  │
            └───────────────────────┘

Deployment Shape

Anthropic ran the red team on a cluster of 16 machines, each with 64 CPU cores and 256 GB RAM. Firefox builds and fuzzing runs consumed most of the compute. The LLM itself ran on separate GPU instances with 8x A100s.

Total cost per vulnerability discovered: Anthropic has not published this figure. A rough estimate based on assumed token usage (50M tokens per bug at $15/1M tokens for Opus 4.6) gives $750 in LLM costs, plus $200-500 in compute for fuzzing and builds. That is $950-1,250 per high-severity CVE, which is cheaper than a typical bug bounty payout ($3,000-10,000 for Firefox).
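The estimate is simple arithmetic, reproduced here so the assumptions are explicit. All inputs are the article's rough figures, not numbers published by Anthropic, and `cost_per_bug` is a hypothetical helper.

```python
def cost_per_bug(tokens_millions: float, usd_per_million: float, compute_usd: float) -> float:
    """LLM spend plus fuzzing/build compute for one finding."""
    return tokens_millions * usd_per_million + compute_usd


# Assumed inputs: 50M tokens per bug at $15 per 1M tokens, $200-500 of compute.
low = cost_per_bug(50, 15, 200)
high = cost_per_bug(50, 15, 500)
print(f"${low:,.0f}-${high:,.0f} per high-severity bug")  # $950-$1,250 per high-severity bug
```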

When This Works and When It Fails

Use this approach when:

  • You have a well-defined attack surface (browser, kernel, network stack).
  • You can sandbox tools and enforce resource limits.
  • You have humans available to triage findings within 24-48 hours.
  • You care more about finding novel bugs than reproducing known CVEs.

Avoid this approach when:

  • Your codebase lacks test coverage or build automation (the agent will spend all its time fixing build errors).
  • You cannot tolerate false positives (the agent will generate some, and human review is required).
  • You need real-time exploit detection (this is a batch process, not a runtime monitor).
  • Your security model assumes adversaries cannot access LLM-generated exploits (if the agent can find a bug, so can attackers with similar tools).

Technical Verdict

This is the first public case of an agentic system finding double-digit production CVEs in a tier-1 open-source project. The plumbing is more interesting than the model: task graphs, sandboxed tool execution, deduplication layers, and asynchronous handoff protocols. If you are building security automation, study the orchestration boundaries and failure modes. The LLM is the easy part. The supervisor that keeps it from wasting tokens on dead ends is the hard part.