Agent Security Is a Systems Problem: Why Isolation, Sandboxing, and Least Privilege Matter More Than Prompt Filters

Most agent security research focuses on making the LLM itself more robust: prompt injection filters, alignment techniques, adversarial training. A new arXiv paper from researchers at Microsoft, Wisconsin, and UCSD argues this is backwards. The model is the untrusted component. Security invariants must be enforced at the system level, not the model level.

The paper (arXiv:2605.18991) analyzes eleven real-world agent attacks and shows how systems principles (process isolation, capability boundaries, resource limits) could have prevented them. This is not a theoretical exercise. Production agent deployments already face these problems when one agent needs to read Slack messages but should not delete channels, or when a code-execution agent must run untrusted scripts without leaking credentials.

The Core Argument

Treat the LLM as you would treat any untrusted binary. You do not rely on a user-space process to police itself. You enforce boundaries at the OS level: file permissions, network policies, syscall filtering, resource quotas. The same logic applies to agents.

Key principles from the paper:

Least privilege: An agent gets exactly the capabilities it needs, nothing more. A summarization agent does not need write access to your database.
Isolation: Agents run in separate execution contexts. One compromised agent cannot pivot to another.
Fail-safe defaults: If a capability check fails, the operation is denied. No fallback to a permissive mode.
Complete mediation: Every tool call, API request, and file access goes through an authorization layer. No shortcuts.

These are not new ideas. They are the foundation of operating system security. The paper’s contribution is showing how to apply them to agent architectures where the “process” is an LLM inference loop and the “syscalls” are tool invocations.
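The syscall analogy can be made concrete with a small sketch. Everything named here (ToolGateway, register, invoke, the policy callback) is hypothetical and not taken from the paper; the only point is that tools are reachable through one mediated entry point that denies by default, including when the policy check itself fails.

```python
from typing import Any, Callable

# Hypothetical names: ToolGateway and its methods are illustrative only.
class ToolGateway:
    """Single choke point for tool invocations (the agent's "syscall" interface)."""

    def __init__(self, is_allowed: Callable[[str, str, dict], bool]):
        self._is_allowed = is_allowed  # policy check supplied by the orchestrator
        self._tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def invoke(self, agent_id: str, tool: str, params: dict) -> Any:
        # Fail-safe default: an error inside the policy check counts as a denial.
        try:
            allowed = self._is_allowed(agent_id, tool, params) is True
        except Exception:
            allowed = False
        if not allowed or tool not in self._tools:
            raise PermissionError(f"{agent_id} may not call {tool}")
        return self._tools[tool](**params)


# Usage: the summarizer may read documents but nothing else.
gateway = ToolGateway(lambda agent, tool, params: (agent, tool) == ("summarizer", "read_doc"))
gateway.register("read_doc", lambda doc_id: f"contents of {doc_id}")
gateway.register("delete_doc", lambda doc_id: None)
print(gateway.invoke("summarizer", "read_doc", {"doc_id": "42"}))  # prints "contents of 42"
```

Complete mediation holds only if agents have no other handle on the tool functions, which is a deployment property this sketch cannot enforce by itself.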

What This Looks Like in Practice

Capability Model for Tool Calls

An agent orchestrator maintains a capability map. Each agent gets a token that encodes its allowed tools and parameters. When the agent requests a tool call, the orchestrator checks the token before execution.

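class ToolPolicy:
    def __init__(self, read: bool, write: bool, scopes: list[str]):
        self.read = read
        self.write = write
        self.scopes = scopes  # e.g., ["channel:general", "user:self"]

# ToolPolicy is defined first so the annotation below resolves at definition time.
class AgentCapability:
    def __init__(self, agent_id: str, allowed_tools: dict[str, ToolPolicy]):
        self.agent_id = agent_id
        self.allowed_tools = allowed_tools  # tool_name -> ToolPolicy

# Operation names are illustrative; map your tools' real operations onto these sets.
READ_OPERATIONS = {"read", "list"}
WRITE_OPERATIONS = {"write", "update", "delete"}

def authorize_tool_call(capability: AgentCapability, tool_name: str, params: dict) -> bool:
    # Fail closed: tools that were never granted are denied
    policy = capability.allowed_tools.get(tool_name)
    if policy is None:
        return False

    # Check operation type; an unknown or missing operation is denied
    operation = params.get("operation")
    if operation in READ_OPERATIONS:
        permitted = policy.read
    elif operation in WRITE_OPERATIONS:
        permitted = policy.write
    else:
        permitted = False
    if not permitted:
        return False

    # Check scope boundaries; a missing scope is denied as well
    return params.get("scope") in policy.scopes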

This is a simplified example. Production systems need to handle dynamic scopes (e.g., “all channels the user has access to”), time-based policies, and audit logging. The point is that authorization happens outside the agent’s control.
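One way to approach the time-based policies mentioned above is to make the capability token itself expire and to sign it so the agent cannot alter it. The sketch below is an assumption about how that might look, not the paper's design: SECRET, issue_token, and verify_token are hypothetical names, and the token format (base64 payload plus HMAC-SHA256 signature) is chosen only for brevity.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: this key lives only in the orchestrator and is never shown to agents.
SECRET = b"orchestrator-only-secret"

def issue_token(agent_id: str, tools: list[str], ttl_s: int, now: Optional[float] = None) -> str:
    """Return a signed token listing the allowed tools and an expiry time."""
    now = time.time() if now is None else now
    claims = {"agent": agent_id, "tools": sorted(tools), "exp": now + ttl_s}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload).decode() + "." + base64.urlsafe_b64encode(sig).decode()

def verify_token(token: str, tool: str, now: Optional[float] = None) -> bool:
    """Accept only unexpired, untampered tokens that list the requested tool."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return False  # malformed token: fail closed
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False  # signature mismatch: payload was altered or forged
    claims = json.loads(payload)
    return now < claims["exp"] and tool in claims["tools"]


token = issue_token("slack-reader", ["slack.read"], ttl_s=60, now=1000.0)
print(verify_token(token, "slack.read", now=1010.0))    # True: listed and unexpired
print(verify_token(token, "slack.delete", now=1010.0))  # False: tool not granted
print(verify_token(token, "slack.read", now=2000.0))    # False: expired
```

Because verification needs only the secret and the clock, the check still happens entirely on the orchestrator side.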

Sandboxing Code Execution

Many agents need to run code: data analysis, file processing, API scripting. The naive approach is to call subprocess.run() and hope the LLM does not generate malicious commands. The systems approach is to run the code in a container with strict resource limits and no network access.

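# Kubernetes Pod manifest for agent code execution, sandboxed with gVisor
# Assumes the cluster defines a RuntimeClass named "gvisor" (handler: runsc)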
apiVersion: v1
kind: Pod
metadata:
  name: agent-code-executor
spec:
  runtimeClassName: gvisor
  containers:
  - name: executor
    image: python:3.11-slim
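    securityContext:
      runAsNonRoot: true
      runAsUser: 1000  # python:3.11-slim defaults to root, which runAsNonRoot rejects
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]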
    resources:
      limits:
        cpu: "500m"
        memory: "256Mi"
        ephemeral-storage: "100Mi"
      requests:
        cpu: "100m"
        memory: "128Mi"
    volumeMounts:
    - name: workspace
      mountPath: /workspace
      readOnly: false
  volumes:
  - name: workspace
    emptyDir:
      sizeLimit: 50Mi

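The container has no persistent storage and limited CPU and memory, so a runaway workload such as a cryptocurrency miner is throttled. The manifest by itself does not remove network access, however: Pods can reach the network by default, so blocking exfiltration also requires a deny-all egress NetworkPolicy (or an equivalent CNI-level rule) that selects this Pod. The orchestrator can also enforce execution time limits and kill runaway processes.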
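The execution time limit can be sketched at the process level. The helper below (run_untrusted is a hypothetical name) uses only the standard library: a hard wall-clock timeout plus an empty environment, so the child inherits no credentials. It is an illustration of the orchestrator-side kill switch and assumes a POSIX-like host; it is not a replacement for the container sandbox.

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Run a Python snippet in a child process with a hard wall-clock limit.

    Returns (finished_ok, output). Process-level hygiene only; the real
    isolation boundary is still the sandboxed container.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* vars and user site
            env={},               # empty environment: no inherited API keys or tokens
            capture_output=True,
            text=True,
            timeout=timeout_s,    # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return False, "killed: exceeded time limit"
    return result.returncode == 0, result.stdout


print(run_untrusted("print(1 + 1)"))                     # (True, '2\n')
print(run_untrusted("while True: pass", timeout_s=0.5))  # (False, 'killed: exceeded time limit')
```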

Resource Quotas Per Agent

In a multi-tenant system, one agent should not consume all available API quota or memory. The orchestrator tracks resource usage and enforces limits.

Resource Type	Limit Mechanism	Failure Mode
API calls	Token bucket per agent	429 response, backoff
Memory	cgroup limits	OOM kill, restart
CPU time	cgroup CPU quota	Throttling, timeout
Disk I/O	cgroup blkio (v1) or io (v2) controller	Slow writes, quota exceeded
Network egress	iptables rate limit	Dropped packets, retry

When an agent exceeds its quota, the system logs the event and either throttles or terminates the agent. This prevents one misbehaving agent from degrading the entire system.
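The "token bucket per agent" row can be illustrated with a minimal in-memory sketch. TokenBucket and QuotaTracker are hypothetical names, and the injectable clock exists only to make the behavior easy to demonstrate; a production tracker would need shared state and persistence.

```python
import time
from typing import Callable

class TokenBucket:
    """Refills at rate_per_s tokens per second, up to capacity."""

    def __init__(self, rate_per_s: float, capacity: float, clock: Callable[[], float]):
        self.rate = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should answer with a 429 and let the agent back off

class QuotaTracker:
    """One bucket per agent, so a noisy agent cannot drain another's quota."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate, self._capacity, self._clock = rate_per_s, capacity, clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, agent_id: str) -> bool:
        if agent_id not in self._buckets:
            self._buckets[agent_id] = TokenBucket(self._rate, self._capacity, self._clock)
        return self._buckets[agent_id].try_acquire()


fake_now = [0.0]  # controllable clock for the demonstration
tracker = QuotaTracker(rate_per_s=1.0, capacity=2, clock=lambda: fake_now[0])
print([tracker.allow("agent-a") for _ in range(3)])  # [True, True, False]
print(tracker.allow("agent-b"))                      # True: separate bucket
fake_now[0] = 1.0
print(tracker.allow("agent-a"))                      # True: one token refilled
```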

Observability and Boundary Detection

How do you tell the difference between an agent legitimately using a tool and an agent exceeding its security boundary? The paper does not provide a full answer, but the primitives are clear:

Audit logs: Every tool call, with parameters, timestamp, and agent ID. Stored in an append-only log.
Anomaly detection: Baseline normal behavior (e.g., “this agent calls the Slack API 10 times per hour”) and flag deviations.
Capability violations: Any denied tool call is a potential security event. Log it, alert on repeated failures.
Resource usage metrics: Track CPU, memory, API calls per agent. Spike detection catches runaway loops or exfiltration attempts.

The key is that these signals are generated by the orchestrator, not the agent. The agent cannot suppress its own audit logs or reset its resource counters.
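As a sketch of an orchestrator-side audit log, the following keeps entries in a hash chain so that later edits are detectable. The hash chain is an added assumption (the text only asks for an append-only log), AuditLog and its methods are hypothetical names, and params are assumed to be JSON-serializable.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(record: dict) -> str:
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

class AuditLog:
    """In-memory, hash-chained log of tool calls, owned by the orchestrator."""

    def __init__(self) -> None:
        self._entries: list[dict] = []
        self._head = GENESIS

    def append(self, agent_id: str, tool: str, params: dict, allowed: bool, ts: float) -> None:
        record = {"agent": agent_id, "tool": tool, "params": params,
                  "allowed": allowed, "ts": ts, "prev": self._head}
        self._head = _digest(record)
        self._entries.append({**record, "hash": self._head})

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            record = {k: v for k, v in entry.items() if k != "hash"}
            if record["prev"] != prev or _digest(record) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def denied_count(self, agent_id: str) -> int:
        """Denied calls are potential security events; alert when this climbs."""
        return sum(1 for e in self._entries if e["agent"] == agent_id and not e["allowed"])


log = AuditLog()
log.append("agent-1", "slack.read", {"scope": "channel:general"}, True, 1.0)
log.append("agent-1", "slack.delete", {"scope": "channel:general"}, False, 2.0)
print(log.verify(), log.denied_count("agent-1"))  # True 1
```

An in-memory list is of course not durable; the chain only makes tampering evident once the entries are shipped to storage the agent cannot reach.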

Trade-Offs and Implementation Challenges

Functionality vs. Security

Strict sandboxing breaks some agent use cases. If an agent needs to call arbitrary APIs, you cannot allowlist every possible endpoint. If it needs to process user-uploaded files, you cannot predict the file format. The solution is layered defenses:

Start with the most restrictive sandbox that still allows the core functionality.
Add capability escalation paths that require human approval.
Use runtime monitoring to detect unexpected behavior even within allowed capabilities.
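A capability escalation path with human approval might look like the sketch below. EscalationQueue and its methods are hypothetical; the design choice illustrated is that a request outside the agent's grants is denied immediately and queued, and only an explicit, attributed approval changes the outcome of later checks.

```python
from dataclasses import dataclass, field

@dataclass
class EscalationQueue:
    granted: dict = field(default_factory=dict)  # (agent_id, tool) -> approver name
    pending: list = field(default_factory=list)  # requests awaiting human review

    def check(self, agent_id: str, tool: str) -> bool:
        """Allow only previously approved pairs; queue everything else and deny."""
        key = (agent_id, tool)
        if key in self.granted:
            return True
        if key not in self.pending:
            self.pending.append(key)
        return False  # denied until a human approves

    def approve(self, agent_id: str, tool: str, approver: str) -> None:
        """Record a human decision; only pending requests can be approved."""
        key = (agent_id, tool)
        if not approver or key not in self.pending:
            raise ValueError("approval needs a named approver and a pending request")
        self.pending.remove(key)
        self.granted[key] = approver


queue = EscalationQueue()
print(queue.check("report-bot", "db.write"))  # False: queued for review
queue.approve("report-bot", "db.write", approver="alice")
print(queue.check("report-bot", "db.write"))  # True: approved by a human
```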

Performance Overhead

Process isolation, capability checks, and resource accounting add latency. The paper does not quantify this. As a rough order of magnitude rather than a benchmark, budget 10-50ms per tool call when the authorization check involves a call to a separate policy service, and 100-500ms for a cold container start; measure both in your own environment. For long-running agents, this is acceptable. For high-frequency tool calls, you need to batch operations or use lighter-weight isolation (e.g., a process confined by a seccomp-bpf filter instead of a full container).

Complexity of Multi-Agent Systems

When agents call other agents, capability propagation becomes tricky. If Agent A delegates a task to Agent B, does B inherit A’s capabilities? Does it get a subset? The paper suggests explicit delegation tokens, but this requires careful design to avoid privilege escalation.
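One conservative answer to the inheritance question is attenuation: a delegated capability is the intersection of what the parent holds and what the child asks for, with a bounded delegation depth. The sketch below is an assumption about how explicit delegation tokens could behave, not the paper's scheme; Capability, delegate, and depth_left are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    holder: str
    tools: frozenset[str]
    depth_left: int  # how many more times this capability may be delegated

def delegate(parent: Capability, to_agent: str, requested: set[str]) -> Capability:
    """Derive a child capability that can never exceed the parent's."""
    if parent.depth_left <= 0:
        raise PermissionError(f"{parent.holder} may not delegate further")
    granted = parent.tools & frozenset(requested)  # intersection: subset of the parent
    return Capability(holder=to_agent, tools=granted, depth_left=parent.depth_left - 1)


agent_a = Capability("agent-a", frozenset({"slack.read", "db.read"}), depth_left=1)
agent_b = delegate(agent_a, "agent-b", {"db.read", "db.write"})
print(sorted(agent_b.tools))  # ['db.read']: db.write was never A's to give
```

Freezing the dataclass keeps a holder from mutating its own grants after issuance; the depth counter stops long delegation chains from obscuring who authorized what.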

Real-World Attack Analysis

The paper examines eleven attacks, including:

Prompt injection leading to data exfiltration: Agent tricked into calling an API with attacker-controlled parameters. Systems defense: capability check would have blocked the unauthorized API endpoint.
Resource exhaustion via infinite loops: Agent generates code that runs forever. Systems defense: CPU quota and execution timeout.
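Credential leakage through tool misuse: Agent reads environment variables and sends them to an external service. Systems defense: no network access from the execution sandbox, and no credentials injected into its environment in the first place. A read-only filesystem limits persistence but does not by itself stop reads of environment variables.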

In every case, the vulnerability existed because the system trusted the LLM to make correct decisions. The fix is to remove that trust and enforce boundaries externally.

Architecture Example: Multi-Agent Orchestrator with Capability Enforcement

┌─────────────────────────────────────────────────────────┐
│ Orchestrator (Trusted Computing Base)                   │
│                                                          │
│  ┌──────────────┐      ┌──────────────┐                │
│  │ Capability   │      │ Resource     │                │
│  │ Manager      │      │ Quota        │                │
│  │              │      │ Tracker      │                │
│  └──────────────┘      └──────────────┘                │
│         │                      │                        │
│         ▼                      ▼                        │
│  ┌─────────────────────────────────────┐               │
│  │ Tool Call Authorization Layer       │               │
│  └─────────────────────────────────────┘               │
│         │                                               │
└─────────┼───────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────┐
│ Agent Execution Environments (Untrusted)                │
│                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │ Agent 1      │  │ Agent 2      │  │ Agent 3      │ │
│  │ (gVisor)     │  │ (gVisor)     │  │ (gVisor)     │ │
│  │              │  │              │  │              │ │
│  │ LLM Inference│  │ LLM Inference│  │ LLM Inference│ │
│  │ Tool Calls   │  │ Tool Calls   │  │ Tool Calls   │ │
│  └──────────────┘  └──────────────┘  └──────────────┘ │
│                                                          │
└─────────────────────────────────────────────────────────┘

The orchestrator is the only trusted component. It holds the capability map, enforces resource limits, and mediates all tool calls. Agents run in isolated containers and cannot bypass the authorization layer.

Technical Verdict

Use this approach when:

You are deploying agents in production with access to sensitive data or critical systems.
You have multiple agents with different trust levels (e.g., user-facing agents vs. internal automation).
You need to enforce compliance requirements (SOC 2, GDPR) that demand audit logs and access controls.
You expect adversarial inputs (user-generated prompts, untrusted data sources).
Your infrastructure already supports container orchestration (Kubernetes, Docker, or equivalent).

Avoid this approach when:

You are prototyping in a sandboxed environment with no production data and no external tool access.
Your agents have read-only access to non-sensitive information and no ability to modify state.
Per-tool-call latency overhead exceeding 50ms breaks your use case (e.g., real-time chat interfaces requiring sub-100ms response times).
You lack infrastructure for container orchestration or cannot justify the operational complexity of running gVisor, seccomp-bpf, or equivalent sandboxing technology.
You have a single-agent system with no delegation, no tool calling, and no access to external APIs or filesystems.
Your team does not have systems engineering expertise to maintain capability managers, resource quotas, and audit pipelines.

The paper’s core insight is correct: you cannot secure agents by making the LLM smarter. You secure them by treating the LLM as untrusted and enforcing invariants at the system level. This requires infrastructure investment (orchestrators, capability managers, sandboxes), but it is the only path to predictable security guarantees in production.