
LCGuard: Why Sharing Transformer KV Caches Between Agents Is a Security Nightmare

Latent communication through shared KV caches speeds up multi-agent systems but creates a new attack surface for cache poisoning and data leakage.

Source: arxiv.org

Multi-agent LLM systems are moving away from natural language communication between agents. Instead, they share transformer key-value (KV) caches directly. The performance win is real: lower latency, fewer tokens, richer context preservation. The security problem is also real: those caches encode everything the agent saw, reasoned about, and decided not to say out loud.

LCGuard is a defense framework that treats shared KV caches as untrusted memory. It learns representation-level transformations to strip sensitive information before one agent hands cache artifacts to another. The paper formalizes a threat model where adversarial agents reconstruct private inputs from shared caches, then trains a guard to prevent that reconstruction while keeping task-relevant semantics intact.

This is not a theoretical exercise. Production multi-agent systems already optimize coordination by skipping the natural language bottleneck. The security implications are just catching up.

Why Agents Share KV Caches

Traditional multi-agent communication looks like this:

  1. Agent A generates natural language output
  2. Agent B receives that text as input
  3. Agent B runs full inference from scratch

KV cache sharing removes the text round trip in steps 1 and 2 and most of the re-encoding in step 3. Agent A passes its internal key-value cache directly to Agent B. Agent B continues inference from Agent A’s stopping point without re-encoding the entire context. The token cost drops. Latency improves. Context window pressure eases.

The tradeoff: Agent B now has access to Agent A’s entire reasoning trace, not just the sanitized output Agent A chose to emit. If Agent A processed a user’s API key, medical record, or proprietary data, that information lives in the cache. Agent B can extract it.
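To make the handoff concrete, here is a minimal single-head attention sketch in PyTorch. It is a toy under stated assumptions, not the mechanism of any particular framework: the width `d`, the random weight matrices, and the `attend` helper are all hypothetical. It shows that Agent B gets the same attention output from the shared cache as it would from re-encoding Agent A's context, and also that the cache holds one row per token Agent A saw.

```python
import torch

torch.manual_seed(0)
d = 16                                    # hypothetical model width
W_q, W_k, W_v = (torch.randn(d, d, dtype=torch.float64) for _ in range(3))

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query position."""
    scores = (keys @ query) / d ** 0.5    # shape: (seq_len,)
    return torch.softmax(scores, dim=0) @ values

# Agent A encodes its context once and keeps the per-token keys and values.
context_a = torch.randn(10, d, dtype=torch.float64)   # 10 tokens seen only by Agent A
kv_cache = {"k": context_a @ W_k, "v": context_a @ W_v}

# Agent B receives the cache and only processes its own new token.
new_token = torch.randn(d, dtype=torch.float64)
keys = torch.cat([kv_cache["k"], (new_token @ W_k).unsqueeze(0)])
values = torch.cat([kv_cache["v"], (new_token @ W_v).unsqueeze(0)])
out_shared = attend(new_token @ W_q, keys, values)

# Baseline: Agent B re-encodes Agent A's full context from scratch.
full = torch.cat([context_a, new_token.unsqueeze(0)])
out_recomputed = attend(new_token @ W_q, full @ W_k, full @ W_v)
```

The two outputs match, which is the performance argument. The security argument is the `kv_cache` dictionary itself: it carries a representation of every token in `context_a`, whether or not Agent A ever intended to reveal it.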

The Attack Surface

KV caches are not designed to be security boundaries. They store:

  • Key and value projections: Per-layer representations derived from every token the agent consumed
  • Intermediate reasoning states: Partial completions, rejected drafts, internal monologue
  • Agent-specific context: System prompts, tool call results, private instructions

A malicious or compromised agent receiving a shared cache can:

  • Reconstruct sensitive inputs by training a decoder on the cache representations
  • Inject hidden instructions by poisoning cache entries before passing them along
  • Override task goals by manipulating attention patterns in the shared cache

The paper formalizes this as a reconstruction attack: a cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it with high fidelity.
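A toy version of that reconstruction attack can be sketched in a few lines. This is an illustration only, assuming the "cache" is a fixed linear map of the private inputs and the adversary is a least-squares decoder; real caches are non-linear functions of the input across many layers, and the paper's adversary is a learned decoder. The names `make_cache` and `W_k` are hypothetical.

```python
import torch

torch.manual_seed(0)
d = 32
W_k = torch.randn(d, d, dtype=torch.float64)   # stand-in key projection

def make_cache(inputs):
    """Unprotected toy cache: a fixed linear map of the private inputs."""
    return inputs @ W_k

# The adversary collects (cache, input) pairs it is allowed to observe ...
train_inputs = torch.randn(500, d, dtype=torch.float64)
decoder = torch.linalg.lstsq(make_cache(train_inputs), train_inputs).solution

# ... then applies the fitted decoder to a cache built from unseen private inputs.
private_inputs = torch.randn(50, d, dtype=torch.float64)
recovered = make_cache(private_inputs) @ decoder
reconstruction_mse = torch.mean((recovered - private_inputs) ** 2).item()
```

In this linear toy the decoder recovers the private inputs almost exactly, so the cache would count as unsafe under the definition above. A guard succeeds to the extent that it pushes this reconstruction error up without hurting the downstream task.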

LCGuard Architecture

LCGuard sits between agents as a transformation layer. Before Agent A hands its KV cache to Agent B, LCGuard applies a learned transformation that:

  1. Preserves task-relevant semantics (so Agent B can still do useful work)
  2. Reduces reconstructable sensitive information (so an adversarial Agent B cannot recover private inputs)

The training setup is adversarial:

  • Adversary: Learns to reconstruct sensitive inputs from transformed caches
  • Guard: Learns transformations that minimize reconstruction accuracy while maintaining task performance

The guard does not sanitize text. It operates on the latent representations themselves, applying transformations in the embedding space before cache artifacts leave Agent A’s boundary.
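The adversarial setup can be sketched as an alternating training loop. This is a minimal sketch on synthetic data, not the paper's training recipe: the linear `guard`, `task_head`, and `adversary`, the weight `lam`, and the clamp at chance-level error are all assumptions made to keep the example small and the objective bounded.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim, n = 16, 256
# Toy data (assumption): each cache row mixes a task signal and a sensitive signal.
task_sig = torch.randn(n, 4)
secret = torch.randn(n, 4)
cache = torch.cat([task_sig, secret], dim=1) @ torch.randn(8, dim)

guard = nn.Linear(dim, dim, bias=False)   # learned cache transformation
task_head = nn.Linear(dim, 4)             # stand-in for Agent B's downstream task
adversary = nn.Linear(dim, 4)             # tries to recover the sensitive signal
opt_guard = torch.optim.Adam(
    list(guard.parameters()) + list(task_head.parameters()), lr=1e-2)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-2)
lam = 1.0                                 # privacy/utility weight (assumption)
chance = secret.var().item()              # error of an adversary that learns nothing

for step in range(200):
    # Adversary step: minimize reconstruction error on the frozen guarded cache.
    adv_loss = F.mse_loss(adversary(guard(cache).detach()), secret)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # Guard step: keep the task solvable while pushing the adversary toward chance.
    guarded = guard(cache)
    task_loss = F.mse_loss(task_head(guarded), task_sig)
    leak_loss = F.mse_loss(adversary(guarded), secret)
    guard_loss = task_loss - lam * torch.clamp(leak_loss, max=chance)
    opt_guard.zero_grad()
    guard_loss.backward()
    opt_guard.step()
```

Clamping the leakage term is a design choice of this sketch: without a bound, the guard can lower its loss indefinitely by inflating the adversary's error instead of learning a useful transformation.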

Defense Mechanisms

LCGuard uses three core techniques:

Mechanism | Purpose | Tradeoff
Representation masking | Zero out cache entries corresponding to sensitive tokens | Loses fine-grained context for downstream tasks
Noise injection | Add calibrated noise to cache embeddings | Degrades task accuracy if noise budget is too high
Subspace projection | Project caches onto task-relevant subspace, discard orthogonal components | Requires labeled task data to learn projection

The paper shows that subspace projection offers the best balance: it preserves task semantics while making reconstruction attacks fail. The guard learns which dimensions of the cache embedding space carry task-relevant information and which dimensions leak sensitive inputs.
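The three mechanisms are easy to compare side by side on a toy cache. The functions below are a hedged sketch, not the paper's implementation: the names `mask_sensitive`, `add_noise`, and `project` are hypothetical, and the "task subspace" is just a random orthonormal basis from a QR decomposition rather than a learned one.

```python
import torch

torch.manual_seed(0)

def mask_sensitive(cache, sensitive_mask):
    """Representation masking: zero the cache rows of sensitive token positions."""
    return cache * (~sensitive_mask).unsqueeze(-1)

def add_noise(cache, sigma):
    """Noise injection: add Gaussian noise with standard deviation sigma."""
    return cache + sigma * torch.randn_like(cache)

def project(cache, basis):
    """Subspace projection: keep only the component inside span(basis).

    basis: (dim, k) with orthonormal columns, so basis @ basis.T is a projector.
    """
    return cache @ basis @ basis.T

seq_len, dim, k = 6, 8, 3
cache = torch.randn(seq_len, dim, dtype=torch.float64)
sensitive_mask = torch.tensor([False, True, True, False, False, False])
# Hypothetical task subspace: a random orthonormal basis, standing in for a learned one.
basis, _ = torch.linalg.qr(torch.randn(dim, k, dtype=torch.float64))

masked = mask_sensitive(cache, sensitive_mask)
noisy = add_noise(cache, sigma=0.1)
projected = project(cache, basis)
```

Masking removes whole token positions, noise perturbs every entry, and projection removes whole directions. Only the last one can, in principle, drop a leaking direction while leaving every token's task-relevant component untouched, which is consistent with the balance the paper reports.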

Implementation Shape

A typical deployment looks like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LCGuard(nn.Module):
    def __init__(self, model_dim, task_subspace_dim, adversary):
        super().__init__()
        # Learned projection matrix: task-relevant subspace.
        # model_dim must equal num_heads * head_dim of the caches being guarded.
        self.projection = nn.Linear(model_dim, task_subspace_dim, bias=False)
        # Adversarial decoder: tries to reconstruct sensitive inputs.
        # The source elides its configuration (nn.TransformerDecoder(...)),
        # so it is passed in here as any nn.Module mapping caches to inputs.
        self.adversary = adversary

    def transform_cache(self, kv_cache, sensitive_mask):
        """
        Apply learned transformation to KV cache before sharing.

        kv_cache: (batch, seq_len, num_heads, head_dim)
        sensitive_mask: (batch, seq_len) boolean mask marking sensitive tokens
            (not used by this projection-only sketch)
        """
        batch, seq_len = kv_cache.shape[:2]

        # Flatten head dimensions: (batch, seq_len, num_heads * head_dim)
        flat_cache = kv_cache.reshape(batch, seq_len, -1)

        # Project onto task-relevant subspace: (batch, seq_len, task_subspace_dim)
        projected = self.projection(flat_cache)

        # Map back to the original space (lossy). weight has shape
        # (task_subspace_dim, model_dim), so right-multiply by it.
        reconstructed = projected @ self.projection.weight

        return reconstructed.reshape(kv_cache.shape)

    def adversarial_loss(self, transformed_cache, sensitive_inputs):
        """
        Adversary's reconstruction error on the transformed cache.
        The adversary is trained to minimize this value; the guard is trained
        to keep it high, i.e. to minimize the adversary's reconstruction accuracy.
        """
        reconstructed = self.adversary(transformed_cache)
        return F.mse_loss(reconstructed, sensitive_inputs)
The guard is trained offline on representative multi-agent tasks with labeled sensitive inputs. At inference time, it runs as a stateless transformation between agents.

Observability Gaps

The paper does not address:

  • Runtime detection. How do you know if an agent is attempting reconstruction attacks in production?
  • Cache versioning. What happens when agents use different model versions with incompatible cache formats?
  • Composition. If Agent B shares its cache with Agent C, does the guard need to run again, or is the transformation transitive?

You need separate instrumentation to log cache transformations, measure reconstruction attempts, and alert on anomalies. The guard itself is blind to adversarial behavior after the cache leaves its boundary.
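A minimal sketch of that instrumentation, assuming caches are available as flattened lists of floats at the logging point. The `TransformRecord` fields, the norm-ratio signal, and the thresholds in `is_anomalous` are hypothetical choices for illustration; they log transformations and flag crude anomalies but do not detect reconstruction attempts on their own.

```python
import hashlib
import math
import time
from dataclasses import dataclass, asdict

@dataclass
class TransformRecord:
    sender: str
    receiver: str
    input_digest: str    # SHA-256 of the raw cache values (never the cache itself)
    output_digest: str   # SHA-256 of the guarded cache values
    norm_ratio: float    # ||guarded|| / ||raw||, a crude drift signal
    timestamp: float

def _digest(values):
    return hashlib.sha256(repr(values).encode()).hexdigest()

def _norm(values):
    return math.sqrt(sum(v * v for v in values))

def audit_transform(sender, receiver, raw, guarded, log):
    """Append one audit record for a guarded cache handoff and return it."""
    ratio = _norm(guarded) / max(_norm(raw), 1e-12)
    record = TransformRecord(sender, receiver, _digest(raw), _digest(guarded),
                             ratio, time.time())
    log.append(asdict(record))
    return record

def is_anomalous(record, low=0.2, high=1.5):
    """Flag handoffs whose norm ratio leaves a (hypothetical) expected band."""
    return not (low <= record.norm_ratio <= high)

audit_log = []
raw_cache = [0.5, -1.0, 2.0, 0.25]       # flattened toy cache
guarded_cache = [0.4, -0.9, 0.0, 0.2]    # pretend output of the guard
rec = audit_transform("agent_a", "agent_b", raw_cache, guarded_cache, audit_log)
```

Logging digests rather than cache contents keeps the audit trail from becoming a second copy of the sensitive data.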

Performance Cost

The paper reports:

  • Latency overhead: 8-12% per cache transformation (subspace projection + reshape)
  • Task accuracy: 2-5% degradation on multi-agent benchmarks when guard is active
  • Memory: Additional projection matrix storage (model_dim × task_subspace_dim parameters)
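
To put those figures in concrete terms, the arithmetic can be sketched as below. The sizes (4096-wide flattened cache rows, a 1024-dimensional task subspace, 2-byte fp16 parameters) and the 200 ms baseline handoff are hypothetical, and the sketch assumes the reported overhead is relative to the unguarded handoff latency.

```python
def projection_memory_bytes(model_dim, task_subspace_dim, bytes_per_param=2):
    """Storage for one (model_dim x task_subspace_dim) projection matrix."""
    return model_dim * task_subspace_dim * bytes_per_param

def guarded_latency_ms(baseline_ms, overhead_fraction):
    """Handoff latency after adding a fractional guard overhead."""
    return baseline_ms * (1.0 + overhead_fraction)

# Hypothetical sizes: 4096-wide rows, 1024-dim task subspace, fp16 parameters.
mem = projection_memory_bytes(4096, 1024)   # 8,388,608 bytes
mem_mib = mem / 2**20                       # 8.0 MiB

# Hypothetical 200 ms baseline handoff with the reported 8-12% overhead.
low = guarded_latency_ms(200.0, 0.08)
high = guarded_latency_ms(200.0, 0.12)
```

Under these assumptions a single projection matrix costs about 8 MiB and the handoff moves from 200 ms to roughly 216-224 ms. If a deployment keeps separate projections per layer, or for keys and values, the memory figure scales accordingly.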

The tradeoff depends on your threat model. If agents are all trusted and running in the same security boundary, the overhead may not be worth it. If agents span trust boundaries (user-controlled agents, third-party plugins, untrusted models), the guard becomes necessary infrastructure.

Failure Modes

LCGuard assumes:

  1. Sensitive inputs are labeled. You need ground truth about which tokens are sensitive to train the guard.
  2. Task semantics are stable. The learned projection assumes task-relevant information does not shift over time.
  3. Adversary is known. The guard trains against a specific adversarial decoder architecture.

In practice:

  • Sensitive information is often contextual (a phone number is sensitive in a medical record, not in a public directory)
  • Task semantics drift as agents learn new capabilities or switch domains
  • Real adversaries will adapt their reconstruction techniques to bypass the guard

You need continuous retraining and adversarial red-teaming to keep the guard effective.
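
One way to turn red-teaming into a retraining trigger is to fit a fresh adversary against the deployed guard's output on a schedule and compare it with an adversary that learns nothing. The sketch below is an assumption-laden toy: the least-squares adversary, the synthetic `leaky` and `scrubbed` caches, and the `margin` threshold are all hypothetical, and a real check would use stronger adversaries than a linear decoder.

```python
import torch

torch.manual_seed(0)

def red_team_mse(guarded_cache, secrets, train_frac=0.8):
    """Fit a fresh linear adversary on one split and score it on the held-out rest."""
    n_train = int(train_frac * guarded_cache.shape[0])
    decoder = torch.linalg.lstsq(guarded_cache[:n_train], secrets[:n_train]).solution
    pred = guarded_cache[n_train:] @ decoder
    return torch.mean((pred - secrets[n_train:]) ** 2).item()

def baseline_mse(secrets, train_frac=0.8):
    """Error of an adversary that ignores the cache and predicts the training mean."""
    n_train = int(train_frac * secrets.shape[0])
    mean = secrets[:n_train].mean(dim=0)
    return torch.mean((secrets[n_train:] - mean) ** 2).item()

def needs_retraining(adv_mse, base_mse, margin=0.9):
    """Flag the guard once a fresh adversary clearly beats the no-information baseline."""
    return adv_mse < margin * base_mse

n = 400
secrets = torch.randn(n, 4, dtype=torch.float64)
other = torch.randn(n, 4, dtype=torch.float64)
mix = torch.randn(8, 8, dtype=torch.float64)
leaky = torch.cat([other, secrets], dim=1) @ mix       # secret still encoded
scrubbed = torch.cat([other, torch.randn(n, 4, dtype=torch.float64)], dim=1) @ mix

base = baseline_mse(secrets)
leaky_flag = needs_retraining(red_team_mse(leaky, secrets), base)
scrubbed_flag = needs_retraining(red_team_mse(scrubbed, secrets), base)
```

The leaky cache trips the flag and the scrubbed one does not. Passing this check only shows that one adversary family fails, so it is a floor for red-teaming, not a guarantee.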

Deployment Patterns

Pattern 1: Centralized Guard

All cache sharing goes through a single guard service. Agents call the guard API before handing caches to other agents. The guard logs all transformations for audit.

Pros: Centralized policy enforcement, easier to update guard model

Cons: Single point of failure, latency bottleneck, guard becomes a high-value target

Pattern 2: Agent-Local Guards

Each agent runs its own guard instance. Before sharing a cache, the agent applies the transformation locally.

Pros: No central bottleneck, guard failure is isolated

Cons: Harder to enforce uniform policy, agents can skip the guard if compromised

Pattern 3: Trusted Execution Environment

Run the guard inside a TEE (Trusted Execution Environment). Agents cannot bypass the guard or inspect its internals.

Pros: Strongest security boundary, guard logic is tamper-proof

Cons: TEE overhead, limited model size, complex key management
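
Whichever pattern you pick, the receiving side needs a way to tell a guarded cache from a raw one, otherwise a compromised sender can simply skip the guard. A minimal sketch of one option, assuming caches are serialized to bytes: the guard tags its output with an HMAC and the receiver's runtime rejects anything without a valid tag. The key, the function names, and the toy `toy_transform` are hypothetical. Because HMAC is symmetric, the verifying component must itself be trusted with the key; a production design would more likely use asymmetric signatures.

```python
import hashlib
import hmac

GUARD_KEY = b"demo-key-held-only-by-the-guard"   # hypothetical; load from a secret store

def guard_and_tag(cache_bytes, transform):
    """Run the guard transformation and attach an HMAC tag over the result."""
    guarded = transform(cache_bytes)
    tag = hmac.new(GUARD_KEY, guarded, hashlib.sha256).hexdigest()
    return guarded, tag

def accept_cache(guarded, tag, key=GUARD_KEY):
    """Receiver-side check: only accept caches carrying a valid guard tag."""
    expected = hmac.new(key, guarded, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

def toy_transform(cache_bytes):
    """Stand-in for the learned guard: drops the low 4 bits of every byte."""
    return bytes(b & 0xF0 for b in cache_bytes)

raw = b"raw cache bytes"
guarded, tag = guard_and_tag(raw, toy_transform)
ok = accept_cache(guarded, tag)           # guarded cache with its own tag
bypass = accept_cache(raw, tag)           # sender tries to ship the raw cache
```

The tag proves that the bytes passed through the guard, not that the guard's transformation was effective, so it complements rather than replaces the adversarial evaluation.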

When to Use KV Cache Sharing

You should consider KV cache sharing when:

  • Agents coordinate on long-context tasks where re-encoding is expensive
  • Latency requirements are tight (sub-second agent-to-agent handoff)
  • Agents are stateful and need to preserve reasoning traces across turns

You should avoid it when:

  • Agents span trust boundaries (user-controlled, third-party, untrusted models)
  • Sensitive data flows through the system without clear isolation
  • You cannot afford the 2-5% task accuracy degradation from guard transformations

Technical Verdict

LCGuard exposes a real vulnerability in multi-agent systems that optimize for performance by sharing internal state. The defense is practical but not free: you pay in latency, accuracy, and operational complexity.

Use KV cache sharing with LCGuard if:

  • Multi-agent coordination latency exceeds 500ms without cache sharing
  • All agents belong to the same organization with shared security policies
  • Data classification is internal or confidential (not regulated PII, PHI, or financial data)
  • You can tolerate 8-12% latency overhead and 2-5% task accuracy loss
  • You have labeled training data for sensitive content in your domain

Avoid KV cache sharing if:

  • Agents cross organizational boundaries or involve third-party models
  • System handles regulated data (GDPR, HIPAA, PCI-DSS scope)
  • SLA requires sub-100ms agent handoff (natural language may be faster than guard transformation)
  • Agents are user-controlled or untrusted (plugins, custom tools, external APIs)
  • You lack infrastructure for continuous guard retraining and adversarial testing
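
The two checklists above can be encoded as a small decision helper for design reviews. This is a sketch of this article's heuristics only: the `Deployment` fields and the `recommend` function are hypothetical names, and the 500 ms and 100 ms thresholds are the ones listed above rather than universal constants.

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    baseline_handoff_ms: float   # agent-to-agent latency without cache sharing
    required_handoff_ms: float   # SLA target for a handoff
    same_organization: bool
    regulated_data: bool         # GDPR, HIPAA, PCI-DSS scope
    untrusted_agents: bool       # plugins, user-controlled or third-party agents
    has_labeled_data: bool
    can_retrain_guard: bool

def recommend(d):
    """Return (decision, reasons) following the checklists in this section."""
    blockers = []
    if not d.same_organization:
        blockers.append("agents cross organizational boundaries")
    if d.regulated_data:
        blockers.append("regulated data in scope")
    if d.untrusted_agents:
        blockers.append("untrusted or user-controlled agents")
    if d.required_handoff_ms < 100:
        blockers.append("SLA below 100 ms")
    if not d.can_retrain_guard:
        blockers.append("no capacity for guard retraining and red-teaming")
    if blockers:
        return "avoid_cache_sharing", blockers
    if d.baseline_handoff_ms > 500 and d.has_labeled_data:
        return "share_with_lcguard", ["latency gain justifies guard overhead"]
    return "keep_text_handoff", ["no blocker, but the listed benefits do not apply"]

internal = Deployment(baseline_handoff_ms=800, required_handoff_ms=300,
                      same_organization=True, regulated_data=False,
                      untrusted_agents=False, has_labeled_data=True,
                      can_retrain_guard=True)
decision, reasons = recommend(internal)
```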

The bigger lesson: every performance optimization in agentic systems creates a new attack surface. Shared memory, shared caches, shared tool state all bypass the natural language bottleneck that also serves as a sanitization layer. You need explicit defenses at each layer.
