LCGuard: Why Sharing Transformer KV Caches Between Agents Is a Security Nightmare

Multi-agent LLM systems are moving away from natural language communication between agents. Instead, they share transformer key-value (KV) caches directly. The performance win is real: lower latency, fewer tokens, richer context preservation. The security problem is also real: those caches encode everything the agent saw, reasoned about, and decided not to say out loud.

LCGuard is a defense framework that treats shared KV caches as untrusted memory. It learns representation-level transformations to strip sensitive information before one agent hands cache artifacts to another. The paper formalizes a threat model where adversarial agents reconstruct private inputs from shared caches, then trains a guard to prevent that reconstruction while keeping task-relevant semantics intact.

This is not a theoretical exercise. Production multi-agent systems already optimize coordination by skipping the natural language bottleneck. The security implications are just catching up.

Why Agents Share KV Caches

Traditional multi-agent communication looks like this:

1. Agent A generates natural language output
2. Agent B receives that text as input
3. Agent B runs full inference from scratch

KV cache sharing replaces the text handoff in steps 1 and 2 and removes most of the re-encoding in step 3. Agent A passes its internal key-value cache directly to Agent B. Agent B continues inference from Agent A’s stopping point without re-encoding the entire context. The token cost drops. Latency improves. Context window pressure eases.

The tradeoff: Agent B now has access to Agent A’s entire reasoning trace, not just the sanitized output Agent A chose to emit. If Agent A processed a user’s API key, medical record, or proprietary data, that information lives in the cache. Agent B can extract it.
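To make the handoff concrete, here is a minimal sketch in plain Python of a single attention head with a cache. Everything in it is a toy assumption: `encode` is a hypothetical stand-in for a layer's key and value projections, and the two-dimensional token vectors are invented. In a real transformer the keys and values also depend on position and on earlier layers, but the shape of the tradeoff is the same: Agent B skips re-encoding, and in exchange holds one entry per token Agent A ever processed.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

def encode(token_vec):
    """Hypothetical stand-in for a layer's key and value projections."""
    key = [2.0 * x for x in token_vec]
    value = [x + 1.0 for x in token_vec]
    return key, value

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, token_vec):
        key, value = encode(token_vec)
        self.keys.append(key)
        self.values.append(value)

# Agent A encodes its context once. Any of these tokens could be private.
context = [[0.1, 0.3], [0.9, -0.2], [0.5, 0.5]]
cache_a = KVCache()
for tok in context:
    cache_a.append(tok)

new_token = [0.2, 0.7]

# Text handoff: Agent B re-encodes the whole context plus its new token.
fresh = KVCache()
for tok in context + [new_token]:
    fresh.append(tok)
out_recomputed = attend(new_token, fresh.keys, fresh.values)

# Cache handoff: Agent B appends only its new token to Agent A's cache.
cache_a.append(new_token)
out_shared = attend(new_token, cache_a.keys, cache_a.values)
```

Both paths produce the same attention output, but the cache path encodes one token instead of four. The cost is that `cache_a.keys` and `cache_a.values` now sit in Agent B's hands in full.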

The Attack Surface

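KV caches are not designed to be security boundaries. They hold the per-layer key and value tensors for every token the agent processed, and those tensors encode:

Input representations: Contextualized encodings of every token the agent consumed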
Intermediate reasoning states: Partial completions, rejected drafts, internal monologue
Agent-specific context: System prompts, tool call results, private instructions

A malicious or compromised agent receiving a shared cache can:

Reconstruct sensitive inputs by training a decoder on the cache representations
Inject hidden instructions by poisoning cache entries before passing them along
Override task goals by manipulating attention patterns in the shared cache

The paper formalizes this as a reconstruction attack: a cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it with high fidelity.
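A toy version of that definition, as a sketch rather than the paper's actual attack: the adversary below is a nearest-neighbor decoder over a tiny vocabulary, and a cache is flagged unsafe when the recovery rate on sensitive positions reaches a threshold. The vocabulary, the one-hot style key vectors in `KEY_OF`, the fake API key, and the 0.5 threshold are all invented for illustration.

```python
VOCAB = ["the", "key", "is", "sk-12345", "patient", "ok"]

# Hypothetical per-token key vectors. Both agents run the same model, so the
# adversary can compute the key vector for any candidate token.
KEY_OF = {
    tok: [float(i + 1) if j == i else 0.0 for j in range(len(VOCAB))]
    for i, tok in enumerate(VOCAB)
}

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def decode(cached_key):
    """Adversarial decoder: the vocabulary token whose key is nearest."""
    return min(VOCAB, key=lambda tok: sq_dist(KEY_OF[tok], cached_key))

def reconstruction_rate(cache, tokens, sensitive_positions):
    """Fraction of sensitive tokens the decoder recovers exactly."""
    hits = sum(decode(cache[i]) == tokens[i] for i in sensitive_positions)
    return hits / len(sensitive_positions)

def is_unsafe(cache, tokens, sensitive_positions, threshold=0.5):
    return reconstruction_rate(cache, tokens, sensitive_positions) >= threshold

tokens = ["the", "key", "is", "sk-12345"]
sensitive = [3]  # position of the API key

# Raw cache: exactly what Agent A computed.
raw_cache = [list(KEY_OF[t]) for t in tokens]

# Crude defense for comparison: zero the sensitive entry before sharing.
masked_cache = [
    [0.0] * len(row) if i in sensitive else row
    for i, row in enumerate(raw_cache)
]
```

On the raw cache the decoder recovers the API key and `is_unsafe` returns `True`. On the masked cache the zeroed entry decodes to an unrelated token, so the same check returns `False`. A learned decoder against real caches is far stronger than this lookup, which is why the guard is trained against one.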

LCGuard Architecture

LCGuard sits between agents as a transformation layer. Before Agent A hands its KV cache to Agent B, LCGuard applies a learned transformation that:

Preserves task-relevant semantics (so Agent B can still do useful work)
Reduces reconstructable sensitive information (so an adversarial Agent B cannot recover private inputs)

The training setup is adversarial:

Adversary: Learns to reconstruct sensitive inputs from transformed caches
Guard: Learns transformations that minimize reconstruction accuracy while maintaining task performance

The guard does not sanitize text. It operates on the latent representations themselves, applying transformations in the embedding space before cache artifacts leave Agent A’s boundary.
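One common way to write an adversarial setup like this is as a min-max objective. The notation here is an assumption for illustration, not taken from the paper: $g_\theta$ is the guard, $d_\phi$ the adversarial decoder, $C$ a raw cache, $x_s$ the sensitive inputs, $y$ the task target, and $\lambda \ge 0$ a weight that trades privacy against task performance.

```latex
\min_{\theta}\;\max_{\phi}\;
\mathbb{E}_{(C,\,x_s,\,y)}
\Big[
  \mathcal{L}_{\text{task}}\big(g_\theta(C),\, y\big)
  \;-\;
  \lambda\,\mathcal{L}_{\text{rec}}\big(d_\phi(g_\theta(C)),\, x_s\big)
\Big]
```

The inner maximization over $\phi$ is the adversary minimizing its reconstruction loss $\mathcal{L}_{\text{rec}}$. The outer minimization over $\theta$ is the guard keeping the task loss low while pushing that reconstruction loss up.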

Defense Mechanisms

LCGuard uses three core techniques:

| Mechanism | Purpose | Tradeoff |
| --- | --- | --- |
| Representation masking | Zero out cache entries corresponding to sensitive tokens | Loses fine-grained context for downstream tasks |
| Noise injection | Add calibrated noise to cache embeddings | Degrades task accuracy if noise budget is too high |
| Subspace projection | Project caches onto task-relevant subspace, discard orthogonal components | Requires labeled task data to learn projection |

The paper shows that subspace projection offers the best balance: it preserves task semantics while making reconstruction attacks fail. The guard learns which dimensions of the cache embedding space carry task-relevant information and which dimensions leak sensitive inputs.
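The three mechanisms are easy to compare side by side on a toy cache. This is a plain-Python sketch under stated assumptions: the cache is a list of rows (one per token), the noise scale `sigma` is arbitrary, and `basis` is a hand-picked orthonormal basis standing in for the task-relevant subspace that a real guard would learn.

```python
import random

def mask_sensitive(cache, sensitive_mask):
    """Representation masking: zero the rows flagged as sensitive."""
    return [[0.0] * len(row) if flag else list(row)
            for row, flag in zip(cache, sensitive_mask)]

def inject_noise(cache, sigma, rng):
    """Noise injection: add zero-mean Gaussian noise with std sigma."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in cache]

def project(cache, basis):
    """Subspace projection: keep only components along the basis vectors.

    Assumes the basis vectors are orthonormal.
    """
    out = []
    for row in cache:
        kept = [0.0] * len(row)
        for b in basis:
            coeff = sum(x * y for x, y in zip(row, b))
            kept = [k + coeff * y for k, y in zip(kept, b)]
        out.append(kept)
    return out

cache = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical task subspace

masked = mask_sensitive(cache, [False, True])
noisy = inject_noise(cache, sigma=0.1, rng=random.Random(0))
projected = project(cache, basis)
```

Masking removes whole tokens, noise perturbs every dimension a little, and projection removes whole dimensions from every token. In this toy case the third coordinate is discarded, and projecting a second time changes nothing.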

Implementation Shape

A simplified PyTorch sketch of the guard, using subspace projection, looks like this:

import torch.nn as nn
import torch.nn.functional as F

class LCGuard(nn.Module):
    def __init__(self, model_dim, task_subspace_dim, adversary):
        super().__init__()
        # Learned projection matrix: task-relevant subspace.
        # model_dim must equal num_heads * head_dim of the flattened cache.
        self.projection = nn.Linear(model_dim, task_subspace_dim, bias=False)
        # Adversarial decoder: tries to reconstruct sensitive inputs.
        # Its architecture is left open (e.g. nn.TransformerDecoder(...));
        # any module mapping a transformed cache to a reconstruction works.
        self.adversary = adversary

    def transform_cache(self, kv_cache, sensitive_mask):
        """
        Apply learned transformation to KV cache before sharing.

        kv_cache: (batch, seq_len, num_heads, head_dim)
        sensitive_mask: (batch, seq_len) boolean mask marking sensitive tokens
            (unused in this projection-only sketch)
        """
        batch, seq_len = kv_cache.shape[:2]

        # Flatten head dimensions: (batch, seq_len, num_heads * head_dim)
        flat_cache = kv_cache.reshape(batch, seq_len, -1)

        # Project onto task-relevant subspace: (batch, seq_len, task_subspace_dim)
        projected = self.projection(flat_cache)

        # Reconstruct in original space (lossy). The weight has shape
        # (task_subspace_dim, model_dim), so it right-multiplies the projection.
        reconstructed = projected @ self.projection.weight

        return reconstructed.reshape(kv_cache.shape)

    def adversarial_loss(self, transformed_cache, sensitive_inputs):
        """
        Adversary tries to reconstruct sensitive inputs from transformed cache.
        Returns the adversary's reconstruction error. The adversary is trained
        to minimize it; the guard is trained to keep it high.
        """
        reconstructed = self.adversary(transformed_cache)
        return F.mse_loss(reconstructed, sensitive_inputs)

The guard is trained offline on representative multi-agent tasks with labeled sensitive inputs. At inference time, it runs as a stateless transformation between agents.

Observability Gaps

The paper does not address:

Runtime detection. How do you know if an agent is attempting reconstruction attacks in production?
Cache versioning. What happens when agents use different model versions with incompatible cache formats?
Composition. If Agent B shares its cache with Agent C, does the guard need to run again, or is the transformation transitive?

You need separate instrumentation to log cache transformations, measure reconstruction attempts, and alert on anomalies. The guard itself is blind to adversarial behavior after the cache leaves its boundary.
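As a starting point for that instrumentation, here is a minimal sketch of an audit wrapper using only the Python standard library. The logger name, the record fields, and the `zero_last_row` transform are hypothetical. It covers only the logging part: detecting reconstruction attempts on the receiving side needs separate signals that this sketch does not provide.

```python
import hashlib
import json
import logging
import time

audit_log = logging.getLogger("kv_guard.audit")  # hypothetical logger name

def fingerprint(cache):
    """Short digest of a cache, so logs never contain raw representations."""
    payload = json.dumps(cache, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

def audited_transform(transform, cache, sender, receiver):
    """Run a guard transform and emit one structured audit record."""
    start = time.perf_counter()
    guarded = transform(cache)
    record = {
        "sender": sender,
        "receiver": receiver,
        "input_digest": fingerprint(cache),
        "output_digest": fingerprint(guarded),
        "changed": guarded != cache,
        "elapsed_ms": round((time.perf_counter() - start) * 1000, 3),
    }
    audit_log.info(json.dumps(record))
    return guarded, record

def zero_last_row(cache):
    """Stand-in guard transform: blank out the final cache entry."""
    return cache[:-1] + [[0.0] * len(cache[-1])]

guarded, record = audited_transform(
    zero_last_row, [[1.0, 2.0], [3.0, 4.0]], "agent_a", "agent_b"
)
```

A record with `"changed": false` is worth alerting on: it means a cache crossed the boundary without the guard altering it.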

Performance Cost

The paper reports:

Latency overhead: 8-12% per cache transformation (subspace projection + reshape)
Task accuracy: 2-5% degradation on multi-agent benchmarks when guard is active
Memory: Additional projection matrix storage (model_dim × task_subspace_dim parameters)

The tradeoff depends on your threat model. If agents are all trusted and running in the same security boundary, the overhead may not be worth it. If agents span trust boundaries (user-controlled agents, third-party plugins, untrusted models), the guard becomes necessary infrastructure.
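A quick back-of-the-envelope calculation helps size these costs. The dimensions (4096 and 512), the fp16 storage, and the 200 ms baseline handoff are assumed values, not figures from the paper, and the sketch assumes the 8-12% overhead applies to the handoff latency as a whole.

```python
def projection_memory_bytes(model_dim, task_subspace_dim, bytes_per_param=2):
    """Storage for a model_dim x task_subspace_dim projection matrix."""
    return model_dim * task_subspace_dim * bytes_per_param

def guarded_latency_ms(base_ms, overhead_fraction):
    """Handoff latency once the guard's relative overhead is added."""
    return base_ms * (1.0 + overhead_fraction)

# Assumed sizes: 4096-dim flattened cache rows, 512-dim task subspace, fp16.
mem_bytes = projection_memory_bytes(4096, 512)
mem_mib = mem_bytes / (1024 * 1024)

# Assumed 200 ms baseline handoff with the reported 8-12% overhead range.
low_ms = guarded_latency_ms(200.0, 0.08)
high_ms = guarded_latency_ms(200.0, 0.12)
```

Under these assumptions the projection matrix takes 4 MiB, and a 200 ms handoff grows to roughly 216-224 ms. The memory cost is usually negligible next to the cache itself; the latency cost is what decides the tradeoff.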

Failure Modes

LCGuard assumes:

Sensitive inputs are labeled. You need ground truth about which tokens are sensitive to train the guard.
Task semantics are stable. The learned projection assumes task-relevant information does not shift over time.
Adversary is known. The guard trains against a specific adversarial decoder architecture.

In practice:

Sensitive information is often contextual (a phone number is sensitive in a medical record, not in a public directory)
Task semantics drift as agents learn new capabilities or switch domains
Real adversaries will adapt their reconstruction techniques to bypass the guard

You need continuous retraining and adversarial red-teaming to keep the guard effective.
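The contextual-sensitivity problem shows up as soon as you try to build the sensitive-token mask. The sketch below is one hypothetical way to make labels depend on context: the `POLICY` table, the context names, and the phone-number regex are all invented for illustration, and a real labeler would need far more than a regex.

```python
import re

# Rough phone-number pattern, for illustration only.
PHONE = re.compile(r"^\+?\d[\d\-]{6,}\d$")

# Hypothetical policy: which entity types are sensitive in which context.
POLICY = {
    "medical_record": {"phone"},
    "public_directory": set(),
}

def entity_type(token):
    """Classify a token; returns None when no known entity type matches."""
    if PHONE.match(token):
        return "phone"
    return None

def sensitive_mask(tokens, context):
    """Boolean mask over tokens, given the context the data came from."""
    sensitive_types = POLICY[context]
    return [entity_type(tok) in sensitive_types for tok in tokens]

tokens = ["call", "555-867-5309", "today"]
medical = sensitive_mask(tokens, "medical_record")
directory = sensitive_mask(tokens, "public_directory")
```

The same phone number is flagged in the medical context and left alone in the directory context. A guard trained on labels from one context can silently under-protect in another, which is one reason retraining has to track where the data comes from.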

Deployment Patterns

Pattern 1: Centralized Guard

All cache sharing goes through a single guard service. Agents call the guard API before handing caches to other agents. The guard logs all transformations for audit.

Pros: Centralized policy enforcement, easier to update guard model

Cons: Single point of failure, latency bottleneck, guard becomes a high-value target

Pattern 2: Agent-Local Guards

Each agent runs its own guard instance. Before sharing a cache, the agent applies the transformation locally.

Pros: No central bottleneck, guard failure is isolated

Cons: Harder to enforce uniform policy, agents can skip the guard if compromised

Pattern 3: Trusted Execution Environment

Run the guard inside a TEE (Trusted Execution Environment). Agents cannot bypass the guard or inspect its internals.

Pros: Strongest security boundary, guard logic is tamper-proof

Cons: TEE overhead, limited model size, complex key management
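Whichever pattern you pick, the receiving agent needs some way to tell whether a cache actually went through the guard. One possible approach, sketched here as an assumption and not something the paper describes, is for the guard to attach an HMAC tag to each transformed cache. This only helps if the key is held by the guard and the receiver's verifier, never by sending agents, which fits the centralized and TEE patterns better than agent-local guards. The key and the `zero_all` transform are placeholders.

```python
import hashlib
import hmac
import json

# Hypothetical key. A real deployment would load it from managed secrets.
GUARD_KEY = b"demo-key-not-for-production"

def serialize(cache):
    return json.dumps(cache).encode("utf-8")

def guard_and_sign(cache, transform, key=GUARD_KEY):
    """Guard side: transform the cache, then tag the transformed bytes."""
    guarded = transform(cache)
    tag = hmac.new(key, serialize(guarded), hashlib.sha256).hexdigest()
    return guarded, tag

def accept_cache(cache, tag, key=GUARD_KEY):
    """Receiver side: accept only caches carrying a valid guard tag."""
    expected = hmac.new(key, serialize(cache), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

def zero_all(cache):
    """Stand-in guard transform."""
    return [[0.0] * len(row) for row in cache]

raw = [[1.0, 2.0]]
guarded, tag = guard_and_sign(raw, zero_all)
ok_guarded = accept_cache(guarded, tag)
ok_raw = accept_cache(raw, tag)
```

The guarded cache verifies and the raw cache does not, so a sender that skips the guard cannot produce an acceptable artifact without the key.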

You should consider KV cache sharing when:

Agents coordinate on long-context tasks where re-encoding is expensive
Latency requirements are tight (sub-second agent-to-agent handoff)
Agents are stateful and need to preserve reasoning traces across turns

You should avoid it when:

Agents span trust boundaries (user-controlled, third-party, untrusted models)
Sensitive data flows through the system without clear isolation
You cannot afford the 2-5% task accuracy degradation from guard transformations

Technical Verdict

LCGuard exposes a real vulnerability in multi-agent systems that optimize for performance by sharing internal state. The defense is practical but not free: you pay in latency, accuracy, and operational complexity.

Use KV cache sharing with LCGuard if:

Multi-agent coordination latency exceeds 500ms without cache sharing
All agents belong to the same organization with shared security policies
Data classification is internal or confidential (not regulated PII, PHI, or financial data)
You can tolerate 8-12% latency overhead and 2-5% task accuracy loss
You have labeled training data for sensitive content in your domain

Avoid KV cache sharing if:

Agents cross organizational boundaries or involve third-party models
System handles regulated data (GDPR, HIPAA, PCI-DSS scope)
SLA requires sub-100ms agent handoff (natural language may be faster than guard transformation)
Agents are user-controlled or untrusted (plugins, custom tools, external APIs)
You lack infrastructure for continuous guard retraining and adversarial testing
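The two checklists above can be written down as a small decision function, which makes the precedence explicit: any "avoid" condition overrides the "use" conditions. This is a sketch, with hypothetical field names, that encodes only the thresholds stated above (500 ms and 100 ms). Whether you can tolerate the latency and accuracy cost is left as a judgment call outside the function.

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    baseline_handoff_ms: float   # coordination latency without cache sharing
    sla_handoff_ms: float        # required agent-to-agent handoff time
    same_organization: bool
    regulated_data: bool         # GDPR, HIPAA, or PCI-DSS scope
    untrusted_agents: bool       # plugins, custom tools, external APIs
    has_labeled_data: bool
    can_retrain_guard: bool

def recommend(d):
    """Return (decision, reasons) following the two checklists."""
    blockers = []
    if not d.same_organization:
        blockers.append("agents cross organizational boundaries")
    if d.regulated_data:
        blockers.append("regulated data in scope")
    if d.sla_handoff_ms < 100:
        blockers.append("SLA below 100 ms")
    if d.untrusted_agents:
        blockers.append("untrusted or user-controlled agents")
    if not d.can_retrain_guard:
        blockers.append("no retraining or red-team infrastructure")
    if blockers:
        return "avoid", blockers
    if d.baseline_handoff_ms > 500 and d.has_labeled_data:
        return "use_with_guard", []
    return "no_clear_benefit", []

internal = Deployment(800.0, 1000.0, True, False, False, True, True)
external = Deployment(800.0, 1000.0, False, True, False, True, True)
internal_decision = recommend(internal)
external_decision = recommend(external)
```

The internal deployment qualifies for guarded cache sharing. The external one is rejected with two reasons: it crosses organizational boundaries and handles regulated data.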

The bigger lesson: every performance optimization in agentic systems creates a new attack surface. Shared memory, shared caches, shared tool state all bypass the natural language bottleneck that also serves as a sanitization layer. You need explicit defenses at each layer.

Source Links

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems (arXiv)
