Ghost Tool Calls: How Speculative Agent Execution Leaks Intent Before Commitment

Speculative execution in tool-augmented agents hides latency by issuing likely future tool calls before the agent commits to a decision path. The problem: external services receive those calls and retain what they disclose even after the agent abandons the branch. A new arXiv paper (2606.02483v1) formalizes this as “ghost tool calls” and proposes issue-time privacy contracts that prevent leakage at dispatch rather than cleaning up after the fact.

This is not a theoretical edge case. Modern agent runtimes speculatively dispatch tool calls to third-party APIs, internal microservices, and observability backends to mask network round-trip time. Every observer that receives a speculative call learns something about inferred user intent, whether or not the agent ever commits to that branch.

The Privacy Boundary Problem

Traditional speculative execution in CPUs rolls back architectural state when a branch prediction fails. Ghost tool calls cannot be rolled back in the same way: the disclosure happens at issue time, when the request reaches the external service, not when the agent consumes the result.

Why rollback fails for tool calls:

External services log the request before returning a response
Read-only calls still disclose query patterns and inferred intent
Access control allow-lists do not prevent observation by allowed services
Post-hoc filters cannot unsend what an observer already holds
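
The points above can be illustrated with a minimal sketch. The `LoggingService` and `Agent` classes are hypothetical stand-ins for an external observer and an agent runtime; they are not part of the paper's prototype.

```python
class LoggingService:
    """Hypothetical external service that logs every request before answering."""

    def __init__(self):
        self.request_log = []

    def handle(self, query):
        # The log entry is written at issue time, before any response exists.
        self.request_log.append(query)
        return {"result": f"data for {query}"}


class Agent:
    """Toy agent that speculates on one branch and may abandon it."""

    def __init__(self, service):
        self.service = service
        self.pending = {}

    def speculate(self, branch, query):
        self.pending[branch] = self.service.handle(query)

    def abandon(self, branch):
        # Rollback only discards the agent's local copy of the result.
        self.pending.pop(branch, None)


service = LoggingService()
agent = Agent(service)
agent.speculate("refund-path", "refund policy for order 1234")
agent.abandon("refund-path")

print(agent.pending)        # {}
print(service.request_log)  # ['refund policy for order 1234']
```

The agent's state is clean after `abandon`, but the observer's log still holds the query. Nothing the agent does locally can change that.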

The paper introduces “issue-time privacy” as a distinct requirement: preventing leakage at the moment of dispatch, not after the agent decides which branch to commit to.

Speculative Tool Privacy Contracts

The authors propose a runtime abstraction that treats observation before commitment as a first-class effect, separate from state mutation. A Speculative Tool Privacy Contract wraps each tool call with a policy that modifies or suppresses the call’s arguments or destination before dispatch.

Contract enforcement points:

Argument projection: Redact or generalize parameters before sending (e.g., send city instead of full address)
Destination projection: Route speculative calls to a privacy-preserving proxy instead of the real service
Dispatch suppression: Block the call entirely if the privacy cost exceeds the latency benefit
Commitment binding: Only issue the full call after the agent commits to the branch

The runtime evaluates policies at dispatch time, not at result consumption. This shifts the privacy boundary from “what did the agent use” to “what did external observers see.”
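
As a rough sketch of two of these enforcement points, the snippet below shows argument projection (city instead of full address) and a suppression check. The `Address` type, the function names, and the cost model with its `cost_per_ms` weight are illustrative assumptions; the paper's actual cost model is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postal_code: str


def project_address(address: Address) -> dict:
    """Argument projection: keep only the city for a speculative call."""
    return {"city": address.city}


def should_suppress(privacy_cost: float, latency_benefit_ms: float,
                    cost_per_ms: float = 0.01) -> bool:
    """Dispatch suppression: block when privacy cost outweighs latency saved.

    privacy_cost and cost_per_ms are assumed, unitless scores chosen by
    the operator. They are placeholders, not values from the paper.
    """
    return privacy_cost > latency_benefit_ms * cost_per_ms


home = Address("12 Elm Street", "Lyon", "69001")
speculative_args = project_address(home)
print(speculative_args)  # {'city': 'Lyon'}
```

Both functions run before anything is sent, which is the point: the decision is made on what an observer would see, not on what the agent later uses.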

Implementation Architecture

The prototype runtime intercepts tool calls between the agent orchestrator and external services. Each tool has an associated privacy contract that the runtime evaluates before dispatch.

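class SpeculativeStub:
    """Placeholder returned when a speculative call is suppressed."""


class SpeculativeToolContract:
    def __init__(self, tool_name, policy):
        self.tool_name = tool_name
        self.policy = policy

    def send(self, destination, args):
        # Transport hook (assumed): subclasses implement the actual network call.
        raise NotImplementedError

    def dispatch(self, args, speculative=False):
        if not speculative:
            # Committed call uses the real destination and full arguments
            return self.send(self.tool_name, args)

        # Decide on suppression first so a blocked call does no further work
        if self.policy.should_suppress(args):
            return SpeculativeStub()

        # Apply the privacy policy before anything leaves the runtime
        projected_args = self.policy.project_args(args)
        projected_dest = self.policy.project_destination()
        return self.send(projected_dest, projected_args)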

Policy types evaluated:

Generalization: Replace precise values with broader categories
Noise injection: Add differential privacy noise to numeric arguments
Proxy routing: Send speculative calls to a local cache or mock service
Delayed dispatch: Wait for partial commitment before issuing the call
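
A policy object that combines generalization and proxy routing might look like the sketch below. It exposes the three methods the contract class calls (`project_args`, `project_destination`, `should_suppress`). The class name, the rounding rule, and the `blocked_keys` list are assumptions made for illustration.

```python
class GeneralizingProxyPolicy:
    """Hypothetical policy: coarsen float arguments and route to a proxy."""

    def __init__(self, proxy_destination, blocked_keys=(), round_to=1):
        self.proxy_destination = proxy_destination
        self.blocked_keys = set(blocked_keys)
        self.round_to = round_to

    def project_args(self, args):
        # Generalization: round floats to fewer decimal places.
        return {
            key: round(value, self.round_to) if isinstance(value, float) else value
            for key, value in args.items()
        }

    def project_destination(self):
        # Proxy routing: speculative traffic never targets the real service.
        return self.proxy_destination

    def should_suppress(self, args):
        # Suppress when any argument is on the never-send list.
        return any(key in self.blocked_keys for key in args)


policy = GeneralizingProxyPolicy("local-proxy", blocked_keys={"ssn"})
print(policy.project_args({"lat": 48.85661, "q": "cafe"}))
```

Rounding is only one way to generalize. Categorical arguments would need a mapping to broader categories instead.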

Privacy vs Latency Trade-offs

The paper evaluates twelve policies across three corpora and measures the impact on both privacy leakage and latency-hiding effectiveness. The table below summarizes the approaches discussed here in qualitative terms.

Policy	Privacy Gain	Latency Penalty	Implementation Complexity
Post-hoc filter	None	None	Low
Read-only restriction	None	None	Low
Access control allow-list	None	None	Medium
Argument generalization	Medium	Low	Medium
Destination proxy	High	Medium	High
Dispatch suppression	High	High	Low
Commitment binding	Highest	Highest	Medium

Key finding: Post-hoc filters, read-only restrictions, and access control allow-lists provide zero privacy gain because the observer already received the speculative call. Only issue-time policies that modify the call before dispatch reduce leakage.

Observability Implications

Speculative tool calls create noise in observability pipelines. Tracing systems record abandoned branches, metrics count tool invocations that never contribute to final output, and log aggregators store inferred intent that the agent discarded.

Observability challenges:

Distinguishing speculative from committed calls in distributed traces
Preventing speculative call metrics from skewing latency percentiles
Avoiding alert fatigue from abandoned error paths
Maintaining audit trails without retaining ghost disclosures

The runtime needs to tag speculative calls with metadata that observability backends can filter or aggregate separately. Without this, operators cannot distinguish real failures from speculative branch exploration.
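
One way to picture the tagging is a span record that carries a speculative flag and a committed flag, so metrics can be computed over committed calls only. The `ToolSpan` fields below are hypothetical and do not correspond to any particular tracing library's schema.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ToolSpan:
    tool: str
    duration_ms: float
    speculative: bool   # assumed tag set by the runtime at dispatch
    committed: bool     # True if the call's branch was committed


def committed_latencies(spans):
    """Durations of calls that contributed to final output."""
    return [s.duration_ms for s in spans if s.committed]


spans = [
    ToolSpan("search", 120.0, speculative=False, committed=True),
    ToolSpan("search", 135.0, speculative=True, committed=True),
    ToolSpan("lookup", 900.0, speculative=True, committed=False),  # abandoned
]

all_median = statistics.median(s.duration_ms for s in spans)
committed_median = statistics.median(committed_latencies(spans))
print(all_median, committed_median)  # 135.0 127.5
```

The abandoned 900 ms lookup shifts the unfiltered median. Filtering on the tag keeps it out of the latency figures while the span itself remains available for debugging.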

Concrete example: A speculative database query in a financial agent leaks account lookup patterns to observability backends. Even if the agent abandons the branch, the trace shows which accounts were considered. In regulated environments, this speculative disclosure may violate data residency or audit requirements.

Security Boundaries

Ghost tool calls cross security boundaries that traditional agent sandboxing does not address. A sandboxed agent can still leak intent to external services through speculative dispatch, even if it cannot mutate state.

Boundary violations:

Speculative calls to third-party APIs disclose user queries before commitment
Internal microservices log inferred intent that may violate data residency rules
Observability backends retain speculative traces in jurisdictions with different privacy laws
Caching layers store abandoned query patterns that reveal sensitive inference

Issue-time privacy contracts need to integrate with existing security boundaries: network policies, service mesh authorization, and data classification labels. A speculative call to a PII-sensitive service should trigger stricter projection policies than a call to a public API.

Deployment Patterns

Pattern 1: Proxy-first speculation

Route all speculative calls to a local proxy that returns cached or synthetic responses. Only committed calls reach external services. This maximizes privacy but requires maintaining a high-quality cache.
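
A minimal sketch of this pattern, assuming a hypothetical `SpeculationProxy` that answers speculative calls from a local cache and lets only committed calls through to the real service:

```python
class SpeculationProxy:
    """Hypothetical local proxy: answers speculative calls from cache only."""

    def __init__(self, real_call):
        self.real_call = real_call   # callable that reaches the real service
        self.cache = {}

    def speculative(self, key):
        # Never leaves the process; returns None on a cache miss.
        return self.cache.get(key)

    def committed(self, key):
        # Only committed calls reach the external service; refresh the cache.
        value = self.real_call(key)
        self.cache[key] = value
        return value


outbound = []


def real_service(key):
    outbound.append(key)   # records what the external observer would see
    return key.upper()


proxy = SpeculationProxy(real_service)
first_guess = proxy.speculative("weather:lyon")   # cache miss, nothing sent
answer = proxy.committed("weather:lyon")          # one real call
second_guess = proxy.speculative("weather:lyon")  # served locally
```

On a cache miss the speculation yields nothing, so the latency benefit depends entirely on hit rate. That is the cost of keeping every speculative call local.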

Pattern 2: Tiered projection

Apply different projection policies based on service sensitivity. Public APIs receive full speculative calls, internal services receive generalized arguments, and PII-sensitive services receive no speculative calls at all.
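
The tiers can be sketched as a small dispatch function. The `Sensitivity` labels and the `generalize` callback are assumed names; in practice the labels would come from whatever data classification scheme the organization already uses.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PII = "pii"


def tiered_projection(sensitivity, args, generalize):
    """Return the args to send speculatively, or None to send nothing."""
    if sensitivity is Sensitivity.PUBLIC:
        return args
    if sensitivity is Sensitivity.INTERNAL:
        return generalize(args)
    return None  # PII-sensitive: no speculative call at all


def city_only(args):
    return {"city": args["city"]}


full = {"street": "12 Elm Street", "city": "Lyon"}
public_args = tiered_projection(Sensitivity.PUBLIC, full, city_only)
internal_args = tiered_projection(Sensitivity.INTERNAL, full, city_only)
pii_args = tiered_projection(Sensitivity.PII, full, city_only)
```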

Pattern 3: Commitment threshold

Issue speculative calls only after the agent reaches a confidence threshold. Low-confidence branches remain local until the agent commits or abandons them.
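
A sketch of the gating step, assuming the agent exposes a per-branch confidence score between 0 and 1. The `plan_speculation` name and the 0.7 default are illustrative, not values from the paper.

```python
def plan_speculation(branches, threshold=0.7):
    """Split candidate branches into those to dispatch now and those to hold."""
    dispatch, hold = [], []
    for name, confidence in branches.items():
        (dispatch if confidence >= threshold else hold).append(name)
    return dispatch, hold


dispatch, hold = plan_speculation({"book_flight": 0.9, "cancel_trip": 0.2})
print(dispatch, hold)  # ['book_flight'] ['cancel_trip']
```

Raising the threshold trades latency hiding for privacy: fewer branches are dispatched early, so fewer abandoned branches are ever observed.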

Pattern 4: Differential dispatch

Add calibrated noise to numeric speculative call arguments using a differential privacy mechanism. With a suitable privacy budget, this bounds how much an external observer can infer about any individual user's intent from the values it receives.
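
The sketch below uses the Laplace mechanism, a standard differential privacy technique, as one possible way to add calibrated noise. Whether the paper uses this mechanism is not stated here, and the `sensitivity` and `epsilon` values are placeholders that a real deployment would have to justify.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_argument(value, sensitivity, epsilon, rng):
    """Laplace mechanism: noise scale = sensitivity / epsilon."""
    return value + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only to make the sketch reproducible
samples = [noisy_argument(100.0, sensitivity=1.0, epsilon=0.5, rng=rng)
           for _ in range(10_000)]
mean = sum(samples) / len(samples)
```

Individual samples vary around the true value of 100.0 while their average stays close to it. A smaller `epsilon` means more noise and stronger privacy, but also a speculative result that is less likely to match the committed call.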

Failure Modes

Projection policy mismatch: The privacy policy generalizes arguments too aggressively, causing the speculative call to return useless results. The agent cannot hide latency because it must re-issue the call with full arguments after commitment.

Proxy staleness: The local proxy returns outdated cached responses for speculative calls. The agent makes decisions based on stale data, then discovers the mismatch when issuing the committed call.
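
One common mitigation for staleness is a time-to-live on cached entries. The sketch below is an assumption about how a proxy might do this, not something the paper prescribes. The injectable `clock` exists so the behavior can be tested without sleeping.

```python
import time


class TTLCache:
    """Hypothetical cache that refuses to serve entries older than ttl_s."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self.entries = {}

    def put(self, key, value):
        self.entries[key] = (value, self.clock())

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl_s:
            del self.entries[key]   # stale: treat as a miss
            return None
        return value


now = [0.0]
cache = TTLCache(ttl_s=10, clock=lambda: now[0])
cache.put("rate:EURUSD", 1.08)
now[0] = 5.0
fresh = cache.get("rate:EURUSD")
now[0] = 11.0
stale = cache.get("rate:EURUSD")
```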

Observability blind spots: Suppressing speculative calls from observability pipelines hides real performance problems. Operators cannot debug latency issues because the runtime filtered out the speculative attempts.

Policy bypass: The agent framework issues tool calls directly without going through the contract runtime. External services receive unprotected speculative calls, defeating the privacy boundary.
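
A partial defense against bypass is to expose tools to the agent only through a registry that applies the guard on every call. The `ToolRegistry` below is a hypothetical sketch. Python cannot truly hide the underlying functions, so this is a convention to be backed by code review or network-level controls, not an enforcement mechanism.

```python
class ToolRegistry:
    """Hypothetical registry that routes every tool call through a guard."""

    def __init__(self, guard):
        self._guard = guard   # guard(name, args, speculative) -> args or None
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def call(self, name, args, speculative=False):
        guarded_args = self._guard(name, args, speculative)
        if guarded_args is None:
            return None  # suppressed before dispatch
        return self._tools[name](guarded_args)


seen = []


def echo(args):
    seen.append(args)   # records what the tool actually received
    return args


def block_speculative(name, args, speculative):
    return None if speculative else args


registry = ToolRegistry(block_speculative)
registry.register("echo", echo)
speculative_result = registry.call("echo", {"q": "x"}, speculative=True)
committed_result = registry.call("echo", {"q": "x"})
```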

Technical Verdict

Use speculative tool privacy contracts when:

Your agent calls third-party APIs that log requests before returning responses
Inferred user intent has privacy or compliance implications (healthcare, finance, legal)
Observability pipelines need to distinguish speculative from committed calls
You can tolerate latency penalties from projection or suppression policies

Avoid or defer when:

All tool calls target internal services that you control and can modify to recognize speculative tags and skip logging or retention for those calls
Latency hiding is critical and privacy leakage is acceptable (public data, low-sensitivity queries)
Your agent framework does not support intercepting tool dispatch
You lack the infrastructure to run privacy-preserving proxies or caches

The core insight is that timing matters. Authorization, sandboxing, and post-hoc filtering do not prevent disclosure at issue time. If you are hiding latency with speculative execution, you need issue-time privacy policies or you are leaking inferred intent to every observer that receives the call.