Daily Technical Brief: Multi-Agent Systems & Infrastructure Portability
What Happened
Three major developments in agent orchestration and tooling emerged: DoorDash deployed multi-agent RL for dispatch optimization with 30+ minute feedback delays, researchers formalized reproducible configuration standards for LLM multi-agent systems (EASE framework), and Simon Willison extracted Claude’s text-editing primitives into reusable plugin infrastructure. On the tooling side, Apple extended Private Cloud Compute to run Gemini on Google Cloud GPUs while preserving security boundaries, Andrew Ng released aisuite for provider-agnostic agent development, and new patterns emerged for constraining agent hallucination through repository structure (“harness engineering”).
Why It Matters
Production RL is moving beyond immediate feedback loops. DoorDash’s three-sided marketplace system handles 30+ minute delays between dispatch decisions and reward signals, proving multi-agent RL works when feedback is sparse and conflicting. This matters for any system where actions and outcomes are temporally decoupled—fraud detection, supply chain optimization, infrastructure autoscaling.
Agent portability requires standardized orchestration boundaries. The EASE configuration framework and aisuite’s unified API both address the same problem: multi-agent systems are becoming production infrastructure, but lack reproducibility and vendor lock-in escape hatches. State serialization, interaction logging, and provider switching are now first-class engineering concerns.
Security boundaries can extend across cloud providers. Apple’s PCC on Google Cloud demonstrates namespace isolation and confidential VMs working across infrastructure you don’t control—critical for regulated industries running agentic workloads on third-party inference.
Key Trends
Delayed Reward RL in Production
DoorDash’s system trains agents when feedback arrives 30+ minutes after decisions, using temporal credit assignment across three stakeholders (customers, merchants, couriers). The architecture separates immediate proxy metrics (estimated delivery time) from delayed ground truth (actual completion, merchant congestion reports). This pattern applies anywhere actions have long-term consequences: infrastructure changes, content moderation, loan approvals.
Modular Agent Primitives Over Monolithic Wrappers
Willison’s datasette-agent-edit extracts Claude’s view/str_replace/insert pattern into a base plugin layer. Instead of duplicating tool definitions across Markdown editors, SQL query tools, and SVG manipulators, plugins inherit the same three operations and adapt storage backends. This is the Unix philosophy applied to agentic tools: small, composable primitives that chain reliably.
Repository Structure as Agent Control Surface
Harness engineering embeds constraints directly into repository structure—Markdown files defining module boundaries, style rules, and scope limits. Agents read these before generating code, reducing hallucinated file paths and refactor loops. The insight: physical workspace structure constrains agent behavior more reliably than prompt engineering alone.
Hybrid Cloud Security for Third-Party Inference
Apple’s architecture uses namespace-based network parsing, short-lived inference processes, and confidential VMs to run Gemini on Google Cloud while maintaining PCC security guarantees. Each request gets isolated network parsing, inference happens in ephemeral processes, and Google Cloud’s confidential VMs prevent host-level inspection. This is the blueprint for running external models without exposing request data to infrastructure providers.
Provider-Agnostic Agent Development
Aisuite normalizes tool calling, state persistence, and provider switching across 10+ LLM vendors. The Agents API adds MCP integration, letting desktop agents (OpenCoworker reference implementation) swap models mid-session without rewriting permission boundaries or tool definitions. The two-layer design (Chat Completions + Agents API) separates inference portability from orchestration logic.