Most multi-agent demos show parallel execution or simple sequential chains. A developer spent 96 hours building a three-agent engineering team that handles continuous integration, trend monitoring, and publication management on a single machine. The result exposes the coordination plumbing required when agents need to cover for each other’s failures and maintain state across long-running workflows.
The Role Specialization Pattern
The system runs three agents with distinct responsibilities:
- Sakura (Orchestrator): Manages communication handoffs, tracks state dependencies, monitors execution anomalies
- Zero (Lead Engineer): Handles code generation, structural decisions, aggressive refactoring
- Hana (third agent, role not stated explicitly): Probably handles testing, deployment, or documentation, judging by the workflow context
This is not generic prompt chaining. Each agent has a defined backstory, operational constraints, and behavioral alignment tuned for its role. Sakura acts as the control plane, Zero executes technical decisions, and Hana covers the remaining gaps.
The key insight: you cannot rely on one-size-fits-all prompts when agents need to hand off context without losing execution state.
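A minimal sketch of what per-role configuration could look like, as opposed to one shared prompt. The `AgentRole` dataclass, its field names, the prompt wording, and the sample constraint are all illustrative assumptions, not the developer's actual configuration. Hana is left out because the source does not state that role.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgentRole:
    """Hypothetical per-agent configuration; field names are assumptions."""
    name: str
    title: str
    responsibilities: Tuple[str, ...]
    constraints: Tuple[str, ...] = ()

    def system_prompt(self) -> str:
        # Build a role-specific prompt instead of reusing one generic prompt.
        lines = [f"You are {self.name}, the {self.title}."]
        lines.append("Responsibilities: " + "; ".join(self.responsibilities))
        if self.constraints:
            lines.append("Constraints: " + "; ".join(self.constraints))
        return "\n".join(lines)


ROLES = {
    "Sakura": AgentRole(
        name="Sakura",
        title="Orchestrator",
        responsibilities=(
            "manage communication handoffs",
            "track state dependencies",
            "monitor execution anomalies",
        ),
        # Illustrative constraint, not taken from the source.
        constraints=("do not write code directly",),
    ),
    "Zero": AgentRole(
        name="Zero",
        title="Lead Engineer",
        responsibilities=(
            "code generation",
            "structural decisions",
            "aggressive refactoring",
        ),
    ),
}
```

Keeping the role definition in data rather than in free-form prompt text makes it easier to compare agents side by side and to spot overlapping responsibilities.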
State Management Across Agent Boundaries
The 96-hour build revealed a core problem: behavioral alignment and multi-agent coordination break down when state transitions are implicit.
What breaks:
- Agent B starts work before Agent A finishes writing output
- Agent A produces valid syntax but semantically invalid results
- Shared file system access creates race conditions
- No clear owner when an integration test fails
What works:
- Explicit handoff protocols with state checkpoints
- Sakura monitors execution anomalies and decides which agent retries
- Agents maintain independent publication spaces to avoid write conflicts
- Continuous integration runs validate each agent’s output before the next agent starts
The architecture treats each agent as a microservice with defined input/output contracts. Sakura enforces those contracts.
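The contract idea can be sketched as follows. `HandoffContract`, `HandoffError`, and the key names are hypothetical; the source does not describe how Sakura represents or enforces contracts. The sketch separates a structural check (required keys exist) from a semantic check, which targets the "valid syntax but semantically invalid" failure listed above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


class HandoffError(Exception):
    """Raised when an agent's output violates its declared contract."""


def _always_valid(output: Dict[str, Any]) -> bool:
    return True


@dataclass(frozen=True)
class HandoffContract:
    """Hypothetical input/output contract between two agents."""
    producer: str
    consumer: str
    required_keys: Tuple[str, ...]
    # Semantic check beyond "the keys exist", e.g. "the tests passed".
    semantic_check: Callable[[Dict[str, Any]], bool] = _always_valid

    def validate(self, output: Dict[str, Any]) -> Dict[str, Any]:
        missing = [key for key in self.required_keys if key not in output]
        if missing:
            raise HandoffError(
                f"{self.producer} -> {self.consumer}: missing keys {missing}"
            )
        if not self.semantic_check(output):
            raise HandoffError(
                f"{self.producer} -> {self.consumer}: semantic check failed"
            )
        return output


# Example: the consumer only starts if the producer's patch passed its tests.
contract = HandoffContract(
    producer="Zero",
    consumer="Hana",
    required_keys=("patch", "tests_passed"),
    semantic_check=lambda out: out["tests_passed"] is True,
)
```

Raising an exception at the boundary gives the orchestrator a single place to decide who retries, instead of letting the downstream agent discover the problem halfway through its own work.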
Failure Recovery Without Full Replay
When Agent B depends on Agent A’s work but Agent A silently produces invalid output, you need observability primitives that do not require replaying the entire workflow.
The implemented approach:
- Each agent logs execution state to a shared monitoring layer
- Sakura tracks dependencies and detects anomalies
- Agents “cover for each other’s execution failures” by retrying specific steps
- The system maintains enough state to resume from the last valid checkpoint
None of this worked out of the box. The developer spent 48 hours after the initial migration fixing permission states and behavioral alignment. The breakthrough came from treating coordination as a first-class architectural concern, not an afterthought.
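Resuming from the last valid checkpoint can be sketched like this. `CheckpointLog`, `run_pipeline`, and the step names are hypothetical, and the log is kept in memory for brevity; a real system would persist it. The point is only that a rerun skips steps whose output was already validated.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step is a name plus a function that receives all validated outputs so far.
Step = Tuple[str, Callable[[Dict[str, Any]], Any]]


class CheckpointLog:
    """In-memory record of validated step outputs, keyed by step name."""

    def __init__(self) -> None:
        self.valid: Dict[str, Any] = {}

    def record(self, step: str, output: Any) -> None:
        self.valid[step] = output


def run_pipeline(
    steps: List[Step],
    log: CheckpointLog,
    is_valid: Callable[[str, Any], bool],
) -> List[str]:
    """Run steps in order, skipping those that already have a valid checkpoint.

    Returns the names of the steps that were actually executed in this call.
    """
    executed: List[str] = []
    for name, fn in steps:
        if name in log.valid:
            continue  # already validated; no replay needed
        output = fn(log.valid)
        executed.append(name)
        if not is_valid(name, output):
            break  # stop here; the next call resumes from this step
        log.record(name, output)
    return executed
```

Calling `run_pipeline` again after a failure re-executes only the failed step and everything downstream of it, which is the behavior the section describes.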
Coordination Bottleneck Trade-offs
| Pattern | Latency | Failure Isolation | Context Loss Risk |
|---|---|---|---|
| Central orchestrator (Sakura) | Higher (serial handoffs) | Good (single point of control) | Low (explicit state tracking) |
| Peer-to-peer handoffs | Lower (parallel execution) | Poor (cascading failures) | High (implicit dependencies) |
| Event-driven queue | Medium (async processing) | Excellent (dead letter queues) | Medium (requires replay logic) |
The developer chose central orchestration. Sakura adds serialization overhead, but routing every handoff through one agent prevents the chaos that emerges when three agents negotiate with each other directly.
The Local Execution Constraint
The entire system runs on a single machine using OpenClaw. This is not a cloud-native deployment. It is a local multi-agent workspace with:
- Shared file system access (requires careful write coordination)
- Local API rate limits (no distributed retry logic)
- Single point of failure (if the machine goes down, everything stops)
- No horizontal scaling (adding a fourth agent means rearchitecting handoffs)
The upside: no network latency between agents, no cloud costs, and full control over the execution environment. The downside: scaling requires moving to a distributed architecture with message queues and remote state management.
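One common way to handle write coordination on a shared local file system is to write to a temporary file and then rename it over the target. This is a standard technique used here for illustration; the source does not say how the agents coordinate writes. `atomic_write_json` is a hypothetical helper name.

```python
import json
import os
import tempfile
from typing import Any


def atomic_write_json(path: str, payload: Any) -> None:
    """Write JSON so that readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on
    # one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace overwrites the target; the rename is atomic on POSIX.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This protects readers from torn writes, but it does not arbitrate between two writers: the last rename wins. That is one reason giving each agent its own publication space, as the system does, is simpler than sharing files.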
What the 96-Hour Build Actually Solved
After the initial 48-hour migration battle, the second 48 hours addressed:
- Behavioral alignment: Tuning each agent’s prompt configuration to prevent role confusion
- Multi-agent coordination: Building explicit handoff protocols so agents do not step on each other
- Execution monitoring: Giving Sakura enough observability to detect and recover from failures
- State persistence: Ensuring agents can resume work without losing context
The result is a system that handles continuous code integrations, monitors macro tech trends, and manages independent publication spaces. It is not a demo. It is operational infrastructure running production workflows.
Code Execution and Repository Pipelines
The agents do not just process text. They run code. A simplified version of the handoff protocol looks like this:
```python
# Simplified handoff protocol
class AgentHandoff:
    def __init__(self, orchestrator):
        self.orchestrator = orchestrator
        self.state_checkpoint = {}

    def execute_with_recovery(self, agent, task, dependencies):
        # Validate dependencies before starting
        if not self.orchestrator.validate_state(dependencies):
            return self.orchestrator.retry_dependency_chain(dependencies)

        # Execute with state tracking
        result = agent.execute(task)
        self.state_checkpoint[agent.name] = result

        # Let orchestrator decide if output is valid
        if not self.orchestrator.validate_output(result):
            return self.orchestrator.assign_recovery(agent, task)
        return result
```
This is not the actual implementation, but it shows the pattern: every execution goes through the orchestrator, state is checkpointed, and validation happens before the next agent starts.
Observability Without Replay
Debugging a three-agent pipeline should not require replaying the entire workflow. The approach:
- Structured logging at each state transition
- Sakura maintains a dependency graph of which agent produced which output
- Execution anomalies trigger alerts before failures cascade
- Agents log not just results but decision rationale
When a test fails, the developer can inspect the state checkpoint, see which agent produced the invalid input, and retry just that step. This is the difference between a 5-minute debug cycle and a 2-hour replay cycle.
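The trace-back step can be sketched as a walk over a small provenance graph. `Artifact`, `find_root_cause`, and the artifact names are hypothetical; the source only says that Sakura tracks which agent produced which output, not how that graph is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Artifact:
    """Hypothetical record of one agent output and what it was built from."""
    name: str
    producer: str  # agent that wrote this artifact
    valid: bool
    inputs: List[str] = field(default_factory=list)  # upstream artifact names


def find_root_cause(graph: Dict[str, Artifact], failed: str) -> Optional[Artifact]:
    """Search upstream from `failed` for an invalid artifact whose own
    inputs are all valid. Returns None if nothing invalid is reachable."""
    seen: Set[str] = set()

    def visit(name: str) -> Optional[Artifact]:
        if name in seen:
            return None
        seen.add(name)
        node = graph[name]
        # Check upstream first, so the earliest bad output is reported.
        for upstream in node.inputs:
            culprit = visit(upstream)
            if culprit is not None:
                return culprit
        return None if node.valid else node

    return visit(failed)
```

With this, a failing test report points back to the agent whose output first went wrong, and only that agent's step needs to be retried.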
The Next Experiment That Might Break Everything
The developer hints at a future experiment that could destabilize the system. Likely candidates:
- Adding a fourth agent (requires rearchitecting handoff protocols)
- Moving to distributed execution (introduces network partitions)
- Allowing agents to spawn sub-agents (coordination complexity explodes)
- Implementing self-modification (agents rewrite their own prompts)
Any of these changes would stress the current orchestration model. The system is stable because it has fixed roles, explicit handoffs, and a single control plane. Remove any of those constraints and you are back to coordination chaos.
Technical Verdict
Use this pattern when:
- You need specialized agents with distinct responsibilities
- Workflows require multi-step handoffs with state persistence
- Failure recovery must happen without full replay
- You can afford serialization overhead for coordination safety
Avoid this pattern when:
- Latency matters more than correctness (parallel execution wins)
- Agents do not depend on each other’s output (no coordination needed)
- You need horizontal scaling (the central orchestrator becomes a bottleneck)
- Workflows are simple enough for single-agent execution
The 96-hour build proves that multi-agent coordination is an infrastructure problem, not a prompt engineering problem. You need explicit handoff protocols, state management, and observability primitives. The developer built those primitives. The result is a three-agent system that handles real workflows without falling apart.