Building a Three-Agent Engineering Team: What 96 Hours of Workflow Automation Reveals About Agent Coordination

Most multi-agent demos show parallel execution or simple sequential chains. A developer spent 96 hours building a three-agent engineering team that handles continuous integration, trend monitoring, and publication management on a single machine. The result exposes the coordination plumbing required when agents need to cover for each other’s failures and maintain state across long-running workflows.

The Role Specialization Pattern

The system runs three agents with distinct responsibilities:

Sakura (Orchestrator): Manages communication handoffs, tracks state dependencies, monitors execution anomalies
Zero (Lead Engineer): Handles code generation, structural decisions, aggressive refactoring
Hana (third agent): Role not described in detail; the workflow context suggests testing, deployment, or documentation

This is not generic prompt chaining. Each agent has a defined backstory, operational constraints, and behavioral alignment tuned for its role. Sakura acts as the control plane. Zero executes technical decisions. The third agent fills gaps.

The key insight: you cannot rely on one-size-fits-all prompts when agents need to hand off context without losing execution state.
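Here is a minimal sketch of per-role configuration. It assumes each agent's prompt is assembled from a backstory plus a list of operational constraints. The `AgentRole` fields, the constraint wording, and Hana's duties are hypothetical, because the source does not publish its prompt configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    """Hypothetical per-agent configuration; field names are illustrative."""
    name: str
    backstory: str
    constraints: tuple[str, ...]


def build_system_prompt(role: AgentRole) -> str:
    # Each agent gets a prompt built from its own role, not a shared template.
    rules = "\n".join(f"- {c}" for c in role.constraints)
    return f"You are {role.name}. {role.backstory}\nOperational constraints:\n{rules}"


ROLES = [
    AgentRole("Sakura", "You coordinate handoffs and track state dependencies.",
              ("Never write code yourself.", "Decide which agent retries a failed step.")),
    AgentRole("Zero", "You are the lead engineer responsible for code and structure.",
              ("Only start when your inputs are checkpointed.", "Report every file you change.")),
    # Hana's duties are not documented in the source; this entry is an assumption.
    AgentRole("Hana", "You verify and ship what Zero produces.",
              ("Never modify code you are testing.",)),
]

PROMPTS = {role.name: build_system_prompt(role) for role in ROLES}
```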

State Management Across Agent Boundaries

The 96-hour build revealed a core problem: behavioral alignment and multi-agent coordination break down when state transitions are implicit.

What breaks:

Agent B starts work before Agent A finishes writing output
Agent A produces valid syntax but semantically invalid results
Shared file system access creates race conditions
No clear owner when an integration test fails

What works:

Explicit handoff protocols with state checkpoints
Sakura monitors execution anomalies and decides which agent retries
Agents maintain independent publication spaces to avoid write conflicts
Continuous integration runs validate each agent’s output before the next agent starts
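An atomic publish step can narrow the first and third failure modes: a downstream agent reading half-written output, and write races on the shared file system. This is a sketch, not the developer's code. `publish_output` and `read_handoff` are hypothetical names, and the sketch assumes each agent writes one JSON file into a shared directory. The writer fills a temporary file and then renames it over the target with `os.replace`. A reader therefore sees either nothing or the complete output.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def publish_output(workdir: Path, agent: str, payload: dict) -> Path:
    """Publish an agent's output so readers never observe a half-written file."""
    workdir.mkdir(parents=True, exist_ok=True)
    target = workdir / f"{agent}.json"
    # Write to a temp file in the same directory first...
    fd, tmp_path = tempfile.mkstemp(dir=str(workdir), suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(payload, fh)
        fh.flush()
        os.fsync(fh.fileno())
    # ...then swap it into place in one step (atomic on POSIX within one filesystem).
    os.replace(tmp_path, target)
    return target


def read_handoff(workdir: Path, upstream: str) -> Optional[dict]:
    """Return upstream output only once it has been fully published, else None."""
    target = workdir / f"{upstream}.json"
    if not target.exists():
        return None  # upstream is still working; the downstream agent waits
    with target.open() as fh:
        return json.load(fh)
```

The rename is only atomic when the temporary file and the target sit on the same filesystem. That is why `mkstemp` is pointed at the destination directory.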

The architecture treats each agent as a microservice with defined input/output contracts. Sakura enforces those contracts.
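One way to express such a contract, assuming outputs are JSON-like dicts, is for the orchestrator to check required keys and types before releasing output downstream. `Contract` and the `ZERO_OUTPUT` fields are illustrative assumptions. A schema check only catches structural problems. The "valid syntax, semantically invalid" case still needs the continuous integration runs listed above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Contract:
    """Hypothetical output contract: required keys and their expected types."""
    required: dict[str, type]

    def violations(self, output: dict[str, Any]) -> list[str]:
        # Collect every problem instead of stopping at the first one,
        # so the orchestrator can report them all in one retry request.
        problems = []
        for key, expected in self.required.items():
            if key not in output:
                problems.append(f"missing key: {key}")
            elif not isinstance(output[key], expected):
                problems.append(
                    f"{key}: expected {expected.__name__}, got {type(output[key]).__name__}"
                )
        return problems


# Illustrative contract for the lead engineer's output (field names are assumptions).
ZERO_OUTPUT = Contract(required={"diff": str, "files_changed": list, "tests_passed": bool})
```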

Failure Recovery Without Full Replay

When Agent B depends on Agent A’s work but Agent A silently produces invalid output, you need observability primitives that do not require replaying the entire workflow.

The implemented approach:

Each agent logs execution state to a shared monitoring layer
Sakura tracks dependencies and detects anomalies
Agents “cover for each other’s execution failures” by retrying specific steps
The system maintains enough state to resume from the last valid checkpoint
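Resuming from the last valid checkpoint can be sketched as a step runner. It records each completed step's output and skips recorded steps on the next run. The sketch assumes steps run in a fixed order and that each output is JSON-serializable. `run_pipeline` and the checkpoint file layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable

# A step is a name plus a function from the previous state to the new state.
Step = tuple[str, Callable[[dict], dict]]


def run_pipeline(steps: list[Step], checkpoint_file: Path) -> dict:
    """Run steps in order, skipping any step already recorded as complete."""
    done: dict[str, dict] = {}
    if checkpoint_file.exists():
        done = json.loads(checkpoint_file.read_text())

    state: dict = {}
    for name, fn in steps:
        if name in done:
            state = done[name]  # resume: reuse the recorded output, do not re-run
            continue
        state = fn(state)  # may raise; earlier checkpoints survive the failure
        done[name] = state
        checkpoint_file.write_text(json.dumps(done))
    return state
```

If a step raises, everything before it stays recorded. The next run therefore starts at the failed step rather than at the beginning.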

This is not automatic. The developer spent 48 hours after the initial migration fixing permission states and behavioral alignment. The breakthrough came from treating coordination as a first-class architectural concern, not an afterthought.

Coordination Bottleneck Trade-offs

Pattern	Latency	Failure Isolation	Context Loss Risk
Central orchestrator (Sakura)	Higher (serial handoffs)	Good (single point of control)	Low (explicit state tracking)
Peer-to-peer handoffs	Lower (parallel execution)	Poor (cascading failures)	High (implicit dependencies)
Event-driven queue	Medium (async processing)	Excellent (dead letter queues)	Medium (requires replay logic)

The developer chose central orchestration. Sakura adds serialization overhead but prevents the chaos that emerges when three agents negotiate handoffs directly with one another.
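For contrast, the failure isolation credited to the event-driven row comes from dead letter queues. The toy in-process version below is not part of the developer's system; it only shows the mechanism. `QueueWorker` and `max_attempts` are hypothetical names. A failing task is retried a bounded number of times and then parked, so other tasks keep flowing. Something must later inspect and replay the parked task, which is the replay logic the table mentions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    task: str
    attempts: int = 0


@dataclass
class QueueWorker:
    """Toy in-process queue: failed messages retry, then land in a dead letter queue."""
    handler: Callable[[str], None]
    max_attempts: int = 3
    queue: deque = field(default_factory=deque)
    dead_letters: list = field(default_factory=list)

    def submit(self, task: str) -> None:
        self.queue.append(Message(task))

    def drain(self) -> None:
        while self.queue:
            msg = self.queue.popleft()
            try:
                self.handler(msg.task)
            except Exception:
                msg.attempts += 1
                if msg.attempts >= self.max_attempts:
                    self.dead_letters.append(msg)  # parked; other tasks keep flowing
                else:
                    self.queue.append(msg)  # re-enqueue behind the remaining work
```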

The Local Execution Constraint

The entire system runs on a single machine using OpenClaw. This is not a cloud-native deployment. It is a local multi-agent workspace with:

Shared file system access (requires careful write coordination)
Local API rate limits (no distributed retry logic)
Single point of failure (if the machine goes down, everything stops)
No horizontal scaling (adding a fourth agent means rearchitecting handoffs)

The upside: no network latency, no cloud costs, and full control over the execution environment. The downside: scaling requires moving to a distributed architecture with message queues and remote state management.
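Without distributed retry logic, all three agents draw on the same local API quota. Here is a minimal sketch of a shared limiter. It assumes the agents run as threads inside one Python process; the source does not say how OpenClaw schedules them. Agents in separate processes would need a file- or socket-based limiter instead. `TokenBucket` and its parameters are hypothetical.

```python
import threading
import time


class TokenBucket:
    """Process-local rate limiter shared by all agents on one machine."""

    def __init__(self, rate_per_sec: float, capacity: int) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self.lock = threading.Lock()  # agents call this from different threads

    def try_acquire(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False
```

An agent that gets `False` backs off and retries later instead of spending a request the API would reject.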

What the 96-Hour Build Actually Solved

After the initial 48-hour migration battle, the second 48 hours addressed:

Behavioral alignment: Tuning each agent’s prompt configuration to prevent role confusion
Multi-agent coordination: Building explicit handoff protocols so agents do not step on each other
Execution monitoring: Giving Sakura enough observability to detect and recover from failures
State persistence: Ensuring agents can resume work without losing context
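Prompt tuning lowers the odds of role confusion but does not rule it out. The source does not describe runtime enforcement, but a cheap complement is to have the orchestrator reject any action outside an agent's role before it runs. The action names below are hypothetical, and Hana's entry is an assumption.

```python
ROLE_ACTIONS = {
    # Hypothetical action vocabularies; the source does not publish these.
    "Sakura": frozenset({"route", "validate", "retry"}),
    "Zero": frozenset({"write_code", "refactor"}),
    "Hana": frozenset({"run_tests"}),  # Hana's role is an assumption
}


class RoleViolation(Exception):
    """Raised when an agent proposes an action outside its role."""


def authorize(agent: str, action: str) -> None:
    """Gate every proposed action through the role table before it executes."""
    allowed = ROLE_ACTIONS.get(agent, frozenset())
    if action not in allowed:
        raise RoleViolation(f"{agent} attempted {action!r}; allowed: {sorted(allowed)}")
```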

The result is a system that handles continuous code integrations, monitors macro tech trends, and manages independent publication spaces. It is not a demo. It is operational infrastructure running production workflows.

Code Execution and Repository Pipelines

The agents do not just process text. They run code, and each run passes through a handoff protocol along these lines:

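```python
# Simplified handoff protocol (illustrative, not the production implementation)
from typing import Any, Protocol


class Agent(Protocol):
    name: str
    def execute(self, task: Any) -> Any: ...


class Orchestrator(Protocol):
    def validate_state(self, dependencies: list[str]) -> bool: ...
    def retry_dependency_chain(self, dependencies: list[str]) -> Any: ...
    def validate_output(self, result: Any) -> bool: ...
    def assign_recovery(self, agent: Agent, task: Any) -> Any: ...


class AgentHandoff:
    def __init__(self, orchestrator: Orchestrator) -> None:
        self.orchestrator = orchestrator
        self.state_checkpoint: dict[str, Any] = {}

    def execute_with_recovery(self, agent: Agent, task: Any, dependencies: list[str]) -> Any:
        # Refuse to start until every upstream dependency is in a valid state
        if not self.orchestrator.validate_state(dependencies):
            return self.orchestrator.retry_dependency_chain(dependencies)

        result = agent.execute(task)

        # Let the orchestrator judge the output before anyone can consume it
        if not self.orchestrator.validate_output(result):
            return self.orchestrator.assign_recovery(agent, task)

        # Checkpoint only validated output, so a resume never starts from bad state
        self.state_checkpoint[agent.name] = result
        return result
```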

This is not the actual implementation, but it shows the pattern: every execution goes through the orchestrator, state is checkpointed, and validation happens before the next agent starts.

Observability Without Replay

Debugging a three-agent pipeline should not require replaying the entire workflow. The solution:

Structured logging at each state transition
Sakura maintains a dependency graph of which agent produced which output
Execution anomalies trigger alerts before failures cascade
Agents log not just results but decision rationale
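Here is a sketch of one structured log record per state transition, assuming JSON Lines output. The field names are hypothetical. The `inputs` field is what would let the orchestrator build the dependency graph of which agent produced which output. The `rationale` field records the decision reasoning alongside the result.

```python
import json
import sys
import time
from typing import Any, List, Optional, TextIO


def log_transition(agent: str, step: str, status: str, rationale: str,
                   inputs: Optional[List[str]] = None,
                   sink: TextIO = sys.stdout, **extra: Any) -> dict:
    """Emit one JSON line per state transition, including why the agent acted."""
    record = {
        "ts": time.time(),
        "agent": agent,
        "step": step,
        "status": status,        # e.g. "started", "succeeded", "failed"
        "inputs": inputs or [],  # ids of upstream outputs this step consumed
        "rationale": rationale,  # the agent's stated reason, not just its result
        **extra,
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```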

When a test fails, the developer can inspect the state checkpoint, see which agent produced the invalid input, and retry just that step. This is the difference between a 5-minute debug cycle and a 2-hour replay cycle.
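The lookup for "which agent produced the invalid input" can be sketched as a walk up the dependency graph. The sketch makes three assumptions. The graph is acyclic. It maps each step to the steps whose outputs it consumed. Some validity check, such as a contract check or a CI result, is available per step. `find_root_cause` and the step ids are hypothetical.

```python
from typing import Callable


def find_root_cause(failed: str, produced_by: dict[str, list[str]],
                    is_valid: Callable[[str], bool]) -> str:
    """Walk upstream from a failed step to the earliest invalid output.

    produced_by maps each step to the upstream steps whose outputs it consumed.
    """
    for upstream in produced_by.get(failed, []):
        if not is_valid(upstream):
            # The bad input came from further up: keep walking from there.
            return find_root_cause(upstream, produced_by, is_valid)
    return failed  # all inputs valid: the fault is in this step itself
```

The returned step is the one to retry. Steps upstream of it produced valid output and can be reused from their checkpoints.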

The Next Experiment That Might Break Everything

The developer hints at a future experiment that could destabilize the system. Likely candidates:

Adding a fourth agent (requires rearchitecting handoff protocols)
Moving to distributed execution (introduces network partitions)
Allowing agents to spawn sub-agents (coordination complexity explodes)
Implementing self-modification (agents rewrite their own prompts)

Any of these changes would stress the current orchestration model. The system is stable because it has fixed roles, explicit handoffs, and a single control plane. Remove any of those constraints and you are back to coordination chaos.
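Counting handoff channels gives a rough sense of why more agents strain coordination. This simplification ignores message ordering and shared state. With direct peer-to-peer handoffs, n agents can form n(n-1)/2 pairs, while a hub-and-spoke design needs only n-1 links to one orchestrator. The function names are illustrative.

```python
def peer_to_peer_channels(n_agents: int) -> int:
    """Handoff channels if any agent may hand off directly to any other."""
    return n_agents * (n_agents - 1) // 2


def hub_channels(n_agents: int) -> int:
    """Channels if every worker talks only to one orchestrator (orchestrator counted in n)."""
    return n_agents - 1
```

For three agents the gap is small (3 channels versus 2). At four agents it is 6 versus 3, and at eight it is 28 versus 7. This is one reason spawning sub-agents would strain anything other than a single control plane.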

Technical Verdict

Use this pattern when:

You need specialized agents with distinct responsibilities
Workflows require multi-step handoffs with state persistence
Failure recovery must happen without full replay
You can afford serialization overhead for coordination safety

Avoid this pattern when:

Latency matters more than correctness (parallel execution wins)
Agents do not depend on each other’s output (no coordination needed)
You need horizontal scaling (central orchestrator becomes bottleneck)
Workflows are simple enough for single-agent execution

The 96-hour build proves that multi-agent coordination is an infrastructure problem, not a prompt engineering problem. You need explicit handoff protocols, state management, and observability primitives. The developer built those primitives. The result is a three-agent system that handles real workflows without falling apart.
