Most agent projects start with two Markdown files: HANDOFF.md for current state and LOG.md for append-only history. This works until it doesn’t. The Dead Light Framework offers a three-question test to identify when file-based coordination becomes the bottleneck and what tier of infrastructure you actually need.
The File-Based State Pattern
The minimal setup looks like this:
- HANDOFF.md: Current snapshot. Agent reads this first on session start. Contains active tasks, decisions, blockers.
- LOG.md: Append-only history. Every meaningful action gets logged. HANDOFF is derived from LOG.
This pattern works because agents reset to zero memory each session. Files provide durable state without databases, message queues, or orchestration layers.
The failure mode is predictable: file I/O becomes a coordination bottleneck when multiple agents or sessions need concurrent access.
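A minimal sketch of the two-file pattern follows. The line format (one timestamped bullet per LOG entry) and the helper names `append_log` and `regenerate_handoff` are assumptions made for illustration, and a real HANDOFF would summarize tasks, decisions, and blockers rather than just echo recent entries.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_log(log_path: Path, message: str) -> None:
    """Append one timestamped bullet to LOG.md; earlier lines are never rewritten."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"- {stamp} {message}\n")


def regenerate_handoff(log_path: Path, handoff_path: Path, keep: int = 5) -> None:
    """Derive HANDOFF.md from the most recent LOG.md entries (simplified)."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    recent = [line for line in lines if line.startswith("- ")][-keep:]
    body = "# Current state\n\n" + "\n".join(recent) + "\n"
    handoff_path.write_text(body, encoding="utf-8")
```

Because HANDOFF is always rebuilt from LOG, a damaged HANDOFF can be thrown away and regenerated as long as LOG is intact.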
The Three-Question Test
Run this in three minutes to find your infrastructure tier:
1. How many concurrent sessions touch the same state?
- One session, one repo: Plain README or single HANDOFF file works.
- Multiple sessions, sequential: Two files (HANDOFF + LOG) handle it.
- Multiple sessions, concurrent: File locks break down. You need structured state.
2. How often does state change?
- Hourly or daily: Files are fine. Append to LOG, regenerate HANDOFF.
- Every few minutes: File writes start causing conflicts.
- Sub-minute: Files are already the bottleneck. Move to in-memory or database.
3. What happens if state gets corrupted?
- Rebuild from scratch in under 10 minutes: Files are acceptable.
- Rebuild takes hours or requires manual intervention: You need transactional storage.
- Can’t rebuild at all: You ran this test too late.
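The three questions can be turned into a small checklist function. This is a sketch, not part of the framework: the name `recommend_tier`, the `Tier` enum, and the numeric cutoffs are assumptions. In particular, "every few minutes" is read here as a 300-second interval, and anything slower to rebuild than 10 minutes is treated as needing transactional storage.

```python
from enum import Enum


class Tier(Enum):
    FILES = "HANDOFF + LOG files"
    STRUCTURED = "structured state (database or state store)"


def recommend_tier(
    concurrent_sessions: int,
    seconds_between_updates: float,
    rebuild_minutes: float,
    min_update_interval_s: float = 300.0,  # assumed reading of "every few minutes"
    max_rebuild_minutes: float = 10.0,     # from question 3
) -> tuple[Tier, list[str]]:
    """Apply the three questions; any single failing answer forces the move."""
    reasons = []
    if concurrent_sessions > 1:
        reasons.append("multiple sessions touch the same state concurrently")
    if seconds_between_updates < min_update_interval_s:
        reasons.append("state changes too often for file writes")
    if rebuild_minutes > max_rebuild_minutes:
        reasons.append("rebuilding corrupted state takes too long")
    tier = Tier.STRUCTURED if reasons else Tier.FILES
    return tier, reasons
```

Returning the reasons alongside the tier makes it clear which of the three answers triggered the recommendation.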
Infrastructure Tiers
| Tier | State Storage | Coordination | When to Use |
|---|---|---|---|
| Plain README | Single markdown file | None | Solo prototypes, documentation-only projects |
| HANDOFF + LOG | Two markdown files | File-based, sequential | Single-session agents, low-frequency updates |
| Multi-unit paperwork | Multiple markdown files per domain | Directory structure, file naming conventions | Multiple agents, different domains, still sequential |
| Running service | Database, message queue, or state store | API, locks, transactions | Concurrent agents, high-frequency updates, production workloads |
Failure Modes of File-Based State
File coordination breaks in predictable ways:
Race conditions: Two agents read HANDOFF, both append to LOG, and both write conflicting HANDOFF updates. The last write wins and the first agent’s work disappears.
Lock contention: File locks serialize access. If Agent A holds the lock for 30 seconds while calling an LLM, Agent B waits. With three agents, you get a queue.
Merge conflicts: Git-based coordination (committing HANDOFF and LOG) creates merge conflicts when branches diverge. Resolving conflicts manually defeats the automation.
State size: Appending to LOG forever means the file grows without bound. Reading a 50 MB markdown file on every session start adds latency.
No rollback: If an agent writes garbage to HANDOFF, you need manual intervention or a separate backup strategy. Files don’t give you transactions.
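The race condition above is easy to reproduce deterministically. The sketch below interleaves two read-modify-write cycles by hand instead of using real threads; storing the state as JSON and the helper names are assumptions made to keep the example short.

```python
import json
import tempfile
from pathlib import Path


def read_state(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def write_state(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state), encoding="utf-8")


def simulate_lost_update() -> dict:
    """Interleave two read-modify-write cycles the way two agents might."""
    path = Path(tempfile.mkdtemp()) / "HANDOFF.json"
    write_state(path, {"done": []})

    # Both agents read the same snapshot before either one writes.
    snapshot_a = read_state(path)
    snapshot_b = read_state(path)

    snapshot_a["done"].append("task-from-agent-a")
    write_state(path, snapshot_a)

    # Agent B still holds the old snapshot, so its write erases A's entry.
    snapshot_b["done"].append("task-from-agent-b")
    write_state(path, snapshot_b)

    return read_state(path)
```

Calling `simulate_lost_update()` returns `{'done': ['task-from-agent-b']}`: agent A's entry is gone and nothing raised an error.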
Migration Path: Files to Structured State
When the test says you need more than files, here’s the transition:
Step 1: Add a State Abstraction Layer
Wrap file access behind functions:
```python
class StateManager:
    def read_handoff(self) -> dict:
        # Currently reads HANDOFF.md
        # Later: query database
        pass

    def append_log(self, entry: dict) -> None:
        # Currently appends to LOG.md
        # Later: insert into time-series table
        pass

    def update_handoff(self, state: dict) -> None:
        # Currently overwrites HANDOFF.md
        # Later: transactional update
        pass
```
This lets you swap storage without rewriting agent logic.
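One way to fill in the file-backed side of that interface is sketched below. It assumes state is serialized as JSON inside the two files, which the pattern itself does not specify; the class name `FileStateManager` and the write-to-temp-then-rename step are illustrative choices.

```python
import json
import os
import tempfile
from pathlib import Path


class FileStateManager:
    """File-backed state exposing the three methods agents call."""

    def __init__(self, root: Path):
        self.handoff_path = root / "HANDOFF.md"
        self.log_path = root / "LOG.md"

    def read_handoff(self) -> dict:
        if not self.handoff_path.exists():
            return {}
        return json.loads(self.handoff_path.read_text(encoding="utf-8"))

    def append_log(self, entry: dict) -> None:
        # One JSON object per line keeps the log append-only and greppable.
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def update_handoff(self, state: dict) -> None:
        # Write to a temp file in the same directory, then rename it into
        # place, so a reader never sees a half-written HANDOFF.
        fd, tmp = tempfile.mkstemp(dir=self.handoff_path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp, self.handoff_path)
```

The rename protects against torn writes only. It does nothing about two agents overwriting each other, which is the problem the later steps address.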
Step 2: Choose Your State Store
SQLite: Single-file database with transactions and concurrent readers; writes are serialized, so there is one writer at a time. Good for local agents or single-machine deployments.
PostgreSQL: Full ACID transactions, row-level locking, replication. Use when multiple machines run agents.
Redis: In-memory state with persistence. Fast reads, pub/sub for coordination. Good for high-frequency updates.
Message queue (RabbitMQ, SQS): Append-only log becomes a queue. Agents consume messages, update state, publish results. Decouples producers and consumers.
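For the SQLite option, the same three methods can sit on top of the standard-library `sqlite3` module. This is a sketch under assumptions: the table layout (a single-row `handoff` table and a `log` table holding JSON text) and the class name `SqliteStateManager` are invented for the example.

```python
import json
import sqlite3


class SqliteStateManager:
    """The three state methods, backed by SQLite instead of Markdown files."""

    def __init__(self, db_path: str = "agent_state.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS handoff ("
            "id INTEGER PRIMARY KEY CHECK (id = 1), state TEXT NOT NULL)"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS log ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "created_at TEXT DEFAULT CURRENT_TIMESTAMP, entry TEXT NOT NULL)"
        )
        self.conn.commit()

    def read_handoff(self) -> dict:
        row = self.conn.execute("SELECT state FROM handoff WHERE id = 1").fetchone()
        return json.loads(row[0]) if row else {}

    def append_log(self, entry: dict) -> None:
        with self.conn:  # commits on success, rolls back on exception
            self.conn.execute(
                "INSERT INTO log (entry) VALUES (?)", (json.dumps(entry),)
            )

    def update_handoff(self, state: dict) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO handoff (id, state) VALUES (1, ?)",
                (json.dumps(state),),
            )
```

Agent code that only calls `read_handoff`, `append_log`, and `update_handoff` does not need to change when this class replaces a file-backed one.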
Step 3: Handle Concurrency
File locks become database transactions:
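```python
# File-based: read-modify-write with no lock (breaks under concurrency)
handoff = read_handoff()
handoff['status'] = 'in_progress'
write_handoff(handoff)

# Database: row-locked transaction.
# `db` is a placeholder client, not a specific library. FOR UPDATE is
# PostgreSQL/MySQL syntax and is not available in SQLite.
with db.transaction():
    handoff = db.query("SELECT * FROM handoff WHERE id = ? FOR UPDATE", handoff_id)
    handoff['status'] = 'in_progress'
    db.execute("UPDATE handoff SET status = ? WHERE id = ?", 'in_progress', handoff_id)
```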
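The FOR UPDATE row lock makes any other transaction that tries to lock or update the same row wait until this one commits, so two agents cannot both act on the same stale read. In PostgreSQL, a plain SELECT without FOR UPDATE is not blocked; it still sees the last committed value.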
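SQLite does not support `SELECT ... FOR UPDATE`. A rough equivalent, sketched here with the standard-library `sqlite3` module, is to take the write lock up front with `BEGIN IMMEDIATE` so the read and the update happen under the same lock. The `handoff` table layout and the helpers `open_db` and `claim_task` are assumptions for illustration.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions so we control BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None, timeout=5.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS handoff ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, owner TEXT)"
    )
    return conn


def claim_task(conn: sqlite3.Connection, task_id: int, agent: str) -> bool:
    """Atomically move a task from 'pending' to 'in_progress' for one agent."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT status FROM handoff WHERE id = ?", (task_id,)
        ).fetchone()
        if row is None or row[0] != "pending":
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE handoff SET status = 'in_progress', owner = ? WHERE id = ?",
            (agent, task_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

If a second agent calls `claim_task` for the same row after the first has committed, it sees `in_progress` and gets `False` back instead of silently overwriting the claim.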
Step 4: Add Observability
Files give you grep and tail. Databases need structured logging:
- State transitions: Log every HANDOFF update with timestamp, agent ID, previous state, new state.
- Query patterns: Track which agents read which state, how often, and latency.
- Conflict detection: Count transaction retries, lock wait times, deadlocks.
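The state-transition item in the list above could be covered by emitting one JSON line per HANDOFF update. This is a sketch using only the standard library; the logger name `agent.state`, the field names, and the `log_transition` helper are assumptions.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("agent.state")


def log_transition(agent_id: str, previous: dict, new: dict) -> str:
    """Emit one JSON line describing a HANDOFF update and return it."""
    changed = sorted(
        key for key in previous.keys() | new.keys()
        if previous.get(key) != new.get(key)
    )
    record = json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "previous": previous,
            "new": new,
            "changed_keys": changed,
        },
        sort_keys=True,
    )
    logger.info(record)
    return record
```

One JSON object per line keeps the `grep` and `tail` workflow usable after the move off files, while still being machine-parseable.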
When to Stay on Files
Files remain the right choice when:
- You run one agent session at a time
- State updates happen infrequently (hourly or slower)
- The entire state fits in a few hundred KB
- You can rebuild from scratch in minutes
- You want zero infrastructure dependencies
The Dead Light Framework’s core insight is that stateless agents need durable state, but durable doesn’t mean complex. Files work until concurrency or frequency forces your hand.
Technical Verdict
Use HANDOFF + LOG files when: You have a single-session agent, state updates are infrequent, and you can tolerate sequential access. This covers most prototypes and solo side projects.
Migrate to structured state when: You need concurrent agent sessions, state updates happen every few minutes, or you can’t rebuild from scratch quickly. The three-question test identifies this transition point before file coordination becomes a production incident.
Avoid premature migration: Starting with a database adds complexity you don’t need until the test says otherwise. Files are simpler to debug, version control, and reason about. The framework’s value is knowing when simplicity stops being sufficient.