mech.app

Darc's Lexical Memory Search: Why Coding Agents Need grep Over Embeddings for Session History

Darc indexes Codex and Claude Code session rollouts into SQLite, exposing grep-style search over past decisions without embeddings or injection hooks.

Source: github.com
Darc treats agent session history like code. It indexes Codex and Claude Code rollouts into SQLite, exposes grep-style search commands, and shares session indexes across teams via Git. No embeddings, no automatic injection, no background consolidation agents. Just lexical queries over past tool calls, file edits, and turn-level decisions.

The author built Darc after watching reviewer agents cite memory from the same session that built the feature under review. The native memory systems (MEMORY.md in Codex and Claude Code) inject summarized context automatically, which introduced bias during iterative code review. Darc sidesteps injection by letting agents explicitly search session history when they need it, mirroring how they already prefer rg over semantic search for codebases.

Why Lexical Search Over Embeddings

Modern coding agents default to UNIX tools (ripgrep, sed, awk) instead of vector embeddings when searching code. Lexical search is deterministic, fast, and does not depend on an embedding model whose version has to match between indexing and query time. Darc applies the same principle to session history:

  • Deterministic retrieval: Regex and keyword queries return the same results every time.
  • No model drift: Embedding models change, breaking retrieval pipelines. SQLite schemas are stable.
  • Agent-native: Agents already know how to chain grep-style queries. No new prompting patterns.

The trade-off is recall. Lexical search misses semantic matches (a query for "token expiry" will not find a turn that only says "stale credentials"), but for session history that contains explicit tool calls, file paths, and error messages, keyword overlap is often sufficient.
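To make the trade-off concrete, here is a minimal sketch of lexical retrieval over turn text. The `TURNS` records and the `lexical_search` helper are invented for illustration and are not part of Darc; they only show that a regex query is repeatable and that it misses wording it does not literally contain.

```python
import re

# Hypothetical stand-in for indexed turn text; Darc itself stores rows in SQLite.
TURNS = [
    ("s1", 0, "run_command: pytest tests/test_auth.py -> 2 FAILED"),
    ("s1", 1, "edit_file: src/auth.py (fix token expiry check)"),
    ("s2", 0, "The login flow rejects stale credentials"),
]

def lexical_search(pattern, turns=TURNS):
    """Return (session_id, turn_index) for every turn whose text matches the regex."""
    rx = re.compile(pattern)
    return [(sid, idx) for sid, idx, text in turns if rx.search(text)]

# File paths and error strings match literally, and the result never varies.
print(lexical_search(r"auth\.py"))     # [('s1', 0), ('s1', 1)]

# The s2 turn is about the same problem but shares no keywords, so it is missed.
print(lexical_search(r"token expir"))  # [('s1', 1)]
```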

Architecture: Indexing Session Rollouts

Darc watches ~/.codex and ~/.claude directories where Codex and Claude Code store session rollouts. Each session is a JSON structure with turns, tool calls, and file diffs. Darc parses these into a normalized SQLite schema:

CREATE TABLE sessions (
  id TEXT PRIMARY KEY,
  agent_type TEXT,  -- 'codex' or 'claude'
  start_time INTEGER,
  end_time INTEGER,
  project_path TEXT
);

CREATE TABLE turns (
  id TEXT PRIMARY KEY,
  session_id TEXT,
  turn_index INTEGER,
  user_message TEXT,
  assistant_message TEXT,
  FOREIGN KEY(session_id) REFERENCES sessions(id)
);

CREATE TABLE tool_calls (
  id TEXT PRIMARY KEY,
  turn_id TEXT,
  tool_name TEXT,
  arguments TEXT,  -- JSON blob
  result TEXT,
  FOREIGN KEY(turn_id) REFERENCES turns(id)
);

CREATE TABLE file_edits (
  id TEXT PRIMARY KEY,
  turn_id TEXT,
  file_path TEXT,
  diff TEXT,
  FOREIGN KEY(turn_id) REFERENCES turns(id)
);
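As a rough illustration of how this schema supports tool-call-level queries, the sketch below loads a trimmed copy of it into an in-memory database with Python's standard `sqlite3` module. The sample rows and the `failed_commands` helper are made up for the example; Darc's real query layer is not shown in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (id TEXT PRIMARY KEY, agent_type TEXT, start_time INTEGER,
                       end_time INTEGER, project_path TEXT);
CREATE TABLE turns (id TEXT PRIMARY KEY, session_id TEXT, turn_index INTEGER,
                    user_message TEXT, assistant_message TEXT,
                    FOREIGN KEY(session_id) REFERENCES sessions(id));
CREATE TABLE tool_calls (id TEXT PRIMARY KEY, turn_id TEXT, tool_name TEXT,
                         arguments TEXT, result TEXT,
                         FOREIGN KEY(turn_id) REFERENCES turns(id));
""")

# Invented sample data: one session, one turn, two tool calls.
conn.execute("INSERT INTO sessions VALUES (?, ?, ?, ?, ?)",
             ("s1", "codex", 1700000000, 1700000600, "/repo/app"))
conn.execute("INSERT INTO turns VALUES (?, ?, ?, ?, ?)",
             ("t1", "s1", 0, "run the tests", "Two tests failed."))
conn.executemany("INSERT INTO tool_calls VALUES (?, ?, ?, ?, ?)", [
    ("c1", "t1", "run_command", '{"cmd": "pytest"}', "2 FAILED, 10 passed"),
    ("c2", "t1", "edit_file", '{"path": "auth.py"}', "ok"),
])

def failed_commands(conn, needle="FAILED"):
    """Find run_command calls whose result contains `needle` (case-sensitive)."""
    rows = conn.execute("""
        SELECT s.id, t.turn_index, c.arguments
        FROM tool_calls c
        JOIN turns t ON t.id = c.turn_id
        JOIN sessions s ON s.id = t.session_id
        WHERE c.tool_name = 'run_command' AND instr(c.result, ?) > 0
        ORDER BY s.start_time, t.turn_index
    """, (needle,))
    return rows.fetchall()

print(failed_commands(conn))  # [('s1', 0, '{"cmd": "pytest"}')]
```

Using `instr` rather than `LIKE` keeps the match case-sensitive, which suits markers such as `FAILED`.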

Darc runs a file watcher that triggers incremental indexing when new session files appear. It does not modify the original rollout files, so native memory systems continue to work.
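One simple way to approximate incremental indexing is to remember each file's modification time and size, and re-index only files whose fingerprint changed. The sketch below polls a directory instead of using a real file watcher, and the `indexed_files` table, the `*.json` glob, and both helper names are assumptions for illustration, not Darc's implementation. It only calls `stat()` on rollout files, so the originals stay untouched.

```python
import sqlite3
import tempfile
from pathlib import Path

def pending_files(conn, root, pattern="*.json"):
    """Return rollout files that are new or changed since they were last indexed."""
    conn.execute("CREATE TABLE IF NOT EXISTS indexed_files "
                 "(path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER)")
    seen = {path: (mtime, size) for path, mtime, size in
            conn.execute("SELECT path, mtime_ns, size FROM indexed_files")}
    pending = []
    for f in sorted(Path(root).rglob(pattern)):
        st = f.stat()
        if seen.get(str(f)) != (st.st_mtime_ns, st.st_size):
            pending.append(f)
    return pending

def mark_indexed(conn, f):
    """Record the file's current fingerprint after it has been parsed."""
    st = Path(f).stat()
    conn.execute("INSERT OR REPLACE INTO indexed_files VALUES (?, ?, ?)",
                 (str(f), st.st_mtime_ns, st.st_size))
    conn.commit()

# Demo against a throwaway directory rather than a real ~/.codex or ~/.claude.
with tempfile.TemporaryDirectory() as tmp:
    state = sqlite3.connect(":memory:")
    rollout = Path(tmp) / "session-1.json"
    rollout.write_text('{"turns": []}')
    first = pending_files(state, tmp)    # new file is pending
    mark_indexed(state, rollout)
    second = pending_files(state, tmp)   # nothing changed
    rollout.write_text('{"turns": [{"user": "hi"}]}')
    third = pending_files(state, tmp)    # size changed, pending again

print(len(first), len(second), len(third))  # 1 0 1
```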

Search Commands

Darc exposes four search scopes:

  1. Session-level: Find sessions by project path, date range, or agent type.
  2. Turn-level: Search user or assistant messages for keywords.
  3. Tool call-level: Filter by tool name (e.g., edit_file, run_command) and argument patterns.
  4. File-level: Search diffs or file paths touched during a session.

Agents chain these queries to reconstruct context. Example flow using Darc CLI:

# Find sessions that edited auth.py in the last 30 days
darc search --file auth.py --days 30

# Retrieve tool calls from those sessions that invoked run_command with pytest
darc search --session <session_id> --tool run_command --result-contains pytest

# Read assistant messages from turns where tests failed
darc search --session <session_id> --turn --message-contains FAILED

# Use that context to decide whether to refactor or add more tests

This iterative search pattern mirrors how engineers use git log, git blame, and rg to understand code history before making changes.
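The first step of that flow (sessions that edited a file within a time window) could map onto the schema roughly as follows. This is a sketch under assumptions: it treats `start_time` as Unix epoch seconds, matches on the file's base name, and uses invented sample rows; how the `darc` CLI actually translates its flags to SQL is not documented in the source.

```python
import sqlite3

DAY = 86_400
NOW = 1_700_000_000  # fixed "current time" so the example is reproducible

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (id TEXT PRIMARY KEY, agent_type TEXT, start_time INTEGER,
                       end_time INTEGER, project_path TEXT);
CREATE TABLE turns (id TEXT PRIMARY KEY, session_id TEXT, turn_index INTEGER,
                    user_message TEXT, assistant_message TEXT);
CREATE TABLE file_edits (id TEXT PRIMARY KEY, turn_id TEXT, file_path TEXT, diff TEXT);
""")
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?, ?)", [
    ("recent", "claude", NOW - 2 * DAY, NOW - 2 * DAY + 600, "/repo/app"),
    ("old", "codex", NOW - 90 * DAY, NOW - 90 * DAY + 600, "/repo/app"),
])
conn.executemany("INSERT INTO turns VALUES (?, ?, ?, ?, ?)", [
    ("t1", "recent", 0, "fix login", "Patched the expiry check."),
    ("t2", "old", 0, "add login", "Created the module."),
])
conn.executemany("INSERT INTO file_edits VALUES (?, ?, ?, ?)", [
    ("e1", "t1", "src/auth.py", "@@ -1 +1 @@"),
    ("e2", "t2", "src/auth.py", "@@ -0,0 +1 @@"),
])

def sessions_touching(conn, file_name, days, now):
    """Session IDs, newest first, that edited `file_name` in the last `days` days."""
    cutoff = int(now - days * DAY)
    rows = conn.execute("""
        SELECT DISTINCT s.id, s.start_time
        FROM file_edits e
        JOIN turns t ON t.id = e.turn_id
        JOIN sessions s ON s.id = t.session_id
        WHERE s.start_time >= ?
          AND (e.file_path = ? OR e.file_path LIKE '%/' || ?)
        ORDER BY s.start_time DESC
    """, (cutoff, file_name, file_name))
    return [row[0] for row in rows]

print(sessions_touching(conn, "auth.py", 30, NOW))  # ['recent']
```

Note that `LIKE` treats `_` and `%` in the file name as wildcards, so a production query would need to escape them.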

Team Sharing via Git

Darc supports team-level memory sharing through encrypted SQLite databases stored in Git. The flow:

  1. Redaction: Before indexing, Darc runs regex patterns over session JSON to strip secrets (API keys, tokens, environment variables). Users configure redaction rules in .darc/redact.yaml.
  2. Encryption: Darc encrypts the SQLite file using age with team public keys. Each team member has a keypair.
  3. Git backend: The encrypted database is committed to a Git repository (GitHub, GitLab, or self-hosted). Darc uses Git as the transport and versioning layer.
  4. Pull and merge: Team members run darc pull to fetch encrypted databases from teammates, decrypt them locally, and merge session indexes into their own SQLite file.
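The redaction step can be sketched with plain regular expressions. The rule names and patterns below are illustrative guesses at common secret shapes, not Darc's defaults, and the format of `.darc/redact.yaml` is not shown in the source, so the rules are hardcoded here. The helper also returns per-rule match counts, in the spirit of logging redaction matches for review.

```python
import re

# Hypothetical rules; a real deployment would load its own from configuration.
RULES = {
    "sk_style_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "github_token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    # Keeps the variable name (group 1) and replaces only the value.
    "env_assignment": re.compile(r"\b([A-Z0-9_]*(?:TOKEN|SECRET|PASSWORD|KEY))=\S+"),
}

def redact(text):
    """Replace secret-looking substrings and report how many matches each rule made."""
    counts = {}
    for name, rx in RULES.items():
        replacement = r"\1=[REDACTED]" if rx.groups else "[REDACTED]"
        text, n = rx.subn(replacement, text)
        if n:
            counts[name] = n
    return text, counts

print(redact("export API_KEY=abc123"))
# ('export API_KEY=[REDACTED]', {'env_assignment': 1})
```

Regex redaction only catches patterns someone thought to write down, which is why reviewing the match counts matters before sharing an index.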

This design avoids centralized memory servers. Each developer controls their own index and selectively shares it. The Git backend provides audit trails and rollback.
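The merge step, once a teammate's database has been decrypted locally, might look like the sketch below. It assumes row IDs are unique across teammates so that `INSERT OR IGNORE` can skip sessions already present; the source does not describe how Darc actually resolves duplicates, and the `age` decryption step is omitted. The helper names and sample rows are invented.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sessions (id TEXT PRIMARY KEY, agent_type TEXT, start_time INTEGER,
                       end_time INTEGER, project_path TEXT);
CREATE TABLE turns (id TEXT PRIMARY KEY, session_id TEXT, turn_index INTEGER,
                    user_message TEXT, assistant_message TEXT);
"""

def new_index():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def row_count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def merge_index(dest, src, tables=("sessions", "turns")):
    """Copy rows from src into dest, skipping primary keys dest already has."""
    added = {}
    for table in tables:
        before = row_count(dest, table)
        rows = src.execute(f"SELECT * FROM {table}").fetchall()
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            dest.executemany(
                f"INSERT OR IGNORE INTO {table} VALUES ({placeholders})", rows)
        added[table] = row_count(dest, table) - before
    dest.commit()
    return added

mine, theirs = new_index(), new_index()
mine.execute("INSERT INTO sessions VALUES ('a', 'codex', 1, 2, '/repo')")
theirs.execute("INSERT INTO sessions VALUES ('a', 'codex', 1, 2, '/repo')")  # already shared
theirs.execute("INSERT INTO sessions VALUES ('b', 'claude', 3, 4, '/repo')")
theirs.execute("INSERT INTO turns VALUES ('b-0', 'b', 0, 'review auth', 'Looks fine.')")

first = merge_index(mine, theirs)
second = merge_index(mine, theirs)
print(first)   # {'sessions': 1, 'turns': 1}
print(second)  # {'sessions': 0, 'turns': 0}
```

Because the merge is idempotent, pulling the same teammate database twice does not duplicate sessions.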

Failure Modes and Observability

| Failure Mode | Impact | Mitigation |
| --- | --- | --- |
| Tool schema changes | Parsing breaks when Codex/Claude change rollout format | Darc uses versioned parsers per agent type; falls back to raw JSON storage |
| Redaction misses secrets | Sensitive data leaks into shared indexes | Regex patterns are user-configurable; Darc logs redaction matches for review |
| Merge conflicts in SQLite | Concurrent writes from multiple pulls corrupt the database | Darc uses write-ahead logging (WAL) and detects conflicts via Git merge markers |
| Search query explosion | Agents issue hundreds of queries per turn | Darc caches query results per session; exposes query count metrics |

Darc logs all search queries, redaction matches, and encryption operations to ~/.darc/logs/. Agents can inspect these logs to debug search behavior or verify redaction coverage.
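The "versioned parsers with raw JSON fallback" mitigation can be pictured as a small dispatch table. Everything in this sketch is hypothetical: the `version` field, the two rollout shapes, and the parser names are invented to show the pattern, since the real Codex and Claude Code rollout formats are not described in the source.

```python
import json

def parse_v1(doc):
    # Invented shape: {"version": 1, "turns": [{"user": ..., "assistant": ...}]}
    return [{"user": t["user"], "assistant": t["assistant"]} for t in doc["turns"]]

def parse_v2(doc):
    # Invented shape: {"version": 2, "messages": [{"input": ..., "output": ...}]}
    return [{"user": m["input"], "assistant": m["output"]} for m in doc["messages"]]

PARSERS = {("codex", 1): parse_v1, ("codex", 2): parse_v2}

def parse_rollout(agent_type, raw):
    """Parse with the matching versioned parser, else keep the raw JSON text."""
    doc = json.loads(raw)
    version = doc.get("version") if isinstance(doc, dict) else None
    parser = PARSERS.get((agent_type, version))
    if parser is not None:
        try:
            return {"status": "parsed", "turns": parser(doc)}
        except (KeyError, TypeError):
            pass  # format drifted within a known version; fall through to raw
    return {"status": "raw", "raw_json": raw}

known = parse_rollout("codex", '{"version": 1, "turns": [{"user": "hi", "assistant": "hello"}]}')
unknown = parse_rollout("codex", '{"version": 9, "events": []}')
print(known["status"], unknown["status"])  # parsed raw
```

Storing the raw text on fallback means nothing is lost: the session can be re-parsed once a parser for the new format exists.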

When Native Memory Injection Breaks

The author observed bias in iterative code review when native memory systems injected context from recent sessions. Example scenario:

  1. Agent A writes a feature in session 1.
  2. Agent B reviews the feature in session 2.
  3. Agent B’s context window includes memory from session 1 (via MEMORY.md injection).
  4. Agent B reports “no findings” because the memory primes it to accept the original design.

Darc avoids this by making memory retrieval explicit. The reviewer agent must issue search queries to pull context, which makes the dependency visible in the tool call log. If the reviewer skips the search, it has no memory of session 1.
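Because retrieval happens through ordinary tool calls, checking whether a reviewer consulted past sessions reduces to scanning its tool call log. The log shape below (a list of dicts with `tool_name` and an `arguments` dict holding a `command` string) is an assumption made for the example, not a format the source documents.

```python
def memory_lookups(tool_calls):
    """Return the logged shell commands that queried Darc during a session."""
    return [
        call["arguments"]["command"]
        for call in tool_calls
        if call["tool_name"] == "run_command"
        and call["arguments"].get("command", "").startswith("darc search")
    ]

# Hypothetical tool call log from a review session.
review_log = [
    {"tool_name": "run_command", "arguments": {"command": "git diff main"}},
    {"tool_name": "run_command",
     "arguments": {"command": "darc search --file auth.py --days 30"}},
    {"tool_name": "edit_file", "arguments": {"path": "auth.py"}},
]

print(memory_lookups(review_log))
# ['darc search --file auth.py --days 30']
```

An empty result means the review ran without any retrieved memory, which is the property a bias-free review round relies on.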

Comparison with Embedding-Based Memory

Darc’s lexical approach trades recall for reliability. Here’s how it compares to vector-based memory systems:

| Dimension | Darc (Lexical) | Embedding-Based |
| --- | --- | --- |
| Query latency | <10ms (SQLite FTS) | 50-200ms (vector search) |
| Semantic matching | No | Yes |
| Model dependency | None | Requires embedding model |
| Determinism | Exact match | Approximate |
| Storage overhead | ~1MB per 100 sessions | ~10MB per 100 sessions (vectors) |

For coding agents, the lack of semantic matching is acceptable because session history contains structured data (file paths, tool names, error codes) that lexical queries handle well. Semantic search shines when memory is unstructured prose, which is rare in coding sessions.

Implementation Example

Here’s how an agent chains Darc searches to reconstruct context before editing a file:

# Agent decides to refactor auth.py
import subprocess

def darc_search(*args):
    """Run `darc search` with the given flags and return its stdout."""
    result = subprocess.run(
        ["darc", "search", *args],
        capture_output=True, text=True
    )
    return result.stdout

# Step 1: Find recent sessions that touched auth.py
# (assumes the CLI prints one session ID per line)
sessions = [
    line.strip()
    for line in darc_search("--file", "auth.py", "--days", "30").splitlines()
    if line.strip()
]

# Step 2: Filter for sessions where a run_command result contains FAILED
failed_sessions = []
for session_id in sessions:
    output = darc_search("--session", session_id,
                         "--tool", "run_command", "--result-contains", "FAILED")
    if output.strip():
        failed_sessions.append(session_id)

# Step 3: Read assistant messages from turns that mention pytest
context = []
for session_id in failed_sessions:
    turns = darc_search("--session", session_id,
                        "--turn", "--message-contains", "pytest")
    context.append(turns)

# Step 4: Use context to inform refactor decision
# Join the pieces so the prompt holds readable text, not a Python list repr
joined_context = "\n\n".join(context)
prompt = f"Context from past test failures:\n{joined_context}\n\nNow refactor auth.py"

This pattern is verbose but explicit. The agent’s search logic is visible in the tool call log, making it easier to debug memory-related failures.

Technical Verdict

Use Darc when you need deterministic, auditable memory retrieval for coding agents. The author built it after native memory injection biased code review rounds, where reviewer agents cited context from the session that built the feature under review. Darc makes memory retrieval explicit, which prevents automatic injection from priming agent decisions.

Use it if:

  • Your team shares agent context across multiple developers working on the same codebase.
  • Native memory injection (MEMORY.md) introduces bias or noise in iterative tasks like code review.
  • You prefer UNIX-style tools (grep, Git) over managed memory services.
  • You need to audit which past sessions influenced an agent’s current decision.

Avoid Darc when:

  • You need semantic search over unstructured agent conversations (use embeddings).
  • Your agents run on ephemeral infrastructure without persistent ~/.codex or ~/.claude directories.
  • You want automatic memory consolidation without explicit agent queries.
  • Your team cannot manage age encryption keys or Git-based sharing workflows.

Darc is infrastructure-level tooling designed for teams managing agent context at scale, not an end-user product. It assumes you’re comfortable with SQLite, Git, and regex-based redaction. The author is running evals to compare Darc against native memory systems and raw rg on session history. Watch the repo for results.