ADHD Stack: Parallel Divergent Ideation for Coding Agents

Most coding agents converge too fast. You ask for design ideas and get the same three answers a senior engineer would give. The model evaluates while it generates, early tokens anchor late tokens, and you end up with the centroid of the training distribution.

ADHD Stack attacks this premature convergence problem by forcing parallel exploration under different cognitive frames. Instead of one reasoning path, you get N isolated branches (regulator, speedrunner, biology-inspired, $0 budget) that never see each other’s context. A separate critic pass scores, clusters, and deepens only the top-K survivors.

This is not chain-of-thought with more steps. The architecture enforces structural isolation and generator-critic separation at the orchestration level.

The Premature Convergence Problem

LLMs sample from high-probability completions. When you prompt “give me a few ways to do X,” the model produces the most likely answers first. Those early tokens constrain the rest of the generation. You get competent but forgettable output.

This failure mode hits hardest in open-ended tasks:

Architecture decisions
API and SDK design
Debugging intermittent failures
Refactor planning
Naming and positioning

The cost is not wrong answers. The cost is obvious answers when you needed escape velocity from the training distribution.

Architecture: Generator-Critic Split

ADHD enforces a two-phase execution model with no shared context between phases.

Phase 1: Divergent Generation

Spawn N parallel branches
Each branch gets a different cognitive frame in the system prompt
No cross-branch communication
No shared memory or state
Each branch runs to completion independently

Phase 2: Convergent Criticism

Separate LLM calls with opposite system prompts
Score all N outputs on novelty, breadth, trap detection
Cluster similar approaches
Prune low-scoring branches
Deepen top-K survivors with follow-up prompts

The split is mechanical, not aspirational. You cannot accidentally leak context between generator and critic because they are different API calls with different prompts.

Cognitive Frames as Branching Primitive

Traditional tree-of-thought varies next-step choices. ADHD varies the vantage point before any steps are taken.

Example frames for “design a caching layer”:

Regulator: Prioritize audit trails, compliance, data residency
Speedrunner: Minimize latency, maximize throughput, ignore edge cases
Biology: Mimic immune system memory, adaptive eviction, symbiotic relationships
$0 Budget: Use only stdlib, no external dependencies, optimize for developer time

Each frame is a different system prompt. The model generates a complete solution from that perspective without seeing other branches.

This produces structural diversity, not just surface variation. A regulator frame might choose append-only logs. A speedrunner frame might choose in-memory hash tables. A biology frame might choose probabilistic data structures with decay.

State Management Across Branches

Parallel exploration creates a state explosion risk. N branches with M steps each = N*M states to track.

ADHD avoids this by making branches stateless relative to each other:

Each branch is a single LLM call (or small chain)
No shared memory between branches
No incremental state updates
Outputs are immutable artifacts

The orchestrator holds a flat list of (frame, output) tuples. No tree structure. No parent-child pointers. Just N independent results.

# Simplified orchestration pseudocode
frames = ["regulator", "speedrunner", "biology", "zero_budget"]
results = []

# Phase 1: Parallel generation
for frame in frames:
    prompt = build_prompt(task, frame)
    output = llm.generate(prompt, temperature=0.9)
    results.append({"frame": frame, "output": output})

# Phase 2: Critic scoring
scored = []
for result in results:
    critic_prompt = build_critic_prompt(task, result["output"])
    scores = llm.generate(critic_prompt, temperature=0.2)
    scored.append({**result, "scores": scores})

# Prune and deepen
top_k = sorted(scored, key=lambda x: x["scores"]["total"])[:3]
for survivor in top_k:
    deepened = llm.generate(deepen_prompt(survivor), temperature=0.7)
    survivor["deepened"] = deepened

No recursion. No backtracking. No graph traversal. Just map, score, filter, map again.

Pruning Heuristics

The critic phase uses three scoring dimensions:

Dimension	What It Measures	Failure Mode
Novelty	Distance from obvious answers	Rehashing Stack Overflow top answer
Breadth	Coverage of solution space	Tunnel vision on one approach
Trap Detection	Awareness of failure modes	Ignoring edge cases, security, scale

Each dimension gets a 0-10 score from the critic LLM. Total score determines survival.

Pruning happens in two passes:

Hard filter: Drop anything below threshold (e.g., total < 15)
Clustering: Group similar approaches, keep only highest-scoring representative

Clustering prevents redundant exploration. If three branches independently suggest Redis, keep the one with the best trap detection score.

Orchestration Primitives

ADHD needs three primitives from the orchestration layer:

1. Parallel dispatch

Launch N independent LLM calls without waiting for sequential completion. This is true parallelism, not async-await over a single-threaded event loop.

2. Isolated context

Each branch must get a clean context. No accidental prompt leakage from previous branches. No shared conversation history.

3. Batch scoring

The critic phase can batch-score all N outputs in a single call if the LLM supports it. Otherwise, N sequential critic calls.

Most orchestration frameworks (LangGraph, Temporal, Prefect) can handle this. The key is not using their stateful graph features. ADHD is deliberately stateless.

Error Handling in Parallel Branches

When multiple branches fail simultaneously, you need a failure budget.

Failure modes:

Rate limit hit on branch 3 of 6
Timeout on branch 5 of 6
Invalid JSON in branch 2 of 6

Handling strategy:

Set a minimum viable branch count (e.g., 3 of 6)
If failures drop you below minimum, retry failed branches with exponential backoff
If retries exhaust, fall back to single-branch mode with best-effort frame

Do not fail the entire task because one branch timed out. The whole point is redundancy.

def execute_with_fallback(frames, task, min_viable=3):
    results = []
    failures = []
    
    for frame in frames:
        try:
            output = llm.generate(build_prompt(task, frame))
            results.append({"frame": frame, "output": output})
        except Exception as e:
            failures.append({"frame": frame, "error": e})
    
    if len(results) < min_viable:
        # Retry failures
        for failure in failures[:min_viable - len(results)]:
            try:
                output = llm.generate(build_prompt(task, failure["frame"]))
                results.append({"frame": failure["frame"], "output": output})
            except:
                continue
    
    if len(results) < min_viable:
        # Fall back to single best-effort frame
        output = llm.generate(build_prompt(task, "generalist"))
        results = [{"frame": "fallback", "output": output}]
    
    return results

Evaluation Results

ADHD was tested on six open-ended engineering problems with an LLM-as-judge scoring outputs.

Win rate: 5 of 6 against single-shot baseline

Score improvements (0-10 rubric):

Novelty: +5.17
Breadth: +4.17
Trap detection: +7.67

The trap detection improvement is the most interesting. Parallel frames surface failure modes that single-path reasoning misses. A speedrunner frame ignores security. A regulator frame ignores performance. The critic sees both and flags the tension.

When Parallel Exploration Fails

ADHD has clear failure modes:

1. Cost explosion

N branches = N LLM calls. If N=10 and each call is 4K tokens, you just spent 40K tokens on generation before the critic even runs.

2. Diminishing returns

After 4-6 frames, additional branches rarely add new information. You get redundant variations instead of structural diversity.

3. Critic collapse

If the critic LLM is weaker than the generator LLM, scoring becomes noise. The critic needs to be at least as capable as the generator.

4. Frame design burden

Good cognitive frames require domain knowledge. Generic frames (optimistic, pessimistic, creative) produce generic diversity.

Deployment Shape

ADHD fits two deployment patterns:

Pattern 1: Batch ideation

User submits open-ended task
System runs full ADHD pipeline (5-10 minutes)
Returns top-K deepened solutions
User picks one or requests another round

Pattern 2: Interactive refinement

User submits task
System runs divergent phase only (2-3 minutes)
User picks interesting frames
System deepens selected frames on demand

Pattern 2 keeps the human in the loop and avoids wasting tokens on branches the user will ignore.

Observability Needs

You need visibility into four things:

Branch completion times: Identify slow frames
Critic score distributions: Detect if all branches score similarly (frame design problem)
Cluster sizes: See if multiple branches converge on same idea
Deepening deltas: Measure how much the deepening phase adds

Standard LLM observability (token counts, latencies, error rates) is table stakes. The ADHD-specific metrics are about diversity and convergence.

Technical Verdict

Use ADHD when:

The task is open-ended with no single correct answer
Premature convergence is more costly than extra LLM calls
You need structural diversity, not just surface variation
You can tolerate 5-10x token cost versus single-shot
You have domain knowledge to design good cognitive frames

Avoid ADHD when:

The task has a deterministic correct answer
You need real-time response (< 1 second)
Token budget is tight
You lack domain expertise to design frames
Single-shot output is already good enough

ADHD is not a general-purpose agent architecture. It is a specialized tool for creative, interdisciplinary, and design-shaped tasks where the failure mode is not wrong but obvious.