Most coding agents converge too fast. You ask for design ideas and get the same three answers a senior engineer would give. The model evaluates while it generates, early tokens anchor late tokens, and you end up with the centroid of the training distribution.
ADHD Stack attacks this premature convergence problem by forcing parallel exploration under different cognitive frames. Instead of one reasoning path, you get N isolated branches (regulator, speedrunner, biology-inspired, $0 budget) that never see each other’s context. A separate critic pass scores, clusters, and deepens only the top-K survivors.
This is not chain-of-thought with more steps. The architecture enforces structural isolation and generator-critic separation at the orchestration level.
The Premature Convergence Problem
LLMs sample from high-probability completions. When you prompt “give me a few ways to do X,” the model produces the most likely answers first. Those early tokens constrain the rest of the generation. You get competent but forgettable output.
This failure mode hits hardest in open-ended tasks:
- Architecture decisions
- API and SDK design
- Debugging intermittent failures
- Refactor planning
- Naming and positioning
The cost is not wrong answers. The cost is obvious answers when you needed escape velocity from the training distribution.
Architecture: Generator-Critic Split
ADHD enforces a two-phase execution model with no shared context between phases.
Phase 1: Divergent Generation
- Spawn N parallel branches
- Each branch gets a different cognitive frame in the system prompt
- No cross-branch communication
- No shared memory or state
- Each branch runs to completion independently
Phase 2: Convergent Criticism
- Separate LLM calls with opposite system prompts
- Score all N outputs on novelty, breadth, trap detection
- Cluster similar approaches
- Prune low-scoring branches
- Deepen top-K survivors with follow-up prompts
The split is mechanical, not aspirational. You cannot accidentally leak context between generator and critic because they are different API calls with different prompts.
Cognitive Frames as Branching Primitive
Traditional tree-of-thought varies next-step choices. ADHD varies the vantage point before any steps are taken.
Example frames for “design a caching layer”:
- Regulator: Prioritize audit trails, compliance, data residency
- Speedrunner: Minimize latency, maximize throughput, ignore edge cases
- Biology: Mimic immune system memory, adaptive eviction, symbiotic relationships
- $0 Budget: Use only stdlib, no external dependencies, optimize for developer time
Each frame is a different system prompt. The model generates a complete solution from that perspective without seeing other branches.
This produces structural diversity, not just surface variation. A regulator frame might choose append-only logs. A speedrunner frame might choose in-memory hash tables. A biology frame might choose probabilistic data structures with decay.
State Management Across Branches
Parallel exploration creates a state explosion risk. N branches with M steps each = N*M states to track.
ADHD avoids this by making branches stateless relative to each other:
- Each branch is a single LLM call (or small chain)
- No shared memory between branches
- No incremental state updates
- Outputs are immutable artifacts
The orchestrator holds a flat list of (frame, output) tuples. No tree structure. No parent-child pointers. Just N independent results.
# Simplified orchestration pseudocode
frames = ["regulator", "speedrunner", "biology", "zero_budget"]
results = []
# Phase 1: Parallel generation
for frame in frames:
prompt = build_prompt(task, frame)
output = llm.generate(prompt, temperature=0.9)
results.append({"frame": frame, "output": output})
# Phase 2: Critic scoring
scored = []
for result in results:
critic_prompt = build_critic_prompt(task, result["output"])
scores = llm.generate(critic_prompt, temperature=0.2)
scored.append({**result, "scores": scores})
# Prune and deepen
top_k = sorted(scored, key=lambda x: x["scores"]["total"])[:3]
for survivor in top_k:
deepened = llm.generate(deepen_prompt(survivor), temperature=0.7)
survivor["deepened"] = deepened
No recursion. No backtracking. No graph traversal. Just map, score, filter, map again.
Pruning Heuristics
The critic phase uses three scoring dimensions:
| Dimension | What It Measures | Failure Mode |
|---|---|---|
| Novelty | Distance from obvious answers | Rehashing Stack Overflow top answer |
| Breadth | Coverage of solution space | Tunnel vision on one approach |
| Trap Detection | Awareness of failure modes | Ignoring edge cases, security, scale |
Each dimension gets a 0-10 score from the critic LLM. Total score determines survival.
Pruning happens in two passes:
- Hard filter: Drop anything below threshold (e.g., total < 15)
- Clustering: Group similar approaches, keep only highest-scoring representative
Clustering prevents redundant exploration. If three branches independently suggest Redis, keep the one with the best trap detection score.
Orchestration Primitives
ADHD needs three primitives from the orchestration layer:
1. Parallel dispatch
Launch N independent LLM calls without waiting for sequential completion. This is true parallelism, not async-await over a single-threaded event loop.
2. Isolated context
Each branch must get a clean context. No accidental prompt leakage from previous branches. No shared conversation history.
3. Batch scoring
The critic phase can batch-score all N outputs in a single call if the LLM supports it. Otherwise, N sequential critic calls.
Most orchestration frameworks (LangGraph, Temporal, Prefect) can handle this. The key is not using their stateful graph features. ADHD is deliberately stateless.
Error Handling in Parallel Branches
When multiple branches fail simultaneously, you need a failure budget.
Failure modes:
- Rate limit hit on branch 3 of 6
- Timeout on branch 5 of 6
- Invalid JSON in branch 2 of 6
Handling strategy:
- Set a minimum viable branch count (e.g., 3 of 6)
- If failures drop you below minimum, retry failed branches with exponential backoff
- If retries exhaust, fall back to single-branch mode with best-effort frame
Do not fail the entire task because one branch timed out. The whole point is redundancy.
def execute_with_fallback(frames, task, min_viable=3):
results = []
failures = []
for frame in frames:
try:
output = llm.generate(build_prompt(task, frame))
results.append({"frame": frame, "output": output})
except Exception as e:
failures.append({"frame": frame, "error": e})
if len(results) < min_viable:
# Retry failures
for failure in failures[:min_viable - len(results)]:
try:
output = llm.generate(build_prompt(task, failure["frame"]))
results.append({"frame": failure["frame"], "output": output})
except:
continue
if len(results) < min_viable:
# Fall back to single best-effort frame
output = llm.generate(build_prompt(task, "generalist"))
results = [{"frame": "fallback", "output": output}]
return results
Evaluation Results
ADHD was tested on six open-ended engineering problems with an LLM-as-judge scoring outputs.
Win rate: 5 of 6 against single-shot baseline
Score improvements (0-10 rubric):
- Novelty: +5.17
- Breadth: +4.17
- Trap detection: +7.67
The trap detection improvement is the most interesting. Parallel frames surface failure modes that single-path reasoning misses. A speedrunner frame ignores security. A regulator frame ignores performance. The critic sees both and flags the tension.
When Parallel Exploration Fails
ADHD has clear failure modes:
1. Cost explosion
N branches = N LLM calls. If N=10 and each call is 4K tokens, you just spent 40K tokens on generation before the critic even runs.
2. Diminishing returns
After 4-6 frames, additional branches rarely add new information. You get redundant variations instead of structural diversity.
3. Critic collapse
If the critic LLM is weaker than the generator LLM, scoring becomes noise. The critic needs to be at least as capable as the generator.
4. Frame design burden
Good cognitive frames require domain knowledge. Generic frames (optimistic, pessimistic, creative) produce generic diversity.
Deployment Shape
ADHD fits two deployment patterns:
Pattern 1: Batch ideation
- User submits open-ended task
- System runs full ADHD pipeline (5-10 minutes)
- Returns top-K deepened solutions
- User picks one or requests another round
Pattern 2: Interactive refinement
- User submits task
- System runs divergent phase only (2-3 minutes)
- User picks interesting frames
- System deepens selected frames on demand
Pattern 2 keeps the human in the loop and avoids wasting tokens on branches the user will ignore.
Observability Needs
You need visibility into four things:
- Branch completion times: Identify slow frames
- Critic score distributions: Detect if all branches score similarly (frame design problem)
- Cluster sizes: See if multiple branches converge on same idea
- Deepening deltas: Measure how much the deepening phase adds
Standard LLM observability (token counts, latencies, error rates) is table stakes. The ADHD-specific metrics are about diversity and convergence.
Technical Verdict
Use ADHD when:
- The task is open-ended with no single correct answer
- Premature convergence is more costly than extra LLM calls
- You need structural diversity, not just surface variation
- You can tolerate 5-10x token cost versus single-shot
- You have domain knowledge to design good cognitive frames
Avoid ADHD when:
- The task has a deterministic correct answer
- You need real-time response (< 1 second)
- Token budget is tight
- You lack domain expertise to design frames
- Single-shot output is already good enough
ADHD is not a general-purpose agent architecture. It is a specialized tool for creative, interdisciplinary, and design-shaped tasks where the failure mode is not wrong but obvious.