mech.app
Thoughtworks Downgrades LangGraph: Why CLI-Based Agents Beat Shared State Graphs

Thoughtworks moved LangGraph from Adopt to Trial. The reason: rigid graphs with massive shared state fail testability and debuggability.

Source: dev.to
Thoughtworks Technology Radar moved LangGraph from “Adopt” to “Trial” in April 2026. The stated reason: rigid graphs with massive shared state are hard to reason about, test, and debug. The recommendation is to favor simple agents communicating through code execution, with graph structures added only when needed.

This is not a minor framework preference. It is an architectural claim about how multi-agent systems should coordinate. The Radar argues that each agent should access only the state it needs, not a view into one global object. Communication should happen through code execution, not a shared mutable blackboard.

A cloud-security evaluator built as a CLI tool (the stave command in the examples below) landed on this pattern by accident. The tool was never designed as an agent framework. Every capability is a deterministic command that reads files and writes files. Traced end to end, the workflow matches the pattern Thoughtworks now recommends.

The Four-Agent Workflow

Here is how an engineer might orchestrate four agents using the CLI tool:

Agent 1: Data collection

  • Reads contracts/steampipe/aws_s3_bucket.yaml and contracts/steampipe/aws_iam_role.yaml
  • Queries Steampipe, transforms, validates
  • Writes to observations/

Agent 2: Risk evaluation

  • Runs stave apply → produces findings.json
  • Runs stave gaps → produces gap-report.json

Agent 3: Formal proof

  • Reads reasoning-specs/.../z3-public-read-bucket/spec.yaml
  • Runs stave export-sir → generates SMT-LIB facts
  • Follows the spec → returns SAT or UNSAT

Agent 4: Compliance mapping

  • Reads the compliance crosswalk
  • Runs stave export compliance --framework hipaa → produces status report

Each agent is a separate invocation. Each reads files, executes a command, and writes files. No shared state object. No graph orchestrator. No message bus.
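
The "separate invocation" idea can be sketched as a small sequential runner. This is an illustration only, not part of the tool described here: run_steps is a hypothetical helper, and it assumes each step signals failure through a non-zero exit status.

```python
import subprocess
import sys
from typing import Sequence


def run_steps(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order and stop at the first non-zero exit status.

    Returns the number of steps that completed successfully.
    """
    completed = 0
    for argv in steps:
        # Each step is its own process. Nothing is shared in memory, so the
        # only hand-off between steps is whatever they write to disk.
        result = subprocess.run(list(argv))
        if result.returncode != 0:
            print(
                f"step failed ({result.returncode}): {' '.join(argv)}",
                file=sys.stderr,
            )
            break
        completed += 1
    return completed
```

For the workflow above, steps would hold one argument list per invocation, for example ["stave", "apply", "--observations", "observations/", "--output", "findings.json"], using the flags shown in the command listing later in this article.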

Why Files Beat Shared State

The file-based boundary gives you three things:

Isolation
Each agent sees only the files it reads. If Agent 3 crashes, Agent 4 still runs. If Agent 2 writes malformed JSON, Agent 3 never sees it unless it explicitly reads that file.

Auditability
Every input and output is on disk. You can replay any step by re-running the command with the same input files. You can diff outputs across runs. You can version-control the entire workflow.
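
Diffing outputs across runs can be as simple as the sketch below. It assumes the outputs are JSON files; diff_json_outputs is a hypothetical helper name, and it normalizes key order first so that only real content changes appear in the diff.

```python
import difflib
import json
from pathlib import Path


def diff_json_outputs(old_path: Path, new_path: Path) -> list[str]:
    """Return a unified diff between two JSON output files.

    Both files are re-serialized with sorted keys first, so differences in
    key order or whitespace do not show up as changes.
    """

    def canonical_lines(path: Path) -> list[str]:
        data = json.loads(path.read_text())
        return json.dumps(data, indent=2, sort_keys=True).splitlines()

    return list(
        difflib.unified_diff(
            canonical_lines(old_path),
            canonical_lines(new_path),
            fromfile=str(old_path),
            tofile=str(new_path),
            lineterm="",
        )
    )
```

An empty list means the replayed step reproduced the earlier output; anything else pinpoints which lines changed.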

Testability
You can unit-test each command with fixture files. You can integration-test by running commands in sequence and asserting on file contents. You do not need to mock a stateful orchestrator or inject test state into a graph.
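
A fixture-based test might look like the following sketch. The evaluate function is a hypothetical stand-in for a command under test, and the field names (resource_id, public_access) are assumptions chosen for illustration; the point is that the test touches only files.

```python
import json
import tempfile
from pathlib import Path


def evaluate(observations_dir: Path, output_path: Path) -> None:
    """Hypothetical command under test: flag resources with public access."""
    findings = []
    for obs_file in sorted(observations_dir.glob("*.json")):
        obs = json.loads(obs_file.read_text())
        if obs.get("public_access"):
            findings.append({"resource": obs["resource_id"], "rule": "public-access"})
    output_path.write_text(json.dumps({"findings": findings}, indent=2, sort_keys=True))


def test_flags_only_public_resources() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        obs_dir = Path(tmp) / "observations"
        obs_dir.mkdir()
        # Fixtures: one public bucket and one private bucket.
        (obs_dir / "a.json").write_text(
            json.dumps({"resource_id": "bucket-a", "public_access": True})
        )
        (obs_dir / "b.json").write_text(
            json.dumps({"resource_id": "bucket-b", "public_access": False})
        )

        out = Path(tmp) / "findings.json"
        evaluate(obs_dir, out)

        # Assert on file contents only; no orchestrator or state object is involved.
        result = json.loads(out.read_text())
        assert result == {"findings": [{"resource": "bucket-a", "rule": "public-access"}]}
```

The fixtures describe the inputs directly, so the test does not need to know which upstream agent would normally have produced them.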

LangGraph’s shared state model makes all three harder. When every agent reads from and writes to a single state object, the boundary between agents disappears. A failure in one agent can leave behind state that every other agent reads. Replaying a single step means reconstructing the state as it was at that step. Testing one agent means populating the state the rest of the graph would have produced.

Failure Modes of Shared State

State bloat
As agents add keys to the shared object, the state grows. Eventually, no one knows what keys are safe to delete. Agents start reading keys they should not touch. The state becomes a junk drawer.

Concurrency bugs
If two agents write to the same key, you need locking or conflict resolution. If you use optimistic locking, you need retry logic. If you use pessimistic locking, you serialize execution. Either way, you add complexity.
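
To make the cost concrete, here is a minimal sketch of optimistic locking on a shared state object. VersionedState and update_with_retry are hypothetical names and do not correspond to any framework's API; the sketch only shows the extra machinery that shared writes tend to require.

```python
import threading
from typing import Any, Callable


class VersionedState:
    """Hypothetical shared state object guarded by optimistic locking."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict[str, Any] = {}
        self._version = 0

    def read(self) -> tuple[int, dict[str, Any]]:
        """Return the current version and a copy of the data."""
        with self._lock:
            return self._version, dict(self._data)

    def compare_and_set(self, expected_version: int, key: str, value: Any) -> bool:
        """Write only if nobody else has written since expected_version."""
        with self._lock:
            if self._version != expected_version:
                return False  # conflict: the caller must re-read and retry
            self._data[key] = value
            self._version += 1
            return True


def update_with_retry(
    state: VersionedState,
    key: str,
    fn: Callable[[Any], Any],
    max_attempts: int = 5,
) -> bool:
    """Apply fn to the current value of key, retrying on write conflicts."""
    for _ in range(max_attempts):
        version, snapshot = state.read()
        if state.compare_and_set(version, key, fn(snapshot.get(key))):
            return True
    return False
```

Every agent that writes to the shared object has to go through something like update_with_retry and handle the case where it gives up. With file-based hand-offs, agents that write different files never reach this code path.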

Debugging opacity
When a workflow fails, you have to trace which agent wrote which key at which step. The state object is a black box. You cannot see the intermediate values without adding logging or breakpoints.

Test setup cost
To test Agent 3, you need to populate the state object with the keys Agent 1 and Agent 2 would have written. If those agents change, your tests break. You end up maintaining test fixtures that mirror the orchestration logic.

The CLI Pattern

The alternative is to make each agent a CLI command:

# Agent 1: collect observations
stave observe --provider steampipe --contracts contracts/ --output observations/

# Agent 2: evaluate risks
stave apply --observations observations/ --output findings.json
stave gaps --findings findings.json --output gap-report.json

# Agent 3: prove reachability
stave export-sir --observations observations/ --output facts.smt2
z3 -smt2 facts.smt2 reasoning-specs/z3-public-read-bucket/spec.smt2

# Agent 4: map to compliance
stave export compliance --framework hipaa --findings findings.json --output hipaa-status.json

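Each command is designed to be deterministic: given the same input files, it produces the same output files. The collection step queries live cloud state through Steampipe, so in practice that guarantee applies from observations/ onward. You can run the commands in sequence, run independent ones in parallel, or run a single step on its own. You can replace any command with a different implementation as long as it reads and writes the same file formats.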

This is not a new idea. Unix pipelines have worked this way since the 1970s. The insight is that it also works for agent orchestration.

When Graphs Still Win

File-based coordination has limits. If you need:

  • Dynamic routing: Agent 3’s output determines whether to run Agent 4 or Agent 5
  • Streaming: Agent 2 starts processing before Agent 1 finishes
  • Stateful loops: Agent 3 retries until a condition is met, with state carried across retries

Then you need an orchestrator. Tools such as LangGraph, Temporal, Prefect, and Airflow target these cases, though not every tool covers all three. The Thoughtworks claim is that you should start without the orchestrator and add it only when you hit one of these limits.

Architecture Comparison

| Dimension | Shared State Graph | CLI-Based Agents |
| --- | --- | --- |
| Isolation | All agents see all state | Each agent sees only input files |
| Replay | Must reconstruct full state | Re-run command with same files |
| Test setup | Mock state object or full graph | Write fixture files |
| Concurrency | Needs locking or conflict resolution | Parallel by default (different files) |
| Debugging | Trace state mutations across agents | Inspect file contents at each step |
| Dynamic routing | Native | Requires external orchestrator |
| Streaming | Native | Requires pipes or message queue |

Security Implications

For security tooling, the file-based pattern has two advantages:

Audit trail
Every file write is an audit event. You can log file hashes, timestamps, and command invocations. You can prove which agent produced which output at which time.
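
One way to capture such an audit trail is an append-only log with one JSON record per command. The sketch below is an assumption about how this could be done, not a feature of the tool; record_audit_event and the log layout are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Sequence


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large outputs do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_audit_event(
    log_path: Path, argv: Sequence[str], outputs: Sequence[Path]
) -> dict:
    """Append one JSON line describing a command and the files it produced."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": list(argv),
        "outputs": {str(p): sha256_of(p) for p in outputs},
    }
    with log_path.open("a") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```

The timestamp lives in the audit log rather than in the command's own output, so the outputs themselves stay byte-for-byte reproducible.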

Least privilege
Each agent runs with access only to the files it needs. Agent 4 does not need read access to observations/, because it reads only findings.json and the compliance crosswalk. You can enforce this with file permissions or container mounts.

Shared state makes both harder. The state object is a single security boundary. Every agent that touches it needs read-write access to the entire object. You cannot grant partial access without adding custom authorization logic.

Implementation Notes

The CLI pattern requires discipline:

File format contracts
Every command must document its input and output formats. If Agent 2 expects JSON with a findings array, Agent 1 must produce it. Schema validation helps but does not eliminate the need for clear contracts.
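
A contract check at the boundary can be small. The sketch below uses only the standard library and assumes a findings file shaped like the one produced by the code example in this article (a findings array whose items carry resource, severity, and rule as strings). REQUIRED_FIELDS and validate_findings are hypothetical names.

```python
import json
from pathlib import Path

# Assumed contract, based on the finding shape used in this article's example.
REQUIRED_FIELDS = ("resource", "severity", "rule")


def validate_findings(path: Path) -> list[str]:
    """Return a list of contract violations; an empty list means the file is valid."""
    try:
        doc = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    if not isinstance(doc, dict) or not isinstance(doc.get("findings"), list):
        return ["top level must be an object with a 'findings' array"]

    errors = []
    for i, item in enumerate(doc["findings"]):
        if not isinstance(item, dict):
            errors.append(f"findings[{i}] must be an object")
            continue
        for field in REQUIRED_FIELDS:
            if not isinstance(item.get(field), str):
                errors.append(f"findings[{i}].{field} must be a string")
    return errors
```

A consuming command can call this before doing any work and exit with a non-zero status if the list is not empty.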

Error handling
If a command fails, it should exit with a non-zero status and write an error message to stderr. It should not write partial output files. The caller (a wrapper script or a human) can then decide whether to retry or abort.
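
One common way to avoid partial output files is to write to a temporary file in the same directory and rename it into place only on success. The sketch below assumes JSON output; write_json_atomically, main, and the exit codes are hypothetical choices for illustration.

```python
import json
import os
import sys
import tempfile
from pathlib import Path


def write_json_atomically(output_path: Path, payload: dict) -> None:
    """Write to a temp file next to the target, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=output_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f, indent=2, sort_keys=True)
        # os.replace swaps the file in one step, so readers see either the
        # old file or the complete new one, never a half-written file.
        os.replace(tmp_name, output_path)
    except BaseException:
        # Clean up the temp file so a failed run leaves nothing behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def main(argv: list[str]) -> int:
    """Return a process exit status: 0 on success, non-zero on failure."""
    if len(argv) != 2:
        print("Usage: write-report <output-file>", file=sys.stderr)
        return 1
    try:
        write_json_atomically(Path(argv[1]), {"findings": []})
    except (OSError, TypeError, ValueError) as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2
    return 0
```

A script entry point would finish with sys.exit(main(sys.argv)) so that the status reaches the calling shell.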

Idempotency
If a command is re-run with the same inputs, it should produce the same outputs. This means no timestamps or random IDs in the output, and no dependency on external state that might change between runs.

Code Example: Deterministic Command

#!/usr/bin/env python3
import sys
import json
from pathlib import Path

def apply_findings(observations_dir: Path, output_path: Path):
    """
    Read observation files, apply security rules, write findings.
    Deterministic: same inputs always produce same outputs.
    """
    findings = []
    
    # Sort the file list: glob order depends on the filesystem
    for obs_file in sorted(observations_dir.glob("*.json")):
        with obs_file.open() as f:
            obs = json.load(f)
        
        # Apply rules (simplified)
        if obs.get("public_access") and obs.get("contains_phi"):
            findings.append({
                "resource": obs["resource_id"],
                "severity": "critical",
                "rule": "public-phi-exposure"
            })
    
    # Sort for determinism
    findings.sort(key=lambda x: x["resource"])
    
    with output_path.open("w") as f:
        json.dump({"findings": findings}, f, indent=2, sort_keys=True)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: apply-findings <observations-dir> <output-file>", file=sys.stderr)
        sys.exit(1)
    
    apply_findings(Path(sys.argv[1]), Path(sys.argv[2]))

This command reads all JSON files in a directory, applies a rule, and writes a single output file. It is deterministic (sorted output), testable (just write fixture files), and isolated (no shared state).

Technical Verdict

Use CLI-based agents when:

  • You need auditability and replay for compliance or security
  • Your workflow is mostly sequential or embarrassingly parallel
  • You want to test each step independently
  • You want to version-control intermediate outputs

Use a graph orchestrator when:

  • You need dynamic routing based on runtime conditions
  • You need streaming or pipelining between agents
  • You need stateful loops or retries with carried state
  • You already have an orchestration platform (Temporal, Airflow) and want to reuse it

The Thoughtworks downgrade is not a rejection of LangGraph. It is a claim that most teams should start with simple inter-process communication and add orchestration only when they need it. For security tooling, where auditability and testability are non-negotiable, the CLI pattern is often the right default.
