mech.app

BeamWeaver: What Building LangGraph-Style Agents in Elixir Reveals About OTP-Native Orchestration


Source: github.com

BeamWeaver is a LangGraph-style agent framework for Elixir, and it exposes a sharp contrast in how agent orchestration works when you swap Python’s async/await runtime for the BEAM VM’s OTP primitives. The project brings agents, tool calling, graph workflows, checkpoints, and typed streaming to Elixir, but the interesting part is not feature parity. It’s how supervision trees, GenServers, and message passing change the plumbing of retries, state persistence, and failure isolation.

Why Elixir for Agents

Most agent frameworks live in Python because that’s where the LLM SDKs are. BeamWeaver exists because teams running Elixir services hit a wall when they need agentic workflows. Pushing that logic into a separate Python service means managing two runtimes, serializing state across HTTP, and losing OTP’s fault-tolerance guarantees.

The Show HN post mentions observability as the trigger: “We kept running into the same issue and found there is no observability for agentic systems.” BeamWeaver’s answer is to treat agents as OTP processes, which means you get supervision trees, telemetry hooks, and process introspection for free.

OTP Primitives vs. Python Orchestration Patterns

Here’s how the runtime swap changes the design:

| Concern | Python (LangChain/LangGraph) | Elixir (BeamWeaver) |
| --- | --- | --- |
| Failure handling | try/except blocks, error boundaries in graph nodes | Supervision trees restart failed agent processes with configurable strategies |
| State persistence | In-memory dicts, manual serialization to checkpoints | State held in a GenServer, checkpoints taken as snapshots of that state |
| Concurrency | Async/await, event loops, thread pools for tool calls | Message-passing actors, one process per agent or tool invocation |
| Streaming events | AsyncIterator or callback queues | Typed messages sent to subscriber processes |
| Testing | Mock HTTP clients, fixture data | Fake/replay models that deterministically return canned responses |

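The supervision tree model is the biggest shift. In LangGraph, if a tool call fails, you write retry logic in your graph definition. In BeamWeaver, you define a supervisor that restarts the tool-calling GenServer when it crashes. OTP supervisors restart children immediately and cap restarts with a restart intensity (max_restarts within max_seconds), so any backoff has to live in the process itself or in a layer built on top. The agent graph becomes a tree of supervised processes instead of a graph of Python functions.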

Checkpoints and Resumable Execution

LangGraph checkpoints are snapshots of graph state saved to a store (memory, Redis, Postgres). You serialize the state dict, write it, and reload it to resume after an interrupt or failure.

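BeamWeaver checkpoints are GenServer state snapshots. Because each agent runs in a GenServer, the state is already isolated, and as long as it consists of plain Elixir terms it can be serialized directly. PIDs, references, and anonymous functions are not meaningful after a restart, so they should stay out of checkpointed state. The checkpoint store writes the process state to a memory backend or persistent store, and resuming means spawning a new GenServer with the saved state.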
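The snapshot-and-resume idea can be sketched with nothing but the standard library. The module below is hypothetical (CheckpointSketch.AgentServer, record_step/2, checkpoint/1 and resume/1 are illustration names, not BeamWeaver's API), and it assumes the state is a map with a :steps key that contains only plain terms.

```elixir
defmodule CheckpointSketch.AgentServer do
  # Hypothetical agent process used only to illustrate checkpointing.
  use GenServer

  def start_link(initial_state), do: GenServer.start_link(__MODULE__, initial_state)
  def record_step(pid, step), do: GenServer.call(pid, {:record_step, step})
  def checkpoint(pid), do: GenServer.call(pid, :checkpoint)
  def state(pid), do: GenServer.call(pid, :state)

  # Resume by starting a fresh process from a previously saved binary.
  def resume(binary) when is_binary(binary) do
    # :safe refuses to create atoms that do not already exist.
    start_link(:erlang.binary_to_term(binary, [:safe]))
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:record_step, step}, _from, state) do
    # Assumes the state map has a :steps list.
    {:reply, :ok, %{state | steps: state.steps ++ [step]}}
  end

  def handle_call(:checkpoint, _from, state) do
    # The snapshot is the serialized state; the process keeps running.
    {:reply, :erlang.term_to_binary(state), state}
  end

  def handle_call(:state, _from, state), do: {:reply, state, state}
end
```

The binary returned by checkpoint/1 could be written to any store; a new process started with resume/1 picks up with the same steps even after the original process has stopped.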

This matters for interrupts and human review. In LangGraph, you pause execution by returning a special node result and saving state. In BeamWeaver, you send a message to the agent process to pause, and the GenServer sits idle in a paused state until it receives a resume message. Because it is idle rather than blocked inside a callback, the process stays alive and responsive, so you can inspect its state or send it new instructions without reloading from a checkpoint.
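A minimal sketch of that pause/resume loop, assuming the agent's work is a list of zero-arity functions and that it starts paused, as if interrupted before its first step. PauseSketch.AgentServer and the {:agent_done, results} message are hypothetical names for illustration, not BeamWeaver's API.

```elixir
defmodule PauseSketch.AgentServer do
  # Hypothetical agent that advances one step per self-sent :step message.
  use GenServer

  # `steps` is a list of zero-arity functions; `notify` receives the results.
  def start_link(steps, notify), do: GenServer.start_link(__MODULE__, {steps, notify})
  def pause(pid), do: GenServer.call(pid, :pause)
  def resume(pid), do: GenServer.call(pid, :resume)

  @impl true
  def init({steps, notify}) do
    {:ok, %{status: :paused, pending: steps, results: [], notify: notify}}
  end

  @impl true
  def handle_call(:pause, _from, %{status: :running} = state) do
    {:reply, :ok, %{state | status: :paused}}
  end

  def handle_call(:pause, _from, state), do: {:reply, {:error, :not_running}, state}

  def handle_call(:resume, _from, %{status: :paused} = state) do
    send(self(), :step)
    {:reply, :ok, %{state | status: :running}}
  end

  def handle_call(:resume, _from, state), do: {:reply, {:error, :not_paused}, state}

  @impl true
  def handle_info(:step, %{status: :running, pending: [step | rest]} = state) do
    # Run one step, then queue the next so pause requests can interleave.
    send(self(), :step)
    {:noreply, %{state | pending: rest, results: [step.() | state.results]}}
  end

  def handle_info(:step, %{status: :running, pending: []} = state) do
    send(state.notify, {:agent_done, Enum.reverse(state.results)})
    {:noreply, %{state | status: :done}}
  end

  # Paused or finished: ignore stray :step messages and stay idle.
  def handle_info(:step, state), do: {:noreply, state}
end
```

While the process is paused, :sys.get_state(pid) returns its full state from another process, which is the kind of live inspection a checkpoint-and-reload design cannot offer.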

Tool Calling and Message Passing

Python agent frameworks use async/await to call tools concurrently. You await multiple tool functions, gather results, and pass them to the next LLM call.

BeamWeaver uses message passing. Each tool call spawns a process (or reuses one from a pool), and the agent GenServer sends it a message with the tool arguments. The tool process runs the function, sends the result back, and the agent continues. If a tool call times out or crashes, the supervisor restarts it or the agent handles the failure message.

This changes how you think about tool concurrency. In Python, you write asyncio.gather(tool_a(), tool_b()). In Elixir, you send messages to two tool processes and wait for replies with a timeout. The failure modes are explicit: you get a timeout message or a crash report, not an exception that bubbles up.
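One way to sketch this with the standard library is Task.Supervisor.async_nolink/2 plus Task.yield_many/2, which turn crashes and timeouts into ordinary return values. ToolCallSketch and run_tools/3 are hypothetical names, and this is not necessarily how BeamWeaver dispatches tools.

```elixir
defmodule ToolCallSketch do
  # Runs each {name, fun} tool in its own unlinked process and returns a
  # map of name => {:ok, value} | {:error, :timeout} | {:error, {:crashed, reason}}.
  def run_tools(supervisor, tools, timeout_ms) do
    names = Enum.map(tools, fn {name, _fun} -> name end)

    tasks =
      Enum.map(tools, fn {_name, fun} ->
        # Unlinked, so a crashing tool does not take the caller down.
        Task.Supervisor.async_nolink(supervisor, fun)
      end)

    tasks
    |> Task.yield_many(timeout_ms)
    |> Enum.zip(names)
    |> Map.new(fn {{task, outcome}, name} -> {name, normalize(task, outcome)} end)
  end

  defp normalize(_task, {:ok, value}), do: {:ok, value}
  defp normalize(_task, {:exit, reason}), do: {:error, {:crashed, reason}}

  defp normalize(task, nil) do
    # No reply within the timeout: stop the tool process.
    case Task.shutdown(task, :brutal_kill) do
      {:ok, value} -> {:ok, value}
      _ -> {:error, :timeout}
    end
  end
end
```

The caller needs a running Task.Supervisor, for example one started with Task.Supervisor.start_link(). Each tool ends up as a tagged tuple the agent can pattern-match on, so a slow or crashing tool never interrupts the others.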

Typed Streaming Events

LangGraph streams events as JSON objects or Python dicts. BeamWeaver defines events as Elixir structs, documented with type specs, and sends them as messages to subscriber processes.

Example event types:

  • %AgentStarted{agent_id, timestamp}
  • %ToolCalled{tool_name, args, request_id}
  • %ToolResult{request_id, result, duration}
  • %AgentCompleted{agent_id, final_state}

Subscribers register with the agent’s event stream and receive typed messages. This makes it easier to build observability dashboards or log aggregators because you pattern-match on message types instead of parsing JSON keys.
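As a rough illustration of matching on event types, the sketch below defines two of the structs listed above and a consumer that dispatches on them. The EventSketch namespace, the duration_ms field name, and describe/1 are assumptions made for the example; BeamWeaver's actual struct definitions may differ.

```elixir
defmodule EventSketch.ToolCalled do
  @enforce_keys [:tool_name, :args, :request_id]
  defstruct [:tool_name, :args, :request_id]

  @type t :: %__MODULE__{tool_name: String.t(), args: map(), request_id: String.t()}
end

defmodule EventSketch.ToolResult do
  @enforce_keys [:request_id, :result, :duration_ms]
  defstruct [:request_id, :result, :duration_ms]

  @type t :: %__MODULE__{request_id: String.t(), result: term(), duration_ms: non_neg_integer()}
end

defmodule EventSketch.Consumer do
  alias EventSketch.{ToolCalled, ToolResult}

  # Each clause matches one event struct; unknown messages fall through.
  def describe(%ToolCalled{tool_name: name, request_id: id}), do: "call #{id}: #{name}"
  def describe(%ToolResult{request_id: id, duration_ms: ms}), do: "result #{id} in #{ms}ms"
  def describe(_other), do: "unknown event"
end
```

A subscriber process would call something like describe/1 from its receive loop or handle_info/2. A misspelled field in a pattern is rejected when the module compiles, a check that JSON key lookups do not get.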

Deterministic Testing with Fake Models

BeamWeaver includes fake/replay models for testing. This is necessary because agent workflows are non-deterministic by default: the LLM might return different tool calls or text on each run.

The fake model is a GenServer that returns canned responses based on a script. You define a test scenario:

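# Illustrative only: FakeModel and Agent are assumed to be BeamWeaver modules
# brought in via alias. Agent here is not Elixir's built-in Agent module.

# The script is the ordered list of turns the fake model will replay.
fake_model = FakeModel.new([
  {:user_message, "What's the weather?"},
  {:assistant_tool_call, "get_weather", %{location: "NYC"}},
  {:tool_result, %{temp: 72, condition: "sunny"}},
  {:assistant_message, "It's 72 and sunny in NYC."}
])

agent = Agent.new(model: fake_model, tools: [WeatherTool])
{:ok, result} = Agent.run(agent, "What's the weather?")

# The final message is fully determined by the script, so the assertion is stable.
assert result.final_message == "It's 72 and sunny in NYC."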

The replay model records real LLM interactions and plays them back in tests. This is similar to VCR for HTTP requests but for LLM calls. It solves the flakiness problem in agent tests without mocking every tool and LLM call manually.
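The core of a scripted model is small enough to sketch with Elixir's built-in Agent module (the standard-library state holder, unrelated to AI agents). FakeModelSketch and complete/2 are hypothetical names, and a real replay model would also need to record and key responses, which this sketch leaves out.

```elixir
defmodule FakeModelSketch do
  # Hypothetical scripted model; not BeamWeaver's FakeModel implementation.

  # Holds the remaining canned responses in a standard-library Agent.
  def start_link(responses) when is_list(responses) do
    Agent.start_link(fn -> responses end)
  end

  # Returns the next canned response. The messages are deliberately ignored,
  # which is what makes the output deterministic.
  def complete(model, _messages) do
    Agent.get_and_update(model, fn
      [next | rest] -> {{:ok, next}, rest}
      [] -> {{:error, :script_exhausted}, []}
    end)
  end
end
```

Returning an explicit error once the script runs out makes a test fail loudly when the agent makes more model calls than the scenario expected.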

Provider Adapters and Multi-Model Support

BeamWeaver ships with adapters for OpenAI, Anthropic, Google Gemini, xAI, and Moonshot/Kimi. Each adapter is a module that implements a behaviour (Elixir’s counterpart to an interface) with these callbacks:

  • chat_completion(messages, opts)
  • stream_completion(messages, opts)
  • parse_tool_calls(response)

You swap providers by passing a different adapter module to the agent config. The adapter handles API-specific quirks (tool call formats, streaming protocols, rate limits) and returns normalized responses.

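This is cleaner than LangChain’s provider abstraction because Elixir behaviours let the compiler check the contract. If an adapter declares the behaviour but doesn’t implement parse_tool_calls/1, compilation emits a warning, which becomes a build failure when you compile with --warnings-as-errors.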
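A behaviour and one implementation can be sketched as follows. The AdapterSketch modules, the callback return shapes, and the EchoProvider are assumptions for illustration; only the callback names chat_completion and parse_tool_calls come from the list above, and stream_completion is omitted to keep the sketch short.

```elixir
defmodule AdapterSketch.Provider do
  # The contract every adapter module is expected to fulfil.
  @callback chat_completion(messages :: [map()], opts :: keyword()) ::
              {:ok, map()} | {:error, term()}
  @callback parse_tool_calls(response :: map()) :: [map()]
end

defmodule AdapterSketch.EchoProvider do
  # A stand-in provider that echoes the last user message back.
  @behaviour AdapterSketch.Provider

  @impl true
  def chat_completion(messages, _opts) do
    last = List.last(messages)
    {:ok, %{content: last.content, tool_calls: []}}
  end

  @impl true
  def parse_tool_calls(response), do: Map.get(response, :tool_calls, [])
end

defmodule AdapterSketch.Runner do
  # The adapter is just a module passed as a value, so swapping providers
  # means passing a different module.
  def ask(adapter, text) do
    with {:ok, response} <- adapter.chat_completion([%{role: :user, content: text}], []) do
      {:ok, response.content, adapter.parse_tool_calls(response)}
    end
  end
end
```

Calling AdapterSketch.Runner.ask(AdapterSketch.EchoProvider, "hello") exercises the whole path; the @impl true annotations additionally make the compiler flag a function that claims to implement a callback the behaviour does not define.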

Observability and WeaveScope

The Show HN post mentions WeaveScope, an observability layer coming soon. Based on the context, it likely hooks into Elixir’s telemetry system to trace agent execution across processes.

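Elixir libraries conventionally emit events through the :telemetry library, with each event named by a list of atoms; for BeamWeaver these might look like [:beam_weaver, :agent, :start] and [:beam_weaver, :tool, :call]. WeaveScope would attach handlers to these events, aggregate them, and expose a dashboard or API for inspecting agent workflows.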

This is easier to build in Elixir than Python because every agent and tool runs in a process with a unique PID. You can trace message flows between processes and visualize the supervision tree without instrumenting every function call.

When OTP Supervision Beats Error Boundaries

LangGraph uses error boundaries to isolate failures in graph nodes. If a node crashes, you catch the exception and decide whether to retry, skip, or fail the entire graph.

OTP supervision trees do this automatically. You define a supervisor with a strategy (one-for-one, one-for-all, rest-for-one) and child specs for each agent or tool process. If a child crashes, the supervisor restarts it according to the strategy.

Example supervision tree for a multi-agent workflow:

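# AgentSupervisor and ToolPool are illustrative module names.
children = [
  # The default child id is the module name, so two children that use the
  # same module need explicit, distinct ids.
  Supervisor.child_spec({AgentSupervisor, name: :research_agent}, id: :research_agent),
  Supervisor.child_spec({AgentSupervisor, name: :writer_agent}, id: :writer_agent),
  {ToolPool, tools: [WebSearch, Calculator], size: 5}
]

# Restart intensity is set on the supervisor; 3 restarts in 5 seconds is
# also the default.
Supervisor.start_link(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)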

If the research agent crashes, the supervisor restarts it without affecting the writer agent or tool pool. If restarts exceed three within five seconds, the supervisor gives up, shuts down its children, and exits, which reports the failure upstream.

This is more declarative than writing retry logic in your graph definition. You separate failure recovery (supervision strategy) from business logic (agent workflow).
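The isolation that :one_for_one provides can be observed with a small self-contained sketch. SupervisionSketch and its Worker are placeholder modules standing in for real agent processes; the worker does nothing except hold its name.

```elixir
defmodule SupervisionSketch.Worker do
  # Placeholder for an agent process, registered under the given name.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  @impl true
  def init(name), do: {:ok, name}
end

defmodule SupervisionSketch do
  def start do
    children = [
      # Same module twice, so each child gets its own id.
      Supervisor.child_spec({SupervisionSketch.Worker, :research_agent}, id: :research_agent),
      Supervisor.child_spec({SupervisionSketch.Worker, :writer_agent}, id: :writer_agent)
    ]

    Supervisor.start_link(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end
```

After SupervisionSketch.start(), killing the process registered as :research_agent with Process.exit(pid, :kill) leads the supervisor to start a replacement under the same name, while Process.whereis(:writer_agent) keeps returning the original PID.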

Failure Modes and Debugging

The main failure modes in BeamWeaver are:

  1. Tool timeout: The tool process doesn’t reply within the timeout. The agent receives a timeout message and can retry or fail.
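  2. LLM API error: The provider call returns an error or the calling process crashes. An error result goes back to the agent, which can retry or fail; a crash is handled by the supervisor, which restarts the process.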
  3. State corruption: The GenServer state becomes invalid. The checkpoint store has the last good state, so you can restart from there.
  4. Deadlock: Two agents wait for each other’s messages. This is detectable with process monitoring and timeouts.

Debugging is easier than in Python because you can attach to a running BEAM node with iex and inspect process state, message queues, and supervision trees. You don’t need to add logging or restart the service.
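The kind of inspection described here maps onto a few standard calls: Process.info/2 for mailbox and scheduling details and :sys.get_state/1 for the state of an OTP process. IntrospectSketch.summary/1 is a hypothetical helper that bundles them; in practice you would type the same calls into a remote iex shell.

```elixir
defmodule IntrospectSketch do
  # Summarises a live OTP process (GenServer, Agent, ...) without stopping it.
  def summary(pid) when is_pid(pid) do
    info = Process.info(pid, [:message_queue_len, :status])

    %{
      # :sys.get_state/1 only works for processes built on OTP behaviours.
      state: :sys.get_state(pid),
      queue_len: info[:message_queue_len],
      status: info[:status]
    }
  end
end
```

A growing queue_len on an agent process is often the first visible sign of the deadlock or slow-tool cases listed above.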

Technical Verdict

Use BeamWeaver when:

  • You already run Elixir services and need agentic workflows without adding a Python runtime.
  • You want OTP’s fault-tolerance guarantees for agent failures and retries.
  • You need to run hundreds or thousands of concurrent agents (BEAM processes are lightweight, and a single node can run hundreds of thousands of them).
  • You value deterministic testing and typed event streams.

Avoid BeamWeaver when:

  • Your team doesn’t know Elixir and the learning curve isn’t worth it.
  • You need tight integration with Python-native LLM tools (vector DBs, embedding pipelines, fine-tuning scripts).
  • The LLM provider you need doesn’t have a BeamWeaver adapter yet.
  • You’re prototyping and LangChain’s ecosystem is faster for iteration.

The real win is not feature parity with LangGraph. It’s that BeamWeaver makes agent orchestration feel like building a distributed system with supervision trees and message passing, which is what Elixir was designed for. If you’re already in that world, the plumbing fits naturally.