mech.app

Building a 150-Line AI Agent CLI: What Minimal Orchestration Actually Looks Like

Examine the infrastructure primitives a minimal agent CLI needs: tool registration, state management, streaming output, and error boundaries.

Source: go-micro.dev

The go-micro team shipped a CLI that lets you talk to microservices through an LLM. The entire orchestration layer is 150 lines. No LangChain, no AutoGPT, no agent framework. Just tool discovery, execution dispatch, and conversation memory.

This is what minimal orchestration looks like when you strip away abstractions and ask what an agent actually needs to work.

The Four Primitives

Every tool-calling agent needs the same four pieces:

  1. Tool registry: A list of available functions with descriptions and parameter schemas
  2. Execution dispatcher: A way to route tool calls to the right handler
  3. Conversation state: Memory so follow-up questions make sense
  4. Model interface: A way to send messages and receive tool calls

The 150-line constraint forces you to pick the simplest implementation for each. No plugin systems, no middleware stacks, no retry logic. Just the minimum viable plumbing.

Tool Registration Without Reflection

Many agent frameworks discover tools by introspecting function signatures, through reflection or decorators. The go-micro approach uses the service registry instead:

tools := ai.NewTools(reg, ai.ToolClient(client))
discovered, err := tools.Discover()
if err != nil {
    log.Fatal(err)
}

The registry already knows every service endpoint, request type, and field metadata. Discover() walks that data and builds a []ai.Tool slice. Each tool has:

  • Name (service name plus endpoint, like users_Users_Create for the Users.Create endpoint of the users service)
  • Description (from handler doc comments)
  • Parameter schema (from request struct fields)

If you don’t have a service registry, you write this part manually. Enumerate your functions, extract their signatures, and build the list. The registry just makes it automatic.

The trade-off: you need structured metadata somewhere. Either in code (reflection), in a registry (service mesh), or in static config (YAML manifests). There’s no free lunch.
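For the reflection option, a sketch like this one derives a parameter schema from a Go request struct's json tags. CreateRequest, schemaFor, and the rule that fields without omitempty are required are all hypothetical choices. go-micro's Discover() reads the registry rather than reflecting over types.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// CreateRequest is a hypothetical request type for a Create endpoint.
type CreateRequest struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	Age   int    `json:"age,omitempty"`
}

// jsonType maps a Go kind to a JSON Schema type name.
func jsonType(k reflect.Kind) string {
	switch k {
	case reflect.String:
		return "string"
	case reflect.Bool:
		return "boolean"
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return "integer"
	case reflect.Float32, reflect.Float64:
		return "number"
	case reflect.Slice, reflect.Array:
		return "array"
	default:
		return "object"
	}
}

// schemaFor walks the exported fields of a struct value (v must be a struct,
// not a pointer) and builds a JSON-Schema-style description from json tags.
// Fields without ",omitempty" are treated as required.
func schemaFor(v any) map[string]any {
	t := reflect.TypeOf(v)
	props := map[string]any{}
	required := []string{}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		name, opts, _ := strings.Cut(f.Tag.Get("json"), ",")
		if name == "-" {
			continue // field explicitly excluded from JSON
		}
		if name == "" {
			name = f.Name
		}
		props[name] = map[string]any{"type": jsonType(f.Type.Kind())}
		if !strings.Contains(opts, "omitempty") {
			required = append(required, name)
		}
	}
	return map[string]any{"type": "object", "properties": props, "required": required}
}

func main() {
	fmt.Println(schemaFor(CreateRequest{}))
}
```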

Execution Dispatch in 20 Lines

When the model returns a tool call, something has to route it to the right handler. In go-micro, the Tools object does double duty:

m := ai.New("anthropic",
    ai.WithAPIKey(apiKey),
    ai.WithTools(tools),
)

ai.WithTools(tools) wires up the execution side. When the model says “call users_Users_Create with these args,” the handler:

  1. Looks up the tool by name
  2. Marshals the JSON args into the request struct
  3. Calls the RPC client
  4. Returns the result as a string

No middleware. No retry logic. No circuit breakers. If the RPC fails, the error goes straight back to the model as a tool result. The model decides whether to retry or tell the user.

This works for a CLI because the user is in the loop. For autonomous agents, you’d add retry boundaries here.

State Management: In-Memory Queue

Conversation memory is a size-limited message accumulator:

hist := ai.NewHistory(50)

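Every user message, model response, and tool call goes into the history. When you send the next prompt, you ship the entire history to the model. The 50-message cap keeps requests from growing without bound, though a message count is only a rough proxy for tokens: a few very long tool results can still overflow the context window.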

Trade-offs:

| Approach | Pros | Cons |
|---|---|---|
| In-memory queue | Simple, no I/O, works for CLI sessions | Lost on crash, no multi-session support |
| SQLite file | Persistent, queryable, single-file | Adds dependency, requires schema migrations |
| Redis/external store | Multi-process, scalable | Network hop, deployment complexity |

For a CLI tool, in-memory is fine. The session dies when you close the terminal. For a web service or background agent, you need persistence.

Streaming Output in a CLI Context

The go-micro CLI streams model responses to stdout chunk by chunk as they arrive. The implementation is synchronous:

resp, err := m.Generate(ctx, prompt, ai.WithHistory(hist))
if err != nil {
    log.Fatal(err) // fail before touching the stream
}
for chunk := range resp.Stream() {
    fmt.Print(chunk.Text)
}

No SSE, no websockets, no async event loop. Just a blocking iterator over response chunks. The model client handles the HTTP streaming connection under the hood.

This works because a CLI session is sequential: you type a prompt, the agent thinks, you see the response. There are no concurrent requests and no multiplexing.

For a web API, you’d replace fmt.Print with an SSE writer or websocket send. The streaming primitive is the same, but the transport changes.

Error Boundaries: Fail Fast

The 150 lines contain no retry loops. If a tool call fails, the error message goes back to the model as a tool result:

Tool: users_Users_Create
Result: error: connection refused

The model sees the error and decides what to do. It might retry with different args, call a different tool, or tell the user the operation failed.

This shifts error handling from the orchestration layer to the model. It works when the model interprets errors correctly. It breaks when the model ignores a failure, reports success anyway, or keeps repeating the same failing call.

The trade-off: you save code complexity but lose deterministic error handling. For a CLI, that’s acceptable. For production automation, you’d add explicit retry policies and circuit breakers.

Deployment Shape: Single Binary

The go-micro CLI compiles to a single static binary. No Python virtual environment, no Node modules, no Docker container. You ship one file.

This works because:

  • Go compiles to native code and links its runtime into the binary, so there is no interpreter or VM to install
  • The model APIs are called over HTTP, so no native bindings are needed
  • The service registry is reached over the network, not stored in a local database

For Python-based agents, you’d use PyInstaller or Docker. For Node, you’d use a single-executable bundler (such as Node’s built-in single executable applications support) or a container. The deployment story depends on your language runtime, not your orchestration logic.

What You Lose at 150 Lines

Keeping orchestration minimal means cutting features:

  • No observability: No tracing, no metrics, no structured logs. You get stdout and error messages.
  • No retries: If a tool call fails, the model handles it. No exponential backoff, no circuit breakers.
  • No concurrency: One tool call at a time. No parallel execution, no batching.
  • No prompt caching: Every request sends the full history. No semantic caching, no embedding lookups.
  • No guardrails: No input validation, no output filtering, no safety checks.

These aren’t oversights. They’re deliberate cuts to stay under the line budget. You add them back when you need them, not before.

When Minimal Orchestration Works

This approach fits when:

  • You have a human in the loop (CLI, chat interface)
  • Tool calls are cheap and fast (sub-second RPCs)
  • Errors are rare or recoverable by the model
  • You control the tool surface area (internal services, not arbitrary APIs)

It breaks when:

  • You need autonomous operation (no human to catch errors)
  • Tool calls are expensive (long-running jobs, paid APIs)
  • You need audit logs or compliance tracking
  • You’re calling untrusted or rate-limited external services

Technical Verdict

A 150-line agent CLI is viable for internal tooling, developer utilities, and prototyping. The orchestration layer is simple enough to debug in an afternoon and extend without framework lock-in.

Use this pattern when you want to ship fast, control every line of code, and avoid framework churn. The code is boring, and that’s the point.

Avoid it when you need production-grade observability, retry logic, or multi-tenant isolation. At that point, the missing features cost more than the framework overhead you’re trying to avoid.

The 150-line constraint is a forcing function. It makes you ask which primitives are essential and which are nice-to-have. Most agent systems need fewer abstractions than their frameworks provide.