mech.app
Dev Tools

Cadenza.Agent: How 50 Lines of C# Turn Any LLM into an OpenAI-Compatible Agent Backend

Protocol adapter pattern for agent tooling. A thin compatibility shim lets you swap LLM providers without rewriting orchestration code.

Source: dev.to
Cadenza.Agent: How 50 Lines of C# Turn Any LLM into an OpenAI-Compatible Agent Backend

OpenAI Codex CLI ships with a polished agent UX: shell tool, apply_patch, plan tracking. The problem is it only speaks the OpenAI Responses API. As of February 2026, the codebase dropped Chat Completion support entirely. If you want to point it at Ollama, LM Studio, or any other local runner, you’re locked out.

Cadenza.Agent is a 50-line C# script that solves this by standing up both a Chat Completion endpoint and a Responses API endpoint, backed by the same IChatClient interface. Point it at OpenRouter (or any other provider) and suddenly Codex CLI runs on Claude, Gemini, Llama, or whatever model you choose. The agent client stays the same. The brain swaps out.

The Lock-In Problem

Codex CLI’s model_provider config block expects a Responses-shaped server. The wire format is different from Chat Completions: streaming deltas, tool call schemas, and error structures don’t map 1:1. Most agent frameworks hide this adapter layer or hard-code it to a single provider.

The result is vendor lock-in at the protocol level. If your agent tooling expects Responses and your LLM provider only speaks Chat Completions, you’re stuck rewriting orchestration code or running a proxy.

Architecture: Dual-Endpoint Adapter

Cadenza.Agent exposes two HTTP endpoints from a single process:

  • POST /v1/chat/completions (for Aider, Continue, Cursor, sgpt)
  • POST /v1/responses (for Codex CLI)

Both endpoints route through the same IChatClient abstraction. The adapter translates incoming requests into the provider-agnostic schema, calls the LLM, then translates the response back into the expected wire format.

The flow looks like this:

  1. Codex CLI sends a Responses API request to localhost:5000/v1/responses
  2. Cadenza.Agent parses the request and converts it to an IChatClient call
  3. The configured provider (OpenRouter, Ollama, Azure) handles the inference
  4. The response is translated back into Responses format and streamed to Codex

The same process works in reverse for Chat Completion clients. The adapter is stateless. No conversation history, no session management. The client owns the state.

Implementation: IChatClient as the Abstraction Boundary

IChatClient is part of Microsoft.Extensions.AI, a vendor-neutral interface for LLM calls. It defines:

  • CompleteAsync for single-turn inference
  • CompleteStreamingAsync for streaming responses
  • Tool call schemas
  • Message history structures

Cadenza.Agent uses this as the translation layer. When a Responses request arrives, the adapter:

  1. Extracts the prompt, tools, and parameters
  2. Maps them to ChatMessage objects
  3. Calls CompleteStreamingAsync
  4. Translates the streaming chunks back into Responses deltas

Here’s the core adapter logic (simplified):

app.MapPost("/v1/responses", async (HttpContext ctx, IChatClient client) =>
{
    var request = await ctx.Request.ReadFromJsonAsync<ResponsesRequest>();
    var messages = request.Messages.Select(m => new ChatMessage(m.Role, m.Content));
    var options = new ChatOptions
    {
        Tools = request.Tools?.Select(t => AIFunctionFactory.Create(t)).ToList(),
        Temperature = request.Temperature
    };

    await foreach (var chunk in client.CompleteStreamingAsync(messages, options))
    {
        var delta = new ResponsesDelta
        {
            Content = chunk.Text,
            ToolCalls = chunk.ToolCalls?.Select(MapToolCall).ToList()
        };
        await ctx.Response.WriteAsync($"data: {JsonSerializer.Serialize(delta)}\n\n");
    }
});

The Chat Completion endpoint follows the same pattern but outputs OpenAI-shaped JSON instead of Responses deltas.

Provider Routing: OpenRouter as the Universal Backend

OpenRouter speaks the OpenAI Chat Completion wire format but routes to 100+ models from different providers. You send a request to https://openrouter.ai/api/v1/chat/completions with a model name like anthropic/claude-3.5-sonnet and it handles the provider-specific translation.

Cadenza.Agent configures the IChatClient to point at OpenRouter:

var client = new OpenAIChatClient(
    new OpenAIClient(new ApiKeyCredential(apiKey), 
    new OpenAIClientOptions { Endpoint = new Uri("https://openrouter.ai/api/v1") }),
    modelId: "anthropic/claude-3.5-sonnet"
);

Now Codex CLI talks to your local adapter, which talks to OpenRouter, which talks to Anthropic. The same pattern works for Ollama (local models), Azure OpenAI (enterprise deployments), or any other provider that implements IChatClient.

Trade-Offs: What Gets Lost in Translation

FeatureResponses APIChat CompletionsAdapter Behavior
Streaming deltasNativeNativePass-through
Tool callsStructured schemaFunction callingMapped via AIFunctionFactory
Thinking tokens (Claude)Not specifiedProvider-specificDropped (no standard mapping)
Grounding metadata (Gemini)Not specifiedProvider-specificDropped
Error codesOpenAI-specificProvider-specificNormalized to 500/503
Rate limit headersExpectedVariesForwarded when present

The adapter works best when both sides of the translation support the same feature set. When provider-specific capabilities don’t map cleanly, they’re either dropped or approximated.

For example, Claude’s “thinking tokens” (internal reasoning steps) don’t have a Responses API equivalent. The adapter can include them in the content stream or drop them. Neither is perfect. If your agent logic depends on separating reasoning from output, you’ll need to extend the adapter.

Failure Modes

Provider timeout: If OpenRouter takes longer than Codex’s timeout, the request fails. The adapter doesn’t retry. You need to handle retries at the client level or add a retry policy to the IChatClient pipeline.

Tool schema mismatch: If the provider returns a tool call that doesn’t match the schema you declared, the adapter passes it through. Codex CLI will reject it. The fix is to validate tool schemas at the adapter layer before forwarding.

Streaming backpressure: If Codex CLI stops reading the stream (user cancels, network issue), the adapter keeps writing until the provider finishes. This wastes tokens. A proper implementation would cancel the IChatClient call when the HTTP connection drops.

Model-specific quirks: Some models (Llama 3.2, older GPT-3.5 variants) don’t reliably follow tool call schemas. The adapter can’t fix this. You need to pick a model that supports structured outputs or add a validation layer.

Deployment Shape

Cadenza.Agent runs as a single-file .NET script. No Docker, no container orchestration. You start it with:

dotnet run agent.cs

It binds to localhost:5000 by default. For production, you’d:

  1. Run it behind a reverse proxy (nginx, Caddy) with TLS
  2. Add authentication (API key header, OAuth)
  3. Configure observability (OpenTelemetry, structured logs)
  4. Deploy it as a systemd service or Windows service

The script itself is stateless. You can run multiple instances behind a load balancer. Each request is independent.

Observability Gaps

The 50-line version has no built-in observability. You don’t see:

  • Token counts per request
  • Latency breakdown (adapter vs. provider)
  • Tool call success/failure rates
  • Error rate by model

To fix this, you’d add OpenTelemetry instrumentation:

var meter = new Meter("cadenza.agent");
var requestCounter = meter.CreateCounter<long>("requests");
var tokenCounter = meter.CreateCounter<long>("tokens");

app.Use(async (ctx, next) =>
{
    requestCounter.Add(1, new("endpoint", ctx.Request.Path));
    await next();
});

Then export metrics to Prometheus, Grafana, or your observability stack.

Security Boundaries

The adapter trusts the client. If Codex CLI sends a malicious tool call schema, the adapter forwards it to the provider. If the provider returns a malicious response, the adapter forwards it to Codex.

There’s no input validation, no output sanitization, no rate limiting. For a local dev tool, this is fine. For a multi-tenant service, you’d need:

  • Request size limits
  • Tool call schema validation
  • Output content filtering
  • Per-user rate limits
  • API key rotation

The adapter also exposes your provider API key in the process environment. If you’re running this on a shared machine, anyone with access can read it from /proc/<pid>/environ.

Technical Verdict

Use Cadenza.Agent when:

  • You want to run Codex CLI (or any Responses-only agent) on non-OpenAI models
  • You need a quick protocol adapter for local development
  • You’re experimenting with different LLM providers without rewriting client code
  • You want a minimal reference implementation for building your own adapter

Avoid it when:

  • You need production-grade observability, retries, or error handling
  • Your agent logic depends on provider-specific features (thinking tokens, grounding)
  • You’re building a multi-tenant service (no auth, no rate limiting)
  • You need to support non-HTTP transports (WebSockets, gRPC)

The 50-line script is a starting point, not a production system. It exposes the plumbing clearly enough that you can extend it or replace it with a proper adapter layer when you need to.

Tags

agentic-ai orchestration infrastructure

Primary Source

dev.to