Zero-Dependency Agent Observability: JSONL Audit Logs to OTLP Spans

How a minimal Python library transforms agent audit logs into OpenTelemetry spans without SDK coupling, preserving trace context from append-only files.

Source: dev.to
Most agent observability solutions want you to instrument your code. Add the SDK, configure the tracer provider, set up batch exporters, point at a collector. That works fine for new runs. It is the wrong answer when you have thirty JSONL audit logs from last week and you want them in Datadog.

The problem is not the OpenTelemetry collector. The collector accepts OTLP/JSON over HTTP at POST /v1/traces. The problem is getting your logs into that format without re-running the agent or coupling your code to an observability library.

trace-to-otel solves this by treating logs as the source of truth and spans as a derived view. You write JSONL events during execution. You transform them into OTLP spans after the fact. No SDK in your agent code. No vendor lock-in. No runtime coordination.

The Audit Log Contract

Hermes agents write one JSON object per line. Each event carries a timestamp, session ID, and kind. Tool calls, budget denials, egress blocks, and costs all land in the same append-only file.

{"ts": 1779638601.262, "session_id": "abc12", "kind": "session_open"}
{"ts": 1779638601.265, "session_id": "abc12", "kind": "tool_ok", "tool": "locus.payments.charge", "usd": 4.99, "extra": {"latency_ms": 12}}
{"ts": 1779638601.266, "session_id": "abc12", "kind": "budget_denied", "tool": "locus.payments.charge", "usd": 7.0, "error": "budget exceeded: 11.99 > 10"}

This format is simple. You can write it with json.dumps() and a file handle. You can read it with jq. You can grep it. You can version it in git. It requires zero dependencies at write time.
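A minimal sketch of such a writer, using only the standard library. The function name write_event and its keyword-argument style are assumptions made for illustration. The source does not show how Hermes agents actually emit events.

```python
import json
import time
from typing import Any, TextIO


def write_event(log: TextIO, session_id: str, kind: str, **fields: Any) -> None:
    """Append one audit event as a single JSON line (hypothetical helper)."""
    event = {"ts": round(time.time(), 3), "session_id": session_id, "kind": kind, **fields}
    log.write(json.dumps(event) + "\n")
    log.flush()  # push the line out of Python's buffer so a crash loses less
```

Opening the file in append mode and calling write_event(log, "abc12", "tool_ok", tool="locus.payments.charge", usd=4.99) would add a line shaped like the second example above.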

The trade-off is that you lose real-time observability. You cannot query spans while the agent is running. You cannot alert on latency spikes until after the run completes. For post-mortem analysis and batch processing, this is acceptable. For live debugging, it is not.

JSONL to OTLP Transformation

OpenTelemetry spans have trace IDs, span IDs, parent span IDs, start times, end times, attributes, and status codes. JSONL events have timestamps and session IDs. The transformation layer must infer the span hierarchy from event ordering and session boundaries.

trace-to-otel does this in three passes:

  1. Parse events: Read JSONL, validate schema, extract timestamps and session IDs.
  2. Build span tree: Group events by session, infer parent-child relationships from event kinds (session_open starts a root span, tool_ok creates a child span).
  3. Emit OTLP/JSON: Convert each span into the wire format, assign trace IDs and span IDs, serialize to JSON.
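The three passes can be sketched in plain Python. This is not the library's code. The function name sketch_jsonl_to_otlp, the hash-derived IDs, the service name, and the choice to emit tool events as zero-length child spans are all assumptions. The source does not say whether ts marks the start or the end of a tool call, so the sketch avoids guessing a duration.

```python
import hashlib
import json
from collections import defaultdict


def _hex_id(seed: str, length: int) -> str:
    # Assumption: deterministic IDs, so the same log always yields the same trace.
    return hashlib.sha256(seed.encode()).hexdigest()[:length]


def _nanos(ts: float) -> str:
    # OTLP/JSON carries 64-bit integers as decimal strings.
    # Round to milliseconds first, which matches the precision of the log.
    return str(int(round(ts * 1000)) * 1_000_000)


def sketch_jsonl_to_otlp(text: str) -> dict:
    # Pass 1: parse each line and check the three required fields.
    sessions = defaultdict(list)
    for n, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue
        event = json.loads(line)
        missing = {"ts", "session_id", "kind"} - set(event)
        if missing:
            raise ValueError(f"line {n}: missing {sorted(missing)}")
        sessions[event["session_id"]].append(event)

    # Pass 2: one root span per session, one child span per tool event.
    spans = []
    for sid, events in sessions.items():
        trace_id = _hex_id(sid, 32)          # 16 bytes, hex-encoded
        root_id = _hex_id(sid + ":root", 16)  # 8 bytes, hex-encoded
        spans.append({
            "traceId": trace_id, "spanId": root_id,
            "name": "session", "kind": 1,  # 1 = SPAN_KIND_INTERNAL
            "startTimeUnixNano": _nanos(events[0]["ts"]),
            "endTimeUnixNano": _nanos(events[-1]["ts"]),
        })
        for i, ev in enumerate(events):
            if "tool" not in ev:
                continue
            spans.append({
                "traceId": trace_id, "spanId": _hex_id(f"{sid}:{i}", 16),
                "parentSpanId": root_id,
                "name": ev["tool"], "kind": 1,
                "startTimeUnixNano": _nanos(ev["ts"]),
                "endTimeUnixNano": _nanos(ev["ts"]),  # zero-length by assumption
                "attributes": [{"key": "event.kind", "value": {"stringValue": ev["kind"]}}],
                "status": {"code": 2 if "error" in ev else 1},  # 1 = OK, 2 = ERROR
            })

    # Pass 3: wrap the spans in the OTLP/JSON envelope.
    return {"resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "agent"}}]},
        "scopeSpans": [{"scope": {"name": "jsonl-sketch"}, "spans": spans}],
    }]}
```

Run against the three example events above, this produces one root span and two child spans, with the budget_denied child carrying an error status.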

The library assumes events within a session are ordered by timestamp. If your agent writes events out of order (async logging, clock skew), you must sort them before transformation. The library does not reorder events. It trusts the log.
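Sorting before transformation takes a few lines. The helper name sort_jsonl is an assumption. It relies on Python's sort being stable, so events with identical timestamps keep their original file order.

```python
import json


def sort_jsonl(text: str) -> str:
    """Return the same events ordered by ts; ties keep their file order."""
    events = [json.loads(line) for line in text.splitlines() if line.strip()]
    events.sort(key=lambda e: e["ts"])  # list.sort is stable
    return "".join(json.dumps(e) + "\n" for e in events)
```

This fixes ordering problems caused by async logging within one process. It does not fix clock skew between machines, because a skewed timestamp is still wrong after sorting.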

Handling Incomplete Traces

When an agent crashes mid-execution, the audit log ends abruptly. The last span has a start time but no end time. SDK-based instrumentation can flush partial spans on shutdown. Log-based observability cannot.

trace-to-otel handles this by treating missing end times as open spans. It assigns a synthetic end time equal to the last event timestamp in the session. This produces a valid OTLP payload, but the reported duration is only a lower bound: the span actually ran until the crash, which the log never recorded. In the trace UI, the crash shows up as a span that ends at the last logged event.

The alternative is to drop incomplete spans. This loses information. You know the agent started a tool call, but you do not see it in the trace. The library chooses to preserve partial data over silent omission.
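The synthetic end time can be sketched as a small pass over span dictionaries. The function name close_open_spans and the marker attribute trace.synthetic_end are hypothetical. The source does not say whether the library tags synthetic end times, but a marker like this keeps a reader from mistaking the value for a measured one.

```python
def close_open_spans(spans: list[dict], last_event_ns: int) -> list[dict]:
    """Give every span without an end time the session's last event timestamp."""
    closed = []
    for span in spans:
        if span.get("endTimeUnixNano") is None:
            marker = {"key": "trace.synthetic_end", "value": {"boolValue": True}}
            span = {
                **span,  # copy, so the caller's span is left untouched
                "endTimeUnixNano": str(last_event_ns),
                "attributes": span.get("attributes", []) + [marker],
            }
        closed.append(span)
    return closed
```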

Clock Skew and Distributed Traces

If your agent runs across multiple machines, each machine writes its own audit log with its own clock. When you merge logs from three machines, timestamps may be out of order. A tool call on machine B might have a timestamp earlier than the session_open on machine A.

The library does not solve this. It assumes you have already merged and sorted the logs. If you have not, the span tree will be wrong. Parent spans will appear to start after their children. The trace UI will render a broken hierarchy.

NTP helps. Logging the clock source helps. Sorting by a Lamport clock or vector clock helps. But the library does not enforce any of this. It trusts the log.
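Merging per-machine logs and checking the result for skew can be done before the library ever sees the data. The sketch below assumes each input log is already sorted by ts. The names merge_logs and find_skewed_sessions are made up for illustration. The check only flags the specific symptom described above, an event stamped earlier than its own session_open.

```python
import heapq
import json
from typing import Iterable, Iterator


def _parse(log: Iterable[str]) -> Iterator[dict]:
    for line in log:
        if line.strip():
            yield json.loads(line)


def merge_logs(*logs: Iterable[str]) -> Iterator[dict]:
    """Merge logs that are each sorted by ts into one ts-ordered stream."""
    return heapq.merge(*(_parse(log) for log in logs), key=lambda e: e["ts"])


def find_skewed_sessions(events: Iterable[dict]) -> set[str]:
    """Return session IDs where some event is stamped before session_open."""
    events = list(events)
    opens = {e["session_id"]: e["ts"] for e in events if e["kind"] == "session_open"}
    return {
        e["session_id"]
        for e in events
        if e["session_id"] in opens and e["ts"] < opens[e["session_id"]]
    }
```

A non-empty result means the merged order cannot be trusted for those sessions, and the timestamps need correcting before transformation.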

Five-Line Integration

Once you have the library installed, you pipe your JSONL file through the transformer and POST the result to your collector.

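from trace_to_otel import jsonl_to_otlp
import requests

with open("agent_run.jsonl") as f:
    otlp_payload = jsonl_to_otlp(f.read())

# The URL is an assumption: 4318 is the default OTLP/HTTP port. Substitute your collector's endpoint.
requests.post("http://localhost:4318/v1/traces", json=otlp_payload, timeout=10).raise_for_status()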

The jsonl_to_otlp function returns a dictionary that matches the OTLP/JSON schema. You can serialize it to JSON, write it to a file, or POST it directly. The collector does not care whether the spans came from a live SDK or a batch transformation. It only cares about the wire format.

Companion Tools

The library ships with two CLI tools:

  • trace-tree: Renders the span hierarchy as ASCII art in the terminal. Useful for quick inspection of a single run.
  • trace-filter: Filters events by predicate (e.g., “show me all tool calls that cost more than $5”). Useful for narrowing down large logs before transformation.

Both tools operate on JSONL. They do not require the collector. They do not require the SDK. They are Unix-style filters that read stdin and write stdout.
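The same filter style is easy to reproduce in Python when a predicate needs custom logic. This is not the implementation of trace-filter, and it does not use its command-line flags, which the source does not document. The names filter_events, costly, and main are assumptions.

```python
import json
import sys
from typing import Callable, Iterable, Iterator


def filter_events(lines: Iterable[str], keep: Callable[[dict], bool]) -> Iterator[str]:
    """Yield the JSONL lines whose parsed event satisfies the predicate."""
    for line in lines:
        if line.strip() and keep(json.loads(line)):
            yield line if line.endswith("\n") else line + "\n"


def costly(event: dict) -> bool:
    # Example predicate: events that cost more than $5.
    return event.get("usd", 0) > 5


def main() -> None:
    # Unix-style: read events on stdin, write the matching ones to stdout.
    sys.stdout.writelines(filter_events(sys.stdin, costly))
```

Calling main() from a script turns it into a filter that can sit in a shell pipeline ahead of the transformation step.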

Trade-Offs: SDK vs. Log Transformation

| Dimension | SDK Instrumentation | Log Transformation |
|---|---|---|
| Runtime overhead | Adds memory and CPU for span buffering | Zero runtime cost, all work happens post-hoc |
| Real-time visibility | Spans available immediately | Spans available after log processing |
| Code coupling | Agent code imports observability library | Agent code writes plain JSONL |
| Partial traces | Can flush on shutdown or crash | Requires synthetic end times for incomplete spans |
| Clock skew | Collector handles clock drift | Must sort logs before transformation |
| Vendor lock-in | SDK may tie you to a specific backend | OTLP is vendor-neutral, but you still need a collector |

If you need live dashboards and alerting, use the SDK. If you need post-mortem analysis and you want to keep your agent code simple, use log transformation.

Failure Modes

Out-of-order events: If your agent writes events asynchronously and they land in the log out of order, the span tree will be wrong. Sort the log by timestamp before transformation.

Missing session_close: If the agent crashes before writing a session_close event, the root span will have a synthetic end time. The trace UI will show the session ending at the last logged event, not the actual crash time.

Schema drift: If you add new event kinds or change the JSONL schema, the transformation logic must be updated. The library does not auto-detect schema changes. It will fail loudly on unknown event kinds.
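A pre-flight check can surface unknown kinds before the transformation fails. The set below contains only the kinds named in this article and is an assumption. The helper name unknown_kinds is also made up.

```python
import json
from collections import Counter
from typing import Iterable

# Assumption: only the kinds this article mentions. A real deployment keeps its own list.
KNOWN_KINDS = {"session_open", "session_close", "tool_ok", "budget_denied"}


def unknown_kinds(lines: Iterable[str]) -> Counter:
    """Count event kinds that are not in KNOWN_KINDS."""
    kinds = (json.loads(line).get("kind") for line in lines if line.strip())
    return Counter(k for k in kinds if k not in KNOWN_KINDS)
```

An empty counter means every event kind in the log is one the check recognizes.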

Large logs: The library loads the entire JSONL file into memory, so a 10 GB log will exhaust memory on most machines. Stream processing is not supported. Split large logs into smaller files before transformation.
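Splitting by session is the natural cut, because a trace never spans two sessions. The sketch below streams the input one line at a time, so memory use stays flat. The name split_by_session is an assumption. It also assumes session IDs are safe to use as file names and that the number of sessions stays below the operating system's open-file limit.

```python
import json
from contextlib import ExitStack
from pathlib import Path
from typing import Iterable


def split_by_session(lines: Iterable[str], out_dir: Path) -> list[Path]:
    """Write each session's events to out_dir/<session_id>.jsonl."""
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    with ExitStack() as stack:  # closes every output file on exit
        for line in lines:
            if not line.strip():
                continue
            sid = json.loads(line)["session_id"]
            if sid not in handles:
                path = out_dir / f"{sid}.jsonl"
                handles[sid] = stack.enter_context(path.open("w"))
            handles[sid].write(line if line.endswith("\n") else line + "\n")
    return sorted(out_dir / f"{sid}.jsonl" for sid in handles)
```

Passing an open file object as lines keeps the read side streaming as well, since Python iterates files line by line.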

Technical Verdict

Use log-based observability when:

  • You have existing agent runs captured as JSONL and you want them in a trace UI.
  • You want to keep your agent code free of observability dependencies.
  • You are okay with post-hoc analysis instead of real-time dashboards.
  • Your logs fit in memory (under 1 GB per file).

Avoid it when:

  • You need live alerting on agent behavior.
  • Your agent runs across multiple machines with unsynchronized clocks.
  • Your logs are too large to load into memory.
  • You need sub-millisecond precision on span timings.

The library is a bridge, not a replacement for SDK instrumentation. It gives you a way to extract value from logs you already have without retrofitting observability into code you already wrote.
