Zero-Dependency Agent Observability: JSONL Audit Logs to OTLP Spans Without an SDK

Most agent frameworks emit structured logs. Most observability platforms want OpenTelemetry spans. The standard path is to instrument your agent with the OTel SDK, configure a tracer provider, wire up a batch exporter, and point it at a collector. That works for new runs. It does not work when you have thirty JSONL audit logs from last week and you want them in Datadog without re-executing anything.

The trace-to-otel library solves this by converting JSONL events directly into OTLP/JSON payloads. No SDK. No runtime instrumentation. No dependency tree. You read a file, you get a payload, you POST it to a collector.

The Problem with Post-Hoc Traces

Agent audit logs capture everything: tool calls, costs, errors, blocked egress, latency. They look like this:

{"ts": 1779638601.262, "session_id": "abc12", "kind": "session_open"}

{"ts": 1779638601.265, "session_id": "abc12", "kind": "tool_ok", "tool": "locus.payments.charge", "usd": 4.99, "extra": {"latency_ms": 12}}

{"ts": 1779638601.266, "session_id": "abc12", "kind": "budget_denied", "tool": "locus.payments.charge", "usd": 7.0, "error": "budget exceeded: 11.99 > 10"}

You can read these in the terminal with trace-tree. That works for one run. Once you have thirty runs, you want a real tracing UI with search, grouping, and percentile queries.

The OTel SDK expects to be present during execution. It creates spans as your code runs. If you already have logs, you need a different path: a converter that understands your log schema and produces the wire format collectors expect.

Architecture: JSONL to OTLP/JSON

The library performs a schema mapping from agent audit events to OpenTelemetry spans. Here is what it does:

Session ID becomes Trace ID: Each session_id maps to a single distributed trace.
Event kind becomes Span name: tool_ok, budget_denied, session_open become span operation names.
Timestamp becomes Span start/end: Single-event spans use the same timestamp for both. Multi-event spans (like tool calls with start and finish events) use the event pair.
Extra fields become Span attributes: tool, usd, latency_ms, error all become key-value attributes.
Parent relationships inferred from nesting: If your JSONL includes a parent_id or lane structure, the library builds the span tree.
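
The mapping above can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper names (event_to_span, _hex_id, _attr), the SHA-256 ID derivation, and the choice to flatten extra into top-level attributes are all assumptions. OTLP requires a 16-byte trace ID and an 8-byte span ID, so a short session_id such as abc12 has to be expanded somehow; hashing is one option.

```python
import hashlib
from decimal import Decimal

# Fields consumed by the mapping itself rather than copied into attributes.
RESERVED = {"ts", "session_id", "kind", "extra"}


def _hex_id(seed: str, n_bytes: int) -> str:
    """Derive a fixed-width hex ID from a seed (assumed scheme, for illustration)."""
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[: n_bytes * 2]


def _attr(key: str, value) -> dict:
    """Encode one key/value pair as an OTLP/JSON attribute."""
    if isinstance(value, bool):
        any_value = {"boolValue": value}
    elif isinstance(value, int):
        any_value = {"intValue": str(value)}  # 64-bit ints are strings in OTLP/JSON
    elif isinstance(value, float):
        any_value = {"doubleValue": value}
    else:
        any_value = {"stringValue": str(value)}
    return {"key": key, "value": any_value}


def event_to_span(event: dict, index: int) -> dict:
    """Map one audit event to a zero-duration OTLP/JSON span."""
    # Decimal avoids float rounding when scaling epoch seconds to nanoseconds.
    nanos = str(int(Decimal(str(event["ts"])) * 1_000_000_000))
    fields = {k: v for k, v in event.items() if k not in RESERVED}
    fields.update(event.get("extra", {}))
    span = {
        "traceId": _hex_id(event["session_id"], 16),
        "spanId": _hex_id(f'{event["session_id"]}:{index}', 8),
        "name": event["kind"],
        "kind": 1,  # SPAN_KIND_INTERNAL
        "startTimeUnixNano": nanos,
        "endTimeUnixNano": nanos,
        "attributes": [_attr(k, v) for k, v in fields.items()],
    }
    if "error" in event:
        span["status"] = {"code": 2, "message": str(event["error"])}  # STATUS_CODE_ERROR
    return span
```

Because both IDs are derived from the session ID and the event's position in the file, converting the same log twice yields the same IDs, which matters later for replay.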

The output is OTLP/JSON, the JSON encoding of the wire format defined in the OpenTelemetry protocol specification. Any collector that accepts POST /v1/traces with a JSON body will ingest it. The library has been tested with the OpenTelemetry Collector. Other backends such as Datadog, Grafana, Jaeger, and Honeycomb should work wherever their ingest endpoint accepts OTLP/JSON over HTTP.
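
For orientation, here is a rough sketch of the envelope a /v1/traces endpoint expects around a list of spans, plus the HTTP request that would carry it. The function names, the service name, and the scope name are placeholders chosen for this example, and http://localhost:4318 is simply the conventional OTLP/HTTP port on a local collector; the library's actual output may carry more fields.

```python
import json
import urllib.request


def wrap_spans(spans: list, service_name: str = "agent-audit-backfill") -> dict:
    """Wrap span dicts in the resourceSpans/scopeSpans envelope used by OTLP/JSON."""
    return {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": service_name}}
                    ]
                },
                "scopeSpans": [
                    {"scope": {"name": "jsonl-backfill-sketch"}, "spans": spans}
                ],
            }
        ]
    }


def build_request(payload: dict, endpoint: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for an OTLP/HTTP traces endpoint."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a single call to urllib.request.urlopen(build_request(payload, "http://localhost:4318/v1/traces")), which raises on a non-2xx response.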

Implementation: Five Lines

from trace_to_otel import convert_jsonl_to_otlp
import requests

with open("agent_run.jsonl") as f:
    otlp_payload = convert_jsonl_to_otlp(f)

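# raise_for_status() turns a 4xx/5xx reply into an exception instead of a silent failure
requests.post("https://collector.example.com/v1/traces", json=otlp_payload, timeout=10).raise_for_status()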

The library reads the JSONL stream, builds an in-memory span tree, and serializes it to OTLP/JSON. No background threads. No sampling decisions. No retry logic. You control when and how the POST happens.

If you want batching, you batch the files. If you want retries, you wrap the POST in your own retry loop. If you want filtering, you pipe through trace-filter first.
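
As one way to "wrap the POST in your own retry loop", the sketch below uses only the standard library. The names post_json and with_retries, the attempt count, and the backoff values are illustrative choices, not part of trace-to-otel. It retries network failures, HTTP 429, and 5xx responses with exponential backoff, and gives up immediately on other 4xx responses, since resending a payload the collector has rejected will not help.

```python
import json
import time
import urllib.error
import urllib.request


def post_json(url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST a JSON payload and return the HTTP status; raises on non-2xx."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def with_retries(send, attempts: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Call send() until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except urllib.error.HTTPError as exc:
            # Only throttling and server-side errors are worth retrying.
            retryable = exc.code == 429 or exc.code >= 500
            if not retryable or attempt == attempts - 1:
                raise
        except (urllib.error.URLError, TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise
        sleep(base_delay * 2 ** attempt)
```

Usage would look like with_retries(lambda: post_json("https://collector.example.com/v1/traces", otlp_payload)).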

Companion Tools

The library ships with two CLI utilities:

trace-filter: Filters logs with 5000+ events using composable predicates. Example: trace-filter --kind tool_ok --tool locus.payments.charge < run.jsonl returns only the successful calls to locus.payments.charge.
trace-tree: Renders 800-line logs as an ASCII tree. Useful for local debugging before you push to a collector.

These tools operate on JSONL, not OTLP. You filter and inspect locally, then convert and push only what you need.
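
To make "composable predicates" concrete, here is a small Python sketch of the idea. It is not how trace-filter is implemented; field_equals, all_of, and filter_events are names invented for this example. Each predicate is a function from an event to a boolean, and combinators build larger predicates from smaller ones.

```python
import json
from typing import Callable, Iterable, Iterator

Predicate = Callable[[dict], bool]


def field_equals(name: str, expected) -> Predicate:
    """Predicate that is true when event[name] equals the expected value."""
    return lambda event: event.get(name) == expected


def all_of(*predicates: Predicate) -> Predicate:
    """Predicate that is true only when every given predicate is true."""
    return lambda event: all(p(event) for p in predicates)


def filter_events(lines: Iterable[str], predicate: Predicate) -> Iterator[dict]:
    """Yield parsed JSONL events that satisfy the predicate, skipping blank lines."""
    for line in lines:
        text = line.strip()
        if not text:
            continue
        event = json.loads(text)
        if predicate(event):
            yield event
```

The CLI example above corresponds to all_of(field_equals("kind", "tool_ok"), field_equals("tool", "locus.payments.charge")).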

Failure Modes

Failure	Cause	Mitigation
Silent drop	Collector rejects malformed OTLP and the response goes unchecked	Validate the payload before POST and check the HTTP status
Missing spans	JSONL missing `session_id` or `ts`	Add schema validation to log emission
Broken parent links	No `parent_id` or lane field	Emit explicit parent references in logs
Timestamp skew	Agent clock drift	Sync clocks with NTP; use a monotonic clock for durations only, since span timestamps must be epoch times
Attribute truncation	Collector limits span attribute size	Truncate long strings before conversion
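
The "Missing spans" row can be caught before conversion with a short pre-flight check. The sketch below is an assumption about what such a check might look like: validate_lines and REQUIRED_FIELDS are hypothetical names, and the required set is limited to the two fields the table mentions.

```python
import json

# Fields the table above names as required for a span to be produced.
REQUIRED_FIELDS = ("session_id", "ts")


def validate_lines(lines):
    """Split JSONL lines into (valid_events, problems).

    problems is a list of (line_number, message) tuples, 1-indexed.
    """
    valid, problems = [], []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are harmless
        try:
            event = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((line_no, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(event, dict):
            problems.append((line_no, "not a JSON object"))
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in event]
        if missing:
            problems.append((line_no, "missing " + ", ".join(missing)))
        elif isinstance(event["ts"], bool) or not isinstance(event["ts"], (int, float)):
            problems.append((line_no, "ts is not a number"))
        else:
            valid.append(event)
    return valid, problems
```

Running this on each file and refusing to convert when problems is non-empty turns a quiet gap in the trace into an explicit error with a line number.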

The library does not retry. If the POST fails, you see the HTTP error. If the collector is down, you wait or write to a file and replay later. This is intentional: post-hoc ingestion does not need the same reliability guarantees as live instrumentation.

Why Avoid the OTel SDK

The OpenTelemetry SDK can pull in 50+ dependencies once exporters and instrumentation packages are included. It brings tracer providers, span processors, batch exporters, context propagation, sampling logic, and resource detection.

If you are instrumenting a production agent, you want all of this. If you are converting old logs, you want none of it.

The SDK also makes decisions for you: sampling rates, batch sizes, export intervals. When you convert logs, you already have the data. You decide what to send and when. You do not need a background thread flushing spans every five seconds.

Package size matters in Lambda or edge deployments. The OTel SDK can add 10+ MB to your deployment package. A zero-dependency converter adds on the order of 50 KB.

Trade-Offs

Approach	Pros	Cons
OTel SDK	Live spans, automatic context propagation, vendor integrations	Heavy dependency tree, runtime overhead, sampling decisions baked in
JSONL-to-OTLP	Zero dependencies, post-hoc conversion, full control over ingestion	No live spans, manual parent linking, no automatic resource detection

If you are building a new agent, use the SDK. If you are backfilling traces from logs, use the converter.

Observability Boundaries

This approach separates log emission from trace ingestion. Your agent writes JSONL to disk or stdout. A separate process reads the logs, converts them, and pushes to a collector. This separation has benefits:

Agent stays simple: No network calls, no retry logic, no collector configuration.
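Ingestion can be idempotent: You can replay the same log file multiple times without duplicating spans, provided the IDs are derived deterministically from the log and you or your backend deduplicate on trace ID (per session) or on trace ID plus span ID (per span).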
Backpressure is explicit: If the collector is slow, the conversion process blocks. The agent does not.
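
One way to get replay safety on the client side is a small ledger of spans already sent. The sketch below assumes span dicts carry traceId and spanId as in OTLP/JSON and that those IDs are stable across conversions of the same file; select_unsent and mark_sent are hypothetical helpers, and persisting the ledger between runs is left to the caller.

```python
def select_unsent(spans, sent):
    """Return spans whose (traceId, spanId) pair is not in the ledger.

    Duplicates inside the batch are dropped too. The ledger is not modified,
    so a failed POST leaves it untouched.
    """
    seen = set()
    fresh = []
    for span in spans:
        key = (span["traceId"], span["spanId"])
        if key in sent or key in seen:
            continue
        seen.add(key)
        fresh.append(span)
    return fresh


def mark_sent(spans, sent):
    """Record spans in the ledger; call this only after the POST succeeded."""
    for span in spans:
        sent.add((span["traceId"], span["spanId"]))
```

Separating selection from marking keeps the ledger honest: a span is recorded only once the collector has accepted it.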

The downside is latency. You do not see spans in your tracing UI until you run the conversion. For live debugging, this is a problem. For batch analysis, it is fine.

Security Considerations

The library does not redact sensitive data. If your JSONL contains API keys, PII, or secrets, they will appear as span attributes in your tracing backend. You have three options:

Filter before conversion: Use trace-filter to remove sensitive events.
Redact during conversion: Fork the library and add a redaction pass.
Rely on collector redaction: Configure your OTel Collector to scrub attributes before forwarding.

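Option three is cleanest when the collector runs inside your trust boundary, because the raw values still travel from the converter to the collector. The collector already has redaction processors. Use them.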
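
If sensitive values must not leave the host at all, a redaction pass over the parsed events before conversion is another route that avoids forking the library. The sketch below is illustrative only: the key pattern, the bearer-token pattern, and the redact function are assumptions, and a real deployment would need patterns matched to its own data.

```python
import re

# Keys whose values are always replaced (hypothetical list; extend for your logs).
SENSITIVE_KEY = re.compile(r"(api[_-]?key|token|secret|password|authorization)", re.IGNORECASE)
# Bearer tokens embedded in free-text values such as error messages.
BEARER_TOKEN = re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+")

MASK = "[REDACTED]"


def redact(value, key: str = ""):
    """Return a redacted copy of an event (or nested value); the input is not mutated."""
    if key and SENSITIVE_KEY.search(key):
        return MASK
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item, key) for item in value]
    if isinstance(value, str):
        return BEARER_TOKEN.sub(MASK, value)
    return value
```

Each event would pass through redact() after json.loads and be written back out with json.dumps before the file is handed to the converter.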

Technical Verdict

Use it if:

You have 100+ historical agent runs to backfill into a tracing UI
Your agent already emits structured JSONL with session IDs and timestamps
You deploy to Lambda or edge environments where deployment package size matters
You want full control over trace ingestion timing and batching
You need to filter or redact logs before they reach the collector

Avoid it if:

You need sub-second trace visibility for live debugging
You require automatic context propagation across service boundaries
Your logs lack session IDs, parent references, or consistent timestamps
You want vendor-specific integrations (like AWS X-Ray native format)
You need the OTel SDK’s sampling, resource detection, or batch optimization

The five-line implementation is literal. The hard part is not the conversion. The hard part is deciding what schema your logs should follow so the conversion produces useful spans. If your JSONL is already structured with session IDs, timestamps, and parent references, this library is a direct path to production observability.
