mech.app

Trigger.dev V2: Durable Execution for TypeScript Without Temporal's Operational Overhead

How Trigger.dev pivoted from webhooks to durable workflows, exposing the state persistence and resumption primitives TypeScript agents need.

Source: trigger.dev

Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” (745 points on Show HN, item 34610686). By October, the team had pivoted hard: V2 became a “Temporal alternative for TypeScript devs” (172 points, item 37750763). That shift from event triggers to durable workflows exposes a real infrastructure gap. Building agent workflows that survive restarts, handle partial failures, and resume mid-execution requires primitives that neither webhooks nor traditional job queues provide.

The pivot happened because user feedback revealed a consistent demand for long-running tasks that don’t die when a container restarts or an API call times out. Temporal solves this with a Go-based server cluster and an event-sourcing architecture that is demanding to operate. Trigger.dev’s bet is that TypeScript developers need the same guarantees without the operational complexity.

What Durable Execution Actually Means

Durable execution is not just retry logic. It’s the ability to pause a running function, persist its state, and resume from the exact same point after a failure or timeout. This matters for agent workflows because:

  • LLM calls can take 30+ seconds. Serverless platforms time out. Containers get recycled.
  • Multi-step reasoning requires state. An agent that searches, analyzes, and summarizes needs to remember what it already did.
  • Partial failures are normal. If step 3 of 10 fails, you don’t want to re-run steps 1 and 2.

Temporal achieves this by recording each workflow event (activity results, timers, signals) in a durable history. When a worker crashes, a new worker replays that history and reconstructs the exact execution state. The trade-off: you need to run Temporal Server (Cassandra or PostgreSQL, plus multiple services), keep workflow code deterministic so that replay stays valid, and understand event sourcing.
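
The replay idea is easier to see in a toy form. The sketch below is not Temporal's API: ReplayContext, ActivityEvent, and runWorkflow are hypothetical names, and the log is an in-memory array rather than a database. It only illustrates how a second worker can skip work whose result is already recorded.

```typescript
// Toy illustration of replay from an event log. NOT Temporal's API:
// ActivityEvent, ReplayContext and runWorkflow are hypothetical names.
type ActivityEvent = { name: string; result: unknown };

class ReplayContext {
  private log: ActivityEvent[];
  private cursor = 0;
  executed: string[] = []; // activities actually run by this worker

  constructor(log: ActivityEvent[]) {
    this.log = log;
  }

  // Return the recorded result if the log has one at this position;
  // otherwise run the activity and append its result to the log.
  async activity<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const recorded = this.log[this.cursor];
    if (recorded !== undefined) {
      if (recorded.name !== name) {
        // Replay only works if the code takes the same path every time.
        throw new Error(`Non-deterministic replay: expected ${recorded.name}, got ${name}`);
      }
      this.cursor++;
      return recorded.result as T;
    }
    const result = await fn();
    this.log.push({ name, result });
    this.cursor++;
    this.executed.push(name);
    return result;
  }
}

// A three-step workflow; `failAt` simulates a worker crash inside a step.
async function runWorkflow(ctx: ReplayContext, failAt?: string): Promise<string> {
  const hits = await ctx.activity("search", async () => ["a", "b"]);
  const notes = await ctx.activity("analyze", async () => {
    if (failAt === "analyze") throw new Error("worker crashed");
    return hits.join("+");
  });
  return ctx.activity("summarize", async () => `summary(${notes})`);
}
```

If a first worker crashes inside "analyze", a second ReplayContext built over the same log returns the recorded "search" result without running it again and executes only the remaining two steps. The name check is a simplified stand-in for the determinism requirement.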

Trigger.dev’s Architecture Choices

Trigger.dev V2 takes a different approach. Instead of event sourcing, it uses checkpoint-based resumption. Here’s how it works:

  1. Task definition. You write a TypeScript function and wrap it in task(). The runtime instruments your code to detect async boundaries (await points).
  2. Execution tracking. Each task run gets a unique ID. The platform stores execution state in Postgres after every await.
  3. Resumption. If a task crashes or times out, the runtime fetches the last checkpoint and resumes from the next await point.
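
The three steps above can be sketched with an in-memory store. This is a minimal illustration under stated assumptions, not Trigger.dev's implementation: step(), CheckpointStore, and research() are hypothetical names, and a Map stands in for the Postgres table the article describes.

```typescript
// Minimal sketch of checkpoint-based resumption. step() and CheckpointStore
// are hypothetical names, not Trigger.dev APIs; a Map stands in for Postgres.
type CheckpointStore = Map<string, string>; // `${runId}:${stepName}` -> JSON

async function step<T>(
  store: CheckpointStore,
  runId: string,
  name: string,
  fn: () => Promise<T>,
): Promise<T> {
  const key = `${runId}:${name}`;
  const saved = store.get(key);
  if (saved !== undefined) {
    return JSON.parse(saved) as T; // completed on an earlier attempt: skip it
  }
  const result = await fn();
  store.set(key, JSON.stringify(result)); // checkpoint once the await settles
  return result;
}

// Counters so the number of real executions is observable.
const calls = { search: 0, analyze: 0 };

async function research(store: CheckpointStore, runId: string, crash: boolean): Promise<string> {
  const results = await step(store, runId, "search", async () => {
    calls.search++;
    return ["doc-1", "doc-2"];
  });
  return step(store, runId, "analyze", async () => {
    calls.analyze++;
    if (crash) throw new Error("simulated container restart");
    return `analyzed ${results.length} documents`;
  });
}
```

Running research() once with crash set to true and then again with the same run ID and crash set to false executes the search step only once; the second attempt reads the search result from the store and goes straight to the analyze step.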

This is simpler than event sourcing but has limits. You can’t replay arbitrary code paths. You can only resume from explicit checkpoints (await statements). For agent workflows, this is usually fine because every LLM call, tool invocation, or database query is already async.

State Persistence Model

Trigger.dev persists three things:

  • Task metadata (ID, status, retry count, scheduled time)
  • Checkpoint state (serialized function arguments and local variables at each await)
  • Output artifacts (return values, logs, errors)

The checkpoint state is JSON-serialized. This means you can’t checkpoint closures, class instances with methods, or circular references. You can checkpoint plain objects, arrays, primitives, and serializable data structures. For agent workflows, this usually means storing conversation history, tool results, and intermediate outputs.
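
To make the serialization limits concrete, here is a short sketch that uses only the standard JSON.stringify and JSON.parse. The Conversation class and the sample values are made up for illustration. Note that with plain JSON, methods and function-valued properties are dropped silently rather than raising an error, while circular references throw.

```typescript
// What survives a JSON round trip, using only standard JSON.stringify/parse.
// Conversation and the sample values are illustrative.
class Conversation {
  history: string[] = ["hello"];
  lastMessage(): string {
    return this.history[this.history.length - 1] ?? "";
  }
}

function roundTrip(value: unknown): unknown {
  return JSON.parse(JSON.stringify(value));
}

// Plain objects, arrays and primitives survive intact.
const plain = roundTrip({ history: ["hello"], toolResults: [{ ok: true }] });

// A class instance comes back as a plain object: data kept, methods gone.
const revived = roundTrip(new Conversation()) as Record<string, unknown>;
const hasMethod = typeof revived["lastMessage"] === "function";

// Function-valued properties (closures) are silently dropped.
const withoutClosure = roundTrip({ id: 1, onDone: () => "done" });

// Circular references make JSON.stringify throw a TypeError.
const node: { self?: unknown } = {};
node.self = node;
let circularError = "";
try {
  JSON.stringify(node);
} catch (err) {
  circularError = err instanceof TypeError ? "TypeError" : "other";
}
```

The silent cases are the dangerous ones: a resumed task would see an object that looks right but no longer has the method it expects to call.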

Retry and Timeout Handling

Trigger.dev gives you per-task retry policies:

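// task() is exported from the SDK's v3 entry point.
import { task } from "@trigger.dev/sdk/v3";

// search() and analyze() stand in for application code defined elsewhere.
export const researchAgent = task({
  id: "research-agent",
  retry: {
    maxAttempts: 3,        // maximum number of attempts
    factor: 2,             // exponential backoff multiplier
    minTimeoutInMs: 1000,  // shortest delay between attempts
    maxTimeoutInMs: 10000, // longest delay between attempts
  },
  run: async ({ topic }: { topic: string }) => {
    // Task logic with automatic checkpointing at each await
    const searchResults = await search(topic);
    const analysis = await analyze(searchResults);
    return analysis;
  },
});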

If search() fails, the task retries from the beginning. If analyze() fails, the task resumes from the checkpoint after search() completes. You don’t re-run the search.

Timeouts work the same way. If a task exceeds its timeout, it gets killed and resumed from the last checkpoint. This is critical for agent workflows where a single LLM call might hang indefinitely.
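
As a worked example of what such a retry policy implies, the sketch below computes the wait before each retry. It assumes the conventional exponential-backoff formula (minimum delay times factor raised to the retry index, capped at the maximum, no jitter) and assumes maxAttempts counts the first attempt. BackoffPolicy, retryDelays, minMs, and maxMs are hypothetical names; the platform's exact schedule may differ.

```typescript
// Hypothetical helper showing conventional exponential backoff.
// Assumptions: delay = minMs * factor^(retry - 1), capped at maxMs, no
// jitter, and maxAttempts includes the first (non-retry) attempt.
interface BackoffPolicy {
  maxAttempts: number;
  factor: number;
  minMs: number;
  maxMs: number;
}

function retryDelays(policy: BackoffPolicy): number[] {
  const delays: number[] = [];
  // With N attempts in total there are N - 1 retries.
  for (let retry = 1; retry < policy.maxAttempts; retry++) {
    const raw = policy.minMs * Math.pow(policy.factor, retry - 1);
    delays.push(Math.min(policy.maxMs, raw));
  }
  return delays;
}
```

Under these assumptions, three attempts with a factor of 2, a 1000 ms minimum, and a 10000 ms cap give waits of 1000 ms and 2000 ms. Raising maxAttempts to 6 gives 1000, 2000, 4000, 8000, and then 10000 ms, because the fifth raw value (16000 ms) hits the cap.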

Deployment Shape

Trigger.dev runs as a managed platform. You deploy tasks by pushing code to their infrastructure. The runtime handles:

  • Worker provisioning. Tasks run in isolated containers. The platform auto-scales based on queue depth.
  • Queue management. Each task gets a queue. You can configure concurrency limits (for example, at most 10 concurrent runs of a given task) and priority.
  • Observability. Every task run generates traces, logs, and metrics. You get a dashboard showing execution timelines, retry counts, and failure modes.
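
What a per-task concurrency limit does can be sketched in a few lines. This is an in-process toy, not how the platform schedules containers: ConcurrencyQueue and its members are hypothetical names, and a real system would coordinate through shared storage such as Redis.

```typescript
// Hypothetical in-process queue that runs at most `limit` jobs at once.
class ConcurrencyQueue {
  private limit: number;
  private active = 0;
  private waiting: Array<() => void> = [];
  peak = 0; // highest number of jobs observed running together

  constructor(limit: number) {
    this.limit = limit;
  }

  async run<T>(job: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Wait until a finishing job hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    this.peak = Math.max(this.peak, this.active);
    try {
      return await job();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter
      } else {
        this.active--; // nobody is waiting: free the slot
      }
    }
  }
}
```

Handing the slot directly to the next waiter, instead of decrementing and letting waiters race for it, keeps a newly submitted job from slipping in ahead of the queue and pushing the count past the limit.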

For self-hosting, Trigger.dev provides Docker images. You need:

  • Postgres (for state persistence)
  • Redis (for queue coordination)
  • A container runtime (Docker, Kubernetes, or similar)

The self-hosted version has the same checkpoint-based resumption but requires you to manage scaling, monitoring, and database backups.

Agent Workflow Example

Here’s a realistic agent workflow: research a topic, call multiple tools, and synthesize results. This example is adapted from Trigger.dev’s official documentation:

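import { task } from "@trigger.dev/sdk/v3";
import { generateText, type CoreMessage } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// search, browse, analyze (AI SDK tool definitions) and executeTool() are
// application code assumed to be defined elsewhere.
export const researchAgent = task({
  id: "research-agent",
  run: async ({ topic }: { topic: string }) => {
    const messages: CoreMessage[] = [
      { role: "user", content: `Research: ${topic}` },
    ];

    for (let i = 0; i < 10; i++) {
      const { text, toolCalls, steps, response } = await generateText({
        model: anthropic("claude-opus-4-20250514"),
        system: "You are a research assistant with web access.",
        messages,
        tools: { search, browse, analyze },
        maxSteps: 5,
      });

      // No tool calls means the model produced its final answer.
      if (!toolCalls.length) {
        return { summary: text, stepsUsed: steps.length };
      }

      // Keep the assistant turn that requested the tools (AI SDK 4.x field).
      messages.push(...response.messages);

      for (const call of toolCalls) {
        const result = await executeTool(call);
        // A tool message must reference the call it answers.
        messages.push({
          role: "tool",
          content: [
            { type: "tool-result", toolCallId: call.toolCallId, toolName: call.toolName, result },
          ],
        });
      }
    }

    // The loop budget ran out without a final answer.
    throw new Error("Agent did not finish within 10 iterations");
  },
});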

Each await is a checkpoint. If the task crashes after the first generateText() call, it resumes with the same messages array. If a tool call fails, the retry logic kicks in. If the entire loop times out, the task resumes from the last successful checkpoint.

This is the plumbing that makes agent workflows durable. Without it, you’d need to manually persist state, track retries, and handle partial failures.

Trade-Offs vs. Temporal

| Dimension | Temporal | Trigger.dev V2 |
|---|---|---|
| Language support | Go, Java, TypeScript, Python, and others | TypeScript only |
| State model | Event sourcing (full replay) | Checkpoint-based (resume from await) |
| Operational complexity | High (Cassandra/Postgres + multiple services) | Low (managed) or medium (self-hosted) |
| Resumption granularity | Any code path | Only at async boundaries |
| Ecosystem maturity | Production-proven at Stripe, Netflix, and others | Newer, smaller user base |
| Vendor lock-in | Self-hostable, open protocol | Managed platform or self-hosted Docker |

Temporal gives you more control and flexibility. You can replay arbitrary code paths, write workflows in several languages, and deploy on your own infrastructure without vendor dependencies. The cost is operational complexity and a steeper learning curve.

Trigger.dev gives you faster time-to-production for TypeScript teams. You get durable execution without running a distributed database or learning event sourcing. The cost is less flexibility (TypeScript only, checkpoint-based resumption) and potential vendor lock-in if you use the managed platform.

Failure Modes

Understanding these limitations is essential for production deployments. Checkpoint-based resumption has predictable failure modes:

  1. Non-serializable state. If you checkpoint a closure or class instance, serialization fails. The task crashes. Solution: only checkpoint plain data.
  2. Checkpoint bloat. If you checkpoint large objects (e.g., a 10MB conversation history), database writes slow down. Solution: store large artifacts externally (S3, blob storage) and checkpoint references.
  3. Replay divergence. If your code has side effects (e.g., incrementing a counter), replaying from a checkpoint can produce different results. Solution: make task logic idempotent or use external state stores.
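
The remedies for failure modes 2 and 3 can be sketched as follows. Everything here is hypothetical and in-memory: blobStore stands in for S3 or similar blob storage, appliedKeys for a durable record of side effects already applied, and putArtifact and sendOnce are illustrative helpers, not platform APIs.

```typescript
// Hypothetical in-memory stand-ins for a blob store and a side-effect ledger.
const blobStore = new Map<string, string>();
const appliedKeys = new Set<string>();
let emailsSent = 0;

// Failure mode 2: store the large payload externally and checkpoint only a
// small reference to it.
function putArtifact(runId: string, name: string, body: string): { ref: string; length: number } {
  const ref = `${runId}/${name}`;
  blobStore.set(ref, body);
  return { ref, length: body.length };
}

// Failure mode 3: guard a side effect with an idempotency key so that
// re-running the step after a resume does not repeat it.
function sendOnce(idempotencyKey: string): boolean {
  if (appliedKeys.has(idempotencyKey)) {
    return false; // already applied on an earlier attempt
  }
  appliedKeys.add(idempotencyKey);
  emailsSent++; // placeholder for the real side effect
  return true;
}
```

A key derived from the run ID and step name (for example "run-1:notify") stays stable across resumes, which is what makes the guard effective. In a real deployment the check and the write would need to be a single atomic operation in the external store.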

Temporal avoids these by replaying the entire event log, but that introduces different failure modes (event log corruption, replay performance, non-deterministic code).

Observability Primitives

Trigger.dev exposes:

  • Execution timeline. See every checkpoint, retry, and timeout in a visual trace.
  • Queue metrics. Track queue depth, concurrency, and throughput per task.
  • Error aggregation. Group failures by error type and task ID.

For agent workflows, this is critical. You need to see why a tool call failed, how many retries happened, and where the task spent most of its time. The dashboard shows this without requiring custom instrumentation.
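
Error aggregation of this kind amounts to counting failures per task and error type. The sketch below shows the idea; the RunFailure shape and groupFailures are assumptions for illustration, not the platform's data model.

```typescript
// Hypothetical failure record; the fields are assumptions for illustration.
type RunFailure = { taskId: string; errorType: string };

// Count failures per (task ID, error type) pair.
function groupFailures(failures: RunFailure[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const failure of failures) {
    const key = `${failure.taskId}:${failure.errorType}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Grouping on both fields separates, say, timeouts in one task from rate-limit errors in the same task, which usually call for different fixes.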

When to Use Trigger.dev

Use Trigger.dev if:

  • You’re building agent workflows in TypeScript and need durable execution.
  • You want managed infrastructure without running Temporal Server.
  • Your tasks have clear async boundaries (LLM calls, API requests, database queries).
  • You can tolerate checkpoint-based resumption instead of full event replay.

Avoid Trigger.dev if:

  • You need multi-language support (Go, Python, Java).
  • Your workflows require fine-grained replay (arbitrary code paths, not just await points).
  • You have strict data residency requirements and can’t use a managed platform.
  • You need the ecosystem maturity and production track record of Temporal.

Technical Verdict

Trigger.dev V2 solves a real problem: TypeScript developers building agent workflows need durable execution without the operational complexity of Temporal. The checkpoint-based model is simpler to reason about and faster to deploy. The trade-off is less flexibility and a smaller ecosystem.

For teams building agentic AI projects in TypeScript, Trigger.dev is a pragmatic choice. You get resumable workflows, automatic retries, and built-in observability without running a distributed database. For teams that need multi-language support or full event replay, Temporal remains the better option despite its complexity.

The pivot from “Zapier alternative” to “Temporal alternative” signals a market shift. Developers don’t just need event triggers. They need orchestration primitives that handle long-running, stateful, fault-tolerant workflows. Trigger.dev’s architecture reveals what those primitives look like in a TypeScript-first world.