Trigger.dev V2: What a Temporal Alternative for TypeScript Reveals About Durable Execution Plumbing

How Trigger.dev's pivot from Zapier-style webhooks to Temporal-style durable execution exposes retry semantics, state persistence, and TypeScript orches...

Source: trigger.dev

Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and earned 745 HN points. Eight months later, the team pivoted to V2 and repositioned as a “Temporal alternative for TypeScript.” That shift exposes a real infrastructure gap: event-driven webhook automation cannot handle long-running agent tasks that need retry semantics, state persistence, and durable execution.

The V2 announcement drew 172 points, but the architectural change matters more than the score. Developers building AI agents need workflows that survive process crashes, retry failed API calls with exponential backoff, and maintain state across hours or days. Temporal solves this with event sourcing and deterministic replay, but it means operating (or paying for) a Temporal cluster, running dedicated worker processes, and writing workflow code under determinism constraints, even with its TypeScript SDK. TypeScript shops face a choice: adopt that stack or build retry logic by hand.

Trigger.dev’s pivot reveals what durable execution plumbing looks like when optimized for TypeScript-native workflows instead of polyglot microservices.

Why Zapier-Style Automation Breaks for Agents

Zapier and similar tools chain webhooks with simple retry policies. A failed step either retries immediately or dies. State lives in external databases or gets passed as JSON payloads. This works for short-lived tasks like “new Stripe payment → send Slack message.”

Agent workflows break this model:

  • Long-running tasks: An agent researching a topic might call 10 APIs over 30 minutes. If the process crashes at step 7, you need to resume without re-executing steps 1-6.
  • Non-deterministic retries: LLM calls return different results on retry. You cannot replay a workflow by re-running the same code.
  • State explosion: Agent memory, tool call history, and intermediate results exceed what you can pass in webhook payloads.

Temporal handles this with event sourcing. Every workflow decision gets logged. On failure, the system replays the log to reconstruct state, then resumes. Deterministic execution means the same inputs produce the same outputs, so replay works.
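The replay idea can be illustrated with a toy history log. This is a minimal sketch, not Temporal's API: `ReplayContext`, `step`, and the step names are all hypothetical. A step that already has a recorded result returns it without running again, and a step name that does not match the log is treated as a determinism violation.

```typescript
// Toy event-sourced replay. All names are hypothetical; this is not Temporal's SDK.
type HistoryEvent = { step: string; result: unknown };

class ReplayContext {
  private cursor = 0;
  private history: HistoryEvent[];
  executed: string[] = []; // steps that actually ran during this execution

  constructor(history: HistoryEvent[]) {
    this.history = history;
  }

  // Return the recorded result if this step is already in the log;
  // otherwise run it and append the result to the log.
  step<T>(name: string, fn: () => T): T {
    const recorded = this.history[this.cursor];
    if (recorded !== undefined) {
      if (recorded.step !== name) {
        throw new Error(`Non-deterministic workflow: expected ${recorded.step}, got ${name}`);
      }
      this.cursor++;
      return recorded.result as T;
    }
    const result = fn();
    this.history.push({ step: name, result });
    this.cursor++;
    this.executed.push(name);
    return result;
  }
}

function priceWorkflow(ctx: ReplayContext): number {
  const price = ctx.step("fetch-price", () => 40);
  const tax = ctx.step("fetch-tax", () => 2);
  return price + tax;
}

// Simulate a crash after the first step: the log already holds its result.
const replayHistory: HistoryEvent[] = [{ step: "fetch-price", result: 40 }];
const replayCtx = new ReplayContext(replayHistory);
const replayTotal = priceWorkflow(replayCtx);
```

On the resumed run only `fetch-tax` executes; `fetch-price` is served from the log, which is why the workflow code must issue the same steps in the same order every time.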

TypeScript developers building agents need this durability but cannot always take on the operational weight of a Temporal deployment.

Trigger.dev’s Durable Execution Model

Trigger.dev V2 provides durable execution without requiring a new runtime. The architecture separates orchestration (task scheduling, retry logic, state persistence) from execution (your TypeScript code).

Core Components

Component | Responsibility | Failure Mode
Task definition | TypeScript function with retry config | Syntax errors caught at build time
Execution engine | Runs tasks in isolated containers | Container crash triggers retry from last checkpoint
State store | Postgres-backed task history | Database failure pauses new tasks, existing tasks resume after recovery
Retry coordinator | Exponential backoff, idempotency keys | Infinite retries require manual intervention
Observability layer | Real-time task logs and traces | Missing spans indicate dropped messages

Tasks run in ephemeral containers. If a container dies, the engine reads the last checkpoint from Postgres and resumes. Unlike Temporal’s deterministic replay, Trigger.dev uses explicit checkpoints. You mark progress points in your code:

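// Import paths assume the Trigger.dev SDK's v3 entry point and the Vercel AI SDK;
// adjust them to the versions you have installed.
import { task } from "@trigger.dev/sdk/v3";
import { generateText, type CoreMessage } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
// checkpoint(), executeTool(), and the search/browse/analyze tools are assumed to be
// defined elsewhere in the project; the article does not show where they come from.

export const researchAgent = task({
  id: "research-agent",
  run: async ({ topic }: { topic: string }) => {
    // Conversation history that accumulates across iterations
    const messages: CoreMessage[] = [
      { role: "user", content: `Research: ${topic}` }
    ];

    for (let i = 0; i < 10; i++) {
      // Checkpoint before expensive LLM call
      await checkpoint(`iteration-${i}`);

      const { text, toolCalls, steps } = await generateText({
        model: anthropic("claude-opus-4-20250514"),
        system: "You are a research assistant with web access.",
        messages,
        tools: { search, browse, analyze },
        maxSteps: 5,
      });

      // No tool calls means the model produced its final answer
      if (!toolCalls.length) {
        return { summary: text, stepsUsed: steps.length };
      }

      // Run each requested tool and feed the result back into the history
      for (const call of toolCalls) {
        const result = await executeTool(call);
        messages.push({ role: "tool", content: result });
      }
    }
  },
});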

The checkpoint() call persists the current messages array to Postgres. If the container crashes during generateText(), the next execution resumes from the last checkpoint with the same message history.

State Persistence Strategy

Trigger.dev uses Postgres as the durable state store. Each task execution gets a row with:

  • Task ID and execution attempt number
  • Checkpoint data (serialized JSON)
  • Retry count and next retry timestamp
  • Idempotency key for deduplication

When a task fails, the retry coordinator reads the checkpoint, increments the retry count, calculates the next attempt time using exponential backoff, and schedules a new container. The new container deserializes the checkpoint and resumes.
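The checkpoint-and-resume cycle described above can be sketched with an in-memory map standing in for the Postgres table. Everything here is an assumption made for illustration: the `TaskRow` shape, the helper names, and the simulated crash are hypothetical and do not reflect Trigger.dev's actual schema or API. Backoff timing is left out so the sketch stays focused on state recovery.

```typescript
// In-memory stand-in for the Postgres row described above (hypothetical shape).
interface TaskRow {
  taskId: string;
  attempt: number;
  checkpoint: string | null; // serialized JSON snapshot
}

interface AgentState {
  iteration: number;
  notes: string[];
}

const taskRows = new Map<string, TaskRow>();

function saveCheckpoint(taskId: string, state: AgentState): void {
  const row: TaskRow = taskRows.get(taskId) ?? { taskId, attempt: 1, checkpoint: null };
  row.checkpoint = JSON.stringify(state);
  taskRows.set(taskId, row);
}

function loadCheckpoint(taskId: string): AgentState {
  const raw = taskRows.get(taskId)?.checkpoint;
  return raw ? (JSON.parse(raw) as AgentState) : { iteration: 0, notes: [] };
}

// Runs up to `total` iterations, optionally "crashing" before iteration `crashAt`.
function runAgent(taskId: string, total: number, crashAt?: number): AgentState {
  const state = loadCheckpoint(taskId);
  while (state.iteration < total) {
    if (state.iteration === crashAt) throw new Error("container crashed");
    state.notes.push(`result-${state.iteration}`);
    state.iteration++;
    saveCheckpoint(taskId, state); // snapshot after each unit of work
  }
  return state;
}

// Retry coordinator step: bump the attempt counter before rescheduling.
function recordRetry(taskId: string): void {
  const row = taskRows.get(taskId);
  if (row) row.attempt++;
}

try {
  runAgent("research-1", 5, 3); // crashes before iteration 3
} catch {
  recordRetry("research-1");
}
const resumedState = runAgent("research-1", 5); // new "container" resumes at iteration 3
```

The second run performs only iterations 3 and 4; the first three results come back from the deserialized snapshot rather than being recomputed.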

This differs from Temporal’s event log. Temporal replays every decision to reconstruct state. Trigger.dev snapshots state at explicit points. The trade-off:

  • Temporal: Full audit trail, deterministic replay, larger storage footprint.
  • Trigger.dev: Smaller state snapshots, faster resume, requires manual checkpoint placement.

For agent workflows with non-deterministic LLM calls, explicit checkpoints avoid the replay problem. You cannot replay an LLM call and expect the same output, so snapshotting the result makes more sense.

Retry Semantics and Idempotency

Trigger.dev provides configurable retry policies per task:

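// Import path assumes the Trigger.dev SDK's v3 entry point.
import { task } from "@trigger.dev/sdk/v3";

export const apiTask = task({
  id: "api-call",
  retry: {
    maxAttempts: 5,        // total attempts, including the first
    factor: 2,             // multiplier applied to the delay after each failure
    minTimeoutInMs: 1000,  // delay before the first retry
    maxTimeoutInMs: 60000, // upper bound on any single delay
  },
  run: async (payload) => {
    // Task code
  },
});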
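To make the policy concrete, the delay schedule it implies can be worked out by hand. The sketch below assumes the common formula `min(max, min * factor^(retry - 1))` with no jitter; the formula, the `BackoffPolicy` type, and its field names are illustrative assumptions, not the engine's confirmed behavior.

```typescript
// Hypothetical policy shape mirroring the values in the config above.
interface BackoffPolicy {
  maxAttempts: number;
  factor: number;
  minDelayMs: number;
  maxDelayMs: number;
}

// Delay before retry number `retry` (1 = first retry), capped at maxDelayMs.
function backoffDelay(policy: BackoffPolicy, retry: number): number {
  return Math.min(policy.maxDelayMs, policy.minDelayMs * policy.factor ** (retry - 1));
}

const examplePolicy: BackoffPolicy = {
  maxAttempts: 5,
  factor: 2,
  minDelayMs: 1000,
  maxDelayMs: 60000,
};

// maxAttempts of 5 leaves room for 4 retries after the first attempt.
const delaySchedule = Array.from(
  { length: examplePolicy.maxAttempts - 1 },
  (_, i) => backoffDelay(examplePolicy, i + 1),
);
// delaySchedule is [1000, 2000, 4000, 8000]
```

With these values the cap never applies; it would first take effect on the seventh retry, where the uncapped delay reaches 64000 ms.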

The engine generates an idempotency key from the task ID, execution ID, and attempt number. If the same attempt is delivered twice (network partition, duplicate message), the second delivery sees the existing result and returns immediately.

Idempotency keys live in Postgres with a TTL. After the TTL expires, the key gets purged. This prevents unbounded table growth but means very old retries might execute twice.
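A small in-memory model shows both behaviors: deduplication while the key is live, and re-execution after it expires. The `IdempotencyStore` class, the key format, and the explicit `now` parameter are assumptions made for this sketch; the real engine's storage and key layout are not shown in the source.

```typescript
// Hypothetical TTL-based idempotency store; a Map stands in for the Postgres table.
interface StoredResult {
  value: unknown;
  expiresAt: number;
}

class IdempotencyStore {
  private entries = new Map<string, StoredResult>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Run fn at most once per live key; `now` is passed in to keep the sketch deterministic.
  run<T>(key: string, now: number, fn: () => T): T {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > now) return hit.value as T;
    const value = fn();
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}

// Assumed key layout: task ID, execution ID, and attempt number joined with colons.
function idempotencyKey(taskId: string, executionId: string, attempt: number): string {
  return `${taskId}:${executionId}:${attempt}`;
}

let apiCalls = 0;
const idemStore = new IdempotencyStore(60_000);
const idemKey = idempotencyKey("api-call", "exec-1", 2);

const firstResult = idemStore.run(idemKey, 0, () => ++apiCalls);
const duplicateResult = idemStore.run(idemKey, 1_000, () => ++apiCalls); // served from the store
const afterTtlResult = idemStore.run(idemKey, 120_000, () => ++apiCalls); // key expired, runs again
```

The third call illustrates the caveat above: once the TTL has lapsed, nothing stops a late duplicate from executing.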

Handling Non-Idempotent Operations

Some operations cannot be made idempotent (sending an email, charging a credit card). Trigger.dev provides a once() wrapper:

await once("send-email", async () => {
  await sendEmail(recipient, body);
});

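The once() call checks Postgres for a completion marker. If found, it skips execution. If not, it runs the code and writes the marker. This keeps retries from repeating the operation once the marker is written. A crash between the side effect and the marker write can still produce a duplicate, so pass an idempotency key to the downstream provider where one is supported.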
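The check-run-mark sequence can be sketched as follows. This is a synchronous simplification under stated assumptions: `onceSketch` and `completionMarkers` are hypothetical names, a Set stands in for the Postgres marker table, and the real wrapper shown above is asynchronous.

```typescript
// Stand-in for the Postgres completion-marker table.
const completionMarkers = new Set<string>();

// Returns true if fn ran, false if a marker showed it had already completed.
function onceSketch(key: string, fn: () => void): boolean {
  if (completionMarkers.has(key)) return false; // marker found: skip
  fn();
  completionMarkers.add(key); // marker is written only after fn succeeds
  return true;
}

const sentEmails: string[] = [];
const ranOnFirstAttempt = onceSketch("send-email", () => sentEmails.push("welcome"));
const ranOnRetry = onceSketch("send-email", () => sentEmails.push("welcome"));
```

Writing the marker after the side effect favors at-least-once delivery; writing it before would favor at-most-once. Which ordering the real wrapper uses is not stated in the source.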

Observability Without Separate Infrastructure

Trigger.dev embeds observability into the execution engine. Every task execution produces:

  • Real-time logs streamed to the dashboard
  • Trace spans for each checkpoint and external call
  • Retry history with failure reasons
  • State snapshots at each checkpoint

The dashboard shows a timeline view of task execution. You see when checkpoints occurred, which API calls failed, and how long each step took. This replaces the need for separate tracing infrastructure like Jaeger or Honeycomb for basic workflows.

For complex multi-task workflows, you can export traces to OpenTelemetry-compatible backends. The engine injects trace context into each task, so distributed traces work across task boundaries.
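One way to picture trace context injection is the W3C Trace Context `traceparent` header, which OpenTelemetry propagates by default. Whether Trigger.dev uses exactly this wire format between tasks is an assumption; the helper names below are hypothetical, and the sketch skips parts of the spec such as rejecting all-zero IDs.

```typescript
// Minimal traceparent handling: version "00", 32-hex trace ID, 16-hex span ID, 2-hex flags.
interface TraceContext {
  traceId: string;
  spanId: string;
  sampled: boolean;
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return {
    traceId: m[1]!,
    spanId: m[2]!,
    sampled: (parseInt(m[3]!, 16) & 1) === 1, // lowest flag bit marks "sampled"
  };
}

// A child task keeps the parent's trace ID and gets its own span ID.
function childContext(parent: TraceContext, newSpanId: string): TraceContext {
  return { traceId: parent.traceId, spanId: newSpanId, sampled: parent.sampled };
}

const parentHeader = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const parentCtx = parseTraceparent(parentHeader);
const childHeader = parentCtx
  ? formatTraceparent(childContext(parentCtx, "b7ad6b7169203331"))
  : null;
```

Because the trace ID is shared, a backend can stitch the parent and child task spans into a single distributed trace.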

Deployment Shape

Trigger.dev runs as a managed service or self-hosted. The managed version handles container orchestration, Postgres scaling, and retry coordination. Self-hosted deployments require:

  • Postgres instance for state storage
  • Container runtime (Docker, Kubernetes)
  • Message queue for task scheduling (Redis, RabbitMQ)
  • Object storage for large payloads (S3, GCS)

The execution engine polls the message queue for scheduled tasks, spawns containers, injects environment variables and secrets, and streams logs back to the dashboard. Containers run with resource limits (CPU, memory, timeout) to prevent runaway tasks.

For agent workflows, you typically set high timeouts (30+ minutes) and generous memory limits (2GB+). LLM calls and tool execution can consume significant resources.

Failure Modes and Mitigation

Failure | Impact | Mitigation
Container crash mid-execution | Task resumes from last checkpoint | Place checkpoints before expensive operations
Postgres unavailable | New tasks queue, existing tasks pause | Use managed Postgres with automatic failover
Infinite retry loop | Task never completes, burns resources | Set maxAttempts and monitor retry count metrics
Checkpoint data too large | Postgres write fails, task aborts | Offload large data to object storage, store references in checkpoint
Idempotency key collision | Duplicate execution or skipped work | Use unique execution IDs, avoid manual key generation
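The mitigation for oversized checkpoints, storing a reference instead of the data, can be sketched like this. The size threshold, the `StoredCheckpoint` union, and the Map that stands in for S3 or GCS are all assumptions for illustration; string length is used as a rough size proxy rather than an exact byte count.

```typescript
// Stand-in for object storage (S3, GCS).
const blobStore = new Map<string, string>();
const MAX_INLINE_CHARS = 1024; // hypothetical limit for inline checkpoint data

type StoredCheckpoint =
  | { kind: "inline"; json: string }
  | { kind: "ref"; blobKey: string };

// Keep small snapshots inline; push large ones to the blob store and keep only the key.
function storeCheckpoint(id: string, state: unknown): StoredCheckpoint {
  const json = JSON.stringify(state);
  if (json.length <= MAX_INLINE_CHARS) return { kind: "inline", json };
  const blobKey = `checkpoints/${id}.json`;
  blobStore.set(blobKey, json);
  return { kind: "ref", blobKey };
}

function loadStoredCheckpoint(cp: StoredCheckpoint): unknown {
  if (cp.kind === "inline") return JSON.parse(cp.json);
  const json = blobStore.get(cp.blobKey);
  if (json === undefined) throw new Error(`missing blob ${cp.blobKey}`);
  return JSON.parse(json);
}

const smallCheckpoint = storeCheckpoint("run-1", { iteration: 1 });
const largeCheckpoint = storeCheckpoint("run-2", { transcript: "x".repeat(5000) });
const restoredLarge = loadStoredCheckpoint(largeCheckpoint) as { transcript: string };
```

The row written to the state store stays small either way, so a long agent transcript cannot fail the checkpoint write on size alone.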

The most common failure mode is poor checkpoint placement. If you place checkpoints too far apart, a crash forces re-execution of expensive work. If you place them too close together, you pay serialization overhead and Postgres write latency on every one.

For agent workflows, checkpoint after each tool call. This balances durability with performance.

When TypeScript-Native Matters

Temporal ships a TypeScript SDK alongside its Go and Java SDKs, but every SDK requires running a separate worker process. TypeScript developers must:

  1. Write workflow logic in TypeScript
  2. Compile to JavaScript
  3. Run a Node.js worker that communicates with Temporal server via gRPC
  4. Handle serialization between TypeScript types and Temporal’s protobuf format

Trigger.dev eliminates steps 2-4. You write TypeScript, deploy to the platform, and the engine handles execution. Type safety extends from your code to the execution environment.

This matters for teams without dedicated platform infrastructure. If your stack is TypeScript end-to-end (Next.js frontend, Node.js backend, TypeScript agents), adding a separately operated orchestration server and worker fleet introduces operational complexity.

The trade-off: Temporal’s event sourcing provides stronger durability guarantees. Trigger.dev’s checkpoint model is simpler but requires careful checkpoint placement.

Technical Verdict

Use Trigger.dev when:

  • Your stack is TypeScript-native and you want to avoid polyglot orchestration
  • You need durable execution for agent workflows with retry and state persistence
  • You prefer explicit checkpoints over deterministic replay
  • You want embedded observability without separate tracing infrastructure
  • You are building on a managed platform and do not need full control over the execution runtime

Avoid Trigger.dev when:

  • You require deterministic replay for audit or compliance
  • Your workflows span multiple languages (Go, Python, Java)
  • You need sub-second task latency (checkpoint overhead adds 10-50ms per checkpoint)
  • You already run Temporal and have invested in its operational model
  • You need fine-grained control over worker pools and task routing

Trigger.dev’s pivot from Zapier-style automation to Temporal-style durable execution exposes the infrastructure gap for TypeScript developers building agents. The checkpoint-based state model trades replay guarantees for simplicity, which fits non-deterministic LLM workflows better than event sourcing.