Trigger.dev V2: What a Temporal Alternative for TypeScript Reveals About Durable Execution Plumbing

Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and earned 745 HN points. Eight months later, the team pivoted to V2 and repositioned as a “Temporal alternative for TypeScript.” That shift exposes a real infrastructure gap: event-driven webhook automation cannot handle long-running agent tasks that need retry semantics, state persistence, and durable execution.

The V2 announcement drew 172 points, but the architectural change matters more than the score. Developers building AI agents need workflows that survive process crashes, retry failed API calls with exponential backoff, and maintain state across hours or days. Temporal solves this with event sourcing and deterministic replay. Even with its TypeScript SDK, though, it means operating a separate Temporal cluster (or paying for Temporal Cloud) and writing workflow code under strict determinism rules. TypeScript shops face a choice: adopt that operational model or build retry logic by hand.

Trigger.dev’s pivot reveals what durable execution plumbing looks like when optimized for TypeScript-native workflows instead of polyglot microservices.

Why Zapier-Style Automation Breaks for Agents

Zapier and similar tools chain webhooks with simple retry policies. A failed step either retries immediately or dies. State lives in external databases or gets passed as JSON payloads. This works for short-lived tasks like “new Stripe payment → send Slack message.”

Agent workflows break this model:

Long-running tasks: An agent researching a topic might call 10 APIs over 30 minutes. If the process crashes at step 7, you need to resume without re-executing steps 1-6.
Non-deterministic retries: LLM calls return different results on retry. You cannot replay a workflow by re-running the same code.
State explosion: Agent memory, tool call history, and intermediate results exceed what you can pass in webhook payloads.

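Temporal handles this with event sourcing. Every workflow decision and every activity result gets logged. On failure, the system replays the workflow code against the log to reconstruct state, feeding recorded activity results back instead of re-running them, then resumes. Replay only works if the workflow code is deterministic: given the same history, it must issue the same commands in the same order, which is why API and LLM calls have to live in activities.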
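A minimal in-memory sketch of the replay idea follows. It uses a hypothetical WorkflowContext, not Temporal's actual SDK: each side-effecting call goes through ctx.activity(), which appends its result to an event log the first time and returns the recorded value on replay. Re-running the workflow function against a partial log reconstructs state without repeating completed calls.

```typescript
// Hypothetical sketch of event-sourced replay; not Temporal's API.
type HistoryEvent = { seq: number; name: string; result: unknown };

class WorkflowContext {
  private seq = 0;
  executed: string[] = []; // activities that actually ran (not served from history)

  constructor(private readonly history: HistoryEvent[]) {}

  // Run an activity once; on replay, return the recorded result instead.
  async activity<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const seq = this.seq++;
    const recorded = this.history[seq];
    if (recorded) {
      // Determinism check: replayed code must issue the same calls in the same order.
      if (recorded.name !== name) {
        throw new Error(`Non-determinism at seq ${seq}: expected ${recorded.name}, got ${name}`);
      }
      return recorded.result as T;
    }
    const result = await fn();
    this.executed.push(name);
    this.history.push({ seq, name, result });
    return result;
  }
}

// Workflow code stays deterministic; all I/O goes through ctx.activity().
async function workflow(ctx: WorkflowContext): Promise<number> {
  const a = await ctx.activity("fetchA", async () => 20);
  const b = await ctx.activity("fetchB", async () => 22);
  return a + b;
}
```

Starting a new WorkflowContext with a history that already holds the fetchA result simulates a crash after step one: the replay returns 20 from the log and only fetchB executes. Temporal records much more than activity results (timers, signals, child workflows), but the constraint is the same.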

TypeScript developers building agents need this durability but cannot always take on Temporal's separate cluster and its determinism constraints.

Trigger.dev’s Durable Execution Model

Trigger.dev V2 provides durable execution without requiring a new runtime. The architecture separates orchestration (task scheduling, retry logic, state persistence) from execution (your TypeScript code).

Core Components

Component	Responsibility	Failure Mode
Task definition	TypeScript function with retry config	Syntax errors caught at build time
Execution engine	Runs tasks in isolated containers	Container crash triggers retry from last checkpoint
State store	Postgres-backed task history	Database failure pauses new tasks, existing tasks resume after recovery
Retry coordinator	Exponential backoff, idempotency keys	Unbounded retries burn resources until someone intervenes manually
Observability layer	Real-time task logs and traces	Missing spans indicate dropped messages

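Tasks run in ephemeral containers. If a container dies, the engine reads the last checkpoint from Postgres and resumes. Unlike Temporal's deterministic replay, Trigger.dev uses explicit checkpoints: you mark progress points in your code. In the example below, checkpoint() is an illustrative helper marking those boundaries, and generateText() comes from the Vercel AI SDK: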

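import { task } from "@trigger.dev/sdk/v3";
import { generateText, type CoreMessage } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
// Hypothetical local modules: AI SDK tool definitions (each with an execute function)
// and an illustrative checkpoint() helper assumed to persist state under a label.
import { search, browse, analyze } from "./tools";
import { checkpoint } from "./checkpoint";

export const researchAgent = task({
  id: "research-agent",
  run: async ({ topic }: { topic: string }) => {
    const messages: CoreMessage[] = [
      { role: "user", content: `Research: ${topic}` },
    ];

    for (let i = 0; i < 10; i++) {
      // Checkpoint before the expensive LLM call, persisting the history so far
      await checkpoint(`iteration-${i}`, { messages });

      const { text, toolCalls, steps, response } = await generateText({
        model: anthropic("claude-opus-4-20250514"),
        system: "You are a research assistant with web access.",
        messages,
        tools: { search, browse, analyze },
        maxSteps: 5, // the SDK runs tool calls itself, up to 5 steps per call
      });

      // No tool calls in the final step: the model has produced its answer
      if (toolCalls.length === 0) {
        return { summary: text, stepsUsed: steps.length };
      }

      // Step budget ran out mid-research: keep the assistant and tool messages
      // (tool results included) and continue in the next iteration
      messages.push(...response.messages);
    }

    throw new Error(`Research on "${topic}" did not finish within 10 iterations`);
  },
});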

Each checkpoint() call persists the current messages array to Postgres. If the container crashes during generateText(), the next execution resumes from the last checkpoint with the same message history; only the interrupted LLM call runs again.
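To make the checkpoint model concrete, here is a sketch built on a hypothetical CheckpointStore, with an in-memory Map standing in for the Postgres table. save() writes a JSON snapshot under a run ID and label; latest() returns the most recent snapshot, so a fresh container can restore messages and the loop index. None of these names come from Trigger.dev.

```typescript
// Hypothetical checkpoint store; an in-memory Map stands in for a Postgres table.
interface Snapshot<S> {
  label: string;
  state: S;
  savedAt: number;
}

class CheckpointStore {
  private rows = new Map<string, string[]>(); // runId -> serialized snapshots, oldest first

  save<S>(runId: string, label: string, state: S): void {
    const row = JSON.stringify({ label, state, savedAt: Date.now() });
    const list = this.rows.get(runId) ?? [];
    list.push(row);
    this.rows.set(runId, list);
  }

  // Most recent snapshot, deserialized; undefined if the run never checkpointed.
  latest<S>(runId: string): Snapshot<S> | undefined {
    const list = this.rows.get(runId);
    if (!list || list.length === 0) return undefined;
    return JSON.parse(list[list.length - 1]) as Snapshot<S>;
  }
}

type AgentState = { iteration: number; messages: { role: string; content: string }[] };

// On container start: resume from the latest snapshot, or start fresh.
function restore(store: CheckpointStore, runId: string, topic: string): AgentState {
  const snap = store.latest<AgentState>(runId);
  return snap?.state ?? { iteration: 0, messages: [{ role: "user", content: `Research: ${topic}` }] };
}
```

Resuming means reading the state and continuing the loop from the saved iteration. Anything done after the last save, such as an in-flight LLM call, is repeated, which is why checkpoint placement matters.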

State Persistence Strategy

Trigger.dev uses Postgres as the durable state store. Each task execution gets a row with:

Task ID and execution attempt number
Checkpoint data (serialized JSON)
Retry count and next retry timestamp
Idempotency key for deduplication

When a task fails, the retry coordinator reads the checkpoint, increments the retry count, calculates the next attempt time using exponential backoff, and schedules a new container. The new container deserializes the checkpoint and resumes.
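The retry path just described can be sketched as a pure state transition. The TaskRow fields mirror the list above; the capped backoff formula (minimum delay times factor to the power of the retry count, capped at the maximum) and every name here are assumptions for illustration, not Trigger.dev's schema.

```typescript
// Hypothetical row shape mirroring the fields listed above; not Trigger.dev's schema.
interface TaskRow {
  taskId: string;
  runId: string;
  attempt: number;            // execution attempt number, starting at 1
  checkpoint: string | null;  // serialized JSON snapshot
  retryCount: number;
  nextRetryAt: number | null; // epoch milliseconds
  idempotencyKey: string;
}

interface RetryPolicy {
  maxAttempts: number;
  factor: number;
  minTimeoutMs: number;
  maxTimeoutMs: number;
}

// On failure: bump the counters, compute the next attempt time with capped
// exponential backoff, and derive a fresh key; null means the budget is spent.
function scheduleRetry(row: TaskRow, policy: RetryPolicy, now: number): TaskRow | null {
  if (row.attempt >= policy.maxAttempts) return null;
  const delay = Math.min(
    policy.maxTimeoutMs,
    policy.minTimeoutMs * Math.pow(policy.factor, row.retryCount),
  );
  const attempt = row.attempt + 1;
  return {
    ...row, // checkpoint carried over unchanged for the next container
    attempt,
    retryCount: row.retryCount + 1,
    nextRetryAt: now + delay,
    idempotencyKey: `${row.taskId}:${row.runId}:${attempt}`,
  };
}
```

Keeping the transition pure makes it easy to apply inside a single database transaction: read the row, compute the new one, write it back, and enqueue a container for nextRetryAt.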

This differs from Temporal’s event log. Temporal replays every decision to reconstruct state. Trigger.dev snapshots state at explicit points. The trade-off:

Temporal: Full audit trail, deterministic replay, larger storage footprint.
Trigger.dev: Smaller state snapshots, faster resume, requires manual checkpoint placement.

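For agent workflows with non-deterministic LLM calls, explicit checkpoints sidestep the replay problem. Temporal copes by recording each LLM call's result as an activity and feeding it back on replay; a checkpoint reaches the same end more directly. You cannot re-run an LLM call and expect the same output, so snapshotting the result makes more sense.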

Retry Semantics and Idempotency

Trigger.dev provides configurable retry policies per task:

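import { task } from "@trigger.dev/sdk/v3";

export const apiTask = task({
  id: "api-call",
  retry: {
    maxAttempts: 5,
    factor: 2,             // multiplier applied to each successive delay
    minTimeoutInMs: 1000,  // lower bound on the retry delay
    maxTimeoutInMs: 60000, // upper bound on the retry delay
  },
  run: async (payload) => {
    // Task code
  },
});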
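Assuming the conventional formula (delay before retry n equals the minimum timeout times factor to the power n minus 1, capped at the maximum, with no jitter) and assuming maxAttempts counts the first run, the retry policy above (5 attempts, factor 2, 1,000 ms minimum, 60,000 ms maximum) yields the schedule below. Whether the platform adds randomization is not covered here, so treat the exact numbers as illustrative.

```typescript
// Assumed formula (no jitter): delay before retry n = min(max, min * factor^(n-1)).
interface RetryConfig {
  maxAttempts: number;
  factor: number;
  minTimeoutInMs: number;
  maxTimeoutInMs: number;
}

// Assumes maxAttempts counts the first run, so there are maxAttempts - 1 retries.
function retryDelays(cfg: RetryConfig): number[] {
  const delays: number[] = [];
  for (let n = 1; n < cfg.maxAttempts; n++) {
    delays.push(Math.min(cfg.maxTimeoutInMs, cfg.minTimeoutInMs * Math.pow(cfg.factor, n - 1)));
  }
  return delays;
}

const schedule = retryDelays({ maxAttempts: 5, factor: 2, minTimeoutInMs: 1000, maxTimeoutInMs: 60000 });
console.log(schedule); // [ 1000, 2000, 4000, 8000 ]
```

Under these assumptions the 60 s cap never binds for this policy: the waits sum to 15 s. The cap only matters from the seventh retry onward, where 1000 × 2^6 = 64,000 ms would exceed it.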

The engine generates an idempotency key from the task ID, execution ID, and attempt number. If the same attempt is delivered twice (network partition, duplicate message), the second delivery finds the stored result and returns it immediately.

Idempotency keys live in Postgres with a TTL, after which they are purged. This prevents unbounded table growth, but a duplicate that arrives after its key has expired will execute again.
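A sketch of that dedup store, assuming an in-memory Map with expiry timestamps in place of the Postgres table (all names hypothetical). A duplicate delivery of the same attempt returns the stored result; once the TTL lapses and the key is purged, the same duplicate runs again.

```typescript
// Hypothetical idempotency store; a Map with expiry times stands in for Postgres + TTL.
function idempotencyKey(taskId: string, executionId: string, attempt: number): string {
  return `${taskId}:${executionId}:${attempt}`;
}

class IdempotencyStore {
  private entries = new Map<string, { result: unknown; expiresAt: number }>();

  constructor(private readonly ttlMs: number) {}

  // Return the stored result for a live key, or run fn and store its result.
  async runOnce<T>(key: string, now: number, fn: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > now) return hit.result as T;
    const result = await fn();
    this.entries.set(key, { result, expiresAt: now + this.ttlMs });
    return result;
  }

  // Periodic purge keeps the table bounded; purged keys no longer deduplicate.
  purge(now: number): void {
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= now) this.entries.delete(key);
    });
  }
}
```

A real store would make the check-and-insert atomic, for example with a unique constraint on the key column; otherwise two concurrent deliveries can both miss the lookup and both execute.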

Handling Non-Idempotent Operations

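Some operations are not naturally idempotent (sending an email, charging a credit card). The usual guard is a once()-style wrapper keyed on a completion marker:

// Skip the callback if a "send-email" completion marker already exists
await once("send-email", async () => {
  await sendEmail(recipient, body); // the non-idempotent side effect
});

The once() call checks Postgres for a completion marker. If it finds one, it skips the callback; otherwise it runs the code and then writes the marker. This prevents re-execution on retries that start after the marker is written, but it is not strictly exactly-once: a crash between the side effect and the marker write makes the retry send again. For true exactly-once effects, also pass an idempotency key to the downstream service, such as Stripe's Idempotency-Key header.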
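A sketch of a marker-based guard, assuming a hypothetical in-memory Set in place of the Postgres completion markers. The simulated crash shows its limit: if the process dies after the side effect but before the marker is written, the retry repeats the side effect, so the guard alone gives at-least-once delivery.

```typescript
// Hypothetical marker-based guard; a Set stands in for the Postgres completion markers.
const markers = new Set<string>();

async function once(key: string, fn: () => Promise<void>): Promise<void> {
  if (markers.has(key)) return; // marker found: skip
  await fn();                   // the side effect happens here...
  markers.add(key);             // ...and is only recorded afterwards
}

let emailsSent = 0;
async function sendEmail(): Promise<void> {
  emailsSent += 1;
}

// Simulate a container dying between the side effect and the marker write.
async function crashingAttempt(): Promise<void> {
  await sendEmail();
  throw new Error("container killed before marker write");
}

async function demo(): Promise<number> {
  await once("send-email", crashingAttempt).catch(() => undefined); // attempt 1 dies
  await once("send-email", sendEmail); // retry: no marker yet, so it sends again
  await once("send-email", sendEmail); // marker present: skipped
  return emailsSent;
}
```

demo() resolves to 2: the crash window produced one duplicate email, and only the third call was suppressed by the marker.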

Observability Without Separate Infrastructure

Trigger.dev embeds observability into the execution engine. Every task execution produces:

Real-time logs streamed to the dashboard
Trace spans for each checkpoint and external call
Retry history with failure reasons
State snapshots at each checkpoint

The dashboard shows a timeline view of task execution: when checkpoints occurred, which API calls failed, and how long each step took. For basic workflows, this removes the need for separate tracing infrastructure such as Jaeger or Honeycomb.

For complex multi-task workflows, you can export traces to OpenTelemetry-compatible backends. The engine injects trace context into each task, so distributed traces work across task boundaries.
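One standard way to carry trace context across task boundaries is the W3C Trace Context traceparent header (version, 32-hex trace ID, 16-hex parent span ID, flags). The sketch below assumes the engine copies that header into a hypothetical headers field on each child task's envelope; the child keeps the trace ID and mints a new span ID. Only version 00 is handled.

```typescript
import { randomBytes } from "node:crypto";

interface TraceContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string;  // 16 lowercase hex chars
  sampled: boolean;
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return { traceId: m[1], spanId: m[2], sampled: (parseInt(m[3], 16) & 1) === 1 };
}

// Hypothetical envelope the engine hands to a child task.
interface TaskEnvelope<P> {
  payload: P;
  headers: { traceparent?: string };
}

// Parent side: inject the current span as the child's parent.
function inject<P>(payload: P, current: TraceContext): TaskEnvelope<P> {
  return { payload, headers: { traceparent: formatTraceparent(current) } };
}

// Child side: continue the same trace with a fresh span ID, or start a new trace.
function startChildSpan<P>(env: TaskEnvelope<P>): TraceContext & { parentSpanId?: string } {
  const parent = env.headers.traceparent ? parseTraceparent(env.headers.traceparent) : null;
  const spanId = randomBytes(8).toString("hex");
  if (!parent) {
    return { traceId: randomBytes(16).toString("hex"), spanId, sampled: true };
  }
  return { traceId: parent.traceId, spanId, sampled: parent.sampled, parentSpanId: parent.spanId };
}
```

Because every child span shares the parent's trace ID and records its parent span ID, an OpenTelemetry backend can stitch tasks into one distributed trace.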

Deployment Shape

Trigger.dev runs as a managed service or self-hosted. The managed version handles container orchestration, Postgres scaling, and retry coordination. Self-hosted deployments require:

Postgres instance for state storage
Container runtime (Docker, Kubernetes)
Message queue for task scheduling (Redis, RabbitMQ)
Object storage for large payloads (S3, GCS)

The execution engine polls the message queue for scheduled tasks, spawns containers, injects environment variables and secrets, and streams logs back to the dashboard. Containers run with resource limits (CPU, memory, timeout) to prevent runaway tasks.

For agent workflows, you typically set high timeouts (30+ minutes) and generous memory limits (2GB+). LLM calls and tool execution can consume significant resources.
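The dispatch loop and its per-task timeout can be sketched with an in-memory array standing in for the message queue and a plain async function standing in for a container (all names hypothetical). CPU and memory limits are enforced by the container runtime and are not modeled; the sketch only shows the wall-clock timeout.

```typescript
// Hypothetical dispatch loop: an array stands in for the queue, an async function for a container.
type Job = { id: string; run: () => Promise<string>; timeoutMs: number };
type Outcome = { id: string; status: "completed" | "timed_out" | "failed"; output?: string };

// Race the task against a timer; a real engine would also kill the container.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

async function drain(queue: Job[]): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (let job = queue.shift(); job; job = queue.shift()) {
    try {
      const output = await withTimeout(job.run(), job.timeoutMs);
      outcomes.push({ id: job.id, status: "completed", output });
    } catch (err) {
      const status = err instanceof Error && err.message === "timeout" ? "timed_out" : "failed";
      outcomes.push({ id: job.id, status });
    }
  }
  return outcomes;
}
```

For agent tasks, the timeoutMs on each job is where the 30-minute-plus budget would go; a timed-out job would then flow into the retry coordinator like any other failure.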

Failure Modes and Mitigation

Failure	Impact	Mitigation
Container crash mid-execution	Task resumes from last checkpoint	Place checkpoints before expensive operations
Postgres unavailable	New tasks queue, existing tasks pause	Use managed Postgres with automatic failover
Infinite retry loop	Task never completes, burns resources	Set `maxAttempts` and monitor retry count metrics
Checkpoint data too large	Postgres write fails, task aborts	Offload large data to object storage, store references in checkpoint
Idempotency key collision	Duplicate execution or skipped work	Use unique execution IDs, avoid manual key generation
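The mitigation for oversized checkpoints can be sketched as a size check at save time. The 256 KB threshold, the BlobStore interface, and the in-memory implementation standing in for S3 or GCS are all assumptions for illustration; when the payload is too large, the checkpoint row stores only a content-addressed reference.

```typescript
import { createHash } from "node:crypto";

// Hypothetical object-store interface; a Map stands in for S3/GCS.
interface BlobStore {
  put(key: string, body: string): void;
  get(key: string): string | undefined;
}

class MemoryBlobStore implements BlobStore {
  private blobs = new Map<string, string>();
  put(key: string, body: string): void { this.blobs.set(key, body); }
  get(key: string): string | undefined { return this.blobs.get(key); }
}

type CheckpointRow = { inline: string } | { blobRef: string };

const MAX_INLINE_BYTES = 256 * 1024; // assumed threshold, not a documented limit

// Serialize state; inline it if small, otherwise upload it and keep a reference.
function toCheckpointRow(state: unknown, blobs: BlobStore): CheckpointRow {
  const json = JSON.stringify(state);
  if (Buffer.byteLength(json, "utf8") <= MAX_INLINE_BYTES) return { inline: json };
  const key = `checkpoints/${createHash("sha256").update(json).digest("hex")}.json`;
  blobs.put(key, json);
  return { blobRef: key };
}

function fromCheckpointRow(row: CheckpointRow, blobs: BlobStore): unknown {
  if ("inline" in row) return JSON.parse(row.inline);
  const json = blobs.get(row.blobRef);
  if (json === undefined) throw new Error(`missing blob ${row.blobRef}`);
  return JSON.parse(json);
}
```

Hashing the content to build the key means re-saving an unchanged snapshot reuses the same object instead of uploading a copy.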

The most common failure mode is missing checkpoints. If you place checkpoints too far apart, a crash forces re-execution of expensive work. If you place them too frequently, you pay serialization overhead and Postgres write latency.

For agent workflows, checkpoint after each tool call. This balances durability with performance.

When TypeScript-Native Matters

Temporal requires running a separate worker process, even with its TypeScript SDK. TypeScript developers must:

Write workflow logic in TypeScript under Temporal's determinism rules, with all I/O moved into activities
Compile and bundle the workflow code for the worker
Run a Node.js worker that polls the Temporal server via gRPC
Handle serialization between TypeScript types and Temporal's protobuf payloads (JSON by default, through its data converter)

Trigger.dev takes steps 2-4 off your plate. You write TypeScript, deploy to the platform, and the engine handles execution. Type safety extends from your code to the execution environment.

This matters for teams without polyglot infrastructure. If your stack is TypeScript end-to-end (Next.js frontend, Node.js backend, TypeScript agents), adding a separately operated orchestration cluster (Temporal's server is written in Go) introduces operational complexity.

The trade-off: Temporal’s event sourcing provides stronger durability guarantees. Trigger.dev’s checkpoint model is simpler but requires careful checkpoint placement.

Technical Verdict

Use Trigger.dev when:

Your stack is TypeScript-native and you want to avoid polyglot orchestration
You need durable execution for agent workflows with retry and state persistence
You prefer explicit checkpoints over deterministic replay
You want embedded observability without separate tracing infrastructure
You are building on a managed platform and do not need full control over the execution runtime

Avoid Trigger.dev when:

You require deterministic replay for audit or compliance
Your workflows span multiple languages (Go, Python, Java)
You need sub-second task latency (checkpoint overhead adds 10-50ms per checkpoint)
You already run Temporal and have invested in its operational model
You need fine-grained control over worker pools and task routing

Trigger.dev’s pivot from Zapier-style automation to Temporal-style durable execution exposes the infrastructure gap for TypeScript developers building agents. The checkpoint-based state model trades replay guarantees for simplicity, which fits non-deterministic LLM workflows better than event sourcing.