mech.app

Trigger.dev V2: What a Temporal Alternative for TypeScript Reveals About Durable Execution Plumbing

How Trigger.dev pivoted from event hooks to durable execution primitives, exposing the infrastructure gap between workflow orchestration and background...

Source: trigger.dev
Trigger.dev launched as a Zapier alternative (745 HN points), then pivoted to a Temporal alternative (172 points). That trajectory exposes a real infrastructure gap: TypeScript developers building multi-step agent workflows need durable execution primitives, not just webhook glue.

The V2 pivot happened because early users kept asking for long-running background jobs with retry semantics, not event-driven automation chains. That feedback loop mirrors what agent builders face when moving from proof-of-concept to production: you need state persistence, resumption after crashes, and observability across steps that might take hours or days.

What Changed Between V1 and V2

V1 model: Webhook-triggered workflows with integrations. You connected external services (GitHub, Slack, Stripe) and ran code when events fired. Think Zapier but with TypeScript instead of a visual builder.

V2 model: Durable background tasks with execution guarantees. You define jobs that survive process restarts, handle retries automatically, and expose their internal state for debugging. Think Temporal but without the operational complexity of running a cluster.

The shift is from event-driven to execution-driven. V1 optimized for connecting APIs. V2 optimizes for running code that must complete even if your server crashes halfway through.

Durable Execution Primitives

Trigger.dev exposes these building blocks:

  • Task definitions: Functions wrapped in a task() call that registers them with the execution runtime
  • Automatic retries: Exponential backoff with configurable limits, no manual try-catch needed
  • Wait primitives: wait.for() and wait.until() that pause execution without holding a process
  • Idempotency keys: Built-in deduplication so that triggering a task twice with the same key does not start a second run
  • Versioning: Deploy new code without breaking in-flight tasks from old versions
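
To make the idempotency-key idea concrete, here is a minimal in-memory sketch. It does not use the Trigger.dev SDK: `RunRegistry` and `triggerOnce` are hypothetical names, and a real runtime would keep this mapping in durable storage rather than a `Map`.

```typescript
// Hypothetical sketch: deduplicate triggers by idempotency key.
type Run = { id: number; payload: unknown };

class RunRegistry {
  private runsByKey = new Map<string, Run>();
  private nextId = 1;

  // Returns the existing run when the key was seen before,
  // otherwise records and returns a new run.
  triggerOnce(idempotencyKey: string, payload: unknown): { run: Run; created: boolean } {
    const existing = this.runsByKey.get(idempotencyKey);
    if (existing) return { run: existing, created: false };
    const run: Run = { id: this.nextId++, payload };
    this.runsByKey.set(idempotencyKey, run);
    return { run, created: true };
  }
}

const registry = new RunRegistry();
const first = registry.triggerOnce("doc-42", { documentId: "42" });
const second = registry.triggerOnce("doc-42", { documentId: "42" });
console.log(first.created, second.created, first.run.id === second.run.id); // true false true
```

In this sketch deduplication depends on the key alone, not on the payload contents, so the caller decides what counts as "the same work".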

The runtime persists execution state after every step. If your process dies, the task resumes from the last checkpoint when a new worker picks it up.

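import { task, wait } from "@trigger.dev/sdk/v3";

// extractText, analyzeSentiment, and generateSummary are application-defined
// helpers (not part of the SDK); the retry option below belongs to that helper.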

export const processDocument = task({
  id: "process-document",
  run: async (payload: { documentId: string }) => {
    // Step 1: Extract text (persisted checkpoint)
    const text = await extractText(payload.documentId);
    
    // Step 2: Wait 5 minutes (releases worker, no timeout)
    await wait.for({ minutes: 5 });
    
    // Step 3: Analyze sentiment (new checkpoint)
    const sentiment = await analyzeSentiment(text);
    
    // Step 4: Retry up to 3 times on failure
    const summary = await generateSummary(text, {
      retry: { maxAttempts: 3 }
    });
    
    return { sentiment, summary };
  }
});

Each await becomes a checkpoint. If the worker crashes after extractText() but before analyzeSentiment(), the next worker starts from the wait step with text already available.
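
One way to picture resumption is step-result memoization: each completed step's output is stored under a key, and a re-run skips any step whose key is already present. The sketch below is a toy, synchronous model of that idea, not Trigger.dev's actual mechanism. `CheckpointStore`, `runStep`, and the simulated crash are all assumptions made for illustration.

```typescript
// Toy model of checkpointed steps (hypothetical; not the Trigger.dev runtime).
type CheckpointStore = Map<string, unknown>;

// Runs fn once per key; later calls return the stored result instead.
function runStep<T>(store: CheckpointStore, key: string, fn: () => T): T {
  if (store.has(key)) return store.get(key) as T;
  const result = fn();
  store.set(key, result); // stands in for a durable write
  return result;
}

let extractCalls = 0;
let crashOnce = true;

function workflow(store: CheckpointStore): { length: number } {
  const text = runStep(store, "extract", () => {
    extractCalls++;
    return "hello world";
  });
  const length = runStep(store, "analyze", () => {
    if (crashOnce) {
      crashOnce = false;
      throw new Error("simulated worker crash");
    }
    return text.length;
  });
  return { length };
}

const checkpoints: CheckpointStore = new Map();
try {
  workflow(checkpoints); // first attempt dies in the second step
} catch {
  // a new worker would pick the run up here
}
const resumed = workflow(checkpoints); // second attempt reuses the stored "extract" result
console.log(resumed.length, extractCalls); // 11 1
```

The extract step runs only once across both attempts, which is the property the paragraph above describes.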

State Persistence vs. Event Sourcing

Temporal uses event sourcing: every state change is an event appended to a log. Replay the log to reconstruct current state. This gives you complete history but requires a database cluster and careful schema evolution.

Trigger.dev uses checkpoint snapshots: serialize execution state after each step, store it in Postgres, resume from the latest snapshot. You lose fine-grained history but gain simpler deployment (single database, no separate workflow engine).
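
The difference between the two state models can be shown at toy scale. In the sketch below, the event-sourced version rebuilds state by folding over an append-only log, while the snapshot version simply reads the latest stored value. The `WorkflowEvent` and `WorkflowState` types and the `SnapshotStore` class are invented for illustration and do not correspond to either product's storage format.

```typescript
// Illustrative types; not the storage format of Temporal or Trigger.dev.
type WorkflowState = { completedSteps: string[]; status: "running" | "done" };

type WorkflowEvent =
  | { type: "StepCompleted"; step: string }
  | { type: "WorkflowFinished" };

function applyEvent(state: WorkflowState, event: WorkflowEvent): WorkflowState {
  if (event.type === "StepCompleted") {
    return { ...state, completedSteps: [...state.completedSteps, event.step] };
  }
  return { ...state, status: "done" };
}

// Event sourcing: current state is derived by replaying every event in order.
function replay(events: WorkflowEvent[]): WorkflowState {
  const initial: WorkflowState = { completedSteps: [], status: "running" };
  return events.reduce(applyEvent, initial);
}

// Checkpoint snapshots: only the latest serialized state is kept.
class SnapshotStore {
  private latest: string | null = null;

  save(state: WorkflowState): void {
    this.latest = JSON.stringify(state);
  }

  load(): WorkflowState | null {
    return this.latest === null ? null : (JSON.parse(this.latest) as WorkflowState);
  }
}

const eventLog: WorkflowEvent[] = [
  { type: "StepCompleted", step: "extract" },
  { type: "StepCompleted", step: "analyze" },
];
const fromLog = replay(eventLog);

const snapshots = new SnapshotStore();
snapshots.save({ completedSteps: ["extract"], status: "running" });
snapshots.save({ completedSteps: ["extract", "analyze"], status: "running" });
const fromSnapshot = snapshots.load();

// Both paths agree on the current state.
console.log(JSON.stringify(fromLog) === JSON.stringify(fromSnapshot)); // true
```

Only the log can answer what the state looked like after the first step (`replay(eventLog.slice(0, 1))`); the snapshot store has already overwritten it.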

| Aspect | Temporal | Trigger.dev |
| --- | --- | --- |
| State model | Event sourcing (append-only log) | Checkpoint snapshots (latest state) |
| Infrastructure | Separate cluster (Cassandra/Postgres + workers) | Single Postgres + worker processes |
| History retention | Full replay capability | Last successful checkpoint only |
| Language support | Polyglot (Go, Java, TypeScript, Python) | TypeScript only |
| Versioning strategy | Workflow definitions versioned separately | Code and state versioned together |
| Debugging | Time-travel replay through events | Inspect last checkpoint state |

The trade-off: Temporal gives you audit trails and deterministic replay. Trigger.dev gives you faster onboarding and lower operational overhead. For agent workflows where completion matters more than history, the checkpoint model is usually enough.

Execution Model for Multi-Step Agents

Agent workflows often look like this:

  1. Call LLM with tools
  2. Execute tool (API call, database query, file operation)
  3. Feed result back to LLM
  4. Repeat until done or max iterations

Each tool execution is a checkpoint boundary. If the API call times out or the worker crashes, you resume from the last LLM response without re-running earlier steps.

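import { task } from "@trigger.dev/sdk/v3";
import { generateText, type CoreMessage } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// search, browse, analyze (tool definitions) and executeTool are
// application-defined helpers, not part of either SDK.

export const researchAgent = task({
  id: "research-agent",
  run: async ({ topic }: { topic: string }) => {
    const messages: CoreMessage[] = [
      { role: "user", content: `Research: ${topic}` }
    ];

    for (let i = 0; i < 10; i++) {
      // Checkpoint before LLM call
      const { text, toolCalls } = await generateText({
        model: anthropic("claude-opus-4"),
        messages,
        tools: { search, browse, analyze },
        maxSteps: 5
      });

      // No tool calls means the model produced its final answer
      if (!toolCalls.length) {
        return { summary: text };
      }

      // Each tool execution is a separate checkpoint.
      // executeTool is assumed to return content in the shape the AI SDK
      // expects for a tool message; a production loop would also append the
      // assistant message that requested the tool calls.
      for (const call of toolCalls) {
        const result = await executeTool(call);
        messages.push({ role: "tool", content: result });
      }
    }

    // Iteration cap reached without a final answer
    return { summary: null };
  }
});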

The runtime serializes messages after every iteration. If executeTool() fails on iteration 3, the next worker resumes at iteration 3 with the first two iterations already complete.
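
A resumable loop needs its cursor and accumulated messages stored as plain data. The sketch below models that with a hypothetical `AgentState` record that is serialized after each iteration. The `callModel` stub and the failure during the third iteration are stand-ins for a real LLM call and a real tool failure, and the `saved` string stands in for durable storage.

```typescript
// Hypothetical agent state; kept as plain JSON so it can be checkpointed.
type AgentState = { iteration: number; messages: string[] };

let saved: string = JSON.stringify({
  iteration: 0,
  messages: ["Research: durable execution"],
});
let modelCalls = 0;
let failOnce = true;

// Stub standing in for an LLM call plus tool execution.
function callModel(state: AgentState): string {
  modelCalls++;
  if (state.iteration === 2 && failOnce) {
    failOnce = false;
    throw new Error("simulated tool failure");
  }
  return `result-${state.iteration + 1}`;
}

// Loads the last checkpoint and continues until maxIterations.
function runAgent(maxIterations: number): AgentState {
  const state = JSON.parse(saved) as AgentState;
  while (state.iteration < maxIterations) {
    state.messages.push(callModel(state));
    state.iteration++;
    saved = JSON.stringify(state); // checkpoint after each iteration
  }
  return state;
}

try {
  runAgent(4); // fails during the third iteration
} catch {
  // the first two iterations are already checkpointed
}
const finalState = runAgent(4); // resumes at the third iteration
console.log(finalState.iteration, modelCalls); // 4 5
```

Five model calls are made in total (two successful, one failed, two after resuming) instead of the seven that a restart from scratch would need.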

Observability and Debugging

Trigger.dev provides:

  • Real-time logs: Stream logs from running tasks to the dashboard
  • Execution traces: See which steps completed, which failed, and how long each took
  • Replay capability: Re-run failed tasks with the same input
  • State inspection: View serialized checkpoint data for debugging

The dashboard shows a timeline view: each checkpoint is a node, edges show transitions, failed steps are highlighted. You can click into any step to see input, output, and error details.

This matters for agent workflows because failures are often non-deterministic (API rate limits, transient network errors, LLM refusals). You need to see where the failure happened and what state the agent was in.

Deployment Shape

Trigger.dev runs in two modes:

Cloud: Fully managed. You push code, they run workers, handle scaling, and manage the database. Good for getting started or teams that don’t want infrastructure work.

Self-hosted: You run the coordinator and workers in your own infrastructure. Requires Postgres and Redis. Good for compliance requirements or high-volume workloads where you want cost control.

Both modes use the same SDK. The coordinator handles task scheduling, checkpoint storage, and retry logic. Workers pull tasks from a queue, execute them, and report results back.

Scaling happens at the worker level. Add more worker processes to increase throughput. The coordinator is stateless except for the database connection.
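
The coordinator and worker split can be modeled in a few lines. This is an in-memory sketch under stated assumptions: `Coordinator`, `TaskWorker`, and their methods are hypothetical names, and the real system would back the queue and results with Postgres and Redis rather than arrays.

```typescript
// Hypothetical coordinator/worker split, modeled in memory.
type QueuedTask = { id: number; payload: number };
type TaskResult = { id: number; workerId: string; output: number };

class Coordinator {
  private queue: QueuedTask[] = [];
  readonly results: TaskResult[] = [];

  enqueue(task: QueuedTask): void {
    this.queue.push(task);
  }

  // Workers call this to claim the next task, if any.
  claim(): QueuedTask | undefined {
    return this.queue.shift();
  }

  report(result: TaskResult): void {
    this.results.push(result);
  }
}

class TaskWorker {
  private id: string;
  private coordinator: Coordinator;

  constructor(id: string, coordinator: Coordinator) {
    this.id = id;
    this.coordinator = coordinator;
  }

  // Pulls one task, runs it, and reports back. Returns false when the queue is empty.
  runOnce(): boolean {
    const task = this.coordinator.claim();
    if (!task) return false;
    this.coordinator.report({ id: task.id, workerId: this.id, output: task.payload * 2 });
    return true;
  }
}

const coordinator = new Coordinator();
for (let id = 1; id <= 4; id++) coordinator.enqueue({ id, payload: id * 10 });

// Two workers alternate claims until the queue drains.
const workers = [new TaskWorker("w1", coordinator), new TaskWorker("w2", coordinator)];
let busy = true;
while (busy) {
  busy = false;
  for (const worker of workers) busy = worker.runOnce() || busy;
}
console.log(coordinator.results.map((r) => `${r.id}:${r.workerId}`).join(" "));
// 1:w1 2:w2 3:w1 4:w2
```

Adding a third `TaskWorker` to the array changes who claims what, but nothing about the coordinator, which is the scaling property described above.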

Failure Modes

Checkpoint serialization: If your task state includes non-serializable objects (open file handles, database connections, class instances with methods), checkpoints fail. Solution: keep state as plain JSON-serializable data.
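
A guard like the following can catch non-serializable state before it reaches a checkpoint. It is a sketch, not something the SDK provides: `isPlainJson` is a hypothetical helper that accepts only null, strings, booleans, finite numbers, arrays, and plain objects.

```typescript
// Hypothetical guard: true only for values that survive JSON serialization intact.
function isPlainJson(value: unknown): boolean {
  if (value === null) return true;
  const kind = typeof value;
  if (kind === "string" || kind === "boolean") return true;
  if (kind === "number") return Number.isFinite(value);
  if (Array.isArray(value)) return value.every((item) => isPlainJson(item));
  if (kind === "object") {
    const proto = Object.getPrototypeOf(value);
    // Class instances, Date, Map, and similar objects have other prototypes.
    if (proto !== Object.prototype && proto !== null) return false;
    return Object.values(value as Record<string, unknown>).every((item) => isPlainJson(item));
  }
  return false; // undefined, function, symbol, bigint
}

class DbConnection {
  query(): void {}
}

const goodState = { documentId: "42", pages: [1, 2, 3], done: false };
const badState = { documentId: "42", connection: new DbConnection() };

console.log(isPlainJson(goodState), isPlainJson(badState)); // true false
```

A `Date` is rejected here even though `JSON.stringify` accepts it, because it comes back from `JSON.parse` as a string rather than a `Date`.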

Long-running steps: A single step that runs for hours without yielding holds a worker and can’t checkpoint. Solution: break long operations into smaller steps with explicit wait calls.

Version skew: Deploying new code while old tasks are in-flight can cause deserialization errors if you changed the state shape. Solution: use versioning to run old and new code side-by-side.
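
Running old and new code side by side amounts to routing each run to the code version it started on. The sketch below illustrates that routing with a hypothetical `handlersByVersion` registry; the version labels and state fields are made up, and this is not how Trigger.dev implements versioning internally.

```typescript
// Hypothetical registry: each deployed version keeps its own handler.
type RunRecord = { version: string; state: Record<string, string> };
type RunHandler = (state: Record<string, string>) => string;

const handlersByVersion = new Map<string, RunHandler>([
  ["v1", (state) => `processed ${state["docId"]}`], // old state shape
  ["v2", (state) => `processed ${state["documentId"]}`], // field renamed in v2
]);

// Resumes a run with the code version it started on.
function resume(run: RunRecord): string {
  const handler = handlersByVersion.get(run.version);
  if (!handler) throw new Error(`no handler deployed for ${run.version}`);
  return handler(run.state);
}

const inFlight: RunRecord = { version: "v1", state: { docId: "42" } };
const fresh: RunRecord = { version: "v2", state: { documentId: "43" } };
console.log(resume(inFlight), "|", resume(fresh)); // processed 42 | processed 43
```

If the v1 run were handed to the v2 handler instead, it would read a missing `documentId` field, which is the deserialization mismatch described above.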

Database contention: High task throughput can overwhelm Postgres with checkpoint writes. Solution: batch checkpoints or use a separate database for task state.

Retry storms: If a task fails consistently (bad API key, invalid input), retries can pile up. Solution: set maxAttempts and use dead-letter queues for manual inspection.
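
The combination of an attempt cap, exponential backoff, and a dead-letter list can be sketched as follows. Everything here is illustrative: `RetryPolicy`, `runWithRetry`, and `deadLetters` are hypothetical names, and delays are recorded rather than slept so the example stays synchronous.

```typescript
type RetryPolicy = { maxAttempts: number; baseDelayMs: number; factor: number };
type DeadLetter = { payload: string; error: string; attempts: number };

// Delay before the next attempt: baseDelayMs * factor^(attempt - 1).
function backoffDelayMs(policy: RetryPolicy, attempt: number): number {
  return policy.baseDelayMs * Math.pow(policy.factor, attempt - 1);
}

const deadLetters: DeadLetter[] = [];

// Runs fn up to maxAttempts times; on the final failure, parks the payload
// in deadLetters for manual inspection instead of retrying forever.
function runWithRetry(
  payload: string,
  fn: (p: string) => string,
  policy: RetryPolicy
): { output: string | null; delaysMs: number[] } {
  const delaysMs: number[] = [];
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      return { output: fn(payload), delaysMs };
    } catch (err) {
      if (attempt === policy.maxAttempts) {
        deadLetters.push({ payload, error: String(err), attempts: attempt });
      } else {
        delaysMs.push(backoffDelayMs(policy, attempt));
      }
    }
  }
  return { output: null, delaysMs };
}

const policy: RetryPolicy = { maxAttempts: 3, baseDelayMs: 1000, factor: 2 };
const outcome = runWithRetry(
  "job-1",
  () => {
    throw new Error("bad API key");
  },
  policy
);
// Two backoff delays (1000 ms, then 2000 ms) and one dead-lettered job.
console.log(outcome.delaysMs.join(","), deadLetters.length);
```

A permanently failing job costs three attempts and then stops, rather than piling up retries.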

When TypeScript-Only Matters

Temporal supports multiple languages because it separates workflow definitions from execution. You write workflows in Go, call activities in Python, and the engine coordinates across languages.

Trigger.dev is TypeScript-only because it embeds execution logic in the same process as your application code. This simplifies deployment (no separate workflow engine) but locks you into the Node.js ecosystem.

For agent builders, this is often fine. The major LLM providers ship first-party TypeScript SDKs (OpenAI, Anthropic), and the Vercel AI SDK is TypeScript-native. If your tools are already in TypeScript, staying in one language reduces friction.

If you need to call Python ML models or Rust performance-critical code, you either wrap them in HTTP APIs or use a polyglot orchestrator like Temporal.

Comparison with BullMQ and Inngest

BullMQ: Redis-backed job queue. No durable execution (if your worker crashes mid-job, you start over). Good for fire-and-forget tasks, bad for multi-step workflows.

Inngest: Event-driven workflows with step functions. Similar to Trigger.dev V2 but optimized for event triggers (webhooks, cron, user actions). Trigger.dev is optimized for background tasks you invoke directly.

Trigger.dev: Durable execution for long-running background jobs. Best for tasks that must survive crashes and run to completion, even if they take hours.

| Feature | BullMQ | Inngest | Trigger.dev |
| --- | --- | --- | --- |
| Execution model | Job queue | Event-driven steps | Durable tasks |
| State persistence | None (retry from start) | Checkpoint per step | Checkpoint per step |
| Trigger mechanism | Enqueue job | Webhook/event | Direct invocation |
| Observability | Basic job status | Event timeline | Execution trace |
| Self-hosting | Yes (Redis only) | No | Yes (Postgres + Redis) |

Technical Verdict

Use Trigger.dev when:

  • You’re building in TypeScript and don’t need polyglot support
  • Your workflows are background tasks (data processing, agent loops, batch jobs) rather than user-facing request-response
  • You want durable execution without running a Temporal cluster
  • You need observability and replay for debugging multi-step failures
  • You’re okay with checkpoint-based state instead of full event history

Avoid Trigger.dev when:

  • You need sub-second latency (checkpoints add overhead)
  • Your workflows span multiple languages (Python ML, Rust compute, Go services)
  • You need full audit trails and deterministic replay (use Temporal)
  • Your tasks are simple fire-and-forget jobs (use BullMQ)
  • You’re building event-driven automation chains (use Inngest or n8n)

The V1-to-V2 pivot reveals what developers actually need for agent infrastructure: not more integrations, but better execution guarantees. Trigger.dev fills the gap between simple job queues and full workflow engines for teams that live in TypeScript.