Trigger.dev V2: What a Temporal Alternative for TypeScript Reveals About Durable Execution Plumbing

How Trigger.dev pivoted from event triggers to durable execution, exposing retry semantics, state persistence, and the infrastructure gap in TypeScript.

Source: trigger.dev
Trigger.dev started as a “Zapier alternative for developers” (745 HN points, February 2023). Eight months later, the team shipped V2 and repositioned it as a “Temporal alternative for TypeScript” (172 points, October 2023). That pivot exposes a real infrastructure gap: developers want durable execution without adopting Temporal’s programming model or managing its operational complexity.

The shift reveals what “durable execution” actually means when you strip away the workflow engine abstractions. It’s not about event routing or webhook fanout. It’s about persisting function call state, resuming execution after crashes, and handling retries without forcing developers to write explicit state machines.

What Changed Between V1 and V2

V1 focused on event-driven triggers. You connected integrations (GitHub, Slack, Stripe), defined event handlers, and Trigger.dev routed payloads. The model worked for short-lived webhooks but broke down for long-running jobs: multi-hour AI workflows, batch processing, or anything that needed to survive process restarts.

V2 introduced durable execution primitives:

  • Task persistence: Function calls serialize to storage mid-execution
  • Automatic retries: Exponential backoff without manual try/catch blocks
  • Resumable state: Execution picks up from the last checkpoint after failure
  • Observability hooks: Built-in tracing for jobs that span hours or days

The architecture change is fundamental. V1 was stateless request/response. V2 is stateful orchestration with explicit checkpoints.

How Durable Execution Works in TypeScript

Temporal achieves durability through event sourcing. Every workflow decision (function call, timer, signal) appends an event to a log. After a failure, the runtime reconstructs state by replaying those events. This works, but it requires workflow code to be deterministic and to run inside Temporal’s own SDK runtime.
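To make the replay idea concrete, here is a minimal sketch of event-sourced recovery. It is not Temporal's SDK: the `WorkflowEvent` type, the `WorkflowState` shape, and the `replay` function are hypothetical names used only for illustration.

```typescript
// Hypothetical event types; a real engine records many more kinds.
type WorkflowEvent =
  | { kind: "activityCompleted"; name: string; result: unknown }
  | { kind: "timerFired"; timerId: string };

interface WorkflowState {
  results: Record<string, unknown>; // activity name -> recorded result
  firedTimers: string[];
}

// Rebuild state by folding over the append-only log, oldest event first.
function replay(log: readonly WorkflowEvent[]): WorkflowState {
  const state: WorkflowState = { results: {}, firedTimers: [] };
  for (const event of log) {
    if (event.kind === "activityCompleted") {
      state.results[event.name] = event.result;
    } else {
      state.firedTimers.push(event.timerId);
    }
  }
  return state;
}

const log: WorkflowEvent[] = [
  { kind: "activityCompleted", name: "download", result: { bytes: 1024 } },
  { kind: "timerFired", timerId: "cooldown" },
];
const recovered = replay(log);
```

Because the state is derived only from the log, replaying the same log always produces the same state. That property is what forces replay-based engines to demand deterministic workflow code.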

Trigger.dev takes a different approach. Instead of replaying events, it checkpoints function state at specific boundaries:

  1. Entry point: Task function starts, initial state persists
  2. Await boundaries: Every await expression checkpoints before and after it runs
  3. Tool calls: External API calls are recorded as serialized request/response pairs
  4. Completion: Final result persists with execution metadata

The runtime intercepts async operations and writes state snapshots to Postgres. On failure, it reloads the last checkpoint and resumes from the next await.

// Import path assumes the v3 SDK entry point, which exports `task`.
import { task } from "@trigger.dev/sdk/v3";

// downloadVideo, transcribeAudio, and generateSummary are application
// helpers assumed to be defined elsewhere.
export const processVideo = task({
  id: "process-video",
  run: async ({ videoUrl }: { videoUrl: string }) => {
    // Checkpoint 1: Task starts
    const download = await downloadVideo(videoUrl);

    // Checkpoint 2: Download completes
    const transcript = await transcribeAudio(download.audioPath);

    // Checkpoint 3: Transcription completes
    const summary = await generateSummary(transcript.text);

    // Checkpoint 4: Summary completes
    return { summary, duration: download.duration };
  }
});

If the process crashes after transcription, the runtime skips downloadVideo and transcribeAudio on restart. It loads the cached transcript and continues with generateSummary.
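The resume behaviour described above can be sketched with a memoized step helper. This is a simplified illustration, not Trigger.dev's implementation: `step`, `CheckpointStore`, and the run ID `run_1` are hypothetical, and an in-memory `Map` stands in for durable storage.

```typescript
// Hypothetical checkpoint store; a Map stands in for durable storage.
type CheckpointStore = Map<string, string>;

// Run `fn` once per (runId, name). On later attempts, return the saved result.
async function step<T>(
  store: CheckpointStore,
  runId: string,
  name: string,
  fn: () => Promise<T>,
): Promise<T> {
  const key = `${runId}:${name}`;
  const saved = store.get(key);
  if (saved !== undefined) {
    return JSON.parse(saved) as T; // completed earlier: skip the work
  }
  const result = await fn();
  store.set(key, JSON.stringify(result)); // checkpoint after the step completes
  return result;
}

// Counts real executions so the skip-on-resume behaviour is visible.
const calls = { download: 0, transcribe: 0 };

async function attempt(store: CheckpointStore, crashAfterDownload: boolean): Promise<string> {
  const audio = await step(store, "run_1", "download", async () => {
    calls.download += 1;
    return { audioPath: "/tmp/audio.wav" };
  });
  if (crashAfterDownload) throw new Error("simulated crash");
  return step(store, "run_1", "transcribe", async () => {
    calls.transcribe += 1;
    return `transcript of ${audio.audioPath}`;
  });
}

// The first attempt crashes after the download; the second resumes past it.
async function demo(): Promise<string> {
  const store: CheckpointStore = new Map();
  await attempt(store, true).catch(() => undefined);
  return attempt(store, false);
}
```

Calling `demo()` runs the download once even though the task body executes twice. Note that the sketch only saves a checkpoint after a step finishes, so a crash in the middle of a step re-runs that step from the start.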

Retry Semantics and Idempotency

Automatic retries sound simple until you hit side effects. If a task sends an email, crashes, and retries, you don’t want duplicate emails.

Trigger.dev handles this with execution IDs and idempotency keys:

  • Every task run gets a unique execution ID
  • External API calls include the execution ID in headers or request metadata
  • Idempotent APIs (Stripe, Twilio) deduplicate based on the key
  • Non-idempotent calls require explicit guards (database checks, distributed locks)

The platform doesn’t enforce idempotency. It provides the primitives (execution IDs, checkpoint boundaries) and expects developers to handle side effects correctly.
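As a rough sketch of how an execution ID can become an idempotency key, the example below derives a stable key per step and sends it to a provider that deduplicates on it. `FakeEmailProvider`, `idempotencyKey`, and the ID `exec_123` are hypothetical stand-ins, not part of any real SDK.

```typescript
// Hypothetical provider that deduplicates on an idempotency key,
// similar in spirit to how some payment and messaging APIs behave.
class FakeEmailProvider {
  private readonly seen = new Set<string>();
  public delivered: string[] = [];

  send(to: string, idempotencyKey: string): "sent" | "duplicate" {
    if (this.seen.has(idempotencyKey)) return "duplicate";
    this.seen.add(idempotencyKey);
    this.delivered.push(to);
    return "sent";
  }
}

// Derive a stable key from the execution ID and the step name, so that
// every retry of the same step in the same run reuses the same key.
function idempotencyKey(executionId: string, stepName: string): string {
  return `${executionId}:${stepName}`;
}

const provider = new FakeEmailProvider();
const key = idempotencyKey("exec_123", "send-welcome-email");

const firstAttempt = provider.send("user@example.com", key);
const retryAttempt = provider.send("user@example.com", key); // retry after a crash
```

The key must not include anything that changes between attempts, such as a timestamp or retry counter, or the provider will treat each retry as a new request.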

| Failure Mode | Trigger.dev Behavior | Developer Responsibility |
| --- | --- | --- |
| Network timeout | Automatic retry with exponential backoff | Ensure API calls are idempotent or use execution ID |
| Process crash | Resume from last checkpoint | Design checkpoints around side effects |
| Code deployment | Graceful drain, new version picks up pending tasks | Maintain backward-compatible state schemas |
| Database unavailable | Retry with circuit breaker | Handle transient failures in task logic |
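For readers unfamiliar with exponential backoff, the delay schedule can be computed as below. The option names and the values (1 second base, factor 2, 30 second cap) are illustrative assumptions, not Trigger.dev defaults.

```typescript
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  factor: number; // multiplier applied per attempt
  maxMs: number;  // upper bound on any single delay
}

// Delay before retry number `attempt` (1-based), without jitter.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  const raw = opts.baseMs * Math.pow(opts.factor, attempt - 1);
  return Math.min(raw, opts.maxMs);
}

// "Full jitter": pick a random delay in [0, backoffDelay) so that many
// failed runs do not all retry at the same instant.
function jitteredDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * backoffDelay(attempt, opts));
}

const opts: BackoffOptions = { baseMs: 1000, factor: 2, maxMs: 30000 };
const schedule = [1, 2, 3, 4, 5, 6].map((n) => backoffDelay(n, opts));
// schedule is [1000, 2000, 4000, 8000, 16000, 30000]
```

The cap matters for long-running jobs: without it, the sixth retry would already wait 32 seconds and later retries would wait for hours.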

State Persistence and Queue Management

Trigger.dev uses Postgres for state storage. Each task execution writes:

  • Execution metadata (ID, status, start time, retry count)
  • Checkpoint snapshots (serialized function state at await boundaries)
  • Event log (tool calls, errors, state transitions)

The queue is also Postgres-backed. Tasks enter a pending state, workers poll for available jobs, and execution moves to running. On completion or failure, the row updates with final state.
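The pending, running, and final states can be modelled as a small state machine. The sketch below uses an in-memory array as a stand-in for the Postgres table; `TaskRow`, `claimNext`, and `finish` are hypothetical names, and the real schema is not described in the source.

```typescript
type Status = "pending" | "running" | "completed" | "failed";

interface TaskRow {
  id: string;
  status: Status;
  attempts: number;
  lockedBy: string | null;
}

// Claim the oldest pending row for a worker. In Postgres this would be
// one atomic statement; here array order stands in for insertion order.
function claimNext(table: TaskRow[], workerId: string): TaskRow | undefined {
  const row = table.find((r) => r.status === "pending");
  if (row === undefined) return undefined;
  row.status = "running";
  row.lockedBy = workerId;
  row.attempts += 1;
  return row;
}

// Record the outcome and release the lock.
function finish(row: TaskRow, outcome: "completed" | "failed"): void {
  row.status = outcome;
  row.lockedBy = null;
}

const table: TaskRow[] = [
  { id: "t1", status: "pending", attempts: 0, lockedBy: null },
  { id: "t2", status: "pending", attempts: 0, lockedBy: null },
];

const a = claimNext(table, "worker-a"); // takes t1
const b = claimNext(table, "worker-b"); // takes t2, since t1 is running
if (a !== undefined) finish(a, "completed");
```

In a real Postgres-backed queue, the claim step has to be atomic across workers. A common technique is `SELECT ... FOR UPDATE SKIP LOCKED`, which lets concurrent workers pick distinct rows without blocking each other; whether Trigger.dev uses it is not stated in the source.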

This design trades throughput for simplicity. Postgres can’t match Redis or RabbitMQ for queue performance, but it eliminates operational complexity. No separate message broker, no distributed coordination, no split-brain scenarios between queue and state store.

The bottleneck appears around 10,000 concurrent tasks. Beyond that, you need connection pooling, read replicas, or a move to dedicated queue infrastructure.

Observability for Long-Running Jobs

When a task runs for six hours, you need more than logs. Trigger.dev exposes:

  • Real-time execution view: Current checkpoint, elapsed time, next scheduled retry
  • Trace timeline: Visual breakdown of await boundaries and tool calls
  • State snapshots: Inspect serialized state at any checkpoint
  • Retry history: See every attempt, failure reason, and backoff delay

The observability model assumes you’re debugging distributed systems, not single-process scripts. You need to answer: “Where did this task stall?” and “Why did this retry loop 47 times?”

The platform streams execution events to a dashboard. Each checkpoint emits a trace span. Tool calls include request/response payloads (with PII redaction hooks). Errors capture stack traces and serialized state.
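One way to picture a span per checkpoint with a redaction hook is the sketch below. The `Span` shape, `redactKeys`, and `recordSpan` are hypothetical and do not reflect Trigger.dev's actual tracing API.

```typescript
interface Span {
  checkpoint: string;
  durationMs: number;
  payload: Record<string, unknown>;
}

type Redactor = (payload: Record<string, unknown>) => Record<string, unknown>;

// Hypothetical redaction hook: replace the values of sensitive keys.
function redactKeys(sensitive: readonly string[]): Redactor {
  return (payload) => {
    const clean: Record<string, unknown> = {};
    for (const key of Object.keys(payload)) {
      clean[key] = sensitive.indexOf(key) >= 0 ? "[REDACTED]" : payload[key];
    }
    return clean;
  };
}

// Append one span per checkpoint, redacting before the payload is stored.
function recordSpan(
  trace: Span[],
  checkpoint: string,
  durationMs: number,
  payload: Record<string, unknown>,
  redact: Redactor,
): void {
  trace.push({ checkpoint, durationMs, payload: redact(payload) });
}

const trace: Span[] = [];
const redact = redactKeys(["email"]);
recordSpan(trace, "download", 1200, { url: "https://example.com/v.mp4" }, redact);
recordSpan(trace, "notify", 80, { email: "user@example.com", template: "done" }, redact);
```

Redacting before the span is stored, rather than when it is displayed, keeps sensitive values out of the trace storage entirely. This sketch only inspects top-level keys; nested payloads would need a recursive walk.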

Deployment Shape and Failure Modes

Trigger.dev offers cloud-hosted and self-hosted options. The cloud version runs workers in isolated containers with automatic scaling. Self-hosted requires:

  • Postgres (state and queue storage)
  • Redis (optional, for distributed locks and rate limiting)
  • Worker processes (Node.js or Bun runtimes)
  • Coordinator service (schedules tasks, manages retries)

The failure modes differ by deployment:

Cloud-hosted risks:

  • Vendor lock-in for state storage
  • Cold start latency for infrequent tasks
  • Network egress costs for large payloads

Self-hosted risks:

  • Database failover complexity
  • Worker autoscaling configuration
  • Checkpoint storage growth (unbounded if not pruned)

Both models share a core risk: checkpoint serialization failures. If your task holds non-serializable state (open file handles, WebSocket connections), the runtime can’t persist it. The task crashes and retries from the last valid checkpoint, potentially losing progress.
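A defensive check before checkpointing can surface this problem early. The sketch below assumes checkpoints are stored as JSON, which the source does not confirm; `findNonSerializable` is a hypothetical helper that reports values which would not survive a JSON round trip unchanged.

```typescript
// Return the paths of values that would not survive a JSON round trip.
function findNonSerializable(value: unknown, path: string = "$"): string[] {
  if (value === null || typeof value === "string" || typeof value === "boolean") {
    return [];
  }
  if (typeof value === "number") {
    return Number.isFinite(value) ? [] : [path]; // NaN and Infinity become null
  }
  const problems: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      problems.push(...findNonSerializable(item, `${path}[${i}]`));
    });
    return problems;
  }
  if (typeof value === "object") {
    const proto = Object.getPrototypeOf(value);
    // Anything other than a plain object (Map, Date, sockets, class
    // instances) is flagged rather than silently flattened.
    if (proto !== Object.prototype && proto !== null) return [path];
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      problems.push(...findNonSerializable(obj[key], `${path}.${key}`));
    }
    return problems;
  }
  return [path]; // undefined, function, symbol, bigint
}

const state = {
  transcript: "hello",
  retries: 2,
  socket: new Map<string, string>(), // stand-in for a live connection
  onDone: () => undefined,
};
const unsafePaths = findNonSerializable(state);
// unsafePaths is ["$.socket", "$.onDone"]
```

The usual remedy is to keep only identifiers in task state (a file path, a connection URL) and reopen the handle or connection after resuming.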

How It Compares to Temporal

Temporal is a workflow engine. Trigger.dev is a task runner with durable execution. The distinction matters:

Temporal strengths:

  • Multi-language support (Go, Java, Python, TypeScript)
  • Complex workflow patterns (sagas, child workflows, signals)
  • Battle-tested at scale (Uber, Netflix, Stripe)

Trigger.dev strengths:

  • Native TypeScript experience (no DSL, no code generation)
  • Simpler operational model (a single Postgres database, where Temporal deployments often add Cassandra and Elasticsearch)
  • Faster onboarding (minutes vs. days)

Temporal is the right choice for mission-critical workflows with complex coordination. Trigger.dev fits teams that want durable execution without the operational overhead.

Technical Verdict

Use Trigger.dev when:

  • You’re building in TypeScript and want native async/await semantics
  • Your tasks are mostly linear (API calls, batch jobs, AI workflows)
  • You prefer operational simplicity over maximum throughput
  • You need durable execution but not full workflow orchestration

Avoid it when:

  • You need sub-second task latency (checkpointing adds overhead)
  • Your workflows require complex coordination (parallel branches, dynamic fan-out)
  • You’re already invested in Temporal or another workflow engine
  • You need guaranteed exactly-once semantics (Trigger.dev provides at-least-once)

The platform’s real contribution is showing that durable execution doesn’t require a PhD in distributed systems. You can get retry semantics, state persistence, and resumable execution with Postgres and careful checkpoint design. That’s a useful middle ground between “hope the process doesn’t crash” and “run a Temporal cluster.”