Trigger.dev V2: What a Temporal Alternative for TypeScript Reveals About Durable Execution Plumbing

How Trigger.dev pivoted from event hooks to durable execution primitives, exposing the infrastructure gap for long-running agent tasks.

Source: trigger.dev

Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and pulled 745 points on Hacker News. Eight months later, the team shipped V2 and repositioned as a “Temporal alternative for TypeScript.” That pivot tells you everything about the infrastructure gap between event-driven hooks and durable execution primitives.

The shift happened because early users didn’t want webhook glue. They wanted retries, state persistence, and resumable workflows for long-running tasks: the kind of plumbing you need when an AI agent calls six APIs, waits for human approval, then continues three hours later without losing context.

What Changed Between V1 and V2

V1 was event-driven. You defined triggers (webhooks, schedules, events) and wrote handlers. It looked like Zapier but with code. The problem: no built-in durability. If your handler crashed halfway through a multi-step process, you started over or built your own state machine.

V2 introduced task primitives with automatic retries, queue management, and step-level resumption. The API surface looks like this:

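// Import path is an assumption and depends on the SDK version you have installed.
import { task } from "@trigger.dev/sdk/v3";

// fetchDocument, extractText, analyzeContent, and saveResults are placeholder
// helpers you would implement yourself; extractText is assumed to return a string.
export const processDocument = task({
  id: "process-document",
  run: async (payload: { documentId: string }) => {
    // Each await is a resumption point
    const doc = await fetchDocument(payload.documentId);
    const extracted = await extractText(doc);
    const analyzed = await analyzeContent(extracted);

    // If this fails, only this step retries
    await saveResults(analyzed);

    // Count words, not characters
    const wordCount = extracted.split(/\s+/).filter(Boolean).length;
    return { status: "complete", wordCount };
  },
});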

Each await creates a checkpoint. If analyzeContent throws, the task resumes from that step. You don’t write retry logic. You don’t manage queues. The runtime handles it.
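To make concrete what the runtime takes off your hands, here is a minimal sketch of per-step retry with exponential backoff. It is not Trigger.dev's implementation: `retryStep`, `backoffDelayMs`, and the option names are hypothetical, the step is synchronous to keep the example short, and delays are recorded rather than slept.

```typescript
// Hypothetical illustration of step-level retry; not Trigger.dev's actual code.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the second attempt
  factor: number;      // multiplier applied after each failure
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay to wait after the given failed attempt (1-based).
function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  const raw = opts.baseDelayMs * Math.pow(opts.factor, attempt - 1);
  return Math.min(raw, opts.maxDelayMs);
}

interface RetryResult<T> {
  value: T;
  attempts: number;
  delaysMs: number[]; // delays a real runtime would wait between attempts
}

// Runs `step` until it succeeds or the attempts are exhausted.
// A real runtime would sleep for each delay; here we only record them.
function retryStep<T>(step: () => T, opts: RetryOptions): RetryResult<T> {
  const delaysMs: number[] = [];
  for (let attempt = 1; ; attempt++) {
    try {
      return { value: step(), attempts: attempt, delaysMs };
    } catch (err) {
      if (attempt >= opts.maxAttempts) throw err; // give up and surface the error
      delaysMs.push(backoffDelayMs(attempt, opts));
    }
  }
}
```

Multiply this by every step in every task, add persistence so the attempt counter survives a crash, and the appeal of having the runtime own it becomes clear.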

Durable Execution vs. Event Sourcing

Temporal uses event sourcing. Every state transition is an event. Replay the event log, reconstruct state. It’s bulletproof but verbose. You write workflows in Go, Java, TypeScript, or another language with an official SDK, deploy workers, and manage a separate cluster.

Trigger.dev uses checkpoint-based durability. The runtime serializes state at each await, stores it, and resumes on failure. It’s lighter weight but trades off some guarantees. You can’t replay arbitrary history. You get the last known good state.
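The difference between the two state models is easier to see side by side. The sketch below is a toy model, not code from either project: `WorkflowEvent`, `replay`, and `CheckpointStore` are hypothetical names. The event log can rebuild state as of any point in history, while the checkpoint store only ever holds the latest snapshot.

```typescript
// Hypothetical workflow state and events, for illustration only.
type WorkflowEvent =
  | { type: "StepCompleted"; step: string; output: string }
  | { type: "StepFailed"; step: string; error: string };

interface WorkflowState {
  completed: Record<string, string>; // step name -> output
  failures: number;
}

const emptyState = (): WorkflowState => ({ completed: {}, failures: 0 });

// Event sourcing: state is never stored directly; it is derived by folding the log.
function replay(events: WorkflowEvent[], upTo = events.length): WorkflowState {
  return events.slice(0, upTo).reduce<WorkflowState>((state, ev) => {
    if (ev.type === "StepCompleted") {
      return { ...state, completed: { ...state.completed, [ev.step]: ev.output } };
    }
    return { ...state, failures: state.failures + 1 };
  }, emptyState());
}

// Checkpointing: only the latest snapshot is kept; older states are overwritten.
class CheckpointStore {
  private latest: string | null = null;

  save(state: WorkflowState): void {
    this.latest = JSON.stringify(state);
  }

  load(): WorkflowState {
    return this.latest === null ? emptyState() : JSON.parse(this.latest);
  }
}

const log: WorkflowEvent[] = [
  { type: "StepCompleted", step: "fetch", output: "doc" },
  { type: "StepFailed", step: "extract", error: "timeout" },
  { type: "StepCompleted", step: "extract", output: "text" },
];

const store = new CheckpointStore();
log.forEach((_, i) => store.save(replay(log, i + 1)));

const afterFirstEvent = replay(log, 1); // any point in history is reachable
const lastKnownGood = store.load();     // only the final snapshot survives
```

Both approaches agree on the final state. Only the log can answer "what did the workflow look like after the first event?"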

| Aspect | Temporal | Trigger.dev |
| --- | --- | --- |
| State model | Event sourcing (full replay) | Checkpoint-based (last state) |
| Language | Go, Java, TypeScript (via SDK) | TypeScript-native |
| Deployment | Separate cluster + workers | Managed service or self-hosted workers |
| Replay granularity | Full event history | Step-level resumption |
| Learning curve | Steep (workflow DSL, activities) | Shallow (async/await) |

Runtime Boundaries and Deployment Shape

Trigger.dev runs tasks in isolated worker processes. You define tasks in your codebase, but execution happens in a separate runtime. The managed service handles worker pools, scaling, and state storage. Self-hosted mode requires running your own worker fleet.

The deployment model:

  1. Task definition: You write tasks in your app using the SDK.
  2. Registration: Tasks register with the Trigger.dev API on deploy.
  3. Execution: When triggered, the runtime spins up a worker, loads your task code, executes it.
  4. State persistence: After each checkpoint, state serializes to Postgres or your configured store.
  5. Resumption: On failure, the runtime loads the last checkpoint and continues.
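Steps 4 and 5 above can be sketched as a small persist-and-resume loop. This is an in-memory model under stated assumptions, not the real runtime: `Checkpoint`, `executeRun`, and the `Map` standing in for a Postgres table are all hypothetical, and steps are synchronous for brevity.

```typescript
// Hypothetical in-memory model of the persist-and-resume loop.
interface Checkpoint {
  nextStep: number;               // index of the first step that has not completed
  state: Record<string, unknown>; // JSON-serializable outputs so far
}

type Step = {
  name: string;
  run: (state: Record<string, unknown>) => unknown;
};

// Stand-in for a database table keyed by run ID.
const checkpoints = new Map<string, string>();

function executeRun(runId: string, steps: Step[]): Record<string, unknown> {
  // Resumption: load the last checkpoint, or start fresh.
  const saved = checkpoints.get(runId);
  const cp: Checkpoint = saved ? JSON.parse(saved) : { nextStep: 0, state: {} };

  steps.forEach((step, i) => {
    if (i < cp.nextStep) return; // completed in an earlier attempt
    cp.state[step.name] = step.run(cp.state); // may throw; the stored checkpoint stays intact
    cp.nextStep = i + 1;
    // State persistence: serialize after every completed step.
    checkpoints.set(runId, JSON.stringify(cp));
  });
  return cp.state;
}

// Usage: the second step fails once, then succeeds on the next attempt.
let fetchCalls = 0;
let analyzeCalls = 0;
const steps: Step[] = [
  { name: "fetch", run: () => { fetchCalls++; return "raw text"; } },
  {
    name: "analyze",
    run: (s) => {
      analyzeCalls++;
      if (analyzeCalls === 1) throw new Error("transient failure");
      return String(s["fetch"]).length;
    },
  },
];

let result: Record<string, unknown> | undefined;
try {
  result = executeRun("run-1", steps);
} catch {
  result = executeRun("run-1", steps); // resumes at "analyze"; "fetch" is not re-run
}
```

The second call never touches `fetch` because its output was already in the stored checkpoint. That is the whole trick, and also why everything a step returns has to survive serialization.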

This is different from embedding workers in your Node.js app. Your app triggers tasks via API calls. The runtime manages execution. It’s closer to AWS Lambda than a background job library.

Observability and Debugging Primitives

The V2 dashboard exposes:

  • Step-level traces: See which step failed, how long each took, what state existed at each checkpoint.
  • Replay controls: Retry individual steps without re-running the entire task.
  • Queue metrics: Concurrency limits, backlog depth, throughput per task type.
  • Real-time logs: Streamed from worker processes, tagged by task run ID.

You can’t replay from arbitrary points in history like Temporal. But you can inspect intermediate state, retry failed steps, and trace execution flow without SSH-ing into servers.
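For a sense of what a step-level trace has to capture, here is a minimal recorder. The field names and the `RunTracer` class are hypothetical and do not reflect the dashboard's actual schema; the clock is injectable so durations can be checked deterministically.

```typescript
// Hypothetical trace recorder; field names are illustrative only.
interface StepTrace {
  runId: string;
  step: string;
  status: "ok" | "failed";
  durationMs: number;
  error?: string;
}

class RunTracer {
  readonly traces: StepTrace[] = [];
  private runId: string;
  private now: () => number;

  constructor(runId: string, now: () => number = Date.now) {
    this.runId = runId;
    this.now = now;
  }

  // Wraps a step, recording its duration and outcome, then passes the result through.
  step<T>(name: string, fn: () => T): T {
    const start = this.now();
    try {
      const value = fn();
      this.traces.push({
        runId: this.runId,
        step: name,
        status: "ok",
        durationMs: this.now() - start,
      });
      return value;
    } catch (err) {
      this.traces.push({
        runId: this.runId,
        step: name,
        status: "failed",
        durationMs: this.now() - start,
        error: err instanceof Error ? err.message : String(err),
      });
      throw err; // tracing must not swallow the failure
    }
  }

  // First failed step, if any: what a step-level trace view would surface.
  firstFailure(): StepTrace | undefined {
    return this.traces.find((t) => t.status === "failed");
  }
}
```

Tagging every record with the run ID is what lets logs, traces, and retries line up in one view.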

Where It Fits in Agent Orchestration

Trigger.dev solves the “I need this to finish even if it takes six hours” problem. Use it when:

  • Multi-step agent workflows: Call LLM, wait for tool execution, call LLM again, repeat.
  • Human-in-the-loop approval: Pause execution, send notification, resume when user responds.
  • Batch processing with retries: Process 10,000 documents, retry failures individually.
  • Scheduled background tasks: Cron jobs that can’t time out (data syncs, report generation).
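The human-in-the-loop case from the list above is the one that most clearly needs durable state. A minimal sketch, with hypothetical names throughout (`ApprovalRun`, `startRun`, `resumeRun`) and a `Map` standing in for durable storage:

```typescript
// Hypothetical run record for a human-in-the-loop pause; names are illustrative.
type RunStatus = "waiting_for_approval" | "completed" | "rejected";

interface ApprovalRun {
  id: string;
  status: RunStatus;
  draft: string;   // context produced before the pause
  result?: string; // set only once the run completes
}

// Stand-in for durable storage: the run survives as JSON while nothing is executing it.
const runs = new Map<string, string>();

function startRun(id: string, input: string): ApprovalRun {
  const run: ApprovalRun = {
    id,
    status: "waiting_for_approval",
    draft: `Summary of: ${input}`,
  };
  runs.set(id, JSON.stringify(run)); // persist, then release the worker
  return run;
}

// Called later, for example from a webhook handler when the reviewer responds.
function resumeRun(id: string, approved: boolean): ApprovalRun {
  const stored = runs.get(id);
  if (stored === undefined) throw new Error(`unknown run ${id}`);
  const run: ApprovalRun = JSON.parse(stored);
  if (run.status !== "waiting_for_approval") {
    throw new Error(`run ${id} is not waiting for approval`);
  }

  const next: ApprovalRun = approved
    ? { ...run, status: "completed", result: `Published: ${run.draft}` }
    : { ...run, status: "rejected" };
  runs.set(id, JSON.stringify(next));
  return next;
}
```

No process sits idle between `startRun` and `resumeRun`. The draft lives in storage, so the three-hour wait costs nothing but a row.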

It doesn’t replace real-time request/response flows. If your agent needs to respond in under 30 seconds, you’re better off with a stateless API and external state management.

Failure Modes and Operational Risks

State serialization limits: Your task state must serialize to JSON. No functions, no circular references, no class instances with methods. If you pass complex objects between steps, you’ll hit serialization errors.
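One way to catch these problems before they reach the runtime is to check values at step boundaries. The helper below is a hypothetical sketch, not part of any SDK: it walks a value and reports anything that would not survive a JSON round trip.

```typescript
// Hypothetical helper: reports why a value would not survive a JSON round trip.
function findSerializationProblems(
  value: unknown,
  path = "$",
  seen = new Set<object>(),
): string[] {
  if (value === null) return [];
  const t = typeof value;
  if (t === "function") return [`${path}: function`];
  if (t === "undefined" || t === "symbol" || t === "bigint") return [`${path}: ${t}`];
  if (t !== "object") return []; // string, number, boolean

  const obj = value as object;
  if (seen.has(obj)) return [`${path}: circular reference`];
  seen.add(obj);

  const problems: string[] = [];
  const proto = Object.getPrototypeOf(obj);
  if (!Array.isArray(obj) && proto !== Object.prototype && proto !== null) {
    // Methods live on the prototype and are lost when the value is parsed back.
    problems.push(`${path}: class instance (${obj.constructor.name})`);
  }
  for (const [key, child] of Object.entries(obj)) {
    problems.push(...findSerializationProblems(child, `${path}.${key}`, seen));
  }
  seen.delete(obj); // only ancestors count as circular; shared siblings are fine
  return problems;
}
```

Running this in tests against the return value of each step is cheap insurance. An empty array means the value is plain data.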

Cold start latency: Workers spin up on demand. First execution of a task after idle time takes longer. Not a problem for background jobs. A problem if you’re chaining tasks in a user-facing flow.

Vendor lock-in (managed service): The managed service stores your state. Migrating to self-hosted or another platform means rebuilding state management. Open source helps, but you still need to run workers and handle persistence.

Concurrency control complexity: Queue limits and concurrency settings are global per task type. If you need per-tenant rate limiting or dynamic concurrency, you’re writing custom logic on top.
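The custom logic in question usually looks something like a limiter keyed by tenant, consulted before a task is triggered. The class below is a hypothetical, in-memory sketch; a production version would need a shared store such as Redis or Postgres so that every app instance sees the same counts.

```typescript
// Hypothetical per-tenant limiter layered in front of a task queue.
class TenantConcurrencyLimiter {
  private active = new Map<string, number>(); // tenant -> runs in flight
  private limits = new Map<string, number>(); // tenant -> override limit
  private defaultLimit: number;

  constructor(defaultLimit: number) {
    this.defaultLimit = defaultLimit;
  }

  // Dynamic concurrency: a tenant's limit can change at runtime.
  setLimit(tenantId: string, limit: number): void {
    this.limits.set(tenantId, limit);
  }

  // Returns true and reserves a slot if the tenant is under its limit.
  tryAcquire(tenantId: string): boolean {
    const limit = this.limits.get(tenantId) ?? this.defaultLimit;
    const current = this.active.get(tenantId) ?? 0;
    if (current >= limit) return false;
    this.active.set(tenantId, current + 1);
    return true;
  }

  // Must be called when the run finishes, whether it succeeded or failed.
  release(tenantId: string): void {
    const current = this.active.get(tenantId) ?? 0;
    if (current <= 1) this.active.delete(tenantId);
    else this.active.set(tenantId, current - 1);
  }
}
```

When `tryAcquire` returns false, the caller has to decide whether to delay, drop, or re-enqueue the work, which is exactly the kind of policy a global per-task setting cannot express.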

Technical Verdict

Use Trigger.dev when:

  • You’re building in TypeScript and want durable execution without learning Temporal’s workflow DSL.
  • Your tasks run longer than serverless function timeouts (15 minutes on Lambda, roughly 60 seconds on Vercel depending on plan).
  • You need step-level retries and don’t care about full event replay.
  • You want managed infrastructure and are okay with vendor-hosted state.

Avoid it when:

  • You need sub-second latency or real-time agent responses.
  • Your state model requires full event sourcing and audit trails.
  • You’re already running Temporal and don’t want to manage two orchestration layers.
  • You need complex distributed sagas with compensation logic (Temporal’s strength).

The V1-to-V2 pivot exposed a real gap: developers wanted durable execution but didn’t want to operate Temporal clusters or learn workflow DSLs. Trigger.dev filled that gap for the TypeScript ecosystem. It’s not a Temporal replacement for every use case, but for long-running agent tasks in Node.js apps, it’s the path of least resistance.