Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and earned 745 points on Hacker News. Eight months later, the team shipped V2 and repositioned as a “Temporal alternative for TypeScript.” That pivot tells you everything about what developers building agent infrastructure actually need: not webhook glue, but durable execution with retries, resumable state, and long-running task orchestration.
The shift exposes a fundamental architectural divide. Zapier-style tools handle event-driven workflows where each step completes in seconds. Temporal-style systems handle long-running tasks that span minutes or hours, survive crashes, and resume exactly where they left off. Trigger.dev’s V2 architecture reveals what it takes to build the second kind in TypeScript.
What Durable Execution Actually Means
Durable execution guarantees that a function runs to completion even if the process crashes, the network fails, or the server restarts. The runtime must:
- Serialize execution state at checkpoints
- Persist that state to durable storage
- Resume from the last checkpoint after a failure
- Replay deterministic operations without side effects
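Temporal solves this with a history service, part of its Go-based server, that logs every workflow event to an append-only history. Workers replay that history to reconstruct state. This works because Temporal requires workflow code to be deterministic. Side effects run in activities, and their results are recorded in the history and returned from it during replay instead of being re-executed.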
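The replay idea can be sketched as a toy model. This is not Temporal's API: `HistoryEvent`, `WorkflowContext`, `activity()`, and `runWorkflow` are hypothetical names, and a plain array stands in for the history service. The workflow function re-runs from the top on every replay. Each activity call does one of two things:

- If a recorded result is next in the append-only log, it returns that result.
- Once the log is exhausted, it performs the side effect and appends the result.

Requesting a different activity than the one recorded is treated as non-determinism.

```typescript
// Toy model of event-sourced replay (hypothetical names, not Temporal's API).
// The "history" is an append-only log of completed activity results.
type HistoryEvent = { activity: string; result: unknown };

class WorkflowContext {
  private cursor = 0; // position in the history during replay

  constructor(private readonly events: HistoryEvent[]) {}

  // Run an activity, or return its recorded result if the history already has it.
  async activity<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const recorded = this.events[this.cursor];
    if (recorded !== undefined) {
      // Replay: the workflow must request the same activities in the same order.
      if (recorded.activity !== name) {
        throw new Error(`Non-determinism: expected ${recorded.activity}, got ${name}`);
      }
      this.cursor++;
      return recorded.result as T;
    }
    const result = await fn(); // first execution: perform the side effect
    this.events.push({ activity: name, result }); // append to the log
    this.cursor++;
    return result;
  }
}

// Run (or replay) a workflow against a shared history; it always starts from the top.
function runWorkflow<T>(
  events: HistoryEvent[],
  workflow: (ctx: WorkflowContext) => Promise<T>,
): Promise<T> {
  return workflow(new WorkflowContext(events));
}

// Usage: count how often the side effects actually run.
let sideEffects = 0;
const eventLog: HistoryEvent[] = [];
const orderWorkflow = async (ctx: WorkflowContext) => {
  const a = await ctx.activity("fetch", async () => { sideEffects++; return 2; });
  const b = await ctx.activity("double", async () => { sideEffects++; return a * 2; });
  return a + b;
};
const firstRun = runWorkflow(eventLog, orderWorkflow);
```

Running `orderWorkflow` a second time against the same `eventLog` returns the same value without re-running either side effect. Real Temporal histories also record timers, signals, and workflow task decisions. The core constraint is the same, though: replay only works if the workflow code issues the same commands in the same order.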
TypeScript presents different challenges. JavaScript closures capture lexical scope, async/await creates implicit state machines, and the event loop makes timing non-deterministic. You cannot naively serialize a JavaScript function mid-execution and resume it later.
Trigger.dev’s approach:
- Tasks are TypeScript functions wrapped in a `task()` call
- The runtime instruments async boundaries (await points)
- State snapshots happen at each await
- Retries resume from the last successful await, not from the start
This means your task code looks like normal TypeScript, but the runtime tracks execution progress behind the scenes.
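Here is a minimal sketch of the checkpoint-per-await model described above, under some stated assumptions:

- The names `CheckpointStore`, `StepContext`, `step()`, and `runWithRetries` are hypothetical.
- A `Map` stands in for the Postgres checkpoint table.
- The explicit `ctx.step()` call plays the role of an instrumented await.

It illustrates the idea, not Trigger.dev's internals.

```typescript
// Sketch of checkpoint-and-resume (hypothetical; not Trigger.dev internals).
// A Map stands in for durable storage such as a Postgres table.
type CheckpointStore = Map<string, unknown>; // key: `${runId}:${seq}`

class StepContext {
  private seq = 0; // sequence number of the next await boundary

  constructor(private readonly runId: string, private readonly checkpoints: CheckpointStore) {}

  // Execute one step, or return its stored result if an earlier attempt completed it.
  async step<T>(fn: () => Promise<T>): Promise<T> {
    const key = `${this.runId}:${this.seq++}`;
    if (this.checkpoints.has(key)) {
      return this.checkpoints.get(key) as T; // completed before the crash: skip
    }
    const result = await fn();
    this.checkpoints.set(key, result); // checkpoint after the await completes
    return result;
  }
}

// Re-invoke the run function on failure; completed steps are skipped via the store.
async function runWithRetries<T>(
  runId: string,
  checkpoints: CheckpointStore,
  maxAttempts: number,
  run: (ctx: StepContext) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await run(new StepContext(runId, checkpoints));
    } catch (err) {
      lastError = err; // a real runtime would back off before the next attempt
    }
  }
  throw lastError;
}

// Usage: the first attempt crashes after two steps; the retry skips them.
const store: CheckpointStore = new Map();
const executed: string[] = [];
let crashed = false;

const result = runWithRetries("run-1", store, 3, async (ctx) => {
  const doc = await ctx.step(async () => { executed.push("fetch"); return "doc"; });
  const text = await ctx.step(async () => { executed.push("ocr"); return `${doc}:text`; });
  if (!crashed) {
    crashed = true;
    throw new Error("simulated worker crash"); // steps 0 and 1 are already stored
  }
  return ctx.step(async () => { executed.push("analyze"); return `${text}:analysis`; });
});
```

Keying checkpoints by sequence number is also why determinism still matters in this model. If a retry takes a different branch and calls steps in a different order, stored results get matched to the wrong step.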
Architecture Comparison: Trigger.dev vs Temporal
| Component | Temporal | Trigger.dev V2 |
|---|---|---|
| Runtime | Go workers + history service | Node.js/Bun workers + Postgres |
| State persistence | Event sourcing (append-only log) | Checkpoint snapshots at await points |
| Replay mechanism | Full event replay from start; re-executes deterministic code | Resume from last checkpoint; skips completed steps using persisted results |
| Language support | Go, Java, Python, TypeScript (SDK) | TypeScript native |
| Deployment model | Self-hosted cluster or Temporal Cloud | Managed platform or self-hosted |
| Observability | Temporal UI + history queries | Real-time dashboard + trace logs |
| Failure recovery | Automatic replay from history | Automatic retry from checkpoint |
Temporal’s event sourcing gives you complete audit trails and time-travel debugging. Every decision, timer, and activity is logged. You can replay a workflow from any point in history.
Trigger.dev’s checkpoint model trades audit granularity for simpler state management. You get resumability without replaying every operation, because intermediate results are persisted and reused on retry. This matters for non-deterministic operations like API calls: Temporal requires you to wrap them in activities, while Trigger.dev checkpoints after each await.
Task Orchestration Plumbing
Here’s what a durable task looks like:
```typescript
import { task } from "@trigger.dev/sdk/v3";
import OpenAI from "openai";
// Hypothetical application helpers; not part of the Trigger.dev SDK.
import { fetchDocument, ocrService, saveAnalysis } from "./services";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export const processDocument = task({
  id: "process-document",
  retry: {
    maxAttempts: 3,
    factor: 2,
    minTimeoutInMs: 1000, // delay before the first retry
  },
  run: async (payload: { documentId: string }) => {
    // Checkpoint 1: Persisted to Postgres after completion
    const doc = await fetchDocument(payload.documentId);
    // Checkpoint 2: OCR result stored; skipped on retry if already completed
    const text = await ocrService.extract(doc.url);
    // Checkpoint 3: AI response persisted; reused if this step succeeded before crash
    const analysis = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: text }],
    });
    // Checkpoint 4: Final state written atomically
    await saveAnalysis(payload.documentId, analysis);
    return { status: "complete", analysisId: analysis.id };
  },
});
```
If the worker crashes after the OCR step, the next retry resumes after checkpoint 2. The document fetch and OCR results are already persisted, so the runtime skips those steps and continues at the OpenAI call.
Key plumbing details:
- Each `await` creates an implicit checkpoint
- The runtime serializes the payload and intermediate results
- Retries use exponential backoff with jitter
- Timeouts apply per-checkpoint, not per-task
- Concurrency limits prevent queue overload
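The retry settings in the task above (`maxAttempts: 3`, `factor: 2`, a 1000 ms minimum) map onto a standard exponential backoff schedule. Below is a sketch of that arithmetic. It multiplies the delay by a random factor in [1, 2) when randomizing, in the style of the popular `retry` npm package. The `maxDelayMs` cap and the option names are assumptions, and Trigger.dev's exact formula may differ.

```typescript
// Exponential backoff with optional jitter (sketch; not Trigger.dev's exact formula).
interface BackoffOptions {
  minDelayMs: number; // delay before the first retry
  factor: number;     // growth factor per retry
  maxDelayMs: number; // hypothetical upper cap
  randomize: boolean; // multiply by a random factor in [1, 2) to spread out retries
}

// retryIndex is 0 for the first retry, 1 for the second, and so on.
// `random` is injectable so the schedule can be tested deterministically.
function backoffDelay(
  retryIndex: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const jitter = opts.randomize ? 1 + random() : 1;
  const raw = opts.minDelayMs * Math.pow(opts.factor, retryIndex) * jitter;
  return Math.min(Math.round(raw), opts.maxDelayMs);
}

// Assuming maxAttempts: 3 counts the initial attempt, there are two retries.
const opts: BackoffOptions = { minDelayMs: 1000, factor: 2, maxDelayMs: 30000, randomize: false };
const schedule = [0, 1].map((i) => backoffDelay(i, opts));
console.log(schedule); // [ 1000, 2000 ]
```

Jitter matters when many tasks fail at once, for example during a shared API outage. Without it, every task retries on the same schedule and hits the recovering service in synchronized waves.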
State Management Across Failures
The hardest part of durable execution is handling partial state. If a task makes three API calls and crashes after the second, you need to:
- Avoid re-executing the first two calls
- Preserve their results for the third call
- Handle cases where the second call succeeded but the state write failed
Trigger.dev’s checkpoint system addresses this by:
- Writing state to Postgres after each await completes
- Tagging each checkpoint with a sequence number
- Comparing sequence numbers on retry to skip completed steps
- Using database transactions to ensure state consistency
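The third case in the list above, where the call succeeded but the state write failed, cannot be solved by checkpoints alone. The runtime has no record that the call happened. A common mitigation is to derive an idempotency key from the run ID and step sequence number and pass it to the external service, so the service deduplicates the repeated call. The sketch below uses a hypothetical in-memory `PaymentService` and `stepKey` helper. Real APIs that accept idempotency keys each define their own header or parameter for it.

```typescript
// Hypothetical external service that deduplicates requests by idempotency key.
class PaymentService {
  private readonly seen = new Map<string, string>(); // idempotency key -> charge ID
  charges = 0; // number of real charges made

  charge(idempotencyKey: string, amount: number): string {
    const existing = this.seen.get(idempotencyKey);
    if (existing !== undefined) return existing; // duplicate request: no new charge
    this.charges++;
    const chargeId = `ch_${this.charges}_${amount}`;
    this.seen.set(idempotencyKey, chargeId);
    return chargeId;
  }
}

// Stable across retries because it depends only on the run ID and step number.
function stepKey(runId: string, seq: number): string {
  return `${runId}:step-${seq}`;
}

const service = new PaymentService();
const checkpoints = new Map<string, string>(); // stand-in for the checkpoint table

// Attempt 1: the charge succeeds, but the checkpoint write "fails" (simulated by skipping it).
service.charge(stepKey("run-42", 2), 500);

// Attempt 2: no checkpoint exists for step 2, so the step runs again with the same key.
const key = stepKey("run-42", 2);
const chargeId = checkpoints.get(key) ?? service.charge(key, 500);
checkpoints.set(key, chargeId);

console.log(service.charges); // 1
```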
This creates a failure mode: if Postgres is unavailable, tasks cannot checkpoint and will fail. Temporal’s event log has the same dependency on its persistence layer, but the append-only model is simpler to scale and replicate.
Long-Running Task Failure Modes
Agent workflows often run for minutes or hours. Common failure scenarios:
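Timeout cascades: A 30-minute task calls five APIs, each with a 10-minute timeout. Those per-call timeouts add up to 50 minutes. If the first three APIs hang until they time out, the task hits its 30-minute limit before the fourth call starts. Solution: set per-step timeouts, not just task-level timeouts, and size them so they fit inside the task-level budget.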
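One way to keep per-step timeouts inside the task budget is to clamp each step's timeout to the time remaining before the task deadline. Below is a sketch using `Promise.race`. The `withTimeout` and `stepBudget` helpers are hypothetical, not Trigger.dev APIs.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
// Note: this stops waiting but does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever promise settles first so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Clamp a step's own timeout to whatever remains of the task-level budget.
function stepBudget(stepTimeoutMs: number, taskDeadline: number, now: number = Date.now()): number {
  return Math.max(0, Math.min(stepTimeoutMs, taskDeadline - now));
}
```

Consider a step with a 10-minute timeout that starts 25 minutes into a 30-minute task. `stepBudget(600000, deadline)` gives it 5 minutes rather than a timeout the task can never honor. To actually stop a hung request, pass an `AbortSignal` to the API client as well, if the client supports one.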
Non-deterministic retries: If your task reads the current timestamp or generates random IDs, retries produce different results. Solution: generate IDs and timestamps once, before the first checkpoint, and pass them as state.
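Here is a sketch of creating volatile values once and reusing them on every attempt. It assumes a hypothetical `RunState` record that survives retries, which in practice would be the persisted payload or the first checkpoint. The ID generator and clock are injected so the behavior can be tested.

```typescript
// Values that must not change between attempts of the same run.
interface RunState {
  requestId?: string;
  startedAt?: number;
}

// Fill in volatile values only on the first attempt; later attempts reuse them.
function ensureStableValues(
  state: RunState,
  newId: () => string,
  now: () => number,
): { requestId: string; startedAt: number } {
  const requestId = state.requestId ?? newId(); // newId() only runs if no ID is stored
  const startedAt = state.startedAt ?? now();
  state.requestId = requestId; // persist before any side effect uses these values
  state.startedAt = startedAt;
  return { requestId, startedAt };
}
```

Every downstream step then reads `requestId` and `startedAt` from the state instead of calling the clock or random generator again. As a result, a retry sends the same request ID and computes the same time windows.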
External state drift: A task fetches a user record, processes it, then updates the record. If the user record changes between retries, the update may overwrite newer data. Solution: use optimistic locking or version checks.
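Optimistic locking can be sketched with a version number that every write must match. The `RecordStore` below is a hypothetical in-memory stand-in for a table with a `version` column. The conditional update mirrors the common SQL pattern of updating only `WHERE id = ? AND version = ?`.

```typescript
interface VersionedRecord<T> {
  version: number;
  data: T;
}

// In-memory stand-in for a table with a version column (hypothetical).
class RecordStore<T> {
  private readonly rows = new Map<string, VersionedRecord<T>>();

  get(id: string): VersionedRecord<T> | undefined {
    const row = this.rows.get(id);
    return row && { ...row }; // shallow copy so callers hold a snapshot, not the live row
  }

  insert(id: string, data: T): void {
    this.rows.set(id, { version: 1, data });
  }

  // Like: UPDATE ... SET data = ?, version = version + 1 WHERE id = ? AND version = ?
  update(id: string, expectedVersion: number, data: T): VersionedRecord<T> {
    const current = this.rows.get(id);
    if (!current || current.version !== expectedVersion) {
      throw new Error(`record ${id} changed since version ${expectedVersion}`);
    }
    const next = { version: expectedVersion + 1, data };
    this.rows.set(id, next);
    return next;
  }
}
```

On a version conflict, the task should re-read the record and recompute its update rather than retrying the same stale write.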
Queue backpressure: If tasks arrive faster than workers can process them, the queue grows unbounded. Solution: set concurrency limits and reject new tasks when the queue is full.
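Here is a sketch of the concurrency-limit-plus-rejection policy, using a hypothetical in-process `BoundedQueue`. A real deployment would enforce this at the queue layer (Redis, in the self-hosted setup), but the admission logic is the same:

- Run up to `concurrency` jobs.
- Let up to `maxQueued` more wait.
- Reject the rest so callers can back off.

```typescript
// Bounded work queue: at most `concurrency` jobs run at once and at most
// `maxQueued` wait; anything beyond that is rejected immediately (backpressure).
class BoundedQueue {
  private running = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(private readonly concurrency: number, private readonly maxQueued: number) {}

  async submit<T>(job: () => Promise<T>): Promise<T> {
    if (this.running >= this.concurrency) {
      if (this.waiting.length >= this.maxQueued) {
        throw new Error("queue full"); // caller should retry later
      }
      // Wait until a finishing job hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.running++;
    }
    try {
      return await job();
    } finally {
      const next = this.waiting.shift();
      if (next) next(); // pass the slot directly to the next waiter
      else this.running--;
    }
  }
}
```

Handing the slot directly to the next waiter, instead of decrementing and re-incrementing the counter, keeps a newly submitted job from jumping ahead of one that is already queued.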
Trigger.dev provides concurrency controls and per-task retry policies, but you still need to design for idempotency and state consistency.
Observability and Debugging
Durable execution makes debugging harder because execution is non-linear. A task may pause for hours, resume on a different worker, and retry multiple times.
Trigger.dev’s observability stack:
- Real-time dashboard showing task status, duration, and retry count
- Trace logs for each checkpoint with payload and result
- Span-based tracing to correlate task execution across retries (a span represents a single operation like one API call; tracing links spans across retries to show the full execution path)
- Webhook notifications for task completion or failure
This is less granular than Temporal’s event history, which logs every decision and timer. But for most use cases, checkpoint-level visibility is enough.
Deployment Shape
Trigger.dev V2 runs as a managed platform or self-hosted. The self-hosted option requires:
- Postgres for state persistence
- Redis for queue management
- Node.js or Bun workers
- Optional: S3 for large payloads
The managed platform handles scaling, monitoring, and infrastructure. Workers auto-scale based on queue depth. You deploy task code as Docker images or via CLI.
Temporal requires more operational overhead: a cluster with frontend, history, matching, and worker services, plus Cassandra or Postgres for persistence. The tradeoff is more control over data residency and scaling behavior.
When TypeScript-Native Matters
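Temporal’s TypeScript SDK runs workflow code in a deterministic sandbox on top of a shared Rust core, and it talks to the Go-based Temporal server. You write TypeScript, but the execution model is Temporal’s replay model. This creates friction:
- Workflow code must be deterministic: no direct network or file I/O, and Date.now() and Math.random() are replaced inside the sandbox with replay-safe versions
- I/O must move into activities, which are registered separately and whose inputs and results are serialized into the workflow history
- Debugging requires understanding replay semantics and the sandbox’s restrictions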
Trigger.dev’s TypeScript-native runtime removes these constraints. You write normal async TypeScript. The runtime handles durability without requiring you to separate workflows from activities.
This matters for teams that want to move fast without learning Temporal’s mental model. The tradeoff is less control over replay behavior and audit trails.
Technical Verdict
Use Trigger.dev V2 when:
- Your tasks have 4+ async boundaries and average runtime between 2 and 60 minutes (the checkpoint model works best when execution has clear async breakpoints, not tight loops)
- You need automatic retries for API orchestration workflows where each step is an external call (document processing, data enrichment, multi-step AI agent tasks)
- Your team writes TypeScript and wants to avoid the operational cost of running a Temporal cluster (managed platform handles scaling; self-hosting requires only Postgres and Redis)
- You can tolerate checkpoint-level observability instead of full event replay (you get execution status per await, not per conditional branch)
- Your workflows are primarily linear pipelines with occasional branches, not complex state machines with parallel execution or human-in-the-loop steps
Avoid it when:
- You need a complete, replayable event history for compliance audits or financial transaction workflows (Trigger.dev’s checkpoint model records await boundaries, not every decision)
- Your workflows require complex branching with parallel execution, sagas, or compensation logic (Temporal’s SDKs provide primitives for these patterns, such as child workflows, signals, and cancellation scopes; Trigger.dev requires more manual orchestration)
- You already run Temporal and need consistent orchestration across Go, Python, and TypeScript services (adding a second orchestration system creates operational overhead)
- You require on-premise deployment with strict data residency requirements and cannot use managed infrastructure (self-hosting is possible but less mature than Temporal’s deployment options)
- Your tasks execute tight loops or CPU-bound operations without async boundaries (the checkpoint model depends on await points; synchronous code cannot be resumed mid-execution)
The V1-to-V2 pivot reveals a market truth: developers building agent infrastructure need more than webhook glue. They need durable execution, automatic retries, and resumable state. Trigger.dev’s TypeScript-native approach trades Temporal’s event sourcing rigor for simpler developer experience. That tradeoff works for most agent and automation use cases.
Source Links
- Trigger.dev V2 Announcement (172 points, 39 comments)
- Trigger.dev V1 Show HN (745 points, 190 comments)
- Trigger.dev GitHub
- Trigger.dev Documentation