Trigger.dev launched in February 2023 as a Zapier alternative, collected 745 points on Hacker News, then pivoted eight months later to durable execution primitives. The V2 architecture (172 points, October 2023) positions itself as a Temporal competitor for TypeScript developers. It offers workflow guarantees without requiring Go or event sourcing knowledge.
The shift matters because most workflow engines force you into one of two camps: JavaScript task queues with weak durability (BullMQ, Agenda) or strongly consistent systems that require polyglot runtimes (Temporal, Cadence). Trigger.dev tries to split the difference by offering Temporal-style semantics in a TypeScript-native execution model.
What Changed Between V1 and V2
V1 focused on integration connectors and webhook triggers. You wrote event handlers that responded to external services. V2 rebuilt the core around long-running tasks with explicit retry boundaries, state checkpoints, and scheduling primitives.
Key architectural differences:
- V1: Event-driven handlers, no built-in state persistence, relied on external queues
- V2: Durable execution model, automatic retries, checkpoint-based state recovery
- Execution isolation: Moved from shared Node processes to containerized task runners
- Observability: Added structured tracing with step-level replay and debugging hooks
The pivot came from user feedback. Developers wanted to run multi-hour jobs, handle transient failures gracefully, and debug workflow state without parsing logs. V1’s event model could not guarantee those properties.
State Persistence Without Event Sourcing
Temporal persists every workflow decision as an immutable event log. Replay reconstructs state by re-executing the workflow code against that log. Trigger.dev skips event sourcing and uses explicit checkpoints instead.
How it works:
- Checkpoint API: You call checkpoint(key, data) at explicit boundaries in your task code
- Postgres storage: Checkpoints write to a relational table with task ID, step index, and serialized state
- Retry recovery: On failure, the runtime reloads the last checkpoint and resumes from that step
- No replay: The task does not re-execute completed steps; it resumes from the last successful checkpoint
This trades Temporal’s deterministic replay for simpler mental models. You control when state gets persisted. The downside: if you forget to checkpoint before an expensive operation, a crash forces you to redo that work.
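The resume rule can be made concrete with a minimal in-memory sketch. It is not Trigger.dev's runtime. A Map stands in for the Postgres checkpoint table. The hypothetical step() helper memoizes each unit of work, so a retried attempt re-enters the task but gets back the saved result for any step that already checkpointed. This memoized-step shape is a swapped-in simplification of the explicit checkpoint(key, data) calls described above.

```typescript
// Hypothetical model of checkpoint-based resume; NOT the Trigger.dev runtime.
// The store survives across retries, so a retried attempt re-enters the task
// but skips any step whose checkpoint already exists.

type CheckpointStore = Map<string, unknown>;

// Run `work` unless a checkpoint for `key` exists; persist the result on success.
function step<T>(store: CheckpointStore, key: string, work: () => T): T {
  if (store.has(key)) {
    return store.get(key) as T; // resume path: reuse the saved result
  }
  const result = work();
  store.set(key, result); // checkpoint only after the step succeeds
  return result;
}

// Retry the whole task, sharing one store across attempts (stands in for Postgres).
function runWithRetries<T>(
  taskFn: (store: CheckpointStore) => T,
  maxAttempts: number,
): { result: T; attempts: number } {
  const store: CheckpointStore = new Map();
  for (let attempt = 1; ; attempt++) {
    try {
      return { result: taskFn(store), attempts: attempt };
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
    }
  }
}

// The payment step fails once; the fetch step still runs only once.
const calls = { fetch: 0, charge: 0 };
let failNextCharge = true;

const outcome = runWithRetries((store) => {
  const order = step(store, "order-fetched", () => {
    calls.fetch++;
    return { id: "ord_1", total: 4200 };
  });
  const payment = step(store, "payment-complete", () => {
    calls.charge++;
    if (failNextCharge) {
      failNextCharge = false;
      throw new Error("transient payment error");
    }
    return { id: "pay_1", amount: order.total };
  });
  return payment.id;
}, 3);

// outcome.attempts is 2; calls.fetch is 1 and calls.charge is 2
console.log(outcome, calls);
```

The sketch also shows the downside mentioned above. Work done between two checkpoints is lost on a crash. The failed charge attempt runs again in full because it never reached its checkpoint.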
Checkpoint Example
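```typescript
// Assumes `task` and `checkpoint` are the SDK functions described in this article,
// and that `db` (your data layer) and `stripe` (a Stripe client) are already in scope.
export const processOrder = task({
  id: "process-order",
  run: async ({ orderId }: { orderId: string }) => {
    // Fetch order data; fail fast if it does not exist
    const order = await db.order.findUnique({ where: { id: orderId } });
    if (!order) throw new Error(`Order ${orderId} not found`);

    // Checkpoint after fetch
    await checkpoint("order-fetched", { order });

    // Call payment API (might fail). The idempotency key makes Stripe return the
    // original charge if a retry repeats this call after it already succeeded.
    const payment = await stripe.charges.create(
      {
        amount: order.total, // in the smallest currency unit (cents)
        currency: "usd",
        source: order.paymentToken,
      },
      { idempotencyKey: `charge-${orderId}` },
    );

    // Checkpoint after payment
    await checkpoint("payment-complete", { payment });

    // Update inventory. A bare decrement is NOT idempotent: if this step is
    // retried after it succeeded, stock drops twice, so dedupe it per order.
    await db.inventory.decrement({ productId: order.productId });

    return { orderId, paymentId: payment.id };
  },
});
```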
If the Stripe call fails, the retry starts from the order-fetched checkpoint. The database fetch does not run again. If inventory decrement fails, you restart from payment-complete and skip the charge.
Concurrency and Scheduling Primitives
Trigger.dev exposes three execution modes:
| Mode | Trigger | Concurrency Control | Use Case |
|---|---|---|---|
| Scheduled | Cron expression | Global limit per task | Nightly ETL, report generation |
| Event-driven | Webhook, SDK call | Queue-based with max workers | User signup flows, API webhooks |
| Realtime | Frontend SDK | Per-connection stream | Live progress updates, chat agents |
Queue Backpressure
Event-driven tasks use a priority queue backed by Postgres. You configure maxConcurrency per task definition. When every slot is busy (the task has reached maxConcurrency or the worker pool is full), new tasks wait in the PENDING state.
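Here is a minimal sketch of that admission rule. It assumes a single in-memory queue per task definition rather than a Postgres table. Runs beyond maxConcurrency stay PENDING. A numeric priority stands in for the priority override listed under backpressure handling. The class and its method names are hypothetical.

```typescript
// Hypothetical model of a per-task concurrency limit; NOT Trigger.dev's queue.
// Runs beyond maxConcurrency stay PENDING until a running run completes.

type RunState = "PENDING" | "RUNNING" | "COMPLETED";

interface QueuedRun {
  id: string;
  priority: number; // higher values are dispatched first
  state: RunState;
}

class TaskQueue {
  private readonly maxConcurrency: number;
  private readonly runs: QueuedRun[] = [];

  constructor(maxConcurrency: number) {
    this.maxConcurrency = maxConcurrency;
  }

  enqueue(id: string, priority = 0): void {
    this.runs.push({ id, priority, state: "PENDING" });
    this.dispatch();
  }

  complete(id: string): void {
    const run = this.runs.find((r) => r.id === id && r.state === "RUNNING");
    if (!run) throw new Error(`run ${id} is not running`);
    run.state = "COMPLETED";
    this.dispatch(); // a slot opened up; promote the next pending run
  }

  stateOf(id: string): RunState | undefined {
    return this.runs.find((r) => r.id === id)?.state;
  }

  // Promote pending runs, highest priority first (FIFO within a priority,
  // since Array.prototype.sort is stable), until maxConcurrency are running.
  private dispatch(): void {
    let running = this.runs.filter((r) => r.state === "RUNNING").length;
    const pending = this.runs
      .filter((r) => r.state === "PENDING")
      .sort((a, b) => b.priority - a.priority);
    for (const run of pending) {
      if (running >= this.maxConcurrency) break;
      run.state = "RUNNING";
      running++;
    }
  }
}

const queue = new TaskQueue(2);
queue.enqueue("a");
queue.enqueue("b");
queue.enqueue("c");          // limit reached: stays PENDING
queue.enqueue("urgent", 10); // also PENDING, but ahead of "c"
queue.complete("a");         // frees a slot; "urgent" is promoted, "c" keeps waiting
console.log(queue.stateOf("urgent"), queue.stateOf("c")); // RUNNING PENDING
```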
Backpressure handling:
- Retry backoff: Exponential backoff between attempts for tasks that fail repeatedly
- Dead letter queue: After N retries, tasks move to a manual review queue
- Priority override: You can bump specific task instances to the front
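The backoff and dead-letter bullets above can be expressed as a single pure function. The policy fields, the doubling factor, and the delay cap are assumptions for illustration, since the article does not give Trigger.dev's values. Jitter is omitted.

```typescript
// Hypothetical retry policy: exponential backoff with a cap, then dead-letter.
// Field names and numbers are illustrative, not Trigger.dev defaults.
interface RetryPolicy {
  maxAttempts: number; // after this many failed attempts the run is dead-lettered
  baseDelayMs: number;
  maxDelayMs: number;
}

type RetryDecision =
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter" };

function nextAction(failedAttempt: number, policy: RetryPolicy): RetryDecision {
  if (failedAttempt >= policy.maxAttempts) {
    return { action: "dead-letter" }; // move to the manual review queue
  }
  // Delay doubles per failure: base, 2*base, 4*base, ... capped at maxDelayMs.
  const delayMs = Math.min(policy.baseDelayMs * 2 ** (failedAttempt - 1), policy.maxDelayMs);
  return { action: "retry", delayMs };
}

const policy: RetryPolicy = { maxAttempts: 5, baseDelayMs: 1000, maxDelayMs: 5000 };
for (let attempt = 1; attempt <= 5; attempt++) {
  console.log(attempt, nextAction(attempt, policy));
}
// Delays are 1000, 2000, 4000, 5000 (capped); the fifth failure is dead-lettered.
```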
The system avoids distributed locking to eliminate coordination overhead. The coordinator polls the queue table with SELECT FOR UPDATE SKIP LOCKED to claim tasks. Row locks stop two pollers from claiming the same row at the same moment. The design still accepts occasional duplicate execution in exchange for simpler infrastructure. One way this happens is when a worker stalls and its task is reclaimed. Workers run in separate containers and report heartbeats every 30 seconds.
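SKIP LOCKED only protects the claim itself, so how can a task still run twice? A common answer is leases: a claim stays valid only while heartbeats keep arriving. This is a hypothetical in-memory model of that idea, not Trigger.dev's claim logic. The 90-second lease (three missed 30-second heartbeats) is chosen purely for illustration.

```typescript
// Hypothetical lease model; NOT Trigger.dev's implementation.
// A claim is valid until leaseExpiresAt; heartbeats extend it. If a worker stalls
// past its lease, another worker may reclaim the task and run it a second time.

interface QueuedTask {
  id: string;
  owner: string | null;
  leaseExpiresAt: number; // epoch ms; 0 when unclaimed
}

const LEASE_MS = 90_000; // illustrative: three missed 30 s heartbeats

function claim(tasks: QueuedTask[], worker: string, now: number): QueuedTask | undefined {
  // Analogous to SELECT ... FOR UPDATE SKIP LOCKED: take the first task that
  // nobody currently holds a live lease on.
  const task = tasks.find((t) => t.owner === null || t.leaseExpiresAt <= now);
  if (task) {
    task.owner = worker;
    task.leaseExpiresAt = now + LEASE_MS;
  }
  return task;
}

function heartbeat(task: QueuedTask, worker: string, now: number): boolean {
  if (task.owner !== worker) return false; // lease already lost to another worker
  task.leaseExpiresAt = now + LEASE_MS;
  return true;
}

const tasks: QueuedTask[] = [{ id: "t1", owner: null, leaseExpiresAt: 0 }];
const first = claim(tasks, "worker-a", 0);           // worker-a claims t1
const blocked = claim(tasks, "worker-b", 30_000);    // undefined: lease still live
const reclaimed = claim(tasks, "worker-b", 100_000); // worker-a went silent: t1 reclaimed
const stillOwner = heartbeat(tasks[0], "worker-a", 100_001); // false
console.log(first?.id, blocked, reclaimed?.owner, stillOwner);
```

If worker-a was merely slow rather than dead, both workers now execute t1. That is the duplicate the design accepts, and it is why side effects need idempotency.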
Scheduled Task Guarantees
Cron schedules persist in the database with next-run timestamps. A background process scans for due tasks every 10 seconds and enqueues them. If the scheduler crashes, the next instance picks up missed runs on startup.
Missed execution behavior:
- Default: Skip missed runs, schedule the next occurrence
- Catchup mode: Enqueue all missed runs sequentially
- Idempotency key: Prevent duplicate execution if scheduler restarts mid-enqueue
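A minimal sketch of the two missed-run modes and the idempotency key follows. It makes two assumptions the article does not specify. First, a fixed interval stands in for a cron expression, so no parser is needed. Second, a hypothetical graceMs window decides how late an occurrence can be before it counts as "missed". The key is derived from the schedule ID and the occurrence time, so a re-scan after a restart produces the same keys.

```typescript
// Hypothetical model of missed-run handling; interval and grace window are assumptions.

type MissedRunMode = "skip" | "catchup";

interface ScheduledRun {
  scheduledAt: number;    // epoch ms of the occurrence
  idempotencyKey: string; // stable per (schedule, occurrence)
}

function dueRuns(
  scheduleId: string,
  nextRunAt: number, // persisted next-run timestamp
  intervalMs: number,
  graceMs: number,
  now: number,
  mode: MissedRunMode,
): { runs: ScheduledRun[]; nextRunAt: number } {
  if (intervalMs <= 0) throw new Error("intervalMs must be positive");
  const runs: ScheduledRun[] = [];
  let t = nextRunAt;
  for (; t <= now; t += intervalMs) {
    const missed = now - t > graceMs;
    // "skip" drops occurrences older than the grace window; "catchup" keeps them all.
    if (mode === "catchup" || !missed) {
      runs.push({ scheduledAt: t, idempotencyKey: `${scheduleId}:${t}` });
    }
  }
  return { runs, nextRunAt: t }; // first occurrence still in the future
}

// Enqueue each run at most once, even if a restarted scheduler re-scans.
function enqueueOnce(enqueued: Set<string>, runs: ScheduledRun[]): number {
  let added = 0;
  for (const run of runs) {
    if (!enqueued.has(run.idempotencyKey)) {
      enqueued.add(run.idempotencyKey);
      added++;
    }
  }
  return added;
}

// Scheduler was down from t=0 until t=185 s; the schedule fires every 60 s.
const skipped = dueRuns("report-job", 0, 60_000, 15_000, 185_000, "skip");
const caughtUp = dueRuns("report-job", 0, 60_000, 15_000, 185_000, "catchup");
console.log(skipped.runs.length, caughtUp.runs.length, caughtUp.nextRunAt); // 1 4 240000

const enqueued = new Set<string>();
console.log(enqueueOnce(enqueued, caughtUp.runs)); // 4
console.log(enqueueOnce(enqueued, caughtUp.runs)); // 0 (re-scan is deduplicated)
```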
No distributed cron coordination. If you run multiple scheduler instances, you need external leader election (not included).
Execution Isolation Model
V2 runs each task in a dedicated Docker container. The runtime spins up a container from a pre-built image that includes your task code, executes the function, then tears down the container.
Isolation boundaries:
- Filesystem: Ephemeral, wiped after task completion
- Network: Outbound allowed, inbound blocked except for health checks
- Memory: Configurable limit (default 512MB), OOM kills trigger retries
- CPU: Shared, no hard limits unless you configure cgroups
The container runtime uses a sidecar proxy to intercept checkpoint calls and forward them to the coordinator API. Your task code never talks directly to Postgres.
Cold Start Mitigation
Container startup adds 2-5 seconds of latency. Trigger.dev keeps a warm pool of containers for frequently executed tasks. When a task completes, the container stays alive for 60 seconds. If another instance of the same task arrives, it reuses the warm container.
Warm pool sizing:
- Per-task limit: Max 5 warm containers per task definition
- Global limit: Max 50 warm containers across all tasks
- Eviction policy: LRU, with priority boost for tasks with high execution frequency
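The pool rules above fit in a small class. This is a hypothetical model using the quoted limits: a 60-second idle lifetime, 5 warm containers per task, and 50 overall, with least-recently-used eviction. The frequency-based priority boost is left out to keep it short. Container IDs and method names are illustrative.

```typescript
// Hypothetical warm-pool model; NOT Trigger.dev's implementation.
// Idle containers live for idleTtlMs; caps are enforced by evicting LRU entries.
// The "priority boost for high execution frequency" is intentionally omitted.

interface WarmContainer {
  id: string;
  taskId: string;
  idleSince: number; // epoch ms when the container finished its last run
}

class WarmPool {
  private idle: WarmContainer[] = []; // oldest idle first, so index 0 is the LRU entry
  private readonly perTaskLimit: number;
  private readonly globalLimit: number;
  private readonly idleTtlMs: number;

  constructor(perTaskLimit = 5, globalLimit = 50, idleTtlMs = 60_000) {
    this.perTaskLimit = perTaskLimit;
    this.globalLimit = globalLimit;
    this.idleTtlMs = idleTtlMs;
  }

  // A run finished: park its container instead of tearing it down.
  release(id: string, taskId: string, now: number): void {
    this.idle.push({ id, taskId, idleSince: now });
    this.evict(now);
  }

  // A run arrived: reuse the most recently idled container for the same task,
  // or return undefined to signal a cold start.
  acquire(taskId: string, now: number): string | undefined {
    this.evict(now);
    for (let i = this.idle.length - 1; i >= 0; i--) {
      if (this.idle[i].taskId === taskId) {
        return this.idle.splice(i, 1)[0].id;
      }
    }
    return undefined;
  }

  get size(): number {
    return this.idle.length;
  }

  private evict(now: number): void {
    // Drop containers idle for longer than the TTL.
    const fresh = this.idle.filter((c) => now - c.idleSince < this.idleTtlMs);
    // Keep at most perTaskLimit per task, preferring the most recently idled.
    const perTask = new Map<string, number>();
    const kept: WarmContainer[] = [];
    for (let i = fresh.length - 1; i >= 0; i--) {
      const count = perTask.get(fresh[i].taskId) ?? 0;
      if (count < this.perTaskLimit) {
        perTask.set(fresh[i].taskId, count + 1);
        kept.unshift(fresh[i]);
      }
    }
    // Enforce the global cap by dropping the least recently used entries.
    this.idle = kept.slice(Math.max(0, kept.length - this.globalLimit));
  }
}

const pool = new WarmPool();
for (let i = 0; i < 6; i++) pool.release(`etl-${i}`, "etl", i); // sixth release evicts etl-0
console.log(pool.size, pool.acquire("etl", 10)); // 5 etl-5
console.log(pool.acquire("etl", 70_000));        // undefined: the rest exceeded the 60 s TTL
```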
You can disable warm pools for tasks that need strict isolation or have large memory footprints.
Observability and Debugging
Every task execution generates a trace with step-level spans. The UI shows:
- Timeline view: Visual breakdown of checkpoint boundaries and retry attempts
- State inspector: JSON view of checkpoint payloads at each step
- Log aggregation: Structured logs with correlation IDs across retries
- Replay mode: Re-run a failed task from any checkpoint with modified input
Trace IDs propagate through HTTP headers if your task calls external APIs. You can link Trigger.dev traces to OpenTelemetry spans in your own services.
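Linking to OpenTelemetry usually means sending a W3C Trace Context `traceparent` header, because that is the format OpenTelemetry's default propagator reads. Whether Trigger.dev emits exactly this header is an assumption here. The helpers below are a hypothetical sketch of building and parsing it.

```typescript
// Hypothetical helpers for W3C Trace Context propagation (format: version-traceid-parentid-flags).
// That Trigger.dev uses this exact header is an assumption for illustration.

function randomHex(bytes: number): string {
  // Math.random keeps the sketch dependency-free; prefer crypto.getRandomValues in practice.
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += ("0" + Math.floor(Math.random() * 256).toString(16)).slice(-2);
  }
  return out;
}

// traceparent = "00" "-" trace-id (16 bytes hex) "-" parent-id (8 bytes hex) "-" flags
function makeTraceparent(traceId: string, sampled = true): string {
  if (!/^[0-9a-f]{32}$/.test(traceId) || /^0+$/.test(traceId)) {
    throw new Error("traceId must be 32 lowercase hex digits and not all zeros");
  }
  let spanId = randomHex(8);
  while (/^0+$/.test(spanId)) spanId = randomHex(8); // an all-zero parent-id is invalid
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

// Recover the trace ID on the receiving side so its spans join the same trace.
function traceIdFrom(header: string): string | undefined {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  return match ? match[1] : undefined;
}

const traceId = randomHex(16);
const header = makeTraceparent(traceId);
console.log(header, traceIdFrom(header) === traceId);

// In a task, the header would accompany an outbound call, for example:
// await fetch("https://api.example.com/orders", { headers: { traceparent: header } });
```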
Failure Mode Visibility
The dashboard flags common failure patterns:
- Timeout loops: Tasks that hit max duration repeatedly
- Retry storms: High failure rate across multiple task instances
- Checkpoint gaps: Long execution spans without checkpoints (risk of wasted retries)
- Memory leaks: Containers that grow memory usage across warm starts
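The checkpoint-gap check takes only a few lines. This hypothetical version assumes each run's events carry a millisecond offset and a kind. It flags any stretch longer than a chosen threshold that passes without a checkpoint. The 5-minute threshold in the example is arbitrary, and the event shape is not Trigger.dev's trace format.

```typescript
// Hypothetical checkpoint-gap detector; event shape and threshold are assumptions.

interface RunEvent {
  at: number; // ms since run start
  kind: "start" | "checkpoint" | "end";
}

interface CheckpointGap {
  from: number;
  to: number;
  durationMs: number;
}

// Flag every interval between consecutive events that exceeds maxGapMs.
function findCheckpointGaps(events: RunEvent[], maxGapMs: number): CheckpointGap[] {
  const sorted = [...events].sort((a, b) => a.at - b.at);
  const gaps: CheckpointGap[] = [];
  for (let i = 1; i < sorted.length; i++) {
    const from = sorted[i - 1].at;
    const to = sorted[i].at;
    if (to - from > maxGapMs) gaps.push({ from, to, durationMs: to - from });
  }
  return gaps;
}

const runEvents: RunEvent[] = [
  { at: 0, kind: "start" },
  { at: 5_000, kind: "checkpoint" },
  { at: 400_000, kind: "checkpoint" },
  { at: 410_000, kind: "end" },
];
const gaps = findCheckpointGaps(runEvents, 300_000);
console.log(gaps); // one gap: from 5000 to 400000, 395000 ms without a checkpoint
```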
No automatic remediation. The system surfaces the pattern and lets you decide whether to adjust retry limits, add checkpoints, or refactor task logic.
Comparison: Trigger.dev vs. Temporal
| Dimension | Trigger.dev | Temporal |
|---|---|---|
| Language | TypeScript only | Polyglot (Go, Java, Python, TypeScript) |
| State model | Explicit checkpoints | Event sourcing with replay |
| Execution | Containerized tasks | Worker processes with sticky queues |
| Scheduling | Cron + event triggers | Timers, signals, child workflows |
| Observability | Built-in UI with replay | Web UI shows event history; distributed tracing needs extra setup |
| Self-hosting | Docker Compose, Kubernetes | Kubernetes with Cassandra/Postgres |
| Learning curve | Low (familiar async/await) | High (workflow determinism rules) |
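Temporal's deterministic replay gives effectively exactly-once execution of workflow code. Activities, the code that performs side effects, still run at-least-once. Trigger.dev guarantees at-least-once execution, with idempotency keys for deduplication. If your task has side effects (API calls, database writes), you must handle deduplication yourself.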
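Handling deduplication yourself usually means keying each side effect and recording that it happened. In this minimal sketch a Map stands in for a table with a unique constraint on the key. The class name and key format are hypothetical. In a real database the check and the write must be atomic, for example an insert that ignores conflicts on the key. Otherwise a crash between the effect and the record still produces a duplicate.

```typescript
// Hypothetical idempotent-effect wrapper for at-least-once execution.
// The Map stands in for a table with a unique key; in production the effect and
// the record of it should commit atomically, or a crash between them still duplicates.

class IdempotentEffects {
  private readonly done = new Map<string, unknown>();

  run<T>(key: string, effect: () => T): T {
    if (this.done.has(key)) return this.done.get(key) as T; // repeat: return the first result
    const result = effect();
    this.done.set(key, result);
    return result;
  }
}

let stock = 10;
const effects = new IdempotentEffects();
const decrementFor = (orderId: string): number =>
  effects.run(`inventory-decrement:${orderId}`, () => --stock);

decrementFor("ord_1");
decrementFor("ord_1"); // a retried run repeats the call: no second decrement
console.log(stock); // 9
```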
Deployment Shape
Trigger.dev runs as three services:
- Coordinator: Handles task scheduling, queue management, checkpoint storage
- Worker pool: Spins up containers and executes task code
- API gateway: Exposes SDK endpoints for triggering tasks and querying state
Managed cloud deployment uses AWS ECS for workers and RDS Postgres for state. Self-hosted setup provides Docker Compose and Helm charts.
Resource requirements (self-hosted):
- Coordinator: 1 CPU, 2GB RAM, scales horizontally
- Worker pool: 2 CPU, 4GB RAM per worker node, autoscales based on queue depth
- Postgres: 2 CPU, 8GB RAM, replication recommended for production
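"Autoscales based on queue depth" can be read as a simple sizing rule. With 4 GB per worker node and the 512 MB default task memory limit, at most 8 runs fit on a node by memory. This sketch assumes 6 per node to leave headroom. That figure and the min/max bounds are assumptions, not Trigger.dev settings.

```typescript
// Hypothetical autoscaling rule; capacity and bounds are assumptions, not Trigger.dev defaults.

interface ScalingConfig {
  runsPerNode: number; // concurrent runs one worker node can host
  minNodes: number;
  maxNodes: number;
}

// Size the pool so queued plus running work fits, clamped to [minNodes, maxNodes].
function desiredWorkerNodes(queued: number, running: number, cfg: ScalingConfig): number {
  const needed = Math.ceil((queued + running) / cfg.runsPerNode);
  return Math.min(cfg.maxNodes, Math.max(cfg.minNodes, needed));
}

const scaling: ScalingConfig = { runsPerNode: 6, minNodes: 1, maxNodes: 20 };
console.log(desiredWorkerNodes(40, 12, scaling)); // 9 (52 runs / 6 per node, rounded up)
console.log(desiredWorkerNodes(0, 0, scaling));   // 1 (floor of minNodes)
console.log(desiredWorkerNodes(500, 0, scaling)); // 20 (capped at maxNodes)
```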
No external dependencies beyond Postgres and Docker. No Kafka, no Redis, no Elasticsearch.
Technical Verdict
Use Trigger.dev if:
- Your entire stack is TypeScript and you want to avoid polyglot workflow engines
- Tasks run for minutes to hours, not milliseconds
- You need built-in observability and replay without configuring distributed tracing infrastructure
- At-least-once execution is acceptable and you can add idempotency keys to side effects
- You want simpler mental models than event sourcing and deterministic replay
- Cold start latency of 2-5 seconds is tolerable for your use case
Avoid Trigger.dev if:
- You need exactly-once guarantees for financial transactions, inventory updates, or other critical state changes
- Your workflows require complex branching, parallel execution, or saga compensation patterns
- You already run Temporal and have invested in event sourcing patterns across your organization
- You need sub-second task latency (container cold starts add 2-5 seconds, and warm pools only help for recently executed tasks)
- Your team works in multiple languages and needs polyglot workflow support
- You require advanced workflow primitives like signals, queries, or child workflow orchestration
Trigger.dev fits AI agent orchestration, ETL pipelines, and async API workflows where tasks have clear boundaries and idempotency is manageable. It struggles with high-throughput event processing or workflows that need strong consistency across distributed transactions without manual coordination.
The V2 pivot from Zapier-style integrations to workflow primitives appears to have found an audience. The 172-point Show HN and open-source traction suggest developers want TypeScript-native orchestration that does not force them into Go or Java ecosystems. The checkpoint-based state model trades Temporal's replay guarantees for operational simplicity and a lower learning curve.