Trigger.dev launched in February 2023 as a developer-first Zapier alternative (745 HN points). By October 2023, the team shipped V2 as a Temporal alternative for TypeScript (172 HN points). That pivot highlights the infrastructure gap between simple event hooks and the durable execution primitives that multi-step agent workflows actually need.
The shift happened because early users kept asking for the same thing: retries, state persistence, and resumability without managing queue infrastructure or writing idempotent retry logic by hand. V2 reflects what TypeScript developers want from workflow orchestration: no Go-first Temporal patterns to learn and no cluster to stand up.
The V1 to V2 Pivot Story
V1 was event-driven automation. You defined triggers (webhooks, schedules, events) and actions that ran in response. Think Zapier but with code instead of UI dropdowns. The problem surfaced when users tried to build agent workflows that needed to:
- Run for hours or days without hitting serverless timeouts
- Retry individual steps without re-executing the entire workflow
- Persist intermediate state so a crash doesn’t lose progress
- Resume from the last successful checkpoint
Event hooks don’t give you those primitives. You end up building your own state machine, writing checkpointing logic, and managing retry queues. V2 repositioned the entire platform around durable execution instead of event triggers.
Durable Execution Model
Trigger.dev V2 treats every task as a durable workflow. The runtime automatically handles:
State persistence: Task state is snapshotted to durable storage after each step. If the worker crashes, the task resumes from the last checkpoint instead of restarting.
Automatic retries: Failed steps retry with exponential backoff. You configure max attempts and backoff strategy. The runtime tracks which steps succeeded so retries skip completed work.
Long-running tasks: No serverless timeout constraints. Tasks can run for hours or days. The platform manages worker lifecycle and task handoff.
Concurrency control: You define how many instances of a task can run concurrently. The runtime enforces limits and queues excess tasks.
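To make the retry settings concrete, here is a minimal sketch of how an exponential backoff schedule could be derived from a maximum attempt count, a multiplier, and a minimum delay. The helper `backoffDelays` and its option names are illustrative assumptions, not Trigger.dev's internal implementation; real runtimes typically also add jitter and a maximum delay.

```typescript
// Hypothetical helper for illustration; not part of the Trigger.dev SDK.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  factor: number; // multiplier applied to the delay after each failure
  minDelayMs: number; // delay before the first retry
}

// Returns the wait (in ms) before each retry. With maxAttempts = N there
// are at most N - 1 retries, so the array has N - 1 entries.
function backoffDelays({ maxAttempts, factor, minDelayMs }: RetryOptions): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    delays.push(minDelayMs * Math.pow(factor, retry));
  }
  return delays;
}

console.log(backoffDelays({ maxAttempts: 3, factor: 2, minDelayMs: 1000 }));
// [ 1000, 2000 ]
```

With three attempts there are only two waits: one second before the second attempt and two seconds before the third.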
Here’s what a durable task looks like:
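```typescript
// Import path assumes the task-based SDK entry point; adjust it to your SDK version.
import { task } from "@trigger.dev/sdk/v3";

// search, browse, and analyze are placeholders for your own async helpers.
export const researchAgent = task({
  id: "research-agent",
  retry: {
    maxAttempts: 3, // give up after three attempts
    factor: 2, // double the delay after each failure
    minTimeoutInMs: 1000, // wait at least one second before the first retry
  },
  run: async ({ topic }: { topic: string }) => {
    // Step 1: Search (persisted checkpoint)
    const searchResults = await search(topic);
    // Step 2: Browse (persisted checkpoint)
    // If worker crashes here, task resumes at this step on restart
    const webData = await browse(searchResults[0].url);
    // Step 3: Analyze (persisted checkpoint)
    const analysis = await analyze(webData);
    return { summary: analysis };
  },
});
```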
If the worker crashes after browse() completes, the task resumes at analyze() instead of re-running search() and browse().
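The resume behavior can be illustrated with a small, self-contained simulation. This is a sketch of the general checkpointing idea, not Trigger.dev's actual mechanism: `Checkpoints`, `runStep`, and `runResearch` are hypothetical names, the steps are synchronous stubs, and an in-memory map stands in for durable storage.

```typescript
// In-memory stand-in for durable checkpoint storage (assumption for the demo).
type Checkpoints = Map<string, unknown>;

const executed: string[] = []; // records which steps actually ran

// Runs a step only if no checkpoint exists for it, then saves the result.
function runStep<T>(store: Checkpoints, name: string, fn: () => T): T {
  if (store.has(name)) {
    return store.get(name) as T; // completed earlier: reuse the saved result
  }
  const result = fn();
  executed.push(name);
  store.set(name, result);
  return result;
}

function runResearch(store: Checkpoints, crashBeforeAnalyze: boolean): string {
  const results = runStep(store, "search", () => ["https://example.com"]);
  const page = runStep(store, "browse", () => `contents of ${results[0]}`);
  if (crashBeforeAnalyze) throw new Error("simulated worker crash");
  return runStep(store, "analyze", () => `summary of ${page}`);
}

const store: Checkpoints = new Map();
try {
  runResearch(store, true); // first worker dies after browse completes
} catch {
  // a second worker picks the task up with the same checkpoint store
}
const summary = runResearch(store, false);

console.log(executed); // [ 'search', 'browse', 'analyze' ]
console.log(summary); // summary of contents of https://example.com
```

Each step ran exactly once even though the workflow function was invoked twice, which is the property the paragraph above describes.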
State Persistence Architecture
Trigger.dev uses a hybrid persistence model:
- Task metadata: Stored in Postgres (task ID, status, retry count, schedule)
- Execution state: Serialized to object storage (S3, GCS) after each step
- Worker coordination: Redis for task queues and worker heartbeats
When a task executes, the runtime:
- Pulls task metadata from Postgres
- Loads the latest state snapshot from object storage
- Executes the next step
- Writes the updated state snapshot back to object storage
- Updates task metadata in Postgres
This separation means state snapshots can grow large (MB-scale for agent memory) without bloating the database. The runtime only loads state when a worker picks up the task.
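The five-stage cycle above can be sketched as follows. This is a toy model under stated assumptions: two `Map` instances stand in for Postgres and object storage, and `TaskMeta`, `Step`, and `executeNextStep` are hypothetical names rather than Trigger.dev internals.

```typescript
// Hypothetical metadata row; the real schema is not described in this article.
interface TaskMeta {
  status: "queued" | "running" | "completed";
  nextStep: number;
  snapshotKey: string; // pointer to the snapshot, not the snapshot itself
}

type State = Record<string, unknown>;
type Step = (state: State) => State;

const metadataDb = new Map<string, TaskMeta>(); // stand-in for Postgres
const objectStore = new Map<string, string>(); // stand-in for S3/GCS

// Executes exactly one step, following the cycle described above.
function executeNextStep(taskId: string, steps: Step[]): TaskMeta {
  const meta = metadataDb.get(taskId); // 1. pull task metadata
  if (!meta) throw new Error(`unknown task ${taskId}`);
  if (meta.nextStep >= steps.length) return meta; // nothing left to run

  const raw = objectStore.get(meta.snapshotKey) ?? "{}"; // 2. load latest snapshot
  const state = JSON.parse(raw) as State;
  const newState = steps[meta.nextStep](state); // 3. execute the next step
  objectStore.set(meta.snapshotKey, JSON.stringify(newState)); // 4. write snapshot

  const nextStep = meta.nextStep + 1;
  const updated: TaskMeta = {
    ...meta,
    nextStep,
    status: nextStep === steps.length ? "completed" : "running",
  };
  metadataDb.set(taskId, updated); // 5. update task metadata
  return updated;
}

const steps: Step[] = [
  (s) => ({ ...s, urls: ["https://example.com"] }),
  (s) => ({ ...s, notes: "large scraped text would live here" }),
];
metadataDb.set("task-1", { status: "queued", nextStep: 0, snapshotKey: "snapshots/task-1.json" });

executeNextStep("task-1", steps);
const finalMeta = executeNextStep("task-1", steps);
console.log(finalMeta.status); // completed
```

The metadata row only ever holds a status, a step index, and a key, so it stays small no matter how large the snapshot behind that key becomes.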
Deployment Boundaries
Trigger.dev offers three deployment models with distinct execution and security boundaries:
| Model | Code Location | State Storage | Worker Management |
|---|---|---|---|
| Managed Cloud | Trigger.dev infrastructure | Managed (Postgres + S3) | Fully managed, auto-scaling |
| Self-Hosted | Your Kubernetes cluster | Your Postgres + object storage | You manage workers, scaling |
| Hybrid | Your infrastructure | Your storage backend | You run workers; Trigger.dev manages the control plane |
Managed cloud: Your task code runs in isolated containers on Trigger.dev infrastructure. Each task execution gets its own container with network isolation. Secrets are injected at runtime via environment variables and never persist in state snapshots. You deploy by pushing code to their registry. Workers auto-scale based on queue depth.
Self-hosted: You run workers in your own Kubernetes cluster and point them at your Postgres and object storage. Code executes entirely within your network perimeter. State snapshots never leave your configured storage backend. You control scaling, networking, and secrets management through your existing infrastructure.
Hybrid: Less common but useful for compliance scenarios. Your code runs in your infrastructure with your secrets and network policies, but Trigger.dev manages the control plane (scheduling, retries, observability). State snapshots remain in your storage backend.
The key security boundary: state snapshots contain serialized execution context but exclude secrets. Secrets are re-injected from your secret store on each worker restart.
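One way to picture that boundary is a context object whose serializer simply never touches the secrets. The sketch below is an assumption-laden illustration: `ExecutionContext`, `toSnapshot`, and `fromSnapshot` are hypothetical names, and a plain object stands in for the secret store.

```typescript
// Hypothetical context shape, for illustration only.
interface ExecutionContext {
  state: Record<string, unknown>; // serializable progress
  secrets: Record<string, string>; // never written to a snapshot
}

// Serializes only the state; secrets are deliberately left out.
function toSnapshot(ctx: ExecutionContext): string {
  return JSON.stringify({ state: ctx.state });
}

// Rebuilds a context from a snapshot plus a fresh read of the secret store.
function fromSnapshot(snapshot: string, secretStore: Record<string, string>): ExecutionContext {
  const parsed = JSON.parse(snapshot) as { state: Record<string, unknown> };
  return { state: parsed.state, secrets: { ...secretStore } };
}

const ctx: ExecutionContext = {
  state: { step: 2, topic: "durable execution" },
  secrets: { API_KEY: "sk-demo-123" },
};
const snapshot = toSnapshot(ctx);
const restored = fromSnapshot(snapshot, { API_KEY: "sk-demo-123" });

console.log(snapshot.includes("sk-demo-123")); // false
console.log(restored.secrets.API_KEY); // sk-demo-123
```

The guarantee only holds if task code never copies a secret into its own state; anything a task stores in state would be serialized like any other value.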
Long-Running Task Handling
Traditional serverless platforms time out after 15 minutes (AWS Lambda) or, at most, 60 minutes (Google Cloud Functions, and only for 2nd-gen HTTP functions). Trigger.dev removes that constraint by treating tasks as resumable workflows instead of single-shot functions.
The runtime uses a heartbeat mechanism:
- Workers send heartbeats every 30 seconds while executing a task
- If a worker misses 3 heartbeats (90 seconds), the task is marked as stalled
- Another worker picks up the stalled task and resumes from the last checkpoint
This means a task can run indefinitely as long as the worker executing it keeps sending heartbeats. If a worker goes silent, only the work since the last checkpoint has to be repeated. For agent workflows that call LLMs, scrape websites, or wait for human input, this is the difference between “works” and “doesn’t work.”
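The stall rule described above reduces to a timestamp comparison. Here is a minimal sketch using the 30-second interval and three-miss threshold from the text; `isStalled`, `findStalled`, and the `RunningTask` shape are hypothetical names, not the runtime's actual code.

```typescript
const HEARTBEAT_INTERVAL_MS = 30_000; // from the text: one heartbeat every 30 seconds
const MAX_MISSED_HEARTBEATS = 3; // from the text: stalled after 3 misses

// True once the gap since the last heartbeat reaches 3 intervals (90 seconds).
function isStalled(lastHeartbeatMs: number, nowMs: number): boolean {
  return nowMs - lastHeartbeatMs >= HEARTBEAT_INTERVAL_MS * MAX_MISSED_HEARTBEATS;
}

interface RunningTask {
  id: string;
  lastHeartbeatMs: number;
}

// Returns the ids of tasks that another worker should pick up.
function findStalled(tasks: RunningTask[], nowMs: number): string[] {
  return tasks.filter((t) => isStalled(t.lastHeartbeatMs, nowMs)).map((t) => t.id);
}

const now = 1_000_000;
console.log(
  findStalled(
    [
      { id: "healthy", lastHeartbeatMs: now - 45_000 },
      { id: "stalled", lastHeartbeatMs: now - 95_000 },
    ],
    now,
  ),
);
// [ 'stalled' ]
```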
Observability and Monitoring
Trigger.dev includes built-in tracing for every task execution:
- Step-level timing and success/failure status
- Automatic error capture with stack traces
- Retry attempt history
- State snapshot size and checkpoint frequency
The dashboard shows:
- Active tasks (currently executing)
- Queued tasks (waiting for worker capacity)
- Failed tasks (exhausted retry attempts)
- Task duration percentiles (p50, p95, p99)
For agent workflows, this visibility matters because you need to see where tasks stall. If your research agent spends 80% of its time waiting for web scraping, you know where to optimize.
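For readers who want to reproduce duration percentiles from their own logs, here is a small sketch using the nearest-rank definition. How the dashboard computes its percentiles is not stated here, so treat the method, the `percentile` helper, and the sample durations as assumptions.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  return sortedAsc[Math.max(rank, 1) - 1];
}

// Hypothetical task durations in milliseconds, with one slow outlier.
const durationsMs = [120, 80, 95, 3000, 110, 105, 90, 100, 115, 85];
const sorted = [...durationsMs].sort((a, b) => a - b);

console.log({
  p50: percentile(sorted, 50),
  p95: percentile(sorted, 95),
  p99: percentile(sorted, 99),
});
// { p50: 100, p95: 3000, p99: 3000 }
```

A single slow run leaves the median untouched but dominates p95 and p99, which is why tail percentiles are the ones to watch when tasks stall.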
Comparison to Temporal
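Temporal is the heavyweight in durable execution. It grew out of Uber’s Cadence project and is battle-tested at companies such as Stripe and Netflix. But it has a steep learning curve:
Language support: Temporal’s server is written in Go, and it ships SDKs for several languages, including Go, Java, Python, and TypeScript. The TypeScript SDK works but can feel like a second-class citizen. Trigger.dev is TypeScript-native.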
Deployment complexity: Temporal requires running a cluster (server, worker, database, visibility store). Trigger.dev gives you a managed option or a simpler self-hosted setup.
Workflow syntax: Temporal workflows are written in ordinary code, but workflow functions must follow strict determinism rules, with side effects moved into activities. Trigger.dev lets you write normal async TypeScript with automatic checkpointing.
Ecosystem maturity: Temporal has more integrations, better documentation, and a larger community. Trigger.dev is younger and has fewer examples.
| Feature | Temporal | Trigger.dev V2 |
|---|---|---|
| TypeScript experience | Secondary language | Native, first-class |
| Deployment complexity | High (cluster required) | Low (managed or simple self-host) |
| Workflow syntax | Code with strict determinism rules | Standard async TypeScript |
| State size limits | 2 MB per payload, 50 MB per workflow history (defaults) | Configurable (object storage) |
| Community size | Large, enterprise-focused | Smaller, developer-focused |
| Learning curve | Steep | Moderate |
Failure Modes and Edge Cases
State snapshot size: If your agent accumulates large context (conversation history, scraped data), state snapshots can grow to tens of MB. This slows down checkpoint writes and resume times. You need to prune state or offload large blobs to external storage.
Non-deterministic code: Trigger.dev doesn’t enforce determinism like Temporal. If your task generates random UUIDs or calls Date.now() directly, replays can produce different results. Consider injecting randomness and time as task inputs to ensure deterministic replays.
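As a sketch of that suggestion, the caller can capture the timestamp and identifier once and pass them in, so the task body becomes a pure function of its payload. The `ReportInput` shape and `buildReport` function are hypothetical and only illustrate the pattern.

```typescript
// Hypothetical payload: the caller supplies values the task would otherwise generate.
interface ReportInput {
  topic: string;
  requestedAtMs: number; // captured once by the caller instead of Date.now() in the task
  runId: string; // generated once by the caller instead of a random UUID in the task
}

// Pure function of its input: replaying with the same input yields the same output.
function buildReport(input: ReportInput): { id: string; title: string; createdAt: string } {
  return {
    id: `${input.runId}-report`,
    title: `Research: ${input.topic}`,
    createdAt: new Date(input.requestedAtMs).toISOString(),
  };
}

const input: ReportInput = { topic: "durable execution", requestedAtMs: 0, runId: "run-42" };
const first = buildReport(input);
const replay = buildReport(input);

console.log(JSON.stringify(first) === JSON.stringify(replay)); // true
console.log(first.createdAt); // 1970-01-01T00:00:00.000Z
```

Because the payload is persisted with the task, a retry or replay sees the same `requestedAtMs` and `runId` as the first attempt.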
Worker version skew: If you deploy a new task version while old tasks are still running, workers might load incompatible code. Trigger.dev handles this by versioning task definitions, but you need to test migrations.
Database contention: High task throughput can saturate Postgres connections. The managed cloud handles this with connection pooling, but self-hosted deployments need to tune max_connections and worker concurrency.
Checkpoint frequency: Trigger.dev checkpoints after each await. If your task has a tight loop with many awaits, checkpoint overhead can dominate execution time. You need to batch work or reduce checkpoint frequency.
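Batching is the simplest of those mitigations. The sketch below groups items so that each await covers a whole batch; `chunk`, `processBatch`, and `sumInBatches` are hypothetical helpers, and `processBatch` is a stand-in for whatever I/O the real task performs.

```typescript
// Splits items into fixed-size batches so each await covers many items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Stand-in for real I/O (hypothetical): sums one batch asynchronously.
async function processBatch(batch: number[]): Promise<number> {
  return batch.reduce((sum, n) => sum + n, 0);
}

// Awaits once per batch instead of once per item and reports how many awaits occurred.
async function sumInBatches(
  items: number[],
  batchSize: number,
): Promise<{ total: number; awaits: number }> {
  let total = 0;
  let awaits = 0;
  for (const batch of chunk(items, batchSize)) {
    total += await processBatch(batch); // one await (one potential checkpoint) per batch
    awaits++;
  }
  return { total, awaits };
}

const items = Array.from({ length: 1000 }, (_, i) => i + 1);
sumInBatches(items, 100).then((result) => {
  console.log(result); // { total: 500500, awaits: 10 }
});
```

Processing 1,000 items one at a time would mean 1,000 awaits; batches of 100 cut that to 10, at the cost of redoing up to a full batch if a crash lands mid-batch.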
When to Use Trigger.dev
Trigger.dev fits when you need durable execution without Temporal’s complexity:
- Agent workflows: Multi-step LLM calls with retries and state persistence
- Data pipelines: ETL jobs that run for hours and need resumability
- Human-in-the-loop: Workflows that pause for approval or input
- Scheduled tasks: Cron jobs that exceed serverless timeouts
It’s less suitable for:
- High-throughput event processing: If you need to process 100k+ events/second, a dedicated queue or event stream (SQS, Kafka) is a better fit
- Mission-critical workflows: Temporal’s maturity and ecosystem make it safer for financial transactions or healthcare workflows
- Polyglot teams: If you need Python, Go, and Java workers, Temporal’s multi-language support is stronger
Technical Verdict
Trigger.dev V2 solves the “I need Temporal but don’t want to learn Temporal” problem. It gives TypeScript developers durable execution primitives without mastering distributed systems theory. The managed cloud option removes deployment friction, and the open-source runtime gives you an escape hatch.
Use it when you’re building agent workflows or long-running tasks in TypeScript and don’t want to manage queue infrastructure. Avoid it if you need battle-tested reliability for mission-critical workflows or multi-language worker support. The V1 to V2 pivot was driven by explicit user feedback requesting durable execution primitives over event hooks, which shows the team responds to real developer pain points rather than chasing trends.