Trigger.dev V2: What a Temporal Alternative for TypeScript Reveals About Durable Execution Plumbing

Trigger.dev launched in February 2023 as a developer-first Zapier alternative (745 HN points). By October 2023, the team shipped V2 as a Temporal alternative for TypeScript (172 HN points). That pivot says a lot about the infrastructure gap between simple event hooks and the durable execution primitives that multi-step agent workflows actually need.

The shift happened because early users kept asking for the same thing: retries, state persistence, and resumability without managing queue infrastructure or writing idempotent retry logic by hand. Trigger.dev V2 shows that TypeScript developers want workflow orchestration without having to learn Temporal’s programming model or stand up a cluster.

The V1 to V2 Pivot Story

V1 was event-driven automation. You defined triggers (webhooks, schedules, events) and actions that ran in response. Think Zapier but with code instead of UI dropdowns. The problem surfaced when users tried to build agent workflows that needed to:

Run for hours or days without hitting serverless timeouts
Retry individual steps without re-executing the entire workflow
Persist intermediate state so a crash doesn’t lose progress
Resume from the last successful checkpoint

Event hooks don’t give you those primitives. You end up building your own state machine, writing checkpointing logic, and managing retry queues. V2 repositioned the entire platform around durable execution instead of event triggers.

Durable Execution Model

Trigger.dev V2 treats every task as a durable workflow. The runtime automatically handles:

State persistence: Task state snapshots to durable storage after each step. If the worker crashes, the task resumes from the last checkpoint instead of restarting.

Automatic retries: Failed steps retry with exponential backoff. You configure max attempts and backoff strategy. The runtime tracks which steps succeeded so retries skip completed work.

Long-running tasks: No serverless timeout constraints. Tasks can run for hours or days. The platform manages worker lifecycle and task handoff.

Concurrency control: You define how many instances of a task can run concurrently. The runtime enforces limits and queues excess tasks.
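
The retry behavior described above boils down to a delay schedule. The following is a minimal sketch of exponential backoff, not Trigger.dev's actual implementation: the `RetryPolicy` shape, its field names, and the cap on the delay are assumptions made for illustration.

```typescript
// Hypothetical retry policy shape; field names are illustrative assumptions.
interface RetryPolicy {
  maxAttempts: number; // total attempts, including the first one
  factor: number;      // multiplier applied to the delay after each failure
  minDelayMs: number;  // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay to wait after the given failed attempt (1-based).
function backoffDelay(policy: RetryPolicy, failedAttempt: number): number {
  const raw = policy.minDelayMs * Math.pow(policy.factor, failedAttempt - 1);
  return Math.min(raw, policy.maxDelayMs);
}

// One delay between each pair of consecutive attempts.
function backoffSchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < policy.maxAttempts; attempt++) {
    delays.push(backoffDelay(policy, attempt));
  }
  return delays;
}

const examplePolicy: RetryPolicy = {
  maxAttempts: 3,
  factor: 2,
  minDelayMs: 1000,
  maxDelayMs: 30000,
};

console.log(backoffSchedule(examplePolicy)); // [ 1000, 2000 ]
```

Three attempts mean only two waits, which is why the schedule has two entries. Production systems usually add random jitter on top so that many failing tasks do not retry in lockstep.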

Here’s what a durable task looks like:

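// `task` is imported from the SDK's v3 entry point here; adjust the path
// to match the SDK version you have installed.
import { task } from "@trigger.dev/sdk/v3";

// Hypothetical helpers, assumed to be implemented elsewhere in the project.
declare function search(topic: string): Promise<{ url: string }[]>;
declare function browse(url: string): Promise<string>;
declare function analyze(webData: string): Promise<string>;

export const researchAgent = task({
  id: "research-agent",
  retry: {
    maxAttempts: 3,       // give up after three attempts
    factor: 2,            // double the delay after each failure
    minTimeoutInMs: 1000, // wait at least one second before the first retry
  },
  run: async ({ topic }: { topic: string }) => {
    // Step 1: Search (persisted checkpoint)
    const searchResults = await search(topic);

    // Step 2: Browse (persisted checkpoint)
    // If the worker crashes here, the task resumes at this step on restart
    const webData = await browse(searchResults[0].url);

    // Step 3: Analyze (persisted checkpoint)
    const analysis = await analyze(webData);

    return { summary: analysis };
  },
});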

If the worker crashes after browse() completes, the task resumes at analyze() instead of re-running search() and browse().
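
One way to picture "retries skip completed work" is a step cache keyed by step name. The sketch below is a generic illustration of that idea, not Trigger.dev's runtime: the `step` helper, the `StepCache` type, and the in-memory `Map` standing in for durable storage are all assumptions.

```typescript
// In-memory stand-in for durable step storage (an assumption for illustration).
type StepCache = Map<string, unknown>;

// Runs `fn` at most once per key; later calls with the same key return the stored result.
async function step<T>(cache: StepCache, key: string, fn: () => Promise<T>): Promise<T> {
  if (cache.has(key)) {
    return cache.get(key) as T;
  }
  const result = await fn();
  cache.set(key, result); // "checkpoint": record the result before moving on
  return result;
}

// Hypothetical three-step workflow. `calls` records which steps really executed,
// and `failAt` simulates a crash inside the named step.
async function research(cache: StepCache, calls: string[], failAt?: string): Promise<string> {
  const run = (name: string, value: string) =>
    step(cache, name, async () => {
      if (name === failAt) throw new Error(`crash in ${name}`);
      calls.push(name);
      return value;
    });

  const url = await run("search", "https://example.com");
  const page = await run("browse", `contents of ${url}`);
  return run("analyze", `summary of ${page}`);
}

async function demo(): Promise<string[]> {
  const cache: StepCache = new Map();
  const calls: string[] = [];
  await research(cache, calls, "analyze").catch(() => undefined); // first attempt crashes
  await research(cache, calls); // the retry reuses the cached search and browse results
  return calls;
}

demo().then((calls) => console.log(calls)); // [ 'search', 'browse', 'analyze' ]
```

Each step body runs exactly once across both attempts, even though the workflow function itself is invoked twice.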

State Persistence Architecture

Trigger.dev uses a hybrid persistence model:

Task metadata: Stored in Postgres (task ID, status, retry count, schedule)
Execution state: Serialized to object storage (S3, GCS) after each step
Worker coordination: Redis for task queues and worker heartbeats

When a task executes, the runtime:

Pulls task metadata from Postgres
Loads the latest state snapshot from object storage
Executes the next step
Writes the updated state snapshot back to object storage
Updates task metadata in Postgres

This separation means state snapshots can grow large (MB-scale for agent memory) without bloating the database. The runtime only loads state when a worker picks up the task.
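
The five-step loop above can be sketched with two in-memory maps playing the roles of the database and the object store. This is a simplified model under stated assumptions: `TaskMeta`, `runNextStep`, and the snapshot key format are hypothetical names, and a real runtime would also need locking and error handling.

```typescript
// Hypothetical metadata row; in the model above this would live in Postgres.
interface TaskMeta {
  id: string;
  status: "queued" | "running" | "completed";
  nextStep: number;
  snapshotKey: string;
}
type Snapshot = Record<string, unknown>;
type Step = (state: Snapshot) => Snapshot;

const metadataStore = new Map<string, TaskMeta>(); // stand-in for the database
const snapshotStore = new Map<string, string>();   // stand-in for object storage (serialized JSON)

function runNextStep(taskId: string, steps: Step[]): TaskMeta {
  const meta = metadataStore.get(taskId);                       // 1. pull task metadata
  if (!meta) throw new Error(`unknown task ${taskId}`);
  if (meta.status === "completed") return meta;

  const raw = snapshotStore.get(meta.snapshotKey) ?? "{}";
  const state: Snapshot = JSON.parse(raw);                      // 2. load the latest snapshot

  const step = steps[meta.nextStep];
  if (!step) throw new Error(`no step ${meta.nextStep}`);
  const nextState = step(state);                                // 3. execute the next step

  snapshotStore.set(meta.snapshotKey, JSON.stringify(nextState)); // 4. write the snapshot back

  const nextStep = meta.nextStep + 1;
  const updated: TaskMeta = {
    ...meta,
    nextStep,
    status: nextStep >= steps.length ? "completed" : "running",
  };
  metadataStore.set(taskId, updated);                           // 5. update task metadata
  return updated;
}

metadataStore.set("task-1", {
  id: "task-1",
  status: "queued",
  nextStep: 0,
  snapshotKey: "snapshots/task-1.json",
});
const steps: Step[] = [
  (s) => ({ ...s, urls: ["https://example.com"] }),
  (s) => ({ ...s, pageCount: (s.urls as string[]).length }),
];

runNextStep("task-1", steps);                   // one worker runs step 0
const finalMeta = runNextStep("task-1", steps); // any worker can run step 1 from the snapshot
console.log(finalMeta.status); // completed
```

Because each call reads everything it needs from the two stores, the second call could just as well happen on a different worker.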

Deployment Boundaries

Trigger.dev offers three deployment models with distinct execution and security boundaries:

Model	Code Location	State Storage	Worker Management
Managed Cloud	Trigger.dev infrastructure	Managed (Postgres + S3)	Fully managed, auto-scaling
Self-Hosted	Your Kubernetes cluster	Your Postgres + object storage	You manage workers, scaling
Hybrid	Your infrastructure	Your storage backend	You run workers; Trigger.dev manages the control plane

Managed cloud: Your task code runs in isolated containers on Trigger.dev infrastructure. Each task execution gets its own container with network isolation. Secrets are injected at runtime via environment variables and are never persisted in state snapshots. You deploy by pushing code to the Trigger.dev registry. Workers auto-scale based on queue depth.

Self-hosted: You run workers in your own Kubernetes cluster and point them at your Postgres and object storage. Code executes entirely within your network perimeter. State snapshots never leave your configured storage backend. You control scaling, networking, and secrets management through your existing infrastructure.

Hybrid: Less common but useful for compliance scenarios. Your code runs in your infrastructure with your secrets and network policies, but Trigger.dev manages the control plane (scheduling, retries, observability). State snapshots remain in your storage backend.

The key security boundary: state snapshots contain serialized execution context but exclude secrets. Secrets are re-injected from your secret store on each worker restart.
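
The boundary described above can be illustrated by keeping secrets out of the serialized form entirely. This is a minimal sketch, assuming a hypothetical `ExecutionContext` shape and a plain object standing in for the secret store.

```typescript
// Hypothetical execution context: resumable state plus runtime-only secrets.
interface ExecutionContext {
  state: Record<string, unknown>;
  secrets: Record<string, string>;
}

// Serialize only the state; secrets are deliberately left out of the snapshot.
function toSnapshot(ctx: ExecutionContext): string {
  return JSON.stringify({ state: ctx.state });
}

// Rebuild the context on a new worker, re-injecting secrets from the secret store.
function fromSnapshot(snapshot: string, secretStore: Record<string, string>): ExecutionContext {
  const parsed = JSON.parse(snapshot) as { state: Record<string, unknown> };
  return { state: parsed.state, secrets: { ...secretStore } };
}

const ctx: ExecutionContext = {
  state: { topic: "durable execution", step: 2 },
  secrets: { API_KEY: "sk-test-123" }, // placeholder value, not a real key
};

const snapshot = toSnapshot(ctx);
const restored = fromSnapshot(snapshot, { API_KEY: "sk-test-123" });
console.log(snapshot); // {"state":{"topic":"durable execution","step":2}}
```

The caveat is that this only protects secrets that stay in the `secrets` field. A task that copies a key into its own state would still leak it into the snapshot.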

Long-Running Task Handling

Traditional serverless platforms time out after 15 minutes (AWS Lambda) or, at most, 60 minutes (2nd gen HTTP-triggered Google Cloud Functions; 1st gen functions cap at 9 minutes). Trigger.dev removes that constraint by treating tasks as resumable workflows instead of single-shot functions.

The runtime uses a heartbeat mechanism:

Workers send heartbeats every 30 seconds while executing a task
If a worker misses 3 heartbeats (90 seconds), the task is marked as stalled
Another worker picks up the stalled task and resumes from the last checkpoint

This means a task can run indefinitely as long as the worker executing it keeps sending heartbeats. If the worker goes silent, only the work since the last checkpoint has to be repeated. For agent workflows that call LLMs, scrape websites, or wait for human input, this is the difference between “works” and “doesn’t work.”
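
The stall rule above (a heartbeat every 30 seconds, three misses before a task counts as stalled) can be written as a pure function over timestamps. The sketch below uses the numbers from this section; the `RunningTask` shape and the function names are hypothetical.

```typescript
const HEARTBEAT_INTERVAL_MS = 30000;
const MISSED_BEATS_BEFORE_STALL = 3;

// Hypothetical record the coordinator keeps for each executing task.
interface RunningTask {
  id: string;
  workerId: string;
  lastHeartbeatAt: number; // epoch milliseconds
}

// A task is stalled once three heartbeat intervals pass without a heartbeat.
function isStalled(task: RunningTask, now: number): boolean {
  return now - task.lastHeartbeatAt >= HEARTBEAT_INTERVAL_MS * MISSED_BEATS_BEFORE_STALL;
}

// Hand every stalled task to an idle worker; healthy tasks are left untouched.
function reassignStalled(tasks: RunningTask[], now: number, idleWorkerId: string): RunningTask[] {
  return tasks.map((task) =>
    isStalled(task, now) ? { ...task, workerId: idleWorkerId, lastHeartbeatAt: now } : task
  );
}

const tasks: RunningTask[] = [
  { id: "a", workerId: "worker-1", lastHeartbeatAt: 0 },
  { id: "b", workerId: "worker-2", lastHeartbeatAt: 60000 },
];

const reassigned = reassignStalled(tasks, 90000, "worker-3");
console.log(reassigned.map((t) => t.workerId)); // [ 'worker-3', 'worker-2' ]
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the rule easy to test at the exact 90-second boundary.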

Observability and Monitoring

Trigger.dev includes built-in tracing for every task execution:

Step-level timing and success/failure status
Automatic error capture with stack traces
Retry attempt history
State snapshot size and checkpoint frequency

The dashboard shows:

Active tasks (currently executing)
Queued tasks (waiting for worker capacity)
Failed tasks (exhausted retry attempts)
Task duration percentiles (p50, p95, p99)

For agent workflows, this visibility matters because you need to see where tasks stall. If your research agent spends 80% of its time waiting for web scraping, you know where to optimize.
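
Both numbers mentioned above, duration percentiles and the share of time spent per step, are simple to compute from raw timings. The following sketch uses the nearest-rank percentile method and made-up sample data; it is not how the Trigger.dev dashboard is known to compute its figures.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Fraction of total time spent in each step.
function timeShare(stepDurationsMs: Record<string, number>): Record<string, number> {
  const total = Object.values(stepDurationsMs).reduce((sum, ms) => sum + ms, 0);
  const shares: Record<string, number> = {};
  for (const [step, ms] of Object.entries(stepDurationsMs)) {
    shares[step] = total === 0 ? 0 : ms / total;
  }
  return shares;
}

// Made-up task durations in milliseconds, with one slow outlier.
const durationsMs = [120, 80, 300, 95, 4000, 110, 150, 90, 100, 130];
console.log(percentile(durationsMs, 50)); // 110
console.log(percentile(durationsMs, 95)); // 4000

// Made-up per-step timings for a single research run.
const shares = timeShare({ search: 2000, scrape: 16000, analyze: 2000 });
console.log(shares.scrape); // 0.8
```

The gap between p50 and p95 in the sample data is the kind of signal that points at an occasional slow step rather than a uniformly slow task.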

Comparison to Temporal

Temporal is the heavyweight in durable execution. It grew out of Uber’s Cadence project and runs in production at companies such as Stripe and Netflix. Its trade-offs against Trigger.dev look like this:

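Language support: Temporal’s server is written in Go, and its SDKs cover Go, Java, TypeScript, Python, and other languages. The TypeScript SDK is official, but it can feel secondary to the Go and Java SDKs. Trigger.dev is TypeScript-native.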

Deployment complexity: Self-hosting Temporal means running a cluster (server, workers, database, visibility store). Temporal Cloud removes the server side, but you still run your own workers. Trigger.dev gives you a managed option or a simpler self-hosted setup.

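Workflow syntax: Temporal workflows are written in ordinary code through its SDKs rather than a separate workflow language, but workflow functions must follow strict determinism rules, with side effects such as network calls moved into activities. Trigger.dev lets you write normal async TypeScript with automatic checkpointing.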

Ecosystem maturity: Temporal has more integrations, better documentation, and a larger community. Trigger.dev is younger and has fewer examples.

Feature	Temporal	Trigger.dev V2
TypeScript experience	Official SDK, one of several languages	Native, first-class
Deployment complexity	High when self-hosted (cluster required)	Low (managed or simple self-host)
Workflow syntax	SDK code with determinism rules	Standard async TypeScript
State size limits	2 MB per payload; about 50 MB of event history per workflow	Configurable (object storage)
Community size	Large, enterprise-focused	Smaller, developer-focused
Learning curve	Steep	Moderate

Failure Modes and Edge Cases

State snapshot size: If your agent accumulates large context (conversation history, scraped data), state snapshots can grow to tens of MB. This slows down checkpoint writes and resume times. You need to prune state or offload large blobs to external storage.
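
Offloading large blobs usually means replacing a big value with a small reference before the snapshot is written. The sketch below shows that idea; the size threshold, the `$blobRef` marker, and the in-memory `blobStore` are hypothetical stand-ins for whatever external storage you use.

```typescript
// Stand-in for external blob storage (for example, a bucket); an assumption for illustration.
const blobStore = new Map<string, string>();

// Hypothetical threshold: values whose JSON form exceeds this many characters are offloaded.
const INLINE_LIMIT_CHARS = 1024;

type BlobRef = { $blobRef: string };

// Returns a slimmed-down copy of the state in which large values are replaced by references.
function offloadLargeValues(state: Record<string, unknown>): Record<string, unknown> {
  const slim: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(state)) {
    const serialized = JSON.stringify(value) ?? "null";
    if (serialized.length > INLINE_LIMIT_CHARS) {
      const ref = `blob/${key}`;
      blobStore.set(ref, serialized); // the large value lives outside the snapshot
      const pointer: BlobRef = { $blobRef: ref };
      slim[key] = pointer;
    } else {
      slim[key] = value;
    }
  }
  return slim;
}

const fullState = { topic: "durable execution", scrapedHtml: "x".repeat(5000) };
const slim = offloadLargeValues(fullState);
console.log(JSON.stringify(slim));
// {"topic":"durable execution","scrapedHtml":{"$blobRef":"blob/scrapedHtml"}}
```

The snapshot stays small no matter how much the agent scrapes, at the cost of one extra fetch when a step actually needs the blob.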

Non-deterministic code: Trigger.dev doesn’t enforce determinism like Temporal. If your task generates random UUIDs or calls Date.now() directly, replays can produce different results. Consider injecting randomness and time as task inputs to ensure deterministic replays.
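
Injecting time and randomness as inputs can look like the sketch below: the values are captured once when the task is triggered, and the task body only reads them. The `TaskInputs` shape and both function names are hypothetical.

```typescript
// Hypothetical task payload that carries the non-deterministic values explicitly.
interface TaskInputs {
  topic: string;
  now: number;   // epoch milliseconds, captured once at trigger time
  runId: string; // random identifier, generated once at trigger time
}

// Called once, outside the code that may be replayed.
function captureInputs(topic: string): TaskInputs {
  return { topic, now: Date.now(), runId: Math.random().toString(36).slice(2) };
}

// The task body never calls Date.now() or Math.random() itself,
// so replaying it with the same inputs yields the same result.
function buildReport(inputs: TaskInputs): { id: string; title: string; createdAt: string } {
  return {
    id: `report-${inputs.runId}`,
    title: `Research: ${inputs.topic}`,
    createdAt: new Date(inputs.now).toISOString(),
  };
}

const fixedInputs: TaskInputs = { topic: "durable execution", now: 0, runId: "abc123" };
console.log(buildReport(fixedInputs).createdAt); // 1970-01-01T00:00:00.000Z
```

In practice you would call `captureInputs` at the point where the task is enqueued and store its result alongside the task, so every retry sees the same timestamp and identifier.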

Worker version skew: If you deploy a new task version while old tasks are still running, workers might load incompatible code. Trigger.dev handles this by versioning task definitions, but you need to test migrations.
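
This article does not cover how Trigger.dev pins versions internally, so the following is only a generic sketch of one common approach: record the code version when a run starts and resume it with that same version. All names here (`handlers`, `startRun`, `resumeRun`) are hypothetical.

```typescript
type Handler = (topic: string) => string;

// Hypothetical registry of deployed task versions, keyed by version label.
const handlers = new Map<string, Handler>([
  ["v1", (topic) => `summary:${topic}`],
  ["v2", (topic) => `summary-v2:${topic.toUpperCase()}`],
]);

interface Run {
  id: string;
  pinnedVersion: string; // version that was current when the run started
}

// New runs are pinned to whatever version is current at start time.
function startRun(id: string, currentVersion: string): Run {
  return { id, pinnedVersion: currentVersion };
}

// Resumed runs keep using their pinned version, even if a newer one is deployed.
function resumeRun(run: Run, topic: string): string {
  const handler = handlers.get(run.pinnedVersion);
  if (!handler) {
    throw new Error(`version ${run.pinnedVersion} is no longer deployed`);
  }
  return handler(topic);
}

const oldRun = startRun("run-1", "v1"); // started before v2 shipped
const newRun = startRun("run-2", "v2");
console.log(resumeRun(oldRun, "agents")); // summary:agents
console.log(resumeRun(newRun, "agents")); // summary-v2:AGENTS
```

The explicit error for a missing version is the part worth testing in migrations: it surfaces the moment an old version is retired while runs still depend on it.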

Database contention: High task throughput can saturate Postgres connections. The managed cloud handles this with connection pooling, but self-hosted deployments need to tune max_connections and worker concurrency.
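
For self-hosted setups, the tuning advice above comes down to a connection budget: the number of workers multiplied by connections per worker has to fit under `max_connections`, with some headroom left over. A back-of-the-envelope sketch follows; the `PoolPlan` shape and the reserved-connection figure are assumptions, while 100 is the stock Postgres default for `max_connections`.

```typescript
// Hypothetical capacity plan for a self-hosted deployment.
interface PoolPlan {
  maxConnections: number;       // Postgres max_connections
  reservedConnections: number;  // headroom for migrations, dashboards, admin sessions
  connectionsPerWorker: number; // pool size each worker opens
}

// Largest number of workers that fits within the connection budget.
function maxWorkers(plan: PoolPlan): number {
  const available = plan.maxConnections - plan.reservedConnections;
  if (available <= 0 || plan.connectionsPerWorker <= 0) return 0;
  return Math.floor(available / plan.connectionsPerWorker);
}

const plan: PoolPlan = { maxConnections: 100, reservedConnections: 10, connectionsPerWorker: 5 };
console.log(maxWorkers(plan)); // 18
```

If the resulting worker count is too low for your throughput, the usual options are a smaller per-worker pool or a connection pooler in front of Postgres, rather than raising `max_connections` indefinitely.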

Checkpoint frequency: Trigger.dev checkpoints after each await. If your task has a tight loop with many awaits, checkpoint overhead can dominate execution time. You need to batch work or reduce checkpoint frequency.
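
Batching is the simplest way to cut checkpoint overhead: process many items between checkpoints instead of one. The sketch below counts checkpoints for two batch sizes; `processInBatches` and its `checkpoint` callback are hypothetical and stand in for whatever persistence call your runtime makes.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Handle every item, but checkpoint only once per batch. Returns the checkpoint count.
function processInBatches<T>(
  items: T[],
  batchSize: number,
  handle: (item: T) => void,
  checkpoint: (itemsDone: number) => void
): number {
  let checkpoints = 0;
  let done = 0;
  for (const batch of chunk(items, batchSize)) {
    batch.forEach((item) => handle(item));
    done += batch.length;
    checkpoint(done); // one persisted checkpoint per batch
    checkpoints++;
  }
  return checkpoints;
}

const items = Array.from({ length: 1000 }, (_, i) => i);
const noop = () => undefined;

console.log(processInBatches(items, 1, noop, noop));   // 1000
console.log(processInBatches(items, 100, noop, noop)); // 10
```

The trade-off is replay cost: with batches of 100, a crash can force up to 100 items to be reprocessed, so item handling needs to be idempotent.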

When to Use Trigger.dev

Trigger.dev fits when you need durable execution without Temporal’s complexity:

Agent workflows: Multi-step LLM calls with retries and state persistence
Data pipelines: ETL jobs that run for hours and need resumability
Human-in-the-loop: Workflows that pause for approval or input
Scheduled tasks: Cron jobs that exceed serverless timeouts

It’s less suitable for:

High-throughput event processing: If you need to process 100k+ events/second, a dedicated queue or streaming system (SQS, Kafka) is a better fit
Mission-critical workflows: Temporal’s maturity and ecosystem make it safer for financial transactions or healthcare workflows
Polyglot teams: If you need Python, Go, and Java workers, Temporal’s multi-language support is stronger

Technical Verdict

Trigger.dev V2 solves the “I need Temporal but don’t want to learn Temporal” problem. It gives TypeScript developers durable execution primitives without mastering distributed systems theory. The managed cloud option removes deployment friction, and the open-source runtime gives you an escape hatch.

Use it when you’re building agent workflows or long-running tasks in TypeScript and don’t want to manage queue infrastructure. Avoid it if you need battle-tested reliability for mission-critical workflows or multi-language worker support. The V1 to V2 pivot was driven by explicit user feedback requesting durable execution primitives over event hooks, which shows the team responds to real developer pain points rather than chasing trends.