Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and earned 745 HN points. Eight months later, the team pivoted to V2 and repositioned as a “Temporal alternative for TypeScript.” That shift exposes a real infrastructure gap: event-driven webhook automation cannot handle long-running agent tasks that need retry semantics, state persistence, and durable execution.
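The V2 announcement drew 172 points, but the architectural change matters more than the score. Developers building AI agents need workflows that survive process crashes, retry failed API calls with exponential backoff, and maintain state across hours or days. Temporal solves this with event sourcing and deterministic replay, and it ships SDKs for Go, Java, TypeScript, and other languages. Adopting it, however, means running against a Temporal Service (self-hosted or Temporal Cloud), operating dedicated worker processes, and writing workflow code under determinism constraints. TypeScript shops face a choice: take on that operational model or build retry logic by hand.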
Trigger.dev’s pivot reveals what durable execution plumbing looks like when optimized for TypeScript-native workflows instead of polyglot microservices.
Why Zapier-Style Automation Breaks for Agents
Zapier and similar tools chain webhooks with simple retry policies. A failed step either retries immediately or dies. State lives in external databases or gets passed as JSON payloads. This works for short-lived tasks like “new Stripe payment → send Slack message.”
Agent workflows break this model:
- Long-running tasks: An agent researching a topic might call 10 APIs over 30 minutes. If the process crashes at step 7, you need to resume without re-executing steps 1-6.
- Non-deterministic retries: LLM calls return different results on retry. You cannot replay a workflow by re-running the same code.
- State explosion: Agent memory, tool call history, and intermediate results exceed what you can pass in webhook payloads.
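Temporal handles this with event sourcing. Every workflow decision gets logged, including the results of side-effecting calls (activities). On failure, the system replays the log to reconstruct state, then resumes. Workflow code must be deterministic so that replay issues the same steps in the same order; recorded activity results are reused rather than re-executed.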
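To make the replay idea concrete, here is a minimal sketch of log-based replay. It is not Temporal's implementation: `EventLog`, `makeStepRunner`, and `workflow` are hypothetical names, and an in-memory array stands in for the durable event log.

```typescript
// Hypothetical, in-memory stand-in for a durable event log.
type EventLog = { name: string; result: unknown }[];

// Returns a `step` function bound to one log. On replay, a recorded result is
// returned instead of calling `fn` again.
function makeStepRunner(log: EventLog) {
  let cursor = 0;
  return function step<T>(name: string, fn: () => T): T {
    const recorded = log[cursor];
    if (recorded !== undefined) {
      if (recorded.name !== name) {
        // Replay only works if the code issues the same steps in the same order.
        throw new Error(`non-deterministic workflow: expected ${recorded.name}, got ${name}`);
      }
      cursor++;
      return recorded.result as T;
    }
    const result = fn();
    log.push({ name, result });
    cursor++;
    return result;
  };
}

// Counts how many times a step body actually ran.
let sideEffects = 0;

// A workflow that can be made to crash after its second step.
function workflow(log: EventLog, crashAfterStepTwo: boolean): number {
  const step = makeStepRunner(log);
  const a = step("fetch-a", () => { sideEffects++; return 2; });
  const b = step("fetch-b", () => { sideEffects++; return 3; });
  if (crashAfterStepTwo) throw new Error("simulated crash");
  return step("combine", () => { sideEffects++; return a + b; });
}

const log: EventLog = [];
try {
  workflow(log, true);
} catch {
  // The process died, but the log survives.
}
// The second run replays fetch-a and fetch-b from the log and only executes combine.
const total = workflow(log, false);
```

The guard on `recorded.name` is the reason replay-based systems insist on deterministic workflow code: if the second run asked for different steps, the log could no longer be matched against it.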
TypeScript developers building agents need this durability but cannot always take on Temporal’s operational footprint: a separately run Temporal Service plus dedicated worker processes.
Trigger.dev’s Durable Execution Model
Trigger.dev V2 provides durable execution without requiring a new runtime. The architecture separates orchestration (task scheduling, retry logic, state persistence) from execution (your TypeScript code).
Core Components
| Component | Responsibility | Failure Mode |
|---|---|---|
| Task definition | TypeScript function with retry config | Syntax errors caught at build time |
| Execution engine | Runs tasks in isolated containers | Container crash triggers retry from last checkpoint |
| State store | Postgres-backed task history | Database failure pauses new tasks, existing tasks resume after recovery |
| Retry coordinator | Exponential backoff, idempotency keys | Infinite retries require manual intervention |
| Observability layer | Real-time task logs and traces | Missing spans indicate dropped messages |
Tasks run in ephemeral containers. If a container dies, the engine reads the last checkpoint from Postgres and resumes. Unlike Temporal’s deterministic replay, Trigger.dev uses explicit checkpoints. You mark progress points in your code:
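    import { task } from "@trigger.dev/sdk/v3";
    import { generateText, type CoreMessage } from "ai";
    import { anthropic } from "@ai-sdk/anthropic";

    // checkpoint, executeTool, and the search/browse/analyze tools are assumed to be
    // defined elsewhere in the project. The tools are declared without execute
    // handlers, so each tool call is run by executeTool below.
    export const researchAgent = task({
      id: "research-agent",
      run: async ({ topic }: { topic: string }) => {
        const messages: CoreMessage[] = [
          { role: "user", content: `Research: ${topic}` },
        ];
        for (let i = 0; i < 10; i++) {
          // Checkpoint before the expensive LLM call
          await checkpoint(`iteration-${i}`);
          const { text, toolCalls, steps, response } = await generateText({
            model: anthropic("claude-opus-4-20250514"),
            system: "You are a research assistant with web access.",
            messages,
            tools: { search, browse, analyze },
            maxSteps: 5,
          });
          if (!toolCalls.length) {
            return { summary: text, stepsUsed: steps.length };
          }
          // Keep the assistant turn that requested the tools in the history
          messages.push(...response.messages);
          for (const call of toolCalls) {
            const result = await executeTool(call);
            // Tool messages carry tool-result parts tied to the originating call
            messages.push({
              role: "tool",
              content: [{ type: "tool-result", toolCallId: call.toolCallId,
                toolName: call.toolName, result }],
            });
          }
        }
        // Iteration budget exhausted without a final answer
        return { summary: null, stepsUsed: null };
      },
    });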
The checkpoint() call persists messages state to Postgres. If the container crashes during generateText(), the next execution resumes from the last checkpoint with the same message history.
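The resume behaviour described here can be sketched without any platform code. The example below is an illustration under stated assumptions, not Trigger.dev's implementation: a `Map` stands in for the Postgres table, and `saveCheckpoint`, `loadCheckpoint`, and `runAgent` are hypothetical helpers.

```typescript
// Hypothetical stand-in for the Postgres-backed checkpoint table:
// run id -> serialized state.
type CheckpointStore = Map<string, string>;

interface AgentState {
  nextStep: number;
  messages: string[];
}

function saveCheckpoint(store: CheckpointStore, runId: string, state: AgentState): void {
  // Serialize to JSON, the way a database row would hold it.
  store.set(runId, JSON.stringify(state));
}

function loadCheckpoint(store: CheckpointStore, runId: string): AgentState {
  const raw = store.get(runId);
  return raw === undefined ? { nextStep: 0, messages: [] } : (JSON.parse(raw) as AgentState);
}

// Runs steps in order and checkpoints after each one. `crashAt` simulates a
// container dying before the given step starts.
function runAgent(
  store: CheckpointStore,
  runId: string,
  totalSteps: number,
  crashAt?: number,
): AgentState {
  const state = loadCheckpoint(store, runId);
  for (let i = state.nextStep; i < totalSteps; i++) {
    if (i === crashAt) throw new Error(`container crashed before step ${i}`);
    state.messages.push(`result of step ${i}`);
    state.nextStep = i + 1;
    saveCheckpoint(store, runId, state);
  }
  return state;
}

const store: CheckpointStore = new Map();
try {
  runAgent(store, "run-1", 5, 3);
} catch {
  // The first container is gone; only the store survives.
}
// A second container picks up at step 3 with the saved message history.
const resumed = runAgent(store, "run-1", 5);
```

Steps 0 to 2 are not repeated on the second run because the loop starts at the saved `nextStep`, which is the whole point of snapshotting state rather than replaying a log.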
State Persistence Strategy
Trigger.dev uses Postgres as the durable state store. Each task execution gets a row with:
- Task ID and execution attempt number
- Checkpoint data (serialized JSON)
- Retry count and next retry timestamp
- Idempotency key for deduplication
When a task fails, the retry coordinator reads the checkpoint, increments the retry count, calculates the next attempt time using exponential backoff, and schedules a new container. The new container deserializes the checkpoint and resumes.
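As a rough sketch of that coordinator step, the transition can be written as a pure function over one row. The `ExecutionRow` shape and its field names are assumptions made for illustration; the text above lists the columns but not their names or types.

```typescript
// Hypothetical shape of one execution row; column names are assumptions.
interface ExecutionRow {
  taskId: string;
  attempt: number;            // 1-based attempt number
  checkpoint: string | null;  // serialized JSON snapshot
  retryCount: number;
  nextRetryAt: number | null; // epoch milliseconds, null when nothing is scheduled
  idempotencyKey: string;
  status: "running" | "scheduled" | "failed";
}

// Transition applied when an attempt fails. `delayFor` is the backoff policy,
// injected so this function stays independent of any particular formula.
function onAttemptFailed(
  row: ExecutionRow,
  nowMs: number,
  maxAttempts: number,
  delayFor: (retryCount: number) => number,
): ExecutionRow {
  if (row.attempt >= maxAttempts) {
    // Out of attempts: keep the checkpoint for inspection, stop scheduling.
    return { ...row, status: "failed", nextRetryAt: null };
  }
  const retryCount = row.retryCount + 1;
  return {
    ...row,
    attempt: row.attempt + 1,
    retryCount,
    nextRetryAt: nowMs + delayFor(retryCount),
    status: "scheduled", // the checkpoint is carried over untouched for the next container
  };
}

const failedOnce = onAttemptFailed(
  {
    taskId: "research-agent",
    attempt: 1,
    checkpoint: '{"nextStep":3}',
    retryCount: 0,
    nextRetryAt: null,
    idempotencyKey: "research-agent:exec-1",
    status: "running",
  },
  1_000_000,
  5,
  (n) => 1000 * 2 ** (n - 1), // doubling delay starting at one second
);
```

Keeping the transition pure makes it easy to test the scheduling rules separately from the database and the container runtime.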
This differs from Temporal’s event log. Temporal replays every decision to reconstruct state. Trigger.dev snapshots state at explicit points. The trade-off:
- Temporal: Full audit trail, deterministic replay, larger storage footprint.
- Trigger.dev: Smaller state snapshots, faster resume, requires manual checkpoint placement.
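For agent workflows with non-deterministic LLM calls, explicit checkpoints sidestep the replay constraints. Re-running an LLM call will not produce the same output, so its result has to be recorded either way. Temporal does this by routing the call through an activity, while a checkpoint simply snapshots the result alongside the rest of the state.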
Retry Semantics and Idempotency
Trigger.dev provides configurable retry policies per task:
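    import { task } from "@trigger.dev/sdk/v3";

    export const apiTask = task({
      id: "api-call",
      retry: {
        maxAttempts: 5,        // total attempts, including the first
        factor: 2,             // multiplier applied to the delay after each failure
        minTimeoutInMs: 1000,  // shortest delay between attempts
        maxTimeoutInMs: 60000, // cap on the delay
      },
      run: async (payload) => {
        // Task code
      },
    });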
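To see what such a policy means in practice, the sketch below computes the delay schedule for the values shown above. The formula (minimum timeout times `factor` raised to the number of prior failures, capped at the maximum, no jitter) is an assumption about how the settings combine, and `RetryPolicy`, `delayAfterAttempt`, and `schedule` are hypothetical names; `minMs` and `maxMs` correspond to the minimum and maximum timeout settings.

```typescript
interface RetryPolicy {
  maxAttempts: number;
  factor: number;
  minMs: number; // minimum timeout between attempts
  maxMs: number; // maximum timeout between attempts
}

// Delay to wait after the given failed attempt (1-based), or null when no
// attempts remain. Assumes delay = minMs * factor^(attempt - 1), capped at maxMs.
function delayAfterAttempt(policy: RetryPolicy, attempt: number): number | null {
  if (attempt >= policy.maxAttempts) return null;
  const raw = policy.minMs * Math.pow(policy.factor, attempt - 1);
  return Math.min(policy.maxMs, raw);
}

// Full list of waits between attempts for a policy.
function schedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < policy.maxAttempts; attempt++) {
    delays.push(delayAfterAttempt(policy, attempt) as number);
  }
  return delays;
}

const delays = schedule({ maxAttempts: 5, factor: 2, minMs: 1000, maxMs: 60000 });
// delays is [1000, 2000, 4000, 8000]: four waits between five attempts
```

Under these assumptions the worst case adds 15 seconds of waiting across the five attempts, which helps when choosing a task timeout.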
The engine generates an idempotency key from the task ID, execution ID, and attempt number. If a retry executes twice (network partition, duplicate message), the second attempt sees the existing result and returns immediately.
Idempotency keys live in Postgres with a TTL. After the TTL expires, the key gets purged. This prevents unbounded table growth but means very old retries might execute twice.
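The deduplication and TTL behaviour can be illustrated with a small in-memory model. Everything here is a sketch under assumptions: the key separator, the `Entry` shape, and `runIdempotent` are hypothetical, and a `Map` replaces the Postgres table.

```typescript
// Key derived from the three identifiers named above; the separator is an assumption.
function idempotencyKey(taskId: string, executionId: string, attempt: number): string {
  return `${taskId}:${executionId}:${attempt}`;
}

type Entry = { result: string; expiresAt: number };

// Runs `fn` unless an unexpired result already exists for `key`.
function runIdempotent(
  entries: Map<string, Entry>,
  key: string,
  nowMs: number,
  ttlMs: number,
  fn: () => string,
): string {
  const hit = entries.get(key);
  if (hit !== undefined && hit.expiresAt > nowMs) return hit.result;
  const result = fn();
  entries.set(key, { result, expiresAt: nowMs + ttlMs });
  return result;
}

const entries = new Map<string, Entry>();
let calls = 0;
const work = () => {
  calls++;
  return `result-${calls}`;
};

const key = idempotencyKey("api-call", "exec-42", 1);
const first = runIdempotent(entries, key, 0, 60_000, work);
// A duplicate delivery inside the TTL is served from the store.
const duplicate = runIdempotent(entries, key, 10, 60_000, work);
// After the TTL the key no longer protects the work, so it runs again.
const afterTtl = runIdempotent(entries, key, 60_001, 60_000, work);
```

The last call shows the trade-off named above: bounding table growth with a TTL also bounds how long deduplication holds.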
Handling Non-Idempotent Operations
Some operations cannot be made idempotent (sending an email, charging a credit card). Trigger.dev provides a once() wrapper:
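    await once("send-email", async () => {
      await sendEmail(recipient, body);
    });
The once() call checks Postgres for a completion marker. If found, it skips execution. If not, it runs the code and writes the marker. Across ordinary retries the operation therefore runs once. A crash after the side effect but before the marker write can still cause a repeat, so pass a provider-side idempotency key as well when the downstream API supports one.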
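A marker-based wrapper of this kind can be sketched in a few lines. This is a hypothetical re-implementation for illustration, not the platform's code: a `Set` replaces the Postgres marker table, and the wrapper returns a boolean only so the behaviour is easy to observe.

```typescript
// In-memory stand-in for the completion markers kept in Postgres.
const markers = new Set<string>();

// Runs `fn` only if no completion marker exists for `key`.
// Returns true when `fn` ran and false when it was skipped.
async function once(key: string, fn: () => Promise<void>): Promise<boolean> {
  if (markers.has(key)) return false; // already completed on an earlier attempt
  await fn();
  // A crash between fn() and the next line would leave no marker behind,
  // so a later attempt would run fn() again.
  markers.add(key);
  return true;
}

const sent: string[] = [];

// One task attempt that tries to send the email.
async function attempt(): Promise<boolean> {
  return once("send-email", async () => {
    sent.push("welcome email");
  });
}
```

Calling `attempt()` twice sends one email: the first call runs the callback and records the marker, the second finds the marker and skips. The comment inside `once` marks the window where the marker and the side effect can disagree.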
Observability Without Separate Infrastructure
Trigger.dev embeds observability into the execution engine. Every task execution produces:
- Real-time logs streamed to the dashboard
- Trace spans for each checkpoint and external call
- Retry history with failure reasons
- State snapshots at each checkpoint
The dashboard shows a timeline view of task execution. You see when checkpoints occurred, which API calls failed, and how long each step took. This replaces the need for separate tracing infrastructure like Jaeger or Honeycomb for basic workflows.
For complex multi-task workflows, you can export traces to OpenTelemetry-compatible backends. The engine injects trace context into each task, so distributed traces work across task boundaries.
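As an illustration of trace context crossing a task boundary, the sketch below parses and re-emits a W3C Trace Context `traceparent` header, the format OpenTelemetry propagates by default. Whether the engine uses exactly this header is an assumption, and `parseTraceparent`, `formatTraceparent`, and `childContext` are hypothetical helpers rather than SDK functions.

```typescript
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string;  // 16 lowercase hex characters
  sampled: boolean;
}

// Parses a version-00 traceparent header; returns null if it is malformed.
function parseTraceparent(header: string): TraceContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (m === null) return null;
  const traceId = m[1];
  const spanId = m[2];
  const flags = m[3];
  // All-zero ids are invalid in the W3C format.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// A child task keeps the parent's trace id and gets its own span id.
function childContext(parentCtx: TraceContext, newSpanId: string): TraceContext {
  return { traceId: parentCtx.traceId, spanId: newSpanId, sampled: parentCtx.sampled };
}

const parentHeader = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const parentCtx = parseTraceparent(parentHeader);
const childHeader =
  parentCtx === null ? null : formatTraceparent(childContext(parentCtx, "b7ad6b7169203331"));
```

Because the trace id is preserved and only the span id changes, a tracing backend can stitch the parent task and the child task into one distributed trace.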
Deployment Shape
Trigger.dev runs as a managed service or self-hosted. The managed version handles container orchestration, Postgres scaling, and retry coordination. Self-hosted deployments require:
- Postgres instance for state storage
- Container runtime (Docker, Kubernetes)
- Message queue for task scheduling (Redis, RabbitMQ)
- Object storage for large payloads (S3, GCS)
The execution engine polls the message queue for scheduled tasks, spawns containers, injects environment variables and secrets, and streams logs back to the dashboard. Containers run with resource limits (CPU, memory, timeout) to prevent runaway tasks.
For agent workflows, you typically set high timeouts (30+ minutes) and generous memory limits (2GB+). LLM calls and tool execution can consume significant resources.
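The polling step can be pictured as one pass over a queue of scheduled tasks. This is a simplified sketch with hypothetical names (`ScheduledTask`, `pollOnce`): an array stands in for Redis or RabbitMQ, and `spawn` is a callback where a real engine would start a container with the given limits.

```typescript
interface ScheduledTask {
  id: string;
  runAt: number; // epoch milliseconds
  limits: { cpu: number; memoryMb: number; timeoutMs: number };
}

// One polling pass: every task that is due is removed from the queue and
// handed to `spawn`. Returns the number of tasks started.
function pollOnce(
  queue: ScheduledTask[],
  nowMs: number,
  spawn: (task: ScheduledTask) => void,
): number {
  const due = queue.filter((t) => t.runAt <= nowMs);
  const pending = queue.filter((t) => t.runAt > nowMs);
  // Mutate in place so callers holding the queue see the remaining tasks.
  queue.length = 0;
  queue.push(...pending);
  for (const task of due) {
    spawn(task);
  }
  return due.length;
}

// Limits in the range suggested above for agent workloads.
const agentLimits = { cpu: 1, memoryMb: 2048, timeoutMs: 30 * 60 * 1000 };

const queue: ScheduledTask[] = [
  { id: "a", runAt: 100, limits: agentLimits },
  { id: "b", runAt: 500, limits: agentLimits },
];
const spawned: string[] = [];
const startedCount = pollOnce(queue, 200, (t) => spawned.push(t.id));
```

Only task `a` is due at time 200, so it is spawned and task `b` stays queued for a later pass.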
Failure Modes and Mitigation
| Failure | Impact | Mitigation |
|---|---|---|
| Container crash mid-execution | Task resumes from last checkpoint | Place checkpoints before expensive operations |
| Postgres unavailable | New tasks queue, existing tasks pause | Use managed Postgres with automatic failover |
| Infinite retry loop | Task never completes, burns resources | Set maxAttempts and monitor retry count metrics |
| Checkpoint data too large | Postgres write fails, task aborts | Offload large data to object storage, store references in checkpoint |
| Idempotency key collision | Duplicate execution or skipped work | Use unique execution IDs, avoid manual key generation |
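The mitigation for oversized checkpoints, storing references instead of payloads, might look like the sketch below. It assumes string-valued state fields and a character-count threshold for simplicity; `BlobStore`, `toCheckpoint`, and `fromCheckpoint` are hypothetical names, and a `Map` stands in for S3 or GCS.

```typescript
// Hypothetical stand-in for object storage: key -> payload.
type BlobStore = Map<string, string>;

type CheckpointField = { inline: string } | { ref: string };

// Moves any field longer than `maxInlineChars` into the blob store and keeps
// only a reference to it in the checkpoint.
function toCheckpoint(
  state: Record<string, string>,
  blobs: BlobStore,
  runId: string,
  maxInlineChars: number,
): Record<string, CheckpointField> {
  const out: Record<string, CheckpointField> = {};
  for (const field of Object.keys(state)) {
    const value = state[field];
    if (value.length > maxInlineChars) {
      const blobKey = `${runId}/${field}`;
      blobs.set(blobKey, value);
      out[field] = { ref: blobKey };
    } else {
      out[field] = { inline: value };
    }
  }
  return out;
}

// Rebuilds the full state by resolving references against the blob store.
function fromCheckpoint(
  cp: Record<string, CheckpointField>,
  blobs: BlobStore,
): Record<string, string> {
  const state: Record<string, string> = {};
  for (const field of Object.keys(cp)) {
    const entry = cp[field];
    if ("ref" in entry) {
      const value = blobs.get(entry.ref);
      if (value === undefined) throw new Error(`missing blob ${entry.ref}`);
      state[field] = value;
    } else {
      state[field] = entry.inline;
    }
  }
  return state;
}

const blobs: BlobStore = new Map();
const agentState = { summary: "short", page: "x".repeat(5000) };
const cp = toCheckpoint(agentState, blobs, "run-1", 1024);
const restored = fromCheckpoint(cp, blobs);
```

The checkpoint row stays small regardless of how large a scraped page or tool result gets, at the cost of one extra read per offloaded field on resume.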
The most common failure mode is missing checkpoints. If you place checkpoints too far apart, a crash forces re-execution of expensive work. If you place them too frequently, you pay serialization overhead and Postgres write latency.
For agent workflows, checkpoint after each tool call. This balances durability with performance.
When TypeScript-Native Matters
Temporal ships a TypeScript SDK, but using it still means running a separate worker process alongside a Temporal Service. TypeScript developers must:
- Write workflow logic in TypeScript
- Compile to JavaScript
- Run a Node.js worker that communicates with the Temporal server via gRPC
- Handle serialization between TypeScript types and Temporal’s payload format
Trigger.dev eliminates the last three steps. You write TypeScript, deploy to the platform, and the engine handles execution. Type safety extends from your code to the execution environment.
This matters for teams without polyglot infrastructure. If your stack is TypeScript end-to-end (Next.js frontend, Node.js backend, TypeScript agents), adding a separately operated orchestration service (Temporal’s server is written in Go) introduces operational complexity.
The trade-off: Temporal’s event sourcing provides stronger durability guarantees. Trigger.dev’s checkpoint model is simpler but requires careful checkpoint placement.
Technical Verdict
Use Trigger.dev when:
- Your stack is TypeScript-native and you want to avoid polyglot orchestration
- You need durable execution for agent workflows with retry and state persistence
- You prefer explicit checkpoints over deterministic replay
- You want embedded observability without separate tracing infrastructure
- You are building on a managed platform and do not need full control over the execution runtime
Avoid Trigger.dev when:
- You require deterministic replay for audit or compliance
- Your workflows span multiple languages (Go, Python, Java)
- You need sub-second task latency (checkpoint overhead adds 10-50ms per checkpoint)
- You already run Temporal and have invested in its operational model
- You need fine-grained control over worker pools and task routing
Trigger.dev’s pivot from Zapier-style automation to Temporal-style durable execution exposes the infrastructure gap for TypeScript developers building agents. The checkpoint-based state model trades replay guarantees for simplicity, which fits non-deterministic LLM workflows better than event sourcing.