Trigger.dev positions itself as a developer-first alternative to webhook-based automation platforms. The architectural difference matters for agentic systems: webhook chains fail when tasks exceed timeout windows, retry logic lives in external services, and state management happens outside your codebase. Trigger.dev moves orchestration into TypeScript functions with durable execution guarantees.
The platform earned 745 points and 190 comments on Hacker News. It solves the problem of long-running agent workflows that need to survive process restarts, handle retries with backoff, and maintain execution state without building a custom job queue.
The Webhook Problem
Webhook-based automation (Zapier, Make, n8n) relies on HTTP webhooks to chain services. Each step must complete within a timeout window (typically 30 seconds). If a step fails, the entire chain stops unless you build retry logic into each service.
Webhook constraints:
- Synchronous execution within timeout limits
- No built-in state persistence between steps
- Retry logic handled by the receiving service
- Difficult to debug multi-step failures
- No visibility into execution history
This works for simple triggers (new email → create task). It breaks for agent workflows that call LLMs, wait for human approval, or process large datasets.
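The failure mode is easy to reproduce. The sketch below models a webhook-style chain in which each step gets a fixed time budget (30 seconds by default, matching the typical limit above). A step that overruns or throws halts the chain with no retry. `runChain`, `withTimeout`, and the `Step` shape are illustrative names, not part of any platform's API.

```typescript
// A step receives the previous step's output and returns the next payload.
type Step = (input: unknown) => Promise<unknown>;

class StepTimeoutError extends Error {}

// Reject if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new StepTimeoutError(`step exceeded ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Webhook-style chain: sequential, timeout-bound, no retries, no persisted state.
async function runChain(
  steps: Step[],
  input: unknown,
  timeoutMs = 30_000,
): Promise<{ ok: boolean; completed: number; output?: unknown; error?: string }> {
  let payload = input;
  for (let i = 0; i < steps.length; i++) {
    try {
      payload = await withTimeout(steps[i](payload), timeoutMs);
    } catch (err) {
      // The whole chain stops here; earlier results are lost unless each service saved them.
      return { ok: false, completed: i, error: (err as Error).message };
    }
  }
  return { ok: true, completed: steps.length, output: payload };
}
```

A single slow LLM call or a pause for human approval is enough to trip the timeout, and nothing in the chain records how far it got.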
Code-Native Orchestration
Trigger.dev runs tasks as durable functions. You write ordinary async TypeScript. The platform runs it outside your request path with automatic retries and can pause and resume a run at wait points.
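```typescript
import { task } from "@trigger.dev/sdk/v3";
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// Placeholder helpers for this example, not part of any SDK.
declare function search(topic: string): Promise<string[]>;
declare const db: {
  insert(row: { topic: string; analysis: string; searchResults: string[] }): Promise<void>;
};

export const researchAgent = task({
  id: "research-agent",
  retry: {
    maxAttempts: 3,
    factor: 2,
    minTimeoutInMs: 1000,
  },
  run: async ({ topic }: { topic: string }) => {
    // Step 1: Search (runs again if a later step throws and the task retries)
    const searchResults = await search(topic);

    // Step 2: LLM analysis (a thrown error triggers a retry of the whole run)
    const { text: analysis } = await generateText({
      model: anthropic("claude-opus-4-20250514"),
      messages: [
        { role: "user", content: `Analyze: ${topic}\n\nSources:\n${searchResults.join("\n")}` },
      ],
    });

    // Step 3: Store results
    await db.insert({ topic, analysis, searchResults });

    return { summary: analysis, sources: searchResults.length };
  },
});
```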
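A retry re-runs `run` from the beginning. If the process crashes during the LLM analysis, the search executes again on the next attempt. Trigger.dev checkpoints a run when it waits, for example on `wait.for` or while a child task started with `triggerAndWait` is running. To avoid repeating an expensive step, move it into its own subtask and trigger that subtask with an idempotency key.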
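The example's retry settings are `maxAttempts: 3`, `factor: 2`, and a 1000 ms minimum timeout. Under a plain exponential schedule, the run waits 1 s before the second attempt and 2 s before the third. The helper below computes that schedule. It assumes delay = minimum timeout × factor^(n − 1) for the n-th retry, capped at an optional maximum, with no jitter. Trigger.dev can also randomize delays, so treat this as a deterministic baseline rather than the platform's exact timing. The `RetryOptions` shape and `backoffSchedule` are illustrative.

```typescript
interface RetryOptions {
  maxAttempts: number;     // total attempts, including the first run
  factor: number;          // multiplier applied per retry
  minTimeoutInMs: number;  // delay before the first retry
  maxTimeoutInMs?: number; // optional ceiling on any single delay
}

// Delays (ms) before each retry: one entry per attempt after the first.
function backoffSchedule(opts: RetryOptions): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < opts.maxAttempts; retry++) {
    const raw = opts.minTimeoutInMs * Math.pow(opts.factor, retry - 1);
    delays.push(Math.min(raw, opts.maxTimeoutInMs ?? Infinity));
  }
  return delays;
}

console.log(backoffSchedule({ maxAttempts: 3, factor: 2, minTimeoutInMs: 1000 })); // [ 1000, 2000 ]
```

A ceiling matters once `maxAttempts` grows. With a 3000 ms cap, five attempts wait 1000, 2000, 3000, and 3000 ms instead of doubling to 8000 ms.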
Execution Model
Trigger.dev uses a coordinator-worker architecture:
- Coordinator: Receives task triggers, manages queue, tracks execution state
- Workers: Pull tasks from queue, execute code, report checkpoints
- State store: Persists execution history and intermediate results
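When a task waits (a `wait.for` delay, a child run started with `triggerAndWait`, or a wait for an external event), the platform checkpoints the paused process and can release its machine. The run later resumes from that point, possibly on a different worker. If a worker dies in the middle of an attempt, that attempt fails and is retried from the start of `run`, subject to the task's retry settings.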
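A common way to get step-level durability on top of task-level retries is to record each completed step's result and skip that step on the next attempt. In Trigger.dev, subtasks triggered with idempotency keys play this role. The sketch below shows the general pattern, with an in-memory `Map` standing in for a persistent store. `StepStore`, `runStep`, `runWithRetries`, and the `runId:stepName` key format are hypothetical.

```typescript
// Stand-in for a durable store; a real engine would persist to a database.
class StepStore {
  private results = new Map<string, unknown>();
  get(key: string): { hit: boolean; value?: unknown } {
    return this.results.has(key) ? { hit: true, value: this.results.get(key) } : { hit: false };
  }
  set(key: string, value: unknown): void {
    this.results.set(key, value);
  }
}

// Run `fn` once per (runId, stepName); on replay, return the recorded result instead.
async function runStep<T>(store: StepStore, runId: string, stepName: string, fn: () => Promise<T>): Promise<T> {
  const key = `${runId}:${stepName}`;
  const cached = store.get(key);
  if (cached.hit) return cached.value as T;
  const value = await fn();
  store.set(key, value); // record before moving on, so a later failure does not repeat this step
  return value;
}

// Retry the whole function; completed steps are skipped on each new attempt.
async function runWithRetries<T>(attempt: () => Promise<T>, maxAttempts: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Wrap the search and the LLM call in `runStep` inside a retried function. A transient LLM failure then repeats only the LLM call, and the search result comes back from the store.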
Key differences from webhooks:
| Aspect | Webhook Chains | Trigger.dev Tasks |
|---|---|---|
| Execution | Synchronous, timeout-bound | Asynchronous, resumable |
| Retry logic | Per-service implementation | Built-in with backoff |
| State | Passed in request body | Run payloads and outputs persisted; checkpoints at wait points |
| Debugging | Scattered across services | Unified execution trace |
| Idempotency | Manual deduplication | Opt-in via idempotency keys |
Integration with Agent Workflows
Agentic systems need to coordinate multiple tools, wait for external events, and handle partial failures. Trigger.dev provides primitives that map to agent orchestration patterns.
Tool calling with retries:
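```typescript
import { task } from "@trigger.dev/sdk/v3";
import { generateText, type CoreMessage, type Tool } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// Placeholder tools (declared without `execute`, so calls come back to this loop)
// and a placeholder dispatcher; none of these are SDK exports.
declare const search: Tool, browse: Tool, analyze: Tool;
declare function executeTool(call: { toolName: string; args: unknown }): Promise<unknown>;

export const agentWithTools = task({
  id: "agent-tools",
  run: async ({ query }: { query: string }) => {
    const messages: CoreMessage[] = [{ role: "user", content: query }];

    for (let i = 0; i < 10; i++) {
      const { text, toolCalls, response } = await generateText({
        model: anthropic("claude-opus-4-20250514"),
        messages,
        tools: { search, browse, analyze },
      });
      // Keep the assistant turn, including its tool calls, in the history.
      messages.push(...response.messages);

      if (toolCalls.length === 0) {
        return { summary: text, iterations: i + 1 };
      }

      // Run each requested tool and feed its result back to the model.
      for (const call of toolCalls) {
        const result = await executeTool(call);
        messages.push({
          role: "tool",
          content: [
            { type: "tool-result", toolCallId: call.toolCallId, toolName: call.toolName, result },
          ],
        });
      }
    }
    throw new Error("Agent did not finish within 10 iterations");
  },
});
```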
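Each retry replays the whole loop. If `executeTool` throws on the third tool call and the task is retried, the LLM calls and the first two tools run again. To keep completed tool calls from repeating, make the tools idempotent, or run each one as a subtask triggered with an idempotency key derived from its tool call ID.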
Human-in-the-loop approval:
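```typescript
import { task } from "@trigger.dev/sdk/v3";

// Placeholder helpers, not SDK functions. waitForApproval is assumed to wrap a
// platform wait primitive so the run is checkpointed while it waits.
declare function generateDraft(request: string): Promise<{ id: string; body: string }>;
declare function waitForApproval(draftId: string): Promise<boolean>;
declare function publish(draft: { id: string; body: string }): Promise<void>;

export const approvalWorkflow = task({
  id: "approval-workflow",
  run: async ({ request }: { request: string }) => {
    const draft = await generateDraft(request);
    // Pause until a reviewer approves or rejects the draft
    const approved = await waitForApproval(draft.id);
    if (approved) {
      await publish(draft);
    }
    return { status: approved ? "published" : "rejected" };
  },
});
```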
If `waitForApproval` is built on one of the platform's wait primitives, the run is checkpointed while it pauses and holds no compute. When approval arrives (via webhook or API call), execution resumes.
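`waitForApproval` in the example is a placeholder. One way to picture what such a helper needs is a token that the workflow waits on, plus an API handler that completes the token later. The in-memory registry below only illustrates that handshake. A durable engine also persists the pending token, so the paused run can leave its machine and resume elsewhere. `ApprovalRegistry`, `waitFor`, `complete`, and `Decision` are hypothetical names.

```typescript
type Decision = { approved: boolean; reviewer?: string };

class ApprovalRegistry {
  private pending = new Map<string, (d: Decision) => void>();
  private early = new Map<string, Decision>(); // decisions that arrive before anyone waits

  // Called by the workflow: resolves once a decision for `token` arrives.
  waitFor(token: string): Promise<Decision> {
    const ready = this.early.get(token);
    if (ready) {
      this.early.delete(token);
      return Promise.resolve(ready);
    }
    return new Promise((resolve) => this.pending.set(token, resolve));
  }

  // Called by the webhook or API handler; returns false if no waiter existed yet.
  complete(token: string, decision: Decision): boolean {
    const resolve = this.pending.get(token);
    if (!resolve) {
      this.early.set(token, decision); // keep it for a later waitFor
      return false;
    }
    this.pending.delete(token);
    resolve(decision);
    return true;
  }
}
```

Buffering early decisions matters in practice: a fast reviewer can respond before the workflow reaches its wait.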
State Persistence and Idempotency
Trigger.dev assigns each task invocation a unique run ID. Deduplication is opt-in. If you pass an idempotency key when triggering, a later trigger with the same key returns the existing run instead of starting a new one:
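```typescript
import { tasks } from "@trigger.dev/sdk/v3";
import type { researchAgent } from "./trigger/research-agent"; // path is illustrative
// A repeated trigger with this key returns the original run handle
const handle = await tasks.trigger<typeof researchAgent>(
  "research-agent",
  { topic: "quantum computing" },
  { idempotencyKey: "research-quantum-2023-02-01" }
);
```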
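The deduplication rule is that the first trigger with a given key creates a run, and later triggers with the same key get that run back. Below is a minimal in-memory sketch of that rule. `triggerIdempotent`, the per-task key scoping, and the `run_` ID format are invented for illustration. Real keys may also expire after a TTL, which this sketch omits.

```typescript
import { randomUUID } from "node:crypto";

interface Run {
  id: string;
  taskId: string;
  payload: unknown;
}

const runsByKey = new Map<string, Run>();

// Return the existing run for a reused key; otherwise create and record a new one.
function triggerIdempotent(taskId: string, payload: unknown, idempotencyKey?: string): Run {
  if (!idempotencyKey) {
    return { id: `run_${randomUUID()}`, taskId, payload }; // no key: always a new run
  }
  const scoped = `${taskId}:${idempotencyKey}`; // keys are scoped per task in this sketch
  const existing = runsByKey.get(scoped);
  if (existing) return existing;
  const run: Run = { id: `run_${randomUUID()}`, taskId, payload };
  runsByKey.set(scoped, run);
  return run;
}
```

Deriving the key from business identity (topic plus date here) rather than from a random value is what makes client retries safe.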
Run state persists in PostgreSQL. For each run, the platform tracks:
- Execution position (for a paused run, the wait point where it stopped)
- Variable state at that point
- Retry count and backoff timer
- Parent task ID for nested workflows
The observability dashboard shows exactly where a task failed and what data it held at that moment.
Deployment Shape
Trigger.dev runs as a hosted service or self-hosted with Docker Compose. The self-hosted setup requires:
- Coordinator service (Node.js)
- Worker pool (scales horizontally)
- PostgreSQL for state
- Redis for queue management
- Optional: S3-compatible storage for large payloads
Workers pull tasks from Redis queues. Each worker can handle multiple tasks concurrently based on configured limits. The coordinator manages queue priorities and schedules retries.
Concurrency control:
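```typescript
import { task } from "@trigger.dev/sdk/v3";

export const rateLimitedTask = task({
  id: "rate-limited",
  queue: {
    name: "api-calls",
    concurrencyLimit: 5, // Max 5 concurrent runs on this queue
  },
  run: async ({ url }: { url: string }) => {
    const res = await fetch(url);
    // Run outputs are serialized, so return plain data rather than the Response object.
    return { status: res.status, body: await res.text() };
  },
});
```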
This prevents overwhelming external APIs when processing batches.
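A queue concurrency limit behaves like a counting semaphore. At most N runs hold a slot, and the rest wait in FIFO order. The sketch below reproduces that behavior in-process so the effect on a batch is visible. `Semaphore` and `mapWithLimit` are illustrative, and the platform enforces its limit across workers rather than inside one process.

```typescript
class Semaphore {
  private waiters: Array<() => void> = [];
  private active = 0;
  constructor(private readonly limit: number) {}

  async acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return;
    }
    // Wait until a releasing holder hands its slot over directly.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // slot passes to the next waiter; `active` stays the same
    else this.active--;
  }
}

// Process every item, never running more than `limit` handlers at once.
async function mapWithLimit<T, R>(items: T[], limit: number, fn: (item: T) => Promise<R>): Promise<R[]> {
  const sem = new Semaphore(limit);
  return Promise.all(
    items.map(async (item) => {
      await sem.acquire();
      try {
        return await fn(item);
      } finally {
        sem.release();
      }
    }),
  );
}
```

With a limit of 5 and a batch of 20 URLs, the external API sees at most five requests in flight, and results still come back in input order.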
Observability
The dashboard shows:
- Real-time task execution with step-by-step progress
- Execution traces with timing for each checkpoint
- Failed tasks with exact error location and state
- Retry history and backoff timers
- Queue depth and worker utilization
Each task run gets a trace ID that propagates through nested tasks and external API calls (if instrumented with OpenTelemetry).
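Trace propagation works by carrying context implicitly through async calls. The Node.js sketch below uses `AsyncLocalStorage` to attach a trace ID to a run and read it from nested calls without passing it as a parameter. OpenTelemetry's Node context manager builds on the same mechanism. `withTrace`, `currentTraceId`, and `traceHeaders` are hypothetical helpers and do not describe Trigger.dev's internals.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomBytes } from "node:crypto";

const traceContext = new AsyncLocalStorage<{ traceId: string }>();

// Start a trace for a top-level run, or reuse the caller's trace for nested work.
function withTrace<T>(fn: () => Promise<T>): Promise<T> {
  const parent = traceContext.getStore();
  const traceId = parent?.traceId ?? randomBytes(16).toString("hex"); // 32 hex chars
  return traceContext.run({ traceId }, fn);
}

function currentTraceId(): string | undefined {
  return traceContext.getStore()?.traceId;
}

// Outgoing HTTP calls forward the ID in a W3C `traceparent` header (new span ID each call).
function traceHeaders(): Record<string, string> {
  const id = currentTraceId();
  return id ? { traceparent: `00-${id}-${randomBytes(8).toString("hex")}-01` } : {};
}
```

Nested `withTrace` calls share the parent's ID, which is what lets a dashboard stitch a parent task, its subtasks, and instrumented API calls into one trace.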
Failure Modes
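Worker crashes: The in-flight attempt fails and is retried from the start of `run` on another worker. A run paused at a wait resumes from its checkpoint. Work done in subtasks triggered with idempotency keys is not repeated.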
Coordinator crashes: Workers continue executing. New coordinator reads state from PostgreSQL and resumes queue management.
Database unavailable: New tasks cannot start. Running tasks fail at next checkpoint. Requires manual intervention.
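Poison messages: A task that always fails burns through its full retry budget on every run. Throw a non-retryable error (`AbortTaskRunError`) for failures that retries cannot fix, and alert on runs that end in a failed state after their final attempt.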
State explosion: Large intermediate results (multi-MB arrays) slow checkpointing. Use external storage and pass references.
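The poison-message pattern can be sketched generically. Give each job a fixed number of attempts, fail fast on errors marked non-retryable, and park exhausted jobs in a dead-letter list for inspection instead of retrying forever. `processWithDeadLetter`, `NonRetryableError`, and the job shapes are hypothetical. In Trigger.dev, the closest built-ins are the `retry` settings and throwing `AbortTaskRunError` to skip the remaining retries.

```typescript
class NonRetryableError extends Error {}

interface Job<T> {
  id: string;
  payload: T;
}
interface DeadLetter<T> {
  job: Job<T>;
  attempts: number;
  error: string;
}

async function processWithDeadLetter<T>(
  jobs: Job<T>[],
  handler: (payload: T) => Promise<void>,
  maxAttempts = 3,
): Promise<{ succeeded: string[]; deadLetters: DeadLetter<T>[] }> {
  const succeeded: string[] = [];
  const deadLetters: DeadLetter<T>[] = [];
  for (const job of jobs) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        await handler(job.payload);
        succeeded.push(job.id);
        break;
      } catch (err) {
        const retryable = !(err instanceof NonRetryableError);
        if (!retryable || attempt === maxAttempts) {
          // Stop spending retry budget; park the job with its last error.
          deadLetters.push({ job, attempts: attempt, error: (err as Error).message });
          break;
        }
      }
    }
  }
  return { succeeded, deadLetters };
}
```

Malformed input ends up in the dead-letter list after one attempt, while a transient failure gets the full budget.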
Security Boundaries
Task code runs with the full privileges of its worker. There is no language-level sandbox, so a malicious or compromised task can read the worker's environment variables and file system and make arbitrary network calls.
Mitigation strategies:
- Run workers in isolated containers
- Use separate worker pools for untrusted tasks
- Validate task inputs before execution
- Audit task definitions in code review
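Validating inputs at the task boundary means rejecting malformed payloads before any tool or model is called. Below is a dependency-free sketch for the `{ url }` payload of the rate-limited example. It assumes an allowlist of hosts, and the `ALLOWED_HOSTS` values are placeholders. A schema validation library would normally replace the hand-written checks.

```typescript
const ALLOWED_HOSTS = new Set(["api.example.com", "data.example.org"]); // placeholder allowlist

type Validated<T> = { ok: true; value: T } | { ok: false; error: string };

// Accept only an object whose `url` is https and points at an allowlisted host.
function validateFetchPayload(input: unknown): Validated<{ url: string }> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "payload must be an object" };
  }
  const url = (input as { url?: unknown }).url;
  if (typeof url !== "string") return { ok: false, error: "url must be a string" };
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return { ok: false, error: "url is not parseable" };
  }
  if (parsed.protocol !== "https:") return { ok: false, error: "only https is allowed" };
  if (!ALLOWED_HOSTS.has(parsed.hostname)) {
    return { ok: false, error: `host ${parsed.hostname} is not allowlisted` };
  }
  return { ok: true, value: { url: parsed.toString() } };
}
```

An allowlist also blocks requests to internal addresses such as cloud metadata endpoints, which matters because the task has unrestricted network access.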
API keys for triggering tasks use HMAC signatures. The coordinator validates signatures before queuing tasks.
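The general mechanism behind signature validation fits in a few lines: the sender computes an HMAC of the exact request body with a shared secret, and the receiver recomputes it and compares in constant time. This is a generic sketch using Node's `crypto` module, not a description of Trigger.dev's key format or header names. `sign` and `verify` are illustrative helpers.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign the exact request body with a shared secret (hex-encoded HMAC-SHA256).
function sign(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Recompute the signature and compare in constant time before accepting the request.
function verify(body: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(sign(body, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Signing the raw body, before any JSON parsing, avoids mismatches caused by key reordering or whitespace changes.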
When to Use Trigger.dev
Good fit:
- Agent workflows with multiple LLM calls and tool invocations
- Background jobs that exceed serverless timeout limits
- Tasks requiring human approval or external event waits
- Workflows needing detailed execution traces
- Teams comfortable writing TypeScript orchestration logic
Poor fit:
- Simple webhook forwarding (use Zapier)
- Real-time synchronous APIs (use direct HTTP)
- Non-technical users building workflows (need visual builder)
- Stateless transformations (use serverless functions)
- Sub-second latency requirements (checkpointing adds overhead)
Technical Verdict
Trigger.dev solves the durable execution problem for agent workflows. If your agents need to survive process restarts, retry failed tool calls, or wait for external events, the code-native approach beats webhook chains.
The trade-off is operational complexity when you self-host. You run a coordinator, manage worker pools, and monitor queue depth. For teams already running background job systems (Sidekiq, Celery), this is familiar territory. For teams using only serverless functions, it is a new operational burden.
The open-source model means you can self-host and audit the execution engine. The hosted service removes operational overhead but adds vendor dependency.
Use it when your agent workflows outgrow webhook timeouts and you need execution guarantees. Avoid it if your automation fits in 30-second webhook chains or you lack infrastructure to run stateful services.
Source Links
- Trigger.dev Platform
- GitHub Repository
- Hacker News Discussion (745 points, 190 comments)