Trigger.dev positions itself between Zapier’s no-code GUI and Temporal’s heavyweight event sourcing model. It gives developers TypeScript-first workflow definitions with managed state, retry logic, and task queuing. Two Show HN posts (745 and 172 points) tracked the platform’s evolution from a Zapier alternative to a Temporal alternative, signaling a shift from event-driven integrations to durable workflow orchestration.
The core question: how does Trigger.dev persist workflow state across long-running tasks, route events between services, and handle retries without forcing developers into a database schema or event sourcing architecture?
Architecture Overview
Trigger.dev runs three components:
- Orchestrator: Centralized control plane that schedules tasks, manages queues, and stores execution state.
- Worker runtime: Executes task code in isolated environments (Docker containers or serverless functions).
- Client SDK: TypeScript library embedded in your application that defines tasks and triggers.
Workflows are defined as task() functions in your codebase. The SDK registers these with the orchestrator at build time. When an event fires (webhook, cron, or manual trigger), the orchestrator queues the task, assigns it to a worker, and tracks execution state in Postgres.
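The lifecycle described above (an event queues a run, a worker picks it up, the result is recorded) can be pictured as a small state machine. The following is a minimal sketch for illustration only: apart from FAILED, the status names, the `Run` shape, and the transition table are assumptions, not Trigger.dev's actual schema.

```typescript
// Hypothetical run statuses; Trigger.dev's real status names may differ.
type RunStatus = "QUEUED" | "EXECUTING" | "COMPLETED" | "FAILED";

interface Run {
  id: string;
  taskId: string;
  status: RunStatus;
}

// Allowed transitions in this sketch.
const transitions: Record<RunStatus, RunStatus[]> = {
  QUEUED: ["EXECUTING"],
  EXECUTING: ["COMPLETED", "FAILED"],
  COMPLETED: [],
  FAILED: ["QUEUED"], // a retry puts the run back on the queue
};

// Returns a copy of the run in the next status, or throws if the move is illegal.
function transition(run: Run, next: RunStatus): Run {
  if (transitions[run.status].indexOf(next) === -1) {
    throw new Error(`Illegal transition ${run.status} -> ${next}`);
  }
  return { ...run, status: next };
}

const queued: Run = { id: "run_1", taskId: "process-order", status: "QUEUED" };
const executing = transition(queued, "EXECUTING");
const completed = transition(executing, "COMPLETED");
```

Keeping the transition table explicit makes it easy to see that a completed run can never be re-queued, while a failed one can.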
State Persistence Model
Trigger.dev does not use event sourcing. Instead:
- Each task execution gets a unique run ID.
- The orchestrator writes execution state (status, output, error) to Postgres after each step.
- Long-running tasks checkpoint automatically. If a worker dies, the orchestrator resumes the run from the last checkpoint.
- Developers access state via ctx.runmetadata, not by querying a database.
This differs from Temporal, which rebuilds state by replaying an event log. Trigger.dev trades replay determinism for simpler mental models: state is a row in a table, not a projection of events.
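To make the contrast concrete, here is a minimal sketch of the two mental models. The `RunRow` and `RunEvent` types are hypothetical and far simpler than anything either system stores; the point is only that one model overwrites a row while the other derives state by folding over a log.

```typescript
// Row model: the current state is stored directly and overwritten.
interface RunRow {
  status: "QUEUED" | "EXECUTING" | "COMPLETED";
  output: string | null;
}

// Event-sourced model: state is derived by folding over an append-only log.
type RunEvent =
  | { type: "started" }
  | { type: "completed"; output: string };

function project(events: RunEvent[]): RunRow {
  return events.reduce<RunRow>(
    (state, event) =>
      event.type === "started"
        ? { ...state, status: "EXECUTING" }
        : { status: "COMPLETED", output: event.output },
    { status: "QUEUED", output: null }
  );
}

// Row model: one write replaces the previous state.
let row: RunRow = { status: "QUEUED", output: null };
row = { status: "COMPLETED", output: "ok" };

// Event model: the same state is reached only by replaying the history.
const projected = project([
  { type: "started" },
  { type: "completed", output: "ok" },
]);
```

Both paths end in the same state, but only the event log can reconstruct how the run got there.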
Event Routing and Task Queuing
Tasks are triggered by:
- Webhooks: Orchestrator exposes HTTP endpoints per task. Incoming requests queue a run.
- Scheduled triggers: Cron expressions stored in the orchestrator. A scheduler service polls and enqueues tasks.
- Manual invocation: SDK method tasks.trigger() sends a message to the orchestrator API.
The orchestrator maintains per-task queues with configurable concurrency limits. When a task is queued:
- Orchestrator checks concurrency settings (max parallel runs, rate limits).
- If capacity exists, it assigns the run to an available worker.
- Worker pulls task code from the registry, executes it, and streams logs back.
If no workers are available, the run waits in the queue. The orchestrator does not pre-allocate workers; it scales them on demand (Kubernetes pods or serverless functions).
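The queuing behavior above can be sketched with an in-memory model. `TaskQueue` and its method names are hypothetical, and the real orchestrator keeps this state in Postgres rather than in process memory; the sketch only shows how a concurrency limit decides whether a run starts or waits.

```typescript
// Hypothetical in-memory model of a per-task queue with a concurrency limit.
class TaskQueue {
  private waiting: string[] = [];
  private running = new Set<string>();
  private readonly concurrencyLimit: number;

  constructor(concurrencyLimit: number) {
    this.concurrencyLimit = concurrencyLimit;
  }

  // Enqueue a run; it starts immediately if capacity allows.
  enqueue(runId: string): void {
    this.waiting.push(runId);
    this.dispatch();
  }

  // Mark a run as finished and hand its slot to the next waiting run.
  complete(runId: string): void {
    this.running.delete(runId);
    this.dispatch();
  }

  // Move waiting runs into the running set while capacity remains.
  private dispatch(): void {
    while (this.running.size < this.concurrencyLimit && this.waiting.length > 0) {
      const next = this.waiting.shift() as string;
      this.running.add(next);
    }
  }

  runningIds(): string[] {
    return Array.from(this.running);
  }

  waitingIds(): string[] {
    return this.waiting.slice();
  }
}

const queue = new TaskQueue(2);
["run_1", "run_2", "run_3"].forEach((id) => queue.enqueue(id));
// run_1 and run_2 are running; run_3 waits for a free slot.
queue.complete("run_1");
// run_3 now takes the freed slot.
```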
Retry and Failure Recovery
Retry logic is declarative:
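    import { task } from "@trigger.dev/sdk/v3";
    // Assumption: `externalAPI` is your own payment client, not part of the SDK.
    import { externalAPI } from "./payments";

    export const processOrder = task({
      id: "process-order",
      retry: {
        maxAttempts: 3,        // total attempts, including the first
        factor: 2,             // backoff multiplier between attempts
        minTimeoutInMs: 1000,  // shortest delay before a retry
        maxTimeoutInMs: 10000, // longest delay before a retry
      },
      run: async (payload: { amount: number }) => {
        const result = await externalAPI.charge(payload.amount);
        return result;
      },
    });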
When a task fails:
- The orchestrator records the error and schedules a retry based on exponential backoff.
- If all retries are exhausted, the run enters a FAILED state.
- Developers can manually retry from the dashboard or API.
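As a worked example, the retry options shown earlier imply a concrete delay schedule. The sketch below assumes the common formula `delay = min(max, min * factor^(retry - 1))` with no jitter; Trigger.dev's exact formula and any randomization may differ, and the `RetryOptions` field names here are illustrative rather than the SDK's.

```typescript
// Illustrative option names; they mirror the example's settings, not the SDK's types.
interface RetryOptions {
  maxAttempts: number;
  factor: number;
  minDelayMs: number;
  maxDelayMs: number;
}

// Delay before retry number `retry` (1 = first retry), capped at maxDelayMs.
function backoffDelay(retry: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.minDelayMs * Math.pow(opts.factor, retry - 1));
}

// maxAttempts counts the initial attempt, so there are at most maxAttempts - 1 retries.
function retrySchedule(opts: RetryOptions): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < opts.maxAttempts; retry++) {
    delays.push(backoffDelay(retry, opts));
  }
  return delays;
}

const schedule = retrySchedule({
  maxAttempts: 3,
  factor: 2,
  minDelayMs: 1000,
  maxDelayMs: 10000,
});
// schedule is [1000, 2000]
```

Under these assumptions the cap only matters with more attempts: with `maxAttempts: 6` the delays would be 1000, 2000, 4000, 8000, and then 10000 instead of 16000.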
Partial failures (e.g., step 2 of 5 fails) do not replay earlier steps. The orchestrator resumes from the failed step using the last checkpoint. This avoids repeating side effects from completed steps, but it requires careful design: earlier steps must be safe to skip on retry, so anything a later step needs from them has to be part of the checkpointed state.
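Resuming from a checkpoint can be sketched as a step runner that records each step's output and skips steps that already have one. This is a simplified, synchronous model under stated assumptions: `Checkpoint`, `Step`, and `runSteps` are hypothetical names, and it is not how Trigger.dev implements checkpointing internally.

```typescript
// Hypothetical checkpoint: the outputs of every step that has already completed.
type Checkpoint = Record<string, unknown>;
type Step = { name: string; run: (prev: Checkpoint) => unknown };

// Runs steps in order, skipping any step whose output is already checkpointed.
// A throwing step leaves the checkpoint intact for the next attempt.
function runSteps(steps: Step[], checkpoint: Checkpoint): Checkpoint {
  for (const step of steps) {
    if (step.name in checkpoint) continue; // completed in an earlier attempt
    checkpoint[step.name] = step.run(checkpoint);
  }
  return checkpoint;
}

const calls: string[] = [];
let failOnce = true;
const steps: Step[] = [
  {
    name: "fetch",
    run: () => {
      calls.push("fetch");
      return { id: 42 };
    },
  },
  {
    name: "write",
    run: (prev) => {
      calls.push("write");
      if (failOnce) {
        failOnce = false;
        throw new Error("db unavailable");
      }
      return prev["fetch"]; // later steps read earlier outputs from the checkpoint
    },
  },
];

const checkpoint: Checkpoint = {};
try {
  runSteps(steps, checkpoint); // first attempt fails in "write"
} catch (err) {
  runSteps(steps, checkpoint); // retry resumes at "write"; "fetch" is skipped
}
// calls is ["fetch", "write", "write"]
```

The write step succeeds on retry only because the fetch output lives in the checkpoint; had it been held in a local variable, the skipped step would have left nothing to write.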
Observability and Deployment Boundaries
Observability
The orchestrator exposes:
- Run logs: Streamed from workers to the orchestrator, stored in Postgres, viewable in the dashboard.
- Trace spans: Each task step emits OpenTelemetry spans. The orchestrator aggregates them into a trace tree.
- Metrics: Task duration, queue depth, retry counts, and worker utilization.
Logs and traces are queryable via the dashboard or API. No external APM is required, but you can export spans to Datadog or Honeycomb.
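Aggregating spans into a trace tree amounts to linking each span to its parent. The sketch below assumes a flat list of spans with hypothetical `spanId` and `parentSpanId` fields; real OpenTelemetry spans carry many more attributes, and the orchestrator's own aggregation logic is not documented here.

```typescript
// Hypothetical flat span record; real OpenTelemetry spans carry more fields.
interface Span {
  spanId: string;
  parentSpanId: string | null;
  name: string;
}

interface SpanNode extends Span {
  children: SpanNode[];
}

// Builds a tree by indexing spans by ID, then attaching each span to its parent.
// Spans whose parent is missing are treated as roots.
function buildTraceTree(spans: Span[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  for (const span of spans) {
    nodes.set(span.spanId, { ...span, children: [] });
  }
  const roots: SpanNode[] = [];
  for (const span of spans) {
    const node = nodes.get(span.spanId) as SpanNode;
    const parent = span.parentSpanId ? nodes.get(span.parentSpanId) : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

const tree = buildTraceTree([
  { spanId: "a", parentSpanId: null, name: "sync-customer-data" },
  { spanId: "b", parentSpanId: "a", name: "fetch customer" },
  { spanId: "c", parentSpanId: "a", name: "db upsert" },
]);
// tree has one root ("sync-customer-data") with two children.
```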
Deployment Options
Trigger.dev offers two deployment modes:
| Mode | Orchestrator | Workers | State Storage | Use Case |
|---|---|---|---|---|
| Cloud | Managed by Trigger.dev | Serverless (Fly.io or AWS Lambda) | Managed Postgres | Fast setup, no ops burden |
| Self-hosted | Docker Compose or Kubernetes | Your infrastructure (Docker, K8s, or serverless) | Your Postgres instance | Data residency, custom networking |
Self-hosting requires:
- Running the orchestrator as a stateless service (scales horizontally).
- Configuring worker environments (Docker images or serverless runtimes).
- Managing Postgres for state and logs.
The orchestrator and workers communicate over HTTP/WebSocket. Workers poll the orchestrator for tasks; the orchestrator does not push tasks to workers. This simplifies firewall rules but adds latency (polling interval is configurable, default 1 second).
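The latency cost of polling is easy to quantify. The sketch below assumes a single worker that polls at exact multiples of the interval, which is a simplification; `pollingDelay` is a hypothetical helper, not part of any SDK.

```typescript
// Extra wait (ms) before a polling worker sees a run enqueued at `enqueuedAtMs`,
// assuming polls happen at exact multiples of `intervalMs`.
function pollingDelay(enqueuedAtMs: number, intervalMs: number): number {
  const nextPollAt = Math.ceil(enqueuedAtMs / intervalMs) * intervalMs;
  return nextPollAt - enqueuedAtMs;
}

const justAfterPoll = pollingDelay(1001, 1000);  // 999 ms: almost a full interval
const midInterval = pollingDelay(1500, 1000);    // 500 ms
const justBeforePoll = pollingDelay(1999, 1000); // 1 ms
```

With a 1-second interval the added delay ranges from 0 to just under 1000 ms, and averages about half the interval when runs arrive at uniformly random times.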
Comparison: Trigger.dev vs. Temporal vs. Zapier
| Dimension | Trigger.dev | Temporal | Zapier |
|---|---|---|---|
| State model | Postgres rows | Event sourcing | Opaque |
| Retry logic | Declarative, per-task | Workflow-level, deterministic replay | GUI-configured |
| Developer control | Full code access | Full code access | No code access |
| Deployment | Cloud or self-hosted | Self-hosted (complex) or Temporal Cloud | SaaS only |
| Observability | Built-in dashboard | Web UI for workflow history; metrics via external tooling | Limited logs |
Trigger.dev sits between Temporal’s deterministic guarantees and Zapier’s simplicity. You get code-first workflows without event sourcing complexity, but you lose Temporal’s replay-based recovery and Zapier’s zero-ops model.
Failure Modes and Mitigations
Orchestrator Downtime
If the orchestrator crashes:
- Queued tasks remain in Postgres; no data loss.
- Running tasks continue in workers but cannot report status.
- When the orchestrator restarts, it reconciles worker state and resumes scheduling.
Mitigation: Run multiple orchestrator replicas behind a load balancer. State is in Postgres, so replicas are stateless.
Worker Failures
If a worker crashes mid-task:
- The orchestrator detects the missing heartbeat (default 30 seconds).
- It marks the run as FAILED and schedules a retry.
- The new worker resumes from the last checkpoint.
Mitigation: Shorten the heartbeat interval for time-sensitive tasks so that crashes are detected sooner. Use idempotent operations in task steps.
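Heartbeat-based detection reduces to comparing each run's last heartbeat against a timeout. The following is a minimal sketch with hypothetical names (`WorkerRun`, `findStalledRuns`); the 30-second value comes from the default mentioned above, and everything else is an assumption for illustration.

```typescript
interface WorkerRun {
  runId: string;
  lastHeartbeatAt: number; // epoch milliseconds
}

// Returns the IDs of runs whose workers have been silent longer than the timeout.
function findStalledRuns(runs: WorkerRun[], now: number, timeoutMs: number): string[] {
  return runs
    .filter((run) => now - run.lastHeartbeatAt > timeoutMs)
    .map((run) => run.runId);
}

const now = 100000;
const stalled = findStalledRuns(
  [
    { runId: "run_1", lastHeartbeatAt: now - 5000 },  // healthy
    { runId: "run_2", lastHeartbeatAt: now - 45000 }, // silent for 45 seconds
  ],
  now,
  30000
);
// stalled is ["run_2"]
```

A shorter timeout detects crashes sooner, at the cost of more false positives when a healthy worker is briefly slow to report.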
Postgres Bottleneck
High task throughput can saturate Postgres:
- Writes: Every task step writes state and logs.
- Reads: Dashboard queries and worker polling hit the database.
Mitigation: Use read replicas for dashboard queries. Batch log writes. Archive old runs to cold storage.
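Batching log writes could look something like the sketch below. `LogBatcher` is a hypothetical helper, and the `write` callback stands in for a single multi-row insert; the sketch only shows how buffering reduces the number of writes.

```typescript
// Hypothetical log buffer that flushes in batches instead of one write per line.
class LogBatcher {
  private buffer: string[] = [];
  private readonly batchSize: number;
  private readonly write: (batch: string[]) => void;

  constructor(batchSize: number, write: (batch: string[]) => void) {
    this.batchSize = batchSize;
    this.write = write;
  }

  // Buffer a line and flush once the batch is full.
  append(line: string): void {
    this.buffer.push(line);
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  // Flush whatever is buffered, e.g. at the end of a run or on a timer.
  flush(): void {
    if (this.buffer.length === 0) return;
    this.write(this.buffer);
    this.buffer = [];
  }
}

const writes: string[][] = [];
const batcher = new LogBatcher(3, (batch) => {
  writes.push(batch);
});
for (let i = 1; i <= 7; i++) batcher.append(`line ${i}`);
batcher.flush();
// 7 lines reach the sink in 3 writes (3 + 3 + 1) instead of 7.
```

The trade-off is durability: lines still in the buffer are lost if the process dies before the next flush.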
Webhook Delivery Failures
If your application cannot reach the orchestrator:
- Webhooks are lost unless the sender retries.
- No built-in webhook queue or dead-letter handling.
Mitigation: Use a message broker (SQS, Pub/Sub) between your app and Trigger.dev. The broker handles retries and durability.
Code Example: Multi-Step Task with External API
    import { task, tasks } from "@trigger.dev/sdk/v3";
    // Assumption: `db` is your application's own data-access client, not part of the SDK.
    import { db } from "./db";

    export const syncCustomerData = task({
      id: "sync-customer-data",
      retry: { maxAttempts: 5 },
      run: async (payload: { customerId: string }) => {
        // Step 1: Fetch from external API
        const response = await fetch(
          `https://api.example.com/customers/${payload.customerId}`
        );
        // Throw on HTTP errors so the failure triggers a retry
        if (!response.ok) {
          throw new Error(`Customer fetch failed: ${response.status}`);
        }
        const customer = await response.json();

        // Step 2: Transform data (checkpoint after this)
        const normalized = {
          id: customer.id,
          email: customer.email_address,
          createdAt: new Date(customer.created_at),
        };

        // Step 3: Write to database
        await db.customers.upsert(normalized);

        // Step 4: Trigger downstream task
        await tasks.trigger("send-welcome-email", {
          email: normalized.email,
        });

        return { success: true, customerId: normalized.id };
      },
    });
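If step 3 fails, the orchestrator retries from step 3, and steps 1 and 2 do not re-execute. The retry therefore works with the customer record captured at the checkpoint rather than a fresh fetch, so step 1 (API fetch) must be safe to skip: either the API is idempotent or you cache the result.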
Technical Verdict
Use Trigger.dev when:
- You need code-first workflows with managed state and retries.
- You want observability without external APM setup.
- You can tolerate non-deterministic replay (state is checkpointed, not event-sourced).
- You prefer TypeScript and want type-safe task definitions.
Avoid Trigger.dev when:
- You need strict deterministic replay (use Temporal).
- Your workflows span months or years (Postgres state storage gets expensive).
- You require sub-second task latency (worker polling adds 1+ second delay).
- You need a GUI for non-developers (use Zapier or n8n).
Trigger.dev works best for developer-authored workflows that run minutes to hours, need retry logic, and benefit from centralized observability. It does not replace Temporal for mission-critical financial workflows or Zapier for marketing automation by non-engineers.