mech.app

Trigger.dev V2: What a Temporal Alternative for TypeScript Reveals About Durable Execution Plumbing

How Trigger.dev pivoted from event triggers to durable execution, exposing the infrastructure gap between webhooks and production-grade agent orchestrat...

Source: trigger.dev

Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and pulled 745 points on Hacker News. By October, the team shipped V2 as a “Temporal alternative for TypeScript” after user feedback revealed developers wanted durable execution primitives for long-running workflows instead of event-driven webhooks. The V2 launch drew 172 points, and the pivot exposes a real infrastructure gap between simple automation and production-grade agent orchestration.

What Changed Between V1 and V2

V1 focused on event triggers and integrations. You connected services, listened for webhooks, and ran short-lived functions. V2 rebuilt the core around durable execution: state persistence across retries, long-running tasks that survive process crashes, and workflow primitives that handle timeouts and compensation logic.

Teams building multi-step agent workflows on serverless functions hit the 15-minute Lambda timeout, lose state between retries, and end up writing exponential backoff wrappers around every async function call. Trigger.dev V2 moves that plumbing into the platform.

Key V2 primitives:

  • Task definitions with automatic retries and exponential backoff
  • State persistence across failures without manual checkpointing
  • Queues and concurrency controls to prevent rate-limit explosions
  • Observability hooks that trace execution across tool calls and LLM invocations
  • Scheduled tasks (cron) that don’t time out after 15 minutes

Durable Execution vs. Event Sourcing

Temporal uses event sourcing: every workflow decision gets logged as an event, and the system replays events to reconstruct state after a crash. Trigger.dev takes a simpler approach. It persists task state directly and resumes from the last checkpoint. Direct state persistence reduces operational complexity at the cost of replay determinism.
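Here is a toy in-memory model of the two recovery strategies that makes the contrast concrete. It is a sketch, not the storage layer of either system, and the event shapes and the `SnapshotStore` class are hypothetical. Replay rebuilds state from the full history, which also serves as an audit trail. The snapshot keeps only the latest state and recovers with a single read.

```typescript
// Hypothetical event and state shapes for a workflow with named steps.
type WorkflowEvent =
  | { kind: "StepCompleted"; step: string; output: string }
  | { kind: "StepFailed"; step: string; error: string };

interface WorkflowState {
  completed: Record<string, string>; // step name -> output
  failures: number;
}

// Event sourcing: persist an append-only log and rebuild state by replaying
// every event in order. The replay logic must be deterministic.
function replay(log: readonly WorkflowEvent[]): WorkflowState {
  const state: WorkflowState = { completed: {}, failures: 0 };
  for (const event of log) {
    if (event.kind === "StepCompleted") {
      state.completed[event.step] = event.output;
    } else {
      state.failures += 1;
    }
  }
  return state;
}

// Direct persistence: overwrite one snapshot after each step. Recovery is
// a single read, but the history of how the state was reached is gone.
class SnapshotStore {
  private snapshot: string | null = null;

  save(state: WorkflowState): void {
    this.snapshot = JSON.stringify(state);
  }

  load(): WorkflowState {
    return this.snapshot === null
      ? { completed: {}, failures: 0 }
      : (JSON.parse(this.snapshot) as WorkflowState);
  }
}

// Usage: a run where "analyze" failed once before succeeding.
const eventLog: WorkflowEvent[] = [
  { kind: "StepCompleted", step: "fetch", output: "rows" },
  { kind: "StepFailed", step: "analyze", error: "429 rate limited" },
  { kind: "StepCompleted", step: "analyze", output: "summary" },
];
const snapshots = new SnapshotStore();
snapshots.save(replay(eventLog));
```

Both paths recover the same state in this example. The difference is that the log still records the failed `analyze` attempt and its error, while the snapshot reduces that history to a counter.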

| Aspect | Temporal | Trigger.dev V2 |
|---|---|---|
| State model | Event sourcing with replay | Direct state persistence |
| Language support | Polyglot (Go, TypeScript, Python, Java) | TypeScript only |
| Deployment | Self-hosted cluster or Temporal Cloud | Managed platform or self-hosted |
| Determinism guarantees | Strict (replay must be deterministic) | Relaxed (idempotent retries assumed) |
| Workflow constraints | Strict (no side effects, deterministic code) | Relaxed (async TypeScript patterns) |
| Multi-agent orchestration | Requires child workflows and signals | Native task composition with queues |

The TypeScript-only constraint is deliberate. Trigger.dev optimizes for teams already in the Node.js ecosystem who want type-safe orchestration without learning Temporal’s workflow constraints. You lose polyglot flexibility but gain a simpler mental model.

Agent Orchestration Patterns

The V2 API shows how durable execution maps to agent workflows. Here’s a research agent that calls tools, retries on failure, and carries conversation state across LLM invocations:

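import { task } from "@trigger.dev/sdk/v3";
import { generateText, type CoreMessage } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
// Hypothetical tool definitions (built with the AI SDK's tool() helper)
import { search, browse, analyze } from "./tools";

export const researchAgent = task({
  id: "research-agent",
  retry: {
    maxAttempts: 3,
    factor: 2,
    minTimeoutInMs: 1000,
    maxTimeoutInMs: 10000
  },
  run: async ({ topic }: { topic: string }) => {
    const messages: CoreMessage[] = [
      { role: "user", content: `Research: ${topic}` }
    ];

    for (let i = 0; i < 10; i++) {
      // With maxSteps > 1 the AI SDK runs each tool's execute() itself and
      // feeds the results back to the model, up to 5 model calls per pass.
      // Tool execution happens inside the durable task.
      const { text, toolCalls, steps, response } = await generateText({
        model: anthropic("claude-3-5-sonnet-20241022"),
        system: "You are a research assistant with web access.",
        messages,
        tools: { search, browse, analyze },
        maxSteps: 5
      });

      // Append the assistant and tool messages so the next pass keeps context
      messages.push(...response.messages);

      // No tool calls in the final step means the model produced its answer
      if (toolCalls.length === 0) {
        return { summary: text, stepsUsed: steps.length };
      }
    }

    // Fail explicitly (triggering the retry policy) instead of returning undefined
    throw new Error("research-agent: no final answer after 10 passes");
  }
});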

What the platform handles:

  • If generateText throws a rate limit error, the task retries with exponential backoff
  • If the process crashes mid-loop, the task retries from the beginning: messages is a local variable, so the conversation is rebuilt from scratch unless you checkpoint it
  • If a tool call times out, the retry logic applies to the entire task, not just the tool
  • Observability traces show each LLM call, tool invocation, and retry attempt

This is the plumbing you’d otherwise build with Redis for state, Bull for queues, and exponential backoff wrappers around every async function call. Trigger.dev bundles it into a single task primitive.
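For comparison, here is a minimal hand-rolled version of that backoff plumbing. It uses the same numbers as the research agent’s retry config (3 attempts, factor 2, 1 s minimum, 10 s maximum). The delay formula, `min * factor^(failures - 1)` capped at the maximum, is an assumption, and the platform’s exact schedule and any jitter may differ. `retryWithBackoff` and `backoffDelay` are illustrative names.

```typescript
interface RetryOptions {
  maxAttempts: number;    // total attempts, including the first
  factor: number;         // growth rate of the delay after each failure
  minTimeoutInMs: number; // delay after the first failure
  maxTimeoutInMs: number; // cap on any single delay
}

// Assumed schedule: min * factor^(failures - 1), capped at max. No jitter.
function backoffDelay(failures: number, opts: RetryOptions): number {
  return Math.min(opts.minTimeoutInMs * Math.pow(opts.factor, failures - 1), opts.maxTimeoutInMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// The kind of wrapper teams end up putting around every flaky async call.
async function retryWithBackoff<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts) {
        await wait(backoffDelay(attempt, opts)); // no sleep after the final failure
      }
    }
  }
  throw lastError;
}
```

With these settings the wrapper waits 1,000 ms after the first failure and 2,000 ms after the second, then rethrows if the third attempt also fails. The `wait` parameter exists so tests can skip real sleeps. Note that state is still lost between attempts. That is the part a durable platform adds on top.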

State Persistence and Retry Boundaries

The critical design choice is where retry boundaries sit. In Temporal, each activity is its own retry boundary, and a recovering workflow replays its event history instead of re-running completed activities. In Trigger.dev, the entire task is the boundary. If any step fails, the whole task retries from the beginning unless you explicitly checkpoint.

Checkpointing example:

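import { task } from "@trigger.dev/sdk/v3";
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
// Hypothetical helpers: `checkpoint` stands for any persist-once wrapper and
// is not a documented SDK export; the other two are application code.
import { checkpoint, fetchLargeDataset, storeResults } from "./helpers";

export const multiStepAgent = task({
  id: "multi-step-agent",
  run: async ({ input }: { input: string }) => {
    // Step 1: Fetch data (expensive; persisted so retries skip it)
    const data: string = await checkpoint("fetch", () => fetchLargeDataset(input));

    // Step 2: Process with LLM (may fail; a failure retries the whole task)
    const analysis = await generateText({
      model: anthropic("claude-3-5-sonnet-20241022"),
      messages: [{ role: "user", content: data }]
    });

    // Step 3: Store results (must be idempotent, because a retry can repeat it)
    await storeResults(analysis.text);
  }
});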

The checkpoint wrapper (illustrative here, not a documented SDK call) persists the result of fetchLargeDataset so retries skip that step. Without it, every retry re-fetches the data. This is where Temporal’s event sourcing shines: every completed activity is recorded in the workflow’s event history, so each step is checkpointed automatically. Trigger.dev makes you opt in.
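The opt-in pattern can be sketched without the platform. The `checkpoint` function below is a hypothetical memoize-by-key helper. The `Map` stands in for durable storage (a database row in practice), and results must be JSON-serializable. The demo simulates a first attempt that fails at the LLM step, followed by a retry that skips the fetch.

```typescript
// Stand-in for durable storage; a real implementation would write to a database
// so saved results survive a process crash. Results must be JSON-serializable.
type StepStore = Map<string, string>;

// Hypothetical persist-once helper: run `fn` the first time `key` is seen,
// and return the saved result on every later attempt.
async function checkpoint<T>(store: StepStore, key: string, fn: () => Promise<T>): Promise<T> {
  const saved = store.get(key);
  if (saved !== undefined) {
    return JSON.parse(saved) as T; // finished on an earlier attempt: skip it
  }
  const result = await fn();
  store.set(key, JSON.stringify(result));
  return result;
}

// Demo: a task body whose LLM step fails on the first attempt only.
const stepStore: StepStore = new Map();
let fetchCalls = 0;
let llmCalls = 0;

async function runAttempt(): Promise<string> {
  const data = await checkpoint(stepStore, "fetch", async () => {
    fetchCalls += 1; // expensive call we want to run once
    return "large dataset";
  });
  llmCalls += 1;
  if (llmCalls === 1) {
    throw new Error("429 rate limited"); // simulated failure on attempt 1
  }
  return `analysis of ${data}`;
}
```

Across two attempts the fetch runs once and the LLM step runs twice. This is the opt-in trade-off the paragraph describes: only wrapped steps are skipped on retry.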

Concurrency and Queue Management

Agent workflows hit rate limits fast. If you trigger 100 research tasks and each calls the Anthropic API 10 times, that is 1,000 requests, enough to hit your rate limit within seconds. Trigger.dev’s queue system lets you control concurrency at the task level:

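import { task } from "@trigger.dev/sdk/v3";
// Hypothetical wrapper around the Anthropic client
import { callAnthropicAPI } from "./anthropic";

export const rateLimitedAgent = task({
  id: "rate-limited-agent",
  queue: {
    name: "anthropic-calls",
    concurrencyLimit: 5  // At most 5 runs from this queue execute at once
  },
  run: async ({ query }: { query: string }) => {
    // A new run waits in the queue while 5 others are already executing
    return await callAnthropicAPI(query);
  }
});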

You can also set per-user or per-tenant queues to isolate workloads. This is harder in Temporal, where you’d configure worker pools and task queues at the infrastructure level. Trigger.dev moves the control into the task definition.
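Trigger.dev handles this through its queue settings. To illustrate only the underlying idea, here is an in-process per-key concurrency cap. The `KeyedConcurrencyLimiter` class is hypothetical. It holds no durable state and does not reflect how the platform implements queues.

```typescript
// Caps how many jobs run at once per key (e.g. per tenant); extra jobs wait FIFO.
class KeyedConcurrencyLimiter {
  private active = new Map<string, number>();
  private waiting = new Map<string, Array<() => void>>();

  constructor(private readonly limit: number) {}

  async run<T>(key: string, job: () => Promise<T>): Promise<T> {
    await this.acquire(key);
    try {
      return await job();
    } finally {
      this.release(key);
    }
  }

  private acquire(key: string): Promise<void> {
    const count = this.active.get(key) ?? 0;
    if (count < this.limit) {
      this.active.set(key, count + 1);
      return Promise.resolve();
    }
    // Slot unavailable: park until a finishing job hands its slot over
    return new Promise<void>((resolve) => {
      const queue = this.waiting.get(key) ?? [];
      queue.push(resolve);
      this.waiting.set(key, queue);
    });
  }

  private release(key: string): void {
    const queue = this.waiting.get(key);
    const next = queue?.shift();
    if (queue && queue.length === 0) this.waiting.delete(key);
    if (next) {
      next(); // hand the slot directly to the next waiter; active count unchanged
      return;
    }
    const count = (this.active.get(key) ?? 1) - 1;
    if (count === 0) this.active.delete(key);
    else this.active.set(key, count);
  }
}
```

Using a tenant ID as the key gives each tenant its own limit, so a burst from one tenant cannot starve the others.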

Observability and Debugging

The platform includes built-in tracing for every task execution. You see:

  • Start time, end time, and duration for each task
  • Retry attempts with failure reasons
  • LLM token usage and latency per call
  • Tool invocations with input/output payloads
  • State snapshots at each checkpoint

This is critical for debugging agent workflows. When a research agent fails after 8 tool calls, you need to see which call timed out and what the LLM context looked like. Trigger.dev’s dashboard surfaces this without custom instrumentation.
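At its core, the data involved is one span per step, carrying timing, an attempt number, and a failure reason. The hypothetical `Tracer` below hand-rolls that idea to show what gets recorded. It is not Trigger.dev’s tracing API, and a production setup would more likely emit OpenTelemetry spans.

```typescript
interface Span {
  name: string;
  attempt: number;
  startMs: number;
  durationMs: number;
  status: "ok" | "error";
  error?: string;
}

// Minimal tracer: records one span per wrapped call, including failures.
class Tracer {
  readonly spans: Span[] = [];

  // The clock is injectable so tests can use deterministic timestamps.
  constructor(private readonly now: () => number = Date.now) {}

  async trace<T>(name: string, attempt: number, fn: () => Promise<T>): Promise<T> {
    const startMs = this.now();
    try {
      const result = await fn();
      this.spans.push({ name, attempt, startMs, durationMs: this.now() - startMs, status: "ok" });
      return result;
    } catch (err) {
      this.spans.push({
        name,
        attempt,
        startMs,
        durationMs: this.now() - startMs,
        status: "error",
        error: err instanceof Error ? err.message : String(err),
      });
      throw err; // record the failure, then let retry logic see it
    }
  }
}
```

When a run fails after eight tool calls, scanning `spans` for the first `status: "error"` entry answers “which call timed out” without re-running the agent.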

Tracing integrations:

  • OpenTelemetry export for Datadog, Honeycomb, or Grafana
  • Webhook notifications on task failure
  • Real-time logs streamed to your terminal during local dev

Deployment Models

Trigger.dev offers two deployment paths:

  1. Managed platform: You push tasks to Trigger.dev’s infrastructure. They handle scaling, retries, and persistence. You pay per task execution.
  2. Self-hosted: You run the Trigger.dev runtime in your own Kubernetes cluster or Docker Compose setup. You manage scaling and persistence.

The self-hosted option uses PostgreSQL for state persistence and Redis for queue management. You need to handle backups, failover, and capacity planning. The managed platform abstracts this but locks you into their pricing model.

Self-hosting requirements:

  • PostgreSQL 14+ for task state and execution history
  • Redis 6+ for queue coordination
  • Node.js 18+ runtime for task execution
  • S3-compatible storage for large payloads (optional)

When TypeScript-Only Hurts

The TypeScript constraint becomes a problem when your agent workflow needs to call Python ML models or Go services. Trigger.dev tasks can invoke external APIs, but you lose type safety and observability at the boundary.

Workarounds:

  • Wrap Python scripts in HTTP endpoints and call them from TypeScript tasks
  • Use message queues (RabbitMQ, SQS) to hand off work to Python workers
  • Run Python in a subprocess and parse stdout (loses observability and retry semantics)
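The first workaround leaves a type-safety gap, and a runtime check on the response can narrow it. The endpoint URL, the `{ label, score }` response shape, and the function names below are hypothetical. The sketch relies on the global `fetch` in Node 18+ and throws on a bad response, so the surrounding task’s retry policy still applies.

```typescript
// Hypothetical response shape from a Python inference service.
interface Prediction {
  label: string;
  score: number;
}

// Runtime guard: TypeScript types vanish at the HTTP boundary, so check the shape.
function parsePrediction(body: unknown): Prediction {
  if (typeof body !== "object" || body === null) {
    throw new Error("Prediction: expected a JSON object");
  }
  const { label, score } = body as Record<string, unknown>;
  if (typeof label !== "string") {
    throw new Error("Prediction: label must be a string");
  }
  if (typeof score !== "number" || score < 0 || score > 1) {
    throw new Error("Prediction: score must be a number in [0, 1]");
  }
  return { label, score };
}

// Hypothetical endpoint; uses the global fetch available in Node 18+.
async function classify(text: string, endpoint = "http://localhost:8000/predict"): Promise<Prediction> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) {
    throw new Error(`Inference service returned ${res.status}`); // non-2xx: let the task retry
  }
  return parsePrediction(await res.json());
}
```

Schema libraries can replace the hand-written guard. The point is that the contract with the Python side is checked at run time, because the compiler cannot check it.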

Temporal’s polyglot support means you can write the orchestration logic in TypeScript and the ML inference step in Python, all within the same workflow. Trigger.dev forces you to split these into separate services.

Failure Modes and Edge Cases

What breaks:

  • Non-idempotent operations: If your task writes to a database without checking for duplicates, retries will create duplicate records. Trigger.dev doesn’t enforce idempotency.
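  • Long-running LLM calls: If a single LLM call takes 10 minutes and the task timeout is 5 minutes, every attempt times out and the task exhausts its retries without ever succeeding. Set the task timeout above the slowest expected call, and add per-step timeouts so a hung call fails fast instead of consuming the whole attempt.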
  • State size limits: Task state is stored in PostgreSQL. If your agent accumulates 100 MB of conversation history, persistence will slow down. You need to prune state or move large payloads to S3.
  • Time zones in scheduled tasks: Cron schedules are evaluated against the platform’s clock and time zone, not your local time. If you schedule a task for “every day at 9 AM” and the platform runs in UTC, it fires at 9 AM UTC, which may be a very different hour for your team.
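Two of these failure modes have small, standard mitigations, sketched below under stated assumptions. The first is a per-step timeout built on `Promise.race`. The second is an idempotency key that turns a retried write into a no-op. The `Map` stands in for a database table with a unique constraint on the key, and all names are hypothetical.

```typescript
// Rejects if `promise` does not settle within `ms` (a per-step timeout).
// Note: this stops waiting but does not cancel the underlying work.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Idempotent write: a retry that reuses the same key becomes a no-op.
// The Map stands in for a table with a UNIQUE constraint on the key.
class IdempotentStore<V> {
  private rows = new Map<string, V>();

  insert(idempotencyKey: string, value: V): boolean {
    if (this.rows.has(idempotencyKey)) {
      return false; // duplicate from a retried attempt: skip
    }
    this.rows.set(idempotencyKey, value);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}
```

To cancel the underlying call as well, pass an `AbortSignal` to clients that accept one. A good idempotency key is something that stays stable across retries of the same run, such as a run identifier combined with a step name.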

Technical Verdict

Use Trigger.dev V2 when:

  • You’re building multi-step LLM agents in TypeScript with 5+ external API calls per invocation and need sub-minute retry windows. Start with the managed platform to validate the model, then migrate to self-hosted only if you hit cost or compliance constraints.
  • You want durable execution without learning Temporal’s deterministic workflow constraints or event sourcing model.
  • You need built-in observability for agent tool chains and don’t want to instrument every LLM call and tool invocation manually.
  • You’re running on managed infrastructure or can operate PostgreSQL and Redis with backup and failover procedures.

Avoid it when:

  • You need strict determinism guarantees for financial transaction orchestration, compliance workflows, or audit trails that require event replay.
  • Your agent workflows span multiple languages (Python ML models, Go services, Rust data processing) and you need type-safe cross-language orchestration.
  • You’re already invested in Temporal infrastructure and don’t want to migrate state persistence and worker pools. If you’re already running Temporal in production, the migration cost outweighs the developer experience gain.
  • You need sub-second task latency (Trigger.dev adds 100-500ms overhead for state persistence and queue coordination).
  • You’re building workflows with 50+ concurrent steps that require fine-grained retry boundaries per step.

The V2 pivot shows that developers want orchestration primitives beyond event triggers. Trigger.dev fills the gap between serverless functions and Temporal’s complexity. It’s not a replacement for Temporal in high-stakes workflows, but it’s a faster path to durable agent execution for TypeScript teams who prioritize developer experience over polyglot flexibility.