Trigger.dev V2: What a TypeScript-Native Temporal Alternative Reveals About Durable Execution Plumbing

Source: trigger.dev

Trigger.dev launched in February 2023 as a “developer-first Zapier alternative” and earned 745 points on Hacker News. Eight months later, the team shipped V2 with a completely different pitch: a TypeScript-native alternative to Temporal for durable execution. That pivot exposes the real infrastructure gap between event-driven automation and long-running task orchestration.

The shift reveals what developers building AI agents and background jobs actually need: not webhook triggers, but durable execution with automatic retries, state persistence, and observability built in. The V2 launch (172 points, 39 comments) signals sustained interest in a specific niche: TypeScript developers who find Temporal’s server-and-worker model too heavy but need more than a simple job queue.

What Changed Between V1 and V2

V1 focused on event-driven triggers. You connected external services (GitHub, Stripe, Slack) and wrote handlers that fired when events arrived. The architecture assumed short-lived functions responding to webhooks.

V2 targets durable execution. You write long-running tasks that survive process crashes, network failures, and deployment cycles. The platform handles state persistence, retry logic, and observability without requiring explicit checkpointing code.

Key architectural differences:

  • State management: V1 relied on ephemeral function context. V2 persists task state to a backing store, allowing tasks to resume after infrastructure failures.
  • Retry semantics: V1 retries were basic HTTP-level retries. V2 implements workflow-style retries with exponential backoff, jitter, and configurable policies per task.
  • Execution model: V1 ran handlers in response to external events. V2 runs tasks as durable workflows that can wait, branch, and coordinate across multiple steps.
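
The retry semantics in the list above can be made concrete with a small sketch. It computes the delay before each retry under exponential backoff with optional jitter. The `RetryPolicy` shape and the `retryDelayMs` function are hypothetical names chosen for illustration, not the Trigger.dev SDK's types, and the "full jitter" strategy is one common choice rather than a documented platform behavior.

```typescript
// Hypothetical policy shape for illustration; not the Trigger.dev SDK's types.
interface RetryPolicy {
  maxAttempts: number; // total attempts, including the first
  factor: number; // multiplier applied per retry
  minDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  jitter: boolean; // randomize each delay to spread retries out
}

// Delay to wait after failed attempt number `attempt` (1-based),
// or null when the policy allows no further attempts.
function retryDelayMs(
  attempt: number,
  policy: RetryPolicy,
  random: () => number = Math.random,
): number | null {
  if (attempt >= policy.maxAttempts) return null;
  const exponential = policy.minDelayMs * Math.pow(policy.factor, attempt - 1);
  const capped = Math.min(exponential, policy.maxDelayMs);
  // "Full jitter": pick uniformly between 0 and the capped delay.
  return policy.jitter ? Math.floor(random() * capped) : capped;
}

const policy: RetryPolicy = {
  maxAttempts: 3,
  factor: 2,
  minDelayMs: 1000,
  maxDelayMs: 10000,
  jitter: false,
};

console.log(retryDelayMs(1, policy)); // 1000
console.log(retryDelayMs(2, policy)); // 2000
console.log(retryDelayMs(3, policy)); // null (attempts exhausted)
```

Injecting `random` keeps the function deterministic under test while defaulting to `Math.random` in normal use.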

The pivot reflects feedback from teams building AI agents, media processing pipelines, and scheduled jobs. These workloads need durability guarantees, not just event routing.

How Durable Execution Works in Trigger.dev

Trigger.dev tasks run in isolated environments with automatic state persistence. When a task calls an external API or waits for a condition, the platform snapshots the execution state. If the process crashes, the task resumes from the last checkpoint.

Core plumbing components:

  1. Task definition: You export a task object with an id and a run function. The platform wraps this in orchestration logic.
  2. State persistence: Every async boundary (API call, wait, tool invocation) triggers a state snapshot. The platform stores execution history in a database.
  3. Retry engine: Failed steps retry according to task-level policies. The engine tracks attempt counts, backoff intervals, and failure reasons.
  4. Observability layer: Each task execution generates trace data. The dashboard shows step-by-step progress, timing, and error context.
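
To make the persistence idea in the list above tangible, here is a minimal sketch of checkpointed steps. It is a toy model, not Trigger.dev's implementation: the `CheckpointStore` map stands in for the platform's database, and `step`, `pipeline`, and `demo` are hypothetical names. Each step's result is saved under an id, and a resumed run returns the saved result instead of executing the step again.

```typescript
// Hypothetical in-memory stand-in for the platform's execution-history store.
type CheckpointStore = Map<string, unknown>;

// Runs `fn` once per step id; on a resumed run, returns the saved result instead.
async function step<T>(
  store: CheckpointStore,
  id: string,
  fn: () => Promise<T>,
): Promise<T> {
  if (store.has(id)) return store.get(id) as T; // completed before the failure
  const result = await fn();
  store.set(id, result); // persist before moving on
  return result;
}

// Demo workflow: the second step fails on the first run, then succeeds.
const calls: string[] = [];
let failTranscode = true;

async function pipeline(store: CheckpointStore): Promise<string> {
  const file = await step(store, "download", async () => {
    calls.push("download");
    return "video.mp4";
  });
  return step(store, "transcode", async () => {
    calls.push("transcode");
    if (failTranscode) throw new Error("transcoder crashed");
    return file.replace(".mp4", ".webm");
  });
}

async function demo(): Promise<string> {
  const store: CheckpointStore = new Map();
  try {
    await pipeline(store);
  } catch {
    failTranscode = false; // simulate the fault clearing before the retry
  }
  return pipeline(store); // resumes: "download" is not executed again
}
```

After `demo()` resolves, `calls` contains one `download` entry and two `transcode` entries, which is the behavior the state snapshot is meant to provide.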

Here’s a minimal durable task:

import { task } from "@trigger.dev/sdk/v3";
// Hypothetical helpers, assumed to live in a local module
import { downloadFile, transcodeVideo, uploadToCDN } from "./media";

export const processVideo = task({
  id: "process-video",
  retry: {
    maxAttempts: 3,
    factor: 2, // each retry waits twice as long as the previous one
    minTimeoutInMs: 1000,
    maxTimeoutInMs: 10000,
  },
  run: async (payload: { videoUrl: string }) => {
    // Step 1: Download video (durable checkpoint)
    const file = await downloadFile(payload.videoUrl);

    // Step 2: Transcode (durable checkpoint)
    const transcoded = await transcodeVideo(file);

    // Step 3: Upload to CDN (durable checkpoint)
    const cdnUrl = await uploadToCDN(transcoded);

    return { cdnUrl };
  },
});

Each await creates a checkpoint. If the transcode step fails, the task retries from that point without re-downloading the video.

TypeScript-Native vs. Polyglot Orchestration

Temporal supports multiple languages (including Go, Java, Python, and TypeScript) through a shared server and language-specific SDKs. This design prioritizes portability but introduces complexity: when self-hosting you run a Temporal server cluster, and in every setup you configure and scale workers that talk to the server over gRPC.

Trigger.dev runs entirely in TypeScript (now Bun for 5x throughput). The platform handles infrastructure, so you deploy tasks without managing servers or worker pools.

| Aspect | Temporal | Trigger.dev |
| --- | --- | --- |
| Language support | Polyglot (Go, Java, Python, TS) | TypeScript/Bun only |
| Infrastructure | Self-hosted cluster or Temporal Cloud | Fully managed, no cluster setup |
| State backend | Cassandra, PostgreSQL, MySQL | Managed PostgreSQL (abstracted) |
| Worker model | You provision and scale workers | Elastic workers, auto-scaled |
| Learning curve | Steep (workflows, activities, signals) | Shallow (tasks with async/await) |
| Failure recovery | Event sourcing with full replay | Checkpoint-based resumption |

Temporal’s event sourcing replays the entire workflow history on recovery. Trigger.dev snapshots state at each async boundary and resumes from the last checkpoint. This trades replay determinism for faster recovery and simpler mental models.
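
The difference between the two recovery styles can be sketched with a toy state machine. This is a deliberate simplification under stated assumptions: the `WorkflowEvent` and `State` types and both recovery functions are hypothetical, and neither Temporal nor Trigger.dev exposes an API shaped like this. Replay folds the whole event log from the initial state, while checkpoint recovery starts from a saved snapshot and applies only what came after it.

```typescript
// Toy workflow state: a running total built from recorded events.
type WorkflowEvent = { type: "added"; amount: number };
type State = { total: number; eventsApplied: number };

const initial: State = { total: 0, eventsApplied: 0 };

// Deterministic transition: the same events always yield the same state.
function apply(state: State, event: WorkflowEvent): State {
  return {
    total: state.total + event.amount,
    eventsApplied: state.eventsApplied + 1,
  };
}

// Replay-style recovery: fold the full log from the initial state.
function recoverByReplay(eventLog: WorkflowEvent[]): State {
  return eventLog.reduce(apply, initial);
}

// Checkpoint-style recovery: start from the last snapshot and apply
// only the events recorded after it was taken.
function recoverFromCheckpoint(snapshot: State, eventLog: WorkflowEvent[]): State {
  return eventLog.slice(snapshot.eventsApplied).reduce(apply, snapshot);
}

const eventLog: WorkflowEvent[] = [
  { type: "added", amount: 5 },
  { type: "added", amount: 7 },
  { type: "added", amount: 1 },
];
const snapshot = recoverByReplay(eventLog.slice(0, 2)); // taken after two events

console.log(recoverByReplay(eventLog)); // { total: 13, eventsApplied: 3 }
console.log(recoverFromCheckpoint(snapshot, eventLog)); // { total: 13, eventsApplied: 3 }
```

Both paths reach the same state here because `apply` is deterministic. Replay depends on that property for every step of the history, whereas the checkpoint path only re-applies the tail, which is where the faster recovery comes from.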

AI Agent Orchestration Example

AI agents need durable execution because tool calls, API requests, and model invocations can fail or time out. Trigger.dev’s task model fits naturally:

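import { task } from "@trigger.dev/sdk/v3";
import { generateText, type CoreMessage } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
// Hypothetical local module (not part of any SDK): AI SDK tool definitions
// without execute functions, plus a helper that runs a single tool call.
import { search, browse, analyze, executeTool } from "./tools";

export const researchAgent = task({
  id: "research-agent",
  run: async ({ topic }: { topic: string }) => {
    const messages: CoreMessage[] = [
      { role: "user", content: `Research: ${topic}` },
    ];

    // Cap the loop at 10 model turns so the agent cannot run forever
    for (let turn = 1; turn <= 10; turn++) {
      const { text, toolCalls, response } = await generateText({
        model: anthropic("claude-opus-4-20250514"),
        system: "You are a research assistant with web access.",
        messages,
        tools: { search, browse, analyze },
      });

      // No tool requests means the model has produced its final answer
      if (!toolCalls.length) {
        return { summary: text, stepsUsed: turn };
      }

      // Record the assistant turn (including its tool calls) before the results
      messages.push(...response.messages);

      for (const call of toolCalls) {
        const result = await executeTool(call);
        messages.push({
          role: "tool",
          content: [
            {
              type: "tool-result",
              toolCallId: call.toolCallId,
              toolName: call.toolName,
              result,
            },
          ],
        });
      }
    }

    throw new Error("Research agent hit the 10-turn limit without finishing");
  },
});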

Each generateText call and executeTool invocation creates a checkpoint. If the agent crashes mid-research, it resumes with the conversation history intact.

Observability and Debugging

Trigger.dev’s dashboard shows real-time task execution with step-by-step traces. You see:

  • Execution timeline: When each step started, how long it took, and whether it succeeded or retried.
  • State snapshots: The payload and return value at each checkpoint.
  • Error context: Stack traces, retry counts, and failure reasons.

This visibility matters for debugging AI agents. When a tool call fails or a model returns unexpected output, you can inspect the exact state at that step without adding logging code.
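
The kind of per-step record described above can be approximated in a few lines. The sketch below is an assumption about what such a trace might contain, not the dashboard's actual data model: `StepTrace` and `traced` are hypothetical names, and a real trace would carry more fields (payloads, attempt numbers, span ids).

```typescript
// Hypothetical trace record; real trace formats carry more fields.
interface StepTrace {
  step: string;
  outcome: "ok" | "error";
  durationMs: number;
  error?: string;
}

// Wraps a step so every call appends its timing and outcome to `trace`.
async function traced<T>(
  trace: StepTrace[],
  stepName: string,
  fn: () => Promise<T>,
  now: () => number = Date.now,
): Promise<T> {
  const startedAt = now();
  try {
    const result = await fn();
    trace.push({ step: stepName, outcome: "ok", durationMs: now() - startedAt });
    return result;
  } catch (err) {
    trace.push({
      step: stepName,
      outcome: "error",
      durationMs: now() - startedAt,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err; // rethrow so the caller's retry logic still sees the failure
  }
}
```

Wrapping a tool call as `await traced(trace, "search", () => runSearch(query))`, where `runSearch` is whatever function performs the call, leaves the step's control flow unchanged while capturing the timeline and error context.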

Concurrency and Queue Management

Trigger.dev supports concurrency limits and queue-based execution. You configure how many instances of a task can run simultaneously:

import { task } from "@trigger.dev/sdk/v3";

export const batchProcess = task({
  id: "batch-process",
  queue: {
    concurrencyLimit: 5,
  },
  run: async (payload: { items: string[] }) => {
    // At most 5 runs of this task execute at once; further runs wait in the queue
  },
});

The platform manages queue depth, backpressure, and fair scheduling. This prevents resource exhaustion when processing large batches or handling traffic spikes.
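
What a concurrency limit does can be shown with a small worker-pool sketch. This is an illustration of the general technique under simple assumptions, not how Trigger.dev schedules runs: `runWithLimit` and `makeJob` are hypothetical names, and the jobs only yield to the event loop instead of doing real work.

```typescript
// Runs jobs with at most `limit` in flight at once; results keep input order.
async function runWithLimit<T>(
  jobs: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each worker pulls the next queued job until the queue is empty.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }

  const workerCount = Math.min(limit, jobs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Demo jobs that track how many are running at the same time.
let inFlight = 0;
let peak = 0;
const makeJob = (value: number) => async (): Promise<number> => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await Promise.resolve(); // yield to the event loop in place of real work
  inFlight--;
  return value * 2;
};
```

Calling `runWithLimit([1, 2, 3, 4, 5, 6, 7, 8].map((n) => makeJob(n)), 3)` processes all eight jobs while `peak` never rises above 3, which is the backpressure effect a queue limit provides.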

Deployment and Infrastructure Shape

Trigger.dev runs as a managed service. You deploy tasks by pushing code to the platform:

  1. Define tasks in your codebase using the SDK.
  2. Deploy via CLI or CI/CD integration.
  3. Trigger tasks from your application using the client SDK or HTTP API.

The platform provisions workers, scales them based on load, and handles all state persistence. You don’t manage Kubernetes clusters, worker pools, or database schemas.

For self-hosting, Trigger.dev provides Docker images and Kubernetes manifests. You run the control plane and workers in your infrastructure, connecting to your own PostgreSQL instance.

Likely Failure Modes

State persistence lag: If the backing database experiences high latency, checkpoint writes slow down task execution. Monitor database performance and consider read replicas for observability queries.

Retry storms: Misconfigured retry policies can create cascading failures. If a task retries too aggressively, it can overwhelm downstream services. Use exponential backoff and circuit breakers.
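
A circuit breaker is the piece of that advice that is easiest to get wrong, so here is a minimal sketch. The `CircuitBreaker` class is a hypothetical illustration rather than anything the Trigger.dev SDK provides. It opens after a run of consecutive failures, rejects calls during a cooldown, then lets a trial call through.

```typescript
// Minimal circuit breaker: after `threshold` consecutive failures it blocks
// calls until `cooldownMs` has passed, then allows a trial call.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True when a call may proceed (closed, or cooldown elapsed).
  canCall(): boolean {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  // A success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // A failure at or past the threshold (re)opens the breaker.
  recordFailure(): void {
    this.failures++;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```

A task would check `canCall()` before hitting the downstream service and fail fast when it returns false, so retries back off at the caller instead of piling onto a service that is already struggling.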

Payload size limits: Large payloads (multi-megabyte files, huge JSON objects) can exceed storage limits. Stream large data through external storage (S3, R2) and pass references in task payloads.
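
The pass-a-reference pattern might look like the sketch below. Everything here is hypothetical: `BlobStore` stands in for S3 or R2, the in-memory implementation exists only so the example runs, and string length is used as a rough proxy for payload size.

```typescript
// Hypothetical blob store interface standing in for S3 or R2.
interface BlobStore {
  put(key: string, data: string): void;
  get(key: string): string | undefined;
}

// In-memory implementation so the sketch is runnable.
class InMemoryBlobStore implements BlobStore {
  private blobs = new Map<string, string>();
  put(key: string, data: string): void {
    this.blobs.set(key, data);
  }
  get(key: string): string | undefined {
    return this.blobs.get(key);
  }
}

type Payload = { kind: "inline"; data: string } | { kind: "ref"; key: string };

// Inline small payloads; offload anything longer than `maxInlineLength`
// and pass only the storage key.
function packPayload(
  id: string,
  data: string,
  store: BlobStore,
  maxInlineLength: number,
): Payload {
  if (data.length <= maxInlineLength) return { kind: "inline", data };
  const key = `payloads/${id}`;
  store.put(key, data);
  return { kind: "ref", key };
}

// Inside the task: resolve the payload back to its data.
function unpackPayload(payload: Payload, store: BlobStore): string {
  if (payload.kind === "inline") return payload.data;
  const data = store.get(payload.key);
  if (data === undefined) throw new Error(`missing blob: ${payload.key}`);
  return data;
}
```

The caller packs before triggering and the task unpacks at the start of its run, so the stored task state only ever holds a short key for large inputs.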

Cold start latency: Elastic workers introduce cold start delays when scaling from zero. For latency-sensitive tasks, consider keeping a minimum number of warm workers.

TypeScript-only constraint: If your stack includes Python ML models or Go services, you’ll need to wrap them in HTTP APIs or run separate orchestration. Trigger.dev won’t execute non-TypeScript code directly.

When to Use Trigger.dev

Good fit:

  • TypeScript-first teams building AI agents, background jobs, or scheduled tasks.
  • Projects that need durable execution without managing infrastructure.
  • Workflows with unpredictable runtime (minutes to hours) that must survive crashes.
  • Teams that want observability and retries without writing boilerplate.

Poor fit:

  • Polyglot environments where workflows span multiple languages.
  • Teams that need on-premises deployment with strict data residency (self-hosting adds operational burden).
  • Extremely high-throughput workloads (millions of tasks per minute) where managed service costs become prohibitive.
  • Simple cron jobs that don’t need durability (use GitHub Actions or a basic scheduler).

Technical Verdict

Trigger.dev V2 solves a real problem: TypeScript developers need durable execution without the operational overhead of Temporal. The platform trades Temporal’s polyglot flexibility for a simpler developer experience and managed infrastructure.

Use it when you’re building AI agents, media processing pipelines, or long-running background jobs in TypeScript and want automatic retries, state persistence, and observability without writing orchestration code. Avoid it if you need multi-language workflows, have strict on-premises requirements, or run simple tasks that don’t need durability guarantees.

The pivot from event triggers to durable execution reflects what developers actually build: not reactive webhooks, but stateful workflows that must survive failures. That’s the plumbing AI agents depend on.