The agent tooling landscape has consolidated enough in 2026 that four distinct architectural patterns have emerged. The question is no longer whether agents work, but which deployment shape fits your state persistence requirements, tool boundary enforcement, and failure recovery needs.
This is not a ranking. Each pattern solves different problems. The goal is to give you a decision framework before you commit to an approach.
Path 1: Build It Yourself (Custom Framework)
You own the full stack: model calls, tool wiring, memory system, orchestration loop, deployment, monitoring. Frameworks like LangGraph and the OpenAI Agents SDK give you building blocks, but the architecture is yours.
State Persistence
You control where conversation context, tool call history, and intermediate results live. This usually means:
- A database (Postgres, Redis, or DynamoDB) for session state
- Blob storage (S3, GCS) for large artifacts like generated files or scraped data
- A checkpoint system that lets you resume from any point in the agent loop
When a container restarts, you decide what survives. If you store state in memory only, you lose everything. If you checkpoint after every tool call, you can replay from the last successful step.
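```python
# LangGraph checkpoint example.
# AgentState, agent_node, tool_node, and should_continue are assumed to be
# defined elsewhere in your application.
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph import END, START, StateGraph

graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)
graph.add_edge(START, "agent")
# Go to the tools node only when the model asked for a tool call; otherwise stop.
graph.add_conditional_edges("agent", should_continue, {"tools": "tools", "end": END})
graph.add_edge("tools", "agent")

# from_conn_string is a context manager that owns the database connection
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first use
    app = graph.compile(checkpointer=checkpointer)

    # Resume from the last checkpoint stored for this thread
    result = app.invoke(
        {"messages": [HumanMessage(content="continue")]},
        config={"configurable": {"thread_id": "session-123"}},
    )
```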
Tool Boundary Enforcement
You implement your own guardrails. Common patterns:
- Allowlist of callable functions per agent role
- Rate limiting per tool (e.g., max 10 API calls per minute)
- Input validation before the tool executes
- Sandboxed execution environments for untrusted code
If an agent tries to call an unauthorized API, your code catches it before the request leaves your infrastructure.
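As a rough illustration of the first three patterns, the sketch below wraps tool calls in a single guard that checks a per-role allowlist, validates arguments, and applies a sliding-window rate limit. `ToolGuard`, `ToolCallRejected`, and the role and tool names are hypothetical, not part of any framework mentioned here, and sandboxing is out of scope.

```python
import time
from collections import defaultdict, deque
from typing import Any, Callable, Dict, Set


class ToolCallRejected(Exception):
    """Raised when a tool call violates a guardrail (hypothetical name)."""


class ToolGuard:
    """Checks allowlist, input validity, and rate limit before running a tool."""

    def __init__(
        self,
        allowlist: Dict[str, Set[str]],   # role -> tool names it may call
        max_calls: int,                   # calls allowed per tool per window
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._allowlist = allowlist
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._recent = defaultdict(deque)  # tool name -> call timestamps
        self._tools: Dict[str, tuple] = {}  # tool name -> (function, validator)

    def register(
        self,
        name: str,
        fn: Callable[..., Any],
        validator: Callable[[Dict[str, Any]], bool],
    ) -> None:
        self._tools[name] = (fn, validator)

    def call(self, role: str, name: str, **kwargs: Any) -> Any:
        if name not in self._allowlist.get(role, set()):
            raise ToolCallRejected(f"role {role!r} may not call {name!r}")
        if name not in self._tools:
            raise ToolCallRejected(f"unknown tool {name!r}")
        fn, validator = self._tools[name]
        if not validator(kwargs):
            raise ToolCallRejected(f"invalid arguments for {name!r}")

        # Sliding window: drop timestamps older than the window, then count.
        now = self._clock()
        recent = self._recent[name]
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max_calls:
            raise ToolCallRejected(f"rate limit exceeded for {name!r}")
        recent.append(now)
        return fn(**kwargs)
```

The agent loop would route every tool invocation through `guard.call(role, name, **args)`, so a rejected call never reaches the network. The rate-limit state here is per process; a multi-instance deployment would need a shared store for it.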
Failure Recovery
You write the retry logic, error handling, and fallback behavior. Options:
- Retry with exponential backoff for transient failures
- Replay from the last checkpoint if a tool call fails
- Switch to a simpler model or tool if the primary one times out
- Abort and log the failure for manual review
The gap between a working demo and a production agent is large. A demo proves the model can do the task. A production agent handles failure gracefully, recovers from interrupted sessions, and behaves predictably across thousands of runs.
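The retry and fallback options listed above can be combined in a few lines. This is a minimal sketch, assuming transient failures surface as a dedicated exception type; `TransientError`, `call_with_backoff`, and `run_with_fallback` are illustrative names, and the delay values are placeholders rather than recommendations.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 429s, 5xx)."""


def call_with_backoff(
    fn: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn on TransientError with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller decide what to do
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from many concurrent agents.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


def run_with_fallback(candidates: Sequence[Callable[[], T]], **retry_kwargs) -> T:
    """Try each candidate in order (primary first, simpler fallbacks after)."""
    last_error: Optional[Exception] = None
    for candidate in candidates:
        try:
            return call_with_backoff(candidate, **retry_kwargs)
        except TransientError as exc:
            last_error = exc
    # Every option failed: abort so the run can be logged for manual review.
    raise RuntimeError("all candidates failed") from last_error
```

Only errors classified as transient are retried; anything else propagates immediately, which keeps a bug in a tool from being retried into a longer outage.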
When to Choose This Path
- Your requirements are specific enough that no existing agent maps to them cleanly
- The agent needs deep integration with internal systems that cannot be exposed to external infrastructure
- The way you build the agent is itself the competitive advantage (proprietary orchestration logic, domain-specific memory structures, custom tool design)
- You need to understand every layer because you are responsible for debugging it in production
What It Demands
Time and architectural judgment. You will spend weeks on infrastructure that a managed platform would handle for you. The trade-off is control and flexibility.
Path 2: Buy a Managed Platform
You use a service like Relevance AI, Voiceflow, or Stack AI. The platform handles orchestration, state management, tool integration, and deployment. You configure the agent through a UI or API, but you do not own the runtime.
State Persistence
The platform manages it. Conversation context and tool call history live in the platform’s database. You do not control the schema or the storage layer.
If the platform goes down, your agent stops working. If the platform changes its state model, you adapt to their migration path.
Tool Boundary Enforcement
The platform provides built-in guardrails:
- Pre-approved tool integrations (Slack, Google Sheets, Salesforce)
- Rate limiting enforced at the platform level
- Input validation and output sanitization
You cannot call arbitrary APIs unless the platform supports custom webhooks or function calling. If you need a tool the platform does not support, you either request it or move to a different path.
Failure Recovery
The platform handles retries and error handling. You configure retry policies (e.g., retry 3 times with 5-second delays), but you do not write the retry logic.
If a tool call fails, the platform decides whether to retry, skip, or abort. You get logs and alerts, but you do not control the recovery flow.
When to Choose This Path
- You need an agent running quickly without building infrastructure
- Your tool requirements fit within the platform’s integrations
- You are comfortable with vendor lock-in and platform limitations
- You do not need deep customization of the orchestration loop
What It Demands
Acceptance of constraints. You trade control for speed. If the platform’s tool set or orchestration model does not fit your needs, you will hit a wall.
Path 3: Compose Existing Services
You wire together specialized services: a model provider (OpenAI, Anthropic), a tool orchestration layer (Zapier, Make), a memory service (Mem0, Zep), and a monitoring tool (Langfuse, Helicone). Each service does one thing well. You glue them together.
State Persistence
You choose a memory service that handles conversation context and tool call history. Mem0 and Zep are popular options. They store state in their own databases and expose APIs for reading and writing.
If a service restarts, the memory service retains the state. Your orchestration layer queries the memory service to resume the conversation.
Tool Boundary Enforcement
You configure boundaries in the orchestration layer. Zapier and Make let you define which tools an agent can call and under what conditions.
Rate limiting and input validation happen at the orchestration layer or the tool service itself. You do not write the enforcement logic, but you configure the rules.
Failure Recovery
Each service handles its own retries. The orchestration layer retries failed tool calls, the memory service retries failed writes, and the model provider's client SDK retries failed API calls.
You configure retry policies in each service, but you do not write the retry logic. If a failure crosses service boundaries (e.g., the orchestration layer fails to write to the memory service), you need to handle it in your glue code.
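One way to handle the cross-service case in glue code is a small outbox: if a write to the memory service fails, the record is queued and replayed, in order, on the next write. The sketch below is an assumption-laden illustration; `MemoryClient`, `MemoryServiceError`, and `OutboxWriter` are hypothetical and do not correspond to the Mem0 or Zep client APIs.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class MemoryServiceError(Exception):
    """Hypothetical error raised when a memory-service request fails."""


class MemoryClient(Protocol):
    """Hypothetical minimal interface over whichever memory service you use."""

    def append(self, session_id: str, record: dict) -> None: ...


@dataclass
class OutboxWriter:
    client: MemoryClient
    pending: List[Tuple[str, dict]] = field(default_factory=list)

    def write(self, session_id: str, record: dict) -> bool:
        """Queue the record, then try to deliver everything queued so far."""
        self.pending.append((session_id, record))
        return self.flush()

    def flush(self) -> bool:
        """Deliver queued records oldest-first; stop at the first failure."""
        while self.pending:
            session_id, record = self.pending[0]
            try:
                self.client.append(session_id, record)
            except MemoryServiceError:
                return False  # record stays queued for the next flush
            self.pending.pop(0)
        return True
```

The queue here lives in process memory, so it only covers short outages of the memory service. Surviving a restart of the glue code itself would require persisting the outbox somewhere durable.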
When to Choose This Path
- You want to avoid building infrastructure but need more flexibility than a managed platform
- Your tool requirements span multiple services
- You are comfortable managing API integrations and service dependencies
- You need observability and monitoring without building it yourself
What It Demands
Integration discipline. You will spend time wiring services together, handling authentication, and debugging cross-service failures. The trade-off is flexibility without the burden of building everything.
Path 4: Rent Ephemeral Compute
You use a serverless or ephemeral compute platform (Modal, Fly.io, AWS Lambda) to run agent code on demand. The agent spins up, executes, and shuts down. State lives outside the compute layer (in a database or object storage).
State Persistence
You store state in a database or object storage before the agent shuts down. The next invocation reads the state and continues.
If the compute instance dies mid-execution, you lose in-memory state. You need to checkpoint frequently to avoid losing work.
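A minimal sketch of step-level checkpointing follows, using the standard-library `sqlite3` module as a stand-in for the external database. The table layout and the helper names (`open_store`, `load_checkpoint`, `save_checkpoint`, `run_steps`) are assumptions made for illustration; the state is assumed to be JSON-serializable.

```python
import json
import sqlite3
from typing import Callable, Dict, Sequence, Tuple

Step = Callable[[Dict], Dict]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the checkpoint store. Use a real external database in production."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "session_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL, state TEXT NOT NULL)"
    )
    return conn


def load_checkpoint(conn: sqlite3.Connection, session_id: str) -> Tuple[int, Dict]:
    """Return (index of the next step to run, saved state); fresh sessions start at 0."""
    row = conn.execute(
        "SELECT next_step, state FROM checkpoints WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else (0, {})


def save_checkpoint(conn: sqlite3.Connection, session_id: str, next_step: int, state: Dict) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (session_id, next_step, json.dumps(state)),
        )


def run_steps(conn: sqlite3.Connection, session_id: str, steps: Sequence[Step]) -> Dict:
    """Run steps in order, checkpointing after each one so a rerun skips finished work."""
    next_step, state = load_checkpoint(conn, session_id)
    for index in range(next_step, len(steps)):
        state = steps[index](state)
        save_checkpoint(conn, session_id, index + 1, state)
    return state
```

If the instance dies during a step, the next invocation calls `run_steps` with the same `session_id` and resumes at the first step that never checkpointed. The interrupted step runs again from its start, so steps with external side effects should be idempotent.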
Tool Boundary Enforcement
You implement guardrails in your agent code, just like the custom framework path. The difference is that the code runs in a short-lived container or function.
Rate limiting and input validation happen in your code or in a proxy layer (e.g., an API gateway that enforces rate limits before the agent runs).
Failure Recovery
You write retry logic in your agent code or in the orchestration layer that invokes the ephemeral compute. If a tool call fails, you decide whether to retry in the same invocation or spawn a new one.
If the compute instance times out, you need to handle it at the orchestration layer. Some platforms (like Modal) let you configure retries and timeouts declaratively.
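```python
# Modal example with retries.
# load_state, save_state, agent, and DatabaseError are assumed to come from
# your own application code; they are not part of Modal.
import logging

import modal

logger = logging.getLogger(__name__)

app = modal.App("agent-app")


@app.function(
    retries=3,      # rerun the function up to 3 times if it raises
    timeout=300,    # seconds before the invocation is stopped
    secrets=[modal.Secret.from_name("openai-key")],
)
def run_agent(session_id: str, message: str):
    try:
        # Load state from database
        state = load_state(session_id)
    except DatabaseError as e:
        logger.error(f"Failed to load state: {e}")
        raise  # re-raise so Modal's retry policy applies

    # Run agent loop
    result = agent.run(state, message)

    try:
        # Save state back to database
        save_state(session_id, result.state)
    except DatabaseError as e:
        logger.error(f"Failed to save state: {e}")
        raise

    return result.output
```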
When to Choose This Path
- You want to avoid managing long-running infrastructure
- Your agent workload is bursty or unpredictable
- You need to scale to zero when the agent is not in use
- You are comfortable with cold start latency and stateless execution
What It Demands
Discipline around state management. You need to checkpoint frequently and handle failures at the orchestration layer. The trade-off is cost efficiency and no long-running infrastructure to manage, although you still operate the external state store.
Comparison Table
| Dimension | Build | Buy | Compose | Rent |
|---|---|---|---|---|
| State Control | Full control over schema and storage | Platform manages state | Memory service handles state | You manage external state storage |
| Tool Boundaries | You implement all guardrails | Platform provides built-in limits | Orchestration layer enforces rules | You implement in agent code |
| Failure Recovery | You write all retry logic | Platform handles retries | Each service retries independently | You write retry logic or use platform features |
| Cold Start Latency | Depends on deployment shape | Usually low (always-on) | Depends on services | High (ephemeral compute) |
| Cost at Zero Usage | Infrastructure costs remain | Subscription or usage fees | Service fees remain | Zero (scales to zero) |
| Vendor Lock-In | Low (you own the code, but still depend on your chosen framework and model provider) | High (platform-specific) | Medium (service APIs) | Low (portable code) |
| Time to Production | 8-12 weeks | 3-5 days | 2-3 weeks | 1-2 weeks |
Note: Time to Production estimates are editorial synthesis based on typical project scope. Actual timelines vary by team size, requirements complexity, and existing infrastructure.
Decision Tree
Start with these questions:
1. Is your primary constraint cost or time?
   - Cost: Rent (scales to zero)
   - Time: Buy (fastest to production)
   - Neither: Continue
2. Do you need deep integration with internal systems that cannot be exposed externally?
   - Yes: Build or Rent
   - No: Continue
3. Do your tool requirements fit within a managed platform’s integrations?
   - Yes: Buy
   - No: Continue
4. Is your workload bursty or unpredictable, and do you want to scale to zero?
   - Yes: Rent
   - No: Continue
5. Do you need flexibility without building everything yourself?
   - Yes: Compose
   - No: Continue
6. Do you have proprietary orchestration patterns or domain-specific memory structures that existing platforms cannot replicate?
   - Yes: Build
   - No: Compose or Buy
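The questions above can also be written as a short function, which makes the ordering explicit. This is a sketch only: the field names are invented for illustration, and it assumes that a "No" on the flexibility question falls through to the final question about proprietary orchestration, so that the last question is reachable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    """Answers to the decision-tree questions (field names are illustrative)."""

    primary_constraint: str = "neither"  # "cost", "time", or "neither"
    internal_only_integration: bool = False
    fits_platform_integrations: bool = False
    bursty_scale_to_zero: bool = False
    wants_flexibility_without_building: bool = False
    proprietary_orchestration: bool = False


def recommend(req: Requirements) -> List[str]:
    """Walk the questions in order and return the first matching path(s)."""
    if req.primary_constraint == "cost":
        return ["Rent"]
    if req.primary_constraint == "time":
        return ["Buy"]
    if req.internal_only_integration:
        return ["Build", "Rent"]
    if req.fits_platform_integrations:
        return ["Buy"]
    if req.bursty_scale_to_zero:
        return ["Rent"]
    if req.wants_flexibility_without_building:
        return ["Compose"]
    if req.proprietary_orchestration:
        return ["Build"]
    return ["Compose", "Buy"]
```

For example, `recommend(Requirements(internal_only_integration=True))` returns `["Build", "Rent"]`, matching the second question. Because earlier questions win, the order of the checks is part of the recommendation.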
Hybrid Approaches
Most production systems combine these patterns. A typical SaaS product might:
- Build custom orchestration logic for proprietary decision-making
- Compose Zapier for tool integration with third-party services
- Rent Modal for execution to handle burst traffic
Another common pattern:
- Buy a managed platform for customer-facing agents (speed to market)
- Build custom agents for internal workflows (deep integration needs)
- Compose observability services (Langfuse, Helicone) across both
The key is to choose the right pattern for each layer based on your requirements, not your default instinct.
Technical Verdict
Use Build if you have at least 2 full-time backend engineers with distributed systems experience, 6+ months runway, and requirements that no existing platform can satisfy. The orchestration logic itself is your competitive advantage. Avoid if you need production in under 8 weeks, have fewer than 2 dedicated engineers, or lack experience debugging distributed state machines in production.
Use Buy if you need an agent in production within 2 weeks and your tool requirements fit within the platform’s integrations. Your team is under 5 engineers and you prioritize speed over flexibility. Avoid if you need custom orchestration logic, deep internal system integration that cannot be exposed via webhooks, or if your business model cannot tolerate 15-30% platform fees on usage.
Use Compose if you need flexibility without building everything yourself. You have 2-3 weeks to production and are comfortable managing API integrations across 3-5 services. Your tool requirements span multiple domains (CRM, analytics, communication). Avoid if you need sub-100ms p99 latency, cannot tolerate cross-service authentication complexity, or lack experience debugging failures that span service boundaries.
Use Rent if your workload is bursty (10x+ variance between peak and trough), cost is the primary constraint, and you can tolerate 2-5 second cold starts. You need to scale to zero when idle. Avoid if you need consistent sub-second response times, your agent maintains sessions longer than 15 minutes with frequent state updates (checkpoint overhead becomes prohibitive), or you cannot tolerate 1-3% invocation failure rates during platform scaling events.
Most production deployments will combine these patterns. Start with the path that matches your most critical constraint (cost, time, control, or flexibility), then layer in other patterns as needed.