AWS just published a case study with Works Human Intelligence showing a 97% cost reduction in production agent deployments. The project migrated two business support agents from a self-hosted LangGraph stack to Amazon Bedrock AgentCore. The numbers are rare in an industry that typically hides operational economics.
This is not about feature velocity. This is about the economic plumbing of running agents at scale: token consumption patterns, orchestration overhead, and the hidden costs of state management in multi-step workflows.
The Baseline Architecture
Works Human Intelligence built two agents for their COMPANY HR system:
- Commuting Allowance Agent: Automates approval workflows for employee relocation events
- Browser Operation Agent: Executes customer tasks inside the COMPANY web interface
The initial PoC ran on LangGraph orchestrated via Amazon ECS and AWS Fargate. This is a common pattern: self-hosted orchestration with containerized execution. The problem surfaced in token consumption and API call patterns.
Where the Costs Hid
Self-hosted orchestration frameworks like LangGraph introduce overhead in three places:
- State serialization: Every step in a multi-turn workflow writes state to persistent storage. Each serialization round-trips through the LLM to generate structured outputs.
- Tool call overhead: Each function invocation requires a separate LLM call to parse results and decide the next action.
- Memory management: Conversation history grows linearly. Without aggressive pruning, token counts balloon on every turn.
The ECS/Fargate deployment added compute costs. Each agent invocation spun up container resources, even for simple approval checks that completed in seconds.
AgentCore’s Resource Model
Amazon Bedrock AgentCore shifts orchestration into a managed service. The key architectural differences:
- Serverless execution: No container overhead. Agents run on-demand with sub-second cold starts.
- Native state management: AgentCore maintains conversation state without round-tripping through the LLM for serialization.
- Batched tool calls: Multiple tool invocations can execute in a single LLM turn, reducing API call count.
The state management change is the biggest lever. Instead of asking the LLM to serialize its own state after every step, AgentCore tracks execution context natively. This eliminates a class of LLM calls that exist purely for bookkeeping.
Cost Attribution and Observability
AgentCore exposes cost metrics at three levels:
| Metric Level | Granularity | Use Case |
|---|---|---|
| Agent | Per-agent totals | Budget allocation across agent types |
| Session | Per-conversation | Customer-level cost attribution |
| Step | Per-tool-invocation | Identifying expensive operations |
The session-level tracking matters for multi-tenant deployments. If you’re running agents on behalf of multiple customers, you need to attribute costs to the right account. AgentCore logs token consumption and tool execution time per session ID.
The step-level metrics expose which tool calls burn tokens. In the Works Human Intelligence deployment, the Browser Operation Agent initially made redundant web scraping calls. Step-level observability surfaced the pattern, and they added caching logic to the tool layer.
The 97% Reduction Breakdown
AWS and Works Human Intelligence published specific cost drivers:
- Eliminated container overhead: Fargate costs disappeared entirely. AgentCore runs serverless.
- Reduced LLM calls by 60%: Native state management removed serialization round-trips.
- Token consumption down 40%: Batched tool calls and smarter memory pruning.
The 97% figure combines compute savings (Fargate elimination) and LLM cost reduction (fewer API calls, fewer tokens per call). The exact split depends on workload characteristics, but the container overhead was the largest single line item.
Implementation Pattern
Here’s the orchestration flow for the Commuting Allowance Agent:
import boto3
bedrock_agent = boto3.client('bedrock-agent-runtime')
def process_commuting_allowance(application_data):
response = bedrock_agent.invoke_agent(
agentId='agent-xyz',
agentAliasId='prod',
sessionId=f"session-{application_data['employee_id']}",
inputText=f"Process commuting allowance: {application_data}",
enableTrace=True # Exposes step-level cost metrics
)
# AgentCore handles state across multiple tool calls
# No manual serialization or memory management
for event in response['completion']:
if 'chunk' in event:
yield event['chunk']['bytes'].decode()
The enableTrace flag is critical for cost observability. It logs every tool invocation, token count, and latency metric to CloudWatch. You can build cost dashboards directly from these traces.
Failure Modes and Guardrails
AgentCore introduces new failure surfaces:
- Session state limits: AgentCore maintains state up to a maximum conversation length. Long-running workflows need explicit checkpointing.
- Tool timeout boundaries: Each tool call has a hard timeout. Browser automation tasks that scrape slow pages can hit limits.
- Quota exhaustion: Serverless execution shares regional quotas. High-concurrency deployments need quota increases.
The Works Human Intelligence team hit the tool timeout boundary with the Browser Operation Agent. Web scraping tasks occasionally exceeded the 30-second limit. They solved it by breaking long scrapes into paginated tool calls with explicit continuation tokens.
When to Use AgentCore
AgentCore makes sense when:
- You’re running high-volume, short-duration agent workflows (support tickets, approvals, data entry)
- Token costs dominate your bill and you need observability into consumption patterns
- You want to avoid managing orchestration infrastructure (ECS clusters, state stores, memory layers)
Skip AgentCore if:
- Your agents need custom orchestration logic that doesn’t map to Bedrock’s execution model
- You require sub-second latency and can’t tolerate cold starts (though AgentCore cold starts are typically under 500ms)
- You’re already running agents on infrastructure with sunk costs (existing Kubernetes clusters, pre-purchased compute)
The 97% cost reduction is real, but it’s specific to a workload that was over-provisioned on Fargate and burning tokens on state serialization. If you’re running lean orchestration already, the gains will be smaller.
Technical Verdict
Amazon Bedrock AgentCore is cost engineering infrastructure disguised as a managed service. The 97% reduction comes from eliminating two classes of waste: container overhead for short-lived tasks and LLM calls that exist only to maintain state.
Use it when token consumption and compute costs are your top operational expense. The session-level cost attribution is production-grade, and the step-level observability exposes optimization opportunities that are invisible in self-hosted stacks.
Avoid it if you need orchestration flexibility that doesn’t fit Bedrock’s execution model, or if you’re running agents on infrastructure you’ve already paid for. The cost savings assume you’re moving from a higher-cost baseline.
The Works Human Intelligence case study is valuable because it publishes real numbers. Most agent deployments hide their economics. This one shows the plumbing.