mech.app

Amazon Bedrock AgentCore: Managed State, Memory, and RAG Plumbing for Production Agents

How AWS AgentCore abstracts conversation persistence, knowledge base integration, and runtime orchestration for production agent deployments.

Source: aws.amazon.com

Amazon Bedrock AgentCore is AWS’s managed runtime layer for production agents. It handles conversation state persistence, knowledge base integration, and orchestration flow so you don’t build agent infrastructure from scratch. The recent equipment repair assistant walkthrough shows how AgentCore Memory, Knowledge Base, and Runtime work together in a multi-turn diagnostic workflow.

This is infrastructure abstraction, not a framework. You get managed state, automatic RAG retrieval timing, and session continuity without running your own vector store or conversation database.

What AgentCore Actually Manages

AgentCore provides three primitives that typically require custom infrastructure:

AgentCore Runtime: Hosts the agent execution environment and handles orchestration. It manages the request/response cycle, tool invocation sequencing, and model interaction. You deploy agent code to the runtime, and it handles scaling and availability.

AgentCore Memory: Persists conversation state across sessions without requiring DynamoDB tables or Redis clusters. The service stores conversation history, user context, and intermediate reasoning steps. When a technician returns to a diagnostic session hours later, the agent resumes with full context.

Bedrock Knowledge Base Integration: Connects RAG retrieval directly into the agent flow. Instead of calling vector search APIs manually and injecting results into prompts, AgentCore handles retrieval timing and context injection. The agent declares knowledge base dependencies, and the runtime manages query execution and result ranking.

Architecture Boundaries

The equipment repair assistant shows the separation between domain logic and infrastructure:

Strands Agents SDK Layer: Provides the domain reasoning interface. You define agent behavior, tool schemas, and decision logic here. The SDK compiles this into an AgentCore-compatible deployment package.

AgentCore Runtime Layer: Executes the compiled agent, manages state transitions, and coordinates tool calls. This is where orchestration happens: the runtime decides when to retrieve from the knowledge base, when to invoke external tools, and when to return control to the user.

Tool Execution Boundary: External tools (parts lookup APIs, diagnostic systems) run outside AgentCore. The runtime calls these via HTTP or Lambda, waits for responses, and feeds results back into the agent’s reasoning loop.
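
The tool boundary can be pictured as a small dispatcher that calls an external tool with a deadline and normalizes the outcome for the reasoning loop. This is a minimal sketch under stated assumptions: `call_tool`, `ToolResult`, `lookup_part`, and `slow_tool` are hypothetical names rather than AgentCore or Strands APIs, and a local function stands in for the HTTP or Lambda call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolResult:
    tool: str
    ok: bool
    payload: Any = None
    error: str = ""


def call_tool(name: str, fn: Callable[..., Any], timeout_s: float, **kwargs: Any) -> ToolResult:
    """Run one tool call with a deadline and normalize the outcome."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, **kwargs)
    try:
        return ToolResult(tool=name, ok=True, payload=future.result(timeout=timeout_s))
    except FutureTimeout:
        return ToolResult(tool=name, ok=False, error=f"timed out after {timeout_s}s")
    except Exception as exc:  # the tool raised: report it instead of crashing the loop
        return ToolResult(tool=name, ok=False, error=str(exc))
    finally:
        pool.shutdown(wait=False)  # do not block the caller on a slow tool


def lookup_part(model: str) -> dict:
    """Hypothetical parts API stand-in; the part number is made up."""
    return {"model": model, "part_number": "HP-100"}


def slow_tool() -> str:
    """Hypothetical tool that responds too slowly."""
    time.sleep(0.3)
    return "late"
```

Returning a failed `ToolResult` instead of raising lets the agent explain the failure or try another path, which is one plausible way a reasoning loop could consume tool errors.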

The frontend (React app on Amplify) authenticates via Cognito and calls the AgentCore Runtime endpoint directly. No custom API Gateway or Lambda proxy layer sits between the user and the agent.

Memory Persistence Without Database Management

AgentCore Memory eliminates the need to build conversation storage:

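# Traditional approach: manual state management
import json
import time
import uuid

import boto3

# user_id, messages, and context come from the surrounding application
table = boto3.resource("dynamodb").Table("conversations")

conversation_id = str(uuid.uuid4())
table.put_item(
    Item={
        "id": conversation_id,
        "user_id": user_id,
        "messages": json.dumps(messages),
        "context": json.dumps(context),
        "timestamp": int(time.time()),
    }
)

# AgentCore approach: declare memory requirement
# Illustrative pseudocode: check the Strands SDK documentation for the
# actual class and parameter names before relying on this shape.
agent = StrandsAgent(
    name="equipment_repair_assistant",
    model="amazon.nova-2-lite",
    memory_enabled=True,  # Runtime handles persistence
)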

The runtime stores conversation turns, extracted entities, and reasoning traces. When a session resumes, the agent receives the full history without explicit load operations. This works across different invocation patterns (synchronous API calls, asynchronous workflows, scheduled tasks).

Memory scope is per-user and per-agent. If you deploy multiple agents, each maintains separate conversation state. The runtime handles expiration policies and storage limits automatically.
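
Per-user, per-agent scoping with expiration can be pictured as a store keyed by the (agent, user) pair. The sketch below is an in-memory stand-in, not AgentCore's storage model: `ScopedMemory`, its methods, and the TTL behavior are assumptions made for illustration.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class ScopedMemory:
    """Toy conversation store keyed by (agent_id, user_id) with a TTL per turn."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.time) -> None:
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so expiration can be tested without sleeping
        self._turns: Dict[Tuple[str, str], List[Tuple[float, str, str]]] = defaultdict(list)

    def append(self, agent_id: str, user_id: str, role: str, text: str) -> None:
        """Record one turn under exactly one agent/user scope."""
        self._turns[(agent_id, user_id)].append((self.clock(), role, text))

    def history(self, agent_id: str, user_id: str) -> List[Tuple[str, str]]:
        """Return unexpired (role, text) turns for one agent/user pair."""
        key = (agent_id, user_id)
        cutoff = self.clock() - self.ttl_s
        live = [turn for turn in self._turns[key] if turn[0] >= cutoff]
        self._turns[key] = live  # drop expired turns as a side effect
        return [(role, text) for _, role, text in live]
```

Because the key includes both identifiers, two agents serving the same user never see each other's turns, which mirrors the isolation described above.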

Knowledge Base Integration Timing

The difference between calling Bedrock RAG APIs directly and using AgentCore’s Knowledge Base integration is retrieval orchestration:

Direct RAG API calls: You decide when to query the vector store, construct the query string, rank results, and inject context into the prompt. This gives control but requires logic to handle retrieval failures, empty results, and context window limits.

AgentCore Knowledge Base: The runtime decides retrieval timing based on the agent’s reasoning state. When the agent needs information about a specific equipment model, the runtime queries the knowledge base, ranks results by relevance, and injects them into the model’s context automatically.

The equipment repair assistant indexes manufacturer manuals, parts catalogs, and repair procedures into a Bedrock Knowledge Base. When a technician asks “What causes hydraulic pressure loss in a Model X harvester?”, the agent:

  1. Receives the query at the Runtime layer
  2. Determines it needs technical documentation (not just conversation history)
  3. Triggers a knowledge base query for “hydraulic pressure loss Model X”
  4. Receives ranked document chunks
  5. Feeds chunks to Amazon Nova 2 Lite with the original question
  6. Returns a synthesized answer with source citations

You don’t write the retrieval loop. The runtime handles query reformulation, result filtering, and context injection.
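
For comparison, the loop the runtime takes off your hands looks roughly like the sketch below. It is a simplified stand-in: word overlap replaces vector search, the `generate` callable replaces the model call, and `retrieve`, `answer`, and the chunk shape are hypothetical names, not Bedrock APIs.

```python
import re
from typing import Callable, Dict, List, Set

Chunk = Dict[str, str]  # assumed shape: {"source": ..., "text": ...}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: List[Chunk], top_k: int = 2) -> List[Chunk]:
    """Rank chunks by word overlap with the query (stand-in for vector search)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(c["text"])), c) for c in chunks]
    relevant = [pair for pair in scored if pair[0] > 0]  # drop chunks with no overlap
    ranked = sorted(relevant, key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


def answer(question: str, chunks: List[Chunk], generate: Callable[[str], str]) -> Dict[str, object]:
    """Retrieve, build a grounded prompt, call the model, and attach citations."""
    hits = retrieve(question, chunks)
    context = "\n".join(f"[{i + 1}] {c['text']}" for i, c in enumerate(hits))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return {"answer": generate(prompt), "sources": [c["source"] for c in hits]}
```

Even this toy version has to decide on ranking, filtering, and prompt layout, which is the logic the managed integration hides.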

Orchestration Flow in Multi-Turn Diagnostics

The equipment repair workflow shows how AgentCore coordinates state across multiple turns:

Turn 1: Technician describes a symptom (“Engine won’t start, battery is charged”)
Turn 2: Agent asks clarifying questions (“Do you hear the starter motor?”)
Turn 3: Agent retrieves diagnostic procedures from the knowledge base
Turn 4: Agent identifies required parts and provides part numbers
Turn 5: Agent surfaces manufacturer repair instructions

Each turn updates conversation state in AgentCore Memory. The runtime maintains:

  • Symptom history
  • Diagnostic branch taken (electrical vs. fuel system)
  • Parts already identified
  • Procedures already reviewed

When the technician returns the next day, the agent resumes at Turn 5 without re-diagnosing.
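
The state listed above can be sketched as a small serializable record plus a function that picks the next step from what is already known. This is an illustration only: `DiagnosticState`, `next_step`, and the step names are hypothetical, and AgentCore Memory's actual storage format is not described in the walkthrough.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DiagnosticState:
    symptoms: List[str] = field(default_factory=list)
    branch: Optional[str] = None  # e.g. "electrical" or "fuel"
    parts: List[str] = field(default_factory=list)
    procedures_reviewed: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the state so it can be persisted between sessions."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "DiagnosticState":
        """Rebuild the state when a session resumes."""
        return cls(**json.loads(raw))


def next_step(state: DiagnosticState) -> str:
    """Pick the next action from what the session has already established."""
    if not state.symptoms:
        return "ask_for_symptoms"
    if state.branch is None:
        return "ask_clarifying_question"
    if not state.parts:
        return "identify_parts"
    return "show_repair_instructions"
```

A session restored from stored JSON with symptoms, a branch, and parts already filled in goes straight to `show_repair_instructions`, which is the "resume without re-diagnosing" behavior in miniature.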

Deployment Shape and Scaling

AgentCore Runtime is a managed service, so you don’t configure EC2 instances or Kubernetes pods. Deployment involves:

  1. Package agent code with Strands SDK
  2. Upload to S3
  3. Create AgentCore Runtime deployment via API or CloudFormation
  4. Configure model (Amazon Nova 2 Lite, Claude, etc.)
  5. Attach Knowledge Base and enable Memory

The runtime scales automatically based on request volume. You pay per invocation and per token processed, not for idle capacity.

For the equipment repair assistant, the CloudFormation stack provisions:

  • Cognito User Pool and Identity Pool (authentication)
  • Amplify app (frontend hosting)
  • AgentCore Runtime deployment (agent execution)
  • Bedrock Knowledge Base (indexed documentation)
  • IAM roles (runtime permissions)

The frontend calls the AgentCore endpoint directly after Cognito authentication. No custom Lambda functions sit in the request path.

Trade-offs and Failure Modes

Component | Benefit | Risk
AgentCore Memory | No database management, automatic persistence | Limited query capabilities, no custom indexing
Knowledge Base Integration | Automatic retrieval timing, context injection | Less control over ranking, query reformulation
Managed Runtime | No infrastructure scaling, automatic updates | Vendor lock-in, limited observability hooks
Strands SDK | High-level agent definition, less boilerplate | Abstraction limits low-level orchestration control

Likely failure modes:

  • Context window overflow: If conversation history plus knowledge base results exceed the model’s context limit, the runtime truncates. You don’t control truncation strategy.
  • Knowledge base retrieval latency: RAG queries add latency to every turn that triggers retrieval, roughly 200-500 ms. For real-time applications, this may be too slow.
  • Memory consistency: If the runtime fails mid-turn, conversation state may be incomplete. The service provides eventual consistency, not strong consistency.
  • Tool call timeouts: If an external tool (parts API, diagnostic system) times out, the agent may retry or fail. You configure timeout policies but don’t control retry logic.
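
To make the context window overflow case concrete, here is one possible trimming strategy: keep the newest turns, then add ranked chunks while they still fit. It is a sketch under loud assumptions: tokens are approximated by whitespace-separated words, `fit_context` is a hypothetical helper, and nothing here reflects how the runtime itself truncates.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude estimate: one token per whitespace-separated word."""
    return len(text.split())


def fit_context(history: List[str], chunks: List[str], budget: int,
                reserve_for_chunks: int) -> Tuple[List[str], List[str]]:
    """Keep the newest turns, then as many ranked chunks as still fit."""
    kept_history: List[str] = []
    used = 0
    for turn in reversed(history):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget - reserve_for_chunks:
            break  # older turns are dropped once the history share is full
        kept_history.append(turn)
        used += cost
    kept_history.reverse()  # restore chronological order

    kept_chunks: List[str] = []
    for chunk in chunks:  # assumed to be sorted by relevance already
        cost = count_tokens(chunk)
        if used + cost > budget:
            continue  # skip a chunk that does not fit, try the next one
        kept_chunks.append(chunk)
        used += cost
    return kept_history, kept_chunks
```

Reserving part of the budget for retrieved chunks is a design choice: without it, a long conversation could crowd out the documentation entirely.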

Security Boundaries

AgentCore enforces IAM-based access control. The runtime assumes a role with permissions to:

  • Invoke Bedrock models
  • Query Knowledge Bases
  • Read/write Memory storage
  • Call external tools (Lambda, API Gateway)

The frontend authenticates users via Cognito, and Cognito issues temporary credentials scoped to the user’s identity. These credentials authorize AgentCore Runtime invocations.

For the equipment repair assistant, this means:

  • Technicians authenticate once via Cognito
  • Frontend receives temporary AWS credentials
  • Credentials allow calling the AgentCore endpoint
  • Runtime enforces per-user Memory isolation

You don’t build custom authentication middleware or manage session tokens.

Observability Gaps

AgentCore logs invocations to CloudWatch, but the built-in observability is limited. Available:

  • Request/response logs: Input and output per turn
  • Token usage: Tracked per invocation
  • Knowledge base queries: Logged with query text and result count
  • Tool calls: Logged with tool name and execution time

Not available:

  • Detailed reasoning traces (why the agent chose a specific retrieval query)
  • Intermediate state snapshots (conversation state at each reasoning step)
  • Custom metrics (domain-specific KPIs like diagnostic accuracy)

For production deployments, you’ll need external observability tools (Datadog, New Relic) to track agent behavior beyond AWS’s built-in logs.
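
One way to narrow these gaps is to instrument the code you do own, such as tool functions, and emit structured records that an external tool can ingest. The decorator below is a minimal sketch: `traced`, the `agent.trace` logger name, and `parts_lookup` are hypothetical, and shipping the records to a specific vendor is out of scope.

```python
import functools
import json
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("agent.trace")


def traced(tool_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a tool function so each call emits one structured JSON log record."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # re-raise so the caller still sees the failure
            finally:
                logger.info(json.dumps({
                    "tool": tool_name,
                    "status": status,
                    "duration_ms": round((time.perf_counter() - start) * 1000, 2),
                }))
        return wrapper
    return decorator


@traced("parts_lookup")
def parts_lookup(model: str) -> str:
    """Hypothetical tool; returns a placeholder result."""
    return f"parts for {model}"
```

Because each record is a single JSON line, it can be filtered in CloudWatch Logs or forwarded to another backend without custom parsing.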

When to Use AgentCore

Good fit:

  • Multi-turn conversational agents with state persistence requirements
  • RAG-heavy workflows where retrieval timing is complex
  • Teams that want managed infrastructure over custom orchestration
  • Applications where AWS lock-in is acceptable

Poor fit:

  • Agents requiring custom orchestration logic (complex branching, parallel tool calls)
  • Real-time applications sensitive to RAG retrieval latency
  • Use cases needing strong consistency in conversation state
  • Projects requiring detailed reasoning observability

Technical Verdict

AgentCore removes the undifferentiated heavy lifting of agent infrastructure: conversation storage, RAG orchestration, and runtime scaling. If you’re building a production agent on AWS and don’t need custom orchestration control, it’s faster than building these primitives yourself.

The trade-off is abstraction. You lose fine-grained control over retrieval timing, state management, and failure handling. For diagnostic workflows, customer support agents, or internal tools where AWS lock-in is acceptable, AgentCore is a reasonable default.

If you need custom orchestration (parallel tool execution, complex state machines, non-standard retrieval patterns), you’ll hit the abstraction ceiling quickly. In that case, build on Bedrock APIs directly or use a framework like LangGraph that gives you lower-level control.