MCP at Scale: How AWS and Cisco Built Automated Security Scanning for Multi-Agent Deployments

Inside the infrastructure plumbing for scanning, governing, and auditing Model Context Protocol and Agent-to-Agent traffic in enterprise deployments.

Source: aws.amazon.com

Enterprises now run dozens to hundreds of MCP servers, each exposing tools that connect AI agents to databases, APIs, and internal systems. The Agent-to-Agent (A2A) Protocol added autonomous inter-agent communication in April 2025. Agent Skills followed shortly after. Security teams face a scaling problem: manual review processes add weeks to each deployment, audit trails don’t exist for autonomous agent actions, and compliance frameworks like SOX and GDPR require visibility that current tooling doesn’t provide.

AWS and Cisco published a production-scale solution in May 2026 that addresses three specific gaps: visibility into deployed tools and agents, automated security scanning that matches deployment velocity, and audit logging that satisfies regulatory requirements. The architecture reveals where security scans actually run, how policy engines work across both MCP and A2A protocols, and what the audit log schema looks like when you need to track high-cardinality agent interactions without exploding storage costs.

The Visibility Problem

MCP servers are lightweight. A Python script with a few tool definitions can expose database queries, file system access, or API calls to any agent that connects. Developers ship them fast. Security teams discover them later.

Agent-to-Agent communication compounds the problem. When agents talk directly to each other, traditional API gateways don’t see the traffic. You lose the choke point where you’d normally enforce policy.

Agent Skills add another layer. These are reusable capabilities that agents can invoke, often packaged as containers or serverless functions. They proliferate across teams, and tracking which skills are deployed where becomes a manual spreadsheet exercise.

The AWS and Cisco solution centers on AI Registry, an open-source project that maintains a live inventory of:

  • MCP servers with their exposed tools and data source connections
  • A2A agents with their communication patterns and trust boundaries
  • Agent Skills with their execution environments and permission scopes

The registry isn’t a static catalog. It integrates with deployment pipelines to capture metadata at creation time and monitors runtime behavior to detect drift.

Scanning Architecture

Cisco AI Defense performs automated security scans. The placement decision matters. You can scan at three points:

  1. Gateway layer: Intercept MCP and A2A traffic at the network edge
  2. Sidecar proxy: Inject a scanning container alongside each agent
  3. Orchestrator integration: Scan inside the agent framework before tool execution

Each approach has trade-offs:

Scan Location | Latency Impact    | Coverage                     | Failure Mode
Gateway       | +5-15ms per call  | Catches all external traffic | Misses internal agent-to-agent calls
Sidecar       | +2-8ms per call   | Full coverage per agent      | Deployment complexity, resource overhead
Orchestrator  | <1ms (in-process) | Deepest context visibility   | Framework-specific integration required

AWS and Cisco use a hybrid model. Gateway scans catch MCP servers exposed to external agents. Sidecar proxies handle A2A traffic between internal agents. Orchestrator hooks provide deep inspection for high-risk tool calls that access sensitive data.

The scanner checks for:

  • Credential leakage in tool parameters
  • SQL injection patterns in database queries
  • Excessive permission scopes (agent requesting admin when read-only would suffice)
  • Data exfiltration attempts (large payloads to external endpoints)
  • Policy violations (agent accessing data outside its approved domain)
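
A minimal sketch of the first two checks, assuming tool parameters arrive as a flat mapping of strings. The patterns and the `scan_parameters` helper are hypothetical illustrations, not Cisco AI Defense rules; a production scanner would rely on a far broader rule set than a handful of regular expressions.

```python
import re

# Hypothetical patterns, for illustration only.
CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(password|secret|api[_-]?key)\s*[=:]\s*\S+", re.IGNORECASE),
]
SQLI_PATTERNS = [
    re.compile(r"\bunion\b\s+\bselect\b", re.IGNORECASE),
    re.compile(r"\bor\b\s+1\s*=\s*1", re.IGNORECASE),
    re.compile(r";\s*(drop|delete)\b", re.IGNORECASE),
]


def scan_parameters(params: dict) -> list:
    """Return one finding per (check, parameter) pair that matches."""
    findings = []
    for name, value in params.items():
        text = str(value)
        if any(p.search(text) for p in CREDENTIAL_PATTERNS):
            findings.append(f"credential_leak:{name}")
        if any(p.search(text) for p in SQLI_PATTERNS):
            findings.append(f"sql_injection:{name}")
    return findings
```

Pattern matching of this kind is cheap enough to run inline, which is one reason it suits the low-latency end of the placement options.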

Scans run asynchronously when possible. A risk score decides the mode: low-risk tool calls proceed while the scan runs in the background, and high-risk calls are held until policy evaluation completes.
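
The risk-score decision can be sketched as follows. The scoring weights, the 0-100 scale, and the `BLOCKING_THRESHOLD` cutoff are assumptions made for illustration; the source does not describe how the score is computed.

```python
from dataclasses import dataclass

BLOCKING_THRESHOLD = 70  # hypothetical cutoff on an assumed 0-100 scale


@dataclass(frozen=True)
class ToolCall:
    tool_category: str        # e.g. "database", "search"
    data_classification: str  # e.g. "PII", "public"
    writes_data: bool


def risk_score(call: ToolCall) -> int:
    """Additive score built from assumed weights."""
    score = 0
    if call.tool_category == "database":
        score += 30
    if call.data_classification == "PII":
        score += 50
    if call.writes_data:
        score += 20
    return min(score, 100)


def scan_mode(call: ToolCall) -> str:
    """High-risk calls wait for the scan; everything else scans in the background."""
    return "blocking" if risk_score(call) >= BLOCKING_THRESHOLD else "async"
```

With these weights, a read of PII from a database scores 80 and blocks, while a write to a public database table scores 50 and is scanned asynchronously.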

Policy Engine Design

The policy engine needs to work across MCP tool calls and A2A message passing without duplicating rule logic. The architecture uses a unified policy language that abstracts over protocol differences.

A policy rule looks like this:

policy:
  id: "prevent-pii-access"
  scope: ["mcp", "a2a"]
  conditions:
    - tool.category == "database"
    - data.classification == "PII"
    - agent.clearance < "confidential"
  action: "deny"
  audit: "log-full-context"

The engine evaluates policies at decision points:

  • MCP tool invocation: Before the agent calls a tool, check if the tool + parameters + agent identity satisfy all applicable policies
  • A2A message send: Before an agent sends a message to another agent, check if the sender + recipient + message content satisfy policies
  • Skill execution: Before a skill runs, check if the invoking agent + skill permissions + execution context satisfy policies

Policy evaluation happens in under 5ms for 95% of requests. The engine caches compiled policies and uses a decision tree to short-circuit evaluation when early conditions fail.
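
One way to picture the evaluation loop is the sketch below, which encodes the `prevent-pii-access` rule as Python predicates. The `CLEARANCE_RANK` ordering is an assumption needed to give `agent.clearance < "confidential"` a meaning, and the `Policy` and `evaluate` names are hypothetical. Short-circuiting here relies on `all()` stopping at the first false condition, a simpler stand-in for the decision tree the engine is described as using.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed ordering of clearance levels (not given in the source).
CLEARANCE_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


@dataclass(frozen=True)
class Policy:
    id: str
    scope: frozenset
    conditions: tuple  # tuple of Callable[[dict], bool]
    action: str


PREVENT_PII_ACCESS = Policy(
    id="prevent-pii-access",
    scope=frozenset({"mcp", "a2a"}),
    conditions=(
        lambda ctx: ctx["tool.category"] == "database",
        lambda ctx: ctx["data.classification"] == "PII",
        lambda ctx: CLEARANCE_RANK[ctx["agent.clearance"]] < CLEARANCE_RANK["confidential"],
    ),
    action="deny",
)


def evaluate(policies, protocol: str, ctx: dict) -> tuple:
    """Return (action, policy_id) for the first policy whose conditions all hold."""
    for policy in policies:
        if protocol not in policy.scope:
            continue  # the rule does not apply to this protocol
        try:
            # all() stops at the first false condition, skipping the rest.
            matched = all(cond(ctx) for cond in policy.conditions)
        except KeyError:
            # Missing context is treated as an evaluation error: default deny.
            return ("deny", policy.id)
        if matched:
            return (policy.action, policy.id)
    return ("allow", None)
```

Because the same `Policy` object is checked against both `"mcp"` and `"a2a"` requests, the rule logic is written once and only the context extraction differs per protocol.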

Audit Log Schema

Compliance teams need to answer questions like “which agents accessed customer financial data last quarter?” and “show me all tool calls that modified production databases.” The audit log schema needs to capture enough detail without creating a storage cost problem.

High-cardinality fields are the challenge. Tool names, parameter values, and context objects can have thousands of unique values. Storing every parameter in full-text fields makes queries expensive.

The solution uses a tiered schema:

Tier 1: Indexed fields (always queryable)

  • Timestamp
  • Agent ID
  • Tool or skill name
  • Action (invoke, send, execute)
  • Result (success, denied, error)
  • Risk score

Tier 2: Structured metadata (queryable with filters)

  • Data classification tags
  • Resource identifiers (database names, API endpoints)
  • Policy IDs that were evaluated

Tier 3: Full context (stored in object storage, retrieved by ID)

  • Complete parameter payloads
  • Message contents
  • Stack traces for errors

Tier 1 and 2 go into a time-series database optimized for high-write throughput. Tier 3 goes into S3 with lifecycle policies that archive to Glacier after 90 days. Compliance queries hit Tier 1 and 2. Forensic investigations retrieve Tier 3 by ID when needed.
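
A sketch of how one event might be split across the tiers, assuming events arrive as dictionaries. The field names, the `audit-context/` key prefix, and the `split_audit_event` helper are hypothetical; the point is that the indexed document carries only a `context_id` pointer to the full payload.

```python
import json
import uuid
from datetime import datetime, timezone


def split_audit_event(event: dict) -> tuple:
    """Split an event into (indexed document, object key, object body)."""
    context_id = uuid.uuid4().hex
    indexed = {
        # Tier 1: always-indexed fields
        "timestamp": event.get("timestamp") or datetime.now(timezone.utc).isoformat(),
        "agent_id": event["agent_id"],
        "name": event["name"],
        "action": event["action"],
        "result": event["result"],
        "risk_score": event["risk_score"],
        # Tier 2: structured metadata
        "classification_tags": event.get("classification_tags", []),
        "resources": event.get("resources", []),
        "policy_ids": event.get("policy_ids", []),
        # Pointer to Tier 3; the payload itself is never indexed
        "context_id": context_id,
    }
    full_context = {
        "parameters": event.get("parameters"),
        "message": event.get("message"),
        "stack_trace": event.get("stack_trace"),
    }
    object_key = f"audit-context/{context_id}.json"
    return indexed, object_key, json.dumps(full_context).encode("utf-8")
```

The caller would send `indexed` to the time-series store and write the body to object storage under `object_key`; a forensic lookup then needs only the `context_id` from a Tier 1/2 query result.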

The log volume is significant. A deployment with 100 agents, each averaging one tool call per minute, generates 144,000 log entries per day. At 2KB per entry (Tier 1 + 2), that’s 288MB daily or about 8.6GB monthly. Tier 3 adds another 50-100GB monthly depending on parameter sizes. Storage costs run $2-5 per agent per month.
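
The Tier 1 and Tier 2 figures can be reproduced with a few lines of arithmetic, assuming decimal units (1MB = 1,000KB) and a 30-day month. Note that 144,000 entries per day corresponds to 100 agents averaging one call per agent per minute; the rate is a parameter below, and ten calls per agent per minute would give ten times the volume. The function names are illustrative.

```python
MINUTES_PER_DAY = 60 * 24  # 1,440


def daily_entries(agents: int, calls_per_agent_per_minute: float) -> int:
    """Log entries written per day, one entry per tool call."""
    return round(agents * calls_per_agent_per_minute * MINUTES_PER_DAY)


def indexed_storage_mb(entries: int, kb_per_entry: float = 2.0) -> float:
    """Tier 1 + 2 storage in MB, using 1 MB = 1,000 KB."""
    return entries * kb_per_entry / 1000


entries = daily_entries(agents=100, calls_per_agent_per_minute=1)
daily_mb = indexed_storage_mb(entries)
monthly_gb = daily_mb * 30 / 1000
print(entries, daily_mb, monthly_gb)
```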

Deployment Shape

The reference architecture runs on AWS with these components:

  • AI Registry: DynamoDB table with Lambda triggers for real-time updates, exposed via API Gateway
  • Cisco AI Defense scanner: ECS Fargate tasks that scale based on scan queue depth
  • Policy engine: Lambda functions with policy rules stored in S3 and cached in ElastiCache
  • Audit logs: Kinesis Data Firehose writing to OpenSearch (Tier 1/2) and S3 (Tier 3)

The scanner uses a queue-based architecture. When an agent invokes a tool, the orchestrator publishes a scan request to SQS. Scanner tasks pull from the queue, evaluate policies, and write results to a response topic. The orchestrator subscribes to the response topic and either proceeds with the tool call or returns a denial to the agent.

For blocking scans (high-risk operations), the orchestrator uses a synchronous Lambda invocation instead of the queue. This adds latency but guarantees policy evaluation before execution.

Failure Modes

The system has several failure modes to consider:

Scanner unavailable: If scanner tasks crash or scale down too aggressively, the scan queue backs up. The orchestrator has a timeout (default 500ms). If no scan result arrives within the timeout, the system can either fail open (allow the tool call) or fail closed (deny it). Most deployments fail closed for high-risk operations and fail open for low-risk ones.
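
The timeout behavior can be sketched with an in-process `queue.Queue` standing in for the response topic; that substitution, and the `await_scan_result` name, are assumptions for illustration. Only the fallback decision is the point here.

```python
import queue

SCAN_TIMEOUT_SECONDS = 0.5  # the 500ms default mentioned above


def await_scan_result(results: queue.Queue, high_risk: bool,
                      timeout: float = SCAN_TIMEOUT_SECONDS) -> str:
    """Wait for a scan verdict; on timeout, fail closed or open by risk level."""
    try:
        return results.get(timeout=timeout)
    except queue.Empty:
        # No verdict arrived in time: deny high-risk calls, allow low-risk ones.
        return "deny" if high_risk else "allow"
```

A verdict that does arrive always wins, so a low-risk call can still be denied when the scanner responds within the timeout.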

Policy engine errors: If policy evaluation throws an exception (malformed rule, missing context), the default action is deny. This prevents a policy bug from opening a security hole, but it can block legitimate agent operations. The system logs policy errors to a separate alert stream so teams can fix broken rules quickly.

Audit log loss: If Firehose can’t write to OpenSearch (cluster full, network partition), it retries delivery of the buffered batch (buffers flush at 5MB or 5 minutes by default). Once the retry window expires, Firehose writes the failed records to an S3 error bucket. A Lambda function monitors the error bucket and alerts when audit records are failing to reach OpenSearch.

Registry drift: If the AI Registry gets out of sync with actual deployments (someone deploys an MCP server without registering it), the scanner won’t know to scan it. The system runs a periodic reconciliation job that scans infrastructure (ECS tasks, Lambda functions, EC2 instances) for MCP servers and A2A agents, then compares against the registry. Unregistered components trigger alerts.
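
At its core, the reconciliation job is a set comparison. A minimal sketch, assuming each component has a stable identifier such as an ARN; the `reconcile` helper and the extra "stale" category (registered but no longer running) are illustrative additions, not part of the described system.

```python
def reconcile(discovered: set, registered: set) -> dict:
    """Compare infrastructure scan results against registry contents."""
    return {
        # Running but never registered: these should trigger alerts.
        "unregistered": sorted(discovered - registered),
        # Registered but not found running (assumed category).
        "stale": sorted(registered - discovered),
    }
```

For example, `reconcile({"mcp-a", "mcp-b"}, {"mcp-a", "agent-x"})` reports `mcp-b` as unregistered and `agent-x` as stale.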

Observability

The system exposes metrics for:

  • Scan latency (p50, p95, p99)
  • Policy evaluation time
  • Audit log write throughput
  • Scanner queue depth
  • Policy denial rate by rule ID

Teams use these metrics to tune performance. If p95 scan latency exceeds 100ms, they scale up scanner tasks. If a policy rule has a high denial rate, they investigate whether the rule is too strict or agents are misbehaving.
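
The scale-up rule can be expressed directly against a window of latency samples. This sketch uses the standard library's `statistics.quantiles`; the function names and the choice of the inclusive method are assumptions, and a real deployment would more likely read p95 from its metrics backend.

```python
import statistics

P95_THRESHOLD_MS = 100.0  # the threshold mentioned above


def p95(samples_ms: list) -> float:
    """95th percentile of the samples (needs at least two data points)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def should_scale_up(samples_ms: list, threshold_ms: float = P95_THRESHOLD_MS) -> bool:
    """True when p95 scan latency in the window exceeds the threshold."""
    if len(samples_ms) < 2:
        return False  # not enough data to estimate a percentile
    return p95(samples_ms) > threshold_ms
```

Using p95 rather than the mean keeps a slow tail visible: a window where one call in ten takes 250ms triggers scaling even though most calls are fast.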

Distributed tracing links agent requests through the scanning pipeline. When an agent call fails, operators can see the full path: orchestrator → scan request → policy evaluation → denial reason. This cuts debugging time from hours to minutes.

Technical Verdict

Use this architecture when you have:

  • More than 20 MCP servers or A2A agents in production
  • Compliance requirements that mandate audit trails for AI agent actions
  • Security teams that can’t keep up with manual review of new tools and agents
  • Multi-team deployments where you need centralized governance without blocking velocity

Avoid it when:

  • You have fewer than 10 agents and can manually review each deployment
  • Your agents don’t access sensitive data or regulated systems
  • You’re still in the prototype phase and deployment patterns haven’t stabilized
  • The added latency per tool call (from under 1ms in-process to 5-15ms at the gateway) breaks your performance budget

The system adds operational complexity. You need to run and monitor the scanner infrastructure, maintain policy rules, and handle audit log storage. For small deployments, the overhead isn’t worth it. For enterprises scaling to hundreds of agents, it’s the difference between controlled growth and unmanageable sprawl.