mech.app

AWS Agent Plugins: How Amazon Packages IAM, CloudWatch, and Best-Practice Playbooks for Coding Agents

AWS Labs ships a plugin model that bundles skills, MCP servers, and hooks with IAM condition keys to distinguish agent actions from human ones.

Source: github.com

AWS Labs is shipping a plugin packaging model that bundles agent skills (step-by-step workflows), MCP servers (live data connections), and hooks (lifecycle events) into a single installable unit. For enterprises, the appeal is that AWS deployment expertise can be encoded as versioned, auditable capabilities, with IAM condition keys to distinguish agent actions from human ones. This reduces context bloat and lets teams standardize agent behavior.

The repo hit GitHub trending rank 3 for Python with 717 stars and 101 forks, a sign of early community interest. AWS announced the Agent Toolkit for AWS in May 2026 as the production successor, which positions the plugin model for enterprise use. The repo explicitly states: “Instead of repeatedly pasting long AWS guidance into prompts, developers can now encode that guidance as reusable, versioned capabilities.” This pattern is becoming best practice as coding agents move from demos to production workflows.

The Plugin Packaging Model

A plugin is a container that holds three types of artifacts. Each plugin declares its capabilities, dependencies, and IAM requirements in a manifest file. The agent runtime reads the manifest, loads the appropriate skills and servers, and registers hooks at the right points in the execution flow.

The three artifact types:

  • Agent skills: Structured workflows that encode domain expertise as step-by-step processes. Think deployment checklists, architecture review playbooks, or cost-optimization routines.
  • MCP servers: Connections to external services, data sources, and APIs. These give agents access to live documentation, pricing data, and other runtime resources.
  • Hooks: Lifecycle events that trigger before or after specific agent actions. Useful for logging, approval gates, or context injection.

(Manifest structure is illustrative; refer to the repo for the actual schema.)

plugin:
  name: aws-deployment-assistant
  version: 1.2.0
  skills:
    - name: deploy-lambda
      path: ./skills/deploy-lambda.yaml
      required_permissions:
        - lambda:CreateFunction
        - lambda:UpdateFunctionCode
        - iam:PassRole
  mcp_servers:
    - name: aws-pricing
      endpoint: <mcp-server-endpoint>
      auth: iam
  hooks:
    - event: pre_deploy
      handler: ./hooks/cost_estimate.py

When an agent invokes the deploy-lambda skill, the runtime loads the skill definition, checks IAM permissions, and executes the workflow. If a pre_deploy hook is registered, it runs first. The MCP server provides live pricing data if the skill needs to estimate costs.
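The loading order described above can be sketched in a few lines of Python. This is a minimal illustration, not the runtime's actual loader: the manifest is the illustrative one from this article expressed as a plain dict, and `load_plugin` and `plan_invocation` are hypothetical helper names. How the runtime maps a hook event to a skill is also an assumption, so the event is passed in explicitly here.

```python
from collections import defaultdict

# Hypothetical manifest: the illustrative YAML above, written as a plain dict.
MANIFEST = {
    "plugin": {
        "name": "aws-deployment-assistant",
        "version": "1.2.0",
        "skills": [
            {
                "name": "deploy-lambda",
                "path": "./skills/deploy-lambda.yaml",
                "required_permissions": [
                    "lambda:CreateFunction",
                    "lambda:UpdateFunctionCode",
                    "iam:PassRole",
                ],
            }
        ],
        "mcp_servers": [
            {"name": "aws-pricing", "endpoint": "<mcp-server-endpoint>", "auth": "iam"}
        ],
        "hooks": [{"event": "pre_deploy", "handler": "./hooks/cost_estimate.py"}],
    }
}


def load_plugin(manifest):
    """Index a manifest's skills by name and its hook handlers by event."""
    plugin = manifest["plugin"]
    skills = {skill["name"]: skill for skill in plugin.get("skills", [])}
    hooks = defaultdict(list)
    for hook in plugin.get("hooks", []):
        hooks[hook["event"]].append(hook["handler"])
    return skills, dict(hooks)


def plan_invocation(manifest, skill_name, hook_event):
    """Return the ordered work for one skill call: matching hooks first, then the skill."""
    skills, hooks = load_plugin(manifest)
    skill = skills[skill_name]
    return {
        "run_first": hooks.get(hook_event, []),
        "then_run": skill["path"],
        "needs": sorted(skill["required_permissions"]),
    }


plan = plan_invocation(MANIFEST, "deploy-lambda", "pre_deploy")
print(plan)
```

Indexing hooks by event keeps the lookup cheap when several plugins register handlers for the same lifecycle point.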

IAM Condition Keys for Agent Actions

The production toolkit introduces IAM condition keys that distinguish agent actions from human ones. This allows you to write policies that permit agents to deploy resources but require human approval for deletions, enabling compliance with change control policies.

Example condition key structure (exact condition key names are documented in the Agent Toolkit for AWS; the Labs repo may use different identifiers):

{
  "Effect": "Allow",
  "Action": "lambda:DeleteFunction",
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:PrincipalTag/ActorType": "human"
    }
  }
}

The agent runtime can tag API calls with an actor type identifier. This lets you track which agent invoked which API, when, and with what outcome (visible in CloudTrail and CloudWatch Metrics) without modifying agent code. CloudWatch metrics also cover agent invocation counts, error rates, and latency, so you get observability into agent behavior without custom instrumentation.
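To make the effect of the condition concrete, here is a toy evaluator for the single statement shown above. It is a sketch for intuition only and not how IAM evaluates policies: it handles one Allow statement, an exact action match, and the `StringEquals` operator on `aws:PrincipalTag/<key>`. The `statement_allows` helper and the `agent` tag value are assumptions made for this example.

```python
# Statement copied from the illustrative policy above.
STATEMENT = {
    "Effect": "Allow",
    "Action": "lambda:DeleteFunction",
    "Resource": "*",
    "Condition": {"StringEquals": {"aws:PrincipalTag/ActorType": "human"}},
}


def statement_allows(statement, action, principal_tags):
    """Toy check: does this single Allow statement match the request?

    Only an exact action string and StringEquals on aws:PrincipalTag/<key>
    are handled. Real IAM evaluation also covers explicit denies, wildcards,
    resources, and many more operators.
    """
    if statement["Effect"] != "Allow" or statement["Action"] != action:
        return False
    expected = statement.get("Condition", {}).get("StringEquals", {})
    prefix = "aws:PrincipalTag/"
    for key, value in expected.items():
        if not key.startswith(prefix):
            return False  # condition keys outside this sketch never match
        if principal_tags.get(key[len(prefix):]) != value:
            return False
    return True


human = statement_allows(STATEMENT, "lambda:DeleteFunction", {"ActorType": "human"})
agent = statement_allows(STATEMENT, "lambda:DeleteFunction", {"ActorType": "agent"})
print(human, agent)  # True False
```

Because IAM denies by default, a principal tagged as an agent has no matching Allow for the delete call under this statement, which is what forces deletions back to a human.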

Agent Skills: Encoding Step-by-Step Processes

Agent skills are structured files that define a sequence of steps. Each step can invoke an AWS API, call an MCP server, or prompt the agent for input. The execution model is linear: the agent works through the steps in order, stopping if a step fails or requires human approval.

Here’s a pseudocode example of a deployment skill (syntax simplified for illustration):

skill:
  name: deploy-lambda
  description: Deploy a Lambda function with best-practice configuration
  steps:
    - name: validate_code
      action: run_command
      command: pytest tests/
      on_failure: abort
    - name: package_function
      action: run_command
      command: zip -r function.zip .
    - name: create_or_update
      action: aws_api
      service: lambda
      method: update_function_code
      params:
        FunctionName: <function_name>
        ZipFile: <read_file:function.zip>
      fallback:
        method: create_function
        params:
          FunctionName: <function_name>
          Runtime: python3.11
          Role: <iam_role_arn>
          Handler: index.handler
          Code:
            ZipFile: <read_file:function.zip>
    - name: tag_deployment
      action: aws_api
      service: lambda
      method: tag_resource
      params:
        Resource: <function_arn>
        Tags:
          DeployedBy: agent
          DeployedAt: <timestamp>

The agent reads this skill, executes each step, and logs the results. If validate_code fails, the deployment aborts. If create_or_update fails because the function doesn’t exist, the fallback creates it. The final step tags the resource for tracking.
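The linear execution model with a fallback can be sketched as a small step runner. This is an illustration under assumptions, not the runtime's implementation: `run_skill` is a hypothetical name, actions are reduced to plain strings, the caller supplies the `execute` function, and every unrecovered failure stops the run (the skill above only marks `validate_code` with `on_failure: abort`).

```python
def run_skill(steps, execute):
    """Run steps in order and return (succeeded, log).

    `execute(action)` is a caller-supplied callable that returns True on success.
    A failed step tries its optional "fallback" action once. If the step still
    fails, the skill stops and later steps never run.
    """
    log = []
    for step in steps:
        ok = execute(step["action"])
        if not ok and "fallback" in step:
            log.append((step["name"], "primary_failed"))
            ok = execute(step["fallback"])
        log.append((step["name"], "ok" if ok else "failed"))
        if not ok:
            return False, log
    return True, log


# Hypothetical stand-ins for the four steps in the skill above.
STEPS = [
    {"name": "validate_code", "action": "pytest"},
    {"name": "package_function", "action": "zip"},
    {
        "name": "create_or_update",
        "action": "update_function_code",
        "fallback": "create_function",
    },
    {"name": "tag_deployment", "action": "tag_resource"},
]


def fake_execute(action):
    # Simulate a first deployment: the function does not exist yet, so update fails.
    return action != "update_function_code"


succeeded, log = run_skill(STEPS, fake_execute)
print(succeeded, log)
```

Swapping `fake_execute` for one that fails on `pytest` stops the run after the first step, which mirrors the abort behavior described above.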

MCP Servers and Live Data Connections

MCP servers provide runtime access to external data. The AWS Labs repo includes servers for pricing, documentation, and service quotas. When an agent needs to estimate costs, it queries the pricing server. When it needs to check quota limits, it queries the service quotas server.

The server interface is simple: the agent sends a request, the server returns JSON. The agent runtime handles authentication, retries, and error handling. You can write custom servers for internal APIs, databases, or third-party services.
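As a rough picture of the request/JSON exchange with retries, the sketch below wraps a server call in exponential backoff. It assumes nothing about the real MCP transport: `send` is any callable that returns a JSON string, and `call_with_backoff` and `flaky_pricing_server` are hypothetical names used only for this example.

```python
import json
import time


def call_with_backoff(send, request, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Send a request, retrying on ConnectionError with exponential backoff.

    `send(request)` is a caller-supplied callable that returns a JSON string.
    Delays double on each retry: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(max_attempts):
        try:
            return json.loads(send(request))
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Hypothetical server stub that fails twice, then answers.
attempts = {"count": 0}


def flaky_pricing_server(request):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("server unreachable")
    return json.dumps({"service": request["service"], "status": "ok"})


delays = []  # record the delays instead of actually sleeping
response = call_with_backoff(flaky_pricing_server, {"service": "lambda"}, sleep=delays.append)
print(response, delays)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.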

Observability and Failure Modes

CloudWatch and CloudTrail integration gives you visibility into agent behavior. Every skill invocation, API call, and hook execution is logged. You can set up alarms for high error rates, long execution times, or unexpected API calls. Cost tracking via MCP servers and CloudWatch metrics helps enterprises enforce budget policies and audit agent spending.

Common failure modes:

| Failure Mode | Cause | Mitigation |
|---|---|---|
| Skill execution timeout | Long-running commands or API calls | Define timeout thresholds in skill YAML; use CloudWatch alarms to detect long-running executions |
| IAM permission denied | Agent role lacks required permissions | Use least-privilege policies, test skills in sandbox |
| MCP server unreachable | Network issues or server downtime | Implement retries with exponential backoff, cache responses |
| Hook execution error | Bug in custom hook code | Validate hooks in CI, use try-catch blocks |

The agent runtime should fail fast and log errors clearly. If a skill fails, the agent should stop execution and report the failure to the operator. If a hook fails, the agent should decide whether to continue or abort based on the hook’s criticality.
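One way to express the continue-or-abort decision is a per-hook criticality flag. The sketch below is an assumption-heavy illustration: the `critical` field is not part of the manifest shown earlier, and `run_hooks` is a hypothetical helper, not a runtime API.

```python
def run_hooks(hooks):
    """Run hooks in order and return (should_continue, errors).

    Each hook is a dict with a "name", a zero-argument "handler", and a
    "critical" flag. A failing critical hook aborts immediately. A failing
    non-critical hook is recorded and the run continues.
    """
    errors = []
    for hook in hooks:
        try:
            hook["handler"]()
        except Exception as exc:  # broad on purpose: custom hook code may raise anything
            errors.append(f"{hook['name']}: {exc}")
            if hook["critical"]:
                return False, errors
    return True, errors


def failing_handler():
    raise RuntimeError("pricing lookup failed")


# Hypothetical hooks: the "critical" field is an assumption for this sketch.
ok_noncritical, errs1 = run_hooks(
    [{"name": "cost_estimate", "handler": failing_handler, "critical": False}]
)
ok_critical, errs2 = run_hooks(
    [{"name": "approval_gate", "handler": failing_handler, "critical": True}]
)
print(ok_noncritical, ok_critical)  # True False
```

Either way the error text is collected, so the operator sees the failure even when the run continues.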

Deployment Shape

The typical deployment shape is a local agent runtime that loads plugins from a local directory or remote registry. Claude Code, Cursor, and Codex are currently supported. The runtime authenticates to AWS using the operator’s IAM credentials or an assumed role, then executes skills and calls MCP servers on behalf of the operator.

Security boundaries:

  • The agent runtime runs with limited IAM permissions. It can only invoke skills that are explicitly allowed.
  • Skills declare their required IAM actions. The runtime checks permissions before execution.
  • MCP servers authenticate using IAM or API keys. The runtime does not expose credentials to the agent.
  • Hooks run in isolated environments. They cannot access the agent’s memory or state.
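The pre-execution permission check in the second bullet can be approximated locally by comparing a skill's declared actions against the patterns the agent role allows. This is a hedged sketch: `missing_permissions` is a hypothetical helper, the allowed patterns are made up for the example, and a local comparison like this does not replace IAM's own evaluation at call time.

```python
from fnmatch import fnmatchcase


def missing_permissions(required, allowed_patterns):
    """Return the required IAM actions that no allowed pattern matches.

    Patterns may use "*" as in IAM action strings, for example "lambda:*".
    Both sides are lowercased because IAM action names are case-insensitive.
    """
    return [
        action
        for action in required
        if not any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in allowed_patterns
        )
    ]


# Declared permissions from the illustrative manifest; allowed patterns are hypothetical.
required = ["lambda:CreateFunction", "lambda:UpdateFunctionCode", "iam:PassRole"]
allowed = ["lambda:*"]

gaps = missing_permissions(required, allowed)
print(gaps)  # ['iam:PassRole']
```

A non-empty result is a reason to stop before the first API call, rather than failing midway through a deployment.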

For production workflows, you could deploy the agent runtime as a Lambda function or ECS task. The runtime would load plugins from S3, execute skills in response to events (CodePipeline triggers, CloudWatch alarms), and write results to DynamoDB or CloudWatch Logs. This is a logical extension of the local runtime model, though the Labs repo focuses on local development.

Labs Repo vs. Agent Toolkit

The Labs repo is useful for experimentation and custom plugins. The Agent Toolkit for AWS is the production successor and includes skills that have been evaluated for accuracy and effectiveness. Labs repo skills are community-contributed and unvetted. If you’re building production software, use the Agent Toolkit. If you’re prototyping or building custom capabilities, the Labs repo is a good starting point.

Technical Verdict

Use AWS Agent Plugins if:

  • You have multiple AWS deployment workflows that need standardization across teams.
  • You need audit trails for compliance and must distinguish agent actions from human actions in CloudTrail logs.
  • You run agents in production and need IAM boundaries, approval gates, and cost tracking.
  • You want to encode AWS deployment expertise as versioned, auditable capabilities instead of pasting guidance into prompts.

Avoid AWS Agent Plugins if:

  • You need single-shot prompting or simple task automation without IAM audit requirements.
  • Your agents run outside the AWS ecosystem. The IAM integration and CloudWatch plumbing are AWS-specific.
  • You need cross-cloud support. You’ll need to abstract the observability layer for GCP, Azure, or on-premises environments.
  • You need to support multiple LLM providers without AWS-specific dependencies.

The plugin model adds unnecessary complexity for simple prompting tasks but becomes valuable when you need to enforce approval gates, track costs, and standardize agent behavior across teams. The production toolkit is the better choice for enterprise deployments.