Anthropic's Agent Skills: How Claude Loads Instructions, Scripts, and Resources on Demand

Anthropic just open-sourced its Skills repository, exposing the production plumbing behind Claude’s document creation capabilities. The system treats each skill as a self-contained folder holding a SKILL.md manifest plus bundled scripts and resources. This is not a function-calling wrapper: it is a runtime loader that injects specialized instructions and tooling into Claude’s context on demand.

The repository (135K+ stars, trending #2 on GitHub) includes both Apache 2.0 open-source skills and source-available reference implementations for docx, pdf, pptx, and xlsx generation. The architecture reveals how Anthropic bridges static documentation and dynamic execution in production.

The SKILL.md Format

Each skill lives in its own folder. The SKILL.md file is the contract:

---
name: "Document Creator"
description: "Create and edit DOCX files"
version: "1.0.0"
capabilities:
  - document_creation
  - formatting
  - template_application
---

# Instructions

When the user requests a document, follow these steps:
1. Confirm document type and structure
2. Apply brand guidelines if provided
3. Generate using the bundled script
4. Validate output format

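The manifest combines YAML frontmatter with natural-language instructions. In Anthropic’s published format, name and description are the required frontmatter fields, and the description is what Claude relies on to decide when a skill applies; the version and capabilities fields in this example are best read as illustrative extra metadata. Claude uses the frontmatter for routing decisions and reads the instructions as context augmentation. This is closer to dynamic prompt injection than traditional tool registration.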
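To make the two-part structure concrete, here is a minimal sketch of splitting a SKILL.md string into frontmatter and instruction body. The function name `split_skill_manifest` is hypothetical, and the parser only handles the small subset of YAML used in this article (quoted scalars and flat lists); a real loader would presumably use a full YAML library.

```python
from typing import Dict, List, Tuple, Union

FrontmatterValue = Union[str, List[str]]


def split_skill_manifest(text: str) -> Tuple[Dict[str, FrontmatterValue], str]:
    """Split a SKILL.md string into (frontmatter dict, instruction body).

    Handles only `key: "value"` pairs and simple `- item` lists.
    This is an illustrative subset, not a YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---'") from None

    meta: Dict[str, FrontmatterValue] = {}
    current_key = None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # List item belonging to the most recently seen key.
            value = meta.get(current_key)
            if not isinstance(value, list):
                value = []
                meta[current_key] = value
            value.append(stripped[2:].strip().strip('"'))
        else:
            key, _, raw = stripped.partition(":")
            current_key = key.strip()
            meta[current_key] = raw.strip().strip('"')

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


EXAMPLE = '''---
name: "Document Creator"
description: "Create and edit DOCX files"
version: "1.0.0"
capabilities:
  - document_creation
  - formatting
---

# Instructions

When the user requests a document, confirm its type first.
'''

meta, body = split_skill_manifest(EXAMPLE)
print(meta["name"], meta["capabilities"])
print(body.splitlines()[0])
```

The metadata dict is what a router would inspect; the body is what would be placed into the model’s context.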

Loading and Execution Flow

Skills are loaded at runtime based on task detection or explicit user request. The flow:

Task analysis: Claude identifies required capabilities from the user prompt.
Skill selection: The runtime matches capabilities to available skills in the repository.
Context injection: The SKILL.md instructions are prepended to the conversation context.
Resource binding: Bundled scripts, templates, and data files become available to the execution layer.
Tool invocation: If the skill includes executable components, Claude calls them through the standard tool interface.

The key difference from function calling: the instructions are natural language, not JSON schemas. Claude interprets the skill’s guidance rather than conforming to a rigid API contract.
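The selection and injection steps above can be sketched in a few lines. This is an assumption-heavy illustration, not Anthropic’s runtime: the `Skill` dataclass, `select_skills`, and `build_context` are hypothetical names, and matching is reduced to simple set intersection on capability strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Skill:
    name: str
    capabilities: FrozenSet[str]
    instructions: str


def select_skills(required: Set[str], available: List[Skill]) -> List[Skill]:
    """Return every skill that offers at least one required capability."""
    return [s for s in available if s.capabilities & required]


def build_context(user_prompt: str, selected: List[Skill]) -> str:
    """Place the selected skills' instructions ahead of the user prompt."""
    sections = [f"## Skill: {s.name}\n{s.instructions}" for s in selected]
    return "\n\n".join(sections + [f"## User request\n{user_prompt}"])


skills = [
    Skill("Document Creator",
          frozenset({"document_creation", "formatting"}),
          "Confirm document type and structure."),
    Skill("CSV Analyzer",
          frozenset({"data_analysis"}),
          "Run analyze.py on the uploaded file."),
]

chosen = select_skills({"data_analysis"}, skills)
context = build_context("Summarize sales.csv", chosen)
print(context)
```

Note that the injected text is free-form prose. Nothing here validates arguments against a schema, which is exactly the contrast with function calling.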

State Management and Boundaries

Skills are stateless by design: each invocation starts fresh. A skill that needs state across turns (for example, tracking edits to a document) has to carry it explicitly, through some combination of the following:

Return state as part of the tool output
Accept state as an input parameter on subsequent calls
Rely on the conversation history for continuity

This keeps skills composable but pushes state management up to the orchestration layer. The runtime does not provide built-in session storage or cross-skill state sharing.
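A small sketch of that explicit state passing, assuming a hypothetical `apply_edit` skill function and a JSON-serializable state dict. The orchestration layer, not the skill, is responsible for holding the state between turns.

```python
import json
from typing import Any, Dict, Optional


def apply_edit(edit: str, state: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Apply one edit and return the full updated state.

    Nothing is stored between calls: the caller must feed the returned
    state back in on the next invocation.
    """
    previous = state or {"edits": [], "revision": 0}
    # Build a new dict rather than mutating the caller's copy.
    return {
        "edits": previous["edits"] + [edit],
        "revision": previous["revision"] + 1,
    }


# Turn 1: no prior state.
state = apply_edit("Add title page")
# The orchestration layer serializes the state into the tool output...
wire = json.dumps(state)
# ...and passes it back as an input parameter on turn 2.
state = apply_edit("Insert table of contents", json.loads(wire))
print(state)
```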

Security Boundaries

The repository includes both instructions (safe to inject into context) and executable scripts (require sandboxing). The boundary:

Component	Trust Level	Execution Context
SKILL.md instructions	User-controlled, context-injected	LLM inference
Bundled Python scripts	Sandboxed, tool-invoked	Isolated runtime
Resource files (templates, data)	Read-only, loaded on demand	Skill-specific namespace
User-provided inputs	Untrusted, validated per skill	Passed through tool parameters

Scripts run in a separate execution layer with restricted filesystem access and no network by default. The skill manifest does not declare permissions. Anthropic’s runtime enforces a fixed sandbox policy.
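If you host skills yourself, you have to supply that execution layer. The sketch below shows one narrow piece of it: running a bundled script in a throwaway working directory with a trimmed environment and a timeout. The function name `run_skill_script` and the environment allowlist are assumptions, and this is not a security sandbox. It does not block network or filesystem access, which would need containers or another OS-level mechanism.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List

# Assumed allowlist: just enough for the interpreter to start on common platforms.
ALLOWED_ENV = ("PATH", "SYSTEMROOT", "LD_LIBRARY_PATH")


def run_skill_script(script: Path, args: List[str],
                     timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Run a script in a fresh temp directory with a minimal environment."""
    env = {k: os.environ[k] for k in ALLOWED_ENV if k in os.environ}
    with tempfile.TemporaryDirectory(prefix="skill-") as workdir:
        return subprocess.run(
            # -I runs Python in isolated mode (ignores PYTHON* vars and user site).
            [sys.executable, "-I", str(script.resolve()), *args],
            cwd=workdir,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )


# Demo with a stand-in script that reports its argument and working directory.
with tempfile.TemporaryDirectory() as skill_dir:
    script = Path(skill_dir) / "hello.py"
    script.write_text(
        "import os, sys\n"
        "print(sys.argv[1], os.path.basename(os.getcwd()).startswith('skill-'))\n"
    )
    result = run_skill_script(script, ["ok"])

print(result.stdout.strip())
```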

Versioning and Conflicts

Skills declare a version in the frontmatter, but the repository does not enforce semantic versioning or dependency resolution. If two skills provide overlapping capabilities:

The runtime may load both and let Claude choose based on context.
Explicit skill names in the user prompt override automatic selection.
No built-in conflict detection exists. Overlapping instructions can confuse the model.

For production use, you need external tooling to:

Pin skill versions in your deployment manifest
Test skill combinations for instruction conflicts
Monitor which skills Claude actually invokes per session
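The first two items in that list are easy to prototype. The following sketch assumes skill metadata has already been parsed into plain dicts and that your deployment manifest is a simple name-to-version mapping; `find_capability_overlaps` and `find_version_drift` are hypothetical helpers, not part of the repository.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def find_capability_overlaps(skills: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each capability claimed by more than one skill to those skills."""
    owners = defaultdict(list)
    for name, meta in skills.items():
        for capability in meta.get("capabilities", []):
            owners[capability].append(name)
    return {cap: sorted(names) for cap, names in owners.items() if len(names) > 1}


def find_version_drift(skills: Dict[str, dict],
                       pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {skill: (pinned, found)} for every skill that violates its pin."""
    drift = {}
    for name, pinned in pins.items():
        found = skills.get(name, {}).get("version")
        if found != pinned:
            drift[name] = (pinned, found)
    return drift


installed = {
    "docx": {"version": "1.0.0", "capabilities": ["document_creation", "formatting"]},
    "report-writer": {"version": "2.1.0", "capabilities": ["document_creation"]},
    "csv-analyzer": {"version": "1.0.0", "capabilities": ["data_analysis"]},
}
pins = {"docx": "1.0.0", "report-writer": "2.0.0"}

overlaps = find_capability_overlaps(installed)
drift = find_version_drift(installed, pins)
print(overlaps)
print(drift)
```

A check like this only flags overlapping capability labels. Whether the instructions themselves conflict still has to be tested by running the skills together.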

Production Skills: Document Creation

The source-available document skills (docx, pdf, pptx, xlsx) show the production pattern:

Each skill bundles a Python library (python-docx, reportlab, python-pptx, openpyxl).
The SKILL.md includes detailed formatting instructions, not just API calls.
Skills return file paths or base64-encoded blobs, not in-memory objects.
Error handling is explicit in the instructions: “If the template is missing, ask the user for a replacement.”

These skills power Claude’s file creation UI. The instructions are verbose (hundreds of lines) because they encode formatting preferences, edge case handling, and user interaction patterns.

Code Example: Minimal Custom Skill

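# skills/csv_analyzer/analyze.py
import json
import sys

import pandas as pd


def analyze_csv(file_path):
    """Return row count, column names, and summary statistics for a CSV file."""
    df = pd.read_csv(file_path)
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "summary": df.describe().to_dict(),
    }


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: analyze.py <file.csv>")
    result = analyze_csv(sys.argv[1])
    # Emit JSON (not a Python dict repr) so the caller can parse the output.
    print(json.dumps(result, default=str))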

# skills/csv_analyzer/SKILL.md
---
name: "CSV Analyzer"
description: "Analyze CSV files and return summary statistics"
version: "1.0.0"
capabilities:
  - data_analysis
---

# Instructions

When the user uploads a CSV file and asks for analysis:
1. Confirm the file path is accessible
2. Run the analyze.py script with the file path as the argument
3. Parse the JSON output
4. Present the summary in a readable table format
5. If the file is malformed, explain the error and suggest fixes

The skill is a folder with two files. Claude reads the instructions, invokes the script via tool call, and formats the output per the guidance.

Observability Gaps

The repository does not include:

Logging hooks for skill invocations
Metrics on which skills are loaded per session
Tracing for multi-skill workflows
Failure telemetry when a script errors out

You need to instrument the runtime yourself. Anthropic’s production system likely logs skill selection and execution, but that layer is not open-sourced.
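A starting point for that instrumentation is a thin wrapper around whatever function invokes a skill. The decorator below is a sketch using only the standard library; `traced_skill` and the log field names are assumptions, and `count_rows` is a stand-in for a real skill script.

```python
import functools
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("skills")


def traced_skill(skill_name: str) -> Callable:
    """Decorator that logs duration and outcome of each skill invocation."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                elapsed_ms = (time.perf_counter() - start) * 1000
                # logger.exception also records the traceback.
                logger.exception("skill=%s status=error duration_ms=%.1f",
                                 skill_name, elapsed_ms)
                raise
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("skill=%s status=ok duration_ms=%.1f",
                        skill_name, elapsed_ms)
            return result
        return wrapper
    return decorator


@traced_skill("csv-analyzer")
def count_rows(csv_text: str) -> int:
    # Stand-in for a real skill script: counts data rows after the header.
    return max(len(csv_text.strip().splitlines()) - 1, 0)


logging.basicConfig(level=logging.INFO)
print(count_rows("a,b\n1,2\n3,4\n"))
```

The same wrapper shape extends to metrics counters or trace spans if you already run an observability stack.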

Deployment Shape

Skills are deployed as a folder tree. The runtime scans the directory at startup and indexes skills by capability. For cloud deployments:

Package skills into a Docker image or mount them as a volume.
Use a CDN or object storage for large resource files (templates, datasets).
Version the entire skills directory as a single artifact.
Test skill combinations in staging before promoting to production.

Hot-reloading is not supported. Skill changes require a runtime restart.
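One way to treat the whole skills directory as a single versioned artifact is to derive a content digest from the tree and record it at deploy time. This is a sketch under that assumption; `skills_tree_digest` is a hypothetical helper, and the hashing scheme (relative path, length, then bytes) is just one reasonable choice.

```python
import hashlib
import tempfile
from pathlib import Path


def skills_tree_digest(root: Path) -> str:
    """Return a SHA-256 digest over every file path and content under root.

    Files are visited in sorted order of their relative POSIX path so the
    digest is deterministic for identical trees.
    """
    digest = hashlib.sha256()
    files = [p for p in root.rglob("*") if p.is_file()]
    for path in sorted(files, key=lambda p: p.relative_to(root).as_posix()):
        data = path.read_bytes()
        relative = path.relative_to(root).as_posix()
        # Path and length header keep adjacent files from blurring together.
        digest.update(f"{relative}\0{len(data)}\0".encode("utf-8"))
        digest.update(data)
    return digest.hexdigest()


# Demo: any change to a skill file changes the artifact digest.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "csv_analyzer").mkdir()
    (root / "csv_analyzer" / "SKILL.md").write_text("# Instructions\n")
    before = skills_tree_digest(root)
    unchanged = skills_tree_digest(root)
    (root / "csv_analyzer" / "SKILL.md").write_text("# Instructions v2\n")
    after = skills_tree_digest(root)

print(before != after)
```

Comparing the recorded digest with the one computed at startup gives a cheap check that the running instance has the skills you intended to ship.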

Likely Failure Modes

Instruction ambiguity: If two skills have similar capabilities but different instruction styles, Claude may blend them incorrectly. Test overlapping skills together.

Script dependency drift: Skills bundle their own libraries. If a skill depends on a system package (like wkhtmltopdf for PDF rendering), the runtime must pre-install it. The manifest does not declare system dependencies.
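Since the manifest carries no dependency declarations, one workaround is to keep your own mapping of skills to required executables and check it before startup. The mapping format and the name `missing_system_dependencies` are assumptions made for this sketch; only `shutil.which` is doing real work.

```python
import shutil
import sys
from typing import Dict, List


def missing_system_dependencies(declared: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {skill: [executables not found on PATH]} for skills with gaps."""
    missing = {}
    for skill, executables in declared.items():
        absent = [exe for exe in executables if shutil.which(exe) is None]
        if absent:
            missing[skill] = absent
    return missing


# Hypothetical sidecar mapping maintained outside the skill manifests.
declared = {
    "skill-a": [sys.executable],                 # known to exist: this interpreter
    "skill-b": ["definitely-not-installed-xyz"], # placeholder for a missing tool
}

gaps = missing_system_dependencies(declared)
print(gaps)
```

Failing the deployment when this returns a non-empty dict surfaces the problem before a user request hits the missing binary.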

Context overflow: Loading multiple verbose skills can exhaust the context window. Monitor token usage and prune unused skills from the active set.
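A rough way to prune the active set is to estimate each skill’s token cost and keep skills in priority order until a budget is reached. The four-characters-per-token ratio below is a crude heuristic, not a real tokenizer, and `fit_skills_to_budget` is a hypothetical helper.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Very rough heuristic: about four characters per token."""
    return max(1, len(text) // 4)


def fit_skills_to_budget(ranked: List[Tuple[str, str]],
                         budget_tokens: int) -> Tuple[List[str], int]:
    """Greedily keep skills, in priority order, while they fit the budget.

    `ranked` is a list of (skill_name, instructions), most relevant first.
    Returns the kept names and the estimated tokens used.
    """
    kept: List[str] = []
    used = 0
    for name, instructions in ranked:
        cost = estimate_tokens(instructions)
        if used + cost <= budget_tokens:
            kept.append(name)
            used += cost
    return kept, used


ranked = [
    ("docx", "x" * 4000),         # ~1000 tokens
    ("pdf", "x" * 6000),          # ~1500 tokens, does not fit after docx
    ("csv-analyzer", "x" * 800),  # ~200 tokens
]

kept, used = fit_skills_to_budget(ranked, budget_tokens=1500)
print(kept, used)
```

For accurate numbers, swap the heuristic for the token counts reported by the model API you actually call.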

Cross-skill data leakage: If a skill script writes to a shared temp directory, another skill could read the output, even though neither script has escaped the sandbox. Ensure each skill gets an isolated working directory.

Technical Verdict

Use Anthropic Skills if:

You need Claude to perform specialized tasks with repeatable instructions and want to version those capabilities as self-contained folders.
Your workflows combine natural-language guidance with executable scripts, and you prefer dynamic context injection over rigid function schemas.
You can tolerate stateless execution where each skill invocation starts fresh and state must be explicitly passed between turns.
You control the deployment environment and can pre-install system dependencies that skills require but don’t declare.
You want to ship production-grade document generation (docx, pdf, pptx, xlsx) using Anthropic’s reference implementations as a starting point.

Avoid if:

You need fine-grained permission control per skill (the sandbox policy is fixed and not configurable in the manifest).
Your agent workflows require cross-skill state sharing, session persistence, or transactional guarantees across multiple skill invocations.
You expect automatic dependency resolution, conflict detection, or semantic versioning enforcement (you’ll need external tooling).
You need built-in observability for skill selection, invocation tracing, and failure telemetry (instrumentation is your responsibility).
You’re building multi-agent systems where skills need to coordinate through shared state or message passing (the architecture assumes single-agent, stateless execution).

The Skills system is a pragmatic middle ground between prompt engineering and full tool orchestration. It works well for production tasks where the instructions are stable and the execution environment is controlled. For dynamic, multi-agent workflows with complex state, you will need to layer additional orchestration on top.