Managing 40+ Agent Skills: Why .agents Folders and Custom Prompts Need Version Control

Skill accumulation is the quiet tax on every AI coding agent session. You start with three custom prompts. A year later you have forty-something. Every Claude Code session loads all of them. Every Codex run loads all of them. Every .agents-based tool you try loads all of them.

The folders are global. The work is not.

This is the unsexy plumbing problem that hits every team after the honeymoon phase with AI coding agents ends. Here is what breaks, what the filesystem patterns actually look like, and what a minimal solution needs to handle.

How Skill Bleed Happens

Skills arrive in trickles:

Work skills for internal tooling (deploy helpers, runbook scaffolders, API auth wrappers)
Personal project skills (SQLite migration helpers, blog post drafters, GitHub issue templates)
Research skills (PDF-to-notes flows, arXiv summarizers, citation cleaners)
Frontend skills (Tailwind class organizers, React component scaffolders, accessibility linters)
Backend skills (Postgres index analyzers, OpenAPI spec generators)
Experimental skills you built once, used twice, never deleted

By the time you notice the problem, the count is north of forty. Some you use every day. Some you used once. All of them are visible to your agents at startup.

The Actual Cost

The cost is not just startup latency (though that is real). It is the wrong skill suggested at the wrong moment:

Claude drafts a personal email and suggests your company’s email templating skill
Codex pulls up internal database schema documentation while you work on a personal SQLite project
Half-finished experimental skills surface in autocomplete and pollute the suggestion space
Context windows fill with skill descriptions that have nothing to do with the current task

Filesystem Patterns and Scope Differences

Different tools use different conventions for loading custom instructions:

Tool	Location	Scope	Precedence
Claude Code	`CLAUDE.md`, `.claude/skills/`	Global (`~/.claude/`) or project	Merges with global
Codex	`.agents/` folder	Global or project	Merges with global
Cursor	`.cursorrules`	Project-level	Overrides global
Aider	`.aider.conf.yml`	Project-level	Overrides global

The problem: most tools load everything in the global folder, then merge or override with project-level files. There is no built-in namespacing, no dependency resolution, no way to say “only load these five skills for this project.”

What the Folder Structure Looks Like

A typical .agents folder after a year:

~/.agents/
├── work/
│   ├── deploy-helper.md
│   ├── runbook-scaffold.md
│   └── api-auth-wrapper.md
├── personal/
│   ├── blog-drafter.md
│   ├── sqlite-migration.md
│   └── github-issue-template.md
├── research/
│   ├── pdf-to-notes.md
│   ├── arxiv-summarizer.md
│   └── citation-cleaner.md
├── frontend/
│   ├── tailwind-organizer.md
│   ├── react-scaffold.md
│   └── a11y-linter.md
├── backend/
│   ├── postgres-index-analyzer.md
│   └── openapi-generator.md
└── experiments/
    ├── half-finished-thing-1.md
    ├── half-finished-thing-2.md
    └── ...

Every tool loads every file. Subdirectories do not create namespaces. You cannot tell Claude Code to only load work/ skills for work projects.
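To make the missing namespace concrete, here is a minimal Python sketch of the filter the tools do not apply: treat the first subdirectory under the skills root as a namespace and keep only the requested ones. The function name `skills_in_namespaces` is hypothetical, and it assumes skills are `.md` files laid out as in the tree above. It is not taken from any tool or from the author's Go CLI.

```python
from pathlib import Path


def skills_in_namespaces(root: Path, namespaces: set[str]) -> list[Path]:
    """Return skill files whose top-level subdirectory is in `namespaces`."""
    selected = []
    for path in sorted(root.rglob("*.md")):
        # The first path component under root acts as the namespace.
        parts = path.relative_to(root).parts
        if len(parts) > 1 and parts[0] in namespaces:
            selected.append(path)
    return selected
```

Calling `skills_in_namespaces(Path.home() / ".agents", {"work"})` would return only the files under `work/`, which is the behavior a work project actually wants.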

Sync Problems at Scale

When the same skill needs to be available in VS Code, CLI agents, and web UIs, you hit three sync problems:

Format drift: Claude Code expects Markdown with specific frontmatter. Codex expects plain text with YAML headers. Cursor expects a different YAML schema.
Version skew: You update a skill in one location, forget to update it in another. Now you have two versions of the same logic.
Deletion lag: You remove a skill from one tool but forget to remove it from another. It keeps surfacing in the wrong context.

Most people solve this with symlinks or shell scripts. Both break in predictable ways:

Symlinks break when you move directories or switch machines
Shell scripts require manual invocation and do not handle conflicts
Neither approach handles format translation or version tracking
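The symlink failure is at least easy to detect. The sketch below walks a skills folder and reports links whose targets no longer exist. The helper name `dangling_symlinks` is an illustrative assumption, not part of any tool mentioned here.

```python
import os
from pathlib import Path


def dangling_symlinks(root: Path) -> list[Path]:
    """Return symlinks under `root` whose targets no longer exist."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # exists() follows the link, so it is False for a dangling symlink.
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)
```

Running this after moving a directory or setting up a new machine shows which skills silently disappeared.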

What a Minimal Solution Needs

After trying symlinks, scripts, and manual copying, the author built a small Go CLI. Here is what it handles:

Core Requirements

Namespace isolation: Only load skills tagged for the current project or context
Format translation: Convert between tool-specific formats (Markdown to YAML, YAML to plain text)
Version tracking: Detect when a skill has been updated in one location but not another
Conflict resolution: Handle cases where the same skill exists in multiple locations with different content

Implementation Shape

The CLI uses a manifest file to track skill metadata:

skills:
  - name: deploy-helper
    tags: [work, backend]
    locations:
      - ~/.agents/work/deploy-helper.md
      - ~/work-project/.clinerules/deploy-helper.md
    checksum: abc123
    last_updated: 2026-05-20T10:30:00Z
  
  - name: blog-drafter
    tags: [personal, writing]
    locations:
      - ~/.agents/personal/blog-drafter.md
      - ~/blog/.cursorrules/blog-drafter.md
    checksum: def456
    last_updated: 2026-05-19T14:22:00Z
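
One way to model a manifest entry in memory, sketched in Python. The field names follow the manifest above. The `SkillEntry` class, the helper names, and the choice of SHA-256 are assumptions, since the article does not say how the CLI computes its checksums.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillEntry:
    name: str
    tags: list[str]
    locations: list[str]
    checksum: str = ""
    last_updated: str = ""


def file_checksum(path: Path) -> str:
    """Hex SHA-256 of the file's bytes (the hash algorithm is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def skewed_locations(entry: SkillEntry) -> list[str]:
    """Return locations that are missing or no longer match entry.checksum."""
    skewed = []
    for loc in entry.locations:
        path = Path(loc).expanduser()
        if not path.exists() or file_checksum(path) != entry.checksum:
            skewed.append(loc)
    return skewed
```

An empty result means every copy of the skill matches the recorded checksum.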

When you run the CLI in a project directory, it:

Reads the manifest
Filters skills by tags (project-level .skill-tags file or CLI flag)
Checks checksums to detect version skew
Copies or translates files to the appropriate tool-specific locations
Updates the manifest with new checksums and timestamps
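
The filtering and skew-check steps can be sketched as a pure planning function. This is an illustration under stated assumptions, not the author's Go implementation: the manifest is already parsed into a dict, a skill is selected when it shares at least one tag with the request, and `actual` maps each location to the checksum currently on disk. The name `plan_sync` is hypothetical.

```python
def plan_sync(manifest: dict, wanted_tags: set[str], actual: dict[str, str]) -> list[tuple[str, str]]:
    """Return (skill name, location) pairs that need to be refreshed."""
    plan = []
    for skill in manifest["skills"]:
        # Tag filter: skip skills that share no tag with the request.
        if not wanted_tags & set(skill["tags"]):
            continue
        for loc in skill["locations"]:
            # Skew check: a missing or different checksum marks the copy stale.
            if actual.get(loc) != skill["checksum"]:
                plan.append((skill["name"], loc))
    return plan
```

Keeping the plan separate from the copy step makes a dry-run mode trivial: print the plan and stop.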

Command Surface

# Sync skills for the current project
skill-sync sync --tags work,backend

# List all skills and their locations
skill-sync list

# Add a new skill
skill-sync add --name postgres-index-analyzer --tags work,backend --file ./skill.md

# Detect version skew
skill-sync check

# Remove a skill from all locations
skill-sync remove --name half-finished-thing-1
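
For readers who want to prototype the same command surface, a rough Python `argparse` equivalent might look like the following. The subcommands and flags mirror the examples above. Everything else, including the comma-splitting of `--tags`, is an assumption about how the real CLI behaves.

```python
import argparse


def split_tags(value: str) -> list[str]:
    """Turn 'work,backend' into ['work', 'backend']."""
    return [tag for tag in value.split(",") if tag]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="skill-sync")
    sub = parser.add_subparsers(dest="command", required=True)

    sync = sub.add_parser("sync", help="sync skills for the current project")
    sync.add_argument("--tags", type=split_tags, default=[])

    sub.add_parser("list", help="list all skills and their locations")

    add = sub.add_parser("add", help="register a new skill")
    add.add_argument("--name", required=True)
    add.add_argument("--tags", type=split_tags, default=[])
    add.add_argument("--file", required=True)

    sub.add_parser("check", help="detect version skew")

    remove = sub.add_parser("remove", help="remove a skill from all locations")
    remove.add_argument("--name", required=True)
    return parser
```

For example, `build_parser().parse_args(["sync", "--tags", "work,backend"])` yields a namespace with `command` set to `"sync"` and `tags` set to `["work", "backend"]`.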

Failure Modes

Even with tooling, you hit edge cases:

Tag explosion: If every skill gets unique tags, you end up with tag management overhead that rivals the original problem
Manifest drift: If the manifest gets out of sync with the filesystem, the CLI becomes a source of confusion instead of clarity
Format translation bugs: Converting between Markdown, YAML, and plain text is error-prone. You need tests for every format pair.
Concurrent updates: If you update a skill in two locations simultaneously, the CLI needs a merge strategy (last-write-wins, manual resolution, or error-and-halt)
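
The concurrent-update case reduces to a three-way comparison: the checksum recorded in the manifest is the base, and each copy either still matches it or has diverged. The sketch below shows the error-and-halt flavor, where only a true two-sided divergence is reported as a conflict. The names `Resolution` and `resolve` are illustrative.

```python
from enum import Enum


class Resolution(Enum):
    UNCHANGED = "unchanged"
    TAKE_A = "take_a"
    TAKE_B = "take_b"
    CONFLICT = "conflict"


def resolve(base: str, a: str, b: str) -> Resolution:
    """Compare two copies' checksums against the manifest's recorded checksum."""
    if a == b:
        # Identical copies: nothing to do, or both received the same edit.
        return Resolution.UNCHANGED if a == base else Resolution.TAKE_A
    if a == base:
        return Resolution.TAKE_B  # only copy B changed
    if b == base:
        return Resolution.TAKE_A  # only copy A changed
    return Resolution.CONFLICT  # both changed differently: halt for manual resolution
```

A last-write-wins strategy would replace the final branch with a timestamp comparison, at the cost of silently discarding one edit.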

When Skills Need Dependency Resolution

Some skills depend on other skills. A “deploy-helper” skill might reference a “runbook-scaffold” skill. A “blog-drafter” skill might reference a “citation-cleaner” skill.

Most tools do not handle this. You end up with:

Duplicate logic across skills
Broken references when you rename or remove a dependency
No way to enforce that dependent skills are loaded together

A minimal dependency system adds a depends_on field to the manifest:

skills:
  - name: deploy-helper
    tags: [work, backend]
    depends_on: [runbook-scaffold]
    locations: [...]

The CLI then ensures that when you sync deploy-helper, it also syncs runbook-scaffold.
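
A sketch of how that closure could be computed, assuming the manifest is parsed into a dict with an optional `depends_on` list per skill. The depth-first walk returns dependencies before their dependents and rejects cycles and unknown names. The function name `sync_closure` is hypothetical, and the article does not describe how the real CLI orders or validates dependencies.

```python
def sync_closure(manifest: dict, requested: list[str]) -> list[str]:
    """Return requested skills plus transitive depends_on, dependencies first."""
    by_name = {s["name"]: s for s in manifest["skills"]}
    ordered: list[str] = []
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in ordered:
            return  # already scheduled
        if name not in by_name:
            raise KeyError(f"unknown skill: {name}")
        if name in visiting:
            raise ValueError(f"dependency cycle at: {name}")
        visiting.add(name)
        for dep in by_name[name].get("depends_on", []):
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered
```

Raising on an unknown name is what surfaces the broken-reference problem at sync time instead of mid-session.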

Security Boundaries

Skills are effectively code. Their instructions run with the same permissions as your agent, so a malicious skill can direct it to:

Exfiltrate environment variables
Modify files
Make network requests
Execute arbitrary shell commands

Most tools do not sandbox skill execution. The only protections are:

(1) Manual review: Read every skill before you load it
(2) Checksum verification: Detect when a skill has been modified
(3) Source tracking: Know where each skill came from

The manifest helps with (2) and (3). It does not help with (1). You still need to read the code.
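
Checksum verification can be turned into a hard gate: refuse to hand a skill to the agent unless it still matches the checksum recorded when it was reviewed. This is a minimal sketch. The function name `load_reviewed_skill`, the use of SHA-256, and raising `PermissionError` are all assumptions for illustration.

```python
import hashlib
import hmac
from pathlib import Path


def load_reviewed_skill(path: Path, reviewed_sha256: str) -> str:
    """Return the skill text only if it still matches the reviewed checksum."""
    data = path.read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(actual, reviewed_sha256):
        raise PermissionError(f"{path} changed since review")
    return data.decode("utf-8")
```

The gate only proves the file is unchanged since someone read it. It says nothing about whether that reading was careful.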

Observability Gaps

When a skill misbehaves, you need to know:

Which skill was active during the session
What the skill’s content was at that moment
Whether the skill was loaded from the global folder or a project-level override
Whether the skill had dependencies that were also loaded

Most tools do not log this. You end up debugging by:

Manually inspecting the .agents folder
Grepping through session logs for skill names
Reproducing the session with different skill combinations to isolate the problem

A better approach: the CLI generates a session manifest at sync time that lists every skill that will be loaded, its checksum, and its dependencies. You can then include this manifest in bug reports or session logs.
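
A session manifest of that kind could be as simple as a JSON document. In the sketch below, the field names, the `origin` value (global folder versus project override), and the function name `session_manifest` are assumptions, not the format the author's CLI emits.

```python
import json
from datetime import datetime, timezone


def session_manifest(skills: list[dict], origin: dict[str, str]) -> str:
    """Serialize the skills selected for a session as JSON."""
    record = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "skills": [
            {
                "name": s["name"],
                "checksum": s["checksum"],
                "depends_on": s.get("depends_on", []),
                # Where the copy came from: "global" or "project".
                "origin": origin.get(s["name"], "unknown"),
            }
            for s in skills
        ],
    }
    return json.dumps(record, indent=2, sort_keys=True)
```

Writing this string next to the session log answers the four questions above without any grepping.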

Technical Verdict

Use this approach when:

You have more than a dozen skills and they span multiple contexts (work, personal, research)
You use multiple AI coding tools (Claude Code, Codex, Cursor, Aider) and need the same skills available in all of them
You work on multiple machines and need skills to stay in sync
You collaborate with a team and need to share skill libraries

Avoid this approach when:

You have fewer than ten skills and they are all relevant to every project
You only use one AI coding tool and it has good built-in skill management
Your skills are highly project-specific and do not need to be shared across contexts
You prefer manual control and do not mind copying files by hand

The real win is not the CLI. It is the manifest. Once you have a machine-readable record of what skills exist, where they live, and what they depend on, you can build tooling around it. The CLI is just the first step.

Source Links

Original article on dev.to