Terminal agents face a fundamental problem: they need to be general enough to handle arbitrary commands but specific enough to understand your project’s conventions, deployment pipelines, and team workflows. Hardcoded tool definitions solve this for narrow use cases but break down when every user has different needs.
Gemini CLI’s skill system takes a different approach. Instead of shipping a fixed set of function definitions, it lets agents discover and load specialized instructions at runtime. Skills are self-contained directories that package context, instructions, and examples into capabilities the agent can invoke on demand.
This is not function calling. It is progressive context disclosure.
How Skill Discovery Works
Gemini CLI scans for skill directories at startup. Each skill is a folder containing a SKILL.md file that describes what the skill does, when to use it, and how to execute it. The agent does not load these files into context immediately. Instead, it builds a lightweight index of available skills.
When you issue a command, the agent evaluates whether any skill matches the task. If it finds a match, it loads that skill’s instructions into the current context window. If not, it proceeds with general-purpose reasoning.
This lazy-loading pattern keeps the context window lean. A project with 20 skills does not burn 20 skill definitions' worth of tokens on every request. You only pay for what you use.
Key components:
- Skill directory structure: Each skill lives in .gemini/skills/<skill-name>/
- SKILL.md manifest: Describes purpose, trigger conditions, and execution steps
- Optional artifacts: Scripts, templates, or config files the skill references
- Discovery index: Lightweight metadata the agent scans before deciding to load a skill
The agent uses the skill name and a short description to decide relevance. If you name a skill deploy-staging and describe it as “deploys the app to staging environment,” the agent will load it when you say “push this to staging.”
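The discovery step can be sketched in Python. This is a minimal illustration, not Gemini CLI's actual implementation: the function names `build_index` and `load_skill` are hypothetical, and the sketch assumes the description is the first non-empty, non-heading line of SKILL.md, which the real manifest format may not match.

```python
from pathlib import Path


def build_index(skills_root: Path) -> dict[str, str]:
    """Map each skill name to a one-line description, reading only the top of each manifest."""
    index = {}
    for manifest in sorted(skills_root.glob("*/SKILL.md")):
        description = ""
        with manifest.open(encoding="utf-8") as fh:
            for line in fh:
                text = line.strip()
                # Assumption: the first non-empty, non-heading line is the description.
                if text and not text.startswith("#"):
                    description = text
                    break
        # The directory name doubles as the skill name.
        index[manifest.parent.name] = description
    return index


def load_skill(skills_root: Path, name: str) -> str:
    """Read the full instructions only after a skill has been selected."""
    return (skills_root / name / "SKILL.md").read_text(encoding="utf-8")
```

Calling `build_index(Path(".gemini/skills"))` touches every manifest but keeps only a name and one line per skill. The full text is read by `load_skill` for the one skill that gets selected.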
Prompt Engineering for Skill Awareness
The agent needs to know skills exist without loading them all. Gemini CLI achieves this by injecting a skill index into the system prompt. The index lists skill names and one-line descriptions.
When the agent sees a user request, it pattern-matches against this index. If it finds a likely match, it loads the full SKILL.md file and follows its instructions.
Example skill index injection:
Available skills:
- deploy-staging: Deploy application to staging environment
- run-tests: Execute full test suite with coverage reporting
- generate-migration: Create database migration from schema changes
The agent sees this list and decides whether to invoke a skill. If you say “run the tests,” it loads run-tests. If you say “fix this bug,” it does not.
This is cheaper to set up than a function-calling API: there are no parameters to serialize, schemas to validate, or return values to handle. The agent just reads instructions and executes shell commands.
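The index injection and the matching decision can be sketched as follows. In the real agent the model makes the relevance decision. The keyword-overlap matcher here is a deliberately naive stand-in, and the names `render_skill_index`, `pick_skill`, and `STOPWORDS` are hypothetical.

```python
import re
from typing import Optional

# Small hand-picked stopword list so filler words do not count as matches.
STOPWORDS = {"the", "a", "an", "to", "this", "with", "from", "of"}


def render_skill_index(index: dict[str, str]) -> str:
    """Format the name/description pairs as the block injected into the system prompt."""
    lines = ["Available skills:"]
    lines += [f"- {name}: {description}" for name, description in index.items()]
    return "\n".join(lines)


def _tokens(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def pick_skill(request: str, index: dict[str, str]) -> Optional[str]:
    """Return the skill whose name and description share the most words with the request."""
    wanted = _tokens(request)
    best_name, best_score = None, 0
    for name, description in index.items():
        score = len(wanted & _tokens(f"{name} {description}"))
        if score > best_score:
            best_name, best_score = name, score
    return best_name  # None means: fall back to general-purpose reasoning
```

With the three skills listed above, `pick_skill("run the tests", index)` selects `run-tests`, while `pick_skill("fix this bug", index)` returns `None`. A language model matches on meaning rather than shared words, so treat this only as a picture of the control flow.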
Skill Execution Boundaries
Skills do not run in isolated sandboxes. They execute in the same shell context as the agent itself. This means a skill can modify environment variables, change directories, and leave side effects.
Execution flow:
- Agent decides a skill is relevant
- Agent loads SKILL.md into context
- Agent reads instructions and generates shell commands
- Commands execute in the current shell session
- Output returns to the agent for interpretation
There is no subprocess isolation. If a skill runs cd /tmp, the agent’s working directory changes. If a skill sets export API_KEY=..., that variable persists.
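The difference between a shared session and per-command isolation can be shown with a short Python sketch. It assumes a POSIX `sh` is available. The helper names and the `SKILL_DEMO_KEY` variable are hypothetical, and neither function is how Gemini CLI actually spawns commands.

```python
import subprocess


def run_in_shared_session(commands: list[str]) -> list[str]:
    """Run all commands in one shell process, so earlier side effects persist."""
    script = "\n".join(commands)
    proc = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return proc.stdout.strip().splitlines()


def run_isolated(commands: list[str]) -> list[str]:
    """Run each command in a fresh shell process, so side effects are discarded."""
    outputs = []
    for command in commands:
        proc = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
        outputs.append(proc.stdout.strip())
    return outputs


commands = ["export SKILL_DEMO_KEY=abc", 'echo "$SKILL_DEMO_KEY"']
shared = run_in_shared_session(commands)    # the variable is visible to echo
isolated = run_isolated(commands)           # the variable is gone by the time echo runs
```

The shared-session behavior is the one described above: whatever a skill exports or changes is still there for the next command.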
This is a deliberate trade-off. Terminal agents prioritize speed and simplicity over security boundaries. If you need isolation, you wrap the entire agent in a container, not individual skills.
Failure Modes and Error Handling
Skills fail the same way shell commands fail. If a script exits with a non-zero status, the agent sees the error output and decides what to do next.
Common failure scenarios:
| Failure Type | Agent Behavior | User Impact |
|---|---|---|
| Skill not found | Falls back to general reasoning | May produce suboptimal solution |
| Command exits non-zero | Reads stderr, attempts recovery | May retry with modified approach |
| Partial execution | Sees partial output, infers state | May leave system in inconsistent state |
| Ambiguous skill match | Loads first match or asks for clarification | May invoke wrong skill |
| Context window overflow | Truncates skill instructions | May skip critical steps |
The agent has no built-in retry policy for failed skills. It may try a modified approach after reading the error, but if deploy-staging keeps failing, the agent surfaces the error and waits for you to fix the underlying issue or provide more context.
This is different from orchestration frameworks that implement retry logic, circuit breakers, and rollback mechanisms. Terminal agents assume you are watching and can intervene.
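A minimal sketch of this fail-and-surface behavior, assuming a POSIX shell: the runner stops at the first non-zero exit, keeps the captured stderr, and does nothing to undo earlier steps. `StepResult` and `run_steps` are hypothetical names, not part of Gemini CLI.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class StepResult:
    command: str
    exit_code: int
    stdout: str
    stderr: str


def run_steps(commands: list[str]) -> list[StepResult]:
    """Run shell commands in order and stop at the first failure."""
    results = []
    for command in commands:
        proc = subprocess.run(command, shell=True, capture_output=True, text=True)
        results.append(StepResult(command, proc.returncode, proc.stdout, proc.stderr))
        if proc.returncode != 0:
            # No retry and no rollback: earlier steps stay applied.
            break
    return results
```

If the second of three commands fails, the returned list has two entries and the third command never runs. That is the "partial execution" row in the table: the first step's effects remain, and it is up to you to decide what happens next.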
Skill Versioning and Conflicts
Gemini CLI does not enforce skill versioning. If two skills have similar names or overlapping descriptions, the agent picks one based on string similarity and context clues.
Conflict resolution is implicit:
- Skill names should be distinct and descriptive
- Descriptions should clearly state when to use each skill
- If ambiguity exists, the agent may ask for clarification
You can namespace skills by prefixing names (frontend-deploy, backend-deploy) or by using more specific descriptions. There is no formal conflict detection.
If you update a skill, the agent sees the new version immediately. There is no cache invalidation or version pinning. This makes iteration fast but means breaking changes propagate instantly.
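Since nothing detects conflicts for you, a small lint check run by hand or in CI can flag skill names that look too alike. The sketch below is one possible approach using Python's standard `difflib`. The function name and the 0.8 threshold are arbitrary assumptions, and it says nothing about how the agent itself resolves ambiguity.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_names(names: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return pairs of skill names whose similarity ratio meets the threshold."""
    flagged = []
    for a, b in combinations(sorted(names), 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            flagged.append((a, b))
    return flagged
```

With this threshold, `deploy-stage` and `deploy-staging` are flagged as a likely collision, while the prefixed pair `frontend-deploy` and `backend-deploy` is not. Name similarity is only a proxy: two skills with distinct names can still have overlapping descriptions.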
Practical Implementation Example
Here is a skill that generates API client code from an OpenAPI spec:
Directory structure:
.gemini/skills/generate-api-client/
├── SKILL.md
└── templates/
    └── client.template.ts
SKILL.md:
# Generate API Client
Use this skill when the user asks to generate an API client from an OpenAPI specification.
## When to use
- User mentions "generate client" or "create API wrapper"
- An openapi.yaml or swagger.json file exists in the project
## Steps
1. Locate the OpenAPI spec file
2. Run `npx openapi-typescript <spec-file> -o src/api/types.ts`
3. Copy templates/client.template.ts to src/api/client.ts
4. Update client.ts with the correct base URL from the spec
5. Confirm generation completed successfully
## Expected output
- src/api/types.ts with TypeScript definitions
- src/api/client.ts with typed fetch wrapper
When you say “generate the API client,” the agent loads this skill, follows the steps, and produces the files. If the OpenAPI spec is missing, it tells you. If the command fails, it shows you the error.
The skill does not need to handle every edge case. It provides a happy path. You handle exceptions.
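Step 4 leaves the "how" to the agent. For a JSON spec, the lookup could resemble the sketch below. It relies on the `servers` array from OpenAPI 3.x and the `host`, `basePath`, and `schemes` fields from Swagger 2.0. The function name is hypothetical, and defaulting to `https` when no scheme is listed is an assumption of this sketch, not a rule from either specification.

```python
from typing import Optional


def base_url_from_spec(spec: dict) -> Optional[str]:
    """Extract a base URL from a parsed OpenAPI 3.x or Swagger 2.0 document."""
    # OpenAPI 3.x: take the first entry in the servers array.
    servers = spec.get("servers") or []
    if servers and servers[0].get("url"):
        return servers[0]["url"]

    # Swagger 2.0: assemble the URL from schemes, host, and basePath.
    host = spec.get("host")
    if host:
        scheme = (spec.get("schemes") or ["https"])[0]  # assumed default
        return f"{scheme}://{host}{spec.get('basePath', '')}"

    return None  # nothing usable; the agent would have to ask
```

Parse the file first, for example with `json.load`, and pass the resulting dictionary in. A `None` result corresponds to the case where the agent has to tell you the spec does not say where the API lives.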
Comparison to Function-Calling APIs
Traditional function-calling APIs (OpenAI, Anthropic) require you to define tools upfront with JSON schemas. The model decides when to call a function, you execute it, and you return the result.
Function calling:
- Requires schema definitions for every tool
- Model invokes functions by name with validated parameters
- You control execution in your application code
- Return values feed back into the model
Gemini CLI skills:
- No schema definitions, just natural language instructions
- Agent generates shell commands based on instructions
- Execution happens in the terminal, not your code
- Output is raw text the agent interprets
Skills are lighter weight but less structured. You trade type safety and programmatic control for speed and flexibility.
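The contrast is easiest to see side by side. The tool definition below uses a generic JSON Schema shape; each provider wraps it slightly differently, and the `deploy` tool, its parameter, and the `validate_call` helper are all hypothetical. The helper checks only required keys and enum values and is not a full JSON Schema validator.

```python
# Function calling: the tool is declared up front with a schema for its parameters.
deploy_tool = {
    "name": "deploy",
    "description": "Deploy application to an environment",
    "parameters": {
        "type": "object",
        "properties": {
            "environment": {"type": "string", "enum": ["staging", "production"]},
        },
        "required": ["environment"],
    },
}

# Skill: the index entry is plain text; the rest lives in SKILL.md as prose.
deploy_skill = {
    "name": "deploy-staging",
    "description": "Deploy application to staging environment",
}


def validate_call(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems with the arguments; an empty list means the call is valid."""
    schema = tool["parameters"]
    errors = []
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required parameter: {key}")
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unknown parameter: {key}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors
```

A call with `{"environment": "qa"}` is rejected before anything runs. The skill has no equivalent checkpoint: a wrong environment name only shows up when the generated shell command fails.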
Technical Verdict
Use Gemini CLI skills when:
- You need lightweight, user-extensible agent capabilities
- Your workflows are shell-based and do not require complex state management
- You want to iterate on agent behavior without recompiling or redeploying
- Context window efficiency matters more than execution isolation
Avoid this pattern when:
- You need strong security boundaries between tools
- Failure recovery requires transactional rollback or compensating actions
- You are orchestrating multi-step workflows across distributed services
- You need to audit or replay agent actions with high fidelity
Skills work best for local development workflows, deployment automation, and code generation tasks. They do not replace orchestration frameworks for production systems.
If you are building a terminal agent that needs to learn your team’s conventions, skills are the simplest path. If you are building a production agent that coordinates microservices, you need something heavier.