Gemini CLI Skills: Teaching Your Terminal Agent How to Think

Terminal agents face a fundamental problem: they need to be general enough to handle arbitrary commands but specific enough to understand your project’s conventions, deployment pipelines, and team workflows. Hardcoded tool definitions solve this for narrow use cases but break down when every user has different needs.

Gemini CLI’s skill system takes a different approach. Instead of shipping a fixed set of function definitions, it lets agents discover and load specialized instructions at runtime. Skills are self-contained directories that package context, instructions, and examples into capabilities the agent can invoke on demand.

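This is not function calling. It is progressive context disclosure: the agent starts with a short summary of each skill and pulls in the full instructions only when a task calls for them.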

How Skill Discovery Works

Gemini CLI scans for skill directories at startup. Each skill is a folder containing a SKILL.md file that describes what the skill does, when to use it, and how to execute it. The agent does not load these files into context immediately. Instead, it builds a lightweight index of available skills.

When you issue a command, the agent evaluates whether any skill matches the task. If it finds a match, it loads that skill’s instructions into the current context window. If not, it proceeds with general-purpose reasoning.

This lazy-loading pattern keeps the context window lean. A project with 20 skills does not burn 20 skill definitions' worth of tokens on every request. You only pay for what you use.
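The saving is easy to estimate on the back of an envelope. The sketch below compares loading every skill up front with loading an index plus only the skills a request needs. The token counts in the usage note are illustrative placeholders, not measurements of Gemini CLI, and both function names are hypothetical.

```python
def eager_cost(skill_count: int, tokens_per_skill: int) -> int:
    """Tokens spent per request if every full skill definition is always in context."""
    return skill_count * tokens_per_skill


def lazy_cost(
    skill_count: int,
    tokens_per_index_entry: int,
    tokens_per_skill: int,
    skills_loaded: int,
) -> int:
    """Tokens spent per request with a lightweight index plus on-demand loading."""
    index_tokens = skill_count * tokens_per_index_entry
    loaded_tokens = skills_loaded * tokens_per_skill
    return index_tokens + loaded_tokens
```

With made-up figures of 20 skills, 30 tokens per index entry, and 600 tokens per full skill, `eager_cost(20, 600)` is 12000 while `lazy_cost(20, 30, 600, 1)` is 1200 for a request that loads one skill.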

Key components:

Skill directory structure: Each skill lives in .gemini/skills/<skill-name>/
SKILL.md manifest: Describes purpose, trigger conditions, and execution steps
Optional artifacts: Scripts, templates, or config files the skill references
Discovery index: Lightweight metadata the agent scans before deciding to load a skill

The agent uses the skill name and a short description to decide relevance. If you name a skill deploy-staging and describe it as “deploys the app to the staging environment,” the agent is likely to load it when you say “push this to staging.”
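The discovery step described above can be sketched in a few lines. This is a minimal illustration, not Gemini CLI's actual implementation: the `SkillEntry` type, the function names, and the rule that the description is the first non-heading line of SKILL.md are all assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SkillEntry:
    """One row of the discovery index: small enough to keep around for every skill."""
    name: str
    description: str
    manifest: Path


def first_prose_line(text: str) -> str:
    """Return the first non-empty line that is not a Markdown heading."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return ""


def build_index(skills_root: Path) -> list[SkillEntry]:
    """Scan <skills_root>/<skill-name>/SKILL.md and keep only name and description."""
    entries: list[SkillEntry] = []
    if not skills_root.is_dir():
        return entries
    for skill_dir in sorted(p for p in skills_root.iterdir() if p.is_dir()):
        manifest = skill_dir / "SKILL.md"
        if not manifest.is_file():
            continue  # a folder without a manifest is not a skill
        description = first_prose_line(manifest.read_text(encoding="utf-8"))
        entries.append(SkillEntry(skill_dir.name, description, manifest))
    return entries


def load_skill(entry: SkillEntry) -> str:
    """Read the full instructions only once the skill has been judged relevant."""
    return entry.manifest.read_text(encoding="utf-8")
```

Calling `build_index(Path(".gemini/skills"))` returns the small index; `load_skill` is the expensive step, deferred until a request matches.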

Prompt Engineering for Skill Awareness

The agent needs to know skills exist without loading them all. Gemini CLI achieves this by injecting a skill index into the system prompt. The index lists skill names and one-line descriptions.

When the agent sees a user request, it pattern-matches against this index. If it finds a likely match, it loads the full SKILL.md file and follows its instructions.

Example skill index injection:

Available skills:
- deploy-staging: Deploy application to staging environment
- run-tests: Execute full test suite with coverage reporting
- generate-migration: Create database migration from schema changes

The agent sees this list and decides whether to invoke a skill. If you say “run the tests,” it loads run-tests. If you say “fix this bug,” it does not.
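Rendering that index is plain string formatting. The sketch below assumes the index is available as a name-to-description mapping; `render_skill_index` and `build_system_prompt` are hypothetical helpers, and the real prompt layout inside Gemini CLI may differ.

```python
def render_skill_index(skills: dict[str, str]) -> str:
    """Format skill names and one-line descriptions as a short bulleted list."""
    lines = ["Available skills:"]
    for name, description in skills.items():
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


def build_system_prompt(base_prompt: str, skills: dict[str, str]) -> str:
    """Append the index to the base prompt; full SKILL.md bodies are never included here."""
    if not skills:
        return base_prompt
    return f"{base_prompt}\n\n{render_skill_index(skills)}"
```

Only names and descriptions reach the prompt, so the cost of the index grows with the number of skills, not with the length of their instructions.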

This is lighter than a function-calling API in one respect: there are no parameters to serialize, no schemas to validate, and no structured return values to handle. The agent just reads instructions and executes shell commands.

Skill Execution Boundaries

Skills do not run in isolated sandboxes. They execute in the same shell context as the agent itself. This means a skill can modify environment variables, change directories, and leave side effects.

Execution flow:

Agent decides a skill is relevant
Agent loads SKILL.md into context
Agent reads instructions and generates shell commands
Commands execute in the current shell session
Output returns to the agent for interpretation

There is no subprocess isolation. If a skill runs cd /tmp, the agent’s working directory changes. If a skill sets export API_KEY=..., that variable persists.
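The difference between a shared session and per-command isolation is easy to reproduce. The sketch below is a generic shell demonstration, not Gemini CLI code: it feeds commands to a single `sh` process, then runs the same commands in separate processes. It assumes a POSIX `sh` on the PATH, and `SKILL_DEMO_KEY` is a made-up variable name.

```python
import subprocess


def run_in_one_session(commands: list[str]) -> str:
    """Run all commands in a single shell process, so state carries over between them."""
    script = "\n".join(commands) + "\n"
    result = subprocess.run(
        ["sh"], input=script, capture_output=True, text=True, check=False
    )
    return result.stdout


def run_isolated(commands: list[str]) -> str:
    """Run each command in its own shell process, so nothing carries over."""
    outputs = []
    for command in commands:
        result = subprocess.run(
            ["sh", "-c", command], capture_output=True, text=True, check=False
        )
        outputs.append(result.stdout)
    return "".join(outputs)


if __name__ == "__main__":
    steps = ["export SKILL_DEMO_KEY=abc", "cd /", "echo $SKILL_DEMO_KEY", "pwd"]
    print("shared session:", run_in_one_session(steps).split())
    print("isolated:", run_isolated(steps).split())
```

In the shared session the later commands see the exported variable and the new working directory. In the isolated run they do not, which is the behavior you would get only by adding a boundary yourself.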

This is a deliberate trade-off. Terminal agents prioritize speed and simplicity over security boundaries. If you need isolation, you wrap the entire agent in a container, not individual skills.

Failure Modes and Error Handling

Skills fail the same way shell commands fail. If a script exits with a non-zero status, the agent sees the error output and decides what to do next.

Common failure scenarios:

Failure Type	Agent Behavior	User Impact
Skill not found	Falls back to general reasoning	May produce suboptimal solution
Command exits non-zero	Reads stderr, attempts recovery	May retry with modified approach
Partial execution	Sees partial output, infers state	May leave system in inconsistent state
Ambiguous skill match	Loads first match or asks for clarification	May invoke wrong skill
Context window overflow	Truncates skill instructions	May skip critical steps

There is no built-in retry policy for skills. The agent may try a modified command after reading stderr, but if deploy-staging keeps failing, it surfaces the error and waits for you to fix the underlying issue or provide more context.

This is different from orchestration frameworks that implement retry logic, circuit breakers, and rollback mechanisms. Terminal agents assume you are watching and can intervene.
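The failure behavior above amounts to "run, capture, stop, report". A minimal sketch of that loop follows, assuming a POSIX `sh`. `StepResult` and `run_skill_steps` are hypothetical names for illustration, not part of Gemini CLI.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    """What the agent gets back from one command: exit status plus both output streams."""
    command: str
    returncode: int
    stdout: str
    stderr: str

    @property
    def ok(self) -> bool:
        return self.returncode == 0


def run_step(command: str) -> StepResult:
    """Run one shell command and capture its result without raising on failure."""
    completed = subprocess.run(
        ["sh", "-c", command], capture_output=True, text=True, check=False
    )
    return StepResult(command, completed.returncode, completed.stdout, completed.stderr)


def run_skill_steps(commands: list[str]) -> list[StepResult]:
    """Run commands in order and stop at the first failure: no retry, no rollback."""
    results: list[StepResult] = []
    for command in commands:
        result = run_step(command)
        results.append(result)
        if not result.ok:
            break  # earlier steps have already taken effect; later steps never run
    return results
```

Note what the loop does not do: it never undoes the steps that succeeded before the failure. That is the partial-execution row in the table, and it is why a human needs to be watching.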

Skill Versioning and Conflicts

Gemini CLI does not enforce skill versioning. If two skills have similar names or overlapping descriptions, the agent picks whichever one seems the closest fit to the request, judging from the names, the descriptions, and the surrounding conversation.

Conflict resolution is implicit:

Skill names should be distinct and descriptive
Descriptions should clearly state when to use each skill
If ambiguity exists, the agent may ask for clarification

You can namespace skills by prefixing names (frontend-deploy, backend-deploy) or by using more specific descriptions. There is no formal conflict detection.

If you update a skill, the agent sees the new version immediately. There is no cache invalidation or version pinning. This makes iteration fast but means breaking changes propagate instantly.
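Since nothing detects conflicts for you, a small lint script can flag names that are close enough to confuse. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. It is a do-it-yourself check, not a Gemini CLI feature, and the 0.8 threshold is an arbitrary starting point.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_names(
    names: list[str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return pairs of skill names whose similarity ratio meets the threshold."""
    flagged = []
    for first, second in combinations(sorted(names), 2):
        score = SequenceMatcher(None, first, second).ratio()
        if score >= threshold:
            flagged.append((first, second, round(score, 2)))
    return flagged


if __name__ == "__main__":
    skills = ["deploy-staging", "deploy-stage", "frontend-deploy", "backend-deploy"]
    for first, second, score in find_similar_names(skills):
        print(f"possible conflict: {first} vs {second} (similarity {score})")
```

With these names, deploy-stage and deploy-staging are flagged, while the prefixed pair frontend-deploy and backend-deploy stays under the threshold. Name similarity says nothing about overlapping descriptions, so treat the output as a prompt to review, not a verdict.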

Practical Implementation Example

Here is a skill that generates API client code from an OpenAPI spec:

Directory structure:

.gemini/skills/generate-api-client/
├── SKILL.md
└── templates/
    └── client.template.ts

SKILL.md:

# Generate API Client

Use this skill when the user asks to generate an API client from an OpenAPI specification.

## When to use
- User mentions "generate client" or "create API wrapper"
- An openapi.yaml or swagger.json file exists in the project

## Steps
1. Locate the OpenAPI spec file
2. Run `npx openapi-typescript <spec-file> -o src/api/types.ts`
3. Copy templates/client.template.ts to src/api/client.ts
4. Update client.ts with the correct base URL from the spec
5. Confirm generation completed successfully

## Expected output
- src/api/types.ts with TypeScript definitions
- src/api/client.ts with typed fetch wrapper

When you say “generate the API client,” the agent loads this skill, follows the steps, and produces the files. If the OpenAPI spec is missing, it tells you. If the command fails, it shows you the error.

The skill does not need to handle every edge case. It provides a happy path. You handle exceptions.
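For a sense of what the agent effectively does with steps 1 and 2, here is a rough equivalent in code. It is a sketch under assumptions: it looks only in the project root, `find_spec` and `typegen_command` are hypothetical helpers, and the command is built but not executed.

```python
from pathlib import Path
from typing import Optional

# File names taken from the skill's "When to use" section.
SPEC_NAMES = ("openapi.yaml", "swagger.json")


def find_spec(project_root: Path) -> Optional[Path]:
    """Step 1: return the first spec file found in the project root, or None."""
    for name in SPEC_NAMES:
        candidate = project_root / name
        if candidate.is_file():
            return candidate
    return None


def typegen_command(spec: Path, output: str = "src/api/types.ts") -> list[str]:
    """Step 2: build the openapi-typescript invocation from the skill, without running it."""
    return ["npx", "openapi-typescript", str(spec), "-o", output]


if __name__ == "__main__":
    spec = find_spec(Path.cwd())
    if spec is None:
        print("No OpenAPI spec found; nothing to generate.")
    else:
        print("Would run:", " ".join(typegen_command(spec)))
```

The missing-spec branch mirrors the behavior described above: the skill covers the happy path, and anything else is reported back to you.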

Comparison to Function-Calling APIs

Traditional function-calling APIs (OpenAI, Anthropic) require you to define tools upfront with JSON schemas. The model decides when to call a function, you execute it, and you return the result.

Function calling:

Requires schema definitions for every tool
Model invokes functions by name with validated parameters
You control execution in your application code
Return values feed back into the model

Gemini CLI skills:

No schema definitions, just natural language instructions
Agent generates shell commands based on instructions
Execution happens in the terminal, not your code
Output is raw text the agent interprets

Skills are lighter weight but less structured. You trade type safety and programmatic control for speed and flexibility.
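The contrast is clearer side by side. Below, the same capability appears once as a JSON-Schema-style tool definition and once as a skill index entry. The definition follows the general shape function-calling APIs use, but the exact envelope varies by provider, and the `branch` parameter and `missing_required` helper are invented for illustration.

```python
# Function-calling style: parameters are declared up front and can be checked in code.
tool_definition = {
    "name": "deploy_staging",
    "description": "Deploy application to staging environment",
    "parameters": {
        "type": "object",
        "properties": {"branch": {"type": "string"}},
        "required": ["branch"],
    },
}

# Skill style: one line of natural language in the index, no declared parameters.
skill_index_entry = "- deploy-staging: Deploy application to staging environment"


def missing_required(definition: dict, arguments: dict) -> list[str]:
    """Return required parameter names that the model's call left out."""
    required = definition["parameters"].get("required", [])
    return [name for name in required if name not in arguments]
```

With the schema, `missing_required(tool_definition, {})` reports `["branch"]` before anything runs. A skill has no equivalent check: if the instructions are ambiguous, you find out when the shell command executes.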

Technical Verdict

Use Gemini CLI skills when:

You need lightweight, user-extensible agent capabilities
Your workflows are shell-based and do not require complex state management
You want to iterate on agent behavior without recompiling or redeploying
Context window efficiency matters more than execution isolation

Avoid this pattern when:

You need strong security boundaries between tools
Failure recovery requires transactional rollback or compensating actions
You are orchestrating multi-step workflows across distributed services
You need to audit or replay agent actions with high fidelity

Skills work best for local development workflows, deployment automation, and code generation tasks. They do not replace orchestration frameworks for production systems.

If you are building a terminal agent that needs to learn your team’s conventions, skills are the simplest path. If you are building a production agent that coordinates microservices, you need something heavier.
