mech.app
Security

Phantom Dependencies: How AI Coding Agents Install Packages That Don't Exist Yet

AI agents hallucinate package names and write them to manifests without registry validation, creating supply-chain gaps and namespace races.

Source: thenewstack.io
AI coding agents are writing dependency declarations for packages that do not exist. Aikido security research published in May 2026 shows agents hallucinating package names, then inserting them into package.json, requirements.txt, or go.mod without checking whether those names are registered. The gap between LLM code generation and package registry validation creates a supply-chain attack surface: typosquatting, namespace races, and phantom dependencies that install malicious code the moment someone publishes under the hallucinated name.

This is not a theoretical edge case. Simon Willison’s April 2026 analysis documented $2,180 per month in token costs from agent API usage, with agents running unsupervised at scale. Anthropic and OpenAI both moved to per-token enterprise pricing in April 2026, incentivizing faster agent loops and fewer human checkpoints. The economic pressure is toward more autonomy, not less: as agent loops get faster and cheaper, teams drop manual review steps, which widens the window for phantom dependencies to slip through.

Cursor and GitHub Copilot drove $1.2B of Anthropic’s API revenue before Claude Code competed directly, showing the scale of agent adoption. As agent-assisted development becomes the default workflow, the phantom dependency attack surface grows proportionally.

The Tool Boundary Problem

Most agent frameworks expose file-write tools but not registry-check tools. The agent’s action space ends at “write this line to requirements.txt” without a corresponding “verify this package exists on PyPI” step. The agent has no way to know it hallucinated a name unless the human reviews the diff or the CI pipeline fails.

Here is what happens in practice:

  1. Agent generates code that imports stripe-webhooks-validator.
  2. Agent writes stripe-webhooks-validator==1.0.0 to requirements.txt.
  3. Human approves the PR without checking PyPI.
  4. CI runs pip install -r requirements.txt and fails because the package does not exist.
  5. Attacker registers stripe-webhooks-validator on PyPI with malicious code.
  6. Next CI run succeeds and installs the attacker’s package.

The window between step 4 and step 5 is where the attacker strikes. If the attacker monitors failed CI logs or watches for 404s on package registries, they can claim the namespace before the team notices.

Namespace Races and Reservation APIs

Package registries do not have a “reservation” or “intent-to-publish” API. npm, PyPI, and RubyGems allow anyone to claim an unclaimed name. There is no pre-registration step where a developer can say “I plan to publish this name in 24 hours, lock it for me.”

This creates a race condition:

  • Agent writes a dependency.
  • CI fails.
  • Team opens a ticket to fix the name.
  • Attacker claims the name.
  • A retry or rebuild runs before the corrected name is merged.
  • CI installs the attacker’s package.

The fix is not to add a reservation API. That would introduce denial-of-service risks (attackers reserving thousands of plausible names). The fix is to validate package existence before writing the dependency declaration.

Agent Validation Architectures

Three approaches to prevent phantom dependencies:

| Approach | Validation Point | Latency Cost | Risk if Validation Fails |
|----------|------------------|--------------|--------------------------|
| Pre-write registry check | Agent calls PyPI/npm API before writing | +200ms per package | Low (race condition between check and write is rare but possible) |
| Post-write CI gate | CI fails on 404, blocks merge | Zero (validation in existing pipeline) | High (attacker can claim name before retry) |
| Local package cache | Agent checks local mirror or cache | +10ms per package | Medium (cache staleness) |

The pre-write check is the safest. The agent’s tool definition should include a check_package_exists(registry, name, version) function that returns a boolean. If false, the agent should either skip the dependency or flag it for human review.
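As a rough illustration, a check_package_exists function along these lines could back that tool. This is a minimal sketch, not a definitive implementation: the URL templates assume the public PyPI and npm JSON endpoints, and http_status and the fetch_status parameter are hypothetical names introduced here so the lookup can be swapped out in tests or pointed at a private index.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional
from urllib.parse import quote

# Assumed URL shapes for the public registries' JSON endpoints.
NAME_URLS = {
    "pypi": "https://pypi.org/pypi/{name}/json",
    "npm": "https://registry.npmjs.org/{name}",
}
VERSION_URLS = {
    "pypi": "https://pypi.org/pypi/{name}/{version}/json",
    "npm": "https://registry.npmjs.org/{name}/{version}",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Hypothetical helper: GET the URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises for 404 and other HTTP errors


def check_package_exists(
    registry: str,
    name: str,
    version: Optional[str] = None,
    fetch_status: Callable[[str], int] = http_status,
) -> bool:
    """Return True if the registry knows the package (and version, if given)."""
    templates = VERSION_URLS if version else NAME_URLS
    if registry not in templates:
        raise ValueError(f"unsupported registry: {registry!r}")
    url = templates[registry].format(
        name=quote(name, safe="@"),  # keeps "@", encodes "/" in npm scopes
        version=quote(version or "", safe=""),
    )
    status = fetch_status(url)
    if status == 200:
        return True
    if status == 404:
        return False
    # Anything else (rate limit, server error) is not evidence either way.
    raise RuntimeError(f"unexpected HTTP {status} from {url}")
```

Raising on any status other than 200 or 404 is a deliberate choice in this sketch: a rate limit or server error should not be read as "the package does not exist", and it should not be read as "the package exists" either.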

Here is a simplified tool schema for a Python agent, written in the shape the Claude API expects: a name, a description, and a JSON Schema under input_schema. The API has no native way to declare one tool as a pre-condition of another, so the ordering must be enforced in the agent system prompt or by chaining tool_use calls explicitly:

# Tool definitions in the Claude Messages API format.
# The API cannot express "call check_package_exists before write_requirements";
# enforce that ordering in the system prompt or in the tool-handling loop.

tools = [
    {
        "name": "write_requirements",
        "description": "Write a requirements.txt file",
        "input_schema": {
            "type": "object",
            "properties": {
                "packages": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "version": {"type": "string"},
                        },
                        "required": ["name", "version"],
                    },
                }
            },
            "required": ["packages"],
        },
    },
    {
        "name": "check_package_exists",
        # The tool result carries the answer, e.g. {"exists": false}.
        "description": "Verify package exists on registry",
        "input_schema": {
            "type": "object",
            "properties": {
                "registry": {"type": "string"},
                "name": {"type": "string"},
                "version": {"type": "string"},
            },
            "required": ["registry", "name", "version"],
        },
    },
]

# In practice, the agent system prompt must include:
# "Before calling write_requirements, call check_package_exists for each package.
# If check_package_exists returns false, do not write that package and log the failure."

A valid tool_use pattern with explicit validation steps:

  1. Agent calls check_package_exists(registry="pypi", name="stripe-webhooks-validator", version="1.0.0").
  2. Tool returns {"exists": false}.
  3. Agent logs the failure and does not call write_requirements for that package.
  4. Agent surfaces the hallucination to the human operator.
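Because a system prompt is only an instruction, the same ordering can also be enforced in the code that handles the tool call. The sketch below assumes a handler named handle_write_requirements (a hypothetical name) that receives an exists callable and refuses to write any pin the registry does not recognize, whatever the model asked for.

```python
import io
from typing import Callable, Iterable, TextIO


def handle_write_requirements(
    packages: Iterable[tuple[str, str]],
    exists: Callable[[str, str], bool],
    out: TextIO,
) -> dict:
    """Write only verified pins to `out`; report the rest for human review."""
    written, rejected = [], []
    for name, version in packages:
        pin = f"{name}=={version}"
        if exists(name, version):
            out.write(pin + "\n")
            written.append(pin)
        else:
            rejected.append(pin)  # never reaches the manifest
    return {
        "written": written,
        "rejected": rejected,
        "needs_human_review": bool(rejected),
    }


if __name__ == "__main__":
    # Stand-in for a real registry lookup; the set contents are illustrative.
    known = {("requests", "2.32.3")}
    buffer = io.StringIO()
    result = handle_write_requirements(
        [("requests", "2.32.3"), ("stripe-webhooks-validator", "1.0.0")],
        lambda name, version: (name, version) in known,
        buffer,
    )
    print(result)
```

Returning the rejected pins in the tool result gives the agent something concrete to surface to the operator instead of silently dropping the dependency.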

State Management and Observability

Agents that modify dependency files need state tracking for rollback and audit. If an agent writes 10 dependencies and 3 are phantom, the system should:

  • Log which packages failed validation.
  • Roll back the file write.
  • Surface the failure to the human operator.
  • Optionally, retry with a different model or prompt.
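The rollback step can be sketched as a snapshot taken before the write and restored if validation reports any failures. The function name write_manifest_or_rollback and the validate callable are assumptions made for this example; a real system would plug its registry check in as the validator.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def write_manifest_or_rollback(
    path: Path,
    new_text: str,
    validate: Callable[[str], List[str]],
) -> List[str]:
    """Write new_text, then restore the prior state if validation fails.

    `validate` returns the list of entries that failed; empty means success.
    """
    snapshot = path.read_text(encoding="utf-8") if path.exists() else None
    path.write_text(new_text, encoding="utf-8")
    failed = validate(new_text)
    if failed:
        if snapshot is None:
            path.unlink()  # the manifest did not exist before this run
        else:
            path.write_text(snapshot, encoding="utf-8")
    return failed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manifest = Path(tmp) / "requirements.txt"
        manifest.write_text("requests==2.32.3\n", encoding="utf-8")
        # Stand-in validator: flags the hallucinated name from the example.
        failed = write_manifest_or_rollback(
            manifest,
            "requests==2.32.3\nstripe-webhooks-validator==1.0.0\n",
            lambda text: [
                line for line in text.splitlines()
                if line.startswith("stripe-webhooks-validator")
            ],
        )
        print("failed:", failed)
        print("manifest now:", manifest.read_text(encoding="utf-8"))
```

The returned list of failed entries is what gets logged and shown to the operator; the manifest itself ends up exactly as it was before the agent touched it.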

Observability requirements:

  • Trace ID per agent run: Link file writes to the LLM call that generated them.
  • Registry API logs: Record every package existence check (name, version, timestamp, result).
  • Diff snapshots: Store before/after state of package.json, requirements.txt, etc.
  • Failure mode counters: Track how often agents hallucinate dependencies by registry and model.

Without these, you cannot answer “which agent run installed the phantom package?” or “how many times did Claude hallucinate npm packages this week?”
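One way to make those two questions answerable is to record every registry check as a structured event keyed by trace ID. The sketch below is illustrative only: RegistryCheck, AuditLog, and their field names are hypothetical, and diff snapshots are left out to keep it short.

```python
import json
import time
import uuid
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import List


def new_trace_id() -> str:
    """One ID per agent run, attached to every check and file write."""
    return uuid.uuid4().hex


@dataclass
class RegistryCheck:
    trace_id: str
    registry: str
    name: str
    version: str
    exists: bool
    model: str
    timestamp: float = field(default_factory=time.time)


class AuditLog:
    def __init__(self) -> None:
        self.records: List[RegistryCheck] = []
        # Failure-mode counter keyed by (registry, model).
        self.hallucinations: Counter = Counter()

    def record(self, check: RegistryCheck) -> None:
        self.records.append(check)
        if not check.exists:
            self.hallucinations[(check.registry, check.model)] += 1

    def phantom_runs(self, name: str) -> List[str]:
        """Which agent runs tried to use this nonexistent package?"""
        return sorted(
            {r.trace_id for r in self.records if r.name == name and not r.exists}
        )

    def to_jsonl(self) -> str:
        """Serialize one JSON object per line for shipping to a log store."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self.records)
```

With records like these, "how many times did a given model hallucinate npm packages this week" becomes a lookup on the counter, or a filter over the JSON lines by timestamp.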

Deployment Shape and Security Boundaries

Agents that write code should not have direct write access to production manifests or the ability to trigger npm install or pip install directly. The deployment shape should look like this:

  1. Agent writes to a sandbox branch (not main).
  2. CI runs validation (including registry checks).
  3. Human reviews the diff (especially new dependencies).
  4. Merge gate blocks phantom dependencies (CI fails if any package 404s).
  5. Production deployment (only after all checks pass).

The security boundary is between the agent’s file-write capability and the package manager’s install step. That step must go through CI with registry validation.

If you are running agents in a CI/CD pipeline (e.g., GitHub Actions with Copilot or Claude Code), add a validation step:

# Pseudocode: This example is simplified.
# Production scripts must handle scoped packages (@org/package),
# private registries, and registry API rate limits.

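- name: Validate dependencies
  run: |
    # Collect names from dependencies and devDependencies;
    # "// {}" keeps jq from failing when a section is absent.
    for pkg in $(jq -r '((.dependencies // {}) + (.devDependencies // {})) | keys[]' package.json); do
      # npm view exits non-zero when the registry has no such package.
      if ! npm view "$pkg" > /dev/null 2>&1; then
        echo "Phantom dependency detected: $pkg"
        exit 1
      fi
    done

This script checks every name listed under dependencies and devDependencies in package.json against npm before allowing the workflow to proceed.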
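For Python projects, a comparable gate has to pull names and pinned versions out of requirements.txt first. The following is a simplified sketch under stated assumptions: parse_requirements and find_phantoms are hypothetical helpers, the parser handles only plain names, extras, and == pins, and the exists callable stands in for whatever registry lookup the pipeline uses.

```python
import re
import sys
from typing import Callable, Iterable, List, Optional, Tuple

Pin = Tuple[str, Optional[str]]  # (name, exact version or None)

# Name, optional [extras], optional "==version". Other specifiers are ignored.
_PIN = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*(?:==\s*([^\s;#]+))?"
)


def parse_requirements(lines: Iterable[str]) -> List[Pin]:
    pins: List[Pin] = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # blank line, comment, or pip option such as -r
        match = _PIN.match(line)
        if match:
            pins.append((match.group(1), match.group(2)))
    return pins


def find_phantoms(
    pins: Iterable[Pin], exists: Callable[[str, Optional[str]], bool]
) -> List[str]:
    """Return every requirement the registry lookup does not recognize."""
    return [
        name if version is None else f"{name}=={version}"
        for name, version in pins
        if not exists(name, version)
    ]


def main(path: str, exists: Callable[[str, Optional[str]], bool]) -> int:
    """Exit code for CI: 1 if any phantom dependency is found, else 0."""
    with open(path, encoding="utf-8") as fh:
        phantoms = find_phantoms(parse_requirements(fh), exists)
    for phantom in phantoms:
        print(f"Phantom dependency detected: {phantom}", file=sys.stderr)
    return 1 if phantoms else 0
```

Direct URL requirements, editable installs, and hash-pinned lines are outside what this parser understands, so a production gate would need a fuller requirements parser.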

Likely Failure Modes

Where this breaks:

  • Private registries: Agent checks public PyPI but the package is on a private index. Configure the agent with registry URLs and credentials.
  • Version mismatches: Agent checks if stripe-webhooks-validator exists but writes stripe-webhooks-validator==99.0.0 (a version that does not exist). Validate both name and version.
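  • Registry downtime: PyPI or npm is unreachable during validation. Decide the policy up front: fail open (allow the write but log a warning) or fail closed (block the write until the registry is reachable). Failing closed is the safer default, because an unverified name is exactly what the check exists to stop.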
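  • Transitive dependencies: Agent writes a valid package, but that package depends on a phantom package. Run pip install --dry-run -r requirements.txt (pip 22.2 or later) or npm install --dry-run in CI to catch transitive phantoms, and fail the job on a non-zero exit code. Rely on the exit code rather than grepping the output: pip reports a missing package as “No matching distribution found”, which a “not found” pattern does not match.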
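Two of the failure modes above, version mismatches and registry downtime, come down to treating the check as more than a yes/no answer. A minimal sketch of that idea follows; CheckResult, classify, and allow_write are hypothetical names, and catching OSError assumes the lookup is built on urllib or sockets, whose network errors derive from it.

```python
from enum import Enum
from typing import Callable


class CheckResult(Enum):
    EXISTS = "exists"
    MISSING = "missing"
    UNKNOWN = "unknown"  # registry unreachable, so no evidence either way


def classify(lookup: Callable[[], bool]) -> CheckResult:
    """Run a registry lookup and map network failures to UNKNOWN."""
    try:
        return CheckResult.EXISTS if lookup() else CheckResult.MISSING
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses.
        return CheckResult.UNKNOWN


def allow_write(
    name_result: CheckResult,
    version_result: CheckResult,
    fail_open: bool = False,
) -> bool:
    """Decide whether a pin may be written, checking name and version."""
    results = (name_result, version_result)
    if CheckResult.MISSING in results:
        return False  # a confirmed miss always blocks, regardless of policy
    if CheckResult.UNKNOWN in results:
        return fail_open  # policy decision: block by default
    return True
```

Keeping UNKNOWN separate from MISSING means an outage is never logged as a hallucination, and the fail-open or fail-closed choice lives in one explicit parameter.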

The worst failure mode is silent success: the agent writes a phantom dependency, CI does not validate, and the package gets installed weeks later when an attacker claims the name.

Technical Verdict

Use pre-write registry validation if you are deploying coding agents that modify dependency files. The latency cost (roughly 200ms per package check) is negligible compared to the supply-chain risk. Do not rely on CI to catch phantom dependencies after the fact. Attackers monitor failed builds and can claim namespaces faster than your team can correct the name.

Skip pre-write validation only if your agent reads code or generates documentation without write access to manifests. The validation overhead is not worth it unless the agent can modify dependency declarations.

If you cannot add registry checks to the agent’s tool boundary, add a merge gate that blocks PRs with unregistered packages. This is a weaker defense (it does not prevent the agent from writing the phantom dependency), but it stops the package from reaching production.

The long-term fix is for package managers to expose a “validate manifest” API that checks all dependencies in one call. Until then, you need to build this validation layer yourself.