Pre-Commit Hooks for AI Agents: How Merrilin Enforces Code Quality Before LLMs Touch the Repo

Treating coding agents like junior developers with automated guardrails. A practical guide to pre-commit hooks that catch agent hallucinations and anti-...

Source: blog.merrilin.ai

Coding agents ship code fast. They also ship hallucinated imports, missing error boundaries, and SQL queries that silently swallow exceptions. The bottleneck is no longer writing code. It’s reviewing the mess before it hits main.

Merrilin’s engineering team treats their agents like junior developers who need automated guardrails. Instead of relying on expensive AI code review bots to check AI-generated code, they built a custom pre-commit hook using tree-sitter to catch agent-specific anti-patterns at commit time. The hook runs locally, fails fast, and forces agents to retry before bad code enters version control.

This is not about linting style. It’s about enforcing recovery boundaries, preventing reader-critical database writes without isolation, and stopping swallowed promise rejections before they become production incidents.

Why Pre-Commit Instead of CI or Review Bots

Pre-commit hooks run before code enters the repository. CI runs after. Review bots run even later, often after multiple commits have piled up.

The feedback loop matters:

  • Pre-commit: Agent gets immediate feedback, retries the commit with corrections
  • CI: Agent has already moved on, requires context switching or manual intervention
  • Review bot: Requires a second LLM call, adds latency and cost, often flags issues the agent could have fixed itself

Merrilin’s approach catches errors in seconds, not minutes. The agent’s context window still holds the relevant code. The retry loop is tight.

Architecture: Tree-Sitter Over Regex

Most pre-commit hooks use regex or grep. Merrilin uses tree-sitter, a parsing library that builds concrete syntax trees for multiple languages.

Why tree-sitter wins for agent-generated code:

  • Understands code structure, not just text patterns
  • Catches context-dependent anti-patterns (e.g., database calls inside try blocks without recovery logic)
  • Handles Python, TypeScript, and SQL in a single tool
  • Avoids false positives from comments or string literals

The hook lives in .pre-commit-config.yaml and calls a custom Python script that:

  1. Parses staged files with tree-sitter
  2. Walks the syntax tree looking for specific node patterns
  3. Applies domain-specific rules (SQL recovery, reader isolation, error handling)
  4. Returns a non-zero exit code if violations are found

Agents see the error message, adjust the code, and re-commit. No human intervention required for most cases.
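A minimal sketch of those four steps, assuming the hook receives the staged file paths as command-line arguments (pre-commit passes filenames to hook entry points by default). Python's built-in ast module stands in for tree-sitter so the example runs without extra dependencies. The Violation type, the swallowed-exception rule name, and the output format are hypothetical, not Merrilin's actual implementation.

```python
import ast
import sys
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Violation:
    path: str
    line: int
    rule: str      # hypothetical rule identifier
    message: str


def check_source(path: str, source: str) -> list:
    """Parse one file and walk the tree looking for a simple anti-pattern."""
    try:
        tree = ast.parse(source, filename=path)
    except SyntaxError as exc:
        # Malformed code is reported as a violation so the agent can retry.
        return [Violation(path, exc.lineno or 1, "syntax-error", exc.msg)]

    violations = []
    for node in ast.walk(tree):
        # Illustrative rule: an except handler whose whole body is `pass`.
        if isinstance(node, ast.ExceptHandler) and all(
            isinstance(stmt, ast.Pass) for stmt in node.body
        ):
            violations.append(
                Violation(path, node.lineno, "swallowed-exception",
                          "except block silently discards the error")
            )
    return violations


def main(argv: list) -> int:
    """Return 0 when every file passes, 1 when any violation is found."""
    violations = []
    for path in argv:
        if path.endswith(".py"):
            source = Path(path).read_text(encoding="utf-8")
            violations.extend(check_source(path, source))
    for v in violations:
        print(f"{v.path}:{v.line}: [{v.rule}] {v.message}")
    return 1 if violations else 0


# In a real hook script the entry point would be:
#     sys.exit(main(sys.argv[1:]))
```

The non-zero return value is what makes git abort the commit. Printing one violation per line with the path, line number, and rule name gives the agent something structured to act on.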

The Eight Rules Merrilin Enforces

SQL and Recovery Rules

1. Catch-and-Continue After Database Operations

Agents love to wrap database calls in try-except blocks and then continue execution as if nothing happened. This hides failures.

# Blocked by pre-commit
from sqlalchemy.exc import SQLAlchemyError

try:
    db.session.execute(query)
except SQLAlchemyError:
    pass  # Agent assumes this is fine
# Code continues, state is inconsistent

The hook detects try blocks containing database operations where the except clause doesn’t raise, return, or call a recovery function.
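A rough sketch of how such a check could be written. It uses Python's built-in ast module as a stand-in for tree-sitter, and the DB_METHODS and RECOVERY_CALLS name lists are assumed heuristics for illustration, not the rule set Merrilin ships.

```python
import ast
from typing import Optional

# Assumed heuristics: method names treated as database operations, and
# call names treated as recovery. Both sets are illustrative only.
DB_METHODS = {"execute", "commit", "flush", "add", "delete"}
RECOVERY_CALLS = {"rollback", "recover"}


def _call_name(node: ast.AST) -> Optional[str]:
    """Return the called function or method name if node is a call."""
    if isinstance(node, ast.Call):
        func = node.func
        if isinstance(func, ast.Attribute):
            return func.attr
        if isinstance(func, ast.Name):
            return func.id
    return None


def _contains_db_call(stmts: list) -> bool:
    """True if any statement contains a call to a database-like method."""
    return any(
        _call_name(n) in DB_METHODS
        for stmt in stmts
        for n in ast.walk(stmt)
    )


def _handler_recovers(handler: ast.ExceptHandler) -> bool:
    """True if the handler raises, returns, or calls a recovery function."""
    for stmt in handler.body:
        for n in ast.walk(stmt):
            if isinstance(n, (ast.Raise, ast.Return)):
                return True
            if _call_name(n) in RECOVERY_CALLS:
                return True
    return False


def find_catch_and_continue(source: str) -> list:
    """Return line numbers of except clauses that swallow a database failure."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Try) and _contains_db_call(node.body):
            flagged.extend(
                h.lineno for h in node.handlers if not _handler_recovers(h)
            )
    return flagged
```

Because the check works on the parsed tree, a comment or string that merely mentions execute does not trigger it. Matching on method names is still a heuristic, so a real rule would likely need to resolve what object the call is made on.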

2. SQLAlchemy Exceptions Without Recovery Boundaries

Catching SQLAlchemyError is fine. Catching it without a rollback, re-raise, or explicit recovery path is not.

3. Integrity Errors Without Constraint Mapping

Agents often catch IntegrityError but don’t map it to a user-facing error. The hook enforces that integrity violations must either log the constraint name or return a structured error.
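As an illustration of code that would satisfy this rule, here is a sketch using the standard-library sqlite3 driver in place of SQLAlchemy so that it runs without dependencies. The users table, the CONSTRAINT_MESSAGES mapping, and the shape of the returned dictionary are all assumptions made for the example.

```python
import sqlite3

# Hypothetical mapping from constraint identifiers to user-facing errors.
CONSTRAINT_MESSAGES = {
    "users.email": {
        "code": "email_taken",
        "message": "That email is already registered.",
    },
}

FALLBACK_ERROR = {
    "code": "conflict",
    "message": "The request conflicts with existing data.",
}


def create_user(conn: sqlite3.Connection, email: str) -> dict:
    """Insert a user and map integrity violations to a structured error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return {"ok": True}
    except sqlite3.IntegrityError as exc:
        # SQLite reports e.g. "UNIQUE constraint failed: users.email".
        constraint = str(exc).rsplit(": ", 1)[-1]
        error = CONSTRAINT_MESSAGES.get(constraint, FALLBACK_ERROR)
        return {"ok": False, "constraint": constraint, **error}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE NOT NULL)")
first = create_user(conn, "a@example.com")
second = create_user(conn, "a@example.com")  # duplicate, hits the UNIQUE constraint
```

The second call returns a structured error that names the violated constraint instead of surfacing the raw database exception.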

4. Background Database Errors Without Rollback

Background tasks (Celery, asyncio) that hit database errors must explicitly roll back. Agents forget this because the failure is silent.
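A small sketch of the compliant shape for an asyncio background task, again using sqlite3 as a stand-in for the real database layer. The events table and the record_event function are hypothetical.

```python
import asyncio
import logging
import sqlite3

logger = logging.getLogger("background")


async def record_event(conn: sqlite3.Connection, name) -> bool:
    """Background task: write one row, rolling back explicitly on failure."""
    try:
        conn.execute("INSERT INTO events (name) VALUES (?)", (name,))
        conn.commit()
        return True
    except sqlite3.Error:
        # Without this rollback the failed transaction stays open on the
        # shared connection and nobody is told about it.
        conn.rollback()
        logger.exception("record_event failed for %r", name)
        return False


async def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (name TEXT NOT NULL)")
    results = await asyncio.gather(
        record_event(conn, "signup"),
        record_event(conn, None),  # violates NOT NULL
    )
    return conn, list(results)


conn, results = asyncio.run(run_demo())
```

The failing task logs the error, rolls back, and leaves the connection with no open transaction, so later work on the same connection starts clean.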

Reader-Critical Recovery Rules

5. Reader-Critical Database Operations Without Recovery

Any database write that affects reader state (user preferences, session data, feature flags) must have a recovery path. If the write fails, the reader must not see stale or inconsistent data.

6. Reader State Writes Without Isolation

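Reader state changes must use SERIALIZABLE or REPEATABLE READ isolation. Agents rarely set an isolation level, so the code runs at the database default (READ COMMITTED in PostgreSQL, for example), which allows non-repeatable reads and lost updates between concurrent writers.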

Frontend Rules

7. Direct Reads of Transport Error Details

Agents love to display raw error messages from API responses. This leaks internal state and stack traces to users.

// Blocked by pre-commit
try {
  await saveSettings(payload);  // hypothetical API call
} catch (error) {
  setErrorMessage(error.response.data.detail);  // Exposes internals
}

The hook enforces that error details must pass through a sanitization function before rendering.

8. Swallowed Promise Rejections and Catch Blocks

Empty catch blocks in async code are a common agent mistake. The hook flags catch blocks and .catch() handlers that don’t log, re-throw, or update UI state.

The Suppression Mechanism

Sometimes you need to violate a rule. Merrilin uses inline comments to suppress specific checks:

# merrilin-guard: disable=catch-and-continue
try:
    db.session.execute(query)
except SQLAlchemyError:
    pass  # Intentional: this is a cache refresh, failure is acceptable

The hook parses these comments and skips the relevant checks. Suppressions are logged and reviewed during PR audits.
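One way the comment parsing could work, sketched with Python's tokenize module so that the marker is only recognized in real comments and not inside string literals. Comma-separated rule lists and the convention that a comment covers its own line and the line below it are assumptions, chosen to match the example above.

```python
import io
import re
import tokenize

# Matches the comment format shown above. Allowing a comma-separated
# list of rule names is an assumption.
SUPPRESS_RE = re.compile(r"#\s*merrilin-guard:\s*disable=([\w,-]+)")


def collect_suppressions(source: str) -> dict:
    """Map line number -> set of rule names disabled by a comment there."""
    suppressions = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            match = SUPPRESS_RE.match(tok.string)
            if match:
                rules = set(match.group(1).split(","))
                suppressions.setdefault(tok.start[0], set()).update(rules)
    return suppressions


def is_suppressed(suppressions: dict, rule: str, line: int) -> bool:
    """A comment applies to its own line and to the line directly below it."""
    return any(rule in suppressions.get(n, set()) for n in (line, line - 1))
```

This sketch assumes violations are reported at the line of the try statement, so the comment placed directly above it matches. Returning the collected suppressions as data also makes it straightforward to log them for the PR audit.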

Performance and Failure Modes

| Metric | Value |
|---|---|
| Average hook runtime | 1.2 seconds (Python + TypeScript) |
| False positive rate | ~3% (mostly suppressed) |
| Agent retry rate | 18% of commits fail first attempt |
| Time to fix after failure | 8 seconds median (agent re-generates) |

Common failure modes:

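  • Tree-sitter parse errors: Malformed code from agent hallucinations. The hook fails with a syntax error (tree-sitter itself is error-tolerant and marks unparseable regions as ERROR nodes, so the hook has to check for them), and the agent retries.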
  • Overly strict rules: Early versions blocked legitimate patterns. Suppression mechanism fixed this.
  • Performance on large diffs: Hooks run on staged files only, but large refactors (500+ lines) can still take 5-8 seconds.

Integration with Agent Workflows

Merrilin runs multiple agents simultaneously (code generation, test writing, documentation). Each agent commits independently. The pre-commit hook is the shared enforcement layer.

Agent retry flow:

  1. Agent generates code, stages files
  2. Pre-commit hook runs, detects missing rollback in background task
  3. Hook returns error message with line number and rule name
  4. Agent parses error, regenerates the function with rollback logic
  5. Agent re-stages and commits
  6. Hook passes, commit succeeds

The agent’s orchestration layer (LangGraph in Merrilin’s case) treats hook failures as recoverable errors. The agent doesn’t escalate to a human unless it fails three times on the same rule.
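The retry flow and the three-strikes escalation can be sketched independently of any framework. In this sketch the error line format (path:line: [rule] message) and the two callbacks are assumptions: try_commit stands for whatever stages and commits the files and returns the hook output, and regenerate stands for the agent's fix step. Nothing here is LangGraph-specific.

```python
import re
from typing import Callable, NamedTuple

# Assumed hook output format: "path:line: [rule-name] message".
ERROR_RE = re.compile(
    r"^(?P<path>[^:\n]+):(?P<line>\d+): \[(?P<rule>[\w-]+)\] (?P<message>.+)$",
    re.M,
)

MAX_FAILURES_PER_RULE = 3  # escalate after three failures on the same rule


class HookError(NamedTuple):
    path: str
    line: int
    rule: str
    message: str


def parse_hook_output(output: str) -> list:
    """Extract structured errors from the hook's text output."""
    return [
        HookError(m["path"], int(m["line"]), m["rule"], m["message"])
        for m in ERROR_RE.finditer(output)
    ]


def commit_with_retries(try_commit: Callable, regenerate: Callable) -> str:
    """Return "committed" or "escalate".

    try_commit() -> (success, hook_output) and regenerate(errors) are
    hypothetical callbacks supplied by the agent's orchestration layer.
    """
    failures = {}
    while True:
        ok, output = try_commit()
        if ok:
            return "committed"
        errors = parse_hook_output(output)
        if not errors:
            return "escalate"  # failure the agent cannot interpret
        for rule in {e.rule for e in errors}:
            failures[rule] = failures.get(rule, 0) + 1
            if failures[rule] >= MAX_FAILURES_PER_RULE:
                return "escalate"
        regenerate(errors)
```

Counting failures per rule rather than per commit lets the agent keep going when each retry surfaces a different problem, while still stopping when it is stuck on one.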

Trade-Offs and Limitations

What this approach gets right:

  • Catches errors before they enter version control
  • Works with any agent framework (no vendor lock-in)
  • Fast feedback loop (seconds, not minutes)
  • No additional API costs (runs locally)
  • Agents learn patterns over time (fewer retries after initial corrections)

What it doesn’t solve:

  • Logic errors (the code is syntactically correct but does the wrong thing)
  • Performance issues (slow queries, N+1 problems)
  • Security vulnerabilities that aren’t pattern-based (e.g., business logic flaws)
  • Cross-service consistency (hook only sees one repo at a time)

Comparison: Pre-Commit vs. Alternatives

| Approach | Feedback Speed | Cost | Agent Autonomy | Coverage |
|---|---|---|---|---|
| Pre-commit hooks | Seconds | Free | High (auto-retry) | Pattern-based |
| CI checks | Minutes | Free | Medium (requires re-run) | Broader |
| AI review bots | Minutes | $50-200/mo | Low (needs human merge) | Contextual |
| Manual review | Hours-days | Engineer time | None | Comprehensive |

Pre-commit hooks are the first line of defense. They don’t replace CI or human review. They reduce the volume of trivial mistakes that reach those stages.

Building Your Own Agent Pre-Commit Guard

If you want to implement this pattern:

1. Start with tree-sitter

Install tree-sitter and language grammars for your stack. Write a script that parses staged files and walks the AST.

2. Define agent-specific anti-patterns

Look at your agent’s recent commits. What mistakes repeat? Missing error handling? Hardcoded secrets? Unvalidated inputs?

3. Implement rules as AST queries

For each anti-pattern, write a tree-sitter query that detects the problematic node structure. Test against known bad examples.

4. Add suppression comments

You will have false positives. Build a suppression mechanism from day one.

5. Integrate with agent orchestration

Configure your agent framework to parse hook errors and retry. LangGraph, LangChain, and AutoGPT all support custom error handlers.

6. Monitor retry rates

Track how often agents fail hooks and how long it takes them to fix issues. If retry rates exceed 30%, your rules are too strict.
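A minimal sketch of that bookkeeping, assuming each commit attempt is logged as a record with a failed_first flag and an optional seconds_to_fix value. The record shape is hypothetical, and the 30% threshold is the one suggested above.

```python
from statistics import median

RETRY_RATE_THRESHOLD = 0.30  # threshold suggested in the text


def summarize(attempts: list) -> dict:
    """Summarize hook outcomes.

    Each attempt is a dict with 'failed_first' (bool) and
    'seconds_to_fix' (float, or None if never fixed). This record
    shape is an assumption for the example.
    """
    total = len(attempts)
    failed = [a for a in attempts if a["failed_first"]]
    retry_rate = len(failed) / total if total else 0.0
    fix_times = [
        a["seconds_to_fix"] for a in failed if a["seconds_to_fix"] is not None
    ]
    return {
        "retry_rate": retry_rate,
        "median_seconds_to_fix": median(fix_times) if fix_times else None,
        "rules_too_strict": retry_rate > RETRY_RATE_THRESHOLD,
    }
```

Breaking the same numbers down per rule would show which rule is responsible when the overall rate drifts upward.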

Technical Verdict

Use pre-commit hooks for agent-generated code when:

  • You run multiple agents that commit independently
  • You have clear, repeatable anti-patterns (missing error boundaries, SQL without rollback)
  • Your agents can parse error messages and retry autonomously
  • You want fast, local feedback without additional API costs

Avoid this approach when:

  • Your agents generate code infrequently (manual review is cheaper)
  • Your anti-patterns are logic-based, not structural (pre-commit can’t catch these)
  • You don’t have engineering time to maintain custom rules
  • Your team prefers centralized CI enforcement over local hooks

Pre-commit hooks are not a replacement for testing, CI, or human review. They are a speed gate that catches the obvious mistakes before they waste time downstream. For teams running agents at scale, that time savings compounds quickly.