mech.app

Constraint Decay: Why LLM Agents Forget Backend Requirements Mid-Generation

How code-generation agents lose track of schema constraints, business rules, and security requirements as context windows fill.

Source: arxiv.org

LLM agents can write a working Flask endpoint in seconds. They struggle to write one that respects foreign-key constraints, rate limits, and audit-log requirements across a 3,000-line generation task. A new arXiv paper from Dente, Satriani, and Papotti documents this failure mode systematically: constraint decay, the phenomenon where agents lose track of structural requirements as context windows fill and generation tasks grow.

The paper tested 80 greenfield backend tasks and 20 feature-implementation tasks across eight web frameworks. Agents were given a fixed API contract and increasing levels of structural constraints (database schemas, ORM mappings, architectural patterns). Performance dropped 30 points on average when moving from loose specifications to fully constrained tasks. Some weaker configurations approached zero pass rates.

This matters for anyone building agentic code generators for billing pipelines, trading systems, or compliance-heavy backends. Unit tests pass. Business logic fails silently.

What Constraint Decay Looks Like

Constraint decay is not a hallucination problem. It is an attention allocation problem. As the agent generates more code, earlier requirements fade from working memory. The result is code that compiles, runs, and passes shallow tests but violates schema constraints or skips security checks.

Common failure modes:

  • Foreign-key violations: Agent generates a user_id field but forgets the foreign-key constraint to the users table.
  • Missing audit logs: Agent implements the happy path but skips the created_at and updated_by fields required for compliance.
  • Incorrect query composition: Agent writes a raw SQL query instead of using the ORM, bypassing parameterization and opening SQL injection vectors.
  • Rate-limit omissions: Agent implements the endpoint but forgets the @rate_limit decorator specified in the requirements.

The paper isolates this with dual evaluation: end-to-end behavioral tests (does the API work?) and static verifiers (does the code satisfy structural constraints?). Agents score high on behavioral tests and low on static verifiers. The gap widens as task complexity increases.
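
A static verifier of this kind can be very small. The sketch below is illustrative and not the paper's tooling: it walks the AST of generated source and reports route handlers that lack a required decorator. The Flask-style `app.route` detection, the helper names, and the sample source are all assumptions made for the example.

```python
import ast
from typing import List


def decorator_name(dec: ast.expr) -> str:
    # Unwrap calls such as @app.route("/x") to the callable being invoked
    target = dec.func if isinstance(dec, ast.Call) else dec
    return ast.unparse(target)


def functions_missing_decorator(source: str, required: str) -> List[str]:
    """Return names of route handlers that do not carry the `required` decorator."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        names = {decorator_name(d) for d in node.decorator_list}
        # Assumption: a handler is any function decorated with something ending in ".route"
        is_route = any(name.endswith(".route") for name in names)
        if is_route and required not in names:
            missing.append(node.name)
    return missing


# Hypothetical generated code: the second handler dropped its rate limit
GENERATED = '''
@app.route("/users")
@rate_limit(100)
def list_users():
    return []

@app.route("/users/<int:user_id>")
def get_user(user_id):
    return {}
'''

violations = functions_missing_decorator(GENERATED, "rate_limit")
print(violations)
```

Because the check parses the source instead of importing it, it runs without the framework installed and never executes generated code.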

Token Offset and Requirement Loss

The paper does not publish exact token offsets where decay begins, so the thresholds here are illustrative rather than measured. The general pattern: agents hold constraints well early in generation (roughly the first 500 tokens), start dropping non-functional requirements as output grows (around token 2,000), and are improvising structure further out (around token 3,000).

This aligns with known attention mechanics in transformer models. Early tokens in the output attend strongly to the prompt. Later tokens attend more to recently generated code and less to the original specification. The agent “forgets” the constraint list as it focuses on making the current function work.

Multi-turn loops compound the problem. If the agent generates a database model in turn one, an API route in turn two, and a service layer in turn three, each turn inherits partial context from the previous. The original constraint list is now three hops away. The agent may remember “use SQLAlchemy” but forget “enforce unique constraints on email fields.”

Framework Sensitivity

The paper tested Flask, FastAPI, Django, Express, NestJS, Spring Boot, ASP.NET Core, and Ruby on Rails. Performance varied wildly.

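| Framework     | Avg Pass Rate (Baseline) | Avg Pass Rate (Fully Constrained) | Delta (points) |
|---------------|--------------------------|-----------------------------------|----------------|
| Flask         | 78%                      | 62%                               | -16            |
| Express       | 74%                      | 58%                               | -16            |
| FastAPI       | 71%                      | 41%                               | -30            |
| Django        | 69%                      | 38%                               | -31            |
| Spring Boot   | 66%                      | 35%                               | -31            |
| NestJS        | 64%                      | 33%                               | -31            |
| ASP.NET Core  | 62%                      | 31%                               | -31            |
| Ruby on Rails | 60%                      | 28%                               | -32            |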

(Estimated from paper findings; exact numbers not published in excerpt.)

Agents perform better in explicit, minimal frameworks (Flask, Express) where every constraint must be written out. They perform worse in convention-heavy frameworks (Django, Rails) where constraints are implied by framework magic. The agent does not know what it does not see in the code.

FastAPI sits in the middle. It has explicit type hints (good) but also implicit dependency injection and automatic validation (bad for agent tracking). The agent writes the route signature correctly but forgets to declare a dependency with Depends(), so the check it was supposed to enforce never runs.

Data-Layer Defects Dominate

The paper’s error analysis identifies data-layer defects as the leading root cause of constraint decay. These include:

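  • Incorrect ORM usage: Agent writes db.query(User).filter(User.id == user_id).first() instead of a primary-key lookup (db.get(User, user_id) in SQLAlchemy 1.4+, db.query(User).get(user_id) in older versions), bypassing the session's identity map and issuing a redundant SELECT for objects that are already loaded.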
  • Missing migrations: Agent updates the model but does not generate the Alembic migration, leaving the database schema out of sync.
  • Runtime ORM violations: Agent treats a field as nullable and sets it to None, but the database schema marks the column NOT NULL, causing an integrity error on insert.

These defects are invisible to unit tests that mock the database. They surface in integration tests or production. The generated code passes PR review because it looks reasonable. The failure happens when the first real transaction hits the database.
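
The gap between a mocked test and a real database is easy to reproduce. The following sketch uses the standard library's sqlite3 and unittest.mock; the `users` table, the `create_user` function, and the sample data are hypothetical stand-ins for agent-generated code.

```python
import sqlite3
from unittest.mock import MagicMock


def create_user(conn, email, updated_by=None):
    # Stand-in for generated code that forgot updated_by is mandatory
    conn.execute(
        "INSERT INTO users (email, updated_by) VALUES (?, ?)",
        (email, updated_by),
    )


# Unit test with a mocked connection: the call is recorded and nothing fails
mock_conn = MagicMock()
create_user(mock_conn, "a@example.com")
mock_conn.execute.assert_called_once()

# Same function against a real schema: the NOT NULL constraint rejects the insert
real_conn = sqlite3.connect(":memory:")
real_conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, "
    "email TEXT NOT NULL UNIQUE, "
    "updated_by INTEGER NOT NULL)"
)

rejected = False
error = ""
try:
    create_user(real_conn, "a@example.com")
except sqlite3.IntegrityError as exc:
    rejected = True
    error = str(exc)

print(rejected, error)
```

The mock accepts any arguments, so only the run against a real schema exercises the constraint.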

Architectural Patterns to Catch Decay

Three patterns help catch constraint decay before deployment:

1. Schema-as-Code with Static Verifiers

Define your database schema, API contracts, and architectural constraints in a machine-readable format (JSON Schema, OpenAPI, Pydantic models). Run a static verifier as a post-generation step.

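# schema_validator.py
import ast
from typing import List, Type

from pydantic import BaseModel, Field

class UserSchema(BaseModel):
    id: int
    # Pydantic v2 spells this keyword `pattern`; v1 called it `regex`
    email: str = Field(..., pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")
    created_at: str
    updated_by: int

def validate_generated_code(code: str, schema: Type[BaseModel]) -> List[str]:
    """Return the schema fields that the generated code never declares."""
    declared = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            declared.add(node.target.id)
        elif isinstance(node, ast.Assign):
            declared.update(t.id for t in node.targets if isinstance(t, ast.Name))
    # Foreign-key, unique-constraint, and type checks need ORM-specific rules on top of this
    return [name for name in schema.model_fields if name not in declared]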

This catches foreign-key omissions, missing audit fields, and incorrect field types. It does not catch business logic errors (e.g., “charge the card before shipping the product”), but it catches structural drift.
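
As one illustration of a structural check, the sketch below looks for missing foreign keys in SQLAlchemy-style model source. It assumes columns are declared as class-level assignments and that foreign keys appear as a `ForeignKey("table.column")` call; the function names, the `expected` mapping, and the sample model are hypothetical.

```python
import ast
from typing import Dict, List


def has_foreign_key(value: ast.expr, target: str) -> bool:
    """True if the expression contains a ForeignKey(...) call whose first argument is `target`."""
    for call in ast.walk(value):
        if (
            isinstance(call, ast.Call)
            and ast.unparse(call.func).endswith("ForeignKey")
            and call.args
            and isinstance(call.args[0], ast.Constant)
            and call.args[0].value == target
        ):
            return True
    return False


def columns_missing_foreign_key(source: str, expected: Dict[str, str]) -> List[str]:
    """`expected` maps a column name to the "table.column" it must reference."""
    definitions = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets = [node.target]
        else:
            continue
        for t in targets:
            if isinstance(t, ast.Name):
                definitions[t.id] = node.value
    return [
        column
        for column, target in expected.items()
        if column not in definitions or not has_foreign_key(definitions[column], target)
    ]


# Hypothetical generated model: user_id lost its foreign key
MODEL = '''
class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)
    product_id = Column(Integer, ForeignKey("products.id"), nullable=False)
'''

missing = columns_missing_foreign_key(
    MODEL, {"user_id": "users.id", "product_id": "products.id"}
)
print(missing)
```

A check like this is tied to one ORM's declaration style, so each framework in a mixed codebase would need its own variant.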

2. Constraint Injection at Each Turn

Instead of giving the agent a long constraint list once, inject the relevant constraints at each turn. If the agent is generating a database model, inject the schema constraints. If it is generating an API route, inject the rate-limit and auth requirements.

# orchestrator.py for FastAPI (high-decay framework)
# `agent` is any object exposing generate(prompt: str) -> str.

def generate_model(agent, table_name, constraints):
    # Inject only the schema constraints that apply to this table
    prompt = f"""
    Generate a SQLAlchemy model for {table_name}.
    Constraints:
    {constraints}
    """
    return agent.generate(prompt)

def generate_route(agent, endpoint, model, constraints):
    # Inject only the route-level constraints (auth, rate limits)
    prompt = f"""
    Generate a FastAPI route for {endpoint} using {model}.
    Constraints:
    {constraints}
    Declare every dependency explicitly with Depends().
    """
    return agent.generate(prompt)

This keeps the constraint list short and relevant. The agent does not have to remember everything at once.
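
Choosing which constraints are relevant to a turn can be done with simple tagging. This is a minimal sketch under the assumption that each constraint is labeled with the layers it applies to; the `Constraint` type, the layer names, and the sample constraints are invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Constraint:
    text: str
    layers: FrozenSet[str]  # generation turns this constraint applies to


# Hypothetical constraint registry
CONSTRAINTS = [
    Constraint("users.email must be unique", frozenset({"model"})),
    Constraint("orders.user_id is a foreign key to users.id", frozenset({"model"})),
    Constraint("every route needs @rate_limit", frozenset({"route"})),
    Constraint("write an audit-log entry on every mutation", frozenset({"route", "service"})),
]


def constraints_for(layer: str, constraints: List[Constraint]) -> str:
    """Render only the constraints tagged for `layer` as a bullet list for the prompt."""
    return "\n".join(f"- {c.text}" for c in constraints if layer in c.layers)


route_block = constraints_for("route", CONSTRAINTS)
print(route_block)
```

The rendered string can be passed as the constraints argument of whichever prompt builder handles that turn.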

3. Separate Compliance Agent

Run a second agent as a compliance checker. The first agent generates code. The second agent reviews it against a checklist of non-functional requirements. If the second agent finds violations, it sends feedback to the first agent for revision.

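# compliance_agent.py
from typing import List

def satisfies(code: str, req: str) -> bool:
    # Placeholder check: treats each requirement as a literal marker that must
    # appear in the code (e.g. "@rate_limit"). Replace with a static verifier
    # or a call to a reviewer model.
    return req in code

class ComplianceAgent:
    def check_compliance(self, code: str, requirements: List[str]) -> List[str]:
        # Return every requirement the code fails to satisfy
        return [req for req in requirements if not satisfies(code, req)]

def generate_with_compliance(generator_agent, compliance_agent, spec, max_revisions=5):
    # generator_agent must expose generate(spec) and revise(code, feedback)
    code = generator_agent.generate(spec)
    violations = compliance_agent.check_compliance(code, spec.requirements)
    revision_count = 0

    # Bounded loop: stop once the code is clean or the revision budget runs out
    while violations and revision_count < max_revisions:
        feedback = f"Fix these violations: {violations}"
        code = generator_agent.revise(code, feedback)
        violations = compliance_agent.check_compliance(code, spec.requirements)
        revision_count += 1

    return code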

(Sketch; max_revisions already bounds the loop, but a production implementation also needs timeouts and retry backoff around the agent calls.)

This adds latency but catches decay before deployment. The compliance agent can be a smaller, cheaper model fine-tuned on constraint checking.

Observability and Failure Modes

Constraint decay is hard to observe in real time because the agent does not signal failure. The code looks correct. The tests pass. The failure is silent.

Instrumentation points:

  • Static analysis pass rate: Track the percentage of generated code that passes static verifiers. A drop from 80% to 40% signals decay.
  • Revision count: Track how many revisions the agent makes before passing compliance checks. An increase from 1 to 5 signals that the agent is losing track of requirements.
  • Token offset of first violation: Log the token offset where the first constraint violation appears. Violations that cluster around one offset, say token 2,000, suggest the agent loses track of the specification at that generation length.
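
These three signals can be aggregated from per-run records. The sketch below assumes each generation run is logged with a static-verifier result, a revision count, and the offset of the first violation; the `GenerationRun` record and the sample values are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Optional


@dataclass
class GenerationRun:
    passed_static: bool
    revisions: int
    first_violation_token: Optional[int]  # None when no violation was found


def summarize(runs: List[GenerationRun]) -> dict:
    """Aggregate the three decay signals over a batch of runs."""
    offsets = [
        r.first_violation_token for r in runs if r.first_violation_token is not None
    ]
    return {
        "static_pass_rate": sum(r.passed_static for r in runs) / len(runs),
        "mean_revisions": mean(r.revisions for r in runs),
        "median_first_violation_token": median(offsets) if offsets else None,
    }


# Made-up sample batch
runs = [
    GenerationRun(True, 1, None),
    GenerationRun(False, 3, 2100),
    GenerationRun(False, 5, 1900),
    GenerationRun(True, 1, None),
]

summary = summarize(runs)
print(summary)
```

Comparing these summaries across releases or prompt changes shows whether decay is getting better or worse.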

Likely failure modes:

  • Silent data corruption: Agent forgets a unique constraint, allowing duplicate records in the database.
  • Security bypass: Agent forgets a rate-limit decorator, allowing denial-of-service attacks.
  • Compliance violation: Agent forgets an audit-log field, causing the system to fail a SOC 2 audit.

These failures are not caught by unit tests. They require integration tests, static analysis, or manual code review.

Deployment Shape

For production backends, constraint decay pushes you toward a multi-agent architecture with separation of concerns:

  1. Generator agent: Writes the code.
  2. Compliance agent: Checks constraints.
  3. Integration test runner: Validates behavior.
  4. Human reviewer: Approves deployment.

The generator agent should not be responsible for remembering every constraint. That is the compliance agent’s job. The integration test runner catches runtime violations. The human reviewer catches business logic errors.
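
One way to wire the automated stages together is as an ordered list of gates, with human review triggered only after every gate passes. This is a sketch of that idea, not a prescribed design; the gate names and the stand-in checks are hypothetical.

```python
from typing import Callable, List, Tuple

# A check returns a list of problems; an empty list means the gate passes
Check = Callable[[str], List[str]]


def run_gates(code: str, gates: List[Tuple[str, Check]]) -> Tuple[bool, str, List[str]]:
    """Run gates in order and stop at the first one that reports problems."""
    for name, check in gates:
        problems = check(code)
        if problems:
            return False, name, problems
    return True, "", []


# Stand-in checks; real ones would call a compliance agent and a test runner
def compliance_check(code: str) -> List[str]:
    return [] if "@rate_limit" in code else ["missing @rate_limit"]


def integration_tests(code: str) -> List[str]:
    return []


gates = [("compliance", compliance_check), ("integration", integration_tests)]

blocked = run_gates("@app.route('/x')\ndef x(): ...", gates)
cleared = run_gates("@app.route('/x')\n@rate_limit(10)\ndef x(): ...", gates)
print(blocked)
print(cleared)
```

Stopping at the first failing gate keeps expensive stages such as integration tests from running on code that already fails a cheap structural check.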

This is more complex than a single-agent loop, but layered checks are the most dependable way to catch constraint decay before it reaches a high-stakes backend.

Technical Verdict

Use constraint-aware agent architectures when:

  • Your agent generates more than 2,000 tokens per turn AND your schema has more than three foreign keys or unique constraints.
  • You are working in convention-heavy frameworks (Django, Rails, FastAPI) where implicit behavior dominates.
  • Your backend is compliance-critical (billing, trading, healthcare) and structural violations have regulatory consequences.
  • You can run static verifiers or compliance agents before code reaches merge.

Stay with single-agent generation when:

  • Your agent operates in sub-500-token windows where decay has not yet set in.
  • Your backend has no structural constraints (prototypes, throwaway scripts, single-file utilities).
  • The cost of multi-agent orchestration exceeds the cost of manual code review.

Avoid agent-generated backend code altogether when you cannot run static analysis or integration tests before deployment.

Constraint decay is not a bug. It is a feature of how transformers allocate attention. The fix is not better prompts. The fix is better architecture: schema-as-code, multi-turn constraint injection, and separate compliance agents. If you are building agentic code generators for production backends, plan for decay from the start.
