Armin Ronacher (Flask, Rye, Pi) just called out a new operational failure mode in open-source maintenance: bug reports rewritten by LLMs that arrive full of confident speculation, fake minimal repros, and root-cause guesses that waste maintainer time. He calls them “slop issues”; “clanker” is his term for the LLM doing the rewriting. Simon Willison tagged the post with ai-ethics and slop, a sign that others also see this as a distinct problem class.
This is not about AI-generated pull requests or automated vulnerability hunters. This is about the human-to-maintainer communication layer breaking down when users paste their observed problem into an LLM, accept the rewritten output, and submit it as a GitHub issue. The result looks helpful but obscures the actual facts.
The Operational Damage
When an issue arrives rewritten by an LLM, maintainers face:
- Confident hallucinations: The LLM invents root causes, suggests implementation strategies, and references adjacent code that may not be relevant.
- Fake minimal repros: The model generates a reproduction case that looks plausible but doesn’t actually trigger the bug.
- Verbose speculation: Long lists of error classes, analogies to similar problems, and guesswork that buries the actual observed behavior.
- Loss of ground truth: The human’s original command, expected outcome, and exact error message are rewritten into prose that sounds authoritative but lacks precision.
Ronacher’s preferred format is stark:
I ran this command. I expected this to happen. This happened instead. Here is the exact error or log.
That structure gives maintainers the raw facts. LLM rewrites replace facts with inference.
Why Triage Systems Can’t Distinguish Slop
Issue trackers store markdown text. They have no built-in mechanism to distinguish:
- A human typing their observations directly
- A human using an LLM as a grammar assistant
- A human pasting their problem into ChatGPT and submitting the output verbatim
All three arrive as markdown. The only signal is tone and structure. Slop issues tend to:
- Use passive voice and hedge words (“it appears that,” “this might indicate”)
- Include implementation suggestions before the bug is confirmed
- Reference code paths the reporter never examined
- Lack exact command invocations or raw log output
But these are heuristics, not metadata. A well-intentioned user with an LLM-assisted grammar check looks identical to a user who let the model rewrite their entire report.
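To make the "heuristics, not metadata" point concrete, here is a minimal sketch of a hedge-phrase scan. The phrase list and the function name `find_hedges` are illustrative assumptions, not part of any existing tool, and a scan like this would flag a careful human writer just as readily as an LLM rewrite.

```python
import re

# Hypothetical phrase list for illustration; a real project would tune its own.
HEDGE_PATTERNS = [
    r"\bit appears that\b",
    r"\bthis might indicate\b",
    r"\bcould indicate\b",
    r"\bmight be related to\b",
    r"\bpossibly due to\b",
]


def find_hedges(text: str) -> list[str]:
    """Return the hedge phrases found in the text, lowercased, in pattern order."""
    lowered = text.lower()
    found = []
    for pattern in HEDGE_PATTERNS:
        match = re.search(pattern, lowered)
        if match:
            found.append(match.group(0))
    return found


if __name__ == "__main__":
    report = "It appears that the socket stalls, possibly due to IPv6 fallback."
    print(find_hedges(report))
```

A non-empty result is only a prompt for a human to look closer. It says nothing about who or what wrote the text.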
Enforcement Options and Trade-Offs
| Approach | What It Does | Maintainer Benefit | User Friction | When to Use |
|---|---|---|---|---|
| Structured issue templates | Enforce separate fields for command, expected, actual, logs | Forces factual input, resists essay-style expansion | Low (already common) | 20+ issues/month, small team, clear repro requirements |
| Manual triage labels | Maintainer marks issues as “needs clarification” | Signals quality problems | None for users | Under 10 issues/month, strong community norms |
| Pre-submit linting | Bot checks for speculation keywords, missing logs | Catches common patterns | Medium (may block valid reports) | 50+ issues/month, existing CI/CD, tolerance for 10-15% false positives |
| Community norms | Document “no LLM rewrites” in CONTRIBUTING.md | Cultural signal | Low | Small projects (under 1,000 stars), low issue volume |
None of these are airtight. Structured templates help but don’t prevent users from pasting LLM output into the “actual behavior” field. Detection tools require ecosystem buy-in. Manual triage doesn’t scale.
What a Slop-Resistant Issue Template Looks Like
Here’s a template structure that resists LLM expansion:
## Command or Code You Ran
<!-- Paste the exact command or code snippet. No explanation. -->
## Expected Behavior
<!-- One sentence. What should have happened? -->
## Actual Behavior
<!-- One sentence. What happened instead? -->
## Logs or Error Messages
<!-- Paste raw output. Do not summarize or rewrite. -->
## Environment
- OS:
- Version:
- Install method:
---
**Do not include:**
- Root cause analysis
- Suggested fixes
- Comparisons to other issues
- Implementation strategies
We need your observations, not speculation.
This template:
- Separates facts into discrete fields
- Explicitly forbids speculation
- Requests raw output, not prose summaries
- States the maintainer’s needs upfront
It won’t stop all slop, but it makes the desired format clear.
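One way to check that a submission actually filled in the template is to split the body on its `## ` headings and look for fields that are empty or still contain only the placeholder comment. The sketch below assumes the section names from the template above; `parse_sections` and `missing_sections` are hypothetical helpers, and the parser is deliberately naive (it would misread a `## ` line inside pasted log output).

```python
import re

# Section names taken from the template above.
REQUIRED_SECTIONS = [
    "Command or Code You Ran",
    "Expected Behavior",
    "Actual Behavior",
    "Logs or Error Messages",
]


def parse_sections(body: str) -> dict[str, str]:
    """Split an issue body on '## ' headings and strip HTML comments from each section."""
    raw: dict[str, list[str]] = {}
    current = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            raw[current] = []
        elif current is not None:
            raw[current].append(line)
    cleaned = {}
    for name, lines in raw.items():
        text = re.sub(r"<!--.*?-->", "", "\n".join(lines), flags=re.DOTALL)
        cleaned[name] = text.strip()
    return cleaned


def missing_sections(body: str) -> list[str]:
    """Return required sections that are absent or empty once comments are removed."""
    sections = parse_sections(body)
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]


if __name__ == "__main__":
    body = (
        "## Command or Code You Ran\nrye sync\n"
        "## Expected Behavior\nDependencies install.\n"
        "## Actual Behavior\n<!-- One sentence. What happened instead? -->\n"
    )
    print(missing_sections(body))
```

A check like this only confirms that the fields were filled in. It cannot tell whether the content is an observation or pasted speculation.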
The Assistive Tool Dilemma
Some users rely on LLMs for accessibility: grammar assistance, translation, or structuring their thoughts. A blanket ban on AI-generated text harms those users.
The distinction is:
- Assistive use: LLM helps the user express what they observed.
- Generative use: LLM invents content the user didn’t observe.
Here’s a concrete example. A user observes this:
$ rye sync
Error: connection timeout
They paste it into ChatGPT asking for help filing a bug report. The LLM rewrites it as:
When running rye sync, the initialization sequence appears to block on DNS resolution, which could indicate a network stack issue or firewall configuration problem. This might be related to the async runtime’s connection pooling strategy. The timeout suggests the underlying socket is not receiving a response within the expected window, possibly due to IPv6 fallback behavior.
The maintainer now has to ask: did you actually see a DNS error? Is there a firewall? Did you check IPv6? Or is all of this speculation?
A structured template asks the user for the exact error message first. If they write “Error: connection timeout” in the “Actual Behavior” field and leave it at that, the maintainer has ground truth. If they paste the LLM’s speculation instead, the template’s explicit instructions (“Do not include root cause analysis”) make it obvious that the report has departed from the format.
Issue trackers have no way to enforce this boundary automatically. The best signal is whether the report contains:
- Exact commands
- Raw logs
- Specific version numbers
- Reproducible steps the reporter actually executed
If those are present, the LLM likely assisted rather than generated. If they’re absent and replaced with speculation, the LLM probably rewrote the entire report.
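The presence checks above can be approximated with a few crude pattern tests. This is a rough sketch under stated assumptions: commands are taken to be lines starting with `$ `, raw logs to be lines that begin with "Error", "Traceback", or "Exception", and versions to be dotted numbers. The function name `ground_truth_signals` is hypothetical, and real reports will break every one of these assumptions some of the time.

```python
import re


def ground_truth_signals(text: str) -> dict[str, bool]:
    """Report which ground-truth markers appear in a report (crude heuristics)."""
    lines = text.splitlines()
    return {
        # Assumption: shell commands are pasted with a leading "$ " prompt.
        "command": any(line.lstrip().startswith("$ ") for line in lines),
        # Assumption: raw log lines start with a typical error keyword.
        "raw_log": any(
            re.match(r"\s*(error|traceback|exception)\b", line, re.IGNORECASE)
            for line in lines
        ),
        # Assumption: versions look like 1.2 or 1.2.3.
        "version": re.search(r"\b\d+\.\d+(?:\.\d+)?\b", text) is not None,
    }


if __name__ == "__main__":
    observed = "$ rye sync\nError: connection timeout\nVersion: 0.35.0"
    print(ground_truth_signals(observed))
```

When all three flags are false, the report is a candidate for a clarifying question, not for automatic closure.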
Observability Gap
GitHub Issues has no telemetry for:
- Whether the issue text was pasted from an LLM
- How many edits happened before submission
- Whether the reporter ran the repro steps themselves
Maintainers see the final markdown. They can’t trace provenance. This resembles the problem email spam filters faced in the early 2000s: content-based heuristics without sender authentication.
One possible mitigation: issue bots that ask clarifying questions when they detect speculation patterns. Example:
This issue contains phrases like "might indicate" and "could be caused by."
Can you confirm you ran the exact command listed and saw the error shown?
Please paste the raw log output if available.
This adds friction but surfaces the ground-truth question early.
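The comment-building half of such a bot is small. The sketch below assumes some detector has already produced a list of flagged phrases; `build_clarification_comment` is a hypothetical helper, and posting the comment through an issue tracker's API is left out entirely.

```python
from typing import Optional


def build_clarification_comment(phrases: list[str]) -> Optional[str]:
    """Build a clarifying comment from flagged phrases, or None if nothing was flagged."""
    if not phrases:
        return None
    # Quote at most two phrases so the comment stays short.
    quoted = " and ".join(f'"{phrase}"' for phrase in phrases[:2])
    return (
        f"This issue contains phrases like {quoted}.\n"
        "Can you confirm you ran the exact command listed and saw the error shown?\n"
        "Please paste the raw log output if available."
    )


if __name__ == "__main__":
    print(build_clarification_comment(["might indicate", "could be caused by"]))
```

Returning `None` when nothing was flagged keeps the bot silent on reports that already read as plain observations.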
Technical Verdict
Use structured issue templates with explicit anti-speculation language if your project receives more than 20 issues per month and several reports in the last 30 days offered root-cause speculation, implementation suggestions, or analogies to adjacent code without the exact command run, the expected behavior, and the raw error output. Ronacher’s critique shows that maintainers need “I ran this command. I expected this to happen. This happened instead. Here is the exact error or log.” Templates with separate fields for command, expected, actual, and logs prompt users to provide factual input before expanding into speculation. The setup cost (about five minutes to add a GitHub issue template) is justified when the time lost triaging slop issues exceeds the convenience of accepting unstructured reports.
Avoid structured templates if your project receives fewer than 10 issues per month, you have strong community norms, or your issue tracker is primarily used for feature requests and design discussions rather than bug reports. Templates optimized for factual bug reports frustrate users trying to propose ideas. At low volume, direct communication and cultural expectations work better than automation.
Why this works: Ronacher’s problem statement is that LLM-rewritten issues contain “complete guesswork on root causes, fake-minimal repros, suggested implementation strategies” that bury the actual observed behavior. Structured templates resist this by making it easier to paste raw facts into discrete fields than to paste LLM-generated prose into a single text box. The template doesn’t detect slop; it makes slop harder to produce.