Email Security Triage Agents: How to Filter Bogus Reports Without Human Review

Build an email agent that parses, classifies, and routes security reports using email parsing, classification heuristics, and ticketing integration.

Source: opencomputer.dev

Security teams running bug bounty programs or public vulnerability disclosure channels face a volume problem. Most incoming reports are AI-generated noise, duplicates, or out-of-scope submissions. Human triage is expensive and slow. An email agent sitting between the inbox and your ticketing system can filter out roughly 80% of the junk before a human looks at it, provided it is tuned so that real vulnerabilities still reach a person.

This is not about building a chatbot. It’s about parsing email, extracting structured data, running classification heuristics, and routing decisions to the right queue. The plumbing matters because misclassifying a real vulnerability is worse than letting spam through.

The Email-to-Agent Pipeline

The architecture is simpler than you think. No UI and no long-running agent process. The only persistent state lives in your email provider, your ticketing system, and the small stores used for duplicate detection and code search.

Core flow:

  1. Email arrives at a dedicated address (security@yourcompany.com)
  2. Email provider triggers a webhook or forwards to an IMAP-watching service
  3. Agent parses body, attachments, and headers
  4. Classification logic runs against codebase context
  5. Agent sends result email or creates a ticket

Trigger mechanism:

You need a way to know when to launch the agent. Three options:

  • Gmail labels + IMAP polling: Set up a filter that labels incoming mail, poll IMAP every 60 seconds, launch agent for new labeled messages
  • Webhook from email provider: SendGrid, Mailgun, or Cloudflare Email Routing can POST to your endpoint
  • Serverless function on S3 bucket: Forward email to S3 via SES, trigger Lambda on object creation

The OpenComputer demo uses Gmail labels. A cron job polls IMAP, finds messages with the “security-report” label, and spins up an agent sandbox for each one. This avoids running a persistent server and keeps the agent stateless. The agent itself is a Python script that reads the email from stdin and writes classification results to stdout.
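The polling loop described above can be sketched with the standard library. This is a minimal sketch, not the OpenComputer implementation: the function names, the `agent_cmd` argument, and the assumption that the "security-report" label is exposed as an IMAP mailbox of the same name are all illustrative.

```python
import imaplib
import subprocess

def fetch_unseen(conn, mailbox="security-report"):
    """Return raw RFC 822 bytes for each unseen message in the mailbox."""
    status, _ = conn.select(mailbox)
    if status != "OK":
        raise RuntimeError(f"cannot select mailbox {mailbox!r}")
    status, data = conn.search(None, "UNSEEN")
    if status != "OK" or not data or not data[0]:
        return []
    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        # fetch() returns a list mixing (envelope, payload) tuples and bare bytes
        for part in parts:
            if isinstance(part, tuple):
                messages.append(part[1])
    return messages

def run_agent(raw_email, agent_cmd):
    """Pipe one message to a fresh agent process and return its stdout."""
    result = subprocess.run(agent_cmd, input=raw_email,
                            capture_output=True, check=True)
    return result.stdout

def poll_once(host, user, password, agent_cmd):
    """One polling pass: connect, hand each unseen message to the agent, disconnect."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        return [run_agent(raw, agent_cmd) for raw in fetch_unseen(conn)]
    finally:
        conn.logout()
```

A cron entry would call `poll_once` every 60 seconds. Because each message goes to a separate process over stdin, the agent itself holds no state between runs.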

Parsing Email Bodies and Attachments

Email is a terrible format for structured data. You get HTML, plain text, quoted replies, inline images, and attachments in a MIME multipart wrapper. The agent needs to extract:

  • Claimed vulnerability type (XSS, SQLi, CSRF, etc.)
  • Affected URL or endpoint
  • Steps to reproduce
  • Proof-of-concept code or screenshots
  • Reporter contact info

Parsing strategy:

import email
from email import policy

def parse_security_report(raw_email):
    """Parse raw RFC 822 bytes into the fields the classifier needs."""
    msg = email.message_from_bytes(raw_email, policy=policy.default)

    # Extract the plain text body, ignoring HTML. get_body() skips
    # attachments, which a bare walk() over text/plain parts would
    # wrongly append to the body.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""

    # Extract attachments. get_content() returns str for text parts and
    # bytes for binary parts such as screenshots.
    attachments = []
    for part in msg.iter_attachments():
        attachments.append({
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "data": part.get_content()
        })

    return {
        "from": msg["From"],
        "subject": msg["Subject"],
        "body": body,
        "attachments": attachments
    }

The agent then feeds the body text to an LLM with a prompt that extracts structured fields. Attachments get OCR’d if they’re images, or parsed if they’re code files.

Common parsing failures:

  • Reporter pastes HTML instead of plain text (LLM sees tags, not content)
  • Steps to reproduce are in a screenshot (need OCR or manual review)
  • Email thread includes 10 previous replies (need to isolate the new content)
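Two of the failures above (pasted HTML and long quoted threads) can be reduced with a cleanup pass before the text reaches the LLM. The following is a rough sketch using only the standard library. The helper names are hypothetical, and the "On ... wrote:" marker is a common mail-client convention rather than a standard, so expect to extend it.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def html_to_text(html):
    """Strip tags and collapse whitespace so the LLM sees content, not markup."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())

# Assumed reply marker: "On <date>, <someone> wrote:"
_REPLY_MARKER = re.compile(r"^On .+ wrote:\s*$")

def strip_quoted_replies(body):
    """Keep only the newest message: drop '>' lines and everything after the marker."""
    kept = []
    for line in body.splitlines():
        if _REPLY_MARKER.match(line):
            break
        if line.lstrip().startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()
```

Screenshots that contain the reproduction steps are not handled here; they still need OCR or manual review.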

Classification Heuristics

The agent needs to decide: is this a real vulnerability, a duplicate, out of scope, or spam? You can’t rely on an LLM alone, because the confidence it reports is self-assessed and can be high even when the verdict is wrong.

Rule-based filters (run first):

  • Duplicate detection: Hash the body text, check against last 30 days of reports stored in a cache
  • Out-of-scope domains: If the reported URL is not in your domain list, auto-reject
  • Known false positives: “Missing security headers on static assets” gets auto-closed with a canned response
  • Spam signals: No steps to reproduce, generic template language, requests for payment upfront
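Some of the rule-based filters above are cheap to express in code. The sketch below covers the scope check, one spam signal, and one known false positive. The domain list, phrase list, and verdict strings are placeholders invented for illustration; duplicate detection is left out because it needs a cache.

```python
import re
from urllib.parse import urlparse

# Hypothetical scope list and spam phrases; replace with your program's rules.
IN_SCOPE_DOMAINS = {"yourcompany.com"}
SPAM_PHRASES = ("payment upfront", "pay before disclosure", "bounty in advance")
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

def in_scope(url):
    """True if the URL's host is an in-scope domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in IN_SCOPE_DOMAINS)

def prefilter(body):
    """Return an early verdict string, or None to pass the report to the LLM step."""
    text = body.lower()
    if any(phrase in text for phrase in SPAM_PHRASES):
        return "SPAM"
    urls = URL_PATTERN.findall(body)
    # Reject only when URLs are present and none of them is in scope
    if urls and not any(in_scope(u) for u in urls):
        return "OUT_OF_SCOPE"
    if "missing security headers" in text and "static" in text:
        return "KNOWN_FALSE_POSITIVE"
    return None
```

Matching on the parsed hostname rather than a substring keeps a lookalike such as notyourcompany.com from passing the scope check.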

LLM classification (run second):

The agent sends the parsed report plus a snippet of your codebase to an LLM with this prompt:

You are a security engineer reviewing a vulnerability report. 
Given the report below and the relevant code, classify it as:

- VALID: Exploitable vulnerability with clear impact
- INVALID: Not a vulnerability (misunderstanding, intended behavior)
- OUT_OF_SCOPE: Real issue but not in our threat model
- SPAM: Generic template, no specifics

Report:
{parsed_report}

Relevant code:
{code_context}

Return JSON: {"classification": "VALID", "confidence": 0.85, "reasoning": "..."}

The code context comes from a vector search over your repo. The agent embeds the reported endpoint or file path, retrieves the top 5 relevant code chunks, and includes them in the prompt.
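The JSON reply requested by the prompt should be validated before anything acts on it, since the model may return malformed output or a label outside the four classes. A minimal sketch follows. The `NEEDS_REVIEW` fallback label and the function name are assumptions added here, not part of the original prompt.

```python
import json

ALLOWED = {"VALID", "INVALID", "OUT_OF_SCOPE", "SPAM"}

# Fallback used whenever the reply cannot be trusted. NEEDS_REVIEW is a
# hypothetical label meaning "send to a human", not one of the prompt's classes.
FALLBACK = {
    "classification": "NEEDS_REVIEW",
    "confidence": 0.0,
    "reasoning": "LLM reply failed validation",
}

def parse_llm_reply(raw):
    """Return a validated classification dict, or the fallback on any problem."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return dict(FALLBACK)
    if not isinstance(data, dict):
        return dict(FALLBACK)
    label = data.get("classification")
    confidence = data.get("confidence")
    if not isinstance(label, str) or label not in ALLOWED:
        return dict(FALLBACK)
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return dict(FALLBACK)
    if not 0.0 <= confidence <= 1.0:
        return dict(FALLBACK)
    return {
        "classification": label,
        "confidence": float(confidence),
        "reasoning": str(data.get("reasoning", "")),
    }
```

Failing closed to human review keeps a parsing bug from turning into a silent auto-close.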

Confidence thresholds:

Classification | Confidence | Action
VALID          | > 0.8      | Create high-priority ticket, notify on-call
VALID          | 0.5-0.8    | Create ticket, assign to triage queue
INVALID        | > 0.9      | Auto-close, send canned response
INVALID        | < 0.9      | Send to triage queue (false negatives are expensive)
OUT_OF_SCOPE   | > 0.8      | Auto-close with explanation
SPAM           | > 0.95     | Silent discard

These thresholds map directly to business risk. A false negative (marking a real vulnerability as INVALID) is catastrophic, so the agent errs toward human review when confidence is below 0.9. A false positive (creating a ticket for spam) wastes human time but doesn’t create security exposure.
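The threshold table translates into a small routing function. This is a sketch under one stated assumption: the table does not say what happens to combinations it omits (for example VALID below 0.5, or SPAM at 0.95 or lower), so those cases are sent to the triage queue here. The action names are hypothetical identifiers.

```python
def route(classification, confidence):
    """Map a classification and confidence to an action name from the table."""
    if classification == "VALID":
        if confidence > 0.8:
            return "high_priority_ticket"
        if confidence >= 0.5:
            return "triage_ticket"
    elif classification == "INVALID":
        if confidence > 0.9:
            return "auto_close"
    elif classification == "OUT_OF_SCOPE":
        if confidence > 0.8:
            return "auto_close_with_explanation"
    elif classification == "SPAM":
        if confidence > 0.95:
            return "silent_discard"
    # Assumption: anything the table does not cover goes to a human.
    return "triage_queue"
```

Defaulting to the triage queue means an unexpected label or a borderline score costs reviewer time rather than dropping a report.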

Ticketing System Integration

The agent needs to write back to your workflow. Three common patterns:

1. Email reply only

The agent sends a response email to the reporter. No ticket created unless a human forwards it. Simple but loses audit trail.

2. Create ticket, email reporter

Agent calls Jira/Linear/GitHub Issues API to create a ticket, then sends email with ticket ID. Requires API credentials and error handling for rate limits.

3. Forward to triage queue

Agent sends email to a dedicated triage address (security-triage@yourcompany.com) with classification metadata in headers. Humans process the queue. Lowest risk but still requires human time.
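For the third pattern, the forwarded message can carry the classification in custom headers so mail rules can sort the queue. A minimal sketch using the standard library follows. The `X-Triage-*` header names, the sender address, and the function name are invented for illustration, and `report` is assumed to be a dict with `from`, `subject`, and `body` keys.

```python
from email.message import EmailMessage

def _one_line(value):
    """Collapse whitespace so header values never contain line breaks."""
    return " ".join(str(value).split())

def build_triage_forward(report, classification,
                         triage_addr="security-triage@yourcompany.com",
                         sender="triage-agent@yourcompany.com"):
    """Build (but do not send) the message forwarded to the human triage queue."""
    label = classification["classification"]
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = triage_addr
    msg["Subject"] = _one_line(f"[{label}] {report['subject']}")
    # Hypothetical header names; pick whatever your mail rules can match on.
    msg["X-Triage-Classification"] = label
    msg["X-Triage-Confidence"] = f"{classification['confidence']:.2f}"
    msg["X-Triage-Reporter"] = _one_line(report["from"])
    msg.set_content(
        f"Reasoning: {classification['reasoning']}\n\n"
        f"--- Original report ---\n{report['body']}\n"
    )
    return msg
```

The returned message can be handed to `smtplib.SMTP.send_message` or to your provider's send API.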

Example GitHub Issues integration:

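import os
import requests

# Token is read from the environment (injected by the secrets manager).
# Indexing os.environ fails fast with KeyError if it is missing, instead
# of sending "token None" to the API.
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
OWNER = "your-org"
REPO = "your-repo"

def create_ticket(report, classification):
    """Create a GitHub issue for a report and return the issue URL."""
    headers = {
        "Authorization": f"token {GITHUB_TOKEN}",
        "Accept": "application/vnd.github.v3+json"
    }

    # Anything below 0.9 confidence is flagged for a human to double-check
    labels = ["security"]
    if classification["confidence"] < 0.9:
        labels.append("human-review")

    # Build the body line by line so no stray indentation ends up in the issue
    body = "\n".join([
        f"**Reporter:** {report['from']}",
        f"**Classification:** {classification['classification']} (confidence: {classification['confidence']})",
        f"**Reasoning:** {classification['reasoning']}",
        "",
        "---",
        "",
        report["body"],
    ])

    issue = {
        "title": f"Security Report: {report['subject']}",
        "body": body,
        "labels": labels
    }

    response = requests.post(
        f"https://api.github.com/repos/{OWNER}/{REPO}/issues",
        headers=headers,
        json=issue,
        timeout=30
    )
    # Surface rate limits and auth failures instead of a KeyError on html_url
    response.raise_for_status()

    return response.json()["html_url"]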

Failure Modes and Human Override

The worst failure is misclassifying a real vulnerability as spam or invalid. The second worst is creating 100 tickets for the same duplicate report.

Mitigation strategies:

  • Conservative thresholds: When in doubt, route to human review
  • Audit log: Store every classification decision with the full prompt and LLM response
  • Manual override: Include an escalation link or token in every auto-close email so the reporter can request human review
  • Weekly review: Sample 10 random auto-closed reports each week to check for false negatives

Override mechanism:

The agent includes a unique token in every response email. If the reporter replies with “ESCALATE {token}”, the agent creates a ticket regardless of classification. The token prevents abuse (you can’t escalate someone else’s report).
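The article does not say how the token is generated. One way to make it unguessable and bound to a single report is an HMAC over the report ID and the reporter's address. This is a sketch of that assumption, with hypothetical function names and a server-side `secret` supplied as bytes.

```python
import hashlib
import hmac
import re

def make_escalation_token(secret, report_id, reporter_email):
    """Derive a token bound to one report and one reporter address."""
    payload = f"{report_id}:{reporter_email.strip().lower()}".encode("utf-8")
    # Truncated to 32 hex characters (128 bits) to keep the email readable
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()[:32]

_ESCALATE = re.compile(r"\bESCALATE\s+([0-9a-f]{32})\b")

def is_valid_escalation(secret, report_id, reporter_email, reply_body):
    """True if the reply contains 'ESCALATE <token>' matching this report and sender."""
    match = _ESCALATE.search(reply_body)
    if not match:
        return False
    expected = make_escalation_token(secret, report_id, reporter_email)
    # Constant-time comparison avoids leaking the token through timing
    return hmac.compare_digest(match.group(1), expected)
```

Because the token is recomputed from the report ID and sender, nothing extra has to be stored, and a token copied from one report does not validate against another.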

Duplicate handling:

Hash the report body and store in a cache with 30-day TTL. If a new report hashes to the same value, auto-close with “Duplicate of #{original_ticket_id}”. This breaks if the reporter rewords the report, so include a similarity check (cosine distance on embeddings) as a fallback.
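The two-stage check (exact hash first, embedding similarity second) might look like the sketch below. An in-memory dict stands in for the cache, the 30-day TTL is not modeled, and the embeddings are assumed to come from whatever model you already use. The 0.95 similarity threshold and the whitespace and case normalization are assumptions, not values from the article.

```python
import hashlib
import math

def body_hash(body):
    """Hash a lightly normalized body so trivial whitespace changes still match."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1 minus cosine distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_duplicate(body, embedding, seen, threshold=0.95):
    """seen maps body hash -> (ticket_id, embedding). Returns a ticket id or None."""
    digest = body_hash(body)
    if digest in seen:
        return seen[digest][0]
    # Fallback for reworded reports: the most similar prior report above threshold
    best_id, best_score = None, threshold
    for ticket_id, prior in seen.values():
        score = cosine_similarity(embedding, prior)
        if score >= best_score:
            best_id, best_score = ticket_id, score
    return best_id
```

A linear scan is fine for a few hundred reports per month; at higher volume the similarity lookup would move into the vector database.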

Deployment Shape and Infrastructure

The agent runs in a stateless sandbox. Each email triggers a fresh instance. No persistent process, no database beyond what you use for duplicate detection.

Infrastructure options:

  • AWS Lambda + SES: Email arrives in S3, triggers Lambda, agent runs in container
  • Google Cloud Run + Gmail API: Cron job polls Gmail, spawns Cloud Run instance per message
  • Kubernetes CronJob: Polls IMAP every minute, launches pod per message

Required infrastructure components:

  1. Email ingestion: IMAP polling or webhook receiver
  2. Agent runtime: Sandboxed execution environment (Lambda, Cloud Run, Kubernetes pod)
  3. Duplicate cache: Redis or DynamoDB with 30-day TTL for report hashes
  4. Vector database: Pinecone, Weaviate, or pgvector for codebase embeddings
  5. Secrets manager: AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault

The OpenComputer demo-agent-triage repository shows a minimal implementation. The agent code is a single Python script that expects the email as input and outputs a JSON classification. The orchestration layer (IMAP polling, sandbox creation, result handling) is separate. This separation lets you swap out the agent logic without touching the infrastructure.

Secrets management:

The agent needs:

  • Email credentials (IMAP or API token)
  • Ticketing system API token
  • LLM API key
  • Code repo access (GitHub token or SSH key)

These go into environment variables injected at runtime. Never bake them into the container image. Use your cloud provider’s secret manager and fetch at startup.
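A startup check that fails fast when a secret is missing is easier to debug than an authentication error halfway through a run. A small sketch follows; the variable names are hypothetical, and the secrets manager is assumed to have already injected them into the environment.

```python
import os

# Hypothetical variable names; match them to what your secrets manager injects.
REQUIRED_SECRETS = ("IMAP_PASSWORD", "GITHUB_TOKEN", "LLM_API_KEY")

def load_secrets(env=None):
    """Return the required secrets, or raise before any email is processed."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        # Report names only; never include secret values in errors or logs
        raise RuntimeError("missing required secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}
```

Passing `env` explicitly makes the function testable without touching the real environment.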

Observability and Debugging

You need to know when the agent is wrong. Metrics to track:

  • Classification distribution: How many VALID vs INVALID vs SPAM per day
  • Confidence scores: Histogram of LLM confidence values
  • False positive rate: Manually review a sample and compare to agent decisions
  • Processing time: P50, P95, P99 latency from email arrival to ticket creation
  • Error rate: Failed API calls, LLM timeouts, parsing exceptions
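Two of these metrics can be computed from data you already collect. This sketch derives latency percentiles from a list of processing times and a miss rate from a manually reviewed sample. The function names and the input shapes are assumptions, as is the definition of a miss: a report the agent closed that the human reviewer judged VALID.

```python
import statistics

def latency_percentiles(samples_ms):
    """Return P50, P95, and P99 of processing times in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

CLOSED_LABELS = ("INVALID", "SPAM", "OUT_OF_SCOPE")

def miss_rate(reviewed):
    """reviewed: list of (agent_label, human_label) pairs from a manual sample."""
    closed = [(agent, human) for agent, human in reviewed if agent in CLOSED_LABELS]
    if not closed:
        return 0.0
    missed = sum(1 for _, human in closed if human == "VALID")
    return missed / len(closed)
```

With a small weekly sample the miss rate is a noisy estimate, so track it over several weeks before changing thresholds.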

Logging strategy:

Log every decision with full context:

{
  "timestamp": "2026-06-09T08:15:00Z",
  "email_id": "abc123",
  "from": "reporter@example.com",
  "classification": "INVALID",
  "confidence": 0.92,
  "reasoning": "Reported XSS is in a sandboxed iframe with CSP",
  "llm_model": "claude-3-opus",
  "llm_tokens": 1523,
  "processing_time_ms": 4200,
  "action": "auto_close"
}

Store these in a searchable log aggregator (Datadog, Splunk, CloudWatch Logs). When a reporter complains about an auto-close, you can pull the full decision trace.
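A record in the shape shown above can be emitted as one JSON line per decision, which most log aggregators ingest directly. This is a minimal sketch: the function names and the logger name are hypothetical, and `result` is assumed to be a dict with `classification`, `confidence`, and `reasoning` keys.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("triage.decisions")

def decision_record(email_id, sender, result, action, model, tokens, elapsed_ms):
    """Assemble one audit record with the fields from the example log entry."""
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "email_id": email_id,
        "from": sender,
        "classification": result["classification"],
        "confidence": result["confidence"],
        "reasoning": result["reasoning"],
        "llm_model": model,
        "llm_tokens": tokens,
        "processing_time_ms": elapsed_ms,
        "action": action,
    }

def log_decision(record):
    """Write the record as a single JSON line so it stays machine-searchable."""
    logger.info(json.dumps(record, sort_keys=True))
```

If the audit log must also include the full prompt and LLM response, add them as extra fields and check them for reporter personal data before shipping them to a third-party aggregator.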

Technical Verdict

Use this pattern when:

  • You receive more than 10 security reports per week
  • More than 50% are duplicates, out of scope, or spam
  • You have a codebase the agent can search (not a black-box SaaS product)
  • You can tolerate a 5% false negative rate (real vulns marked invalid)

Avoid when:

  • You get fewer than 5 reports per month (not worth the automation overhead)
  • Your threat model requires human review of every report (regulated industries, critical infrastructure)
  • You don’t have structured vulnerability disclosure guidelines (the agent needs clear scope rules)
  • Your codebase changes faster than you can update the agent’s context (early-stage startups)

The value is in time saved, not perfection. If the agent auto-closes 80% of incoming reports as junk and routes another 15% to the triage queue, human review volume drops by roughly 80%. The remaining 5% (real vulnerabilities misclassified as invalid) is your risk budget. Tune thresholds based on your risk tolerance.