An engineer ran an autonomous agent against real GitHub bounties for 96 hours. It submitted 240+ PRs, got 72 merged, and earned $500-800. The interesting part is not the money. It’s the failure telemetry from a multi-day stateful workflow that had to manage OAuth tokens, rate limits, repository forks, and competitive task acquisition without human intervention.
This is the plumbing breakdown.
Architecture: The Bounty Hunting Loop
The agent (called ZKA) runs on a 30-minute cron schedule. Each cycle:
- Scans GitHub for open bounties via API
- Evaluates legitimacy, difficulty, and competition
- Clones repositories and analyzes codebases
- Writes fixes with tests
- Submits PRs with descriptions
- Monitors review feedback and responds to bots
The stack is straightforward: GitHub CLI for API interactions, Python for orchestration, Hermes Agent (a self-hosted AI framework), and cron for scheduling.
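    # Simplified bounty hunting loop (in-process equivalent of the 30-minute schedule).
    # search_bounties, is_legitimate, and the other helpers are the agent's own (not shown).
    import time

    while True:
        bounties = search_bounties()
        for bounty in bounties:
            if is_legitimate(bounty) and is_low_competition(bounty):
                repo = clone_repository(bounty.repo_url)
                fix = generate_fix(repo, bounty.issue)
                if run_tests(fix):
                    submit_pr(fix, bounty)
        time.sleep(1800)  # 30-minute interval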
The critical detail: this is not a one-shot script. It’s a persistent agent that must maintain state across multiple days, handle API failures, and avoid triggering anti-abuse mechanisms.
Authentication Boundaries and Token Management
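GitHub OAuth tokens have scopes. Here repo:read and repo:write are shorthand for read-only and write access; GitHub's classic OAuth scopes are coarser (repo, public_repo), while fine-grained personal access tokens set read or write per permission. A read-only token cannot push to a repository. A write-capable token can, but write operations also face tighter secondary rate limits and stricter abuse detection.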
The agent needs to:
- Fork repositories (requires repo:write)
- Clone forks (requires repo:read)
- Push commits (requires repo:write)
- Open PRs (requires repo:write)
- Comment on issues (requires public_repo or repo:write)
The naive approach is to use a single token with full repo scope. This works until GitHub’s abuse detection flags the account for suspicious activity. The agent submitted 240+ PRs in 96 hours. That’s 2.5 PRs per hour, every hour, for four days straight. No human does that.
The better approach is to use multiple tokens with different scopes and rotate them based on operation type. Read operations use a low-privilege token. Write operations use a higher-privilege token. This reduces the blast radius if one token gets rate-limited or flagged.
Token refresh is another issue. OAuth tokens expire. The agent needs to detect expiration (usually a 401 response) and refresh the token before retrying the operation. If the refresh fails, the agent should pause and alert a human.
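The selection-by-operation and refresh-on-401 flow described above could look roughly like the sketch below. It is not ZKA's actual code: TokenPool, NeedsHuman, the operation names in WRITE_OPS, and the refresh callback are all hypothetical, and the request callable stands in for whatever HTTP client the agent uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative operation names; the source does not list the agent's real ones.
WRITE_OPS = {"fork", "push", "open_pr", "comment"}


class NeedsHuman(Exception):
    """Raised when a token cannot be refreshed and a person must step in."""


@dataclass
class TokenPool:
    read_token: str
    write_token: str
    # Hypothetical callback: takes the old token, returns a fresh one or None.
    refresh: Callable[[str], Optional[str]]

    def token_for(self, operation: str) -> str:
        """Pick the least-privileged token that can perform the operation."""
        return self.write_token if operation in WRITE_OPS else self.read_token

    def call(self, operation: str, request: Callable[[str], int]) -> int:
        """Run request(token) and return its HTTP status; on 401, refresh once and retry."""
        token = self.token_for(operation)
        status = request(token)
        if status != 401:
            return status
        fresh = self.refresh(token)
        if fresh is None:
            raise NeedsHuman(f"refresh failed for {operation!r}; pausing")
        # Store the new token in the slot the old one came from.
        if operation in WRITE_OPS:
            self.write_token = fresh
        else:
            self.read_token = fresh
        return request(fresh)
```

Retrying only once after a refresh keeps a broken refresh path from turning into a retry loop; the NeedsHuman exception is where the pause-and-alert step would hook in.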
Rate Limiting and Backoff Strategies
GitHub’s API has two rate limits:
- Primary rate limit: 5,000 requests per hour for authenticated requests
- Secondary rate limit: Triggered by rapid bursts of requests, even if under the primary limit
The agent hit the secondary rate limit multiple times. The symptom: 403 responses with a Retry-After header. The cause: submitting PRs too quickly in succession.
The fix: pacing plus exponential backoff with jitter. After each API call, the agent waits a random interval between 1 and 5 seconds. If it receives a 403, it waits for the duration specified in Retry-After; if the header is absent, it backs off exponentially, doubling the wait on each retry and adding random jitter.
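    import random
    import time

    class RateLimitError(Exception):
        retry_after = None  # placeholder: seconds from Retry-After, set by the API wrapper

    def api_call_with_backoff(func, max_retries=5):
        for attempt in range(max_retries):
            try:
                return func()
            except RateLimitError as e:
                # Honor Retry-After if present, else exponential backoff with jitter
                wait_time = e.retry_after or (2 ** attempt + random.uniform(0, 1))
                time.sleep(wait_time)
        raise Exception("Max retries exceeded")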
The agent also tracks its own rate limit budget. Before each API call, it checks the X-RateLimit-Remaining header from the previous response. If the remaining budget is below 100, it pauses until the reset time reported in X-RateLimit-Reset.
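A minimal sketch of that budget check, assuming the agent keeps the headers of its most recent API response. The function name seconds_to_pause and its parameters are hypothetical; the header names are GitHub's, and X-RateLimit-Reset is read as epoch seconds.

```python
import time
from typing import Mapping, Optional


def seconds_to_pause(headers: Mapping[str, str], threshold: int = 100,
                     now: Optional[float] = None) -> float:
    """Return how long to sleep, given the last response's rate-limit headers."""
    now = time.time() if now is None else now
    # Header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    # If the header is missing, assume the budget is fine rather than stalling.
    remaining = int(h.get("x-ratelimit-remaining", threshold))
    if remaining >= threshold:
        return 0.0
    reset_at = float(h.get("x-ratelimit-reset", now))  # epoch seconds
    return max(0.0, reset_at - now)
```

Passing now explicitly keeps the function deterministic under test; in the agent loop it would be called as time.sleep(seconds_to_pause(last_headers)).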
State Persistence Across Multi-Day Workflows
A bounty workflow is not atomic. It spans multiple steps:
- Find bounty
- Fork repository
- Clone fork
- Write fix
- Run tests
- Submit PR
- Monitor review feedback
- Respond to comments
- Wait for merge
- Claim bounty
If the agent crashes or restarts between steps, it needs to resume from where it left off. This requires persistent state.
The agent uses a SQLite database to track:
- Bounties in progress
- Repository forks created
- PRs submitted
- Review comments received
- Merge status
Each bounty has a state machine:
discovered→forked→cloned→fixed→tested→submitted→under_review→merged→claimed
The agent queries the database at the start of each cycle to determine which bounties need attention. If a PR is in under_review, it checks for new comments. If a PR is in merged, it attempts to claim the bounty.
| State | Next Action | Failure Mode |
|---|---|---|
| discovered | Fork repository | Repository is archived or private |
| forked | Clone fork | Fork creation failed silently |
| cloned | Generate fix | Codebase analysis timeout |
| fixed | Run tests | Tests fail due to environment mismatch |
| tested | Submit PR | PR already exists for the same issue |
| submitted | Monitor reviews | Review bot requests changes |
| under_review | Respond to comments | Comment parsing fails |
| merged | Claim bounty | Bounty platform API is down |
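The state machine and the start-of-cycle query can be sketched with the standard sqlite3 module. The table layout, the function names, and the extra failed state are assumptions for illustration; the source does not show the agent's schema.

```python
import sqlite3

# Happy path from the article; "failed" is an added, hypothetical terminal state.
STATES = ["discovered", "forked", "cloned", "fixed", "tested",
          "submitted", "under_review", "merged", "claimed"]
NEXT = dict(zip(STATES, STATES[1:]))


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bounties ("
        " id TEXT PRIMARY KEY, state TEXT NOT NULL, last_error TEXT)"
    )
    return conn


def advance(conn: sqlite3.Connection, bounty_id: str) -> str:
    """Move a bounty to its next state and commit, so a restart resumes from here."""
    row = conn.execute(
        "SELECT state FROM bounties WHERE id = ?", (bounty_id,)
    ).fetchone()
    if row is None:
        raise KeyError(bounty_id)
    state = row[0]
    if state not in NEXT:
        raise ValueError(f"{bounty_id} is in terminal state {state!r}")
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE bounties SET state = ? WHERE id = ?",
                     (NEXT[state], bounty_id))
    return NEXT[state]


def needs_attention(conn: sqlite3.Connection) -> list:
    """Bounties the next cycle should pick up: anything not claimed or failed."""
    rows = conn.execute(
        "SELECT id, state FROM bounties"
        " WHERE state NOT IN ('claimed', 'failed') ORDER BY id"
    )
    return rows.fetchall()
```

Committing after every transition is the design choice that matters here: each step's side effect (fork, push, PR) is recorded before the next one starts, so a crash costs at most one repeated step.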
The most common failure mode: tests pass locally but fail in CI. The agent has no visibility into the CI environment. It can only see the final status (pass/fail) and any logs the CI system exposes. If the logs are not machine-readable, the agent cannot diagnose the failure.
Isolation Boundaries: Preventing Accidental Commits
The agent operates on forks, not upstream repositories. This is critical. If the agent accidentally pushes to the upstream repository, it could corrupt the main branch or trigger security alerts.
The isolation strategy:
- Always fork before cloning
- Set the fork as the origin remote
- Set the upstream repository as the upstream remote (read-only)
- Never push to upstream
The agent validates the remote configuration before every push:
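    import subprocess

    # get_remote_url and is_fork are the agent's own helpers (not shown).
    def safe_push(repo_path, branch):
        # List configured remotes and refuse to push if origin is missing
        remotes = subprocess.check_output(
            ["git", "remote", "-v"], cwd=repo_path
        ).decode()
        if "origin" not in remotes:
            raise Exception("No origin remote found")
        # Only push when origin points at a fork, never at upstream
        origin_url = get_remote_url(repo_path, "origin")
        if not is_fork(origin_url):
            raise Exception("Origin is not a fork")
        subprocess.run(["git", "push", "origin", branch], cwd=repo_path, check=True)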
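The source does not show how is_fork is implemented. One plausible offline check, sketched below, is to confirm that the origin URL belongs to the agent's own account. The names parse_github_remote and is_own_fork and the login zka-bot are hypothetical; a stricter check would also ask the GitHub API whether the repository is marked as a fork.

```python
import re
from typing import Optional, Tuple

# Matches https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git)
_GITHUB_URL = re.compile(
    r"^(?:https://github\.com/|git@github\.com:)"
    r"(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def parse_github_remote(url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) for a GitHub remote URL, or None if it is not one."""
    m = _GITHUB_URL.match(url.strip())
    return (m.group("owner"), m.group("repo")) if m else None


def is_own_fork(origin_url: str, agent_login: str) -> bool:
    """True only when origin lives under the agent's account; fails closed otherwise."""
    parsed = parse_github_remote(origin_url)
    return parsed is not None and parsed[0].lower() == agent_login.lower()
```

Anything the pattern does not recognize, including URLs with embedded credentials or non-GitHub hosts, is treated as not a fork, so an unexpected remote blocks the push instead of allowing it.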
Another isolation boundary: the agent runs in a containerized environment with limited file system access. It can only write to a designated workspace directory. This prevents it from accidentally modifying system files or other repositories.
Observability: What You Need to See
The agent logs every API call, every state transition, and every error. The logs are structured JSON, not plain text. This makes them queryable.
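Structured JSON logging of this kind can be done with the standard logging module alone. This is a sketch, not the agent's logger: JsonFormatter, make_logger, the logger name zka, and the convention of passing extra fields under a fields key are all assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Fields passed as extra={"fields": {...}} are merged in (assumed convention).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str = "zka") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A state transition would then be logged as make_logger().info("state_transition", extra={"fields": {"bounty": "b1", "to": "submitted"}}), which yields a single line that log tooling can filter by key.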
Key metrics tracked:
- Bounties discovered per cycle
- PRs submitted per hour
- Merge rate (merged PRs / submitted PRs)
- Average time from submission to merge
- Rate limit budget remaining
- Error rate by type (API errors, test failures, merge conflicts)
The agent also exposes a Prometheus endpoint for real-time monitoring. Grafana dashboards show:
- Active bounties by state
- API call latency
- Rate limit consumption over time
- Error spikes
The most useful alert: “No PRs submitted in the last 2 hours.” This usually means the agent is stuck in a retry loop or the bounty platform API is down.
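In a Prometheus setup this condition would normally live in an alerting rule; the plain-Python sketch below only illustrates the logic, alongside the merge-rate metric listed earlier. The function names and the two-hour default are taken from the text or assumed for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable


def merge_rate(merged: int, submitted: int) -> float:
    """Merged PRs divided by submitted PRs; 0.0 when nothing was submitted."""
    return merged / submitted if submitted else 0.0


def is_stalled(pr_times: Iterable[datetime], now: datetime,
               window: timedelta = timedelta(hours=2)) -> bool:
    """True when no PR was submitted within the window ending at `now`."""
    latest = max(pr_times, default=None)
    return latest is None or now - latest > window
```

With the run's own figures, merge_rate(72, 240) comes out at 0.3.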
What Actually Broke
The agent ran for 96 hours. Here’s what failed:
Test environment mismatches: 40% of PRs that passed local tests failed in CI. The agent could not reproduce the CI environment locally. The fix: run tests in a Docker container that matches the CI environment.
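One way to approach that fix is to build the test invocation as a docker run command that mounts the checkout into a container. The sketch below only constructs the argument list; the function name, the /workspace mount point, and the example image are assumptions, and the right image is whatever the target repository's CI actually uses.

```python
from pathlib import Path
from typing import List, Sequence


def docker_test_command(repo_path: str, image: str,
                        test_cmd: Sequence[str]) -> List[str]:
    """Build a `docker run` argv that runs test_cmd inside the mounted checkout."""
    workdir = "/workspace"  # assumed mount point inside the container
    return [
        "docker", "run", "--rm",                        # remove container on exit
        "-v", f"{Path(repo_path).resolve()}:{workdir}",  # mount the checkout
        "-w", workdir,                                   # run from the repo root
        image, *test_cmd,
    ]
```

The result can be passed to subprocess.run(cmd, check=False), for example docker_test_command("/tmp/repo", "python:3.12-slim", ["pytest", "-q"]), with the exit code deciding whether the bounty moves from fixed to tested.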
Review bot parsing: CodeRabbit and Cubic post structured comments with suggested changes. The agent could not parse these comments reliably. It treated them as human feedback and generated nonsensical responses. The fix: add explicit parsers for known review bots.
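A first step toward that fix is routing each comment by author before any reply is generated. The sketch assumes comments arrive as GitHub REST API objects, where bot accounts report a user type of Bot and a login ending in [bot]. The function names and the parsers registry are hypothetical, and the per-bot parsers themselves are left to the caller.

```python
from typing import Callable, Dict, Mapping

Parser = Callable[[str], dict]


def is_bot_comment(comment: Mapping) -> bool:
    """True when the comment's author looks like a GitHub App bot account."""
    user = comment.get("user") or {}
    return user.get("type") == "Bot" or str(user.get("login", "")).endswith("[bot]")


def route_comment(comment: Mapping, parsers: Dict[str, Parser]) -> dict:
    """Send bot comments to a bot-specific parser; never treat them as human feedback."""
    login = (comment.get("user") or {}).get("login", "")
    body = comment.get("body", "")
    if not is_bot_comment(comment):
        return {"kind": "human", "body": body}
    parser = parsers.get(login)
    if parser is None:
        # Unknown bot: record it and skip, rather than generating a reply.
        return {"kind": "unknown_bot", "login": login}
    return {"kind": "bot", "login": login, **parser(body)}
```

Returning unknown_bot instead of falling through to the human path is the point of the design: an unparsed bot comment produces a log entry, not a nonsensical response.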
Merge conflicts: When multiple agents (or humans) work on the same repository, merge conflicts are inevitable. The agent could not resolve conflicts automatically. It marked the PR as failed and moved on. The fix: implement a conflict resolution strategy (rebase on upstream, regenerate fix, resubmit).
Bounty platform API instability: The bounty platform API went down twice during the 96-hour run. The agent could not claim bounties during these outages. The fix: implement a retry queue with exponential backoff for bounty claims.
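A retry queue of that shape can be sketched with a heap ordered by next-attempt time. ClaimRetryQueue, its base and cap delays, and the claim callable are hypothetical; the bounty platform's real API is not shown in the source.

```python
import heapq
import random
from typing import Callable, List, Tuple


class ClaimRetryQueue:
    """Retries failed bounty claims with capped exponential backoff and jitter."""

    def __init__(self, base: float = 60.0, cap: float = 3600.0,
                 rng: Callable[[], float] = random.random):
        self._heap: List[Tuple[float, str, int]] = []  # (due_time, bounty_id, attempt)
        self.base, self.cap, self._rng = base, cap, rng

    def add(self, bounty_id: str, now: float, attempt: int = 0) -> None:
        # First attempt is due immediately; later ones wait base * 2^(attempt-1).
        delay = 0.0 if attempt == 0 else min(self.cap, self.base * 2 ** (attempt - 1))
        delay += delay * 0.1 * self._rng()  # up to 10% jitter
        heapq.heappush(self._heap, (now + delay, bounty_id, attempt))

    def run_due(self, now: float, claim: Callable[[str], bool]) -> List[str]:
        """Try every claim that is due; requeue failures with a longer delay."""
        claimed = []
        while self._heap and self._heap[0][0] <= now:
            _, bounty_id, attempt = heapq.heappop(self._heap)
            if claim(bounty_id):
                claimed.append(bounty_id)
            else:
                self.add(bounty_id, now, attempt + 1)
        return claimed

    def __len__(self) -> int:
        return len(self._heap)
```

Because a failed claim is always requeued with a positive delay, run_due cannot spin on the same item within one call, and an outage only postpones claims instead of dropping them. To survive restarts, the queue contents would need to be stored alongside the bounty state.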
Token expiration: One of the OAuth tokens expired mid-run. The agent detected the 401 response but failed to refresh the token because the refresh token was also expired. The fix: monitor token expiration proactively and refresh before expiration.
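Proactive monitoring can be as small as a check run at the start of every cycle. The sketch assumes the agent records expiry times for both the access token and the refresh token; token_action, its return values, and the 30-minute margin are hypothetical.

```python
from datetime import datetime, timedelta


def token_action(access_expires_at: datetime, refresh_expires_at: datetime,
                 now: datetime, margin: timedelta = timedelta(minutes=30)) -> str:
    """Decide what to do about a token pair before any request is made."""
    if refresh_expires_at <= now:
        # The failure seen in the run: nothing left to refresh with.
        return "alert_human"
    if min(access_expires_at, refresh_expires_at) - now <= margin:
        # Either token is close to expiry, so refresh while it is still possible.
        return "refresh_now"
    return "ok"
```

Checking the refresh token's expiry as well as the access token's is what separates this from the reactive 401 handling, which only notices the problem once both have lapsed.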
Deployment Shape
The agent runs on a single EC2 instance (t3.medium). It does not need horizontal scaling because the bottleneck is API rate limits, not compute. Adding more instances would just hit rate limits faster.
The deployment includes:
- Cron daemon for scheduling
- SQLite database for state persistence
- Docker for test isolation
- Prometheus for metrics
- Grafana for dashboards
- CloudWatch for logs
The agent is stateful, so it does not map cleanly onto a stateless serverless function. It needs persistent storage and long-running processes.
Likely Failure Modes in Production
If you run a similar agent, expect these failures:
- Rate limit exhaustion: GitHub’s secondary rate limit is unpredictable. You will hit it.
- Test flakiness: Tests that pass locally will fail in CI. You need environment parity.
- Review bot incompatibility: Every repository uses different review bots. You need custom parsers.
- Merge conflicts: Multiple agents competing for the same bounties will create conflicts.
- Token expiration: OAuth tokens expire. You need proactive refresh.
- Bounty platform downtime: Third-party APIs go down. You need retry queues.
- Repository access changes: Repositories can become private or archived mid-workflow. You need to handle 404s gracefully.
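Several of the failures above surface as HTTP status codes, so one way to handle them uniformly is a single mapping from status to next action. The Action names and the grouping below are assumptions for illustration, not the agent's actual policy.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    REFRESH_TOKEN = "refresh_token"
    BACK_OFF = "back_off"
    ABANDON = "abandon"
    RETRY_LATER = "retry_later"


def action_for_status(status: int) -> Action:
    """Map an API response status to what the workflow should do next."""
    if 200 <= status < 300:
        return Action.PROCEED
    if status == 401:
        return Action.REFRESH_TOKEN   # expired or revoked token
    if status in (403, 429):
        return Action.BACK_OFF        # rate limited; a 403 can also mean forbidden
    if status in (404, 410):
        return Action.ABANDON         # repository gone, private, or removed
    if status >= 500:
        return Action.RETRY_LATER     # upstream outage; queue for retry
    return Action.ABANDON             # unexpected client error: stop rather than loop
```

Treating 404 as a terminal outcome for that bounty, rather than as an error to retry, is what handling it gracefully amounts to: the bounty is marked abandoned and the cycle moves on.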
Technical Verdict
Use this approach when:
- You have a high-volume, repetitive workflow (e.g., bounty hunting, issue triage, dependency updates)
- You can tolerate a low success rate (this run merged 72 of 240+ PRs, roughly 30%)
- You have robust observability and alerting
- You can handle OAuth token management and rate limiting
- You are willing to invest in test environment parity
Avoid this approach when:
- You need 100% reliability
- You cannot afford to hit API rate limits
- You do not have structured logs and metrics
- You are working with repositories that have complex CI/CD pipelines
- You cannot handle merge conflicts programmatically
The agent earned $500-800 in 96 hours, but it also generated noise (160+ failed PRs). If you run this in production, you need to balance throughput with quality. The best strategy: start with a whitelist of known-good repositories, monitor merge rates closely, and expand gradually.