When an autonomous agent deletes a folder, modifies a trading position, or calls an API you didn’t expect, you need the same forensic tools you use for code: blame, bisect, and rewind. Re_gent applies Git’s commit model to agent execution history, tracking tool calls, LLM responses, and state mutations as versioned snapshots.
The project addresses a gap that becomes obvious once you move from single-shot LLM calls to multi-step workflows. You can log everything, but logs don’t give you diffs. You can checkpoint state, but checkpoints don’t tell you why a decision was made. Re_gent treats each agent action as a commit, complete with parent pointers, metadata, and a snapshot of the world before and after.
The Missing Primitive
Standard agent observability gives you traces and spans. You see the sequence of tool calls, the latency, the token count. What you don’t get is a structured history that answers:
- When did the agent delete this file, and what was its reasoning?
- Which tool call introduced the bug I’m seeing now?
- Can I rewind to before the agent compacted its context window and lost critical state?
Re_gent’s commit model captures:
- Tool invocations: Function name, arguments, return value.
- LLM responses: The raw completion, the parsed action, and the reasoning trace, if available.
- State deltas: File system changes, database writes, API side effects.
- Parent commits: The execution graph, including branches when the agent explores multiple paths.
Each commit is content-addressable. You can diff two commits to see what changed. You can bisect to find the commit that introduced a failure. You can blame a specific state mutation and trace it back to the LLM prompt that triggered it.
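To make "content-addressable" concrete, here is a minimal sketch in Python. The source doesn't describe Re_gent's on-disk format, so the `Commit` class and `diff_states` helper below are hypothetical. A commit's ID is the SHA-256 of its canonical JSON. Because the parent's ID is part of that JSON, changing any ancestor changes every descendant's ID, as in Git.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record: one agent action plus the state after it."""
    parent: Optional[str]  # ID of the parent commit (None for the root)
    message: str
    tool: str
    args: dict
    result: Any
    state: dict = field(default_factory=dict)  # snapshot after the action

    @property
    def id(self) -> str:
        # Sorted keys give a canonical serialization, so identical content
        # always hashes to the same ID.
        payload = json.dumps(
            {
                "parent": self.parent,
                "message": self.message,
                "tool": self.tool,
                "args": self.args,
                "result": self.result,
                "state": self.state,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_states(old: dict, new: dict) -> dict:
    """List keys added, removed, or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


root = Commit(None, "init", "noop", {}, None, {"/tmp/cache": "present"})
child = Commit(root.id, "Clean up temp files", "delete_folder",
               {"path": "/tmp/cache"}, {"deleted": "/tmp/cache"}, {})
print(child.parent == root.id)  # True
print(diff_states(root.state, child.state))
```

Diffing two commits then reduces to comparing their snapshots. Here the diff reports `/tmp/cache` as removed.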
Architecture
Re_gent sits between your orchestration layer and the agent runtime. It intercepts tool calls and wraps them in a commit transaction.
```python
import shutil

from regent import ReGent

agent = ReGent(repo_path="./agent_history")


@agent.tool
def delete_folder(path: str) -> dict:
    """Remove a directory and its contents."""
    shutil.rmtree(path)  # returns None; raises OSError on failure
    return {"deleted": path, "status": "success"}


# Wrap the tool call in an explicit commit with a message and author
with agent.commit(message="Clean up temp files", author="agent-v2"):
    delete_folder("/tmp/cache")
```
The commit includes:
- Timestamp and author (agent version, model ID).
- The tool call signature and return value.
- A snapshot of relevant state (file hashes, database checksums).
- The LLM’s reasoning if you’re using chain-of-thought or ReAct.
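The source doesn't show Re_gent's internals. The following self-contained sketch, built on our own assumptions, shows how a `@tool` decorator and a `commit()` context manager could cooperate. The decorator records each call's name, arguments, and return value, and the context manager bundles the calls made inside it into one commit record. `MiniRecorder` and its methods are hypothetical names, not Re_gent's API.

```python
import time
from contextlib import contextmanager
from functools import wraps


class MiniRecorder:
    """Hypothetical recorder that groups intercepted tool calls into commits."""

    def __init__(self):
        self.commits = []     # finished commits, oldest first
        self._pending = None  # calls collected while a commit is open

    def tool(self, fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            # Calls made outside a commit block are not recorded in this sketch.
            if self._pending is not None:
                self._pending.append(
                    {"tool": fn.__name__, "args": args, "kwargs": kwargs, "result": result}
                )
            return result
        return wrapper

    @contextmanager
    def commit(self, message, author):
        self._pending = []
        try:
            yield
        finally:
            # The commit is written even if a tool raised, keeping partial work auditable.
            self.commits.append(
                {"message": message, "author": author,
                 "timestamp": time.time(), "calls": self._pending}
            )
            self._pending = None


rec = MiniRecorder()


@rec.tool
def delete_folder(path: str) -> dict:
    # Stand-in for shutil.rmtree so the example has no side effects.
    return {"deleted": path, "status": "success"}


with rec.commit(message="Clean up temp files", author="agent-v2"):
    delete_folder("/tmp/cache")

print(rec.commits[0]["calls"][0]["tool"])  # delete_folder
```

A real implementation would also attach state snapshots and the model's reasoning to each commit. This sketch captures only the tool-call half of the list above.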
Re_gent stores commits in a local SQLite database by default, with optional Postgres or S3 backends. The storage model is append-only, and snapshots are deduplicated by content hash, so a repeated state is stored only once.
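Here is a minimal sketch of that storage model using Python's built-in `sqlite3` module. Snapshots go in a table keyed by their SHA-256 hash, so identical states are stored once, and commits only reference a snapshot hash. The schema and the `SnapshotStore` class are our own assumptions. The source doesn't publish Re_gent's schema.

```python
import hashlib
import json
import sqlite3


class SnapshotStore:
    """Hypothetical append-only store with content-addressed snapshot dedup."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(
            """
            CREATE TABLE IF NOT EXISTS snapshots (
                hash TEXT PRIMARY KEY,
                body TEXT NOT NULL
            );
            CREATE TABLE IF NOT EXISTS commits (
                seq INTEGER PRIMARY KEY AUTOINCREMENT,
                message TEXT NOT NULL,
                snapshot_hash TEXT NOT NULL REFERENCES snapshots(hash)
            );
            """
        )

    def commit(self, message: str, state: dict) -> str:
        body = json.dumps(state, sort_keys=True)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        with self.db:  # one transaction per commit
            # A snapshot whose hash already exists is silently skipped.
            self.db.execute(
                "INSERT OR IGNORE INTO snapshots (hash, body) VALUES (?, ?)",
                (digest, body),
            )
            # Commits are only ever inserted, never updated (append-only).
            self.db.execute(
                "INSERT INTO commits (message, snapshot_hash) VALUES (?, ?)",
                (message, digest),
            )
        return digest

    def counts(self):
        n_commits = self.db.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
        n_snaps = self.db.execute("SELECT COUNT(*) FROM snapshots").fetchone()[0]
        return n_commits, n_snaps


store = SnapshotStore()
store.commit("check balance", {"cash": 1000})
store.commit("re-check balance", {"cash": 1000})  # same state, stored once
store.commit("buy AAPL", {"cash": 500, "AAPL": 2})
print(store.counts())  # (3, 2)
```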
Bisect for Non-Deterministic Actions
Git bisect works because code is deterministic: check out the same commit, run the same test, and you get the same result. Agents are not deterministic. Replay a commit with the same prompt and you may get a different tool call due to temperature, sampling, or model updates.
Re_gent handles this by storing the full execution trace, not just the prompt. When you bisect, you’re not re-running the agent. You’re replaying the recorded tool calls and checking whether the outcome matches your test condition.
```bash
regent bisect start
regent bisect bad HEAD
regent bisect good a3f8c2
# Re_gent replays commits without re-invoking the LLM
# You provide a test script that checks the resulting state
regent bisect run ./test_portfolio_balance.sh
```
The test script examines the state snapshot at each commit. If the portfolio balance is negative, the test fails. Re_gent walks the commit graph using binary search until it finds the first bad commit.
This works because Re_gent captures the side effects, not just the intent. You’re bisecting the actual state mutations, not the model’s reasoning process.
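Because the snapshots are already recorded, bisect reduces to a binary search over a linear range of commits, with a predicate evaluated on each snapshot. Nothing calls the model. The sketch below assumes a list of snapshots ordered oldest to newest, a known-good first entry, a known-bad last entry, and a single good-to-bad transition (the same assumption `git bisect` makes). `first_bad` and `balance_ok` are illustrative names.

```python
from typing import Callable, Sequence


def first_bad(snapshots: Sequence[dict], is_good: Callable[[dict], bool]) -> int:
    """Return the index of the first snapshot for which is_good() is False.

    Assumes snapshots[0] is good, snapshots[-1] is bad, and that once the
    state goes bad it stays bad.
    """
    lo, hi = 0, len(snapshots) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(snapshots[mid]):
            lo = mid
        else:
            hi = mid
    return hi


def balance_ok(state: dict) -> bool:
    # Mirrors the test condition described in the text: no negative balance.
    return state["balance"] >= 0


history = [{"balance": b} for b in (1000, 800, 650, 300, -50, -400, -420)]
print(first_bad(history, balance_ok))  # 4
```

Each step halves the range, so n commits need about log2(n) predicate checks. That matters when each check loads a large snapshot.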
Blame and Audit Trails
Blame maps a piece of state back to the commit that created it. In a trading agent, you might ask: which decision led to this open position?
```bash
regent blame positions.json --line 42
```
Output:
```text
commit 7a3f9e1c
Author: trading-agent-v1.2 (gpt-4-0125-preview)
Date: 2026-05-15 14:32:18 UTC
Message: Open long position on AAPL based on momentum signal
Tool: execute_trade(symbol="AAPL", side="buy", quantity=100)
Reasoning: "RSI below 30, MACD crossover detected, entry at $178.50"
```
This is critical for regulated environments. You need to show an auditor not just what the agent did, but why it did it and when. Re_gent’s commit log becomes your audit trail.
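Line-level blame can be computed from the recorded snapshots alone. The sketch below assumes each commit stores the full contents of `positions.json` as a list of lines. It walks commits oldest to newest and uses `difflib.SequenceMatcher` to carry each unchanged line's attribution forward, crediting inserted or replaced lines to the current commit. Real `git blame` is more elaborate (it can follow moved and copied lines), and `blame` here is a hypothetical helper, not Re_gent's implementation.

```python
from difflib import SequenceMatcher


def blame(history):
    """history: list of (commit_id, lines) pairs, oldest first.

    Returns (commit_id, line) for each line of the newest version, where
    commit_id is the commit that last introduced or changed that line.
    """
    prev_lines, owners = [], []
    for commit_id, lines in history:
        new_owners = [commit_id] * len(lines)  # default: changed in this commit
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the commit that originally wrote them.
                new_owners[j1:j2] = owners[i1:i2]
        prev_lines, owners = lines, new_owners
    return list(zip(owners, prev_lines))


history = [
    ("a3f8c2", ['{', '  "MSFT": 50', '}']),
    ("7a3f9e1c", ['{', '  "MSFT": 50,', '  "AAPL": 100', '}']),
]
for commit_id, line in blame(history):
    print(commit_id, line)
```

The `MSFT` line is attributed to the second commit because adding the trailing comma changed it. `git blame` behaves the same way.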
Rewind After Context Compaction
Long-running agents compact their context window to stay under token limits. They summarize old messages, drop tool call details, and lose the ability to explain past decisions.
Re_gent decouples execution history from the agent’s working memory. Even after the agent compacts, you can rewind the commit history and inspect the full trace.
```bash
regent checkout 3d8a7f2
regent show --full-trace
```
You get the original prompt, the tool calls, the LLM’s reasoning, and the state before and after. The agent’s current context may have forgotten this, but the VCS hasn’t.
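The key design point is that the execution log lives outside the agent's context window. The following minimal sketch of that separation uses hypothetical names. Once the working context exceeds a budget, it is compacted into a placeholder summary. Every entry is also appended to a log that compaction never touches, so `checkout()` can still return the original record.

```python
class AgentSession:
    """Hypothetical session: a bounded working context plus an append-only log."""

    def __init__(self, max_context: int = 3):
        self.max_context = max_context
        self.context = []  # what the model sees; gets compacted
        self.log = []      # full execution history; never rewritten

    def record(self, entry: dict) -> int:
        self.log.append(entry)
        self.context.append(entry)
        if len(self.context) > self.max_context:
            self._compact()
        return len(self.log) - 1  # position of the entry in the log

    def _compact(self) -> None:
        # Collapse everything except the newest entry into a placeholder summary,
        # discarding tool-call details and reasoning from the working context.
        self.context = [{"summary": "earlier steps compacted"}, self.context[-1]]

    def checkout(self, index: int) -> dict:
        # Rewinding reads from the log, which compaction never touches.
        return self.log[index]


session = AgentSession(max_context=3)
first = session.record({"tool": "delete_folder", "args": {"path": "/tmp/cache"},
                        "reasoning": "cache is stale"})
for _ in range(4):
    session.record({"tool": "fetch_quote", "args": {"symbol": "AAPL"}})

print(any("reasoning" in e for e in session.context))  # False
print(session.checkout(first)["reasoning"])            # cache is stale
```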
Storage Trade-offs
Storing every tool call and state snapshot is expensive. Re_gent uses several strategies to keep costs manageable:
| Strategy | Benefit | Cost |
|---|---|---|
| Content-addressable snapshots | Deduplicates unchanged state | Hash computation overhead |
| Incremental deltas | Stores only what changed | Requires diffing logic for each state type |
| Lazy snapshot loading | Fetches snapshots on demand | Slower bisect and blame operations |
| Commit pruning | Drops old commits after a retention window | Loses long-term audit trail |
For high-frequency trading agents, you might commit only on significant actions (trades, risk limit breaches) rather than every market data fetch. For code generation agents, you might snapshot file hashes instead of full file contents.
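A commit policy like that can be a simple predicate checked before recording. In the sketch below, the set of significant tool names, the `risk_limit_breached` result field, and the `should_commit` helper are illustrative assumptions, not Re_gent configuration.

```python
# Hypothetical policy: tools whose calls always warrant a commit.
SIGNIFICANT_TOOLS = {"execute_trade", "cancel_order"}


def should_commit(tool_name: str, result: dict) -> bool:
    """Decide whether a tool call is worth recording as a commit."""
    if tool_name in SIGNIFICANT_TOOLS:
        return True
    # A risk-limit breach is significant no matter which tool surfaced it.
    if result.get("risk_limit_breached", False):
        return True
    # Routine reads such as market data fetches are not committed.
    return False


calls = [
    ("fetch_market_data", {"symbol": "AAPL", "price": 178.50}),
    ("execute_trade", {"symbol": "AAPL", "side": "buy", "quantity": 100}),
    ("fetch_market_data", {"symbol": "AAPL", "risk_limit_breached": True}),
]
committed = [name for name, result in calls if should_commit(name, result)]
print(committed)  # ['execute_trade', 'fetch_market_data']
```

The trade-off is that skipped calls cannot be bisected or blamed later, so the predicate encodes which actions you are willing to leave out of the audit trail.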
The default configuration stores full snapshots for the last 1,000 commits and switches to delta-only storage beyond that. You can tune this based on your compliance requirements and disk budget.
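The sketch below shows one way to implement that tiering, under our own assumptions. Each commit stores a delta (keys set and keys removed). Commits inside the newest `full_window` would also keep a full snapshot, and older states are rebuilt by replaying deltas from an empty root. Only the 1,000-commit default comes from the text. The delta format and function names are hypothetical.

```python
FULL_WINDOW = 1000  # from the text: full snapshots for the newest 1,000 commits


def storage_mode(index: int, head: int, full_window: int = FULL_WINDOW) -> str:
    """Return 'full' for commits inside the retention window, else 'delta'."""
    return "full" if head - index < full_window else "delta"


def make_delta(old: dict, new: dict) -> dict:
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "removed": [k for k in old if k not in new],
    }


def apply_delta(state: dict, delta: dict) -> dict:
    new = dict(state)
    new.update(delta["set"])
    for k in delta["removed"]:
        new.pop(k, None)
    return new


def rebuild(deltas: list, index: int) -> dict:
    """Reconstruct the state at commit `index` by replaying deltas from an empty root."""
    state = {}
    for delta in deltas[: index + 1]:
        state = apply_delta(state, delta)
    return state


states = [{"cash": 1000}, {"cash": 500, "AAPL": 2}, {"cash": 900}]
deltas = [make_delta(prev, cur) for prev, cur in zip([{}] + states[:-1], states)]
print(rebuild(deltas, 1))          # {'cash': 500, 'AAPL': 2}
print(storage_mode(0, head=2500))  # delta
```

Replaying from the root costs time linear in history length. A production design would more likely start from the nearest retained full snapshot, which is one reason delta-only history makes bisect and blame slower on old commits.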
Failure Modes
Re_gent introduces new failure surfaces:
- Commit overhead: Wrapping every tool call in a transaction adds latency. For latency-sensitive agents, you may need to batch commits or commit asynchronously.
- Storage exhaustion: Long-running agents with high action frequency can fill disk. You need monitoring and pruning policies.
- Replay divergence: If the environment changes (API returns different data, file system state differs), replaying a commit may not reproduce the same outcome. Re_gent can detect this but can’t prevent it.
- Concurrency conflicts: If multiple agents share the same repo, you need locking or conflict resolution. Re_gent currently assumes single-writer.
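Detecting replay divergence amounts to comparing what a re-executed call returns against what was recorded. The sketch below assumes each commit stores a hash of the tool's result. It re-runs the tool with the recorded arguments and flags a mismatch. `result_digest` and `check_replay` are hypothetical helpers. As the text notes, this only detects divergence. It can't make the environment match.

```python
import hashlib
import json


def result_digest(result) -> str:
    # Canonical JSON so that dict key order doesn't affect the hash.
    return hashlib.sha256(json.dumps(result, sort_keys=True).encode("utf-8")).hexdigest()


def check_replay(recorded: dict, tool) -> dict:
    """Re-run `tool` with the recorded arguments and compare result hashes."""
    replayed = tool(**recorded["args"])
    digest = result_digest(replayed)
    return {
        "diverged": digest != recorded["result_digest"],
        "recorded": recorded["result_digest"],
        "replayed": digest,
    }


prices = {"AAPL": 178.50}


def get_quote(symbol):
    return {"symbol": symbol, "price": prices[symbol]}


recorded = {"args": {"symbol": "AAPL"},
            "result_digest": result_digest(get_quote("AAPL"))}

print(check_replay(recorded, get_quote)["diverged"])  # False
prices["AAPL"] = 181.20  # the environment changed after the commit was recorded
print(check_replay(recorded, get_quote)["diverged"])  # True
```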
The biggest risk is treating the commit history as ground truth when the agent’s reasoning was flawed. Re_gent shows you what the agent did and why it thought it was correct. It doesn’t tell you whether the reasoning was sound.
Technical Verdict
Use Re_gent when:
- You need audit trails for regulated environments (finance, healthcare, legal).
- Your agents run long enough that you lose track of why they made specific decisions.
- You’re debugging multi-step workflows and need to isolate which action introduced a bug.
- You want to compare agent behavior across model versions or prompt changes.
Avoid Re_gent when:
- Your agents are stateless or single-shot (just use structured logging).
- Latency is critical and you can’t afford commit overhead.
- You’re prototyping and don’t need historical forensics yet.
- Your storage budget is tight and you can’t deduplicate snapshots effectively.
Re_gent is infrastructure for production agents that modify state and need to explain themselves later. It’s not a replacement for observability or logging. It’s a layer on top that gives you the same forensic tools you expect from version control.