A recent Hacker News thread about career transitions surfaced a pattern: engineers planning moves into automated crypto trading as AI disrupts traditional dev roles. The conversation exposed a gap between “I’ll build a trading bot” aspirations and the infrastructure needed to prevent catastrophic losses.
Building a trading agent is not hard. Building one that survives exchange outages, rate limits, and your own bugs without liquidating your account is a different problem. Here’s what the plumbing actually looks like.
Exchange Connectivity: Rate Limits and WebSocket Reconnection
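Every exchange enforces API rate limits, and the exact numbers vary by endpoint and change over time. Binance has metered its spot REST API by request weight rather than raw request count (historically 1200 weight per minute per IP), and Coinbase has capped some endpoints at around 10 requests per second. Check the current documentation instead of hard-coding a number. Your agent needs to track its own request budget, or the exchange will start rejecting requests and may temporarily ban your IP.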
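One way to track a request budget locally is a sliding-window counter. The sketch below is a minimal illustration, not any exchange's official algorithm: the class name `RequestBudget`, the `weight` parameter, and the injectable `clock` are all assumptions made for this example.

```python
import time
from collections import deque


class RequestBudget:
    """Sliding-window budget: at most `limit` weight units per `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock    # injectable so the class can be tested without sleeping
        self.spent = deque()  # (timestamp, weight) pairs still inside the window

    def _expire(self, now):
        # Drop entries that have aged out of the window
        while self.spent and now - self.spent[0][0] >= self.window:
            self.spent.popleft()

    def used(self):
        self._expire(self.clock())
        return sum(weight for _, weight in self.spent)

    def try_acquire(self, weight=1):
        """Record the request and return True if it fits, otherwise return False."""
        now = self.clock()
        self._expire(now)
        if sum(w for _, w in self.spent) + weight > self.limit:
            return False
        self.spent.append((now, weight))
        return True

    def seconds_until_available(self, weight=1):
        """Time to wait until `weight` units fit (assumes weight <= limit)."""
        now = self.clock()
        self._expire(now)
        used = sum(w for _, w in self.spent)
        wait = 0.0
        for ts, w in self.spent:
            if used + weight <= self.limit:
                break
            used -= w
            wait = ts + self.window - now
        return wait
```

Call `try_acquire()` before every REST request and sleep for `seconds_until_available()` when it returns False. Leaving headroom below the published limit is a reasonable precaution, since the exchange's own accounting may not match yours exactly.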
REST vs. WebSocket trade-offs:
- REST: Stateless, easier to retry, but burns rate limit budget on polling
- WebSocket: Real-time updates, but requires reconnection logic when the socket drops
WebSocket reconnection is where most bots fail. You need:
- Exponential backoff with jitter to avoid thundering herd
- Snapshot reconciliation (did you miss any trades during the disconnect?)
- Duplicate message detection (some exchanges replay recent messages on reconnect, and your own snapshot may overlap with the stream)
```python
import asyncio
import random

import aiohttp
import websockets


class ExchangeWebSocket:
    def __init__(self, url, snapshot_endpoint):
        self.url = url
        self.snapshot_endpoint = snapshot_endpoint
        self.ws = None
        self.last_sequence = None
        self.reconnect_delay = 1.0

    async def reconnect(self):
        # Full jitter: sleep a random time up to the current backoff ceiling
        await asyncio.sleep(random.uniform(0, self.reconnect_delay))
        # Double the ceiling now; it stays doubled if an attempt below raises
        self.reconnect_delay = min(self.reconnect_delay * 2, 60)
        # Fetch snapshot to reconcile missed messages
        snapshot = await self.fetch_snapshot()
        self.last_sequence = snapshot['sequence']
        # Reconnect WebSocket
        self.ws = await websockets.connect(self.url)
        self.reconnect_delay = 1.0  # Reset on success

    async def fetch_snapshot(self):
        # Use REST API to get current order book state
        async with aiohttp.ClientSession() as session:
            async with session.get(self.snapshot_endpoint) as resp:
                resp.raise_for_status()
                return await resp.json()
```
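Duplicate detection and gap detection can both hang off the sequence number. The sketch below assumes the feed numbers its messages with integers that increase by exactly 1, which is true of some exchanges but not all; `SequenceTracker` and its return labels are names invented for this example.

```python
class SequenceTracker:
    """Classify messages by sequence number (assumed to increase by exactly 1)."""

    def __init__(self, last_sequence=None):
        # Seed with the sequence number from the REST snapshot, if there is one
        self.last_sequence = last_sequence

    def classify(self, sequence):
        if self.last_sequence is None:
            self.last_sequence = sequence
            return "apply"
        if sequence <= self.last_sequence:
            # Already covered by the snapshot or replayed on reconnect
            return "duplicate"
        if sequence > self.last_sequence + 1:
            # At least one message was missed: local state is no longer trustworthy
            return "gap"
        self.last_sequence = sequence
        return "apply"
```

On `"gap"` the tracker deliberately does not advance, so the caller can fetch a fresh snapshot and re-seed it instead of applying updates on top of a stale book.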
Position Sizing Guardrails: Preventing Runaway Orders
An autonomous agent without position limits will eventually place an order that exceeds your account balance or your risk tolerance. This is not theoretical: in August 2012, a faulty software deployment at Knight Capital sent runaway orders to the market and cost the firm roughly $440 million in under an hour.
Guardrail layers:
- Pre-execution checks: Validate order size against account balance and risk parameters before submitting
- Exchange-side limits: Set max order size and daily loss limits in exchange API settings (not all exchanges support this)
- Circuit breakers: Halt trading if drawdown exceeds threshold or if order rejection rate spikes
| Guardrail Type | Where It Lives | Failure Mode |
|---|---|---|
| Pre-execution validation | Agent code | Bug in validation logic |
| Exchange API limits | Exchange settings | Not all exchanges support; requires manual configuration |
| Circuit breaker | Monitoring layer | Reacts after damage is done |
| Kill switch | External service | Requires separate infrastructure; adds latency |
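The pre-execution layer can be sketched as a pure function that runs before every order submission. The limit names in `RiskLimits` and the specific checks are illustrative assumptions, not a complete risk model.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RiskLimits:
    max_order_notional: Decimal     # largest single order, in quote currency
    max_position_notional: Decimal  # largest absolute position after the order
    max_balance_fraction: Decimal   # share of free balance one buy may consume


def check_order(side, size, price, position, free_balance, limits):
    """Return a list of violated limits; an empty list means the order may be sent."""
    if side not in ("buy", "sell"):
        return ["unknown side"]
    # Go through str() so floats like 0.1 convert to exact decimals
    size, price = Decimal(str(size)), Decimal(str(price))
    position, free_balance = Decimal(str(position)), Decimal(str(free_balance))
    if size <= 0 or price <= 0:
        return ["size and price must be positive"]

    violations = []
    notional = size * price
    if notional > limits.max_order_notional:
        violations.append("order notional above limit")
    if side == "buy" and notional > free_balance * limits.max_balance_fraction:
        violations.append("order uses too much free balance")
    signed_size = size if side == "buy" else -size
    if abs(position + signed_size) * price > limits.max_position_notional:
        violations.append("resulting position above limit")
    return violations
```

Returning the full list of violations, not just the first failure, makes the rejection log more useful when you later ask why the agent tried to place the order at all.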
The problem: guardrails in your agent code can be bypassed by bugs in your agent code. You need an external kill switch that monitors account state and can halt trading independent of the agent’s decision loop.
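A minimal sketch of the decision logic such a monitor might run is below. It assumes the monitor lives in a separate process, polls account equity with its own API key, and cancels orders or revokes the agent's key when tripped; none of those exchange calls are shown, and the class name and thresholds are placeholders.

```python
from collections import deque


class CircuitBreaker:
    """Trips on drawdown from peak equity or on a high recent rejection rate."""

    def __init__(self, max_drawdown=0.10, max_reject_rate=0.5, window=20):
        self.max_drawdown = max_drawdown
        self.max_reject_rate = max_reject_rate
        self.outcomes = deque(maxlen=window)  # True means the order was rejected
        self.peak_equity = None
        self.tripped_reason = None

    def record_equity(self, equity):
        if self.peak_equity is None or equity > self.peak_equity:
            self.peak_equity = equity
        if self.peak_equity > 0:
            drawdown = (self.peak_equity - equity) / self.peak_equity
            if drawdown >= self.max_drawdown:
                self._trip(f"drawdown {drawdown:.1%}")

    def record_order_result(self, rejected):
        self.outcomes.append(bool(rejected))
        # Only judge the rate once the window is full, to avoid tripping on one reject
        if len(self.outcomes) == self.outcomes.maxlen:
            rate = sum(self.outcomes) / len(self.outcomes)
            if rate >= self.max_reject_rate:
                self._trip(f"rejection rate {rate:.0%}")

    def _trip(self, reason):
        # Latch: once tripped, stay tripped until a human resets it
        if self.tripped_reason is None:
            self.tripped_reason = reason

    @property
    def trading_allowed(self):
        return self.tripped_reason is None
```

The breaker latches on purpose. A recovery in equity after a trip does not re-enable trading, because the condition that caused the drawdown may still be present.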
State Management Across Exchange Outages
Your agent needs to track:
- Open positions (long/short, size, entry price)
- Pending orders (limit orders waiting to fill)
- Recent fills (to reconcile against expected state)
- Market data (order book, recent trades)
When the exchange goes down, your agent loses visibility into all of this. When it comes back up, you need to reconcile:
- Did pending orders fill during the outage?
- Did stop-loss orders trigger?
- Is your position size what you think it is?
State reconciliation pattern:
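```python
import logging

logger = logging.getLogger(__name__)


# Method on the agent class; self.exchange is the agent's exchange client
async def reconcile_state_after_outage(self):
    # Fetch current positions and open orders from the exchange
    live_positions = await self.exchange.get_positions()
    live_orders = await self.exchange.get_open_orders()

    # Compare every symbol known locally or on the exchange
    for symbol in set(self.local_positions) | set(live_positions):
        local_pos = self.local_positions.get(symbol, 0)
        live_pos = live_positions.get(symbol, 0)
        if live_pos != local_pos:
            # Log discrepancy and update local state
            logger.warning("Position mismatch for %s: local=%s, live=%s",
                           symbol, local_pos, live_pos)
            self.local_positions[symbol] = live_pos
            # A smaller position suggests a stop-loss or other closing order filled
            if abs(live_pos) < abs(local_pos):
                self.handle_stop_loss_fill(symbol, local_pos - live_pos)

    # Orders that filled or were cancelled during the outage are no longer open
    # (local_orders is an assumed attribute name for the agent's pending orders)
    self.local_orders = live_orders
```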
You need a persistent state store (Redis, PostgreSQL) separate from the agent’s in-memory state. On restart, the agent loads from the store and reconciles against the exchange.
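As a minimal sketch of that store, the example below uses SQLite from the standard library as a stand-in for Redis or PostgreSQL so that it runs without extra services. The `StateStore` class, the `agent_state` table, and the JSON encoding are assumptions made for illustration.

```python
import json
import sqlite3


class StateStore:
    """Durable key/value snapshot of agent state, backed by SQLite."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS agent_state ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self.conn.commit()

    def save(self, key, value):
        # The connection context manager commits on success and rolls back on error
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO agent_state (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def load(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM agent_state WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else default
```

On startup the agent would call something like `store.load("positions", default={})` and then reconcile the result against the exchange before trading. The stored copy is a starting point for reconciliation, and the exchange remains the source of truth.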
Backtesting vs. Paper Trading: Different Infrastructure
Backtesting replays historical data. Paper trading uses live market data with simulated execution. These require different infrastructure.
Backtesting plumbing:
- Load historical OHLCV or tick data from CSV/Parquet
- Simulate order book state (or use actual historical order book snapshots if available)
- Assume fills at limit price (optimistic) or worse (realistic slippage model)
- No network latency, no rate limits, no exchange outages
Paper trading plumbing:
- Connect to live WebSocket feeds
- Simulate order execution based on current order book
- Track simulated positions separately from real positions
- Handle all the same failure modes as live trading (reconnection, rate limits, outages)
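Simulating execution against the current order book can be sketched as a walk through the price levels. This example assumes a book shaped as lists of `(price, size)` tuples sorted best price first, and it ignores fees, latency, and queue position, all of which make real fills worse.

```python
def simulate_market_fill(side, size, book):
    """Walk the book and return (filled_size, average_price).

    `book` has the shape {'bids': [(price, size), ...], 'asks': [...]},
    best price first. filled_size < size means a partial fill.
    """
    # A buy consumes asks; a sell consumes bids
    levels = book['asks'] if side == 'buy' else book['bids']
    remaining = size
    cost = 0.0
    for price, available in levels:
        if remaining <= 0:
            break
        take = min(remaining, available)
        cost += take * price
        remaining -= take
    filled = size - remaining
    average_price = cost / filled if filled > 0 else None
    return filled, average_price
```

Comparing `average_price` with the best bid or ask gives a per-order slippage figure. A `filled` value below `size` is the partial-fill case that position sizing logic has to handle.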
The gap: backtesting infrastructure gives you false confidence. Your strategy might work on historical data but fail in paper trading because:
- Slippage is worse than your model assumed
- Your reconnection logic has a bug
- You hit rate limits during high-volatility periods
- Your position sizing logic doesn’t account for partial fills
Multi-Exchange Agents: Reconciling Inconsistent APIs
If your agent trades across multiple exchanges, you need a unified abstraction layer. Each exchange has:
- Different timestamp formats (Unix milliseconds, ISO 8601, exchange-specific)
- Different order book depth (some provide 20 levels, some provide 100)
- Different fee structures (maker/taker, tiered by volume, flat)
- Different order types (some support stop-limit, some don’t)
Abstraction layer pattern:
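```python
from datetime import datetime, timezone


class UnifiedExchange:
    def __init__(self, exchange_adapter):
        self.adapter = exchange_adapter

    async def get_order_book(self, symbol):
        raw_book = await self.adapter.fetch_order_book(symbol)
        return self.normalize_order_book(raw_book)

    def normalize_order_book(self, raw_book):
        # Truncate to 20 levels so every exchange yields the same depth
        return {
            'bids': [(float(price), float(size))
                     for price, size in raw_book['bids'][:20]],
            'asks': [(float(price), float(size))
                     for price, size in raw_book['asks'][:20]],
            'timestamp': self.normalize_timestamp(raw_book['timestamp'])
        }

    def normalize_timestamp(self, ts):
        # Convert to Unix milliseconds
        if isinstance(ts, str):
            # fromisoformat() rejects a trailing "Z" before Python 3.11
            dt = datetime.fromisoformat(ts.replace('Z', '+00:00'))
            if dt.tzinfo is None:
                # Treat naive timestamps as UTC instead of local time
                dt = dt.replace(tzinfo=timezone.utc)
            return int(dt.timestamp() * 1000)
        # Numeric timestamps are assumed to already be in milliseconds
        return int(ts)
```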
The CCXT library provides this abstraction for 100+ exchanges, but you still need to handle exchange-specific quirks (rate limits, order size precision, minimum order values).
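Order size precision and minimum order values are a common source of rejected orders. The sketch below shows one way to snap an order onto an exchange's grid using `Decimal`. The parameter names `step_size`, `tick_size`, and `min_notional` are placeholders for whatever the exchange's market metadata actually calls them.

```python
from decimal import Decimal, ROUND_DOWN


def prepare_order(size, price, step_size, tick_size, min_notional):
    """Round size and price down to the exchange grid.

    Returns (size, price) as Decimals, or None if the rounded order is
    below the minimum order value. Grid parameters are passed as strings.
    """
    step, tick = Decimal(step_size), Decimal(tick_size)
    # Go through str() so floats like 0.1 convert to exact decimals
    size = (Decimal(str(size)) / step).to_integral_value(rounding=ROUND_DOWN) * step
    price = (Decimal(str(price)) / tick).to_integral_value(rounding=ROUND_DOWN) * tick
    if size <= 0 or size * price < Decimal(min_notional):
        return None
    return size, price
```

Rounding size down never increases exposure. Rounding price down is conservative for a buy limit but not for a sell, so a fuller version would choose the rounding direction per side.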
Observability: What to Log When Things Go Wrong
When your agent loses money, you need to reconstruct what happened. Logs must capture:
- Every order placement (symbol, side, size, price, timestamp)
- Every fill (actual execution price, fees paid)
- Every decision (why did the agent place this order?)
- Every error (API failures, validation failures, reconnection events)
Structured logging pattern:
```python
import logging

logger = logging.getLogger("agent")

logger.info("order_placed", extra={
    "symbol": "BTC-USD",
    "side": "buy",
    "size": 0.1,
    "price": 45000,
    "order_id": "abc123",
    "reason": "mean_reversion_signal",
    "account_balance": 10000,
    "position_before": 0.0
})
```
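Python's default log formatter does not print fields passed through `extra`, so they are silently dropped unless a formatter emits them. The sketch below is one minimal way to write each record as a JSON line; `JsonFormatter` and `build_logger` are names chosen for this example, and a maintained JSON logging library would be a reasonable alternative.

```python
import json
import logging

# Attributes every LogRecord carries; anything else arrived through `extra`
_STANDARD_ATTRS = set(
    logging.LogRecord("", 0, "", 0, "", (), None).__dict__
) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Copy the caller-supplied `extra` fields into the JSON document
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        # default=str keeps Decimal, datetime, and similar values serializable
        return json.dumps(payload, default=str)


def build_logger(name="agent", stream=None):
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

One JSON object per line is a format that most log shippers can forward to a queryable store without custom parsing.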
Store logs in a queryable format (Elasticsearch, ClickHouse). You need to answer questions like:
- What was the agent’s position size when it placed this order?
- How many times did the WebSocket reconnect in the last hour?
- What percentage of orders were rejected due to insufficient balance?
Technical Verdict
Use autonomous trading agents when:
- You have a tested strategy with positive expectancy in paper trading (not just backtesting)
- You have external guardrails (kill switch, position limits) independent of agent code
- You can afford to lose the entire account balance (seriously)
- You have observability infrastructure to debug losses post-mortem
Avoid when:
- You’re moving from backtesting directly to live trading without paper trading
- Your agent has no external kill switch or position limits
- You don’t have a plan for handling exchange outages and WebSocket reconnections
- You’re using this as a primary income source (the failure modes are too unpredictable)
The infrastructure gap between “I built a trading bot” and “I built a trading bot that won’t liquidate my account” is wider than most engineers expect. The plumbing is not glamorous, but it’s the difference between a learning experience and a financial disaster.