Multi-Agent RL in Three-Sided Marketplaces: How Delayed Feedback Trains Dispatch Systems Without Immediate Rewards
How DoorDash uses multi-agent reinforcement learning (RL) to optimize dispatch when feedback arrives minutes after decisions, balancing delivery speed against courier utilization.