Swarm Robotics at Home: ROS 2, Gazebo, and Rust for Multi-Agent Physical Systems

Running multiple autonomous robots in your living room is no longer a research-lab fantasy. The ROS 2 Humble LTS distribution, Gazebo simulation, and micro-ROS for microcontrollers have matured to the point where you can prototype decentralized swarm behaviors on a single workstation or a handful of Raspberry Pis. The plumbing is open source, the documentation is extensive, and the common failure modes are well understood.

This article walks through the infrastructure stack for multi-robot systems: how ROS 2 isolates message buses per agent, how Gazebo spawns N robots in a single world, how micro-ROS bridges embedded boards to the ROS 2 graph, and where Rust fits when you need deterministic latency on the edge.

What Swarming Actually Means

Chris Benson’s working definition: “Swarming occurs when numerous independent fully-autonomous multi-agentic platforms exhibit highly-coordinated locomotive and emergent behaviors with agency and self-governance in any domain (air, ground, sea, undersea, space), functioning as a single independent logical distributed decentralized decisioning entity.”

Translation: no central orchestrator. Each robot runs its own decision loop, publishes observations, subscribes to neighbors’ state, and converges on collective behavior through local rules. The coordination emerges from message passing, not from a master node issuing commands.

This is fundamentally different from a fleet management system where a server dispatches tasks. In a true swarm, the control plane is distributed across every agent.

ROS 2 Namespace Isolation

ROS 2 uses DDS (Data Distribution Service) as its middleware. Every node publishes and subscribes to topics on a shared network. When you spawn multiple robots, you need to prevent message collisions: robot A’s /cmd_vel topic must not interfere with robot B’s.

Solution: namespaces. Each robot gets a unique prefix (e.g., /robot_0, /robot_1). Launching a node inside a namespace prefixes its relative topic names, so /robot_0/cmd_vel and /robot_1/cmd_vel coexist on the same DDS domain without per-topic remapping.

# Example ROS 2 launch snippet for spawning two robots
from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    return LaunchDescription([
        Node(
            package='my_robot_pkg',
            executable='robot_node',
            namespace='robot_0',
            parameters=[{'robot_id': 0}]
        ),
        Node(
            package='my_robot_pkg',
            executable='robot_node',
            namespace='robot_1',
            parameters=[{'robot_id': 1}]
        ),
    ])

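Each node subscribes to /robot_N/scan for lidar data and publishes to /robot_N/cmd_vel for motion commands. The namespace acts as a message bus partition, but only for relative topic names: a node that hard-codes an absolute name such as /cmd_vel bypasses the namespace and collides with every other robot.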
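The resolution rule can be sketched in a few lines of plain Python. This is a simplified illustration, not the actual ROS 2 implementation: it ignores remapping rules and substitutions, and the function name resolve_topic is hypothetical.

```python
def resolve_topic(namespace: str, node_name: str, topic: str) -> str:
    """Simplified sketch of ROS 2 topic name resolution.

    Assumption: no remapping rules and no {substitutions} are in play.
    """
    stripped = namespace.strip("/")
    ns = "/" + stripped if stripped else ""
    if topic.startswith("/"):
        # Absolute name: the node's namespace is ignored entirely.
        return topic
    if topic == "~" or topic.startswith("~/"):
        # Private name: expands to the node's fully qualified name.
        return f"{ns}/{node_name}{topic[1:]}"
    # Relative name: prefixed with the node's namespace.
    return f"{ns}/{topic}"


for requested in ("cmd_vel", "/cmd_vel", "~/status"):
    print(requested, "->", resolve_topic("robot_0", "robot_node", requested))
```

In practice this means robot code should create publishers and subscriptions with names like cmd_vel and scan, never /cmd_vel and /scan.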

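Gotcha: DDS discovery gets slow and chatty once you scale past 10-20 robots on a single subnet, because by default every participant announces itself to every other participant. Two common ways to reduce the overhead are switching to a different RMW implementation (e.g., CycloneDDS, selected with RMW_IMPLEMENTATION=rmw_cyclonedds_cpp) or, with Fast DDS, running a Discovery Server (ROS_DISCOVERY_SERVER) so participants register with a server instead of with each other. Note that RMW_FASTRTPS_USE_QOS_FROM_XML only tells Fast DDS to take QoS settings from an XML profile; it does not reduce discovery traffic by itself.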
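A rough back-of-the-envelope model shows why discovery hurts in this range. The sketch below assumes default peer-to-peer discovery, one DDS participant per ROS 2 process, and that every participant receives the endpoint descriptions of every other participant. The function name and the example counts (4 processes per robot, 10 endpoints per process) are illustrative assumptions, not measurements.

```python
def discovery_load(robots: int, processes_per_robot: int, endpoints_per_process: int) -> dict:
    """Estimate discovery state under all-to-all (simple) discovery.

    Assumption: one DDS participant per process, no discovery server.
    """
    participants = robots * processes_per_robot
    # Every participant must learn about every other participant.
    participant_pairs = participants * (participants - 1) // 2
    # Each participant receives the endpoint list of all remote participants.
    endpoint_records = participants * (participants - 1) * endpoints_per_process
    return {
        "participants": participants,
        "participant_pairs": participant_pairs,
        "endpoint_records": endpoint_records,
    }


for n in (5, 20):
    print(n, "robots:", discovery_load(n, processes_per_robot=4, endpoints_per_process=10))
```

Under these assumptions, going from 5 to 20 robots multiplies the participant count by 4 but the number of participant pairs by more than 16, which is why a discovery server or a leaner RMW starts to matter.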

Gazebo Multi-Robot Simulation

Gazebo (formerly Ignition, now Gazebo Sim) is the standard physics simulator for ROS 2. You define a world file (SDF format), spawn robot models, and Gazebo publishes sensor data (lidar, cameras, IMU) to ROS 2 topics.

Spawning N robots:

Load a base world (empty plane, obstacles, etc.).
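Use ros2 run gazebo_ros spawn_entity.py with a unique -entity name and -robot_namespace per robot. (gazebo_ros is the Gazebo Classic integration; with the newer Gazebo Sim, the equivalent is the create executable in ros_gz_sim.)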
Each robot’s URDF/SDF includes sensor plugins that publish to namespaced topics.
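The spawning steps above can be scripted. The sketch below only builds the command lines for N robots laid out on a square grid, using spawn_entity.py’s -entity, -robot_namespace, -file, and -x/-y/-z arguments from Gazebo Classic’s gazebo_ros package. The helper name spawn_commands, the model path, and the 1.5 m spacing are assumptions for illustration.

```python
import math


def spawn_commands(n: int, model_path: str, spacing: float = 1.5) -> list[list[str]]:
    """Build one spawn_entity.py command per robot, placed on a square grid."""
    cols = math.ceil(math.sqrt(n))
    commands = []
    for i in range(n):
        row, col = divmod(i, cols)
        name = f"robot_{i}"
        commands.append([
            "ros2", "run", "gazebo_ros", "spawn_entity.py",
            "-entity", name,            # unique model name inside the world
            "-robot_namespace", name,   # namespace for the model's ROS plugins
            "-file", model_path,        # hypothetical path to the robot SDF/URDF
            "-x", f"{col * spacing:.2f}",
            "-y", f"{row * spacing:.2f}",
            "-z", "0.10",
        ])
    return commands


for cmd in spawn_commands(4, "/path/to/robot.sdf"):
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run once the Gazebo server is up, or translated into Node actions in a launch file.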

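The Gazebo server runs a single physics loop. All robots share the same timestep and simulation clock, which makes runs much easier to reproduce and debug than on hardware. You can pause, single-step, or reset the simulation; rewinding is generally limited to playback of recorded logs rather than a live run.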

Performance: A modern workstation (16 GB RAM, 8-core CPU) can simulate 5-10 wheeled robots with lidar at real-time speed. Add cameras or complex collision meshes, and you drop to 0.5x real-time. For larger swarms, you either reduce sensor fidelity or run headless (no GUI) and offload rendering.

Micro-ROS for Embedded Boards

Full ROS 2 nodes require a Linux kernel and several hundred MB of RAM. For microcontrollers (STM32, ESP32, Teensy), you use micro-ROS: a stripped-down ROS 2 client library that runs on FreeRTOS or bare metal.

Architecture:

The microcontroller runs the micro-ROS client library.
A companion SBC (Raspberry Pi, Jetson Nano) runs the micro-ROS Agent.
The Agent bridges messages from the MCU, carried over serial (UART), UDP, or a custom transport such as SPI, to the ROS 2 DDS graph.

Latency and memory constraints:

Component	RAM Usage	Typical Latency	Use Case
Full ROS 2 node (SBC)	200-500 MB	5-20 ms	High-level planning, vision, SLAM
micro-ROS (MCU)	50-200 KB	1-5 ms	Motor control, sensor polling, low-level reflexes

You offload time-critical control loops (PID for motor speed, obstacle avoidance reflexes) to the MCU. High-level path planning and swarm coordination logic runs on the SBC.

Failure mode: If the agent connection drops, the MCU can either halt or fall back to a safe behavior (e.g., stop motors). You define this in the MCU firmware. There is no automatic failover in micro-ROS itself.
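The fallback logic amounts to a command watchdog. Real firmware would be written in C or Rust on the MCU; the sketch below expresses the same logic in Python so it can be tested on a workstation. The class name, the 0.5 s timeout, and the (linear, angular) command shape are assumptions.

```python
class CommandWatchdog:
    """Pass the latest velocity command through until it goes stale, then stop."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_cmd = (0.0, 0.0)
        self.last_cmd_time = None  # no command received yet

    def on_command(self, linear: float, angular: float, now: float) -> None:
        """Call whenever a command arrives from the agent link."""
        self.last_cmd = (linear, angular)
        self.last_cmd_time = now

    def output(self, now: float) -> tuple:
        """Call from the control loop; returns the command to apply right now."""
        if self.last_cmd_time is None or now - self.last_cmd_time > self.timeout_s:
            return (0.0, 0.0)  # safe behavior: stop the motors
        return self.last_cmd


wd = CommandWatchdog(timeout_s=0.5)
wd.on_command(0.2, 0.1, now=1.0)
print(wd.output(now=1.3), wd.output(now=1.6))
```

The control loop calls output() on every tick, so a dropped agent connection degrades to a stop within one timeout period without any extra signaling.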

Decentralized Coordination Patterns

Without a central orchestrator, how do robots agree on collective behavior?

Common patterns:

Consensus protocols: Raft or Paxos for electing a temporary leader when needed (e.g., deciding which robot opens a door). Overkill for simple swarms, useful for heterogeneous fleets.
Behavior trees per agent: Each robot runs a local behavior tree that reacts to neighbor state. Example: “If I see a neighbor within 2 meters, slow down. If I see a target and no neighbor is closer, approach.”
Potential fields: Each robot publishes its position. Others compute repulsive forces (avoid collision) and attractive forces (move toward goal). The robot integrates forces into a velocity command.
Auction-based task allocation: Robots bid on tasks (e.g., “I can reach waypoint X in 10 seconds”). The best bid claims the task; when bids are cost estimates such as travel time, that is the lowest one. No central auctioneer; bids are broadcast on a shared topic.
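Of the patterns above, potential fields are the easiest to show in code. The sketch below is one common 2D formulation: a linear attractive term toward the goal plus a repulsive term that activates inside an influence radius, with the result clipped to a maximum speed. The gains, the 2 m influence radius, and the 0.5 m/s speed limit are illustrative assumptions, not tuned values.

```python
import math


def potential_field_velocity(pos, goal, neighbors,
                             k_att=1.0, k_rep=0.5, influence=2.0, v_max=0.5):
    """Return a (vx, vy) command from attractive and repulsive forces."""
    # Attractive force pulls straight toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for nx, ny in neighbors:
        dx, dy = pos[0] - nx, pos[1] - ny
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            # Repulsion grows sharply as the neighbor gets closer and
            # fades to zero at the edge of the influence radius.
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    # Clip to the maximum speed while keeping the direction.
    speed = math.hypot(fx, fy)
    if speed > v_max:
        fx, fy = fx * v_max / speed, fy * v_max / speed
    return fx, fy


print(potential_field_velocity((0.0, 0.0), (10.0, 0.0), neighbors=[]))
print(potential_field_velocity((0.0, 0.0), (1.0, 0.0), neighbors=[(0.5, 0.0)]))
```

Each robot would run this on its own position and the positions its neighbors publish, then send the result as its velocity command. Pure potential fields can get stuck in local minima, so they are usually combined with one of the other patterns.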

Message passing: ROS 2 topics are pub/sub, so every subscriber to a topic receives every message published on it (unless you use DDS partitions). For large swarms, you filter by distance: each robot only subscribes to, or only processes, state from neighbors within communication range. This requires a spatial index (e.g., a simple grid or k-d tree) updated every N milliseconds.
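A uniform grid is usually enough for a home-scale swarm. The sketch below is a minimal version that is rebuilt from the latest known positions and then queried for neighbors within a radius. The class name NeighborGrid and the 2 m cell size are assumptions.

```python
import math
from collections import defaultdict


class NeighborGrid:
    """Uniform 2D grid for radius queries over robot positions."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x: float, y: float) -> tuple:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def rebuild(self, positions: dict) -> None:
        """positions maps robot name -> (x, y); call every update period."""
        self.cells.clear()
        for name, (x, y) in positions.items():
            self.cells[self._cell(x, y)].append((name, x, y))

    def neighbors(self, name: str, x: float, y: float, radius: float) -> list:
        """Names of all other robots within radius of (x, y), sorted."""
        reach = math.ceil(radius / self.cell_size)
        cx, cy = self._cell(x, y)
        found = []
        for ix in range(cx - reach, cx + reach + 1):
            for iy in range(cy - reach, cy + reach + 1):
                # .get avoids creating empty cells in the defaultdict.
                for other, ox, oy in self.cells.get((ix, iy), ()):
                    if other != name and math.hypot(ox - x, oy - y) <= radius:
                        found.append(other)
        return sorted(found)


grid = NeighborGrid(cell_size=2.0)
grid.rebuild({"robot_0": (0.0, 0.0), "robot_1": (1.0, 1.0), "robot_2": (5.0, 5.0)})
print(grid.neighbors("robot_0", 0.0, 0.0, radius=2.0))
```

Only the cells overlapping the query radius are scanned, so the cost per query depends on local density rather than on the total swarm size.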

Rust for Edge Performance

ROS 2 nodes are typically written in C++ or Python. Python is too slow and too jittery for tight control loops. C++ is fast but verbose and not memory-safe.

Rust in the stack:

rclrs: Rust bindings for ROS 2. You write nodes in Rust, compile to native binaries, and deploy to SBCs.
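Embassy: Async embedded framework for microcontrollers such as STM32, with ESP32 supported through the esp-hal ecosystem. The official micro-ROS client API is C, so using it from Rust firmware typically means going through FFI bindings rather than a native Rust client.
Zenoh: A pub/sub protocol implemented in Rust that can bridge to ROS 2 via a plugin. It is designed to reduce discovery and wire overhead relative to DDS, which helps over WiFi and in edge-to-cloud scenarios.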

When to use Rust:

You need deterministic latency (e.g., 1 kHz control loop).
You want memory safety without garbage collection pauses.
You are deploying to resource-constrained SBCs (512 MB RAM).

When to stick with C++/Python:

You need mature ROS 2 libraries (Nav2, MoveIt) that do not have Rust equivalents.
Your team already has C++ expertise.
Prototyping speed matters more than runtime performance.

Observability and Debugging

Tools:

rqt_graph: Visualize the ROS 2 node graph. See which topics each robot subscribes to.
ros2 topic echo: Inspect message payloads in real time.
Gazebo GUI: Replay simulation, visualize sensor rays, check collision geometry.
Foxglove Studio: Web-based dashboard for live and recorded ROS 2 data. Supports 3D visualization, plots, and custom panels.

Common failure modes:

Clock skew: Robots drift out of sync if you rely on system time. Use ROS 2’s /clock topic and use_sim_time parameter in simulation.
Network partitions: A robot loses WiFi and stops receiving neighbor state. Implement a timeout: if no message in N seconds, assume isolation and switch to safe mode.
Resource exhaustion: Too many robots publishing high-frequency sensor data saturate the network. Throttle topics or use QoS policies (e.g., BEST_EFFORT instead of RELIABLE).
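A quick estimate makes the resource exhaustion case concrete. The sketch below assumes each scan carries 4 bytes per beam (float32 ranges only) plus a nominal 100 bytes of header and framing, and that each scan is delivered separately to every subscribing peer. All of these are simplifying assumptions; real traffic depends on the message layout, the RMW, and whether multicast is used.

```python
def scan_bandwidth_mbps(robots: int, beams: int, rate_hz: float,
                        subscribers_per_topic: int,
                        bytes_per_beam: int = 4, overhead_bytes: int = 100) -> float:
    """Estimate aggregate lidar traffic in Mbit/s under the stated assumptions."""
    msg_bytes = beams * bytes_per_beam + overhead_bytes
    bytes_per_second = robots * msg_bytes * rate_hz * subscribers_per_topic
    return bytes_per_second * 8 / 1e6


# 10 robots, 360-beam lidar, every robot subscribed to every other robot's scan.
full_rate = scan_bandwidth_mbps(10, 360, rate_hz=10, subscribers_per_topic=9)
throttled = scan_bandwidth_mbps(10, 360, rate_hz=2, subscribers_per_topic=9)
print(f"{full_rate:.2f} Mbit/s at 10 Hz, {throttled:.2f} Mbit/s at 2 Hz")
```

Under these assumptions the swarm generates roughly 11 Mbit/s of lidar traffic at 10 Hz and about 2.2 Mbit/s when throttled to 2 Hz, which is the difference between stressing and comfortably fitting a shared home WiFi link. Sharing compact pose or state messages instead of raw scans cuts this further.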

Deployment Shape

Local development:

Single workstation running Gazebo + ROS 2 Humble.
Spawn 3-5 robots in simulation.
Write and test coordination logic in Python or Rust.

Hardware prototype:

3-5 Raspberry Pi 4 (4 GB RAM each) or Jetson Nano.
Each Pi runs a full ROS 2 node (planning, localization).
Each Pi connects via UART to an STM32 or ESP32 running micro-ROS (motor control).
Robots communicate over WiFi (ad-hoc mesh or access point).

Scaling to 10+ robots:

Switch to a more efficient DDS implementation (CycloneDDS).
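Use DDS partitions or a discovery server to cut unnecessary endpoint matching and discovery traffic; to isolate groups of robots completely, give each group its own ROS_DOMAIN_ID.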
Offload compute-heavy tasks (SLAM, vision) to a central edge server; robots send raw sensor data and receive waypoints.

Security Boundaries

ROS 2 supports SROS2 (Secure ROS 2): DDS security plugins for authentication, encryption, and access control. You generate keys and certificates per security enclave (typically one per node or process) and define policies (e.g., “robot_0 can publish to /robot_0/cmd_vel but not /robot_1/cmd_vel”).
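As a rough sketch of the setup, the snippet below assembles the ros2 security commands and the environment variables that switch security on, with one enclave per robot. The helper name sros2_setup and the keystore path are assumptions; access-control policies are a separate step not shown here.

```python
def sros2_setup(keystore: str, robot_names: list[str]) -> dict:
    """Build the SROS2 keystore commands and the runtime environment."""
    commands = [["ros2", "security", "create_keystore", keystore]]
    for name in robot_names:
        # One enclave (key pair + certificate) per robot.
        commands.append(["ros2", "security", "create_enclave", keystore, f"/{name}"])
    env = {
        "ROS_SECURITY_KEYSTORE": keystore,
        "ROS_SECURITY_ENABLE": "true",
        # "Enforce" refuses to start a node whose security files are missing.
        "ROS_SECURITY_STRATEGY": "Enforce",
    }
    return {"commands": commands, "env": env}


setup = sros2_setup("/home/user/swarm_keystore", ["robot_0", "robot_1"])
for cmd in setup["commands"]:
    print(" ".join(cmd))
```

Each node is then started with its enclave selected, for example by appending --ros-args --enclave /robot_0 to its command line.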

Practical reality: Most hobbyist and research setups skip SROS2 because certificate management is tedious. If you deploy on an untrusted network (public WiFi, outdoor field test), enable it. Otherwise, rely on network isolation (VLAN, VPN).

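Embedded attack surface: The micro-ROS client on the MCU accepts whatever arrives over its UART or other transport link. If an attacker gains physical access to the MCU serial port, they can inject motor commands. Mitigation: authenticate commands (e.g., with a message authentication code) and keep the key in a secure element where the hardware has one.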
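One way to authenticate commands is an HMAC tag plus a monotonically increasing counter to reject replays. The sketch below shows the idea in Python with the standard library; on the MCU the same scheme would be implemented in firmware. The frame layout (counter and two float32 wheel speeds), the function names, and the choice of HMAC-SHA256 are assumptions, and this is not a micro-ROS feature.

```python
import hashlib
import hmac
import struct

FRAME_FMT = "<Iff"  # counter, left wheel speed, right wheel speed
TAG_LEN = 32        # HMAC-SHA256 digest size in bytes


def sign_command(key: bytes, counter: int, left: float, right: float) -> bytes:
    """Serialize a motor command and append its authentication tag."""
    payload = struct.pack(FRAME_FMT, counter, left, right)
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify_command(key: bytes, frame: bytes, last_counter: int):
    """Return (counter, left, right) if the frame is authentic and fresh, else None."""
    if len(frame) != struct.calcsize(FRAME_FMT) + TAG_LEN:
        return None
    payload, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted frame
    counter, left, right = struct.unpack(FRAME_FMT, payload)
    if counter <= last_counter:
        return None  # replayed or out-of-order frame
    return counter, left, right


key = b"k" * 32  # placeholder; a real key would be provisioned per robot
frame = sign_command(key, counter=1, left=0.5, right=-0.5)
print(verify_command(key, frame, last_counter=0))
```

The receiver stores the last accepted counter, so a captured frame cannot simply be replayed on the serial line later.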

Trade-Offs

Approach	Pros	Cons
Full ROS 2 on every robot	Rich ecosystem, easy debugging	High RAM/CPU, slower boot
Micro-ROS + SBC hybrid	Low latency, deterministic control	More complex architecture, two failure domains
Centralized planning + edge execution	Simpler per-robot logic	Single point of failure, network dependency
Pure decentralized swarm	No SPOF, scales horizontally	Harder to debug, emergent bugs

Technical Verdict

Use this stack when:

You are prototyping multi-robot coordination algorithms and need fast iteration in simulation.
You want to deploy 3-10 physical robots on consumer hardware (Raspberry Pi, ESP32).
You need open-source tools with active communities and long-term support.

Avoid when:

You need 50+ robots in a single swarm (DDS discovery overhead becomes prohibitive; consider custom UDP multicast or Zenoh).
Your robots have strict real-time requirements (sub-millisecond jitter). ROS 2 is soft real-time; use a dedicated RTOS for the control layer.
You are building a production fleet with SLAs and expect the stack to cover operations out of the box. ROS 2 provides the middleware, not the operations layer: you will need custom monitoring, OTA updates, and fleet management tooling on top.