Lucid Dreamer: Multimodal World Model for Zero-Shot Policy Transfer in Multi-Agent Autonomous Racing (ICRA 2025 Submission)
Published:
Brief: Reinforcement learning offers the potential for continual learning and adaptability in complex scenarios, but its application to real-world robotics faces significant challenges. Unlike in simulation, physical platforms struggle to collect a diverse corpus of training data due to critical safety risks and the inherent constraints of operating in dynamic, partially observable environments. Our work draws inspiration from the human ability to fuse and exploit multiple sensing modalities, construct comprehensive models of how the world operates, and then leverage those models to navigate challenging and often unpredictable environments. We present an overview of how unmanned vehicles (ground, air, and surface) can exploit a world model constructed through multimodal perception to learn near-optimal policies for guidance and control. A key aspect of the approach is learning from imagination: the world model simulates future imagined trajectories, enabling the agent to anticipate potential risks before encountering them in the real world. Our ongoing work and long-term vision is to evolve the traditional sense-plan-act framework into a more intuitive, cognitively inspired sense-imagine-act model. Presented by Dr. Elena Shrestha.
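The "learning from imagination" idea above can be illustrated with a toy rollout. This is a hedged sketch only: the real system uses a learned multimodal world model, whereas `world_model`, `policy`, and the hazard threshold here are hand-coded stand-ins invented for illustration.

```python
# Toy sketch of "learning from imagination": roll a policy forward inside
# a world model to flag risk before acting in the real world.
# world_model, policy, and hazard_pos are hypothetical stand-ins, not the
# learned components from the actual system.

def world_model(state, action):
    """Stand-in for learned dynamics: integrate velocity, then position."""
    pos, vel = state
    vel = vel + action          # action accelerates or decelerates
    pos = pos + vel
    return (pos, vel)

def policy(state):
    """Naive stand-in policy: accelerate toward a goal at pos = 10."""
    pos, _ = state
    return 1.0 if pos < 10 else -1.0

def imagine(state, horizon, hazard_pos=8.0):
    """Simulate an imagined trajectory and report whether any imagined
    state enters the hazard region, without touching the real world."""
    trajectory = [state]
    for _ in range(horizon):
        state = world_model(state, policy(state))
        trajectory.append(state)
    risky = any(pos >= hazard_pos for pos, _ in trajectory)
    return trajectory, risky

traj, risky = imagine(state=(0.0, 0.0), horizon=5)
print(risky)  # prints True: the imagined rollout reaches the hazard region
```

In the full approach, the same loop runs in a learned latent space, so risky behaviors are discovered in imagination rather than on the physical platform.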
Role: Graduate Research Assistant, Robotics & Optimization for Analysis of Human Motion (ROAHM) Lab
Contribution:
- Optimized the Jetson TX2 board to run semantic segmentation tasks more efficiently, reducing callback duration fivefold and improving real-time processing for autonomous operation in dynamic environments.
- Developed a waypoint-follower algorithm for multi-agent experiments, using the cartographer_ros package for map building and localization. The algorithm was central to the reinforcement learning experiments, enabling precise navigation and path planning for multiple autonomous agents.
- Configured a teleoperation controller, conducted LIDAR scans, calibrated IMUs, and implemented SLAM with cartographer_ros, resolving critical frame-transformation issues that improved system reliability and performance.
- Prepared the robot hardware for real-world testing, including setting up the battery, electrical, and mechanical components; customized and 3D-printed parts to accommodate the sensors, then connected and configured the sensors (LIDAR, camera, IMU, and odometry) through ROS for seamless data integration.
- Assisted in real-world testing, gathering critical sensor data (LIDAR, camera, odometry, IMU, and velocity measurements) focused on the robot's mechanical response and navigation control. This data was used to train and optimize a reinforcement learning model, improving the robot's autonomous capabilities.
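A waypoint follower in the spirit of the one described above can be sketched as a proportional heading controller for a differential-drive robot. This is an illustrative sketch, not the lab's implementation: the actual system ran on ROS with cartographer_ros providing the pose, and the function name, gains, and tolerance below are assumed for the example.

```python
import math

def steer_to_waypoint(pose, waypoint, k_ang=1.5, v_max=0.5, tol=0.1):
    """Return (linear_vel, angular_vel, reached) for a differential-drive robot.

    pose: (x, y, yaw) from localization; waypoint: (x, y) goal.
    Hypothetical gains: k_ang for heading correction, v_max speed cap.
    """
    x, y, yaw = pose
    wx, wy = waypoint
    dx, dy = wx - x, wy - y
    dist = math.hypot(dx, dy)
    if dist < tol:
        return 0.0, 0.0, True                      # waypoint reached
    heading_err = math.atan2(dy, dx) - yaw
    # Wrap the error to [-pi, pi] so the robot turns the short way.
    heading_err = math.atan2(math.sin(heading_err), math.cos(heading_err))
    ang = k_ang * heading_err
    lin = v_max * max(0.0, math.cos(heading_err))  # slow down when misaligned
    return lin, ang, False

# Robot at the origin facing +x, waypoint straight ahead: full speed, no turn.
lin, ang, done = steer_to_waypoint((0.0, 0.0, 0.0), (2.0, 0.0))
print(lin, ang, done)  # prints 0.5 0.0 False
```

In a ROS node, the returned pair would typically be published as a `geometry_msgs/Twist` on `cmd_vel`, with the pose taken from the localization output.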
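The frame-transformation issues mentioned above come down to composing poses correctly across the sensor, body, and map frames. The sketch below shows the underlying 2D (SE(2)) math with hypothetical frame names and mounting offsets; the real system used ROS tf with cartographer_ros rather than hand-rolled transforms.

```python
import math

# Illustrative SE(2) pose composition: express a LIDAR point in the map
# frame by chaining map<-base and base<-lidar. Offsets are hypothetical.

def compose(parent_pose, child_pose):
    """Compose a child frame pose (x, y, theta) through its parent frame."""
    px, py, pt = parent_pose
    cx, cy, ct = child_pose
    x = px + cx * math.cos(pt) - cy * math.sin(pt)
    y = py + cx * math.sin(pt) + cy * math.cos(pt)
    return (x, y, pt + ct)

def transform_point(pose, point):
    """Express a point given in the pose's frame in the parent frame."""
    x, y, theta = pose
    px, py = point
    return (x + px * math.cos(theta) - py * math.sin(theta),
            y + px * math.sin(theta) + py * math.cos(theta))

map_T_base = (1.0, 2.0, math.pi / 2)   # robot at (1, 2), facing +y
base_T_lidar = (0.2, 0.0, 0.0)         # LIDAR mounted 0.2 m forward
map_T_lidar = compose(map_T_base, base_T_lidar)

# A LIDAR return 1 m straight ahead lands at roughly (1.0, 3.2) in the map.
pt_map = transform_point(map_T_lidar, (1.0, 0.0))
```

Getting these chains wrong (e.g. applying `base_T_lidar` on the wrong side) is exactly the kind of bug that silently corrupts SLAM and sensor fusion, which is why resolving the tf tree mattered for reliability.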
[GitHub][Publication][Slide]
Skills: ROS, Rviz, Gazebo, SLAM (cartographer_ros), Sensor Fusion (LIDAR, Camera, IMU, GPS, Odometry), Obstacle Avoidance, Autonomous Navigation, PyTorch, TensorFlow, U-Net, OpenCV, PID Control, Python, C++, Linux, Bash/Shell Scripting, Git, Docker, Microcontrollers, SolidWorks, 3D Printing
Contributors' Acknowledgement: Prof. Ram Vasudevan, Dr. Elena Shrestha, Hanxi Wan, Madhav Rawal, Surya Singh