Lucid Dreamer: Multimodal World Model for Zero-Shot Policy Transfer in Multi-Agent Autonomous Racing (ICRA 2025 Submission)

Published:

Supervisors: Prof. Ram Vasudevan and Dr. Elena Shrestha



Brief: While reinforcement learning offers the potential for continual learning and adaptability in complex scenarios, its application to real-world robotics faces significant challenges. Unlike in simulation, physical platforms struggle to collect a diverse corpus of training data because of critical safety risks and the inherent constraints of operating in a dynamic, partially observable environment. Our work draws inspiration from the human ability to fuse and exploit multiple sensing modalities, construct comprehensive models of how the world operates, and then leverage those models to adeptly navigate challenging and often unpredictable environments. This project shows how unmanned vehicles (ground, air, and surface) can exploit a world model, constructed through multimodal perception, to learn near-optimal policies for guidance and control. A key aspect of the approach is learning from imagination, in which the world model simulates future imagined trajectories, enabling the agent to anticipate potential risks before encountering them in the real world. Our ongoing work and long-term vision is to evolve the traditional sense-plan-act framework into a more intuitive, cognitively inspired sense-imagine-act model.

Dr. Elena Shrestha's presentation of our research.
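The "learning from imagination" idea can be illustrated with a minimal sketch: a learned latent dynamics model is rolled forward several steps purely in imagination, and candidate actions are scored on the predicted return, so risky maneuvers can be rejected before they are ever tried on the real vehicle. The names (`dynamics`, `imagine_rollout`), the linear stand-ins for learned networks, and all dimensions below are illustrative assumptions, not the actual Lucid Dreamer architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON, GAMMA = 8, 2, 15, 0.99

# Stand-ins for learned parameters (in practice, trained networks).
A = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
B = rng.normal(scale=0.1, size=(STATE_DIM, ACTION_DIM))
w_reward = rng.normal(size=STATE_DIM)

def dynamics(s, a):
    """One imagined transition in latent space (hypothetical model)."""
    return np.tanh(A @ s + B @ a)

def policy(s):
    """Toy policy head: maps latent state to a bounded action."""
    return np.tanh(s[:ACTION_DIM])

def imagine_rollout(s0):
    """Roll the world model forward without touching the real robot."""
    s, ret, discount = s0, 0.0, 1.0
    states = [s0]
    for _ in range(HORIZON):
        a = policy(s)                       # action from imagined state
        s = dynamics(s, a)                  # imagined next state
        ret += discount * (w_reward @ s)    # imagined (discounted) reward
        discount *= GAMMA
        states.append(s)
    return np.stack(states), ret

traj, ret = imagine_rollout(np.zeros(STATE_DIM))
```

In a Dreamer-style agent the same loop runs on batches of latent states, and the policy is trained by backpropagating through these imagined returns rather than through real-world rollouts.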

Role: Graduate Research Assistant, Robotics & Optimization for Analysis of Human Motion (ROAHM) Lab

Contribution:

- Optimized semantic segmentation on the Jetson TX2, reducing callback duration fivefold and improving real-time processing for autonomous operation in dynamic environments.
- Developed a waypoint-follower algorithm for multi-agent experiments, using the cartographer_ros package for map building and localization. The algorithm was crucial to the reinforcement learning experiments, enabling precise navigation and path planning for multiple autonomous agents.
- Configured a teleoperation controller, conducted LIDAR scans, calibrated IMUs, and implemented SLAM with cartographer_ros, resolving critical frame-transformation issues that improved system reliability and performance.
- Prepared the robot hardware for real-world testing, including the battery, electrical, and mechanical components. Customized and 3D-printed mounts for the sensors, then connected and configured the LIDAR, camera, IMU, and odometry through ROS for seamless data integration.
- Assisted in real-world testing, gathering critical sensor data (LIDAR, camera, odometry, IMU, and velocity measurements) focused on the robot's mechanical response and navigation control. This data was used to train and optimize a reinforcement learning model, improving the robot's autonomous capabilities.
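The core logic of a waypoint follower like the one described above can be sketched in a few lines. This is an assumed, ROS-free reduction for illustration: on the robot this logic ran as a ROS node, with cartographer_ros supplying the localized pose and the output published as a velocity command. `follow_step`, its gains, and its tolerance are hypothetical names and values.

```python
import math

def follow_step(pose, waypoints, idx, v_max=0.5, k_ang=1.5, tol=0.15):
    """Return (v, omega, next_idx) for one control step.

    pose: (x, y, yaw) from localization; waypoints: list of (x, y).
    A proportional controller steers toward the active waypoint and
    advances once the robot is within the tolerance radius.
    """
    if idx >= len(waypoints):
        return 0.0, 0.0, idx                  # path complete: stop
    wx, wy = waypoints[idx]
    x, y, yaw = pose
    if math.hypot(wx - x, wy - y) < tol:      # waypoint reached: advance
        return follow_step(pose, waypoints, idx + 1, v_max, k_ang, tol)
    heading = math.atan2(wy - y, wx - x)
    err = math.atan2(math.sin(heading - yaw),
                     math.cos(heading - yaw))  # wrap error to [-pi, pi]
    omega = k_ang * err                        # proportional steering
    v = v_max * max(0.0, math.cos(err))        # slow down on sharp turns
    return v, omega, idx

# Waypoint straight ahead: full forward speed, no turning.
v, omega, idx = follow_step((0.0, 0.0, 0.0), [(1.0, 0.0), (1.0, 1.0)], 0)
```

Scaling the forward speed by `cos(err)` is a common design choice: the robot pivots in place when the waypoint is behind it and only drives at full speed when roughly aligned with the path.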


Remote control test

Remote control test on terrain

MBRL Autonomous Experiment

Real-time SLAM on the UGV robot

[GitHub][Publication]

Skills: ROS, Simulation (Rviz, Gazebo), Sensor Integration (LIDAR, Camera, IMU), SLAM, Machine Learning (PyTorch, TensorFlow), Control Systems (PID), Computer Vision (OpenCV), Python, C++, Linux, Bash/Shell Scripting, Git, Debugger, Microcontroller, PWM, SolidWorks
Contributors' Acknowledgement: