Neel P. Bhatt* · Yunhao Yang* · Rohan Siva · Daniel Milan · Ufuk Topcu · Atlas Wang
*Equal contribution
A new framework for disentangling, quantifying, and handling uncertainty in multimodal foundation models. It reduces variability in model responses by 40% and improves success rate and reliability in robot perception and planning.
Full video available here.
Current models struggle with unpredictable environments, as they can’t accurately separate perception and decision uncertainties. This limits their effectiveness in real-world robotics and autonomous driving.
We present a novel framework for enhancing multimodal foundation models in robotic planning by disentangling, quantifying, and addressing perception and decision uncertainties. By isolating perception uncertainty in visual interpretation and decision uncertainty in plan generation, our approach enables targeted uncertainty management.
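To make the distinction between the two uncertainty types concrete, here is a minimal Python sketch. It is an illustrative stand-in and not the estimators used in this work: it assumes perception uncertainty can be approximated by the normalized entropy of a classifier's class probabilities, and decision uncertainty by the normalized entropy over repeatedly sampled plans. The function names `perception_uncertainty` and `decision_uncertainty`, and the example inputs, are hypothetical.

```python
import math
from collections import Counter


def _entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def perception_uncertainty(class_probs):
    """Normalized entropy in [0, 1] of class probabilities for one image.

    ASSUMPTION: entropy is used here as a simple proxy for uncertainty in
    visual interpretation; it is not the method described in the paper.
    """
    n = len(class_probs)
    if n < 2:
        return 0.0
    return max(0.0, _entropy(class_probs) / math.log(n))


def decision_uncertainty(sampled_plans):
    """Normalized entropy in [0, 1] over plans sampled for the same input.

    ASSUMPTION: disagreement between repeated samples is used as a simple
    proxy for uncertainty in plan generation.
    """
    total = len(sampled_plans)
    if total < 2:
        return 0.0
    counts = Counter(sampled_plans)
    probs = [c / total for c in counts.values()]
    return max(0.0, _entropy(probs) / math.log(total))


if __name__ == "__main__":
    # Hypothetical inputs: class probabilities for one observation, and
    # four plans sampled from a planner for that same observation.
    print(perception_uncertainty([0.7, 0.2, 0.1]))
    print(decision_uncertainty(["stop", "stop", "go", "stop"]))
```

Keeping the two scores separate is what allows targeted handling: a high perception score points at the visual interpretation step, while a high decision score with a low perception score points at plan generation.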
$ pip install openai==1.40.2
$ pip install openai-clip==1.0.1
$ pip install seaborn
$ pip install pandas
$ pip install torch==2.2.2
$ pip install torchvision==0.17.2
- Disentangling and quantifying perception and decision uncertainty here.
- Inference on our fine-tuned multimodal foundation model and comparison with benchmark here.
All model checkpoints are available on Hugging Face here.
- Carla Images
- Table-Top Manipulation (Robot Arm's View and Top View)
- Table-Top Manipulation (Side View)
If you find this work interesting and use it in your research, please consider citing our paper.
@inproceedings{bhatt2025knowyoureuncertainplanning,
title={Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework},
author={Neel P. Bhatt and Yunhao Yang and Rohan Siva and Daniel Milan and Ufuk Topcu and Zhangyang Wang},
year={2025},
booktitle={Proceedings of the Seventh Annual Conference on Machine Learning and Systems},
address={Santa Clara, CA, USA},
publisher={mlsys.org},
}

