Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework

Neel P. Bhatt* · Yunhao Yang* · Rohan Siva
Daniel Milan · Ufuk Topcu · Atlas Wang
*Equal contribution

Project Page | arXiv | Paper | Interactive Demo

🎯 TL;DR

A new framework for disentangling, quantifying, and handling uncertainty in multimodal foundation models. It reduces variability in model responses by 40% and improves success rate and reliability in robot perception and planning!

Demo Videos (CARLA Sim | Ground Robot | Table-top Manipulation)

Full video available here.

Framework

Current models struggle with unpredictable environments, as they can’t accurately separate perception and decision uncertainties. This limits their effectiveness in real-world robotics and autonomous driving.

We present a novel framework for enhancing multimodal foundation models in robotic planning by disentangling, quantifying, and addressing perception and decision uncertainties. By isolating perception uncertainty in visual interpretation and decision uncertainty in plan generation, our approach enables targeted uncertainty management.
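As a rough illustration of what targeted uncertainty management could look like, the sketch below routes a planning step based on two separately estimated confidence scores. The class name, function name, thresholds, and action labels are all hypothetical and are not taken from this repository; the sketch only shows why keeping the two scores apart lets each be handled differently.

```python
from dataclasses import dataclass


@dataclass
class UncertaintyEstimate:
    """Hypothetical container for two separately estimated confidences in [0, 1]."""
    perception_confidence: float  # confidence in the visual interpretation
    decision_confidence: float    # confidence in the generated plan


def choose_intervention(est, perception_threshold=0.8, decision_threshold=0.8):
    """Pick which uncertainty source to address first (illustrative thresholds)."""
    # Low perception confidence: gather a better observation before planning.
    if est.perception_confidence < perception_threshold:
        return "re-observe"
    # Perception is acceptable but the plan itself is uncertain: revise the plan.
    if est.decision_confidence < decision_threshold:
        return "revise-plan"
    # Both confidences clear their thresholds: proceed with the plan.
    return "execute"
```

A single combined score could not distinguish the first two cases, which is the practical reason for estimating the two uncertainties separately.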

Setup

$ pip install openai==1.40.2
$ pip install openai-clip==1.0.1
$ pip install seaborn
$ pip install pandas
$ pip install torch==2.2.2
$ pip install torchvision==0.17.2
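
After installing, it can help to confirm that the environment matches the pinned versions above. The following is a minimal sketch using only the Python standard library; the `PINNED` dictionary and the `check_pins` helper are names introduced here for illustration and are not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the Setup commands; seaborn and pandas are unpinned.
PINNED = {
    "openai": "1.40.2",
    "openai-clip": "1.0.1",
    "torch": "2.2.2",
    "torchvision": "0.17.2",
}


def check_pins(pins):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            report[name] = (None, False)
            continue
        # Ignore local build suffixes such as "+cu121" when comparing.
        report[name] = (installed, installed.split("+")[0] == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins(PINNED).items():
        print(f"{name}: installed={installed} pinned={PINNED[name]} ok={ok}")
```

Running the script prints one line per pinned package, so a mismatch is visible before opening the notebooks.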

Example Notebooks

Disentangling and quantifying perception and decision uncertainty: uncertainty-quantification.ipynb.
Inference on our fine-tuned multimodal foundation model and comparison with the benchmark: fine-tuned-model-inference.ipynb.

Model Checkpoints

All model checkpoints are available on Hugging Face here.

Datasets

Calibration

Citation

If you find this work interesting and use it in your research, please consider citing our paper.

@inproceedings{bhatt2025knowyoureuncertainplanning,
            title={Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework},
            author={Neel P. Bhatt and Yunhao Yang and Rohan Siva and Daniel Milan and Ufuk Topcu and Zhangyang Wang},
            year={2025},
            booktitle={Proceedings of the Seventh Annual Conference on Machine Learning and Systems},
            address={Santa Clara, CA, USA},
            publisher={mlsys.org},
}

