uncertainty-in-planning/uncertainty-in-planning.github.io

Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework

Neel P. Bhatt* · Yunhao Yang* · Rohan Siva
Daniel Milan · Ufuk Topcu · Atlas Wang
*Equal contribution

🎯 TL;DR

A new framework for disentangling, quantifying, and handling uncertainty in multimodal foundation models, reducing variability in model responses by 40% and enhancing success rate/reliability in robot perception and planning!

Demo Videos (CARLA Sim | Ground Robot | Table-top Manipulation)

Full video available here.

Framework

Current models struggle in unpredictable environments because they cannot accurately separate perception uncertainty from decision uncertainty. This limits their effectiveness in real-world robotics and autonomous driving.

We present a novel framework for enhancing multimodal foundation models in robotic planning by disentangling, quantifying, and addressing perception and decision uncertainties. By isolating perception uncertainty in visual interpretation and decision uncertainty in plan generation, our approach enables targeted uncertainty management.

Framework: Active Sensing

Framework: Automated Refinement
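
To make the perception/decision split described above more concrete, here is a minimal sketch, not the implementation from this repository. It assumes perception uncertainty is calibrated with generic split conformal prediction over label probabilities, and that decision uncertainty is a single confidence score compared against a floor. The function names, the `decision_floor` parameter, and the routing labels are all hypothetical and chosen only for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal threshold from calibration nonconformity scores.

    Each score is assumed to be 1 - p(true label) on a held-out
    calibration example; alpha is the target miscoverage rate.
    """
    n = len(cal_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: accept every label.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Labels whose nonconformity score (1 - p) is within the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]


def dominant_uncertainty(perception_probs, decision_confidence,
                         threshold, decision_floor=0.8):
    """Report which uncertainty source (if any) should be handled first.

    Hypothetical routing: an empty or ambiguous perception set is
    attributed to perception; otherwise a low plan confidence is
    attributed to the decision step.
    """
    labels = prediction_set(perception_probs, threshold)
    if len(labels) != 1:
        return "perception"
    if decision_confidence < decision_floor:
        return "decision"
    return "proceed"
```

For example, with a threshold calibrated on held-out scores, a scene whose perception set contains exactly one label but whose plan confidence falls below the floor would be attributed to decision uncertainty rather than perception.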

Setup

$ pip install openai==1.40.2
$ pip install openai-clip==1.0.1
$ pip install seaborn
$ pip install pandas
$ pip install torch==2.2.2
$ pip install torchvision==0.17.2
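
After installing, a quick check that the pinned versions are the ones actually present in the environment can save debugging time. The snippet below is an optional sketch that uses only the standard library. The `PINS` dictionary mirrors the pinned commands above, and `find_mismatches` is a hypothetical helper name, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands above (seaborn and pandas are unpinned).
PINS = {
    "openai": "1.40.2",
    "openai-clip": "1.0.1",
    "torch": "2.2.2",
    "torchvision": "0.17.2",
}


def find_mismatches(pins):
    """Return {package: (wanted, found)} for missing or mismatched packages."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu121" when comparing versions.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every pinned package is installed at the expected version.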

Example Notebooks

  1. Disentangling and quantifying perception and decision uncertainty here.
  2. Inference on our fine-tuned multimodal foundation model and comparison with benchmark here.

Model Checkpoints

All model checkpoints are available on Hugging Face here.

Datasets

Calibration

  1. CARLA Images
  2. Table-Top Manipulation (Robot Arm's View and Top View)
  3. Table-Top Manipulation (Side View)

Training

CARLA Images

Testing

Real-World Driving

Sample Images for Inference

Inference Samples

Citation

If you find this work interesting and use it in your research, please consider citing our paper.

@inproceedings{bhatt2025knowyoureuncertainplanning,
            title={Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework},
            author={Neel P. Bhatt and Yunhao Yang and Rohan Siva and Daniel Milan and Ufuk Topcu and Zhangyang Wang},
            year={2025},
            booktitle={Proceedings of the Seventh Annual Conference on Machine Learning and Systems},
            address={Santa Clara, CA, USA},
            publisher={mlsys.org},
}

About

[MLSys 2025] Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
