Feature/vlm bbox finetuning by gokul-tqagi · Pull Request #949 · Physical-Intelligence/openpi

gokul-tqagi · 2026-05-15T23:48:10Z

No description provided.

Self-contained experiment setup under experiments/ for fine-tuning pi0.5 on the lipbalm pick task using only the right arm from bimanual ALOHA WidowX recordings.

- Data processing script extracts right arm (indices 7:14) and keeps cam_high + cam_right_wrist from 3 raw bimanual datasets
- Custom single-arm transforms following the DROID 2-camera pattern
- YAML config for training hyperparams and remote server targeting
- train.sh / eval.sh sync code+data to SSH servers and launch jobs
- No existing openpi source files are modified
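A minimal sketch of the right-arm extraction described above, assuming the usual ALOHA 14-dim layout of left joints (6), left gripper (1), right joints (6), right gripper (1), which is what makes indices 7:14 the right arm. The frame keys (`observation.state`, `action`, `images`) and the function name `to_single_arm` are hypothetical and do not come from process_single_arm.py.

```python
# Hypothetical sketch, not the PR's process_single_arm.py.
# Assumes the ALOHA layout [left joints (6), left gripper (1), right joints (6), right gripper (1)],
# so the right arm (6 joints + gripper) occupies indices 7:14. Frame keys are illustrative.

RIGHT_ARM = slice(7, 14)
KEEP_CAMERAS = ("cam_high", "cam_right_wrist")


def to_single_arm(frame: dict) -> dict:
    """Return a single-arm frame: 7-dim state/action plus only the two kept cameras."""
    state = frame["observation.state"]
    action = frame["action"]
    if len(state) != 14 or len(action) != 14:
        raise ValueError("expected 14-dim bimanual state and action")
    # Drop cam_low / cam_left_wrist; keep the overhead and right-wrist views.
    images = {name: frame["images"][name] for name in KEEP_CAMERAS}
    return {
        "observation.state": list(state[RIGHT_ARM]),
        "action": list(action[RIGHT_ARM]),
        "images": images,
    }
```

Keeping one overhead and one wrist camera gives each frame the same two-view shape as the DROID pattern the transforms follow.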

- Switch to JAX backend for LoRA fine-tuning (PyTorch trainer lacks freeze_filter support)
- Add convert_to_lerobot_v21.py for v3.0 -> v2.1 format conversion (required by OpenPI's pinned LeRobot 0.1.0)
- Add validate_dataset.py to verify frame counts, action/state shapes, image integrity, and value match against source data
- Patch LeRobot get_safe_version to skip Hub check for local datasets
- Use the HF_LEROBOT_HOME env var (previously LEROBOT_HOME, now deprecated)
- batch_size=16 fits on an L40S (46 GB) with LoRA + JAX
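A rough sketch of the frame-count, shape, and value-match checks listed for validate_dataset.py. Episodes are modeled here as lists of per-frame dicts, and the field names, the 7-dim expectation, the tolerance, and `validate_episode` itself are assumptions; image-integrity checks are omitted.

```python
# Hypothetical sketch of validate_dataset.py-style checks; not the PR's implementation.
# `converted` and `source` are episodes as lists of frame dicts. `source` is assumed to be
# already reduced to the right arm so values can be compared one-to-one.
import math


def validate_episode(converted, source, state_dim=7, action_dim=7, atol=1e-6):
    """Compare a converted episode with its source; return a list of error strings."""
    errors = []
    if len(converted) != len(source):
        errors.append(f"frame count {len(converted)} != source {len(source)}")
    for i, (conv, src) in enumerate(zip(converted, source)):
        for key, dim in (("observation.state", state_dim), ("action", action_dim)):
            values = conv[key]
            if len(values) != dim:
                errors.append(f"frame {i}: {key} has dim {len(values)}, expected {dim}")
            elif not all(math.isclose(a, b, abs_tol=atol) for a, b in zip(values, src[key])):
                errors.append(f"frame {i}: {key} values differ from source")
    return errors
```

Returning a list of errors, rather than raising on the first mismatch, lets a single run report every broken frame across all episodes.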

- Evaluate checkpoint against dataset episodes with action prediction MAE per joint
- Patch LeRobot Hub check for local datasets (same as run_train)
- Support configurable episode/frame sampling
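The per-joint metric above can be sketched as follows, assuming predicted and ground-truth actions have already been collected for the sampled frames (for action chunks, each timestep of the chunk would be one entry). The function name and joint names are hypothetical.

```python
# Hypothetical sketch of per-joint action-prediction MAE; names are illustrative.
# `predicted` and `target` are equally long lists of action vectors (one per sampled frame
# or chunk timestep); `joint_names` labels each action dimension.

def per_joint_mae(predicted, target, joint_names):
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need the same, non-zero number of predicted and target actions")
    n_joints = len(joint_names)
    totals = [0.0] * n_joints
    for pred, tgt in zip(predicted, target):
        for j in range(n_joints):
            totals[j] += abs(pred[j] - tgt[j])
    # Average absolute error per joint over all sampled actions.
    return {name: total / len(predicted) for name, total in zip(joint_names, totals)}
```

For example, `per_joint_mae([[1.0, 2.0], [3.0, 4.0]], [[0.0, 2.0], [3.0, 2.0]], ["j0", "j1"])` gives `{"j0": 0.5, "j1": 1.0}`. Reporting per joint rather than one scalar makes it visible when, say, only the gripper dimension is poorly predicted.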

…ipeline

- Move run_train.py → train/, run_eval.py + export_results.py → eval/
- Fix all import paths (sys.path, __file__-relative) for new locations
- Update train.sh/eval.sh script paths and rsync excludes
- Add marker_pick.yaml config (74 episodes, 4 datasets)
- Generalize config.py: remove lipbalm defaults, derive prompt/name from YAML
- Fix convert_to_lerobot_v21: add convert_and_merge() for multi-dataset conversion
- Make process_single_arm.py --datasets required (no hardcoded defaults)
- Add .gitignore for results/, processed data, logs
- Rewrite README as generic OpenPI LoRA fine-tuning workflow guide
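A multi-dataset merge like the one convert_and_merge() is described as adding has to renumber episodes and frames so the combined dataset has contiguous indices. The sketch below shows only that reindexing step, mirroring the `episode_index` / `frame_index` / `index` columns that LeRobot datasets carry; the in-memory layout and the name `merge_datasets` are assumptions, not the PR's code.

```python
# Hypothetical sketch of reindexing when merging several datasets; not the PR's convert_and_merge().
# Each dataset is a list of episodes, each episode a list of frame dicts.

def merge_datasets(datasets):
    merged = []
    episode_index = 0  # contiguous across all input datasets
    global_index = 0   # running frame counter over the merged dataset
    for dataset in datasets:
        for episode in dataset:
            frames = []
            for frame_index, frame in enumerate(episode):
                frames.append({**frame, "episode_index": episode_index,
                               "frame_index": frame_index, "index": global_index})
                global_index += 1
            merged.append(frames)
            episode_index += 1
    return merged
```

Without this renumbering, episode 0 of the second dataset would collide with episode 0 of the first.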

Pi0.5 LoRA fine-tuning pipeline for single-arm ALOHA tasks

`<loc_NNNN>` bbox tokens (lost during π0.5 Stage-2) while preserving its action-generation quality, via a joint α·CE(loc tokens) + β·MSE(flow-matching) objective with the action expert kept frozen as an MSE anchor.
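A minimal pure-Python sketch of the joint objective α·CE(loc tokens) + β·MSE(flow-matching). The inputs stand in for model outputs: per-position logits, target token ids, a mask marking `<loc_NNNN>` positions, and predicted vs. target flow-matching velocities. All names and shapes are assumptions. The frozen action expert is not modeled here; in a real JAX setup it would be excluded from the optimizer (for example via a freeze filter), so the MSE term would only update the unfrozen parameters and act as the anchor.

```python
# Hypothetical sketch of alpha * CE(loc tokens) + beta * MSE(flow matching); names are illustrative.
import math


def loc_token_ce(logits, targets, loc_mask):
    """Mean cross-entropy over positions where loc_mask is True (0.0 if none)."""
    total, count = 0.0, 0
    for row, target, is_loc in zip(logits, targets, loc_mask):
        if not is_loc:
            continue
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # stable log-sum-exp
        total += log_z - row[target]  # -log softmax(row)[target]
        count += 1
    return total / count if count else 0.0


def flow_mse(pred_velocity, target_velocity):
    """Mean squared error over all action dimensions and timesteps."""
    diffs = [(p - t) ** 2
             for p_vec, t_vec in zip(pred_velocity, target_velocity)
             for p, t in zip(p_vec, t_vec)]
    return sum(diffs) / len(diffs)


def joint_loss(logits, targets, loc_mask, pred_velocity, target_velocity, alpha=1.0, beta=1.0):
    return (alpha * loc_token_ce(logits, targets, loc_mask)
            + beta * flow_mse(pred_velocity, target_velocity))
```

Masking the cross-entropy to loc-token positions keeps the α term focused on the bbox tokens, while β controls how strongly the action loss holds the backbone near its action-generation behavior.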

gokul-tqagi and others added 8 commits May 5, 2026 15:19

Fix image stats path resolution in process_single_arm.py

f4b2693

Add dataset evaluation with per-joint error reporting

c771fef

Merge pull request #1 from TorqueAGI-AIBrain/experiments/lipbalm-pi05

12234b4

Add bbox auto-annotation pipeline for the VLM training data

2dac309
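An auto-annotation pipeline's pixel-space boxes must eventually be serialized as the `<loc_NNNN>` tokens mentioned above. The sketch below assumes PaliGemma-style conventions (1024 bins, y_min, x_min, y_max, x_max order). The rounding rule and the `<loc_NNNN>` spelling (taken from the PR text; PaliGemma's own tokenizer writes `<loc0000>`…`<loc1023>`) are assumptions, and `bbox_to_loc_tokens` is a hypothetical helper.

```python
# Hypothetical sketch: pixel-space bbox -> loc tokens. Assumes 1024 bins and
# (y_min, x_min, y_max, x_max) order; the rounding rule and token spelling are assumptions.

NUM_BINS = 1024


def bbox_to_loc_tokens(x_min, y_min, x_max, y_max, width, height):
    def to_bin(value, size):
        normalized = min(max(value / size, 0.0), 1.0)  # clamp coordinates to the image
        return round(normalized * (NUM_BINS - 1))

    bins = (to_bin(y_min, height), to_bin(x_min, width),
            to_bin(y_max, height), to_bin(x_max, width))
    return "".join(f"<loc_{b:04d}>" for b in bins)
```

For a full-frame box on a 640×480 image, `bbox_to_loc_tokens(0, 0, 640, 480, 640, 480)` yields `<loc_0000><loc_0000><loc_1023><loc_1023>`.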

gokul-tqagi requested review from Michael-Equi, jimmyt857 and kvablack as code owners May 15, 2026 23:48

jimmyt857 removed their request for review May 22, 2026 03:52

gokul-tqagi wants to merge 8 commits into Physical-Intelligence:main from TorqueAGI-AIBrain:feature/vlm-bbox-finetuning
