feat: add Open-X VQA task by njb-nvidia · Pull Request #1346 · EvolvingLMMs-Lab/lmms-eval

njb-nvidia · 2026-05-20T22:45:20Z

Summary

Adds Open-X VQA, a multiple-choice VQA benchmark for embodied AI / robotic manipulation scenes derived from the Open-X-Embodiment data. Each item is a single-image MCQ where the model selects one of A-D.

Dataset: nv-njb/OpenXVQA on HuggingFace (6,676 test items, single `test` split).
Metric: `openxvqa_accuracy` — exact-match on the extracted MCQ letter.
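
For reviewers unfamiliar with the scoring, here is a minimal sketch of exact-match scoring on an extracted MCQ letter. It is an illustration only, not the code in `utils.py`: the names `extract_letter` and `accuracy` are hypothetical, and the regex (first standalone uppercase A-D) is an assumed extraction rule.

```python
import re
from typing import Optional, Sequence

# Assumed rule: the first standalone uppercase letter A-D is the answer.
# The extraction logic in the PR's utils.py may differ.
_LETTER_RE = re.compile(r"\b([A-D])\b")


def extract_letter(response: str) -> Optional[str]:
    """Return the first standalone A-D letter in the response, or None."""
    match = _LETTER_RE.search(response)
    return match.group(1) if match else None


def accuracy(responses: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of responses whose extracted letter equals the target letter."""
    if len(responses) != len(targets):
        raise ValueError("responses and targets must have the same length")
    if not targets:
        raise ValueError("cannot score an empty set")
    correct = sum(
        extract_letter(resp) == gold for resp, gold in zip(responses, targets)
    )
    return correct / len(targets)
```

A response with no recognizable letter extracts to `None` and is scored as incorrect under this sketch.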

Files

`lmms_eval/tasks/openxvqa/openxvqa.yaml` — task config.
`lmms_eval/tasks/openxvqa/utils.py` — image bytes -> PIL, MCQ letter extraction.

Parity vs. local fork

Qwen3-VL-2B-Instruct, full `test` split (6,676 items), 8x H100, greedy decoding.

| Source | Accuracy | Identical predictions |
| --- | --- | --- |
| Fork | 0.5685 | - |
| Upstream | 0.5785 | 5,711 / 6,676 (85.6%) |

The +1.0 pp delta (upstream minus fork) is within the noise we have seen on other ports, which we attribute to minor drift in the upstream `qwen3_vl` model class.
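
The comparison above can be reproduced along these lines. This is a sketch under assumptions: `parity_report` is a hypothetical helper, and it expects three equal-length lists of already-extracted letters (fork predictions, upstream predictions, gold answers) rather than the raw lmms-eval output files.

```python
from typing import Dict, Optional, Sequence


def parity_report(
    fork: Sequence[Optional[str]],
    upstream: Sequence[Optional[str]],
    gold: Sequence[str],
) -> Dict[str, float]:
    """Compare two runs on the same docs: accuracy of each, delta, and agreement."""
    if not (len(fork) == len(upstream) == len(gold)):
        raise ValueError("all three sequences must have the same length")
    n = len(gold)
    if n == 0:
        raise ValueError("cannot compare empty runs")

    fork_acc = sum(f == g for f, g in zip(fork, gold)) / n
    upstream_acc = sum(u == g for u, g in zip(upstream, gold)) / n
    # Per-doc agreement counts identical predictions, whether right or wrong.
    identical = sum(f == u for f, u in zip(fork, upstream))

    return {
        "fork_accuracy": fork_acc,
        "upstream_accuracy": upstream_acc,
        "delta_pp": (upstream_acc - fork_acc) * 100,
        "identical": identical,
        "identical_fraction": identical / n,
    }
```

Agreement is computed per document and independently of correctness, so two runs can have near-identical accuracy while still disagreeing on many individual items.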

Test plan

`uv run lmms-eval --tasks openxvqa --limit 8` smoke test (single GPU)
Full `test` run on 8x H100 with Qwen3-VL-2B-Instruct; scores match the fork within noise
Per-doc analysis: `filtered_resps` identical for 85.6% of docs

njb-nvidia wants to merge 1 commit into EvolvingLMMs-Lab:main from njb-nvidia:add-openxvqa-task