Commit 5753e51
Add UniG2U benchmark task with model support (#1297)
* Add UniG2U benchmark task with model support
Add UniG2U (Unified Generation-to-Understanding) benchmark covering
11 sub-tasks across chart understanding, geometry, physics, spatial
planning, and visual puzzles.
Two evaluation modes:
- unig2u: standard understanding (single-stage)
- unig2u_GtA: Visual CoT two-stage (generate auxiliary image, then answer)
GtA triggered explicitly via generation_kwargs.visual_cot: true.
New model implementations: Ovis-U1, ILLUME+, MMaDa, Qwen-Image-Edit.
via [HAPI](https://hapi.run)
Co-Authored-By: HAPI <noreply@hapi.run>
* Fix qwen_image_edit device_map for diffusion pipeline
DiffusionPipeline does not support device_map="auto"; use "balanced" instead.
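A minimal sketch of this fix, assuming the wrapper loads its pipeline through `diffusers.DiffusionPipeline.from_pretrained`. The helper names `diffusion_load_kwargs` and `load_pipeline` and the `checkpoint` argument are hypothetical and do not come from the commit.

```python
from typing import Any, Dict


def diffusion_load_kwargs(device_map: str = "balanced") -> Dict[str, Any]:
    """Build keyword arguments for loading a diffusion pipeline.

    Hypothetical helper: it rejects "auto" early, since the commit notes
    that DiffusionPipeline does not support that value.
    """
    if device_map == "auto":
        raise ValueError('device_map="auto" is not supported here; use "balanced"')
    return {"device_map": device_map}


def load_pipeline(checkpoint: str):
    # Imported lazily so the helper above works without diffusers installed.
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(checkpoint, **diffusion_load_kwargs())
```

Failing early in the helper is one possible design; the actual change may simply replace the string literal.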
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add Visual CoT (GtA) support check for models
Models without GtA implementation now raise a clear error when run on
visual_cot tasks, instead of silently degrading with garbled prompts.
- Add `supports_visual_cot` class attribute to lmms base (default False)
- Add `_check_visual_cot_support()` guard in base class
- Call the check in evaluator before generate_until
- Set `supports_visual_cot = True` on ovis_u1, bagel_unig2u, illume_plus, qwen_image_edit
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Revert "Add Visual CoT (GtA) support check for models"
This reverts commit 2bc4f01.
* Add generate_visual_cot as dedicated output type for GtA tasks
Visual CoT (GtA) tasks now use output_type: generate_visual_cot instead
of generate_until. Models must implement generate_visual_cot() to run
these tasks — models without it get a clear NotImplementedError.
- Add generate_visual_cot() default in lmms base class (raises NotImplementedError)
- Register generate_visual_cot in ALL_OUTPUT_TYPES and construct_requests
- All 31 visual_cot yaml configs: output_type → generate_visual_cot
- 4 GtA models (ovis_u1, bagel_unig2u, illume_plus, qwen_image_edit):
implement generate_visual_cot() delegating to generate_until()
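The dispatch described above can be sketched roughly as follows. This is a simplified stand-in, not the real lmms base class: `BaseModel`, `UnderstandingOnlyModel`, and `GtAModel` are hypothetical names, and the request and return types are assumed for illustration.

```python
from typing import Any, List


class BaseModel:
    """Simplified, hypothetical stand-in for the lmms base class."""

    def generate_until(self, requests: List[Any]) -> List[str]:
        raise NotImplementedError

    def generate_visual_cot(self, requests: List[Any]) -> List[str]:
        # Default behaviour: models without GtA support fail loudly.
        raise NotImplementedError(
            f"{type(self).__name__} does not implement generate_visual_cot()"
        )


class UnderstandingOnlyModel(BaseModel):
    def generate_until(self, requests: List[Any]) -> List[str]:
        return [f"answer:{r}" for r in requests]


class GtAModel(UnderstandingOnlyModel):
    def generate_visual_cot(self, requests: List[Any]) -> List[str]:
        # GtA-capable models delegate to their existing generate_until().
        return self.generate_until(requests)
```

With this shape, a task whose output_type is generate_visual_cot runs on `GtAModel` and raises NotImplementedError on `UnderstandingOnlyModel`.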
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update README with generate_visual_cot documentation
Reflect the new GtA mechanism: models must implement
generate_visual_cot() to run visual_cot tasks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Remove duplicate entries in .gitignore
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Organize unig2u task folder by sub-task
Move yaml configs into per-task subdirectories for clarity.
Symlink utils.py into each subdirectory for !function resolution.
tasks/unig2u/
├── chartqa100/
├── geometry3k/
├── auxsolidmath/
├── babyvision/
├── illusionbench/
├── mmsi/
├── phyx/
├── realunify/
├── uni_mmmu/
├── vsp/
├── visualpuzzles/
├── unig2u.yaml       # top-level standard group
├── unig2u_GtA.yaml   # top-level GtA group
├── utils.py          # shared utilities
└── README.md
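For illustration, the symlink step in this layout could be scripted as below. The function name `link_shared_utils` is hypothetical, and the use of relative link targets is an assumption; the commit does not say how the links were created.

```python
import os
from pathlib import Path
from typing import Iterable, List


def link_shared_utils(task_root: Path, subtasks: Iterable[str]) -> List[Path]:
    """Create <subtask>/utils.py symlinks pointing at the shared ../utils.py."""
    created = []
    for name in subtasks:
        subdir = task_root / name
        subdir.mkdir(parents=True, exist_ok=True)
        link = subdir / "utils.py"
        if link.is_symlink() or link.exists():
            continue  # leave existing files or links untouched
        # A relative target keeps the link valid when the repo is cloned elsewhere.
        os.symlink(os.path.join("..", "utils.py"), link)
        created.append(link)
    return created
```

Calling `link_shared_utils(Path("tasks/unig2u"), ["chartqa100", "vsp"])` would create one link per listed sub-task and skip any that already exist.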
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address PR review feedback
- Fix function name collisions: rename the generic doc_to_visual,
doc_to_text, process_results, and aggregate_results functions to
babyvision_*- and realunify_*-prefixed versions, and update 10 yaml files
accordingly
- Fix Python 3.9 compat: _JudgeClient | None → Optional[_JudgeClient]
- Remove the reassignment of eval_logger from loguru to stdlib logging
- Remove parse_response random fallback (return None instead)
- Remove gradient_checkpointing_enable() during eval in illume_plus
- Add OOM warning for dual model loading in qwen_image_edit
- Fix bagel_umm registration to use short class name
- Fix != None → is not None
- Fix GPT-5.1 → GPT-4o in docstrings
- Fix bare except → except (json.JSONDecodeError, ValueError, KeyError)
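Several of the review fixes above share a pattern that can be sketched in a few lines. The function bodies here are assumptions made for illustration (the real parse_response and judge-parsing logic are not shown in the commit message), and `parse_judge_score` is a hypothetical name.

```python
import json
import re
from typing import Optional


def parse_response(response: str) -> Optional[str]:
    """Extract a standalone choice letter (A-D); return None if none is found.

    No random fallback: an unparseable response is reported as None.
    """
    match = re.search(r"\b([A-D])\b", response)
    return match.group(1) if match is not None else None


def parse_judge_score(raw: str) -> Optional[float]:
    """Read {"score": ...} from a judge reply, catching only expected errors."""
    try:
        return float(json.loads(raw)["score"])
    except (json.JSONDecodeError, ValueError, KeyError):
        # Any other exception type propagates instead of being swallowed.
        return None
```

`Optional[...]` is used instead of the `X | None` syntax so the annotations also evaluate on Python 3.9.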
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Split merged utils.py into per-task independent utils
Replace the single 2900-line merged utils.py with the original
per-task utils.py files from the reference implementation. Each
sub-task folder now has its own independent utils.py, eliminating
function name collisions entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: run black + isort formatting
* Fix code quality issues in per-task utils
- Remove eval_logger reassignment in mmsi, visualpuzzles utils
- Remove random fallback in visualpuzzles parse_response (return None)
- Fix != None → is not None in visualpuzzles
- Fix GPT-5.1 → GPT-4o in auxsolidmath, geometry3k docstrings
- Fix bare except → specific exceptions in uni_mmmu
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix lint: run black + isort on mmsi and visualpuzzles utils
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Ubuntu <xinjiezhang@RobustDNN-A100-47.anzplpv4vzne3ajfbcfpzr1dkd.ix.internal.cloudapp.net>
Co-authored-by: HAPI <noreply@hapi.run>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: mwxely <yang0756@e.ntu.edu.sg>
1 parent be9c135 commit 5753e51
101 files changed
Lines changed: 11471 additions & 1 deletion
File tree
- lmms_eval
- api
- models
- simple
- tasks/unig2u
- auxsolidmath
- babyvision
- chartqa100
- geometry3k
- illusionbench
- mmsi
- phyx
- realunify
- uni_mmmu
- visualpuzzles
- vsp