Commit 0ebcd70
Support VLM calibration with image-text data (#755)
## What does this PR do?
**Type of change:** New feature
**Overview:**
The primary goal of this PR is to let Model Optimizer use image-text pair
data during the calibration phase of quantization. Compared with text-only
calibration data, this is likely to improve the accuracy of quantized VLMs
such as Nemotron VL, particularly on visual understanding tasks.
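To illustrate why the calibration data matters, here is a toy sketch of max calibration, a common way to pick per-tensor quantization ranges. It is not ModelOpt's implementation. It uses a simplified int8-style symmetric quantizer rather than NVFP4's block scaling, and the activation values are made up purely for illustration. We *assume* that image tokens produce larger activations than text tokens in this layer.

```python
from typing import Iterable, Sequence


def collect_amax(batches: Iterable[Sequence[float]]) -> float:
    """Max calibration: track the largest absolute activation seen across all batches."""
    amax = 0.0
    for batch in batches:
        for x in batch:
            amax = max(amax, abs(x))
    return amax


def fake_quant(x: float, amax: float, qmax: int = 127) -> float:
    """Symmetric fake quantization: scale by amax, round, clamp to [-qmax, qmax], dequantize."""
    scale = amax / qmax
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale


# Toy, made-up activations for a single layer (illustrative only, not measured data).
text_only = [[0.5, -1.2], [0.8, 0.3]]
# Assumption for illustration: image tokens produce larger activations.
image_text = text_only + [[3.5, -2.0]]

amax_text = collect_amax(text_only)
amax_mm = collect_amax(image_text)

# An image activation of 3.5 is clipped to the text-only range, but preserved
# when the range was calibrated on image-text data.
print(fake_quant(3.5, amax_text), fake_quant(3.5, amax_mm))
```

The point of the sketch is only the mechanism: a range calibrated without images never sees image-token activations, so those activations can be clipped at inference time.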
- New Feature: Adds support for VLM calibration specifically using
image-text data.
- Dataset Integration: Introduces support for sampling from the
`Nemotron-VLM-Dataset-v2`.
- Refactoring: Creates a separate utility for VLM datasets to keep the
main Hugging Face PTQ script (`hf_ptq.py`) clean.
- Simplifies the logic for handling multimodal inputs.
- Addresses specific issues encountered when calibrating the
`Nemotron-Nano-VL-12B-V2` model with image data.
- Documentation: Updates the README with instructions and examples
for VLM calibration.
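The PR does not say how `--calib_size` is divided across the subsets passed to `--vlm_subsets`. As a hedged sketch only, assuming a round-robin policy over streaming subsets, sampling could look like the following. `round_robin_sample` and `fake_stream` are hypothetical names, and the record fields (`image`, `text`) are assumptions, not the new utility's API or the dataset's schema.

```python
from typing import Dict, Iterable, Iterator, List


def round_robin_sample(subsets: Dict[str, Iterable[dict]], calib_size: int) -> List[dict]:
    """Draw up to calib_size samples, taking one at a time from each subset in turn.

    Works with lazy/streaming iterables: only as many items as needed are consumed.
    """
    streams: Dict[str, Iterator[dict]] = {name: iter(src) for name, src in subsets.items()}
    samples: List[dict] = []
    while streams and len(samples) < calib_size:
        for name in list(streams):  # copy keys so exhausted streams can be dropped
            if len(samples) >= calib_size:
                break
            try:
                samples.append(next(streams[name]))
            except StopIteration:
                # This subset ran out; keep sampling from the remaining ones.
                del streams[name]
    return samples


def fake_stream(subset: str, n: int):
    """Hypothetical stand-in for a streaming subset: yields n image-text records."""
    for i in range(n):
        yield {"subset": subset, "image": f"{subset}_{i}.png", "text": f"question {i}"}


calib = round_robin_sample(
    {"sparsetables": fake_stream("sparsetables", 3), "plotqa_cot": fake_stream("plotqa_cot", 1)},
    calib_size=512,
)
print(len(calib), [s["subset"] for s in calib])
```

Round-robin keeps the subsets balanced when all of them have enough samples, and falls back to the remaining subsets when one is exhausted; the real utility may instead split the budget proportionally or per subset.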
This PR complements #347. The llm_ptq and vlm_ptq examples will be
consolidated in follow-up PRs.
## Usage
```bash
python3 hf_ptq.py --pyt_ckpt_path /home/scratch.omniml_data_2/models/Nemotron-Nano-VL-12B-V2 --qformat nvfp4 --export_path /home/omniml_data_3/zhiyuc/checkpoints/Nemotron-Nano-VL-12B-V2-NVFP4-doccalib --trust_remote_code --kv_cache_qformat none --calib_with_images --vlm_dataset nemotron_vlm_dataset_v2 --vlm_subsets sparsetables,plotqa_cot --calib_size 512
```
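The three new VLM flags in the command above could be declared with `argparse` roughly as below. This is a hypothetical reconstruction inferred only from the usage line: `add_vlm_calib_args` and `comma_list` are illustrative names, and the real definitions, defaults, and help text in `hf_ptq.py` may differ.

```python
import argparse
from typing import List


def comma_list(value: str) -> List[str]:
    """Split a comma-separated flag value such as 'sparsetables,plotqa_cot'."""
    return [item.strip() for item in value.split(",") if item.strip()]


def add_vlm_calib_args(parser: argparse.ArgumentParser) -> None:
    # Hypothetical definitions of the VLM calibration flags seen in the usage line;
    # the actual hf_ptq.py definitions may differ.
    parser.add_argument("--calib_with_images", action="store_true",
                        help="Calibrate with image-text pairs instead of text-only data.")
    parser.add_argument("--vlm_dataset", default=None,
                        help="VLM calibration dataset, e.g. nemotron_vlm_dataset_v2.")
    parser.add_argument("--vlm_subsets", type=comma_list, default=None,
                        help="Comma-separated dataset subsets to sample from.")


parser = argparse.ArgumentParser()
add_vlm_calib_args(parser)
args = parser.parse_args(
    ["--calib_with_images", "--vlm_dataset", "nemotron_vlm_dataset_v2",
     "--vlm_subsets", "sparsetables,plotqa_cot"]
)
print(args.calib_with_images, args.vlm_dataset, args.vlm_subsets)
```

Because `--calib_with_images` takes no value in the usage line, it is modeled here as a boolean `store_true` switch, so existing text-only invocations are unaffected when it is omitted.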
## Testing
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Not yet
## Additional Information
## Summary by CodeRabbit
* **New Features**
  * Added Vision-Language Model (VLM) calibration support with image-text
    pair data, specifically for Nemotron VL models.
  * Added a new `--calib_with_images` CLI flag to enable image-based
    calibration workflows.
  * Integrated Nemotron VLM dataset v2 for streaming multimodal
    calibration data.
* **Documentation**
  * Added VLM calibration guidance in the PTQ README with usage examples
    and dataset information.
---------
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Parent commit: 8c36f5a
6 files changed (+904, -44 lines) in:
- examples/llm_ptq
- modelopt/torch/utils