Commit 26cad67
authored
[OMNIML-3252][ONNX] Add real Q/DQ scales in Autotune (#951)
## What does this PR do?
**Type of change:** New feature
**Overview:** ONNX Autotune (also called Auto Q/DQ) is currently a
standalone ModelOpt feature that automatically inserts Q/DQ nodes where
relevant, based on information obtained from TensorRT inference. One
issue is that the scales in those Q/DQ nodes are random.
This PR does 2 major things:
1. Integrates Auto Q/DQ into the ONNX quantization workflow; and
2. Enables calibration data to be used to obtain the correct scales for
the Q/DQ nodes.
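For intuition, a per-tensor INT8 scale is typically derived from the maximum absolute value (amax) observed over calibration data. This is a generic sketch of that idea, not the code added in this PR:

```python
def int8_scale_from_calibration(batches):
    """Illustrative per-tensor INT8 scale: amax over all calibration batches / 127."""
    amax = max(abs(x) for batch in batches for x in batch)
    return amax / 127.0

# Two calibration batches of activation values; amax here is 2.0
scale = int8_scale_from_calibration([[0.5, -2.0], [1.27, 0.1]])
```

With a random scale, quantized values bear no relation to the tensor's real dynamic range, which is consistent with the near-zero accuracy of standalone Autotune reported below.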
## Usage
```bash
python -m modelopt.onnx.quantization --onnx_path=model.onnx --autotune={quick,default,extensive}
```
> Please see `__main__.py` for other args.
## Testing
1. Added a unit test for Q/DQ node placement validation:
`tests/gpu/onnx/quantization/test_autotune_quantization_integration.py`
2. Verified that accuracy was recovered by integrating MOQ with
Autotune. Results on RTX 3090 with TRT 10.12.0.36 (`--stronglyTyped`)
with ViT, as per `examples/onnx_ptq`:
| Model | Top-1 acc | Top-5 acc |
|--------------------------|---------------|----------------|
| FP32 | 85.1% | 97.5% |
| FP16 (FP32 with --fp16) | 85.1% | 97.5% |
| Quant (MOQ) | 82.4% | 96.4% |
| Quant (Autotune) | 0.1% | 0.5% |
| Quant (MOQ + Autotune) | 79.6% | 95.0% |
Notice that accuracy is mostly recovered when moving from standalone
Autotune (random scales) to MOQ + Autotune (real Q/DQ scales). The
remaining drop between MOQ and MOQ + Autotune is likely due to some
sensitive nodes being quantized, such as `BiasAdd` (see bug 5916898).
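For reference, the Top-1/Top-5 numbers above are the usual top-k classification accuracies; a minimal, generic implementation (not the evaluation code from `examples/onnx_ptq`) looks like this:

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)

# Three samples, three classes; labels chosen for illustration
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_k_accuracy(scores, labels, 1)  # only the first sample's best class matches
top2 = top_k_accuracy(scores, labels, 2)  # same idea with k=5 for ImageNet Top-5
```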
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: No (will be
done in a different PR)
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
No
## Summary by CodeRabbit
* **New Features**
* Autotuning added to ONNX quantization: CLI flags, presets, per-region
tuning, and FP8/INT8 support; accepts in-memory models and optional
output dirs; node-filter loading and explicit-flag CLI behavior.
* Activation-operation accessor exposed and autotune helpers added to
the package API.
* **Bug Fixes**
* Safer graph rewiring to avoid corrupting quantized graphs when targets
are absent.
* **Tests**
* New integration test and model helper validating autotune quantization
consistency.
## Additional information
To reproduce the ViT accuracy results, run `download_example_onnx.py`
and `image_prep.py` without `--fp16`.
If `--fp16` is used here, quantizing this model with `--autotune`
results in the following error:
```
[modelopt][onnx] - ERROR - Benchmark failed: Converting dtype('float16') to a ctypes type
```
This is fixed in #978.
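That error message appears to come from NumPy's dtype-to-ctypes conversion, which has no `float16` equivalent. This is a hedged guess at the root cause, verifiable in isolation:

```python
import numpy as np

# np.ctypeslib has no ctypes counterpart for float16, so the conversion
# raises NotImplementedError with the message seen in the log above
try:
    np.ctypeslib.as_ctypes_type(np.dtype("float16"))
    err = ""
except NotImplementedError as e:
    err = str(e)
```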
---------
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
1 parent: fe83270
16 files changed: 788 additions & 121 deletions
File tree
- modelopt/onnx
- quantization
- autotune
- tests
- _test_utils/onnx/quantization/autotune
- gpu/onnx/quantization
- unit/onnx/quantization/autotune