Commit e4df91b
authored
OMNIML-2663] Replace modelopt FP8 QDQ nodes with native ONNX QDQ nodes (#852)
## What does this PR do?
**Type of change:**
New feature
**Overview:**
- Updated FP8 quant exporter to replace modelopt custom QDQ nodes with
native ONNX QDQ nodes
- Updated get_onnx_bytes_and_metadata to make convert_float_to_float16()
default instead of autocast
- Created util functions to fix graph structure after conversion
## Testing
```
python torch_quant_to_onnx.py --quantize_mode=fp8 \
--onnx_save_path=<model_path> \
--calibration_data_size 64 \
--batch_size 128
python evaluate.py --onnx_path=<model_path> \
--model_name=vit_base_patch16_224 \
--results_path=./results.txt \
--batch_size 128
```
Results:
Before replacement:
```
The top1 accuracy of the model is 85.06%
The top5 accuracy of the model is 97.558%
Inference latency of the model is 5.27963 ms
```
After replacement:
```
The top1 accuracy of the model is 85.054%
The top5 accuracy of the model is 97.542%
Inference latency of the model is 5.74771 ms
```
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: No
- Replaced modelopt QDQ nodes with native ONNX qdq nodes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
No
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* ONNX utilities to remove redundant Casts, fold Constant→Cast patterns,
and convert targeted Casts to FP16.
* **Improvements**
* FP8 QDQ nodes now converted to native ONNX QDQ/Dequantize nodes for
improved compatibility.
* Export pipeline streamlined: consistent FP16 handling, unified weight
quantization, cast cleanup ordering, and added logging for better
traceability.
* **Tests**
* Unit tests updated to use the new ONNX utilities.
* **Changelog**
* Entry added noting FP8 QDQ → native ONNX QDQ conversion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>1 parent beac6e9 commit e4df91b
File tree
9 files changed
+450
-286
lines changed- modelopt
- onnx
- autocast
- export
- torch/_deploy/utils
- tests/unit/onnx/autocast
9 files changed
+450
-286
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
35 | | - | |
| 35 | + | |
| 36 | + | |
36 | 37 | | |
37 | 38 | | |
38 | 39 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
130 | 130 | | |
131 | 131 | | |
132 | 132 | | |
133 | | - | |
| 133 | + | |
134 | 134 | | |
135 | 135 | | |
136 | 136 | | |
| |||
279 | 279 | | |
280 | 280 | | |
281 | 281 | | |
282 | | - | |
| 282 | + | |
283 | 283 | | |
284 | 284 | | |
285 | 285 | | |
| |||
303 | 303 | | |
304 | 304 | | |
305 | 305 | | |
306 | | - | |
307 | | - | |
| 306 | + | |
| 307 | + | |
308 | 308 | | |
309 | 309 | | |
310 | 310 | | |
| |||
342 | 342 | | |
343 | 343 | | |
344 | 344 | | |
345 | | - | |
| 345 | + | |
346 | 346 | | |
347 | 347 | | |
348 | 348 | | |
349 | 349 | | |
350 | 350 | | |
351 | 351 | | |
352 | | - | |
| 352 | + | |
353 | 353 | | |
354 | 354 | | |
355 | 355 | | |
| |||
457 | 457 | | |
458 | 458 | | |
459 | 459 | | |
460 | | - | |
| 460 | + | |
461 | 461 | | |
462 | 462 | | |
463 | 463 | | |
| |||
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
| 30 | + | |
30 | 31 | | |
31 | 32 | | |
32 | 33 | | |
| |||
60 | 61 | | |
61 | 62 | | |
62 | 63 | | |
63 | | - | |
64 | | - | |
65 | | - | |
66 | | - | |
67 | | - | |
68 | | - | |
69 | | - | |
70 | | - | |
71 | | - | |
72 | | - | |
73 | | - | |
74 | | - | |
75 | | - | |
76 | | - | |
77 | | - | |
78 | | - | |
79 | | - | |
80 | | - | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | | - | |
85 | | - | |
86 | | - | |
87 | | - | |
88 | | - | |
89 | 64 | | |
90 | 65 | | |
91 | 66 | | |
| |||
99 | 74 | | |
100 | 75 | | |
101 | 76 | | |
102 | | - | |
| 77 | + | |
103 | 78 | | |
104 | 79 | | |
105 | 80 | | |
106 | 81 | | |
107 | 82 | | |
108 | | - | |
109 | | - | |
110 | | - | |
111 | | - | |
112 | | - | |
113 | | - | |
114 | | - | |
115 | | - | |
116 | | - | |
117 | | - | |
118 | | - | |
119 | | - | |
120 | | - | |
121 | | - | |
122 | | - | |
123 | | - | |
124 | | - | |
125 | | - | |
126 | 83 | | |
127 | 84 | | |
128 | 85 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| 25 | + | |
| 26 | + | |
25 | 27 | | |
26 | 28 | | |
27 | 29 | | |
| |||
45 | 47 | | |
46 | 48 | | |
47 | 49 | | |
48 | | - | |
| 50 | + | |
49 | 51 | | |
50 | 52 | | |
51 | | - | |
| 53 | + | |
52 | 54 | | |
53 | 55 | | |
54 | | - | |
| 56 | + | |
55 | 57 | | |
56 | 58 | | |
57 | 59 | | |
| |||
62 | 64 | | |
63 | 65 | | |
64 | 66 | | |
65 | | - | |
| 67 | + | |
66 | 68 | | |
67 | 69 | | |
68 | 70 | | |
| |||
88 | 90 | | |
89 | 91 | | |
90 | 92 | | |
91 | | - | |
| 93 | + | |
92 | 94 | | |
93 | 95 | | |
94 | 96 | | |
| |||
101 | 103 | | |
102 | 104 | | |
103 | 105 | | |
104 | | - | |
105 | | - | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
215 | 215 | | |
216 | 216 | | |
217 | 217 | | |
218 | | - | |
| 218 | + | |
219 | 219 | | |
220 | 220 | | |
221 | 221 | | |
| |||
259 | 259 | | |
260 | 260 | | |
261 | 261 | | |
262 | | - | |
| 262 | + | |
263 | 263 | | |
264 | 264 | | |
265 | 265 | | |
| |||
365 | 365 | | |
366 | 366 | | |
367 | 367 | | |
368 | | - | |
| 368 | + | |
369 | 369 | | |
370 | 370 | | |
371 | 371 | | |
| |||
0 commit comments