Add SwinTransformer support for torch_onnx quantization workflow
Enable end-to-end quantize-export-TRT pipeline for SwinTransformer models
(v1 and v2) across FP8, INT8, MXFP8, NVFP4, and auto precision modes.
Core fixes:
- Add LayerNormalization, Clip, Mul, Exp to change_casts_to_fp16 for FP8
stronglyTyped compatibility (fixes type mismatches in Swin/SwinV2 TRT builds)
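The cast-conversion fix above amounts to extending an op allowlist. A toy sketch of the idea (plain Python; the node representation and helper are illustrative, not ModelOpt's actual `change_casts_to_fp16` code):

```python
# Hypothetical sketch: op names mirror the commit message; the graph/node
# structure here is invented for illustration.
FP16_CAST_OPS = {"LayerNormalization", "Clip", "Mul", "Exp"}

def ops_to_convert(nodes):
    """Return the nodes whose surrounding casts may be folded to FP16."""
    return [n for n in nodes if n["op_type"] in FP16_CAST_OPS]

graph = [
    {"op_type": "LayerNormalization", "name": "ln1"},
    {"op_type": "MatMul", "name": "mm1"},
    {"op_type": "Exp", "name": "exp1"},
]
print([n["name"] for n in ops_to_convert(graph)])  # ['ln1', 'exp1']
```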
Example/test changes:
- Add Conv2d quantization overrides for TRT compatibility (MXFP8/NVFP4->FP8,
INT4_AWQ->INT8) since TRT only supports FP8/INT8 for convolutions
- Add cpb_mlp and downsample to quantization filter exclusion list
- Add --no_pretrained and --model_kwargs CLI args for testing with tiny models
- Add --timm_model_name to download_example_onnx.py (default: ViT)
- Add SwinTransformer to vision_models.py with dynamic input size resolution
- Rewrite tests: parametrize over (ViT, Swin, SwinV2) x (fp8, int8, mxfp8,
nvfp4, auto) with TRT engine build verification using --stronglyTyped
- Update README with vision model support matrix and Conv2d override docs
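The quantization filter exclusion can be pictured as a simple name-based predicate. A minimal sketch (the pattern list comes from the commit message; the helper itself is an assumption, not the example's actual code):

```python
# Hypothetical exclusion filter: submodules whose names contain these
# substrings (per the commit message) are skipped during quantization.
EXCLUDED_SUBSTRINGS = ("cpb_mlp", "downsample")

def should_quantize(module_name: str) -> bool:
    """Return False for modules matched by the exclusion list."""
    return not any(s in module_name for s in EXCLUDED_SUBSTRINGS)

print(should_quantize("layers.0.blocks.0.attn.cpb_mlp.0"))  # False
print(should_quantize("layers.0.blocks.0.attn.qkv"))        # True
```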
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
examples/torch_onnx/README.md (+19 −8)
````diff
@@ -53,6 +53,7 @@ The `torch_quant_to_onnx.py` script quantizes [timm](https://github.com/huggingf
 - Loads a pretrained timm torch model (default: ViT-Base).
 - Quantizes the torch model to FP8, MXFP8, INT8, NVFP4, or INT4_AWQ using ModelOpt.
+- For models with Conv2d layers (e.g., SwinTransformer), automatically overrides Conv2d quantization to FP8 (for MXFP8/NVFP4 modes) or INT8 (for INT4_AWQ mode) for TensorRT compatibility.
 - Exports the quantized model to ONNX.
 - Postprocesses the ONNX model to be compatible with TensorRT.
 - Saves the final ONNX model.
````
````diff
@@ -63,11 +64,21 @@ The `torch_quant_to_onnx.py` script quantizes [timm](https://github.com/huggingf
 ```bash
 python torch_quant_to_onnx.py \
-    --timm_model_name=vit_base_patch16_224 \
+    --timm_model_name=<timm model name> \
     --quantize_mode=<fp8|mxfp8|int8|nvfp4|int4_awq> \
     --onnx_save_path=<path to save the exported ONNX model>
 ```
 
+### Conv2d Quantization Override
+
+TensorRT only supports FP8 and INT8 for convolution operations. When quantizing models with Conv2d layers (like SwinTransformer), the script automatically applies the following overrides:
+
+| Quantize Mode | Conv2d Override | Reason |
+| :---: | :---: | :--- |
+| FP8, INT8 | None (already compatible) | Native TRT support |
+| MXFP8, NVFP4 | FP8 | TRT does not support MXFP8/NVFP4 convolutions |
+| INT4_AWQ | INT8 | TRT does not support INT4 convolutions |
````
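The Conv2d override described above reduces to a small mode-to-precision mapping. A sketch of the idea (mode names come from the README; the mapping function is illustrative, not the script's actual code):

```python
# Hypothetical mapping of quantize mode -> Conv2d override precision.
# None means Conv2d layers keep the requested mode (native TRT support).
CONV2D_OVERRIDES = {
    "fp8": None,
    "int8": None,
    "mxfp8": "fp8",     # TRT convolutions support only FP8/INT8
    "nvfp4": "fp8",
    "int4_awq": "int8",
}

def conv2d_quant_mode(quantize_mode: str) -> str:
    """Return the precision actually applied to Conv2d layers."""
    override = CONV2D_OVERRIDES.get(quantize_mode)
    return override or quantize_mode

print(conv2d_quant_mode("nvfp4"))  # fp8
print(conv2d_quant_mode("fp8"))    # fp8
```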
If the model is an image-classification model, evaluate it with the following script. The script automatically downloads and uses the [ILSVRC/imagenet-1k](https://huggingface.co/datasets/ILSVRC/imagenet-1k) dataset from Hugging Face. This gated repository requires authentication via a Hugging Face access token; see <https://huggingface.co/docs/hub/en/security-tokens> for details.
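The classification evaluation boils down to top-1 accuracy over the validation set. A minimal self-contained sketch (helper name and sample data are illustrative, not the evaluation script's actual code):

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction matches the label."""
    correct = sum(
        max(range(len(row)), key=row.__getitem__) == label
        for row, label in zip(logits, labels)
    )
    return correct / len(labels)

# Two toy samples: first prediction correct (argmax 1), second wrong.
logits = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]]
labels = [1, 2]
print(top1_accuracy(logits, labels))  # 0.5
```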