examples/torch_onnx/README.md
Lines changed: 19 additions & 8 deletions
@@ -53,6 +53,7 @@ The `torch_quant_to_onnx.py` script quantizes [timm](https://github.com/huggingf
 
 - Loads a pretrained timm torch model (default: ViT-Base).
 - Quantizes the torch model to FP8, MXFP8, INT8, NVFP4, or INT4_AWQ using ModelOpt.
+- For models with Conv2d layers (e.g., SwinTransformer), automatically overrides Conv2d quantization to FP8 (for MXFP8/NVFP4 modes) or INT8 (for INT4_AWQ mode) for TensorRT compatibility.
 - Exports the quantized model to ONNX.
 - Postprocesses the ONNX model to be compatible with TensorRT.
 - Saves the final ONNX model.
@@ -63,11 +64,21 @@ The `torch_quant_to_onnx.py` script quantizes [timm](https://github.com/huggingf
 
 ```bash
 python torch_quant_to_onnx.py \
-    --timm_model_name=vit_base_patch16_224 \
+    --timm_model_name=<timm model name> \
     --quantize_mode=<fp8|mxfp8|int8|nvfp4|int4_awq> \
     --onnx_save_path=<path to save the exported ONNX model>
 ```
 
+### Conv2d Quantization Override
+
+TensorRT only supports FP8 and INT8 for convolution operations. When quantizing models with Conv2d layers (like SwinTransformer), the script automatically applies the following overrides:
+
+| Quantize Mode | Conv2d Override | Reason |
+| :---: | :---: | :--- |
+| FP8, INT8 | None (already compatible) | Native TRT support |
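
The override rule described in this change can be sketched as a small lookup. This is only an illustration of the mapping stated in the README text (MXFP8 and NVFP4 fall back to FP8 for Conv2d, INT4_AWQ falls back to INT8, and FP8 and INT8 need no override). The names `CONV2D_OVERRIDES` and `conv2d_override` are hypothetical and are not taken from `torch_quant_to_onnx.py` or ModelOpt; how the script actually applies the override is not shown in this diff.

```python
from typing import Optional

# Hypothetical lookup table: quantize mode -> precision used for Conv2d layers.
# None means the mode is already supported for convolutions, so nothing changes.
# The entries mirror the README text; they are not read from the script itself.
CONV2D_OVERRIDES = {
    "fp8": None,
    "int8": None,
    "mxfp8": "fp8",
    "nvfp4": "fp8",
    "int4_awq": "int8",
}


def conv2d_override(quantize_mode: str) -> Optional[str]:
    """Return the Conv2d override precision for a quantize mode, or None."""
    key = quantize_mode.lower()
    if key not in CONV2D_OVERRIDES:
        raise ValueError(f"Unsupported quantize mode: {quantize_mode!r}")
    return CONV2D_OVERRIDES[key]


if __name__ == "__main__":
    for mode in CONV2D_OVERRIDES:
        print(mode, "->", conv2d_override(mode) or "no override")
```

Keeping the rule in a single table makes it easy to check against the README when a new quantize mode is added.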
If the input model is an image classification model, use the following script to evaluate it. The script automatically downloads and uses the [ILSVRC/imagenet-1k](https://huggingface.co/datasets/ILSVRC/imagenet-1k) dataset from Hugging Face. This gated repository requires authentication with a Hugging Face access token. See <https://huggingface.co/docs/hub/en/security-tokens> for details.