Add EfficientViT support for torch_onnx quantization workflow
Add end-to-end support for efficientvit_l2 (Conv2d-heavy timm model) in
the torch_onnx quantization-to-ONNX-to-TRT pipeline. This required
several fixes to handle Conv2d layers with FP8 quantization:
- Disable FP8 autocast during ONNX export to avoid dynamic shape issues
- Disable Conv2d FP8 weight quantizer during ONNX export (TRT_FP8 custom
  ops produce dynamic shapes incompatible with ONNX Conv kernel shape
  requirement)
- Add fix_fp16_fp32_mismatches() to insert Cast nodes resolving FP32/FP16
  type mismatches after blocked-op FP16 conversion
- Extend configure_linear_module_onnx_quantizers() to handle non-Linear
  modules with block-quantized input quantizers (e.g., pooling layers)
- Add _disable_conv2d_dynamic_quantizers() to disable NVFP4/MXFP8 dynamic
  quantizers on Conv2d (TRT dynamic quantize requires 2D/3D, Conv2d is 4D)
- Set calibration algorithm for MXFP8 Conv2d FP8 overrides
- Add global_pool to filter_func exclusions
- Relax is_fp8_quantized() to detect models with only input_quantizer FP8
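
The Cast-insertion idea behind fix_fp16_fp32_mismatches() can be sketched
on a toy graph. This is an illustration only, not the code in this commit:
the Node class, insert_casts(), and the tensor names are hypothetical, and
the graph is a plain Python list rather than an ONNX model. It assumes each
node computes in a single precision (blocked ops stay float32, the rest are
converted to float16) and that nodes are topologically ordered.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a graph node (not an ONNX NodeProto)."""
    op_type: str
    inputs: list
    outputs: list
    dtype: str  # precision the node computes in; for Cast, the target type


def insert_casts(nodes, graph_input_types):
    """Return a new node list with Cast nodes resolving dtype mismatches.

    nodes must be topologically ordered; graph_input_types maps each
    graph input name to "float16" or "float32".
    """
    tensor_types = dict(graph_input_types)
    fixed = []
    for node in nodes:
        new_inputs = []
        for name in node.inputs:
            if tensor_types[name] == node.dtype:
                new_inputs.append(name)
                continue
            # Mismatch: route the input through a Cast to the node's dtype.
            cast_out = f"{name}_cast_to_{node.dtype}"
            if cast_out not in tensor_types:  # reuse an existing Cast
                fixed.append(Node("Cast", [name], [cast_out], node.dtype))
                tensor_types[cast_out] = node.dtype
            new_inputs.append(cast_out)
        fixed.append(Node(node.op_type, new_inputs, node.outputs, node.dtype))
        for out in node.outputs:
            tensor_types[out] = node.dtype
    return fixed


# Softmax plays the role of a blocked op kept in float32.
graph = [
    Node("Conv", ["x"], ["y"], "float16"),
    Node("Softmax", ["y"], ["z"], "float32"),
    Node("MatMul", ["z", "y"], ["out"], "float16"),
]
fixed = insert_casts(graph, {"x": "float16"})
print([n.op_type for n in fixed])
```

In this sketch a Cast is placed on each edge that crosses a precision
boundary, and an already-created Cast output is reused when several
consumers need the same conversion.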
Supported modes: FP8, INT8, MXFP8, NVFP4. Auto mode is excluded because
of a Conv2d FP8 input/weight type mismatch under TRT stronglyTyped.
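
The rank constraint that motivates _disable_conv2d_dynamic_quantizers()
can be illustrated with a small check. This is a sketch under stated
assumptions, not the helper added in this commit: supports_dynamic_quantize()
and the layer names and shapes below are hypothetical, and the only rule
encoded is the one stated above (TRT dynamic quantize accepts 2D/3D
tensors, while Conv2d tensors are 4D).

```python
def supports_dynamic_quantize(shape):
    """True if a tensor of this shape has a rank TRT dynamic quantize accepts."""
    return len(shape) in (2, 3)


# Hypothetical tensor shapes, for illustration only.
tensor_shapes = {
    "stem.conv": (32, 3, 3, 3),       # Conv2d weight: (out, in, kH, kW)
    "attn.qkv_act": (8, 196, 768),    # 3D activation: (batch, tokens, dim)
    "head.fc": (1000, 3072),          # Linear weight: (out, in)
}

# Layers mapped to False would need their dynamic quantizers disabled.
dynamic_ok = {
    name: supports_dynamic_quantize(shape)
    for name, shape in tensor_shapes.items()
}
print(dynamic_ok)
```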
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>