Commit 6e77a83
[BugFix][5997203] Update Sqrt casts to FP16 (#1084)
### What does this PR do?
Type of change: Bug fix
Change the Cast nodes that feed Sqrt so they cast to FP16 instead of FP32.
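
The one-line diff itself is not reproduced in this description, so the following is only a minimal sketch of the idea, not the code changed in this PR. It models graph nodes with a hypothetical `Node` dataclass (a stand-in for an ONNX node, used here to avoid extra dependencies) and a hypothetical helper `cast_sqrt_inputs_to_fp16`. The only values taken from ONNX are the `TensorProto` data type codes `FLOAT = 1` and `FLOAT16 = 10`.

```python
from dataclasses import dataclass, field

# ONNX TensorProto data type codes.
FLOAT = 1
FLOAT16 = 10


@dataclass
class Node:
    """Hypothetical, simplified stand-in for an ONNX graph node."""
    op_type: str
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)


def cast_sqrt_inputs_to_fp16(nodes):
    """Retarget every Cast whose output feeds a Sqrt to FP16.

    Returns the number of Cast nodes that were modified.
    """
    # Collect every tensor name consumed by a Sqrt node.
    sqrt_inputs = {name for n in nodes if n.op_type == "Sqrt" for name in n.inputs}
    changed = 0
    for n in nodes:
        feeds_sqrt = any(out in sqrt_inputs for out in n.outputs)
        if n.op_type == "Cast" and feeds_sqrt and n.attrs.get("to") != FLOAT16:
            n.attrs["to"] = FLOAT16
            changed += 1
    return changed


# Toy graph: only the first Cast feeds a Sqrt.
graph = [
    Node("Cast", ["x"], ["x_cast"], {"to": FLOAT}),
    Node("Sqrt", ["x_cast"], ["y"]),
    Node("Cast", ["z"], ["z_cast"], {"to": FLOAT}),
    Node("Relu", ["z_cast"], ["w"]),
]
num_changed = cast_sqrt_inputs_to_fp16(graph)
```

In this toy graph only the Cast in front of Sqrt is retargeted; the Cast in front of Relu keeps its FP32 target.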
### Testing
```
python torch_quant_to_onnx.py --quantize_mode=mxfp8 --timm_model_name=vit_base_patch16_224 --onnx_save_path=vit_base_patch16_224.mxfp8.onnx --calibration_data_size=512
python evaluate.py --onnx_path=vit_base_patch16_224.mxfp8.onnx --model_name=vit_base_patch16_224 --eval_data_size=100
```
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ❌
  - Casts before Sqrt are now FP16 instead of FP32
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
- Did you write any new necessary tests?: N/A
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
  * Improved ONNX model export for quantized models with reduced precision (fp16/bf16) by enhancing type casting handling during the export process.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>

1 parent bd188a9 commit 6e77a83
1 file changed
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 608 | 608 | |
| 609 | 609 | |
| 610 | 610 | |
| 611 | | - |
| | 611 | + |
| 612 | 612 | |
| 613 | 613 | |
| 614 | 614 | |