Commit cc97510
committed
fix(ai): update onnxruntime to 1.21.0+ for INT4 quantization support
The AI model build was failing to use INT4 quantization because
matmul_4bits_quantizer module requires onnxruntime 1.21.0 or higher.
Error:
ImportError: cannot import name 'matmul_4bits_quantizer'
from 'onnxruntime.quantization'
Issue:
- requirements.txt specified onnxruntime>=1.20.0
- matmul_4bits_quantizer module was added in a later release
- Build continued with FP32 (full precision) models instead of INT4
- Models were ~4x larger than necessary
Fix:
- Update requirements.txt to onnxruntime>=1.21.0
- Update build-sea.yml to match version requirement
- INT4 quantization will now work, reducing model sizes by ~75%
This provides better model compression without accuracy loss.1 parent 56c614a commit cc97510
2 files changed
Lines changed: 2 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
313 | 313 | | |
314 | 314 | | |
315 | 315 | | |
316 | | - | |
| 316 | + | |
317 | 317 | | |
318 | 318 | | |
319 | 319 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | 3 | | |
4 | | - | |
| 4 | + | |
0 commit comments