Skip to content

Commit cc97510

Browse files
committed
fix(ai): update onnxruntime to 1.21.0+ for INT4 quantization support
The AI model build was failing to use INT4 quantization because matmul_4bits_quantizer module requires onnxruntime 1.21.0 or higher. Error: ImportError: cannot import name 'matmul_4bits_quantizer' from 'onnxruntime.quantization' Issue: - requirements.txt specified onnxruntime>=1.20.0 - matmul_4bits_quantizer module was added in a later release - Build continued with FP32 (full precision) models instead of INT4 - Models were ~4x larger than necessary Fix: - Update requirements.txt to onnxruntime>=1.21.0 - Update build-sea.yml to match version requirement - INT4 quantization will now work, reducing model sizes by ~75% This provides better model compression without accuracy loss.
1 parent 56c614a commit cc97510

2 files changed

Lines changed: 2 additions & 2 deletions

File tree

.github/workflows/build-sea.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -313,7 +313,7 @@ jobs:
313313
if [ "${{ steps.ai-cache-valid.outputs.valid }}" != "true" ]; then
314314
echo "::group::Installing Python dependencies"
315315
python3 -m pip install --upgrade pip
316-
python3 -m pip install transformers torch optimum[onnx] "onnxruntime>=1.20.0"
316+
python3 -m pip install transformers torch optimum[onnx] "onnxruntime>=1.21.0"
317317
echo "::endgroup::"
318318
fi
319319
Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
11
transformers
22
torch
33
optimum[onnx]
4-
onnxruntime>=1.20.0
4+
onnxruntime>=1.21.0

0 commit comments

Comments
 (0)