Commit 96df945

fix(wasm): update INT4 quantization to use matmul_nbits_quantizer API
The onnxruntime API changed from matmul_4bits_quantizer to matmul_nbits_quantizer, which adds generic n-bit quantization support.

API changes:
- matmul_4bits_quantizer → matmul_nbits_quantizer module
- DefaultWeightOnlyQuantConfig → RTNWeightOnlyQuantConfig
- MatMul4BitsQuantizer → MatMulNBitsQuantizer

This fixes the `ImportError: cannot import name 'matmul_4bits_quantizer'` that was preventing AI model INT4 quantization from working with onnxruntime >= 1.20.
1 parent 2348268 commit 96df945
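For downstream scripts that must run against both sides of this rename, a small compatibility shim (a sketch, not part of this commit) can resolve the quantizer classes under either the new (>= 1.20) or the old onnxruntime module layout:

```python
# Import-compat sketch (not part of this commit): resolve the weight-only
# quantizer classes under either the new or the old onnxruntime API.
def import_weight_only_quantizer():
    try:
        # onnxruntime >= 1.20: generic n-bit quantizer module
        from onnxruntime.quantization.matmul_nbits_quantizer import (
            MatMulNBitsQuantizer,
            RTNWeightOnlyQuantConfig,
        )
        return MatMulNBitsQuantizer, RTNWeightOnlyQuantConfig
    except ImportError:
        # older onnxruntime: 4-bit-specific module
        from onnxruntime.quantization.matmul_4bits_quantizer import (
            MatMul4BitsQuantizer,
            DefaultWeightOnlyQuantConfig,
        )
        return MatMul4BitsQuantizer, DefaultWeightOnlyQuantConfig
```

Pinning onnxruntime instead (as this commit effectively assumes >= 1.20) avoids the shim entirely; the fallback is only useful while older runtimes must stay supported.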

File tree

2 files changed (+5, -4 lines)


.github/workflows/build-wasm.yml

Lines changed: 1 addition & 1 deletion

@@ -169,7 +169,7 @@ jobs:
           pip list | grep -E "(onnx|optimum|torch)"
           echo ""
           python3 -c "import onnxruntime; print(f'ONNX Runtime version: {onnxruntime.__version__}')"
-          python3 -c "from onnxruntime.quantization import matmul_4bits_quantizer; print('✓ INT4 quantization available')"
+          python3 -c "from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer, RTNWeightOnlyQuantConfig; print('✓ INT4 quantization available')"
           echo "::endgroup::"

       - name: Install dependencies

packages/models/scripts/build.mjs

Lines changed: 4 additions & 3 deletions

@@ -229,15 +229,16 @@ async function quantizeModel(modelKey) {
       try {
         await execAsync(
           `python3 -c "` +
-            `from onnxruntime.quantization import matmul_4bits_quantizer, quant_utils; ` +
+            `from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer, RTNWeightOnlyQuantConfig; ` +
+            `from onnxruntime.quantization import quant_utils; ` +
             `from pathlib import Path; ` +
-            `quant_config = matmul_4bits_quantizer.DefaultWeightOnlyQuantConfig(` +
+            `quant_config = RTNWeightOnlyQuantConfig(` +
             `  block_size=128, ` +
             `  is_symmetric=True, ` +
             `  accuracy_level=4` +
             `); ` +
             `model = quant_utils.load_model_with_shape_infer(Path('${onnxPath}')); ` +
-            `quant = matmul_4bits_quantizer.MatMul4BitsQuantizer(model, algo_config=quant_config); ` +
+            `quant = MatMulNBitsQuantizer(model, algo_config=quant_config); ` +
             `quant.process(); ` +
             `quant.model.save_model_to_file('${quantPath}', True)` +
             `"`,
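Unpacked from the inlined `python3 -c` one-liner, the quantization step reads as a standalone script. This is a sketch against the post-fix API; the two path parameters stand in for the build script's `${onnxPath}` and `${quantPath}` template values:

```python
# Sketch of the build script's quantization step as a readable function
# (assumes onnxruntime >= 1.20; path arguments replace the template values).
from pathlib import Path


def quantize_int4(onnx_path: str, quant_path: str) -> None:
    # Imports are deferred so the module loads even without onnxruntime.
    from onnxruntime.quantization.matmul_nbits_quantizer import (
        MatMulNBitsQuantizer,
        RTNWeightOnlyQuantConfig,
    )
    from onnxruntime.quantization import quant_utils

    # Round-to-nearest weight-only config, matching the build script's settings.
    quant_config = RTNWeightOnlyQuantConfig(
        block_size=128,      # quantize weights in blocks of 128
        is_symmetric=True,   # symmetric INT4 range
        accuracy_level=4,
    )
    model = quant_utils.load_model_with_shape_infer(Path(onnx_path))
    quant = MatMulNBitsQuantizer(model, algo_config=quant_config)
    quant.process()
    # Second argument mirrors the `True` flag passed in the build script.
    quant.model.save_model_to_file(quant_path, True)
```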
