Commit 5b398bd
Fix palettize_weights with enable_per_channel_scale=True crashing on ANE (macOS 26)
When OpPalettizerConfig is configured with enable_per_channel_scale=True,
palettize_weights wraps the constexpr_lut_to_dense output in a
constexpr_blockwise_shift_scale op (data=<dense fp16 weight>, scale=<per-channel
fp16>). On macOS 26, the MPSGraph backend lowering for that constexpr op fails
verification when targeting the Apple Neural Engine:
```text
'mps.dequantize' op operand #2 must be tensor of quantized values, but got 'tensor<1xf16>'
... failed assertion `original module failed verification'
```
The MPSGraph lowering of constexpr_blockwise_shift_scale assumes the data
operand is a quantized integer tensor (it lowers to mps.dequantize); with
enable_per_channel_scale=True, the data is the dense fp16 weight, which fails
that assumption. CPU and GPU compute units accept the wrapper and predict
correctly; only the ANE-targeted MIL -> MPSGraph dispatch is broken.
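The following is a minimal NumPy sketch of the type mismatch, built on a few assumptions. `shift_scale` models only the scale path of constexpr_blockwise_shift_scale (the op's optional offset is omitted). `dequantize_lowering` is a hypothetical stand-in for the MPSGraph lowering that, like `mps.dequantize`, rejects non-integer data. Neither function is coremltools or MPSGraph code.

```python
import numpy as np

def shift_scale(data, scale):
    """Scale path of a blockwise shift/scale op: output = data * scale.

    scale has shape (C_out, 1): one fp16 scale per output channel.
    """
    return (data * scale).astype(scale.dtype)

def dequantize_lowering(data, scale):
    # Hypothetical lowering: accepts only integer (quantized) data,
    # mirroring the operand check that mps.dequantize enforces.
    if not np.issubdtype(data.dtype, np.integer):
        raise TypeError(f"expected quantized integer data, got {data.dtype}")
    return shift_scale(data, scale)

scale = np.array([[0.5], [2.0]], dtype=np.float16)
quantized = np.array([[1, -2], [3, 4]], dtype=np.int8)  # usual wrapper input
dense = quantized.astype(np.float16)                     # per-channel-scale input

print(dequantize_lowering(quantized, scale).tolist())    # [[0.5, -1.0], [6.0, 8.0]]
print(shift_scale(dense, scale).tolist())                # [[0.5, -1.0], [6.0, 8.0]]
try:
    dequantize_lowering(dense, scale)                    # fp16 data is rejected
except TypeError as err:
    print(err)                                           # expected quantized integer data, got float16
```

The arithmetic is valid on both inputs. Only the integer-typed lowering refuses the fp16 operand, which matches the CPU/GPU-pass, ANE-fail split described above.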
Fix: bake per_channel_scale into the LUT entries at compile time and re-emit
constexpr_lut_to_dense, instead of leaving the scale as a runtime constexpr.
Both data and scale are fp16 and the wrapper's only effect is data * scale, so
multiplying each LUT entry by its channel's scale yields the same fp16 products.
The failing MPSGraph dispatch is eliminated entirely, and CPU / GPU numerics
stay bit-identical to the prior behavior. The resulting graph also has one
fewer runtime constexpr per palettized const.
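The following is a minimal NumPy sketch of why the fold is exact, built on a few assumptions. `lut_to_dense` and `fold_scale_into_lut` are hypothetical helpers. The LUT is assumed to be laid out as one palette per output channel, shape (C_out, K), which is simpler than the grouped LUT shape that constexpr_lut_to_dense actually accepts. A shared per-tensor palette is tiled per channel before folding. The commit does not say how the real fix handles LUT granularity, so treat that step as an assumption.

```python
import numpy as np

def lut_to_dense(indices, lut):
    """Gather palette entries: dense[c, j] = lut[c, indices[c, j]].

    Assumed (hypothetical) layout: indices is (C_out, C_in), lut is (C_out, K).
    """
    rows = np.arange(indices.shape[0])[:, None]
    return lut[rows, indices]

def fold_scale_into_lut(lut, per_channel_scale):
    """Pre-multiply each channel's palette by its fp16 scale (compile time)."""
    return (lut * per_channel_scale).astype(np.float16)

rng = np.random.default_rng(0)
c_out, c_in, k = 4, 8, 16  # k = 2**nbits palette entries for nbits=4

# Shared per-tensor palette, tiled to one copy per output channel so that
# each copy can absorb a different scale.
shared_lut = rng.standard_normal(k).astype(np.float16)
lut = np.tile(shared_lut, (c_out, 1))
indices = rng.integers(0, k, size=(c_out, c_in))
scale = rng.uniform(0.5, 2.0, size=(c_out, 1)).astype(np.float16)

# Previous graph: lut_to_dense, then the shift/scale wrapper (data * scale).
runtime = lut_to_dense(indices, lut) * scale
# New graph: a single lut_to_dense over the pre-scaled palette.
folded = lut_to_dense(indices, fold_scale_into_lut(lut, scale))

# Compare raw fp16 bit patterns, not just values.
print(np.array_equal(runtime.view(np.uint16), folded.view(np.uint16)))  # True
```

Both paths perform the same fp16 multiply on the same operands (a palette entry and its channel's scale), so the outputs match bit for bit. Tiling a shared palette grows the LUT to C_out * K entries, which stays small next to the C_out * C_in weight when K is much smaller than C_in.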
Test updated: TestPalettizeWeights::test_palettization_pcs previously asserted
that the constexpr_blockwise_shift_scale wrapper was emitted; it now asserts
the wrapper is absent (the LUT is pre-scaled). Numerical equivalence vs the
unpalettized model is verified by the existing verify_model_outputs call on
macOS 15+.
Tested:
- test_palettization_pcs: PASS
- All 155 TestPalettizeWeights / TestJointCompressWeights: PASS
- Manual: Qwen3-VL 2B stateful chunk on macOS 26 + M4 ANE: MPSGraph verification crash gone (was reproducible at every load).

1 parent e95804f
2 files changed: 41 additions, 9 deletions
Files changed:
- coremltools/optimize/coreml: @@ -1139,12 +1139,39 @@ (32 additions, 5 deletions)
- coremltools/test/optimize/coreml: @@ -1683,13 +1683,18 @@ (9 additions, 4 deletions)