Commit 0b94ef9
Wire quant_predicate for mixed-precision quantization
Add a quant_predicate on the privacy-filter Model that keeps the MoE
router at 8 bits while the rest of the weights quantize to the user's
chosen bit width. The router is a small but routing-sensitive linear layer;
uniform 4-bit quantization of the router measurably degraded accuracy in
gpt-oss-style models, and the same applies here.
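A minimal sketch of what such a predicate might look like, written in plain Python so it runs without MLX. The `Model` stand-in, the `"router"` path suffix, and the group size of 64 are assumptions for illustration, not the code from this commit. The return convention (True for the default settings, a dict for a per-layer override) mirrors the one MLX's `nn.quantize` accepts from a `class_predicate`.

```python
from typing import Any, Callable, Dict, Union

# A predicate returns True/False, or a dict of per-layer quantization overrides.
QuantDecision = Union[bool, Dict[str, int]]


class Model:
    """Hypothetical stand-in for the privacy-filter Model; only the predicate is sketched."""

    @property
    def quant_predicate(self) -> Callable[[str, Any], QuantDecision]:
        def predicate(path: str, module: Any) -> QuantDecision:
            # Assumption: router layers have a dotted path ending in "router".
            if path.endswith("router"):
                # Override: keep the router at 8 bits (group size 64 is assumed).
                return {"group_size": 64, "bits": 8}
            # True means "quantize with the user's chosen settings".
            return True

        return predicate


predicate = Model().quant_predicate
router_decision = predicate("layers.0.mlp.router", None)
expert_decision = predicate("layers.0.mlp.experts.gate_proj", None)
```

Here `router_decision` is the 8-bit override dict and `expert_decision` is `True`, so the expert weights follow whatever bit width the user requested.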
Follow mlx-vlm's pattern in convert.py: delegate quantization to
mlx_lm.utils.quantize_model, passing a wrapper that composes
mlx-embeddings' skip-vision / group-size checks with the model's
quant_predicate. mlx_lm handles recording per-layer overrides into
config["quantization"][path], and the existing load path in utils.py
already respects those.
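The composition described above can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions: `FakeLinear`, `compose_predicate`, the `"vision"` substring check, and the layer paths are all hypothetical, and the final loop only imitates, in simplified form, the per-path bookkeeping that the commit attributes to mlx_lm.

```python
from typing import Any, Callable, Dict, Optional, Tuple, Union

QuantDecision = Union[bool, Dict[str, int]]
Predicate = Callable[[str, Any], QuantDecision]


class FakeLinear:
    """Hypothetical stand-in for a quantizable layer with a 2-D weight."""

    def __init__(self, out_dims: int, in_dims: int) -> None:
        self.weight_shape: Tuple[int, int] = (out_dims, in_dims)

    def to_quantized(self) -> None:
        # Presence of this method marks the layer as quantizable.
        pass


def compose_predicate(model_predicate: Optional[Predicate], group_size: int) -> Predicate:
    def wrapped(path: str, module: Any) -> QuantDecision:
        # Skip vision layers (matching on the "vision" substring is an assumption).
        if "vision" in path:
            return False
        # Skip modules that cannot be quantized at all.
        if not hasattr(module, "to_quantized"):
            return False
        # Skip weights whose input dimension is not a multiple of the group size.
        if module.weight_shape[-1] % group_size != 0:
            return False
        # Otherwise defer to the model, which may return True or an override dict.
        if model_predicate is not None:
            return model_predicate(path, module)
        return True

    return wrapped


def router_at_8_bits(path: str, module: Any) -> QuantDecision:
    return {"group_size": 64, "bits": 8} if path.endswith("router") else True


predicate = compose_predicate(router_at_8_bits, group_size=64)
layers = {
    "layers.0.mlp.router": FakeLinear(32, 640),
    "layers.0.mlp.experts.up_proj": FakeLinear(2880, 640),
    "vision_tower.proj": FakeLinear(64, 64),
    "head.odd": FakeLinear(8, 100),
}

# Simplified imitation of the bookkeeping: store only dict overrides per path.
quantization: Dict[str, Any] = {"group_size": 64, "bits": 4}
for path, module in layers.items():
    decision = predicate(path, module)
    if isinstance(decision, dict):
        quantization[path] = decision
```

Running the base checks first means the model's predicate never sees a layer that cannot be quantized, so it only has to express its own policy (here, the router override).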
Verified: bf16 and q4 (uniform) both still extract the same PII spans;
mixed-precision q4-experts + q8-router saves to disk with 4.52 bits/
weight, loads correctly, and extracts the same spans.
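As a rough illustration of why a mixed 4-bit / 8-bit model lands a little above 4.5 bits per weight, the sketch below computes effective bits per weight for grouped affine quantization. The group size of 64, the 16-bit scale and bias per group, and the parameter counts are assumptions chosen for illustration; it does not attempt to reproduce the 4.52 figure, which also depends on any layers left unquantized.

```python
from typing import List, Tuple


def affine_bits_per_weight(bits: int, group_size: int = 64,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    # Each group of weights shares one scale and one bias, amortized over the group.
    return bits + (scale_bits + bias_bits) / group_size


def mixed_bits_per_weight(parts: List[Tuple[int, float]]) -> float:
    # parts: (parameter_count, bits_per_weight) pairs; returns the weighted average.
    total = sum(count for count, _ in parts)
    return sum(count * bpw for count, bpw in parts) / total


q4 = affine_bits_per_weight(4)  # 4 + 32/64
q8 = affine_bits_per_weight(8)  # 8 + 32/64

# Hypothetical split: 990 parts of the weights at 4 bits, 10 parts at 8 bits.
mixed = mixed_bits_per_weight([(990, q4), (10, q8)])
```

Because the router is a small share of the total parameters, keeping it at 8 bits moves the average only slightly above the uniform 4-bit figure.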
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 44616af commit 0b94ef9
2 files changed
Lines changed: 30 additions & 12 deletions
Diff line contents were not captured; only the hunk ranges survive.

| File | Original lines | New lines | Additions | Deletions |
|---|---|---|---|---|
| First file | 91-113 | 91-124 | 21 | 10 |
| Second file, hunk 1 | 155-162 | 155-160 | 0 | 2 |
| Second file, hunk 2 | 315-317 | 313-324 | 9 | 0 |