Commit 804cd74
fix(llama): dequantize Q4_1 (and all non-packed quant types) in DecoderGgufMemSegConverter
DecoderGgufMemSegConverter only handled Q4_0/Q8_0 (packed) and Q4_K/Q5_K/Q6_K (dequant); every other quant type fell into an else branch that logged a warning and passed the raw quant bytes through unchanged. The forward pass then crashed deep inside matmul with a dtype/layout mismatch (e.g. Q4_1 Qwen3 models: 'unsupported quant type Q4_1 for blk.0.ffn_down.weight').
Route the else branch through DequantOps.dequantFromBytes to FP32, the same memory-for-correctness trade-off already used for K-quants. This covers Q4_1, Q5_0, Q5_1, Q8_1, IQ4_NL, IQ4_XS, TQ1_0, TQ2_0, etc. (all already implemented in skainet-io-gguf). DequantOps throws for genuinely unknown types, so an unsupported model now fails explicitly at load time instead of silently passing through and crashing later inside matmul.
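The routing described above can be sketched roughly as follows. This is a minimal illustration, not the converter's actual code: `QuantType`, `Packed`, `Fp32`, `dequantToFp32`, and `convert` are hypothetical stand-ins, and the real `DequantOps.dequantFromBytes` signature is not shown in this commit message, so the sketch does not try to reproduce it.

```kotlin
// Hypothetical stand-ins for illustration; these are not the real skainet types.
enum class QuantType { Q4_0, Q8_0, Q4_K, Q5_K, Q6_K, Q4_1, Q5_0, UNKNOWN }

sealed interface Converted
class Packed(val type: QuantType, val bytes: ByteArray) : Converted
class Fp32(val values: FloatArray, val shape: List<Int>) : Converted

// Stand-in for the dequantizer: it throws for types it does not implement,
// which is what turns an unsupported model into a load-time failure.
fun dequantToFp32(name: String, type: QuantType, bytes: ByteArray, elementCount: Int): FloatArray =
    when (type) {
        QuantType.UNKNOWN ->
            throw IllegalArgumentException("unsupported quant type $type for $name")
        else -> FloatArray(elementCount) // real block decoding elided in this sketch
    }

fun convert(name: String, type: QuantType, bytes: ByteArray, shape: List<Int>): Converted =
    when (type) {
        // Packed formats keep their raw bytes for a dedicated matmul path.
        QuantType.Q4_0, QuantType.Q8_0 -> Packed(type, bytes)
        // Everything else (K-quants and, after this fix, all remaining types)
        // is expanded to FP32 with its logical shape, or fails right here.
        else -> Fp32(dequantToFp32(name, type, bytes, shape.fold(1) { a, b -> a * b }), shape)
    }
```

The point of the `else` branch is that no tensor can leave the converter as raw bytes of an unhandled type: it either becomes FP32 or raises during load.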
Adds a regression test that a Q4_1 weight is dequantized to its logical 2D FP32 shape rather than passed through as 1D bytes.
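To make the "logical 2D FP32 shape" concrete, here is a hedged sketch of Q4_1 dequantization, assuming the standard ggml block layout: 32 weights per 20-byte block, made of an fp16 scale `d`, an fp16 minimum `m`, and 16 bytes of 4-bit values, with each weight recovered as `q * d + m`. The function names and the row-major `rows x cols` output are assumptions for illustration and are not taken from skainet-io-gguf.

```kotlin
const val QK4_1 = 32            // weights per block
const val Q4_1_BLOCK_BYTES = 20 // 2 (fp16 d) + 2 (fp16 m) + 16 (32 nibbles)

// Convert an IEEE 754 half-precision bit pattern to Float.
fun halfToFloat(h: Int): Float {
    val sign = (h shr 15) and 0x1
    val exp = (h shr 10) and 0x1F
    val mant = h and 0x3FF
    val magnitude = when (exp) {
        0 -> Math.scalb(mant.toFloat(), -24) // zero and subnormals
        31 -> if (mant == 0) Float.POSITIVE_INFINITY else Float.NaN
        else -> Math.scalb(1f + mant / 1024f, exp - 15)
    }
    return if (sign == 1) -magnitude else magnitude
}

// Read a little-endian unsigned 16-bit value.
fun u16(b: ByteArray, i: Int): Int =
    (b[i].toInt() and 0xFF) or ((b[i + 1].toInt() and 0xFF) shl 8)

// Expand Q4_1 bytes into a rows x cols FP32 matrix (hypothetical helper).
fun dequantQ4_1(bytes: ByteArray, rows: Int, cols: Int): Array<FloatArray> {
    require(cols % QK4_1 == 0) { "cols must be a multiple of $QK4_1" }
    val blocksPerRow = cols / QK4_1
    require(bytes.size == rows * blocksPerRow * Q4_1_BLOCK_BYTES) { "unexpected byte count" }
    val out = Array(rows) { FloatArray(cols) }
    for (r in 0 until rows) for (b in 0 until blocksPerRow) {
        val base = (r * blocksPerRow + b) * Q4_1_BLOCK_BYTES
        val d = halfToFloat(u16(bytes, base))
        val m = halfToFloat(u16(bytes, base + 2))
        for (j in 0 until QK4_1 / 2) {
            val q = bytes[base + 4 + j].toInt() and 0xFF
            // Low nibble is element j, high nibble is element j + 16.
            out[r][b * QK4_1 + j] = (q and 0x0F) * d + m
            out[r][b * QK4_1 + j + QK4_1 / 2] = (q shr 4) * d + m
        }
    }
    return out
}
```

Under this layout a Q4_1 tensor costs 20 bytes per 32 weights, while FP32 costs 128 bytes, so dequantizing at load time takes about 6.4x the memory for those tensors. A shape check like the one the regression test describes would assert that the result has `rows` rows of `cols` floats, rather than a flat byte buffer.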
Closes #654
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 0a2185c commit 804cd74
2 files changed
Lines changed: 76 additions & 16 deletions
File tree
- llm-inference/llama/src
- jvmMain/kotlin/sk/ainet/models/llama
- jvmTest/kotlin/sk/ainet/models/llama
Lines changed: 16 additions & 16 deletions
[Diff body not captured. Hunk 1 spans original lines 27-44: original lines 30-33 and 40-41 removed, new lines 30-37 added. Hunk 2 spans original lines 168-186: original lines 171-173 and 177-183 removed, new lines 173-180 added.]
Lines changed: 60 additions & 0 deletions
[Diff body not captured. Hunk 1: 30 lines added as new lines 107-136, after original line 106. Hunk 2: 30 lines added as new lines 190-219, after original line 159.]