@@ -27,18 +27,20 @@ import java.lang.foreign.Arena
  * [Q8MemorySegmentTensorData] with the **logical** matrix shape derived
  * from metadata. Upstream `DefaultCpuOpsJvm.matmul` and `transpose`
  * detect the markers and dispatch quant-aware kernels at forward time.
- * - **Q4_K / Q5_K / Q6_K** → dequantized to FP32. The packed K-quant kernels
- *   are MemSeg-only on a hot path the DSL doesn't yet route through, so this
- *   trades memory for correctness. Same trade-off the legacy converter
- *   makes for K-quants.
+ * - **Every other quant type** (Q4_1, Q5_0, Q5_1, Q8_1, the K-quants
+ *   Q4_K / Q5_K / Q6_K, IQ4_NL/XS, TQ1/2_0, ...) → dequantized to FP32. None
+ *   of these has a packed MemSeg kernel on the hot path the DSL routes
+ *   through, so this trades memory for correctness — the same trade-off the
+ *   legacy converter makes for K-quants. [DequantOps.dequantFromBytes] throws
+ *   for genuinely unknown types, so an unsupported model fails explicitly at
+ *   load time instead of silently passing bytes through and crashing later
+ *   inside matmul (see issue #654).
  * - **token_embd.weight** → always dequantized to FP32 regardless of quant
  *   type. The Embedding layer consumes this via `gather`, not matmul, so it
  *   needs real floats with the logical 2D shape — packed quant bytes would
  *   be misread as FP32 values, and the loader's intermediate Int8 wrapper
  *   stores a 1D byte-count shape that `gather` rejects.
  * - **FP32 (no entry in `quantTypes`)** → passed through unchanged.
- * - **Other quant types** → warning logged, passed through (will fail later
- *   if the model actually hits them via matmul).
  *
  * Why logical shape matters here: the loader stores raw quant bytes via
  * `ctx.fromByteArray(Shape(bytes.size), Int8, bytes)` — a 1D byte-count
@@ -168,19 +170,17 @@ public object DecoderGgufMemSegConverter {
                 @Suppress("UNCHECKED_CAST")
                 ctx.fromData(newData as TensorData<FP32, Float>, FP32::class)
             }
-            GGMLQuantizationType.Q4_K,
-            GGMLQuantizationType.Q5_K,
-            GGMLQuantizationType.Q6_K -> {
+            // Every other GGUF quant type (Q4_1, Q5_0, Q5_1, Q8_1, the
+            // K-quants, IQ4_NL/XS, TQ1/2_0, ...) has no packed MemSeg kernel
+            // on the DSL forward path, so dequantize to FP32 here — the same
+            // memory-for-correctness trade-off the K-quants already made.
+            // DequantOps throws for genuinely unknown types, which turns what
+            // used to be a silent pass-through (and a confusing crash deep
+            // inside matmul) into an explicit failure at load time. See #654.
+            else -> {
                 val floats = DequantOps.dequantFromBytes(bytes, quantType, logicalShape.volume)
                 ctx.fromFloatArray(logicalShape, FP32::class, floats)
             }
-            else -> {
-                println(
-                    "WARNING: DecoderGgufMemSegConverter: unsupported quant type $quantType for '$name'; " +
-                        "passing through unchanged. Forward pass may fail at matmul.",
-                )
-                tensor
-            }
         }
     }
 
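After this change, the routing rules in the KDoc reduce to a small decision function: FP32 passes through, `token_embd.weight` always becomes floats, types with a packed kernel stay packed, and everything else is dequantized. The Kotlin sketch below models that decision under stated assumptions. `QuantType` is a hypothetical stand-in for `GGMLQuantizationType`, `Route` and `route` are hypothetical names, and the packed set is reduced to `Q8_0` purely for illustration. It is not the converter's actual code.

```kotlin
// Hypothetical stand-in for GGMLQuantizationType; only a few members are modelled.
enum class QuantType { Q8_0, Q4_1, Q5_0, Q4_K, Q6_K, IQ4_NL }

// Hypothetical outcome of the converter's per-tensor decision.
enum class Route { PASS_THROUGH_FP32, KEEP_PACKED, DEQUANT_FP32 }

// Assumption: in this sketch only Q8_0 has a packed kernel on the DSL forward path.
val PACKED_KERNEL_TYPES: Set<QuantType> = setOf(QuantType.Q8_0)

const val TOKEN_EMBD = "token_embd.weight"

fun route(name: String, quant: QuantType?): Route {
    // No entry in quantTypes: the tensor is already FP32, so leave it alone.
    if (quant == null) return Route.PASS_THROUGH_FP32
    return when {
        // Embeddings are read via gather, which needs real floats in the logical 2D shape.
        name == TOKEN_EMBD -> Route.DEQUANT_FP32
        // matmul/transpose recognise the packed marker and pick a quant-aware kernel.
        quant in PACKED_KERNEL_TYPES -> Route.KEEP_PACKED
        // Everything else trades memory for correctness; unknown types fail inside dequant.
        else -> Route.DEQUANT_FP32
    }
}
```

Ordering matters in this sketch: the FP32 check runs first, so an unquantized embedding is passed through untouched, and the `token_embd.weight` rule runs before the packed-kernel check, so a Q8_0 embedding still ends up as floats for `gather`.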
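The new `else` branch relies on `DequantOps.dequantFromBytes` to either expand a type to FP32 or throw. The sketch below illustrates that contract for Q4_1, one of the newly routed types. It follows the ggml block layout: 32 weights per 20-byte block, consisting of an fp16 scale `d`, an fp16 minimum `m`, and 16 bytes of 4-bit codes. Each weight decodes as `q * d + m`, with low nibbles giving weights 0 to 15 and high nibbles giving weights 16 to 31. Every name here (`dequantFromBytes`, `dequantQ4_1`, `halfToFloat`, `readU16LE`, and the two-member `QuantType`) is hypothetical. The call shape mirrors the diff, but this is not the real `DequantOps` implementation. Q5_0 is deliberately left unimplemented so that the explicit-failure path is visible.

```kotlin
import kotlin.math.pow

// Hypothetical enum for this sketch; Q5_0 is intentionally not implemented below.
enum class QuantType { Q4_1, Q5_0 }

const val QK4_1 = 32                        // weights per Q4_1 block
const val Q4_1_BLOCK_BYTES = 4 + QK4_1 / 2  // fp16 d + fp16 m + 16 nibble bytes = 20

// Reads an unsigned little-endian 16-bit value.
fun readU16LE(bytes: ByteArray, off: Int): Int =
    (bytes[off].toInt() and 0xFF) or ((bytes[off + 1].toInt() and 0xFF) shl 8)

// IEEE 754 binary16 -> Float, covering zero/subnormals, normals, and inf/NaN.
fun halfToFloat(h: Int): Float {
    val sign = if ((h and 0x8000) != 0) -1f else 1f
    val exp = (h ushr 10) and 0x1F
    val mant = h and 0x3FF
    return when (exp) {
        0 -> sign * mant * 2f.pow(-24)
        31 -> if (mant == 0) sign * Float.POSITIVE_INFINITY else Float.NaN
        else -> sign * (1f + mant / 1024f) * 2f.pow(exp - 15)
    }
}

fun dequantQ4_1(bytes: ByteArray, count: Int): FloatArray {
    require(count % QK4_1 == 0) { "count $count is not a multiple of $QK4_1" }
    val blocks = count / QK4_1
    require(bytes.size >= blocks * Q4_1_BLOCK_BYTES) { "need ${blocks * Q4_1_BLOCK_BYTES} bytes, got ${bytes.size}" }
    val out = FloatArray(count)
    for (b in 0 until blocks) {
        val base = b * Q4_1_BLOCK_BYTES
        val d = halfToFloat(readU16LE(bytes, base))
        val m = halfToFloat(readU16LE(bytes, base + 2))
        for (j in 0 until QK4_1 / 2) {
            val q = bytes[base + 4 + j].toInt() and 0xFF
            out[b * QK4_1 + j] = (q and 0x0F) * d + m             // low nibble -> first half
            out[b * QK4_1 + j + QK4_1 / 2] = (q ushr 4) * d + m   // high nibble -> second half
        }
    }
    return out
}

// Hypothetical dispatcher: unknown or unimplemented types fail loudly instead of passing through.
fun dequantFromBytes(bytes: ByteArray, type: QuantType, count: Int): FloatArray = when (type) {
    QuantType.Q4_1 -> dequantQ4_1(bytes, count)
    else -> throw IllegalArgumentException("No dequantizer for quant type $type")
}
```

For a single hand-built block with `d = 1.0` (fp16 `0x3C00`), `m = 0.5` (fp16 `0x3800`), and a first code byte of `0x21`, weight 0 decodes to `1 * 1.0 + 0.5 = 1.5`, weight 16 (the high nibble) decodes to `2 * 1.0 + 0.5 = 2.5`, and every weight whose code is zero decodes to `0.5`. Asking for Q5_0 throws at this point rather than handing raw bytes to a later matmul, which is the behaviour the diff is after.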