Commit 3d3c6ff
fix(apertus): force-dequant token_embd under NATIVE_OPTIMIZED
ApertusWeightLoader.streamingTensorToTensor / readerTensorToTensor wrap
quantized weights in a byte-level rank-1 shape under
QuantPolicy.NATIVE_OPTIMIZED so the native FFM kernels can address the
block layout directly. That works for matmul, where the kernel recovers
the logical shape from metadata, but it breaks Embedding.gather, which
requires the logical rank-2 [vocab, dim] shape: a rank-1 weight tensor
fails with "gather: unsupported input rank 1".
Surfaced by ApertusNetworkLoader.fromGguf().load() on real
unsloth/Apertus-8B-Instruct-2509 Q4_K_S: token_embd is stored as Q4_K
in the GGUF and gets the byte-level shape, so the very first forward
pass through the embedding layer dies before any logit math.
Add loadStreamingTensor / loadReaderTensor wrappers around the existing
*ToTensor helpers. They route token_embd.weight through the dequant
path (DequantOps.dequantFromBytes → createTensor with the logical
[vocab, dim] shape) when quantPolicy is NATIVE_OPTIMIZED and the
tensor is a quantized type. Other tensors keep their NATIVE_OPTIMIZED
byte-level layout for kernel dispatch.
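
A hedged sketch of that routing: QuantPolicy, DequantOps.dequantFromBytes, createTensor, and the *ToTensor helper are names from this commit, but every signature and stub type below is a guess for illustration, not the real ainet API.

```kotlin
enum class QuantPolicy { NATIVE_OPTIMIZED }

// Hypothetical stand-in for the GGUF tensor metadata.
data class TensorInfo(
    val name: String,
    val isQuantized: Boolean,
    val logicalShape: IntArray    // e.g. [vocab, dim] for token_embd.weight
)

class Tensor(val data: FloatArray, val shape: IntArray)

object DequantOps {
    // Stub: the real dequantFromBytes decodes Q4_K blocks into floats.
    fun dequantFromBytes(bytes: ByteArray, shape: IntArray): FloatArray =
        FloatArray(shape.fold(1) { acc, d -> acc * d })
}

fun createTensor(data: FloatArray, shape: IntArray) = Tensor(data, shape)

// Stub for the existing byte-level path: rank-1 byte shape for kernel dispatch.
fun streamingTensorToTensor(info: TensorInfo, bytes: ByteArray): Tensor =
    Tensor(FloatArray(0), intArrayOf(bytes.size))

fun loadStreamingTensor(info: TensorInfo, bytes: ByteArray, policy: QuantPolicy): Tensor {
    val forceDequant = policy == QuantPolicy.NATIVE_OPTIMIZED &&
        info.isQuantized &&
        info.name == "token_embd.weight"
    return if (forceDequant) {
        // Embedding.gather needs the logical [vocab, dim] shape, so dequantize now.
        createTensor(DequantOps.dequantFromBytes(bytes, info.logicalShape), info.logicalShape)
    } else {
        // Every other tensor keeps its NATIVE_OPTIMIZED byte-level layout.
        streamingTensorToTensor(info, bytes)
    }
}
```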
The integration test class KDoc documents the next blocker that
prevents end-to-end inference: linearProject in MultiHeadAttention
calls ops.transpose on byte-shape weights for the Q/K/V/O and FFN
projections; Gemma solves this via Q4_KBlockTensorData, which Apertus
doesn't yet implement. Tracked as #100.
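
A rough sketch of the Q4_KBlockTensorData idea as this commit describes it (an assumed design, not Gemma's actual implementation): keep the raw blocks and the logical shape side by side, so a shape-level op like transpose acts on metadata instead of on the rank-1 byte buffer.

```kotlin
class Q4KBlockData(                 // hypothetical analogue of Q4_KBlockTensorData
    val blocks: ByteArray,          // raw Q4_K blocks for the native kernels
    val logicalShape: IntArray,     // e.g. [outDim, inDim] for a projection weight
    val transposed: Boolean = false // transpose flips a flag; bytes stay untouched
) {
    fun transpose() = Q4KBlockData(
        blocks,
        intArrayOf(logicalShape[1], logicalShape[0]),
        !transposed
    )
}
```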
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent 583ebbc · commit 3d3c6ff
2 files changed: 86 additions & 4 deletions
File tree
- llm-inference/apertus/src
  - commonMain/kotlin/sk/ainet/models/apertus
  - jvmTest/kotlin/sk/ainet/models/apertus
Lines changed: 58 additions & 4 deletions
[Diff text not captured; hunks touch lines 120-132, 163-175, and 562-619 of the updated file.]
Lines changed: 28 additions & 0 deletions
[Diff text not captured; one hunk adds lines 183-210 of the updated file.]