Commit 009acc3

TheTom and claude committed
fix: gate turbo V unpad on V type, not K type (#42)
When using asymmetric KV (-ctk q8_0 -ctv turbo4), the V unpad code was gated on k->type being turbo. Since K is q8_0, the unpad was skipped even when V was turbo and padded to 128. This caused a shape mismatch at the wo matmul (ggml_can_mul_mat assertion) for models with non-128-aligned head_dim (e.g., GPT-OSS-120B with head_dim=64, openai_moe_iswa architecture).

Fix: check v->type instead of k->type for V unpad blocks in both build_attn overloads. Q rotation remains correctly gated on k->type.

Reported-by: NigelTufnel12345
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai
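
The gating bug described above can be illustrated with a small standalone sketch. It does not use ggml: `cache_type`, `is_turbo`, `stored_v_head`, and `out_head_dim` are hypothetical names, and rounding the V head size up to the next multiple of 128 is an assumption that generalizes the "padded to 128" case from the message.

```cpp
#include <cstdint>

// Hypothetical stand-in for the ggml cache types named in the commit (not the real enum).
enum class cache_type { Q8_0, TURBO2_0, TURBO3_0, TURBO4_0 };

static bool is_turbo(cache_type t) {
    return t == cache_type::TURBO2_0 || t == cache_type::TURBO3_0 || t == cache_type::TURBO4_0;
}

// Assumption: a turbo V cache stores each head padded up to a multiple of 128.
static int64_t stored_v_head(cache_type v, int64_t head_dim) {
    return is_turbo(v) ? (head_dim + 127) / 128 * 128 : head_dim;
}

// Per-head width of the attention output when the unpad step is gated on `gate`.
// Passing the K type models the old behavior; passing the V type models the fix.
static int64_t out_head_dim(cache_type gate, cache_type v, int64_t head_dim) {
    const int64_t padded = stored_v_head(v, head_dim);
    if (is_turbo(gate) && padded != head_dim) {
        return head_dim; // unpad back to the model's real head size
    }
    return padded;       // unpad skipped: padded width reaches the wo matmul
}
```

With K = Q8_0, V = TURBO4_0 and head_dim = 64, `out_head_dim(k, v, 64)` yields 128 (the mismatched shape), while `out_head_dim(v, v, 64)` yields 64.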
1 parent 1073622 commit 009acc3

1 file changed

Lines changed: 4 additions & 2 deletions

src/llama-graph.cpp

@@ -2189,7 +2189,8 @@ ggml_tensor * llm_graph_context::build_attn(
 
     // TurboQuant: if V was padded, the output has padded dimensions.
     // Extract original V head_dim after inverse WHT (applied inside build_attn_mha).
-    if (k->type == GGML_TYPE_TURBO3_0 || k->type == GGML_TYPE_TURBO4_0 || k->type == GGML_TYPE_TURBO2_0) {
+    // NOTE: gate on v->type (not k->type) for asymmetric configs where K=q8_0 but V=turbo
+    if (v->type == GGML_TYPE_TURBO3_0 || v->type == GGML_TYPE_TURBO4_0 || v->type == GGML_TYPE_TURBO2_0) {
         const int64_t orig_v_head = hparams.n_embd_head_v(il);
         // cur is 2D: (n_embd_head * n_head, n_tokens) after build_attn_mha
         const int64_t padded_v_head = v->ne[0];
@@ -2415,7 +2416,8 @@ ggml_tensor * llm_graph_context::build_attn(
     cb(cur, "kqv_out", il);
 
     // TurboQuant: if V was padded, extract original V head_dim after inverse WHT
-    if (k->type == GGML_TYPE_TURBO3_0 || k->type == GGML_TYPE_TURBO4_0 || k->type == GGML_TYPE_TURBO2_0) {
+    // NOTE: gate on v->type (not k->type) for asymmetric configs where K=q8_0 but V=turbo
+    if (v->type == GGML_TYPE_TURBO3_0 || v->type == GGML_TYPE_TURBO4_0 || v->type == GGML_TYPE_TURBO2_0) {
         const int64_t orig_v_head = hparams.n_embd_head_v(il);
         const int64_t padded_v_head = v->ne[0];
         if (padded_v_head != orig_v_head) {
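
The body of the unpad block is not shown in this diff. As a rough illustration of what "extract original V head_dim" amounts to, here is a standalone sketch on a flat buffer rather than a ggml tensor. `unpad_v` is a hypothetical helper, and it assumes the padding sits at the end of each head, which the diff does not confirm.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from llama.cpp): drop trailing padding from each head.
// `cur` is laid out as n_tokens rows, each row holding n_head heads of
// padded_v_head floats, mirroring the 2D (n_embd_head * n_head, n_tokens) shape.
static std::vector<float> unpad_v(const std::vector<float> & cur,
                                  int64_t padded_v_head, int64_t orig_v_head,
                                  int64_t n_head, int64_t n_tokens) {
    std::vector<float> out;
    out.reserve((size_t) (orig_v_head * n_head * n_tokens));
    for (int64_t t = 0; t < n_tokens; ++t) {
        for (int64_t h = 0; h < n_head; ++h) {
            // Start of head h for token t in the padded layout.
            const float * src = cur.data() + (t * n_head + h) * padded_v_head;
            // Keep only the first orig_v_head values of this head.
            out.insert(out.end(), src, src + orig_v_head);
        }
    }
    return out;
}
```

The result has orig_v_head * n_head values per token, which is the row width the wo matmul expects.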
