fix: gate turbo V unpad on V type, not K type #91

Merged
TheTom merged 2 commits into feature/turboquant-kv-cache from fix/turbo-v-unpad-gate-merge
Apr 20, 2026

Conversation

@TheTom
Owner

@TheTom TheTom commented Apr 18, 2026

Summary

Cherry-pick of e99452f from fix/turbo-v-unpad-gate. Fixes asymmetric KV crash when V is turbo but K is q8_0.

Problem

The V unpad code in llama-graph.cpp is gated on k->type being turbo. With an asymmetric config (-ctk q8_0 -ctv turbo3), K is q8_0, so the check fails and the V unpad is skipped even though V is turbo and padded to 128. The resulting shape mismatch crashes the Metal backend.

Reported by:

  • Kurt Knapp: MiniMax-M2.7 on Mac Studio M3 Ultra, q8_0/turbo3 crashes with "command buffer failure"
  • Sjoerd: Qwen3.5-122B on dual L40S, q8_0/turbo3 produces corrupt output (literal ? characters)
  • NigelTufnel12345: GPT-OSS-120B, ggml_can_mul_mat assertion failure

Fix

Check v->type instead of k->type in the V unpad blocks. 4 lines changed (2 insertions, 2 deletions).
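
As a rough illustration of the gating change, here is a toy model in standalone C++. It is not the code from llama-graph.cpp: the CacheType enum, the Tensor struct, and every function name are hypothetical, and the unpadded head size of 96 is an assumed value chosen only to make the mismatch visible. Only the padded width of 128 and the q8_0/turbo3 pairing come from the description above.

```cpp
// Toy model of the gating bug. None of these names come from llama.cpp.
#include <cassert>

enum class CacheType { F16, Q8_0, Turbo3 };  // hypothetical cache type tags

struct Tensor {
    CacheType type;
    int head_dim;  // per-head width as stored (possibly padded)
};

constexpr bool is_turbo(CacheType t) { return t == CacheType::Turbo3; }

// Buggy gate: decides whether to unpad V by looking at K's type.
constexpr bool should_unpad_v_buggy(const Tensor &k, const Tensor & /*v*/) {
    return is_turbo(k.type);
}

// Fixed gate: the V unpad depends only on V's own type.
constexpr bool should_unpad_v_fixed(const Tensor & /*k*/, const Tensor &v) {
    return is_turbo(v.type);
}

// Width of V that continues through the graph after the optional unpad step.
constexpr int v_width_after_gate(bool unpad, const Tensor &v, int logical_dim) {
    return unpad ? logical_dim : v.head_dim;
}

// Asymmetric case from the report: K is q8_0, V is turbo3 padded to 128.
inline void check_asymmetric_case() {
    constexpr int logical_dim = 96;              // assumed unpadded head size
    constexpr Tensor k{CacheType::Q8_0, 96};
    constexpr Tensor v{CacheType::Turbo3, 128};  // padded to 128
    // Buggy gate skips the unpad, so V keeps its padded width.
    assert(v_width_after_gate(should_unpad_v_buggy(k, v), v, logical_dim) == 128);
    // Fixed gate unpads V back to the logical width.
    assert(v_width_after_gate(should_unpad_v_fixed(k, v), v, logical_dim) == 96);
}
```

In this model the two gates agree whenever K and V are both turbo or both non-turbo, which matches the claim that symmetric and non-turbo configs are unaffected.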

Impact

  • Fixes all asymmetric KV configs where K != turbo and V == turbo
  • No effect on symmetric configs (both types match, both checks pass)
  • No effect on non-turbo configs

Refs: #42

TheTom and others added 2 commits April 20, 2026 08:45
The TurboFlash two-pass fused attention kernel produces garbage output
on M5 Max (Apple10/Metal4) for all turbo3 V configs. Disabling by
default routes turbo3 through the standard FA path which works correctly.

Users can opt-in with TURBO_FLASH=1 for testing/debugging.

No perf regression — standard FA path matches TurboFlash speed within
noise (~55-57 t/s tg128 for q8_0/turbo3 on M5 Max).

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@TheTom TheTom force-pushed the fix/turbo-v-unpad-gate-merge branch from 009acc3 to a1bcb34 Compare April 20, 2026 14:10
@TheTom TheTom merged commit d3271ac into feature/turboquant-kv-cache Apr 20, 2026
26 of 53 checks passed
jimbothigpen pushed a commit to jimbothigpen/frankenturbo2 that referenced this pull request May 2, 2026