fix: gate turbo V unpad on V type, not K type #91
Merged
TheTom merged 2 commits into feature/turboquant-kv-cache on Apr 20, 2026
The TurboFlash two-pass fused attention kernel produces garbage output on M5 Max (Apple10/Metal4) for all turbo3 V configs. Disabling by default routes turbo3 through the standard FA path which works correctly. Users can opt-in with TURBO_FLASH=1 for testing/debugging.

No perf regression: standard FA path matches TurboFlash speed within noise (~55-57 t/s tg128 for q8_0/turbo3 on M5 Max).

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
009acc3 to a1bcb34
jimbothigpen pushed a commit to jimbothigpen/frankenturbo2 that referenced this pull request on May 2, 2026:
…eTom#91) fix: gate turbo V unpad on V type, not K type
Summary
Cherry-pick of e99452f from fix/turbo-v-unpad-gate. Fixes asymmetric KV crash when V is turbo but K is q8_0.

Problem
The V unpad code in llama-graph.cpp is gated on k->type being turbo. With asymmetric config (-ctk q8_0 -ctv turbo3), K is q8_0 so the check fails and V unpad is skipped, even though V is turbo and padded to 128. This causes a shape mismatch that crashes the Metal backend.

Reported by: ?characters)

Fix
Check v->type instead of k->type for V unpad blocks. 4 lines changed, 2 insertions, 2 deletions.

Impact
Refs: #42
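
For reviewers who want the gating change from the Fix section in isolation, here is a minimal standalone sketch. It is not the code in llama-graph.cpp: `CacheType`, `CacheTensor`, `is_turbo`, `pad_to_128`, and the head size of 96 are all hypothetical stand-ins, and it assumes that "padded to 128" means rounding the width up to a multiple of 128.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the real cache tensor types (not the llama.cpp API).
enum class CacheType { F16, Q8_0, TURBO3 };

struct CacheTensor {
    CacheType    type;
    std::int64_t width; // stored width, including any padding
};

constexpr bool is_turbo(CacheType t) { return t == CacheType::TURBO3; }

// Assumption: turbo tensors are stored padded up to a multiple of 128.
constexpr std::int64_t pad_to_128(std::int64_t n) { return (n + 127) / 128 * 128; }

// Old gate: whether V is unpadded depends on K's type.
constexpr std::int64_t v_width_gated_on_k(const CacheTensor & k, const CacheTensor & v,
                                          std::int64_t orig_width) {
    return is_turbo(k.type) ? orig_width : v.width;
}

// New gate: whether V is unpadded depends on V's own type.
constexpr std::int64_t v_width_gated_on_v(const CacheTensor & v, std::int64_t orig_width) {
    return is_turbo(v.type) ? orig_width : v.width;
}

// Asymmetric config (-ctk q8_0 -ctv turbo3) with a hypothetical head size of 96.
constexpr std::int64_t orig_width = 96;
constexpr CacheTensor  cache_k{CacheType::Q8_0, orig_width};
constexpr CacheTensor  cache_v{CacheType::TURBO3, pad_to_128(orig_width)};

// Old gate skips the unpad, so V keeps its padded width and no longer matches K.
static_assert(v_width_gated_on_k(cache_k, cache_v, orig_width) == 128,
              "gating on K leaves V padded");
// New gate unpads V back to the original width.
static_assert(v_width_gated_on_v(cache_v, orig_width) == 96,
              "gating on V restores the original width");
```

The checks are `static_assert`s so the mismatch shows up at compile time. When K and V are both turbo, the two gates agree, which is consistent with the crash only appearing for asymmetric configs.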