Commit 59798f1
fix(cuda): allow f16/bf16 + q8_0 mixed KV without GGML_CUDA_FA_ALL_QUANTS (#82)

The FA dispatcher rejected any K != V type combo unless all types were
in the turbo+q8_0 set. This meant common configs like `-ctk f16 -ctv q8_0`
fell back to CPU unless built with -DGGML_CUDA_FA_ALL_QUANTS=ON.

The vec template instances for f16/bf16 + q8_0 are already compiled
(fattn-vec-instance-{f16,bf16}-q8_0.cu and their reverse), so the
dispatcher was gating kernels that do exist.

Extend the predicate to include f16 and bf16 alongside turbo + q8_0.

Reported by @dentity007 on sm_89 (RTX 4090) and sm_121 (GB10), where
`-ctk f16 -ctv q8_0` showed 340x slowdown indicative of CPU fallback.

Co-Authored-By: tturney@psyguard.ai
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 1073622 commit 59798f1
1 file changed
Lines changed: 7 additions & 4 deletions
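
The predicate change described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code from the commit: `KvType`, `in_default_mixed_set`, and `fa_mixed_kv_supported` are hypothetical names, `TURBO` stands in for the fork's turbo quant types, `Q4_0` stands in for any type outside the default set, and the `all_quants` flag models a build with `-DGGML_CUDA_FA_ALL_QUANTS=ON`. It also assumes, as a simplification, that matching K and V types are always accepted.

```cpp
// Hypothetical stand-ins for KV cache tensor types (not the real ggml enum).
enum class KvType { F16, BF16, Q8_0, TURBO, Q4_0 };

// Types accepted for mixed K/V pairs in a default build after the fix.
// Per the commit message, the set was turbo + q8_0 before; f16 and bf16 are added.
constexpr bool in_default_mixed_set(KvType t) {
    return t == KvType::F16 || t == KvType::BF16 ||
           t == KvType::Q8_0 || t == KvType::TURBO;
}

// True if the CUDA flash-attention path is allowed for this K/V pair.
// all_quants models a build with GGML_CUDA_FA_ALL_QUANTS enabled.
// Assumption: equal K and V types are always allowed (simplified).
constexpr bool fa_mixed_kv_supported(KvType k, KvType v, bool all_quants) {
    return all_quants || k == v ||
           (in_default_mixed_set(k) && in_default_mixed_set(v));
}

// The reported config (-ctk f16 -ctv q8_0) is accepted in a default build.
static_assert(fa_mixed_kv_supported(KvType::F16, KvType::Q8_0, false),
              "f16 K + q8_0 V should be accepted");
// A mixed pair with a type outside the set still needs the all-quants build.
static_assert(!fa_mixed_kv_supported(KvType::F16, KvType::Q4_0, false),
              "f16 K + q4_0 V should be rejected by default");
static_assert(fa_mixed_kv_supported(KvType::F16, KvType::Q4_0, true),
              "all-quants build should accept any pair");
```

In this sketch, a rejected pair corresponds to the dispatcher declining the CUDA kernel, which is what produced the CPU fallback reported for `-ctk f16 -ctv q8_0`.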
Diff content not captured. The single hunk covers original lines 419-429 (new lines 419-432): original lines 422-424 are replaced by new lines 422-427, and original line 426 is replaced by new line 429.