cuda: disable sparse V skip (warp divergence regression) by TheTom · Pull Request #105 · TheTom/llama-cpp-turboquant

TheTom · 2026-04-24T13:41:25Z

Summary

Disables sparse V dequant skip on CUDA. Per-lane branching causes warp divergence that costs more than the skipped dequants save.

Benchmarks (@sztlink, Qwen3-30B-A3B Q4_K_M)

RTX 4090 (SM89):

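| Context | Sparse V ON | OFF (baseline) | Delta |
|---------|-------------|----------------|-------|
| 512     | 59.05       | 59.51          | -0.8% |
| 4K      | 13.39       | 13.77          | -2.8% |
| 8K      | 7.73        | 7.77           | -0.5% |
| 16K     | 3.95        | 3.97           | -0.5% |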
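
The source does not define the Delta column. The RTX 4090 rows are consistent with reading it as the relative change of the Sparse V ON value against the OFF baseline. The symbols $v_{\text{ON}}$ and $v_{\text{OFF}}$ are introduced here only for illustration:

$$
\Delta = \frac{v_{\text{ON}} - v_{\text{OFF}}}{v_{\text{OFF}}} \times 100\%,
\qquad
\Delta_{\text{4K}} = \frac{13.39 - 13.77}{13.77} \times 100\% \approx -2.8\%
$$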

RTX 3090 (SM86):

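| Context | Sparse V ON | OFF (baseline) | Delta |
|---------|-------------|----------------|-------|
| 512     | 32.49       | 32.19          | -0.9% |
| 4K      | 6.40        | 6.38           | -0.3% |
| 8K      | 2.56        | 2.54           | -0.8% |
| 16K     | 1.26        | 1.25           | -0.5% |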

Metal path unaffected (remains enabled, +4% to +23%).

TODO

Revisit with warp-level ballot skip (__ballot_sync + early exit when entire warp is below threshold).
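
As a rough illustration of the difference between the per-lane skip disabled here and the warp-level ballot skip proposed in the TODO, the following self-contained sketch contrasts the two on a toy accumulation loop. It is not the fork's VEC FA kernel: `dequant_v`, `acc_per_lane`, `acc_ballot`, the one-lane-per-KV-position layout, the int8 V format, and the threshold value are all assumptions made for the example. Only `__ballot_sync`, `__shfl_down_sync`, and the CUDA runtime calls are real API.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

#define FULL_MASK 0xffffffffu

// Hypothetical stand-in for dequantizing one V element; not the fork's real code.
__device__ float dequant_v(const signed char * v_q, float scale, int i) {
    return scale * (float) v_q[i];
}

// Sum the per-lane partial results of one 32-lane warp into lane 0.
__device__ float warp_sum(float x) {
    for (int offset = 16; offset > 0; offset >>= 1) {
        x += __shfl_down_sync(FULL_MASK, x, offset);
    }
    return x;
}

// Per-lane skip: every lane branches on its own weight. Lanes of one warp can
// disagree, in which case the skipping lanes wait for the ones that dequantize.
__global__ void acc_per_lane(const float * p, const signed char * v_q, float scale,
                             float thr, int n_kv, float * out) {
    float acc = 0.0f;
    for (int i = threadIdx.x; i < n_kv; i += 32) {
        if (p[i] >= thr) {
            acc += p[i] * dequant_v(v_q, scale, i);
        }
    }
    acc = warp_sum(acc);
    if (threadIdx.x == 0) *out = acc;
}

// Warp-level skip: vote once per tile and skip only when no lane needs the work.
// n_kv must be a multiple of 32 so that every lane reaches __ballot_sync.
__global__ void acc_ballot(const float * p, const signed char * v_q, float scale,
                           float thr, int n_kv, float * out) {
    float acc = 0.0f;
    for (int i = threadIdx.x; i < n_kv; i += 32) {
        const float w = p[i];
        // Bit k of `active` is set if lane k has a weight at or above the threshold.
        const unsigned active = __ballot_sync(FULL_MASK, w >= thr);
        if (active == 0) continue;  // same decision in all lanes: no divergence
        // No per-lane branch: sub-threshold lanes in an active tile still do the work.
        acc += w * dequant_v(v_q, scale, i);
    }
    acc = warp_sum(acc);
    if (threadIdx.x == 0) *out = acc;
}

int main() {
    const int n_kv = 64;  // two warp-wide tiles
    float * p; signed char * v_q; float * out;
    cudaMallocManaged(&p, n_kv * sizeof(float));
    cudaMallocManaged(&v_q, n_kv * sizeof(signed char));
    cudaMallocManaged(&out, 2 * sizeof(float));
    for (int i = 0; i < n_kv; ++i) {
        p[i]   = i < 32 ? 0.0f : 0.5f;  // first tile is entirely below the threshold
        v_q[i] = 2;
    }

    acc_per_lane<<<1, 32>>>(p, v_q, 0.25f, 1e-3f, n_kv, out + 0);
    acc_ballot  <<<1, 32>>>(p, v_q, 0.25f, 1e-3f, n_kv, out + 1);
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    printf("per-lane: %f  ballot: %f\n", out[0], out[1]);
    const bool ok = fabsf(out[0] - out[1]) < 1e-6f;

    cudaFree(p); cudaFree(v_q); cudaFree(out);
    return ok ? 0 : 1;
}
```

The two kernels agree on this input only because every weight is either zero or above the threshold. In general the ballot variant also accumulates sub-threshold lanes inside an active tile, so it skips less work than the per-lane version but keeps the branch uniform across the warp. Whether that trade recovers the regression reported above would need to be benchmarked.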

@sztlink

Per-lane branching in the VEC FA kernel causes warp divergence that costs more than the skipped dequants save. Benchmarked at -0.3% to -2.8% on RTX 3090/4090 across all context lengths. Metal path unaffected (remains enabled, +4% to +23%). TODO: revisit with warp-level ballot skip (__ballot_sync + early exit when entire warp is below threshold). Data: @sztlink (Qwen3-30B-A3B Q4_K_M, CUDA SM86/SM89)


github-actions Bot added Nvidia GPU ggml labels Apr 24, 2026

TheTom merged commit 11a241d into feature/turboquant-kv-cache Apr 24, 2026
20 of 48 checks passed

jimbothigpen pushed a commit to jimbothigpen/frankenturbo2 that referenced this pull request May 2, 2026

Merge pull request TheTom#105 from TheTom/fix/disable-sparse-v-cuda

ba80f04

cuda: disable sparse V skip (warp divergence regression)
