Commit a07ab09
sparse V VEC: warp-uniform skip via warp_reduce_max (opt-in)
Adds the vllm-project/vllm#41422 design pattern to llama.cpp's CUDA
fattn-vec kernel: replaces the per-lane sparse V skip (which caused warp
divergence on turbo paths and was therefore compile-time gated off for
turbo via PR #115's `if constexpr (!V_is_turbo)`) with a warp-uniform
skip via `warp_reduce_max`. All lanes branch on the same reduced value,
so there is no warp divergence regardless of V type.
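A minimal standalone sketch of the pattern (not the kernel's actual code:
warp_reduce_max is written out with __shfl_xor_sync so the sketch compiles
on its own, and V, out, n, threshold are placeholder names):

    // Warp-uniform sparse V skip, sketched as a toy kernel.
    static __device__ float warp_reduce_max(float x) {
        #pragma unroll
        for (int offset = 16; offset > 0; offset >>= 1) {
            x = fmaxf(x, __shfl_xor_sync(0xffffffff, x, offset, 32));
        }
        return x;
    }

    __global__ void sparse_v_demo(const float * V, float * out, int n, float threshold) {
        const int i = blockIdx.x*blockDim.x + threadIdx.x;
        const float v = i < n ? V[i] : 0.0f;

        // Old per-lane form: `if (fabsf(v) < threshold) return;`
        // lanes of one warp can disagree, so the warp diverges.
        // New form: reduce first, then branch on a value every lane shares.
        const float v_max = warp_reduce_max(fabsf(v));
        if (v_max < threshold) {
            return; // whole warp skips together, no divergence
        }
        if (i < n) {
            out[i] = v; // stand-in for the real V accumulation
        }
    }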
Off by default. Opt in at build time:
cmake -DCMAKE_CXX_FLAGS=-DGGML_CUDA_TURBO_SPARSE_V_VEC
Threshold defaults to 0.001f (matches vLLM PR #41422). Override with
-DGGML_CUDA_TURBO_SPARSE_V_VEC_THRESHOLD=<val>.
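A plausible shape for the threshold default (macro names are the commit's;
the #ifndef fallback is an assumption about how the override is wired):

    // Assumed wiring: use the user-supplied threshold if given, else 0.001f.
    #ifdef GGML_CUDA_TURBO_SPARSE_V_VEC
    #ifndef GGML_CUDA_TURBO_SPARSE_V_VEC_THRESHOLD
    #define GGML_CUDA_TURBO_SPARSE_V_VEC_THRESHOLD 0.001f // matches vLLM PR #41422
    #endif
    #endif // GGML_CUDA_TURBO_SPARSE_V_VEC

With that wiring, overriding the threshold is just a second -D define
passed alongside the opt-in flag.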
Default-off path is byte-identical (verified on M5 Max Metal: Qwen2.5-7B
Q8_0 sym turbo3 PPL 6.6594, exact match with PR #115 baseline).
Pairs with the prior commit's tile-kernel sparse V skip (off-by-default
opt-in via GGML_CUDA_TURBO_SPARSE_V_TILE) — that one targets fattn-tile
prefill, this one targets fattn-vec decode (the actual hot path on
single-token generation where the vLLM win was measured).
NO MERGE — testing branch only. AMD MI300X HIP validation pending.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 file changed: 50 additions & 8 deletions