forked from ggml-org/llama.cpp
Pull requests: TheTom/llama-cpp-turboquant
turbo KV: correct Lloyd-Max centroids (4.125 bpw), PDL + fused-MMA decode for turbo4/3/2, perplexity 32K fix
Apple Metal
CUDA
examples
ggml
#197
opened Jun 26, 2026 by
TheTom
Owner
Add turboquant KV cache support with tensor parallelism
ggml
#196
opened Jun 24, 2026 by
sunzx
fix: NixOS: Remove duplicate spirv-headers entry
devops
nix
#195
opened Jun 24, 2026 by
adamjames
Remove duplicated spirv-headers from package.nix
devops
nix
#192
opened Jun 23, 2026 by
jakkunight
vulkan: submit more frequently on integrated GPUs to fix #185 device-lost (gfx1151/RADV)
ggml
Vulkan
#186
opened Jun 19, 2026 by
TheTom
Owner
Feature/vulkan fa large buffer
ggml
Nvidia GPU
testing
Vulkan
#181
opened Jun 13, 2026 by
Yvi71
ggml: add ROCmFP4 CPU quantization (experimental Q4_0_ROCMFP4 / _FAST)
examples
ggml
#170
opened Jun 6, 2026 by
TheTom
Owner
hip: VEC flash-attn for D=512 (Gemma 4) on ROCm with quantized KV
ggml
Nvidia GPU
#156
opened May 24, 2026 by
cclecle
vulkan: add TurboQuant KV cache support and optimized turbo mat-vec paths
ggml
Vulkan
#140
opened May 10, 2026 by
Fenix46
fix(qwen35): support Qwen3.5:9B loading from Ollama GGUF
model
#135
opened May 8, 2026 by
Jordan-HS
vendor: bump cpp-httplib to 0.43.2 (openssl 4.0.0 fix)
python
script
#121
opened May 4, 2026 by
TheTom
Owner
1 of 3 tasks
HIP mixed TurboQuant vec FA on gfx900/gfx906
build
ggml
Nvidia GPU
#99
opened Apr 21, 2026 by
2bigO
perf: turbo VEC flash attention — +9% decode on CUDA via autoresearch
ggml
Nvidia GPU
script
#53
opened Apr 4, 2026 by
signalnine
7 tasks done
fix: HIP/ROCm compatibility — check cudaMemcpyToSymbol errors, guard …
ggml
Nvidia GPU
#41
opened Apr 1, 2026 by
terrysimons
Draft