opencl: enable tiled-wide q6_K/q4_K decode GEMV by default

Flip GGML_OPENCL_Q6K_GEMV_TILED and GGML_OPENCL_Q4K_GEMV_TILED from opt-in
to default-on (opt out with =0), matching the GGML_OPENCL_Q6K_GEMV_O4
convention. The tiled lm_head/embed GEMV is bit-identical to the legacy
path and gates only on long-vocab shapes (ne01>=32768, ne01%64==0).

Verified on Adreno X2 (asus-gly), tg64 fa=0 warmed -r3:
- Qwen3-1.7B-Q4_K_M: +10.3%/+5.9%/+3.5% @ d=0/4k/16k
- Llama-3.2-1B-Q4_K_M: +3.8%/+2.7%/+0.5% @ d=0/4k/16k
Greedy output is byte-identical between the default and =0.
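
The default-on convention and the shape gate described above can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: the helper names (env_flag_default_on, tiled_gemv_shape_ok, use_tiled_gemv) are hypothetical, and treating only the exact string "0" as the opt-out is an assumption based on the "opt out with =0" wording.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: a default-on flag. An unset variable means enabled.
 * Assumption: only the exact value "0" disables the path. */
static bool env_flag_default_on(const char *value) {
    return value == NULL || strcmp(value, "0") != 0;
}

/* Shape gate from the commit message: long-vocab rows only,
 * i.e. ne01 >= 32768 and ne01 divisible by 64. */
static bool tiled_gemv_shape_ok(int64_t ne01) {
    return ne01 >= 32768 && ne01 % 64 == 0;
}

/* Hypothetical combination of both checks for one env variable name,
 * e.g. "GGML_OPENCL_Q6K_GEMV_TILED" or "GGML_OPENCL_Q4K_GEMV_TILED". */
static bool use_tiled_gemv(const char *env_name, int64_t ne01) {
    return env_flag_default_on(getenv(env_name)) && tiled_gemv_shape_ok(ne01);
}
```

Under these assumptions, a shape that fails the gate (for example ne01 = 32000) would fall back to the legacy path regardless of the environment setting, and setting either variable to 0 would restore the legacy path for all shapes.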