Commit 47337a9
ggml-ve : Q4_K direct kernel — make chunked+packed the default variant
Collapses the opt-in flags (GGML_VE_Q4K_STD_CHUNK +
GGML_VE_Q4K_STD_PACKED) into the default path. With just
GGML_VE_Q4K_DIRECT=1 the kernel now runs the fastest measured
variant (chunked VL=256 + packed pvfmad) instead of single-row
VL=32.
Env vars are now OVERRIDES for A/B testing, all forcing slower
paths:
  GGML_VE_Q4K_STD_PLAIN=1    single-row VL=32 (old default)
  GGML_VE_Q4K_STD_TILE=1     8-row tile
  GGML_VE_Q4K_STD_NOPACK=1   chunked unpacked (scratch pack)
  GGML_VE_Q4K_STD_GATHER=1   chunked unpacked vgtlzx gather
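
The override scheme described above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the enum, the function names, the "unset, empty, or 0 means off" parsing, and the precedence order when several overrides are set at once are all assumptions. Only the environment variable names come from the commit message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant enum; names are invented for this sketch. */
typedef enum {
    Q4K_STD_PACKED,  /* chunked VL=256 + packed (the new default) */
    Q4K_STD_PLAIN,   /* single-row VL=32 (old default)            */
    Q4K_STD_TILE,    /* 8-row tile                                */
    Q4K_STD_NOPACK,  /* chunked unpacked (scratch pack)           */
    Q4K_STD_GATHER   /* chunked unpacked, gather-based loads      */
} q4k_std_variant;

/* Assumed parsing rule: unset, empty, or "0" means off. */
int q4k_flag_is_on(const char *value) {
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Pure selection logic. With no override set, the packed variant wins.
 * The precedence among overrides is an assumption of this sketch. */
q4k_std_variant q4k_std_pick(int plain, int tile, int nopack, int gather) {
    if (plain)  return Q4K_STD_PLAIN;
    if (tile)   return Q4K_STD_TILE;
    if (nopack) return Q4K_STD_NOPACK;
    if (gather) return Q4K_STD_GATHER;
    return Q4K_STD_PACKED;
}

/* Thin wrapper that reads the override variables named in the commit. */
q4k_std_variant q4k_std_pick_from_env(void) {
    return q4k_std_pick(
        q4k_flag_is_on(getenv("GGML_VE_Q4K_STD_PLAIN")),
        q4k_flag_is_on(getenv("GGML_VE_Q4K_STD_TILE")),
        q4k_flag_is_on(getenv("GGML_VE_Q4K_STD_NOPACK")),
        q4k_flag_is_on(getenv("GGML_VE_Q4K_STD_GATHER")));
}
```

Keeping the selection logic separate from the getenv calls makes the default-versus-override behavior easy to check without touching the process environment.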
27B Q4_K_M with GGML_VE_Q4K_DIRECT=1 + N_GT_1=1 (no other flags):
0.70 pp / 0.60 tg t/s -- same as the fully-flagged packed path,
now the default. (Single-row default was 0.32/0.30.)
Standalone test_q4k_std_matvec ALL OK on the new default path
(packed numerics, max_abs 5.7e-6), 12 shapes incl. K=17408.
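
For readers unfamiliar with the max_abs figure, a check of this kind typically compares the kernel output against a reference result element by element and reports the largest absolute difference. The sketch below assumes that meaning; the function name and signature are hypothetical and are not taken from test_q4k_std_matvec.

```c
#include <stddef.h>

/* Largest absolute element-wise difference between a reference vector
 * and a kernel output vector. Hypothetical helper for illustration. */
float max_abs_diff(const float *ref, const float *out, size_t n) {
    float worst = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = ref[i] - out[i];
        if (d < 0.0f) {
            d = -d;  /* absolute value without pulling in libm */
        }
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

A test harness would then compare the returned value against a tolerance chosen for the packed numerics; the commit reports the measured value, not the threshold.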
This is a strict improvement to the direct path with no downside;
direct vs canon routing is unchanged (still opt-in).
1 parent dc98e54 commit 47337a9
1 file changed
Lines changed: 26 additions & 10 deletions
Diff hunks (line content not captured):

| Hunk | Original file lines | New file lines | Change |
|---|---|---|---|
| 1 | 111-117 | 111-135 | 7 lines removed, 25 lines added |
| 2 | 135-136 | | 2 lines removed |
| 3 | 163 | 179 | 1 line replaced |