Commit 18e4404
ggml-ve : Q4_K N>1 — HBM-aware cache upload (no more SEGV)
ROOT CAUSE of the N>1 SEGV:
- With GGML_VE_Q4K_N_GT_1=1 the scheduler decides Q4_K weights are
  best placed on VE_HBM (because our backend now consumes them in
  batched form too), so w->data becomes an HBM pointer.
- The host-side canonical-pack in get_or_upload_q4k_canon reads
  src_blocks via memcpy. That SEGVs on an HBM pointer.
- Earlier sessions only ever saw CPU_Mapped Q4_K weights (N=1 path),
  so the cache had never been exposed to HBM-resident inputs.

Fix: before calling the cache, check ggml_backend_buffer_is_host on
the weight buffer. If non-host (= HBM), download the weight to a host
bounce buffer with vedaMemcpyDtoH first, then pass that to the cache.
One-time cost per unique weight at first lookup; subsequent calls hit
the cache.
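The bounce-then-cache flow described above can be sketched as follows. This is a minimal illustration, not the code in this commit: FakeBuffer, fake_memcpy_dtoh, canon_pack, g_canon_cache and get_or_upload are hypothetical stand-ins (the real path uses ggml_backend_buffer_is_host, vedaMemcpyDtoH and get_or_upload_q4k_canon), and keying the cache by the weight's data pointer is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a backend buffer. The commit checks
// ggml_backend_buffer_is_host() on the real weight buffer instead.
struct FakeBuffer {
    bool is_host;
};

// Counts bounce copies so the one-time cost is observable.
int g_bounce_count = 0;

// Hypothetical stand-in for vedaMemcpyDtoH. In this sketch "device" memory
// is ordinary host memory, so a plain memcpy is enough.
void fake_memcpy_dtoh(void * dst, const void * src, std::size_t n) {
    std::memcpy(dst, src, n);
    ++g_bounce_count;
}

// Stand-in for the host-side canonical pack. Like the real one, it reads
// the source with memcpy, so the source must be host-addressable.
std::vector<std::uint8_t> canon_pack(const void * host_src, std::size_t n) {
    std::vector<std::uint8_t> packed(n);
    std::memcpy(packed.data(), host_src, n);
    return packed;
}

// Cache keyed by the weight's original data pointer (an assumption).
std::unordered_map<const void *, std::vector<std::uint8_t>> g_canon_cache;

const std::vector<std::uint8_t> & get_or_upload(const FakeBuffer & buf,
                                                const void * data,
                                                std::size_t n) {
    assert(data != nullptr && n > 0);
    auto it = g_canon_cache.find(data);
    if (it != g_canon_cache.end()) {
        return it->second;  // cache hit: no bounce, no repack
    }
    const void * host_src = data;
    std::vector<std::uint8_t> bounce;
    if (!buf.is_host) {
        // Non-host (HBM) weight: copy to a host bounce buffer first.
        bounce.resize(n);
        fake_memcpy_dtoh(bounce.data(), data, n);
        host_src = bounce.data();
    }
    auto inserted = g_canon_cache.emplace(data, canon_pack(host_src, n));
    return inserted.first->second;
}
```

In this sketch the first lookup of a non-host weight pays for one bounce copy, and every later lookup of the same pointer returns the cached entry without touching the device pointer again.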

Confirmed working with GGML_VE_Q4K_N_GT_1=1 on MiniCPM5-1B-Q4_K_M:
- prompt eval (N=5): prints "Q4K-DEVICE-BOUNCE" once per weight,
  then no crashes; correct output
- decode (N=1): hits the now-populated canon cache, no
  bounce, runs the existing 8-row tile kernel
Perf right now (1B, full N>1 path enabled):
| path | pp t/s | tg t/s |
| ----------------------------- | -------- | ------ |
| default (N>1 → CPU) | 21.9 | 14.4 |
| GGML_VE_Q4K_N_GT_1=1 (this) | 7.8 | 8.5 |
The VE path is slower than CPU because each N>1 op is LOOPED as N
sequential matvecs (1.3 ms × N), whereas CPU's AVX2 Q4_K does a true
batched SGEMM. So this commit gets us to "N>1 works on VE" but not
"N>1 is fast on VE".
Real win comes from the next commit: tile-batched matmul. Dequant a
row tile of weights to F32 once, then run N matvecs against that
cached tile (or one cblas_sgemm). Amortizes dequant across N x-columns.
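A rough sketch of that idea next to the current looped approach, under loud assumptions: BlockQ4 is a simplified 4-bit block (one float scale per 32 weights), NOT the real Q4_K super-block layout, and dequant_row, matmul_looped and matmul_tiled are hypothetical names. The looped version dequantizes every weight row once per x-column; the tiled version dequantizes each row once and reuses it for all N columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int QK = 32;  // weights per simplified block (assumption)

// Simplified 4-bit block: value = scale * (nibble - 8). Not real Q4_K.
struct BlockQ4 {
    float        scale;
    std::uint8_t qs[QK / 2];  // two 4-bit values per byte
};

// Dequantize one weight row (n_blocks blocks) to F32.
void dequant_row(const BlockQ4 * blocks, int n_blocks, float * out) {
    for (int b = 0; b < n_blocks; ++b) {
        for (int i = 0; i < QK / 2; ++i) {
            const int byte = blocks[b].qs[i];
            out[b * QK + 2 * i]     = blocks[b].scale * float((byte & 0x0F) - 8);
            out[b * QK + 2 * i + 1] = blocks[b].scale * float((byte >> 4) - 8);
        }
    }
}

float dot(const float * a, const float * b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) acc += a[i] * b[i];
    return acc;
}

// Looped path: N sequential matvecs, each one dequantizing every row again.
// Layout: x holds N columns of length cols; y[n * rows + r] is the result.
void matmul_looped(const std::vector<BlockQ4> & w, int rows, int cols,
                   const std::vector<float> & x, int n_x,
                   std::vector<float> & y) {
    const int bpr = cols / QK;  // blocks per row
    std::vector<float> row(cols);
    y.assign(std::size_t(rows) * n_x, 0.0f);
    for (int n = 0; n < n_x; ++n) {
        for (int r = 0; r < rows; ++r) {
            dequant_row(&w[std::size_t(r) * bpr], bpr, row.data());
            y[std::size_t(n) * rows + r] = dot(row.data(), &x[std::size_t(n) * cols], cols);
        }
    }
}

// Tiled path: dequantize tile_rows rows once, then run all N columns
// against the cached F32 tile, so dequant cost is amortized across N.
void matmul_tiled(const std::vector<BlockQ4> & w, int rows, int cols,
                  const std::vector<float> & x, int n_x,
                  std::vector<float> & y, int tile_rows) {
    assert(cols % QK == 0 && tile_rows > 0);
    const int bpr = cols / QK;
    std::vector<float> tile(std::size_t(tile_rows) * cols);
    y.assign(std::size_t(rows) * n_x, 0.0f);
    for (int r0 = 0; r0 < rows; r0 += tile_rows) {
        const int tr = std::min(tile_rows, rows - r0);
        for (int r = 0; r < tr; ++r) {
            dequant_row(&w[std::size_t(r0 + r) * bpr], bpr, &tile[std::size_t(r) * cols]);
        }
        for (int n = 0; n < n_x; ++n) {
            for (int r = 0; r < tr; ++r) {
                y[std::size_t(n) * rows + r0 + r] =
                    dot(&tile[std::size_t(r) * cols], &x[std::size_t(n) * cols], cols);
            }
        }
    }
}
```

Both functions compute the same result; the difference is that matmul_looped performs rows × N row dequantizations while matmul_tiled performs only rows. The inner loop over N columns against the cached tile is the part a single cblas_sgemm call could replace.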
For now N>1 stays GATED behind GGML_VE_Q4K_N_GT_1=1 because the
sequential path is slower than CPU. Will flip default when batched
matmul lands.
1 parent bc64991 commit 18e4404
1 file changed
Lines changed: 33 additions & 1 deletion
@@ -14,6 +14,8 @@ 2 lines added (new lines 17-18); diff content not preserved
@@ -219,8 +221,38 @@ 30 lines added (new lines 224-253); old line 223 replaced by new line 255; diff content not preserved