Tracking SYCL backend coverage for the turbo2/turbo3/turbo4 V-cache codepath. Vulkan was fixed in #118; SYCL is the same shape of bug but a separate backend.
Background
@cclecle (Intel A380, oneAPI) hit the same SET_ROWS abort on SYCL in #50:
ggml_backend.cpp:898: pre-allocated tensor (cache_k_l3 (view)) in a buffer (SYCL0)
that cannot run the operation (SET_ROWS)
He has a working starting point with a Claude-implemented SYCL kernel for turbo3:
https://github.com/cclecle/llama-cpp-turboquant/tree/feature/turboquant-kv-cache
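For porters new to this failure, here is a simplified model of the check behind the abort. It is for orientation only. The cache view is pre-allocated in a SYCL0 buffer, so the op writing into it has to run on the SYCL backend. The SYCL backend reports that it cannot run SET_ROWS for that destination type, so the scheduler has nowhere to put the op. Backend, supports_op, can_run_on_buffer_backend, and sycl_supports_op_today are hypothetical names, not ggml APIs. The exact list of types the current SYCL SET_ROWS accepts is also an assumption.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-ins for illustration; these are not ggml types.
enum class Op { SET_ROWS, MUL_MAT };
enum class Type { F16, Q4_0, TURBO3_0 };

struct Backend {
    std::string name;
    // Reports whether this backend can execute `op` writing into a tensor of type `dst`.
    bool (*supports_op)(Op op, Type dst);
};

// Assumed behaviour of the SYCL backend today: SET_ROWS is accepted for
// some existing cache types, but no turbo type is listed.
bool sycl_supports_op_today(Op op, Type dst) {
    if (op == Op::SET_ROWS) {
        return dst == Type::F16 || dst == Type::Q4_0;
    }
    return true;
}

// A tensor pre-allocated in a backend's buffer must be written on that backend.
// Returns false in the situation where the real scheduler aborts.
bool can_run_on_buffer_backend(const Backend& b, Op op, Type dst) {
    if (!b.supports_op(op, dst)) {
        std::fprintf(stderr, "pre-allocated tensor in a buffer (%s) that cannot run the operation\n",
                     b.name.c_str());
        return false;
    }
    return true;
}
```

Take Backend sycl{"SYCL0", sycl_supports_op_today}. In this model, a SET_ROWS into a TURBO3_0 cache view fails the check and a Q4_0 one passes, which is the shape of the failure above.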
What's needed
Mirror the Vulkan PR #118 pattern in ggml/src/ggml-sycl/:
- Register TURBO2_0 / TURBO3_0 / TURBO4_0 as supported targets in the SYCL SET_ROWS op
- Wire up the per-bitwidth quant kernels (cclecle has turbo3; turbo2/turbo4 are still TODO)
- Add the dispatch entry so the scheduler doesn't fall back to abort
- Verify end-to-end with llama-bench -ctk q4_0 -ctv turboN -fa 1 on real Intel hardware (Arc A380 / A770)
- Output coherence check: llama-cli -p "The capital of France is" -n 16 with -ctv turbo4
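The first three steps can be sketched roughly as below. This is not ggml-sycl code. The Type enum stands in for the real GGML_TYPE_* values, and SetRowsKernel, set_rows_turbo3, set_rows_kernels, and sycl_set_rows_supported are hypothetical names. The real kernels would be SYCL kernels launched on a queue, not plain functions. The F32/F16/Q4_0 entries are assumed to already be supported.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-ins for ggml types; real code would use the GGML_TYPE_TURBO*_0 values.
enum class Type { F32, F16, Q4_0, TURBO2_0, TURBO3_0, TURBO4_0 };

// Hypothetical per-row kernel signature: convert n float values from src into dst's format.
using SetRowsKernel = void (*)(const float* src, void* dst, int64_t n);

// Placeholder for the turbo3 kernel (the one cclecle already has).
// turbo2/turbo4 are deliberately absent, mirroring the TODO above.
void set_rows_turbo3(const float* /*src*/, void* /*dst*/, int64_t /*n*/) {}

// Dispatch entry: destination type -> kernel.
const std::map<Type, SetRowsKernel>& set_rows_kernels() {
    static const std::map<Type, SetRowsKernel> table = {
        {Type::TURBO3_0, set_rows_turbo3},
    };
    return table;
}

// Support check for SET_ROWS. A turbo type is only claimed once a kernel
// is registered for it, so the check and the dispatch table cannot drift apart.
bool sycl_set_rows_supported(Type dst) {
    switch (dst) {
        case Type::F32:
        case Type::F16:
        case Type::Q4_0:
            return true;  // assumed to be supported already
        case Type::TURBO2_0:
        case Type::TURBO3_0:
        case Type::TURBO4_0:
            return set_rows_kernels().count(dst) != 0;
    }
    return false;
}
```

Gating support on kernel registration is one way to allow a turbo3-only first PR. turbo2 and turbo4 keep reporting unsupported until their kernels land. The alternative is claiming support for a type with no kernel behind it, which would just move the failure from scheduling time to compute time.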
Why we can't do this in-house
We have zero Intel GPUs. SYCL is community-maintained for this fork. Best path is for someone with Intel hardware to PR.
Hardware needed
- Intel Arc A380 / A770 (or any DG2-class+) for real validation
- Intel Tiber Cloud has free A770 access if you don't have local hardware
Reference for porters
cc @cclecle — if you're up for opening a PR off your fork (just turbo3 to start, or full turbo2+3+4), I'll review and merge.