sycl: add SET_ROWS support for turbo2/turbo3/turbo4 V cache #120

@TheTom

Description

Tracking SYCL backend coverage for the turbo2/turbo3/turbo4 V-cache codepath. Vulkan was fixed in #118; SYCL is the same shape of bug but a separate backend.

Background

@cclecle (Intel A380, oneAPI) hit the same SET_ROWS abort on SYCL in #50:

ggml_backend.cpp:898: pre-allocated tensor (cache_k_l3 (view)) in a buffer (SYCL0)
that cannot run the operation (SET_ROWS)

He has a working starting point with a Claude-implemented SYCL kernel for turbo3:

https://github.com/cclecle/llama-cpp-turboquant/tree/feature/turboquant-kv-cache

What's needed

Mirror the Vulkan PR #118 pattern in ggml/src/ggml-sycl/:

  1. Register TURBO2_0 / TURBO3_0 / TURBO4_0 as supported targets in the SYCL SET_ROWS op
  2. Wire the per-bitwidth quant kernels (cclecle has turbo3, turbo2/turbo4 still TODO)
  3. Add the dispatch entry so the scheduler doesn't fall back to abort
  4. Verify end-to-end with llama-bench -ctk q4_0 -ctv turboN -fa 1 on real Intel hardware (Arc A380 / A770)
  5. Output coherence check: llama-cli -p "The capital of France is" -n 16 with -ctv turbo4
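
As a rough illustration of how steps 1 to 3 fit together, here is a minimal standalone C++ sketch. It does not use the real ggml or SYCL APIs: every name in it (sketch_type, sketch_set_rows_kernel, sketch_set_rows_supported, sketch_set_rows_run) is hypothetical, and the "kernels" are placeholders that only return their bit width. The one idea it tries to show is deriving the "is this type supported?" answer from the same table that performs the dispatch, so the two cannot drift apart and a type is never advertised without a kernel behind it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for ggml type ids; this is NOT the real ggml enum.
// Q4_0 is included only as an example of a type this sketch does not handle.
enum class sketch_type { Q4_0, TURBO2_0, TURBO3_0, TURBO4_0 };

// Signature of a hypothetical per-bitwidth SET_ROWS launcher.
using sketch_kernel_fn = int (*)(const float * src, std::size_t n);

// Placeholder launchers (step 2). A real port would submit a SYCL kernel here;
// these only return the bit width so the dispatch can be checked.
static int sketch_set_rows_turbo2(const float *, std::size_t) { return 2; }
static int sketch_set_rows_turbo3(const float *, std::size_t) { return 3; }
static int sketch_set_rows_turbo4(const float *, std::size_t) { return 4; }

// Dispatch entry (step 3): map a destination type to its launcher,
// or nullptr when no kernel exists for that type.
static sketch_kernel_fn sketch_set_rows_kernel(sketch_type t) {
    switch (t) {
        case sketch_type::TURBO2_0: return sketch_set_rows_turbo2;
        case sketch_type::TURBO3_0: return sketch_set_rows_turbo3;
        case sketch_type::TURBO4_0: return sketch_set_rows_turbo4;
        default:                    return nullptr;
    }
}

// Support query (step 1): answered from the dispatch table itself.
static bool sketch_set_rows_supported(sketch_type t) {
    return sketch_set_rows_kernel(t) != nullptr;
}

// Run path: asserts instead of silently doing nothing for an unsupported type.
static int sketch_set_rows_run(sketch_type t, const float * src, std::size_t n) {
    assert(sketch_set_rows_supported(t) && "no SET_ROWS kernel for this type");
    return sketch_set_rows_kernel(t)(src, n);
}
```

In this sketch, calling sketch_set_rows_supported(sketch_type::TURBO3_0) yields true and sketch_set_rows_run(sketch_type::TURBO4_0, nullptr, 0) reaches the turbo4 placeholder. How the actual SYCL backend structures its support check and dispatch may differ; PR #118 is the pattern to follow for the real code.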

Why we can't do this in-house

We have zero Intel GPUs. SYCL is community-maintained for this fork. Best path is for someone with Intel hardware to PR.

Hardware needed

  • Intel Arc A380 / A770 (or any DG2-class+) for real validation
  • Intel Tiber Cloud has free A770 access if you don't have local hardware

Reference for porters

cc @cclecle — if you're up for opening a PR off your fork (just turbo3 to start, or full turbo2+3+4), I'll review and merge.
