feat: kvbm-v2 xpu-sycl enablement #87
@statiraju, could you please help review? The xpu-sycl enablement above is based on kvbm-v2 on the main branch and is for reference only. We'd like to follow your multi-device/backend design, and your team is the proper maintainer of the device abstraction. It would be great to work closely with you so that we can sync on this feature at each important milestone without the extra overhead of refactoring it again. XPU/system perf tuning, which is the foundation of KVBM v2, may be more critical. Our next step is to measure XPU perf as illustrated in the description above, so it's important to know which crates and operations will still serve KVBM v2. The XPU bench work can be executed in parallel. Thanks. Please add other kvbm-v2 reviewers/designers if needed.
Describe the design and implementation details of how Intel XPU (SYCL/oneAPI) was integrated into KVBM v2 alongside the existing NVIDIA CUDA backend: the trait surfaces that were extracted, the SYCL implementations that were added, and the crate-level wiring that keeps KVBM v2 engine-agnostic and framework-agnostic. The document covers the state of the branch, the evolution from the CUDA-only baseline, and the relationships between the crates under lib/ that make up KVBM v2.
Signed-off-by: Zhan Xue <zhan.xue@intel.com>
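To make the idea of an extracted trait surface concrete, here is a minimal sketch of how engine-side code can depend on a backend trait instead of a specific device API. All names here (`DeviceKind`, `DeviceBackend`, `MockBackend`, `transfer_blocks`) are hypothetical and are not taken from the KVBM v2 crates; the mock copies host memory only, where a real backend would presumably issue a CUDA or SYCL device copy.

```rust
use std::fmt;

/// Hypothetical device kinds; the real crates may name these differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceKind {
    Cuda,
    Sycl,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum CopyError {
    LengthMismatch { src: usize, dst: usize },
}

impl fmt::Display for CopyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CopyError::LengthMismatch { src, dst } => {
                write!(f, "length mismatch: src={src} bytes, dst={dst} bytes")
            }
        }
    }
}

impl std::error::Error for CopyError {}

/// Hypothetical trait surface: the only thing engine code needs to know.
pub trait DeviceBackend {
    fn kind(&self) -> DeviceKind;
    /// Copy `src` into `dst`, returning the number of bytes copied.
    fn copy(&self, src: &[u8], dst: &mut [u8]) -> Result<usize, CopyError>;
}

/// Host-memory stand-in (assumption): no device API is called here.
pub struct MockBackend {
    kind: DeviceKind,
}

impl MockBackend {
    pub fn new(kind: DeviceKind) -> Self {
        Self { kind }
    }
}

impl DeviceBackend for MockBackend {
    fn kind(&self) -> DeviceKind {
        self.kind
    }

    fn copy(&self, src: &[u8], dst: &mut [u8]) -> Result<usize, CopyError> {
        if src.len() != dst.len() {
            return Err(CopyError::LengthMismatch { src: src.len(), dst: dst.len() });
        }
        dst.copy_from_slice(src);
        Ok(src.len())
    }
}

/// Backend-agnostic caller: pairs up blocks and copies each one,
/// stopping at the first error. Returns the total bytes copied.
pub fn transfer_blocks(
    backend: &dyn DeviceBackend,
    blocks: &[Vec<u8>],
    out: &mut [Vec<u8>],
) -> Result<usize, CopyError> {
    let mut total = 0;
    for (src, dst) in blocks.iter().zip(out.iter_mut()) {
        total += backend.copy(src, dst)?;
    }
    Ok(total)
}
```

Because `transfer_blocks` takes `&dyn DeviceBackend`, swapping `MockBackend::new(DeviceKind::Cuda)` for `MockBackend::new(DeviceKind::Sycl)` requires no change on the caller side, which is the property the description refers to as engine-agnostic.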
Please align with DEP ai-dynamo/dynamo#9313.
Implementation PR:
ai-dynamo/dynamo#7946
Question:
We need to know what will and will not change for KVBM v2, e.g. kvbm-common, kvbm-config, kvbm-engine, kvbm-kernels, kvbm-logical, kvbm-physical, memory, bindings, etc.
Concern:
Measure XPU perf (kvbm_v2_xpu_sycl_enablement.md) at different API layers:
- raw API: kvbench in kvbm-kernels, with the Intel SYCL Rust binding
- bench_transfer: the transfer manager API in kvbm-physical
- bench_engine: in kvbm-engine

Each layer should use unified, abstracted processes for batch copy, vectorized copy (GPU kernel), oneCCL (broadcast for MLA), NUMA pinned memory allocation, etc.
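One way to keep measurements comparable across the layers described above is to time every operation through the same small harness. The sketch below is an illustration only: `BenchResult`, `bench_copy`, and `host_batch_copy` are hypothetical names, not the kvbench, bench_transfer, or bench_engine APIs, and the copy is a host-memory stand-in rather than a SYCL or CUDA transfer.

```rust
use std::time::{Duration, Instant};

/// Result of one benchmark run (hypothetical type for this sketch).
pub struct BenchResult {
    pub label: String,
    /// Bytes moved per iteration.
    pub bytes: usize,
    pub iters: u32,
    pub elapsed: Duration,
}

impl BenchResult {
    /// Throughput in GiB/s over all timed iterations.
    pub fn gib_per_sec(&self) -> f64 {
        let total_bytes = self.bytes as f64 * self.iters as f64;
        total_bytes / self.elapsed.as_secs_f64() / (1u64 << 30) as f64
    }
}

/// Time `op` for `iters` iterations after one untimed warm-up call.
/// The same harness can wrap a raw-API call, a transfer-manager call,
/// or an engine-level call, so the numbers are measured the same way.
pub fn bench_copy<F: FnMut()>(label: &str, bytes: usize, iters: u32, mut op: F) -> BenchResult {
    op(); // warm-up, excluded from the timing
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    BenchResult {
        label: label.to_string(),
        bytes,
        iters,
        elapsed: start.elapsed(),
    }
}

/// Host-memory stand-in for a batch copy (assumption): copies each
/// source block into the destination block at the same index.
pub fn host_batch_copy(src: &[Vec<u8>], dst: &mut [Vec<u8>]) {
    for (s, d) in src.iter().zip(dst.iter_mut()) {
        d.copy_from_slice(s);
    }
}
```

A call such as `bench_copy("host_batch_copy", 8 * 1024, 10, || host_batch_copy(&src, &mut dst))` returns a `BenchResult` whose `gib_per_sec()` can be reported per layer. For asynchronous device copies, the closure would also need to wait for completion before returning, otherwise only the enqueue cost is measured.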