[XPU][NIXL] Add GPUDirect RDMA support for XPU#35270
cc. @xuechendi, @rogerxfeng8, @yma11
Code Review
This pull request adds GPUDirect RDMA support for XPU in the NIXL connector. The changes involve updating the UCX build script to a specific commit that includes the necessary fixes, enabling Level Zero support, and modifying the NIXL connector and XPU platform code to support XPU device memory for KV transfer. The changes are generally correct and well-targeted. I have one suggestion to improve the precision of a workaround to avoid potential side effects.
@zhenwei-intel,
The FP8 KV cache PR has not been upstreamed yet, so only the BF16 KV cache can be used on the upstream branch. The current performance testing is based on the FP8 KV cache (cherry-picked from https://github.com/intel-innersource/applications.ai.gpu.vllm-xpu/pull/57) with a scale of 1.0.
There is no need to transfer the scale, since the models on both the P (prefill) and D (decode) instances should already have this information.
xuechendi
left a comment
LGTM. @NickLucche, could you help review?
NickLucche
left a comment
this looks ok on my side, nit on buffer naming
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Force-pushed from 2f0f906 to ff5fcd2.
@jikunshang @1643661061leo I updated Dockerfile.xpu; please take another look.
Pull request overview
Adds GPUDirect RDMA support for XPU in the NIXL connector by enabling XPU VRAM KV-buffer paths, applying a UCX workaround for memtype misdetection, and updating the XPU Docker image to build UCX+NIXL from source with RDMA dependencies.
Changes:
- Enable `"xpu"` as a KV buffer device for the NIXL connector and map it to the VRAM memory type.
- Apply a UCX environment workaround on XPU to avoid memtype-cache misdetection.
- Update `Dockerfile.xpu` to build UCX (pinned commit) and NIXL from source and install RDMA tooling/libs.
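As a rough illustration of the first change, the device-to-memory-type mapping might look like the sketch below. This is not the connector's actual code: the table name `KV_DEVICE_TO_NIXL_MEM` and the helpers `nixl_memory_type` and `kv_transfer_config` are hypothetical, and the `"cuda"` and `"cpu"` entries as well as the JSON keys (`kv_connector`, `kv_role`, `kv_buffer_device`) are assumptions about how a NIXL-based KV transfer config is usually written.

```python
import json

# Hypothetical lookup table: KV buffer device name -> NIXL memory type.
# Only the "xpu" -> "VRAM" entry is taken from the change list above;
# the other entries are assumptions for illustration.
KV_DEVICE_TO_NIXL_MEM = {
    "cuda": "VRAM",
    "xpu": "VRAM",
    "cpu": "DRAM",
}


def nixl_memory_type(kv_buffer_device: str) -> str:
    """Return the NIXL memory type for a KV buffer device, or raise."""
    try:
        return KV_DEVICE_TO_NIXL_MEM[kv_buffer_device]
    except KeyError:
        raise ValueError(
            f"unsupported KV buffer device: {kv_buffer_device!r}"
        ) from None


def kv_transfer_config(kv_buffer_device: str) -> str:
    """Build a JSON KV-transfer config string (key names are assumed)."""
    nixl_memory_type(kv_buffer_device)  # validate the device first
    return json.dumps(
        {
            "kv_connector": "NixlConnector",
            "kv_role": "kv_both",
            "kv_buffer_device": kv_buffer_device,
        }
    )


if __name__ == "__main__":
    print(nixl_memory_type("xpu"))
    print(kv_transfer_config("xpu"))
```

Keeping the mapping in one table means an unsupported device fails early with a clear error instead of silently falling back to host memory.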
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| vllm/platforms/xpu.py | Sets UCX env var to avoid UCX memtype-cache misdetection on XPU. |
| vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py | Allows XPU KV buffers and maps XPU device to VRAM in NIXL. |
| docker/Dockerfile.xpu | Builds UCX/NIXL from source and installs RDMA dependencies for XPU images. |
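The environment workaround in `vllm/platforms/xpu.py` could be sketched roughly as follows. The summary above does not name the variable; this sketch assumes it is `UCX_MEMTYPE_CACHE`, a UCX setting that is commonly set to `n` to disable the memory-type cache. The function name `apply_ucx_memtype_workaround` is hypothetical.

```python
import os
from collections.abc import MutableMapping
from typing import Optional


def apply_ucx_memtype_workaround(
    env: Optional[MutableMapping] = None,
) -> str:
    """Disable the UCX memtype cache unless the user already set it.

    Assumption: the relevant variable is UCX_MEMTYPE_CACHE. The PR text
    only says that a UCX environment variable is set on XPU.
    """
    target = os.environ if env is None else env
    # setdefault keeps an explicit user override intact.
    target.setdefault("UCX_MEMTYPE_CACHE", "n")
    return target["UCX_MEMTYPE_CACHE"]


if __name__ == "__main__":
    print(apply_ucx_memtype_workaround({}))
```

Using `setdefault` rather than plain assignment is one way to limit side effects: a value the user exported before launching the process is left untouched.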
Merging as CI passed. Thanks for your contribution!
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Purpose
Add GPUDirect RDMA support for XPU in NIXL connector.
Requirements:
Limitations:
Test Plan
Performance data for the Llama3.3-70B int4 model with FP8 KV cache on 8xB60, ISL=1500, OSL=150
2P1D vs. non-PD under the SLO TTFT < 5 s, ITL < 100 ms
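For readers reproducing the comparison, the SLO filter can be expressed as a small check. This is only a sketch: the `Slo` class and `meets_slo` helper are hypothetical names, the thresholds are the two limits quoted above, and the numbers in the usage lines are placeholder inputs, not measurements from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Latency limits quoted in the test plan (strict upper bounds)."""

    max_ttft_s: float = 5.0    # time to first token, seconds
    max_itl_ms: float = 100.0  # inter-token latency, milliseconds


def meets_slo(ttft_s: float, itl_ms: float, slo: Slo = Slo()) -> bool:
    """Return True only if both latencies are strictly under their limits."""
    return ttft_s < slo.max_ttft_s and itl_ms < slo.max_itl_ms


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(meets_slo(ttft_s=3.2, itl_ms=80.0))
    print(meets_slo(ttft_s=5.0, itl_ms=80.0))
```

A run counts toward the comparison only when both limits hold at once, so a configuration with low TTFT but ITL at or above 100 ms is excluded.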
PD commands
prefill
prefill2
decode
proxy
Test Result