[XPU][NIXL] Add GPUDirect RDMA support for XPU by zhenwei-intel · Pull Request #35270 · vllm-project/vllm

zhenwei-intel · 2026-02-25T07:52:27Z

Purpose

Add GPUDirect RDMA support for XPU in NIXL connector.

Requirements:

UCX must include the fix from openucx/ucx#11187 (UCT/IB/ZE: Enable GPUDirect RDMA for Intel Xe devices).

Limitations:

UCX_NET_DEVICES must be set manually for better performance until openucx/ucx#11180 (UCT/ZE: Add device topology registration) is merged.
ze_ipc is not supported yet; see openucx/ucx#11218 (UCT/ZE/ZE_IPC: enable level zero ipc support for Intel GPUs).
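
As a minimal sketch of the manual workaround described above, the snippet below pins UCX to one RDMA NIC and to the transports used by the test commands in this PR. The device name `mlx5_0:1` is a placeholder assumption, not a value taken from this PR; substitute a device reported by `ucx_info -d` on your host.

```shell
#!/usr/bin/env bash
# Placeholder NIC (assumption): replace with a device listed by `ucx_info -d`.
# UCX expects the form <device>:<port>.
export UCX_NET_DEVICES="${UCX_NET_DEVICES:-mlx5_0:1}"

# Transports used by the test commands in this PR; ze_ipc is left out
# because it is not supported yet.
export UCX_TLS="ib,rc,ze_copy"

echo "UCX_NET_DEVICES=${UCX_NET_DEVICES} UCX_TLS=${UCX_TLS}"
```

The `${VAR:-default}` form keeps any value already exported in the environment, so the placeholder only applies when nothing has been set.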

Test Plan

Performance data for the Llama3.3-70B int4 model with fp8 KV cache on 8xB60, ISL=1500, OSL=150.
2P1D vs. non-PD under SLO: TTFT < 5s, ITL < 100ms.

Serves more requests: under the SLO, 2P1D achieved a request throughput of 1.06, compared to 0.64 for non-PD, a roughly 1.65x improvement.
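
The improvement factor follows directly from the two throughput numbers quoted above. A quick check, using `awk` for the floating-point division:

```shell
#!/usr/bin/env bash
# Request throughput under the SLO, as reported above.
pd_throughput=1.06      # 2P1D
non_pd_throughput=0.64  # non-PD

# awk handles the floating-point division that plain shell arithmetic cannot.
ratio=$(awk -v a="$pd_throughput" -v b="$non_pd_throughput" 'BEGIN { printf "%.2f", a / b }')
echo "2P1D / non-PD = ${ratio}x"
```

The exact quotient is 1.65625, so this prints 1.66 when rounded to two decimals; the 1.65x figure in the description is the same value truncated.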

PD commands

prefill

export UCX_TLS=ib,rc,ze_copy

export ZE_AFFINITY_MASK=2,3
export model_name=ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4
export tp_size=2


VLLM_USE_V1=1 VLLM_NIXL_SIDE_CHANNEL_HOST=localhost VLLM_NIXL_SIDE_CHANNEL_PORT=5577 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_ENABLE_V1_MULTIPROCESSING=1 vllm serve $model_name -tp $tp_size --host localhost --port 7101 --seed 42 --enforce-eager --dtype float16 --gpu-memory-utilization 0.9 --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"xpu"}' --max-model-len 8192 --block-size 64 --no-enable-prefix-caching --kv-cache-dtype fp8

prefill2

export UCX_TLS=ib,rc,ze_copy

export ZE_AFFINITY_MASK=4,5
export model_name=ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4
export tp_size=2


VLLM_USE_V1=1 VLLM_NIXL_SIDE_CHANNEL_HOST=localhost VLLM_NIXL_SIDE_CHANNEL_PORT=5377 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_ENABLE_V1_MULTIPROCESSING=1 vllm serve $model_name -tp $tp_size --host localhost --port 7102 --seed 42 --enforce-eager --dtype float16 --gpu-memory-utilization 0.9 --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"xpu"}' --max-model-len 8192 --block-size 64 --no-enable-prefix-caching --kv-cache-dtype fp8

decode

export UCX_TLS=ib,rc,ze_copy

export ZE_AFFINITY_MASK=0,1
export model_name=ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4
export tp_size=2


VLLM_USE_V1=1 VLLM_NIXL_SIDE_CHANNEL_HOST=localhost VLLM_NIXL_SIDE_CHANNEL_PORT=5177 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_ENABLE_V1_MULTIPROCESSING=1 vllm serve $model_name -tp $tp_size --host localhost --port 7201 --seed 42 --enforce-eager --dtype float16 --gpu-memory-utilization 0.9 --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"xpu"}' --max-model-len 8192 --block-size 64 --no-enable-prefix-caching --kv-cache-dtype fp8
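
The three launch commands above differ only in the device mask, the NIXL side-channel port, and the HTTP port. As a reading aid, the sketch below captures that pattern in a hypothetical helper, `print_launch_cmd` (not part of this PR). It only prints each command rather than running it, so it is safe to try without vLLM installed.

```shell
#!/usr/bin/env bash
model_name="ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4"
tp_size=2
kv_config='{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"xpu"}'

# Hypothetical helper: prints (does not execute) the launch command for one instance.
# Args: <ZE_AFFINITY_MASK> <NIXL side-channel port> <HTTP port>
print_launch_cmd() {
  local mask="$1" side_port="$2" http_port="$3"
  printf '%s ' \
    "UCX_TLS=ib,rc,ze_copy" \
    "ZE_AFFINITY_MASK=${mask}" \
    "VLLM_USE_V1=1" \
    "VLLM_NIXL_SIDE_CHANNEL_HOST=localhost" \
    "VLLM_NIXL_SIDE_CHANNEL_PORT=${side_port}" \
    "VLLM_WORKER_MULTIPROC_METHOD=spawn" \
    "VLLM_ENABLE_V1_MULTIPROCESSING=1" \
    "vllm serve ${model_name} -tp ${tp_size}" \
    "--host localhost --port ${http_port}" \
    "--seed 42 --enforce-eager --dtype float16 --gpu-memory-utilization 0.9" \
    "--kv-transfer-config '${kv_config}'" \
    "--max-model-len 8192 --block-size 64 --no-enable-prefix-caching --kv-cache-dtype fp8"
  printf '\n'
}

# Values copied from the prefill, prefill2, and decode commands above.
prefill1_cmd=$(print_launch_cmd 2,3 5577 7101)
prefill2_cmd=$(print_launch_cmd 4,5 5377 7102)
decode_cmd=$(print_launch_cmd 0,1 5177 7201)

printf '%s\n' "$prefill1_cmd" "$prefill2_cmd" "$decode_cmd"
```

Each instance needs its own side-channel port and a disjoint `ZE_AFFINITY_MASK`, which is why those are the only arguments the helper takes.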

proxy

python3 tests/v1/kv_connector/nixl_integration/toy_proxy_server.py --prefiller-hosts localhost localhost --prefiller-ports 7101 7102 --decoder-host localhost --decoder-port 7201 --host localhost --port 7300
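
Once the proxy is up, a single request through it exercises the prefill-to-decode KV transfer. The sketch below assumes the toy proxy exposes an OpenAI-compatible `/v1/completions` route on the port given above; the prompt and `max_tokens` value are arbitrary choices for illustration. The request is only sent when `SEND=1` is set, so the snippet can be run without a live server to inspect the payload.

```shell
#!/usr/bin/env bash
PROXY_URL="http://localhost:7300"
MODEL="ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4"

# Build a small completion request; prompt and max_tokens are arbitrary.
PAYLOAD=$(printf '{"model":"%s","prompt":"%s","max_tokens":%d}' \
  "$MODEL" "Hello, my name is" 16)
echo "$PAYLOAD"

# Send only when explicitly requested (SEND=1), since this needs the proxy
# and all three vLLM instances to be running.
# The /v1/completions route is an assumption about the toy proxy.
if [ "${SEND:-0}" = "1" ]; then
  curl -sS "${PROXY_URL}/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
fi
```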

Test Result

zhenwei-intel · 2026-02-25T07:53:35Z

cc. @xuechendi, @rogerxfeng8, @yma11

gemini-code-assist

Code Review

This pull request adds GPUDirect RDMA support for XPU in the NIXL connector. The changes involve updating the UCX build script to a specific commit that includes the necessary fixes, enabling Level Zero support, and modifying the NIXL connector and XPU platform code to support XPU device memory for KV transfer. The changes are generally correct and well-targeted. I have one suggestion to improve the precision of a workaround to avoid potential side effects.

xuechendi · 2026-02-25T15:33:15Z

@zhenwei-intel, regarding --kv-cache-dtype fp8: we can't transfer the scale at the moment, so accuracy is impacted. The impact might not be significant for simple text.

zhenwei-intel · 2026-02-26T01:01:15Z

The FP8 KV cache PR hasn't been upstreamed yet. On the upstream branch, BF16 KV cache can be used instead.

The current performance testing is based on FP8 KV Cache (cherry-picked https://github.com/intel-innersource/applications.ai.gpu.vllm-xpu/pull/57), with a scale of 1.0.

hshen14 · 2026-02-27T00:45:59Z

There is no need to transfer the scale, since the models on both the P and D instances should already have this information.

xuechendi

LGTM. @NickLucche, could you help review?

NickLucche

this looks ok on my side, nit on buffer naming

NickLucche

synced with @xuechendi

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

zhenwei-intel · 2026-02-28T08:05:34Z

@jikunshang @1643661061leo I updated the dockerfile.xpu, please take another look.

Copilot

Pull request overview

Adds GPUDirect RDMA support for XPU in the NIXL connector by enabling XPU VRAM KV-buffer paths, applying a UCX workaround for memtype misdetection, and updating the XPU Docker image to build UCX+NIXL from source with RDMA dependencies.

Changes:

Enable "xpu" as a KV buffer device for the NIXL connector and map it to VRAM memory type.
Apply a UCX environment workaround on XPU to avoid memtype-cache misdetection.
Update Dockerfile.xpu to build UCX (pinned commit) and NIXL from source and install RDMA tooling/libs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| vllm/platforms/xpu.py | Sets UCX env var to avoid UCX memtype-cache misdetection on XPU. |
| vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py | Allows XPU KV buffers and maps XPU device to VRAM in NIXL. |
| docker/Dockerfile.xpu | Builds UCX/NIXL from source and installs RDMA dependencies for XPU images. |

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

jikunshang · 2026-03-03T00:42:40Z

merge as ci passed. thanks for your contribution!

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

zhenwei-intel requested review from ApostaC, NickLucche, jikunshang and orozery as code owners February 25, 2026 07:52

mergify Bot added the kv-connector label Feb 25, 2026

gemini-code-assist Bot reviewed Feb 25, 2026

Comment thread vllm/platforms/xpu.py

jikunshang reviewed Feb 25, 2026

Comment thread tools/install_nixl_from_source_ubuntu.py Outdated

Comment thread vllm/platforms/xpu.py

Comment thread tools/install_nixl_from_source_ubuntu.py Outdated

mergify Bot added the ci/build label Feb 25, 2026

jikunshang reviewed Feb 25, 2026

Comment thread tools/install_nixl_from_source_ubuntu.py Outdated

xuechendi reviewed Feb 26, 2026

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

xuechendi reviewed Feb 26, 2026

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py Outdated

xuechendi approved these changes Feb 27, 2026

NickLucche reviewed Feb 27, 2026

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py Outdated

NickLucche added the ready label Feb 27, 2026

NickLucche approved these changes Feb 27, 2026

jikunshang approved these changes Feb 28, 2026

zhenwei-intel added 6 commits February 27, 2026 20:35

[XPU][NIXL] support GPUDirect RDMA

538b288

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

make UCX_VERSION as parameter

cd1fbba

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

make with-ze as parameter

f3aaa47

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

remove unused comments

2185c10

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

update of install nixl and ucx

ff5fcd2

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

update nixl memory type

23743b9

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

zhenwei-intel force-pushed the xpu_pd_2026 branch from 2f0f906 to ff5fcd2 Compare February 28, 2026 04:53

update dockerfile

06733ec

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

jikunshang reviewed Feb 28, 2026

Comment thread docker/Dockerfile.xpu

Comment thread docker/Dockerfile.xpu Outdated

jikunshang requested a review from Copilot February 28, 2026 08:26

Copilot AI reviewed Feb 28, 2026

Copilot started reviewing on behalf of jikunshang February 28, 2026 08:37

zhenwei-intel and others added 2 commits February 28, 2026 18:52

update dockerfile

795da8e

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

Merge branch 'main' into xpu_pd_2026

6079f20

jikunshang merged commit 9dd656f into vllm-project:main Mar 3, 2026
115 checks passed

Yanli2190 mentioned this pull request Mar 10, 2026

[XPU][NixlConnector] Add ze_ipc transport support for single-node PD disaggregation #36625

Closed

Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026

[XPU][NIXL] Add GPUDirect RDMA support for XPU (vllm-project#35270)

7f6ffac

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Mar 12, 2026

[XPU][NIXL] Add GPUDirect RDMA support for XPU (vllm-project#35270)

1c9f268

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Spycsh mentioned this pull request Mar 17, 2026

feat: add GPUDirect support for intel xpu on Dynamo ai-dynamo/dynamo#5852

Merged

wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026

[XPU][NIXL] Add GPUDirect RDMA support for XPU (vllm-project#35270)

1152af4

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

[XPU][NIXL] Add GPUDirect RDMA support for XPU (vllm-project#35270)

bcb882e

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request May 19, 2026

[XPU][NIXL] Add GPUDirect RDMA support for XPU (vllm-project#35270)

4202d55

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
