[Fix] Failed to start vLLM service using multi-node launch scripts under CUDA data parallelism#670

Merged
mag1c-h merged 1 commit into ModelEngine-Group:develop from sumingZero:ray_script
Jan 23, 2026

@sumingZero
Contributor

Purpose

What this PR does / why we need it?

Launching a multi-node data-parallel service with run_vllm.sh fails at startup because the script does not pass the --data-parallel-backend and --data-parallel-size-local arguments to vLLM. This commit fixes the issue by adding the missing arguments.
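
For reviewers who want a picture of what the added arguments look like, here is a minimal sketch, not the actual contents of run_vllm.sh. The helper names (dp_size_local, build_dp_args), the even split of data-parallel ranks across nodes, and the choice of ray as the backend value are all assumptions made for illustration.

```shell
# Sketch only: helper names and the even per-node split are assumptions,
# not taken from run_vllm.sh.

# Number of data-parallel ranks hosted on each node, assuming an even split.
dp_size_local() {
  local dp_size="$1" num_nodes="$2"
  if (( dp_size % num_nodes != 0 )); then
    echo "DP size ${dp_size} is not divisible by ${num_nodes} nodes" >&2
    return 1
  fi
  echo $(( dp_size / num_nodes ))
}

# Print the parallelism arguments to append to the vLLM launch command.
build_dp_args() {
  local dp_size="$1" tp_size="$2" num_nodes="$3" backend="$4"
  local local_size
  local_size="$(dp_size_local "$dp_size" "$num_nodes")" || return 1
  printf '%s\n' "--data-parallel-size ${dp_size} --data-parallel-size-local ${local_size} --data-parallel-backend ${backend} --tensor-parallel-size ${tp_size}"
}
```

Under these assumptions, a two-node run with DP=4 and TP=4 would call build_dp_args 4 4 2 ray, which places two data-parallel ranks on each node, and append the printed arguments to the vllm serve command line.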

Test

How was this patch tested?
Verified that the script can successfully launch the service in a dual-node DP=4, TP=4 configuration.

@flesher0813 flesher0813 changed the title [Fix] Failed to start vLLLM service using multi-node launch scripts u… [Fix] Failed to start vLLM service using multi-node launch scripts under CUDA data parallelism Jan 23, 2026
@mag1c-h mag1c-h merged commit 92de9f9 into ModelEngine-Group:develop Jan 23, 2026
6 checks passed