fix(configs): remove vestigial client dp_rank_count / X-data-parallel-rank#2708
Open
S1ro1 wants to merge 1 commit into
Open
fix(configs): remove vestigial client dp_rank_count / X-data-parallel-rank#2708S1ro1 wants to merge 1 commit into
S1ro1 wants to merge 1 commit into
Conversation
…-rank dp-rank routing (#1940) let the client pin a rollout to a DP shard within a single hybrid-LB server via the X-data-parallel-rank header. #2696 moved to external-LB everywhere (one endpoint per DP rank, router-balanced) and dropped hybrid-LB, leaving the header vestigial. Worse, the regular vllm-router forwards client headers to the worker and a single-DP backend rejects an out-of-range rank ("data_parallel_rank N is out of range"), so dense multi-node external-LB could fail on every request with rank > 0. Remove client.dp_rank_count and the per-rank client expansion: setup_clients now creates one client per base_url with no DP-rank header, and the router load-balances across the per-rank endpoints. The server-side reader already tolerates an absent header. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
samsja
approved these changes
Jun 4, 2026
samsja
requested changes
Jun 4, 2026
Member
samsja
left a comment
There was a problem hiding this comment.
ah wait actually do we have router on single node ? if not we cant remove
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Removes the vestigial client-side
dp_rank_count/X-data-parallel-rankmechanism.Background
dp_rank_count+ theX-data-parallel-rankheader come from #1940 ("add dp-rank-routing"): with a single hybrid-LB server fronting multiple DP engines, the client pinned each rollout to a DP shard for KV reuse. #2696 ("use external-LB everywhere, drop hybrid-LB") replaced that with one endpoint per DP rank, router-balanced — so the header no longer has a job.It isn't just dead, it's harmful. The regular
vllm-routerforwards client headers to the worker, and vLLM validates the rank:Single-DP backends (
--data-parallel-size 1, how dense external-LB launches) havenum_ranks = 1, so any request carrying rank > 0 is rejected. Affinity / KV reuse is now the router's job (prefix-cache scoring), not the client's.Change
client.dp_rank_countfromClientConfig.setup_clients: one client perbase_url, noX-data-parallel-rankheader.dp_rank_countauto-set in the RL resolver and the field pass-through in the elastic pool.client_identitykeys onapi_base_urlalone (each DP rank is its own endpoint).No checked-in config sets
dp_rank_count, and the server-side header reader already handles its absence. Breaking only for external configs that set it (extra="forbid"rejects it); nothing to migrate.Related: #2707 (NCCL world-size) — together they fully fix dense multi-node external-LB.
🤖 Generated with Claude Code
Note
Medium Risk
Breaking config removal (
extra="forbid") and changed inference routing identity; fixes harmful rank headers on single-DP vLLM but requires one base URL per DP rank in user configs.Overview
Removes client-side
dp_rank_countand theX-data-parallel-rankheader path that expanded eachbase_urlinto multiple pinned DP clients. With external-LB, each DP rank is already a separate endpoint inclient.base_url, so the orchestrator now builds one inference client per URL and load-balances onapi_base_urlonly.ClientConfigdrops the field (breaking for configs that still set it).setup_clientsno longer loops ranks or sets rank headers. The RLauto_setup_inference_clienthook stops copying inference DP size into the client. Elastic client rebuilds no longer forwarddp_rank_count. CHANGELOG documents the migration (delete the key; no replacement).Reviewed by Cursor Bugbot for commit 55d3fa8. Bugbot is set up for automated code reviews on this repo. Configure here.