feat: llm-d (EPP+Envoy) router backend by S1ro1 · Pull Request #2697 · PrimeIntellect-ai/prime-rl

S1ro1 · 2026-06-03T02:58:23Z

What

Adds llm-d (the upstream llm-d Endpoint Picker + Envoy) as a second router backend alongside vllm-router, selected via [inference.deployment.router] type = "llm-d". Built on the external-LB substrate from #2696 (merged) — the per-rank vLLM engines launch identically; only the router control plane differs.

2 commits:

- refactor: drop client-side DP-rank pinning for external-LB. External-LB gives each DP rank its own endpoint (the URL is the rank selector), so the client no longer needs the hybrid-LB X-data-parallel-rank header. Removes dp_rank_count + the per-rank client expansion (one client per base URL; the router balances). Also fixes llm-d: the EPP forwards the header to the dp=1 backend, which rejected it.
- feat: llm-d (EPP+Envoy) router backend.
  - LlmdRouterConfig (discriminated-union member: scorers + per-profile prefill_scorer_overrides/decode_scorer_overrides, non_cached_tokens, decode_sidecar_port, known-scorer validator).
  - An llm-d branch in the shared launch_router helper (renders per-replica EPP + Envoy + file-discovery endpoints, launches epp+envoy).
  - The pd-sidecar on decode nodes for P/D.
  - Entrypoints pass the router config object to the templates.
  - SLURM cleanup clears stale epp/envoy/pd-sidecar procs.
  - Presets under templates/llmd/ + scripts/install_llmd.sh; docs + install skill.
  - Rejects enable_return_routed_experts / trainer.enable_router_replay with llm-d (breaks P/D, unverified for multi-node).

E2E verification (SLURM)

| Path | Result |
| --- | --- |
| llm-d RL multi-node (Qwen3-0.6B) | EPP+Envoy balanced across 8 per-rank endpoints, completed all steps, 0 errors |
| llm-d RL P/D (Qwen3-30B-A3B) | routed to all 8 prefill + all 8 decode ranks, pd-sidecar + NIXL KV transfer, `errored=0`, `RL trainer finished` (incl. weight broadcast) |
| vllm-router (MN + P/D) | regression: still balanced, 0 errors |
| routed-experts rejection | rejected on both `InferenceConfig` (direct `enable_return_routed_experts`) and `RLConfig` (`trainer.enable_router_replay`); vllm-router + router replay passes |

Decode request counts being much higher than prefill request counts is intended: short or prefix-cached prompts skip remote prefill (non_cached_tokens).
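The skip can be pictured with a small sketch. It assumes that `non_cached_tokens` acts as a threshold on the number of prompt tokens not already in the prefix cache, and that the comparison is inclusive; both are assumptions, and the function name `uses_remote_prefill` is hypothetical rather than part of llm-d or prime-rl.

```python
def uses_remote_prefill(prompt_tokens: int, cached_tokens: int, non_cached_tokens: int) -> bool:
    """Decide whether a request goes to a prefill worker (illustrative only).

    Assumption: a request is prefilled remotely only when the uncached part of
    its prompt reaches the non_cached_tokens threshold; otherwise the decode
    worker handles the prefill locally.
    """
    uncached = max(prompt_tokens - cached_tokens, 0)
    return uncached >= non_cached_tokens


# A long, uncached prompt is sent to a prefill worker.
print(uses_remote_prefill(prompt_tokens=4096, cached_tokens=0, non_cached_tokens=256))
# A short prompt stays on the decode worker.
print(uses_remote_prefill(prompt_tokens=100, cached_tokens=0, non_cached_tokens=256))
# A long prompt that is mostly prefix-cached also stays on the decode worker.
print(uses_remote_prefill(prompt_tokens=4096, cached_tokens=4000, non_cached_tokens=256))
```

Under this model every request reaches a decode rank but only some reach a prefill rank, which matches the observed imbalance.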

Install

bash scripts/install_llmd.sh builds epp/envoy/pd-sidecar into third_party/llmd/bin.

Supersedes #2691 (the pre-external-LB version).

Note

Medium Risk
Touches multi-node/SLURM inference routing, P/D sidecars, and a breaking removal of dp_rank_count; misconfiguration is mostly caught by validators, but operational risk is in new native binaries and routing behavior changes.

Overview
Adds llm-d (EPP + Envoy) as a second inference router backend alongside vllm-router, selected via a discriminated [inference.deployment.router] block (type = "llm-d"). New LlmdRouterConfig exposes scorer weights, P/D prefill/decode overrides, non_cached_tokens, and decode_sidecar_port, with validation for unknown scorers and a hard error when router replay / enable_return_routed_experts is combined with llm-d (on both InferenceConfig and RLConfig).
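The two validation rules above can be sketched with plain dataclasses. This is not the PR's `LlmdRouterConfig`: the class name `LlmdRouterSketch`, the scorer names in `KNOWN_SCORERS`, the name-to-weight mapping shape, and the `"vllm-router"` type string used in the usage lines are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical scorer names; the PR does not list the accepted set.
KNOWN_SCORERS = {"prefix-cache", "queue-depth"}


@dataclass
class LlmdRouterSketch:
    # Assumed shape: scorer name -> weight.
    scorers: dict[str, float] = field(default_factory=dict)
    prefill_scorer_overrides: dict[str, float] = field(default_factory=dict)
    decode_scorer_overrides: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Known-scorer check applied to the base set and both override sets.
        for group in (self.scorers, self.prefill_scorer_overrides, self.decode_scorer_overrides):
            unknown = set(group) - KNOWN_SCORERS
            if unknown:
                raise ValueError(f"unknown scorer(s): {sorted(unknown)}")


def reject_routed_experts(router_type: str, enable_return_routed_experts: bool) -> None:
    """Raise when routed-experts return is combined with the llm-d router."""
    if router_type == "llm-d" and enable_return_routed_experts:
        raise ValueError("enable_return_routed_experts is not supported with the llm-d router")


LlmdRouterSketch(scorers={"prefix-cache": 2.0})  # accepted
reject_routed_experts("vllm-router", True)       # accepted: other backend (assumed type string)
```

Failing at config-construction time means a bad scorer name or an unsupported combination is reported before any SLURM job is submitted.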

SLURM launch is extended: _launch_router.sh.j2 renders per-replica EPP/Envoy/file-discovery configs from templates/llmd/, starts epp and envoy, and on P/D decode nodes starts pd-sidecar; cleanup kills stale llm-d processes. scripts/install_llmd.sh and install docs pin-build epp, pd-sidecar, and Envoy into third_party/llmd/bin.

Breaking client change: removes orchestrator.student.client.dp_rank_count and per-rank client expansion via X-data-parallel-rank—with external-LB, each DP rank is its own URL and the router load-balances across endpoints (documented in CHANGELOG.md).
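To make the client change concrete, the sketch below contrasts the old per-rank expansion with the new one-client-per-URL behaviour. Clients are modelled as plain dictionaries; the function names and the example URL are hypothetical, and only the header name `X-data-parallel-rank` and the `dp_rank_count` field come from the PR.

```python
def expand_clients_old(base_urls: list[str], dp_rank_count: int) -> list[dict]:
    """Old behaviour (removed): one client per (base URL, DP rank), pinned by header."""
    return [
        {"base_url": url, "headers": {"X-data-parallel-rank": str(rank)}}
        for url in base_urls
        for rank in range(dp_rank_count)
    ]


def expand_clients_new(base_urls: list[str]) -> list[dict]:
    """New behaviour: one client per base URL, no rank header; the router balances."""
    return [{"base_url": url, "headers": {}} for url in base_urls]


urls = ["http://router.example:8000/v1"]  # hypothetical router address
print(len(expand_clients_old(urls, dp_rank_count=8)))  # 8 pinned clients
print(len(expand_clients_new(urls)))                   # 1 client
```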

Entrypoints pass the full router object into templates (replacing separate port/policy vars); inference/RL sbatch templates use is_disaggregated and infer_nodes_per_replica for llm-d endpoint wiring.

Reviewed by Cursor Bugbot for commit 5d0f4ea.

With external-LB data parallelism each DP rank is its own API server on its own port (the URL is the rank selector), so the client no longer needs the hybrid-LB `X-data-parallel-rank` header to pin a rollout to an internal DP shard.

Remove the `dp_rank_count` client field + its auto-setup and the per-rank client expansion: one client per base URL, no rank header. The router (vllm-router or llm-d EPP) balances across the per-rank endpoints.

This also fixes llm-d routing: the EPP forwards the header to the dp=1 backend, which rejected it ("data_parallel_rank N is out of range").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add llm-d as a second router backend alongside vllm-router, selected via `[inference.deployment.router] type = "llm-d"`. Built on the external-LB launch substrate; the per-rank vLLM engines launch identically and only the router control plane differs.

- `LlmdRouterConfig`: discriminated-union member with `scorers` (base) + per-profile `prefill_scorer_overrides`/`decode_scorer_overrides`, `non_cached_tokens`, `decode_sidecar_port`, and a known-scorer validator.
- `launch_router` helper gains an llm-d branch: renders per-replica EPP + Envoy + file-discovery endpoints and launches `epp` + `envoy` instead of vllm-router. Call sites stay router-agnostic.
- pd-sidecar on each decode node (P/D) for remote-prefill orchestration + NIXL.
- Entrypoints pass the `router` config object to the templates.
- Reject `enable_return_routed_experts` with llm-d (breaks P/D, unverified for multi-node).
- SLURM cleanup also clears stale `epp`/`envoy`/`pd-sidecar` processes.
- Presets under `templates/llmd/` + `scripts/install_llmd.sh`; docs + install skill.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-06-03T09:20:17Z


    # Start the router on node 0 (balances across per-rank endpoints — no intra-node DP header)
    if [ "$INFER_NODE_RANK" -eq 0 ]; then
-        launch_router regular "$ROUTER_ARGS" "$ROUTER_PORT" "{{ router_policy }}" "$OUTPUT_DIR/logs/inference/router.log"


Decode sidecar missing extra nodes

Medium Severity

With llm-d P/D, file-discovery lists a decode endpoint per DP rank on every decode node (sidecar base port + rank). pd-sidecar is only started when ROLE_RANK is 0, so when num_decode_nodes exceeds num_decode_replicas (e.g. two decode nodes, one replica), non-head decode nodes never run a sidecar while the EPP still routes traffic there.

Additional Locations (2)

src/prime_rl/templates/multi_node_rl.sbatch.j2#L302-L316

src/prime_rl/templates/llmd/endpoints.yaml.j2#L18-L28
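The gap described in this report can be reproduced with a small model. The sketch assumes that decode nodes are assigned to replicas in contiguous groups, that `ROLE_RANK` is a node's index within its replica, and that the endpoint list contains one entry per local DP rank at sidecar base port + rank. The function name, the port value 9000, and 8 ranks per node are illustrative assumptions, not values from the templates.

```python
def uncovered_decode_endpoints(
    num_decode_nodes: int,
    num_decode_replicas: int,
    ranks_per_node: int,
    sidecar_base_port: int,
) -> list[tuple[int, int]]:
    """List (node index, port) endpoints that are advertised but have no sidecar.

    Models the reported behaviour: endpoints are listed for every decode node,
    but pd-sidecar is started only where the node's rank within its replica is 0.
    """
    nodes_per_replica = num_decode_nodes // num_decode_replicas
    missing = []
    for node in range(num_decode_nodes):
        role_rank = node % nodes_per_replica  # assumed meaning of ROLE_RANK
        if role_rank == 0:
            continue  # head node of the replica: sidecar is running
        for rank in range(ranks_per_node):
            missing.append((node, sidecar_base_port + rank))
    return missing


# Two decode nodes, one replica: every endpoint on the second node is uncovered.
print(uncovered_decode_endpoints(2, 1, ranks_per_node=8, sidecar_base_port=9000))
# One node per replica: nothing is uncovered.
print(uncovered_decode_endpoints(2, 2, ranks_per_node=8, sidecar_base_port=9000))
```

In this model the problem only appears when a replica spans more than one decode node, which is consistent with the condition stated in the report.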

S1ro1 marked this pull request as ready for review June 3, 2026 03:01

cursor Bot reviewed Jun 3, 2026

Comment thread src/prime_rl/templates/_launch_router.sh.j2

Comment thread packages/prime-rl-configs/src/prime_rl/configs/shared.py

S1ro1 force-pushed the feat/llm-d-router-v2 branch from 0701783 to 1f39804 Compare June 3, 2026 03:03

cursor Bot reviewed Jun 3, 2026

Comment thread src/prime_rl/templates/multi_node_rl.sbatch.j2

S1ro1 force-pushed the feat/llm-d-router-v2 branch from 1f39804 to 74c911a Compare June 3, 2026 03:13

cursor Bot reviewed Jun 3, 2026

Comment thread packages/prime-rl-configs/src/prime_rl/configs/rl.py

S1ro1 force-pushed the feat/llm-d-router-v2 branch from 74c911a to 44e47f9 Compare June 3, 2026 08:33

S1ro1 and others added 2 commits June 3, 2026 14:46

S1ro1 force-pushed the feat/llm-d-router-v2 branch from 44e47f9 to 5d0f4ea Compare June 3, 2026 09:17

cursor Bot reviewed Jun 3, 2026

S1ro1 wants to merge 2 commits into main from feat/llm-d-router-v2
