[PD]: Bidirectional KV transfer #2522
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit a3f8a6d. Configure here.
"""Ensure X-Session-ID header is always set for sticky DP-aware routing at the inference router."""
"""Ensure stable routing headers are set for inference routers."""
self.orchestrator.client.extra_headers_from_state.setdefault("X-Session-ID", "example_id")
self.orchestrator.client.extra_headers_from_state.setdefault("X-Conversation-ID", "trajectory_id")
Nonexistent state field used for conversation header mapping
High Severity
The extra_headers_from_state dict maps header names to rollout state field names. The value "trajectory_id" doesn't appear to exist as a field in the rollout state dict anywhere in src/. Searching the codebase, "trajectory_id" only appears in test fixtures, never in the actual orchestrator state or rollout output dictionaries. By contrast, "example_id" (used for X-Session-ID) is a well-established state field found in buffer.py and envs.py. At runtime, the X-Conversation-ID header will likely be empty or cause an error when the framework tries to read "trajectory_id" from the state, breaking bidirectional KV routing which depends on this header.
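The failure mode described above can be sketched in a few lines. This is a minimal illustration, not the framework's actual lookup code: `resolve_headers` is a hypothetical helper, and the assumption that the state is a plain mapping comes only from the comment's description of `extra_headers_from_state`. It shows one way to fail loudly when a mapped state field is absent instead of sending an empty header.

```python
from typing import Any, Mapping


def resolve_headers(
    extra_headers_from_state: Mapping[str, str],
    state: Mapping[str, Any],
) -> dict[str, str]:
    """Build request headers by reading each mapped field from the rollout state.

    Hypothetical helper: raises when a mapped field is missing rather than
    silently emitting an empty header value.
    """
    headers: dict[str, str] = {}
    missing: list[str] = []
    for header_name, state_field in extra_headers_from_state.items():
        if state_field not in state:
            missing.append(f"{header_name} -> {state_field}")
            continue
        headers[header_name] = str(state[state_field])
    if missing:
        raise KeyError(f"state is missing fields for headers: {', '.join(missing)}")
    return headers


# Mapping taken from the diff above; the state contents are made up for illustration.
mapping = {"X-Session-ID": "example_id", "X-Conversation-ID": "trajectory_id"}
state = {"example_id": 17}  # no "trajectory_id", as the review comment suspects

try:
    resolve_headers(mapping, state)
except KeyError as err:
    print(err)
```

A check of this kind at session setup would surface a missing `trajectory_id` before any request reaches the router.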
"router_cache_ttl_seconds must be less than abort_timeout_seconds "
f"({self.router_cache_ttl_seconds} >= {self.abort_timeout_seconds})"
)
return self
Auto-computed TTL can be zero for small timeouts
Low Severity
When abort_timeout_seconds is 1 (the minimum allowed by gt=0), the auto-computed router_cache_ttl_seconds becomes int(1 * 0.95) = 0. This bypasses the field's gt=0 constraint since model validators run after field validation and don't re-validate. The resulting value 0 gets passed to --pd-kv-cache-ttl-secs 0, which may cause unexpected router behavior.
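The arithmetic in this comment is easy to reproduce. The sketch below is a stdlib-only illustration, not the PR's validator: `auto_router_cache_ttl` is a hypothetical function, and the 0.95 factor and integer truncation are taken from the comment above. It shows one possible guard, which is to reject timeouts too small to yield a positive integer TTL. Note that with `abort_timeout_seconds = 1` no integer TTL can be both greater than 0 and less than the timeout.

```python
def auto_router_cache_ttl(abort_timeout_seconds: int, fraction: float = 0.95) -> int:
    """Derive the router cache TTL from the abort timeout.

    Hypothetical helper: re-checks the "> 0" constraint after the computation,
    since truncation can produce 0 for very small timeouts.
    """
    if abort_timeout_seconds <= 0:
        raise ValueError("abort_timeout_seconds must be > 0")
    ttl = int(abort_timeout_seconds * fraction)  # int() truncates toward zero
    if ttl <= 0:
        raise ValueError(
            f"abort_timeout_seconds={abort_timeout_seconds} is too small: "
            f"auto-computed router_cache_ttl_seconds would be {ttl}"
        )
    return ttl


print(auto_router_cache_ttl(2))  # int(1.9) -> 1

try:
    auto_router_cache_ttl(1)  # int(0.95) -> 0, rejected by the guard
except ValueError as err:
    print(err)
```

Clamping the result to 1 is not an alternative here, because a TTL of 1 would violate the existing rule that the TTL must be less than the abort timeout.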
samsja left a comment
Yeah, let's merge v21 and we can merge this one right after.


Sibling to PrimeIntellect-ai/router#36
Summary
- vllm==0.21.0 and the sibling router branch at 1a441d6 for optional bidirectional P/D KV transfer while we sort the router wheel packaging.
- kv_transport_config = NixlTransportConfig() under disaggregated inference config with type = "nixl", bidirectional disabled by default, kv_recompute_threshold, NIXL abort timeout, and router TTL defaults.
- X-Conversation-ID from trajectory_id and X-Session-ID from example_id.

Validation
- source .env && uv sync --all-extras
- source .env && uv sync --all-extras --locked
- cargo build --all-targets
- cargo test

Note
Medium Risk
Medium risk: upgrades and re-pins core inference dependencies (vllm/vllm-router) and changes SLURM launch-time KV-transfer/router settings, which can affect inference stability and routing behavior.

Overview
Adds a new NixlTransportConfig under disaggregated inference deployments to parameterize NIXL KV-transfer behavior (threads, recompute threshold, abort timeout, and router KV-metadata TTL), with bidirectional KV reuse optional and disabled by default.

Plumbs these settings through the inference/RL entrypoints and SLURM templates: exports NIXL abort timeouts, injects NIXL connector extra config (including bidirectional flags), and configures vllm-router with a per-deployment --policy plus --pd-kv-cache-ttl-secs when bidirectional is enabled. RL sessions now also default an X-Conversation-ID header alongside X-Session-ID for stable router routing.

Updates dependency pins to vllm==0.21.0, switches vllm-router from a release wheel to a git rev, adds tokenspeed-mla, and removes a now-unneeded DeepGEMM SiLU/mul Triton monkey patch.

Reviewed by Cursor Bugbot for commit 4f60a7f. Bugbot is set up for automated code reviews on this repo.
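The conditional router flags described in the overview can be pictured roughly as follows. This is a sketch under stated assumptions, not the PR's SLURM template logic: `build_router_args` and the policy value `"example_policy"` are hypothetical, and only the flag names `--policy` and `--pd-kv-cache-ttl-secs` come from the description above.

```python
def build_router_args(
    policy: str,
    bidirectional: bool,
    router_cache_ttl_seconds: int,
) -> list[str]:
    """Assemble router CLI arguments for one deployment (hypothetical helper)."""
    # The policy is always passed, once per deployment.
    args = ["--policy", policy]
    # The KV-cache TTL flag is only added when bidirectional KV reuse is enabled.
    if bidirectional:
        args += ["--pd-kv-cache-ttl-secs", str(router_cache_ttl_seconds)]
    return args


# "example_policy" is a placeholder, not a real router policy name.
print(build_router_args("example_policy", bidirectional=False, router_cache_ttl_seconds=30))
print(build_router_args("example_policy", bidirectional=True, router_cache_ttl_seconds=30))
```

Keeping the TTL flag behind the bidirectional switch matches the stated default, where bidirectional KV reuse is disabled and the router launch arguments carry only the policy.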