Skip to content

[PD]: Bidirectional KV transfer#2522

Open
S1ro1 wants to merge 2 commits into
mainfrom
feat/bidirectional-pd-kv-transfer
Open

[PD]: Bidirectional KV transfer#2522
S1ro1 wants to merge 2 commits into
mainfrom
feat/bidirectional-pd-kv-transfer

Conversation

@S1ro1
Copy link
Copy Markdown
Collaborator

@S1ro1 S1ro1 commented May 16, 2026

Sibling to PrimeIntellect-ai/router#36

Summary

  • Pins vllm==0.21.0 and the sibling router branch at 1a441d6 for optional bidirectional P/D KV transfer while we sort the router wheel packaging.
  • Adds kv_transport_config = NixlTransportConfig() under disaggregated inference config with type = "nixl", bidirectional disabled by default, kv_recompute_threshold, NIXL abort timeout, and router TTL defaults.
  • Plumbs the config through inference/RL entrypoints and slurm templates, including X-Conversation-ID from trajectory_id and X-Session-ID from example_id.

Validation

  • source .env && uv sync --all-extras
  • source .env && uv sync --all-extras --locked
  • Router sibling: cargo build --all-targets
  • Router sibling: cargo test

Note

Medium Risk
Medium risk: upgrades and re-pins core inference dependencies (vllm/vllm-router) and changes SLURM launch-time KV-transfer/router settings, which can affect inference stability and routing behavior.

Overview
Adds a new NixlTransportConfig under disaggregated inference deployments to parameterize NIXL KV-transfer behavior (threads, recompute threshold, abort timeout, and router KV-metadata TTL), with bidirectional KV reuse optional and disabled by default.

Plumbs these settings through the inference/RL entrypoints and SLURM templates: exports NIXL abort timeouts, injects NIXL connector extra config (including bidirectional flags), and configures vllm-router with a per-deployment --policy plus --pd-kv-cache-ttl-secs when bidirectional is enabled. RL sessions now also default an X-Conversation-ID header alongside X-Session-ID for stable router routing.

Updates dependency pins to vllm==0.21.0, switches vllm-router from a release wheel to a git rev, adds tokenspeed-mla, and removes a now-unneeded DeepGEMM SiLU/mul Triton monkey patch.

Reviewed by Cursor Bugbot for commit 4f60a7f. Bugbot is set up for automated code reviews on this repo. Configure here.

@S1ro1 S1ro1 force-pushed the feat/bidirectional-pd-kv-transfer branch from b6c170f to a3f8a6d Compare May 16, 2026 23:36
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit a3f8a6d. Configure here.

"""Ensure X-Session-ID header is always set for sticky DP-aware routing at the inference router."""
"""Ensure stable routing headers are set for inference routers."""
self.orchestrator.client.extra_headers_from_state.setdefault("X-Session-ID", "example_id")
self.orchestrator.client.extra_headers_from_state.setdefault("X-Conversation-ID", "trajectory_id")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonexistent state field used for conversation header mapping

High Severity

The extra_headers_from_state dict maps header names to rollout state field names. The value "trajectory_id" doesn't appear to exist as a field in the rollout state dict anywhere in src/. Searching the codebase, "trajectory_id" only appears in test fixtures, never in the actual orchestrator state or rollout output dictionaries. By contrast, "example_id" (used for X-Session-ID) is a well-established state field found in buffer.py and envs.py. At runtime, the X-Conversation-ID header will likely be empty or cause an error when the framework tries to read "trajectory_id" from the state, breaking bidirectional KV routing which depends on this header.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a3f8a6d. Configure here.

"router_cache_ttl_seconds must be less than abort_timeout_seconds "
f"({self.router_cache_ttl_seconds} >= {self.abort_timeout_seconds})"
)
return self
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-computed TTL can be zero for small timeouts

Low Severity

When abort_timeout_seconds is 1 (the minimum allowed by gt=0), the auto-computed router_cache_ttl_seconds becomes int(1 * 0.95) = 0. This bypasses the field's gt=0 constraint since model validators run after field validation and don't re-validate. The resulting value 0 gets passed to --pd-kv-cache-ttl-secs 0, which may cause unexpected router behavior.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a3f8a6d. Configure here.

@S1ro1 S1ro1 force-pushed the feat/bidirectional-pd-kv-transfer branch 2 times, most recently from 4e7b366 to 41c05cb Compare May 16, 2026 23:59
@S1ro1 S1ro1 force-pushed the feat/bidirectional-pd-kv-transfer branch 2 times, most recently from bb49a85 to 5135c6c Compare May 17, 2026 00:16
@S1ro1 S1ro1 force-pushed the feat/bidirectional-pd-kv-transfer branch from 5135c6c to c732fe0 Compare May 17, 2026 00:18
Copy link
Copy Markdown
Member

@samsja samsja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah lest merge v21 and we can merge this one right after

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants