
[PD]: Bidirectional KV transfer #36

Open. S1ro1 wants to merge 1 commit into main from feat/bidirectional-pd-kv-transfer


S1ro1 commented May 16, 2026

Sibling to PrimeIntellect-ai/prime-rl#2522

Summary

  • Adds opt-in Decode -> Prefill KV metadata reuse for vLLM P/D routing behind pd_kv_cache_ttl_secs; default is 0, so the existing path does no extra conversation parsing or decode KV work.
  • Keys reuse only by x-conversation-id, caches only object-valued decode kv_transfer_params, and prunes expired entries before reads and writes.
  • Wires the TTL through Python router args, Rust config, CLI, and the PyO3 binding.
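The caching rules in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `DecodeKvCache`, its methods, and the choice to store the params as an opaque serialized string are all hypothetical, and the clock is passed in explicitly so that expiry is easy to reason about.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the router's decode-side cache (names are illustrative).
pub struct DecodeKvCache {
    ttl: Duration,
    /// conversation id -> (time stored, serialized `kv_transfer_params` object)
    entries: HashMap<String, (Instant, String)>,
}

impl DecodeKvCache {
    pub fn new(ttl_secs: u64) -> Self {
        Self {
            ttl: Duration::from_secs(ttl_secs),
            entries: HashMap::new(),
        }
    }

    /// A TTL of 0 disables the cache entirely, matching the default described above.
    pub fn enabled(&self) -> bool {
        !self.ttl.is_zero()
    }

    /// Drop every entry whose age has reached the TTL.
    fn prune(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries
            .retain(|_, (stored_at, _)| now.saturating_duration_since(*stored_at) < ttl);
    }

    /// Prune, then store the params for this conversation (no-op when disabled).
    pub fn insert(&mut self, conversation_id: &str, params: String, now: Instant) {
        if !self.enabled() {
            return;
        }
        self.prune(now);
        self.entries.insert(conversation_id.to_owned(), (now, params));
    }

    /// Prune, then look up the params for this conversation (always None when disabled).
    pub fn get(&mut self, conversation_id: &str, now: Instant) -> Option<&str> {
        if !self.enabled() {
            return None;
        }
        self.prune(now);
        self.entries.get(conversation_id).map(|(_, p)| p.as_str())
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Pruning on every read and write keeps the sketch free of background tasks, at the cost of a full scan per access; whether the actual router makes the same trade-off is not stated in the description.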

Validation

  • cargo build --all-targets
  • cargo test
  • source .env && uv sync --all-extras from the sibling prime-rl branch after pinning this router commit

Note

Medium Risk
Adds a new in-memory, TTL-governed cache keyed by x-conversation-id that influences request payloads in the vLLM PD router; incorrect TTL sizing or high conversation churn could increase memory use or cause subtle routing/behavior changes in PD flows.

Overview
Enables optional bidirectional KV transfer in vLLM Prefill/Decode mode by caching Decode-side kv_transfer_params per conversation and reusing them on subsequent Prefill requests.

Introduces pd_kv_cache_ttl_secs (default 0 = disabled) wired through Python args/docs, Rust RouterConfig, CLI (--pd-kv-cache-ttl-secs), and the PyO3 binding. When enabled, VllmPDRouter stores kv_transfer_params extracted from Decode responses (including SSE streaming bodies) in a TTL cache keyed by x-conversation-id, evicts expired entries on access, and injects adjusted params into Prefill requests to point back at the Decode instance.
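For the SSE case, one way to locate the decode-side params in a streamed body is to scan the `data:` lines for the last event that mentions `kv_transfer_params`. The helper below is a rough sketch under assumptions: `last_kv_payload` is a hypothetical name, it uses a substring check rather than real JSON parsing, and it does not join multi-line `data:` events. A production version would parse each payload as JSON and confirm the value is an object before caching it.

```rust
/// Hypothetical helper: returns the `data:` payload of the last SSE event in
/// `body` that mentions `kv_transfer_params`, skipping the `[DONE]` sentinel.
/// Uses a substring match only; it does not validate the payload as JSON.
pub fn last_kv_payload(body: &str) -> Option<&str> {
    body.lines()
        // Keep only SSE data lines and drop the field name.
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim)
        // Ignore empty payloads and the end-of-stream marker.
        .filter(|payload| !payload.is_empty() && *payload != "[DONE]")
        // Keep payloads that carry the key we care about.
        .filter(|payload| payload.contains("\"kv_transfer_params\""))
        .last()
}
```

Taking the last matching event reflects an assumption that the params arrive near the end of the stream; the PR description does not say which chunk carries them.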

Reviewed by Cursor Bugbot for commit 1a441d6.

Note

Add bidirectional KV transfer support to VllmPDRouter with conversation-scoped caching

  • Adds a decode_kv_cache to VllmPDRouter that stores kv_transfer_params from decode responses, keyed by x-conversation-id header, with a configurable TTL (pd_kv_cache_ttl_secs, default 0/disabled).
  • On subsequent prefill requests within the same conversation, cached decode-side KV params are injected into the prefill request, enabling bidirectional P/D KV transfer.
  • Exposes pd_kv_cache_ttl_secs through the Rust RouterConfig, Python RouterArgs, CLI (--pd-kv-cache-ttl-secs), and the PyO3 Router constructor.
  • Cache eviction runs before each insert, removing entries whose TTL has elapsed.
  • Behavioral Change: when pd_kv_cache_ttl_secs > 0 and requests include x-conversation-id, prefill kv_transfer_params will differ from default based on prior decode responses.
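The gating condition in the last bullet (TTL above zero and an `x-conversation-id` header present) can be expressed as a small lookup. This is a sketch only: `reuse_key` is a hypothetical helper, headers are modeled as plain name/value pairs rather than the router's real header type, and treating an empty header value as absent is an assumption.

```rust
/// Hypothetical helper: returns the conversation id to use as the cache key,
/// or None when KV reuse does not apply to this request.
pub fn reuse_key<'a>(ttl_secs: u64, headers: &[(&'a str, &'a str)]) -> Option<&'a str> {
    // TTL 0 is the default and means the feature is off.
    if ttl_secs == 0 {
        return None;
    }
    headers
        .iter()
        // HTTP header names are case-insensitive.
        .find(|(name, _)| name.eq_ignore_ascii_case("x-conversation-id"))
        .map(|&(_, value)| value)
        // Assumption: an empty header value is treated the same as a missing one.
        .filter(|value| !value.is_empty())
}
```

Requests that return `None` here would follow the existing path unchanged, which is the behavior the description claims for the default configuration.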

Macroscope summarized 1a441d6.

S1ro1 force-pushed the feat/bidirectional-pd-kv-transfer branch from ede4930 to 8d9bf63 on May 16, 2026 23:44
Comment thread src/routers/http/vllm_pd_router.rs

macroscopeapp Bot commented May 16, 2026

Approvability

Verdict: Needs human review

1 blocking correctness issue found. This PR introduces substantial new functionality for bidirectional KV cache transfer, adding new caching logic and modifying request processing behavior. Additionally, there is an unresolved review comment identifying a potential bug where non-object cached values could break prefill-decode coordination.


S1ro1 force-pushed the feat/bidirectional-pd-kv-transfer branch 2 times, most recently from 926b0c4 to 645c4c0, on May 16, 2026 23:55

cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 645c4c0.

Comment thread src/routers/http/vllm_pd_router.rs Outdated
Comment thread src/routers/http/vllm_pd_router.rs
S1ro1 force-pushed the feat/bidirectional-pd-kv-transfer branch from 645c4c0 to 1a441d6 on May 17, 2026 00:07
S1ro1 changed the title from "feat: enable bidirectional pd kv transfer" to "[PD]: Bidirectional KV transfer" on May 17, 2026