[Do Not Merge] - Add LMCacheConnectorV1 PD diagnostic harness #329
Open
AAbouzeid wants to merge 9 commits into
Summary
This PR adds a diagnostic harness and evidence collector for validating vLLM `LMCacheConnectorV1` in a 1-prefill / 1-decode disaggregated-prefill setup. It then uses that evidence to isolate the kvcached compatibility issue and verify the simplest fix.

The validated fix is to run `LMCacheConnectorV1` with kvcached's non-compound per-layer KV layout (`KVCACHED_CONTIGUOUS_LAYOUT=false`).

With the default compound layout, kvcached exposes per-layer KV tensors as non-contiguous, layer-interleaved views that LMCache cannot normalize. With `KVCACHED_CONTIGUOUS_LAYOUT=false`, LMCache sees contiguous per-layer KV tensors, the prefiller store path succeeds, decoder retrieval succeeds, and the run reports non-zero LMCache hit tokens.

Added
- `experiments/12_lmcache_connector_v1_debug.sh`
  - `LMCacheConnectorV1` with LMCache PD/NIXL config.
  - `kv_transfer_params.disagg_spec` so the prefiller has decoder receiver metadata.
  - `RUN_WITH_KVCACHED=0` plain baseline and `RUN_WITH_KVCACHED=1` kvcached mode.
  - `KV_LAYOUT_DIAG=1` to log KV tensor shape, stride, storage offset, storage pointer, and contiguity.
- `experiments/collect_lmcache_connector_v1_evidence.sh`
- `experiments/lmcache_connector_v1_validation.md`

Fix / Compatibility Mode
`LMCacheConnectorV1` should not use kvcached's current compound contiguous layout. It should run with `KVCACHED_CONTIGUOUS_LAYOUT=false`.

Why:
Implementation options after this diagnostic PR:

- `kv_connector='LMCacheConnectorV1'`.
- `KVCACHED_CONTIGUOUS_LAYOUT=false` as required for `LMCacheConnectorV1` until auto-detection is added.
- `LMCacheConnectorV1` is used with the compound layout.

Commands Used
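The runs below appear to differ only in the environment variables this PR describes (`RUN_WITH_KVCACHED`, `KVCACHED_CONTIGUOUS_LAYOUT`, `KV_LAYOUT_DIAG`). A minimal Python sketch of that run matrix follows. It assumes the harness is launched as `bash experiments/12_lmcache_connector_v1_debug.sh` and reads these variables from its environment. The mode names and the `harness_env` / `run_harness` helpers are hypothetical and not part of the PR, and the exact commands used for the evidence runs may differ.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Script path from this PR; invoking it through bash is an assumption.
HARNESS = "experiments/12_lmcache_connector_v1_debug.sh"

# Hypothetical mode names mapped to the environment overrides described in the PR.
MODES: Dict[str, Dict[str, str]] = {
    # Plain vLLM baseline, no kvcached.
    "plain": {"RUN_WITH_KVCACHED": "0"},
    # kvcached with its default compound layout (expected to fail).
    "kvcached_compound": {"RUN_WITH_KVCACHED": "1"},
    # kvcached with the non-compound per-layer layout (the validated fix).
    "kvcached_noncompound": {
        "RUN_WITH_KVCACHED": "1",
        "KVCACHED_CONTIGUOUS_LAYOUT": "false",
    },
}


def harness_env(
    mode: str,
    layout_diag: bool = False,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build the full environment for one harness run."""
    env = dict(os.environ if base is None else base)
    # Drop any inherited layout setting so the compound run really uses the default.
    env.pop("KVCACHED_CONTIGUOUS_LAYOUT", None)
    env.update(MODES[mode])
    if layout_diag:
        # Ask the harness to log KV tensor shape, stride, offset, pointer, contiguity.
        env["KV_LAYOUT_DIAG"] = "1"
    return env


def run_harness(mode: str, layout_diag: bool = False) -> int:
    """Launch the harness script in the given mode and return its exit code."""
    result = subprocess.run(
        ["bash", HARNESS], env=harness_env(mode, layout_diag), check=False
    )
    return result.returncode
```

Under those assumptions, `run_harness("kvcached_noncompound", layout_diag=True)` corresponds to the verified-pass configuration.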
Plain baseline:
kvcached default compound layout, expected failure:
kvcached non-compound layout, verified pass:
Evidence collection:
Environment
Evidence Bundles
These evidence tarballs are committed under `experiments/evidence/lmcache_connector_v1/`. Their filenames and top-level archive directories are timestamp-free, so the artifacts stay stable in the repository and PR discussion.

- `experiments/evidence/lmcache_connector_v1/lmcache_connector_v1_plain_hits.tar.gz` - plain vLLM baseline with `LMCacheConnectorV1` completing and reporting 512-token LMCache hits.
- `experiments/evidence/lmcache_connector_v1/lmcache_connector_v1_kvcached_default_failure.tar.gz` - initial kvcached default-layout failure in the LMCache GPU KV store.
- `experiments/evidence/lmcache_connector_v1/lmcache_connector_v1_plain_layout_diag.tar.gz` - plain vLLM layout diagnostic control showing contiguous per-layer KV tensors.
- `experiments/evidence/lmcache_connector_v1/lmcache_connector_v1_kvcached_compound_layout_failure_diag.tar.gz` - kvcached default compound-layout diagnostic showing layer-interleaved, non-contiguous KV views and the LMCache `ValueError`.
- `experiments/evidence/lmcache_connector_v1/lmcache_connector_v1_kvcached_noncompound_layout_fix_pass.tar.gz` - passing fix proof with `KVCACHED_CONTIGUOUS_LAYOUT=false`, contiguous LMCache-visible per-layer KV tensors, successful prefiller/decoder requests, and 512-token LMCache hits.

Evidence: Plain vLLM LMCacheConnectorV1 Works
Run ID: `plain_lmcache_hits_1`

The plain baseline completed and reached real LMCache retrieve paths with non-zero hit tokens.
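This section and the next hinge on whether per-layer KV tensors are contiguous. For a strided view, that property follows from shape and stride alone. Every dimension's stride must equal the product of the sizes of the dimensions after it, and size-1 dimensions are free to carry any stride. Below is a minimal pure-Python sketch of that row-major rule, in the same sense as PyTorch's `Tensor.is_contiguous()` for the default memory format. The helper names are illustrative.

```python
from math import prod
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (C-order) strides, in elements, for a tensor of this shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape: Sequence[int], stride: Sequence[int]) -> bool:
    """Row-major contiguity check; size-1 dimensions may carry any stride."""
    if prod(shape) == 0:
        return True  # empty tensors are trivially contiguous
    expected = 1
    for size, st in zip(reversed(shape), reversed(stride)):
        if size != 1 and st != expected:
            return False
        expected *= size
    return True
```

Consider the logical shape `[2, NB, 16, 2, 128]` with an arbitrary `NB = 1000`, which is not a value from these runs. `contiguous_strides` gives `(4096000, 4096, 256, 128, 1)`. A view whose block stride is 229376 instead of 4096 fails `is_contiguous`.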
Prefiller:
Decoder:
Plain per-layer KV tensors are already contiguous:
Evidence: kvcached Compound Layout Fails
Run ID: `kvcached_layout_diag_1`

The kvcached default run starts both vLLM instances, but the first prefiller request returns HTTP 500.
Proxy:
The failure is not missing PD request metadata. The failing scheduler output includes:
Root stack:
kvcached was asked for the normal vLLM logical KV shape:
But it returned layer views into a shared compound backing allocation:
LMCache receives the same view and rejects it:
The key mismatch is the hidden layer interleaving. For a logical NHD tensor shaped `[2, NB, 16, 2, 128]`, the expected contiguous block stride is `16 * 2 * 128 = 4096`. The compound kvcached view has block stride `229376`, which is `4096 * 56`, i.e. 28 layers times 2 KV buffers.

Evidence: Non-Compound Layout Fix Works
Run ID: `kvcached_noncontig_lmcache_1`

Evidence bundle:
The same kvcached + `LMCacheConnectorV1` run passes when launched with `KVCACHED_CONTIGUOUS_LAYOUT=false`.

Harness result:
Requests completed through the proxy:
The prefiller stores the first long prompt and hits on the shared prefix for the second:
The decoder retrieves from LMCache for both requests:
Most importantly, LMCache now sees contiguous per-layer tensors:
This is the proof of the fix: the layout incompatibility disappears, LMCache's store/retrieve paths execute, and the run reports real 512-token LMCache hits.
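The block-stride difference between the failing and passing runs can be reproduced with a toy model. For illustration only, the sketch below assumes that the compound backing stores each block as `[28 layers, 2 (K/V), 16, 2, 128]` elements. Under that assumption, a single layer's view must skip every other layer's K and V data to reach its next block. kvcached's actual interleaving order, and the K/V stride of the real view, may differ. Any per-block arrangement that packs 28 layers times 2 KV buffers gives the reported block stride of 229376. The non-compound case gives each layer its own `[2, NB, 16, 2, 128]` allocation. All names are hypothetical.

```python
from typing import Tuple

# (shape, stride, storage_offset), all in elements.
View = Tuple[Tuple[int, ...], Tuple[int, ...], int]

BLOCK_SIZE, NUM_KV_HEADS, HEAD_DIM = 16, 2, 128  # from the NHD shape [2, NB, 16, 2, 128]
NUM_LAYERS = 28  # from "28 layers times 2 KV buffers"


def block_elems() -> int:
    """Elements in one block of one layer for either K or V."""
    return BLOCK_SIZE * NUM_KV_HEADS * HEAD_DIM


def noncompound_view(num_blocks: int) -> View:
    """Per-layer tensor with its own allocation: plain row-major strides."""
    e = block_elems()
    shape = (2, num_blocks, BLOCK_SIZE, NUM_KV_HEADS, HEAD_DIM)
    stride = (num_blocks * e, e, NUM_KV_HEADS * HEAD_DIM, HEAD_DIM, 1)
    return shape, stride, 0


def compound_view(layer: int, num_blocks: int) -> View:
    """Hypothetical layer view into a shared backing laid out per block as
    [NUM_LAYERS, 2, BLOCK_SIZE, NUM_KV_HEADS, HEAD_DIM]."""
    e = block_elems()
    shape = (2, num_blocks, BLOCK_SIZE, NUM_KV_HEADS, HEAD_DIM)
    # Assumption: K and V of one layer sit side by side inside a block, and the
    # same layer's next block starts NUM_LAYERS * 2 buffers further on.
    stride = (e, NUM_LAYERS * 2 * e, NUM_KV_HEADS * HEAD_DIM, HEAD_DIM, 1)
    offset = layer * 2 * e  # skip the K/V buffers of earlier layers in block 0
    return shape, stride, offset


if __name__ == "__main__":
    print(noncompound_view(1000)[1][1])  # 4096
    print(compound_view(0, 1000)[1][1])  # 229376
```

In this model, the compound view's block stride is 56 times the contiguous value, which matches the arithmetic in the compound-layout evidence. The non-compound tensor has row-major contiguous strides, consistent with what LMCache saw in the passing run.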
Notes
The repeated LMCache log line below appears in both failing and passing runs and is not the root failure:
The actual failing condition was the compound-layout tensor stride mismatch. That condition is absent in the passing `KVCACHED_CONTIGUOUS_LAYOUT=false` run.
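One of the implementation options listed under Fix / Compatibility Mode is selecting the layout automatically once auto-detection is added. Below is a minimal sketch of such a guard. It assumes the guard runs wherever kvcached reads `KVCACHED_CONTIGUOUS_LAYOUT` and that it can see vLLM's configured `kv_connector` name. It also assumes the compound layout stays the default for other connectors. The function, the exception type, and the connector set are hypothetical and are not existing kvcached or vLLM APIs.

```python
import os
import warnings
from typing import Mapping, Optional

# Connectors that, per this PR's evidence, need per-layer contiguous KV tensors.
NEEDS_NONCOMPOUND = {"LMCacheConnectorV1"}


class IncompatibleKVLayoutError(RuntimeError):
    """Raised when the compound layout is explicitly requested with an incompatible connector."""


def _env_flag(value: Optional[str]) -> Optional[bool]:
    """Parse a boolean-ish env value; None means 'not set'."""
    if value is None:
        return None
    return value.strip().lower() in {"1", "true", "yes", "on"}


def resolve_contiguous_layout(
    kv_connector: Optional[str],
    env: Optional[Mapping[str, str]] = None,
    default: bool = True,
) -> bool:
    """Return True for kvcached's compound layout, False for non-compound per-layer."""
    env = os.environ if env is None else env
    requested = _env_flag(env.get("KVCACHED_CONTIGUOUS_LAYOUT"))
    if kv_connector not in NEEDS_NONCOMPOUND:
        # Unaffected connectors keep today's behaviour.
        return default if requested is None else requested
    if requested:
        # Fail fast instead of hitting the stride mismatch at the first request.
        raise IncompatibleKVLayoutError(
            f"{kv_connector} requires KVCACHED_CONTIGUOUS_LAYOUT=false"
        )
    if requested is None:
        warnings.warn(f"{kv_connector} detected; using non-compound per-layer KV layout")
    return False
```

Whether an explicit compound request should raise or only warn is a policy choice. Raising surfaces the incompatibility at startup rather than as an HTTP 500 on the first prefiller request.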