
[https://nvbugs/6330273][fix] In StorageManager.__init__, when typical_batch is supplied, append a synthetic…#15465

Closed
tensorrt-cicd wants to merge 1 commit into NVIDIA:main from tensorrt-cicd:repair-bot-bug6330273

Conversation

@tensorrt-cicd

@tensorrt-cicd tensorrt-cicd commented Jun 17, 2026

Collaborator

Summary

  • Root cause: typical_batch concurrency was never used to floor _min_slots, so windowed pool groups (window_size < tokens_per_block) collapsed to non_stale=1 per request and the absolute floor of 1 slot per pool group caused scheduler deadlock at concurrency > 1.
  • Fix: In StorageManager.__init__, when typical_batch is supplied, append a synthetic BatchDesc before computing _min_slots. The synthetic batch contains len(typical_batch.kv_caches) decode requests, each with capacity=tokens_per_block and history_length=tokens_per_block-1, which yields non_stale=1 in every pool group (PG). This floors every pool group at len(typical_batch.kv_caches) slots.
  • Automated fix generated by repair-bot

Test plan

  • Verify fix on the same GPU type as the original failure
  • Check for regressions in related tests

Links

Summary by CodeRabbit

  • Refactor
    • Enhanced runtime resource allocation efficiency through improved constraint computation logic.

…id windowed-pool deadlock

When KVCacheManagerV2 is built with a typical_batch describing the
working set (e.g., max_batch_size concurrent decode requests with
capacity=max_seq_len), windowed pool groups whose window_size is
smaller than tokens_per_block previously collapsed to min_slots=1
because get_stale_range() consumed all but one block per request, and
_compute_min_slots_from_constraints() only enforced an absolute floor
of 1 slot per pool group. With more than 1 concurrent decode request,
the V2 scheduler could not find a free slot in windowed pools and
deadlocked.

Synthesize a constraint from the typical_batch: one
KVCacheDesc(capacity=tokens_per_block, history_length=tokens_per_block-1)
per request. For every pool group, this yields non_stale=1 per
request, so the new floor is len(typical_batch.kv_caches) — large
enough to support the scheduler's full concurrency.
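
The arithmetic in the two paragraphs above can be illustrated with a toy model. This is only a sketch under stated assumptions: `non_stale_blocks` is a hypothetical helper, not the real get_stale_range(), and it assumes a block becomes stale once every token in it precedes `history_length - window_size`. The numbers 32, 16, and 4096 are illustrative values, not taken from the bug report.

```python
import math
from typing import Optional


def non_stale_blocks(capacity: int, history_length: int,
                     tokens_per_block: int,
                     window_size: Optional[int] = None) -> int:
    """Toy count of blocks one request keeps resident in one pool group.

    Assumption (not taken from the real code): a block is stale when all of
    its tokens precede history_length - window_size. window_size=None models
    a full-attention pool group, where nothing is ever stale.
    """
    total = math.ceil(capacity / tokens_per_block)
    if window_size is None:
        return total
    first_live_token = max(0, history_length - window_size)
    stale = first_live_token // tokens_per_block
    return total - stale


TOKENS_PER_BLOCK = 32   # illustrative
WINDOW_SIZE = 16        # smaller than tokens_per_block, as in the report

# A long decode request in a windowed pool group keeps a single block.
long_windowed = non_stale_blocks(4096, 4095, TOKENS_PER_BLOCK, WINDOW_SIZE)

# The synthetic descriptor keeps one block in windowed and full pool groups.
synthetic_windowed = non_stale_blocks(TOKENS_PER_BLOCK, TOKENS_PER_BLOCK - 1,
                                      TOKENS_PER_BLOCK, WINDOW_SIZE)
synthetic_full = non_stale_blocks(TOKENS_PER_BLOCK, TOKENS_PER_BLOCK - 1,
                                  TOKENS_PER_BLOCK, None)

print(long_windowed, synthetic_windowed, synthetic_full)  # prints "1 1 1"
```

Under this model, each concurrent decode request needs one slot per pool group, so a floor of 1 slot cannot serve more than one request, while a floor equal to the number of requests can.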

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 577c21a7-dd0a-4224-a923-bde5ffbb63fc

📥 Commits

Reviewing files that changed from the base of the PR and between 42a3e55 and 5740b31.

📒 Files selected for processing (1)
  • tensorrt_llm/runtime/kv_cache_manager_v2/_storage_manager.py

📝 Walkthrough

Walkthrough

In StorageManager.__init__, a synthetic BatchDesc is now derived from typical_batch.kv_caches when present, using a single-decode KVCacheDesc per cache (capacity=tokens_per_block, history_length=tokens_per_block - 1). This synthetic constraint is appended to form effective_constraints, which is then passed to _compute_min_slots_from_constraints instead of the raw constraints or [].
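
The walkthrough above might look roughly like the following sketch. It is not the code from the PR: `KVCacheDesc` and `BatchDesc` here are stand-in dataclasses, and `build_effective_constraints` and `toy_min_slots` are hypothetical helpers. In particular, `toy_min_slots` assumes every request contributes exactly one non-stale block, which is what the PR description states for the synthetic batch but is a simplification for arbitrary constraints.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KVCacheDesc:
    """Stand-in for the real descriptor; only the two fields named in the PR."""
    capacity: int
    history_length: int


@dataclass
class BatchDesc:
    """Stand-in for the real batch descriptor."""
    kv_caches: List[KVCacheDesc]


def build_effective_constraints(constraints: Optional[List[BatchDesc]],
                                typical_batch: Optional[BatchDesc],
                                tokens_per_block: int) -> List[BatchDesc]:
    """Append one synthetic decode-step batch when typical_batch is given."""
    effective = list(constraints or [])  # copy so the caller's list is untouched
    if typical_batch is not None:
        decode = KVCacheDesc(capacity=tokens_per_block,
                             history_length=tokens_per_block - 1)
        effective.append(
            BatchDesc(kv_caches=[decode] * len(typical_batch.kv_caches)))
    return effective


def toy_min_slots(constraints: List[BatchDesc]) -> int:
    """Toy floor: one slot per request, never fewer than 1 slot overall."""
    return max([1] + [len(batch.kv_caches) for batch in constraints])


# Illustrative typical batch: 8 concurrent decode requests.
typical = BatchDesc(kv_caches=[KVCacheDesc(4096, 4095)] * 8)

without_fix = toy_min_slots([])  # 1: only the absolute floor applies
with_fix = toy_min_slots(build_effective_constraints(None, typical, 32))  # 8
print(without_fix, with_fix)  # prints "1 8"
```

Copying the incoming constraints before appending keeps the caller's list unchanged, which seems the safer choice when the synthetic batch is an internal detail of slot sizing.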

Changes

StorageManager min-slots constraint synthesis

| Layer / File(s) | Summary |
|---|---|
| Effective constraints computation in StorageManager.__init__ (tensorrt_llm/runtime/kv_cache_manager_v2/_storage_manager.py) | Adds logic to build effective_constraints by appending a synthesized BatchDesc (one decode-step KVCacheDesc per KV cache from typical_batch) to the provided constraints, then passes effective_constraints to _compute_min_slots_from_constraints instead of constraints or []. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title references the NVBugs ticket and fix type, and describes the core change: appending a synthetic constraint in StorageManager.__init__ when typical_batch is supplied. |
| Description check | ✅ Passed | The description provides root cause analysis, the specific fix implemented, test verification, and links to the bug. However, it lacks some template sections like explicit PR Checklist confirmation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@lowsfer

lowsfer commented Jun 24, 2026

Member

A similar fix is already included in #15462.

@lowsfer lowsfer closed this Jun 24, 2026
