fix: cap SWA/chunked-local runtime admission to startup pool-sizing bound #1659

Merged
AlpinDale merged 1 commit into main from fix/kv-admission
Apr 28, 2026

@AlpinDale

Collaborator

No description provided.

…ound

Signed-off-by: AlpinDale <alpindale@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87d8d64015

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# `sum(reservations) <= pool` <=> `sum(peak_real_held) <= pool`.
# Drift between the two would re-introduce the deadlock from
# issue #39734 or, worse, mid-prefill OOM.
num_required_blocks = min(num_required_blocks, self._max_admission_blocks_per_request)

P1: Compute capped admission against live blocks, not slot length

For running chunked-local requests, this cap is applied before the fast path subtracts `len(req_to_blocks)`, but `req_to_blocks` includes historical null placeholders from skipped chunks. Once the slot list grows past the cap, `get_num_blocks_to_allocate` can return 0 even when the current chunk still needs a new real KV block (because no skipped block was freed in that step). In `Scheduler._schedule_running`, this underestimate bypasses the free-block guard in `KVCacheManager.allocate_slots`, and `allocate_new_blocks` can then raise `ValueError` when `BlockPool.get_new_blocks` is called with insufficient free blocks.
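The arithmetic behind this concern can be shown with a toy model. The sketch below is not the project's code: the function names, the list-based slot representation, and the `NULL` placeholder are all hypothetical stand-ins, and the "live-block" variant is only one possible reading of the suggestion in the title, not the fix that was merged.

```python
# Toy model only: NULL stands in for a null placeholder left behind by a
# skipped chunk; integers stand in for real KV block ids.
NULL = None


def to_allocate_by_slot_length(num_required, slots, cap):
    """Cap the requirement first, then subtract the slot-list length.

    This mirrors the pattern described in the review: placeholders inflate
    len(slots), so the result can hit 0 while a real block is still needed.
    """
    capped = min(num_required, cap)
    return max(capped - len(slots), 0)


def to_allocate_by_live_blocks(num_required, slots, cap):
    """Hypothetical alternative: apply the cap to live (non-null) blocks.

    New slots needed are counted against the full slot list, but the cap
    only limits how many real blocks the request may hold at once.
    """
    live = sum(1 for block in slots if block is not NULL)
    new_slots_needed = max(num_required - len(slots), 0)
    headroom = max(cap - live, 0)
    return min(new_slots_needed, headroom)


# Four skipped chunks left placeholders; two real blocks are held.
slots = [NULL, NULL, NULL, NULL, 10, 11]
cap = 4            # per-request admission cap (assumed value)
num_required = 7   # the current chunk needs one more slot than exists

by_length = to_allocate_by_slot_length(num_required, slots, cap)
by_live = to_allocate_by_live_blocks(num_required, slots, cap)
print(by_length, by_live)
```

In this scenario the slot-length variant reports no blocks to allocate because the six slots already exceed the cap of four, while the live-block variant reports one, since only two real blocks are held. When the slot list is still shorter than the cap and contains no placeholders, the two variants agree.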

@AlpinDale AlpinDale merged commit 42769da into main Apr 28, 2026
1 check failed
@AlpinDale AlpinDale deleted the fix/kv-admission branch April 28, 2026 08:42