fix prefix default on #784

Merged: valarLip merged 2 commits into main from fpz/fix_prefix_default_on on May 14, 2026

@jiayyu commented May 14, 2026

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings May 14, 2026 09:09
Copilot AI left a comment

Pull request overview

Fixes a prefix-caching bug where, when an entire prompt is fully cached, prefill ends up with zero tokens to forward and cannot produce logits for the next-token sampler. The fix forces the last full block to be recomputed when all blocks would otherwise be cache hits, and also removes the fp4x2-specific gating that disabled the prefix-cache path in MLA attention.

Changes:

  • In BlockManager.can_allocate and allocate, force the final full block to be treated as a cache miss when every block is a cache hit, ensuring at least one token remains for prefill.
  • In attention_mla.forward_impl_server_mode, drop the kv_b_proj.weight.dtype != fp4x2 guard so the prefix-cache attention branch is taken whenever attn_metadata.has_cached is true.
  • Update tests/test_block_manager.py to reflect the new expected num_cached_tokens values (4 instead of 8; 0 instead of 4) and rename a test accordingly.
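The first change can be sketched in a few lines. This is only an illustration of the rule described above, not the code in atom/model_engine/block_manager.py: the names `register_prompt` and `count_cached_tokens`, the set-of-tuples cache, and the block size of 4 are all assumptions, with the block size picked so the results line up with the updated test expectations (4 instead of 8; 0 instead of 4).

```python
BLOCK_SIZE = 4  # assumed value; not taken from the repository


def register_prompt(cache, token_ids, block_size=BLOCK_SIZE):
    """Record every full-block prefix of token_ids in a toy cache (a set of tuples)."""
    for end in range(block_size, len(token_ids) + 1, block_size):
        cache.add(tuple(token_ids[:end]))


def count_cached_tokens(cache, token_ids, block_size=BLOCK_SIZE):
    """Return how many leading prompt tokens may be served from the toy cache."""
    hit_blocks = 0
    for end in range(block_size, len(token_ids) + 1, block_size):
        if tuple(token_ids[:end]) not in cache:
            break  # the first miss ends the cached prefix
        hit_blocks += 1
    # If the whole prompt would be a cache hit, treat the last full block
    # as a miss so prefill still has tokens to forward.
    if hit_blocks > 0 and hit_blocks * block_size == len(token_ids):
        hit_blocks -= 1
    return hit_blocks * block_size


cache = set()
register_prompt(cache, list(range(8)))
print(count_cached_tokens(cache, list(range(8))))  # 4: last block recomputed
print(count_cached_tokens(cache, list(range(4))))  # 0: the only block is recomputed
print(count_cached_tokens(cache, list(range(9))))  # 8: one uncached token already remains
```

In this sketch a prompt with a trailing partial block is left alone, since at least one token is already left for prefill; whether the real implementation draws the line in exactly the same place is not stated in the overview.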

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| atom/model_engine/block_manager.py | Force last full block to recompute when prompt would be fully cached, in both can_allocate and allocate. |
| atom/model_ops/attention_mla.py | Remove fp4x2 dtype gating; use attn_metadata.has_cached directly to choose the prefix-cache attention path. |
| tests/test_block_manager.py | Adjust expected cache-hit counts and rename test_exact_block_size_fully_cached to reflect last-block recompute behavior. |
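
The attention_mla.py change amounts to simplifying a branch condition. The sketch below shows the before and after predicates under stated assumptions: the function names are hypothetical, and the string "fp4x2" stands in for the real dtype object, which is not shown in this conversation.

```python
FP4X2 = "fp4x2"  # stand-in for the real dtype object (assumption)


def takes_prefix_cache_path_before(has_cached, kv_b_proj_dtype):
    """Old gating: fp4x2 weights never took the prefix-cache branch."""
    return has_cached and kv_b_proj_dtype != FP4X2


def takes_prefix_cache_path_after(has_cached):
    """New gating: the branch depends only on whether cached tokens exist."""
    return has_cached


print(takes_prefix_cache_path_before(True, FP4X2))  # False
print(takes_prefix_cache_path_after(True))          # True
```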

@valarLip valarLip merged commit e77a5ce into main May 14, 2026
39 of 48 checks passed
@valarLip valarLip deleted the fpz/fix_prefix_default_on branch May 14, 2026 14:16

3 participants