Fpz/chunk prefill by jiayyu · Pull Request #740 · ROCm/ATOM

jiayyu · 2026-05-11T03:38:59Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

This PR adds chunked prefill support so prompts can be prefilled in multiple steps when constrained by max_num_batched_tokens, and updates the runtime and attention plumbing to correctly handle prefix-cache and partial-block scheduling. It also renames the “cached tokens” tracking field (num_cached_tokens) to num_kv_computed across the engine and updates tests accordingly.

Changes:

Add chunked-prefill scheduling in Scheduler, including partial-prefill tracking and forwarding num_kv_computed into ScheduledBatch.
Extend forward/attention metadata to support partial-prefill execution and correct KV gather behavior when block tables are converted.
Update model runner to optionally skip logits/sampling for “intermediate” prefill chunks, plus test updates and config plumbing.
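To illustrate the scheduling change listed above, here is a minimal sketch of chunked prefill under a token budget. It is not ATOM's Scheduler: the `Seq` class, `schedule_prefill`, and `postprocess` are hypothetical names, and the only assumptions carried over from the PR are the `max_num_batched_tokens` budget and a per-sequence `num_kv_computed` counter.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Seq:
    """Hypothetical stand-in for a sequence; not ATOM's Sequence class."""
    seq_id: int
    num_prompt_tokens: int
    num_kv_computed: int = 0  # prompt tokens whose KV is already in the cache


def schedule_prefill(waiting, max_num_batched_tokens):
    """Pick (seq, chunk_len, is_partial) triples for one step under a token budget."""
    batch = []
    budget = max_num_batched_tokens
    while waiting and budget > 0:
        seq = waiting.popleft()
        remaining = seq.num_prompt_tokens - seq.num_kv_computed
        chunk = min(remaining, budget)
        is_partial = chunk < remaining
        batch.append((seq, chunk, is_partial))
        budget -= chunk
        if is_partial:
            # The prompt did not fit; resume it first on the next step.
            waiting.appendleft(seq)
            break
    return batch


def postprocess(batch):
    """Advance KV progress once the forward pass for this step has run."""
    for seq, chunk, _ in batch:
        seq.num_kv_computed += chunk
```

With a budget of 8 tokens, a 10-token prompt is split into an 8-token partial chunk and a 2-token final chunk, and the leftover budget in the second step is shared with the next waiting prompt.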

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

File	Description
tests/test_scheduler.py	Updates scheduler tests for chunked prefill behavior and `num_kv_computed` semantics.
tests/test_block_manager.py	Updates block manager tests to use `num_kv_computed`.
tests/conftest.py	Adds `enable_chunked_prefill` to `MockConfig`.
atom/utils/forward_context.py	Adds `Context.is_partial_prefill` and `AttentionMetaData.orig_block_tables`.
atom/model_ops/base_attention.py	Adjusts fp8 dequant path in gather kernel and supports `per_token_quant` plumbing.
atom/model_ops/attentions/backends.py	Uses `num_kv_computed` and fixes slot mapping for partial-block prefills; ensures cu_seqlens buffers are copied.
atom/model_ops/attention_mla.py	Tweaks prefix-cache gating and weight preshuffle handling; fixes head-dim usage for V buffer.
atom/model_ops/attention_mha.py	Tracks cache layout for prefix gather, uses `orig_block_tables`, and passes `per_token_quant`.
atom/model_engine/sequence.py	Renames `num_cached_tokens` to `num_kv_computed` and removes unused cached-block helper.
atom/model_engine/scheduler.py	Implements chunked prefill scheduling/resume, partial-prefill bookkeeping, and batch metadata updates.
atom/model_engine/model_runner.py	Propagates `is_partial_prefill` and skips logits/sampling for intermediate chunks; fixes token indexing for deferred/new decode layout.
atom/model_engine/engine_core.py	Passes `scheduled_batch` into `Scheduler.postprocess()` for KV-progress updates.
atom/model_engine/block_manager.py	Updates prefix-cache hit accounting to increment `num_kv_computed`.
atom/config.py	Adds `enable_chunked_prefill` config flag (default enabled).
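The slot-mapping fix for partial-block prefills noted for atom/model_ops/attentions/backends.py can be pictured with a small sketch. This is an assumption about a generic paged KV cache, where the flat slot is `block_id * block_size + offset`; the function name and layout are hypothetical and may not match ATOM's actual implementation.

```python
def slot_mapping_for_chunk(block_table, block_size, num_kv_computed, chunk_len):
    """Return the flat KV-cache slot for each token in the current chunk.

    The token at logical position p is assumed to live in physical block
    block_table[p // block_size] at offset p % block_size.
    """
    slots = []
    for pos in range(num_kv_computed, num_kv_computed + chunk_len):
        block_id = block_table[pos // block_size]
        slots.append(block_id * block_size + pos % block_size)
    return slots
```

The point of starting at `num_kv_computed` rather than at a block boundary is that a resumed chunk may begin in the middle of a block that an earlier chunk (or a prefix-cache hit) only partly filled.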

+                if num_new_tokens_est > budget_remaining and num_batched_tokens > 0:
+                    self.waiting.appendleft(seq)
+                    break


+                if context.is_partial_prefill:
+                    # B scheme: skip compute_logits for intermediate chunks
+                    logits = None
+                else:
+                    logits = self.model.compute_logits(hidden_states)
        else:
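The branch in the snippet above skips logits for intermediate chunks. A self-contained sketch of that control flow follows; `run_step`, `compute_logits`, and `sample` are hypothetical callables used for illustration, not the model runner's real interface.

```python
def run_step(hidden_states, is_partial_prefill, compute_logits, sample):
    """Return a sampled token, or None for an intermediate prefill chunk."""
    if is_partial_prefill:
        # No token is emitted until the last chunk of the prompt has run,
        # so the logits projection and sampling can be skipped entirely.
        return None
    logits = compute_logits(hidden_states)
    return sample(logits)
```

Callers then need to treat a `None` result as "no new token for this sequence yet" rather than as an error.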


jiayyu added 6 commits May 11, 2026 02:40

clean commits

77a8332

fix bfloat16

3509d61

fix kimi

7307987

fix kimi

4f00c01

refine cu_seqlens_k

0f9aa82

move to aiter

ab067d9

Copilot AI review requested due to automatic review settings May 11, 2026 03:39

Copilot started reviewing on behalf of jiayyu May 11, 2026 03:40

Copilot AI reviewed May 11, 2026

jiayyu wants to merge 6 commits into main from fpz/chunk_prefill


3 participants
