
[ascend] fix prefix caching #4448

Merged
lvhan028 merged 4 commits into InternLM:main from DeepLink-org:prefix_caching
Apr 8, 2026

Conversation

@yao-fengchen (Collaborator)

No description provided.

Copilot AI (Contributor) left a comment


Pull request overview

This PR updates the DLInfer paged-attention plumbing to fix prefix-caching behavior (notably on Ascend), primarily by renaming the "unpaged prefill" flag to the clearer "prefill w/o cache" concept and by passing q_seqlens into the token-attention kernel path.

Changes:

  • Rename is_unpaged_prefill to is_prefill_no_cache across DLInfer attention metadata and backends.
  • Pass q_seqlens through the paged token-attention wrapper into the underlying ext op.
  • Adjust Ascend backend sequence-length/mask preparation logic used for prefix/paged prefill.
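The rename and the new kernel argument can be sketched roughly as follows. This is an illustrative simplification, not the actual lmdeploy code: the dataclass fields, the `token_attention_kernel` stub, and `paged_token_attention` signature are assumptions based on the PR description; only the flag names `is_unpaged_prefill`/`is_prefill_no_cache` and the `q_seqlens` argument come from the PR itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DlinferAttentionMetadata:
    # Renamed from `is_unpaged_prefill`: True only for a prefill step that
    # runs with no previously cached KV blocks.
    is_prefill_no_cache: bool = False
    q_seqlens: Optional[List[int]] = None   # per-request query token counts
    kv_seqlens: Optional[List[int]] = None  # per-request total KV lengths


def token_attention_kernel(q_seqlens, kv_seqlens):
    # Stand-in for the underlying ext op: with prefix caching, a request can
    # be a prefill yet still have cached KV, so q_seqlens may be shorter than
    # kv_seqlens and the kernel needs both.
    return [(q, kv) for q, kv in zip(q_seqlens, kv_seqlens)]


def paged_token_attention(meta: DlinferAttentionMetadata):
    # The wrapper now forwards q_seqlens into the ext op (previously it
    # passed only the KV lengths).
    return token_attention_kernel(meta.q_seqlens, meta.kv_seqlens)


meta = DlinferAttentionMetadata(
    is_prefill_no_cache=False,       # prefill that reuses a cached prefix
    q_seqlens=[4], kv_seqlens=[12],  # 8 of the 12 KV tokens come from cache
)
print(paged_token_attention(meta))  # → [(4, 12)]
```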

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

  • lmdeploy/pytorch/kernels/dlinfer/pagedattention.py — Renames the prefill flag and threads q_seqlens into token-attention execution.
  • lmdeploy/pytorch/backends/dlinfer/maca/op_backend.py — Updates the metadata flag name for prefill/caching behavior.
  • lmdeploy/pytorch/backends/dlinfer/camb/op_backend.py — Updates the metadata flag name and related conditional logic.
  • lmdeploy/pytorch/backends/dlinfer/attention.py — Renames the metadata field and forwards it into the kernel call.
  • lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py — Reworks Ascend prefill/prefix-caching preparation (seqlens + attention mask) and updates the flag naming.
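To illustrate what the Ascend seqlen/mask preparation has to get right, here is a hypothetical sketch (not the backend code): during a prefill that reuses a cached prefix, each new query token may attend to the entire cached prefix plus the causal span of new tokens. The function name and shape below are assumptions for illustration only.

```python
def build_prefix_prefill_mask(q_len: int, kv_len: int):
    """Hypothetical mask builder for a prefill step with a cached prefix.

    kv_len - q_len tokens are already in the KV cache. Every new query
    token may attend to all cached tokens, and causally to the new ones.
    Returns a q_len x kv_len boolean mask (True = may attend).
    """
    cached = kv_len - q_len
    return [
        [j <= cached + i for j in range(kv_len)]  # full prefix + causal tail
        for i in range(q_len)
    ]


# 3 new query tokens over 5 total KV positions (2 come from the cache):
mask = build_prefix_prefill_mask(q_len=3, kv_len=5)
for row in mask:
    print("".join("x" if m else "." for m in row))
# → xxx..
#   xxxx.
#   xxxxx
```

When q_len == kv_len this degenerates to an ordinary causal mask, which is why a flag like is_prefill_no_cache is needed to pick the cheaper no-cache path.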


@jinminxi104 jinminxi104 marked this pull request as ready for review April 7, 2026 01:25
@jinminxi104 jinminxi104 requested review from grimoire and lvhan028 April 7, 2026 01:27
@lvhan028 lvhan028 added the Bug:P1 label Apr 8, 2026
@lvhan028 lvhan028 merged commit 2ef9c6b into InternLM:main Apr 8, 2026
5 checks passed
5 participants