[ascend] fix prefix caching by yao-fengchen · Pull Request #4448 · InternLM/lmdeploy

yao-fengchen · 2026-03-23T06:38:52Z

No description provided.

lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py

Copilot

Pull request overview

This PR updates the DLInfer paged-attention plumbing to fix prefix caching (notably on Ascend). It does this primarily by renaming the “unpaged prefill” flag to the clearer “prefill without cache” concept and by passing `q_seqlens` into the token-attention kernel path.

Changes:

- Rename `is_unpaged_prefill` → `is_prefill_no_cache` across DLInfer attention metadata and backends.
- Pass `q_seqlens` through the paged token-attention wrapper into the underlying ext op.
- Adjust the Ascend backend's sequence-length and attention-mask preparation used for prefix/paged prefill.
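The following sketch illustrates what “prefill without cache” means and why prefix caching needs separate query and key/value lengths. It is not the lmdeploy or dlinfer implementation. It rests on three assumptions. First, `kv_seqlens` counts cached prefix tokens plus new tokens. Second, `True` marks a masked-out position, which is an illustrative convention and not necessarily the one the Ascend kernel uses. Third, the plain-Python helpers are hypothetical; `is_prefill_no_cache` simply borrows the flag's name.

```python
from typing import List, Sequence


def is_prefill_no_cache(q_seqlens: Sequence[int], kv_seqlens: Sequence[int]) -> bool:
    """True when no sequence in the batch reuses cached prefix tokens.

    Assumption (illustrative): kv_seqlens[i] = cached prefix length + q_seqlens[i],
    so a sequence has no prefix-cache hit exactly when the two lengths are equal.
    """
    if len(q_seqlens) != len(kv_seqlens):
        raise ValueError("q_seqlens and kv_seqlens must have the same length")
    return all(q == kv for q, kv in zip(q_seqlens, kv_seqlens))


def prefix_causal_mask(q_len: int, kv_len: int) -> List[List[bool]]:
    """Return a [q_len][kv_len] mask where True marks a key the query must NOT see.

    New query token i sits at absolute position (kv_len - q_len) + i, so it may
    attend to every cached prefix token and to new tokens up to and including itself.
    """
    history = kv_len - q_len  # number of cached prefix tokens
    if history < 0:
        raise ValueError("kv_len must be >= q_len")
    return [[j > history + i for j in range(kv_len)] for i in range(q_len)]
```

Consider `q_seqlens=[3, 2]` and `kv_seqlens=[3, 5]`. The second sequence reuses 3 cached tokens, so the batch is not a no-cache prefill. Its mask lets both new tokens see the entire cached prefix. A plain square causal mask built from `q_seqlens` alone would misplace the new tokens relative to the cache.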

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `lmdeploy/pytorch/kernels/dlinfer/pagedattention.py` | Renames the prefill flag and threads `q_seqlens` into token-attention execution. |
| `lmdeploy/pytorch/backends/dlinfer/maca/op_backend.py` | Updates the metadata flag name for prefill/caching behavior. |
| `lmdeploy/pytorch/backends/dlinfer/camb/op_backend.py` | Updates the metadata flag name and related conditional logic. |
| `lmdeploy/pytorch/backends/dlinfer/attention.py` | Renames the metadata field and forwards it into the kernel call. |
| `lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py` | Reworks Ascend prefill/prefix-caching preparation (seqlens + attention mask) and updates the flag naming. |

yao-fengchen and others added 3 commits March 23, 2026 02:41

fix prefix caching

44b7c2b

change attention layout from BSH to TND

16d847e

Merge branch 'main' into prefix_caching

85cc330
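The commit “change attention layout from BSH to TND” refers to tensor layouts. BSH is a padded `[batch, seq_len, hidden]` tensor. TND packs only the real tokens as `[total_tokens, num_heads, head_dim]`. The sketch below shows that repacking with plain Python lists. It assumes right-padded sequences and `hidden = num_heads * head_dim`. The function names are hypothetical. The cumulative-offset helper follows a convention common in variable-length attention kernels and is not a confirmed detail of the Ascend op used in this PR.

```python
from typing import List, Sequence

Vector = List[float]


def bsh_to_tnd(x_bsh: Sequence[Sequence[Vector]], seqlens: Sequence[int],
               num_heads: int) -> List[List[Vector]]:
    """Pack a right-padded [batch][seq][hidden] tensor into [tokens][heads][head_dim]."""
    packed: List[List[Vector]] = []
    for b, length in enumerate(seqlens):
        # Only the first `length` positions of each row are real tokens; the rest is padding.
        for s in range(length):
            hidden = list(x_bsh[b][s])
            if len(hidden) % num_heads != 0:
                raise ValueError("hidden size must be divisible by num_heads")
            head_dim = len(hidden) // num_heads
            # Split the hidden vector into num_heads contiguous slices of head_dim each.
            packed.append([hidden[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)])
    return packed


def cumulative_seqlens(seqlens: Sequence[int]) -> List[int]:
    """Start offset of each sequence in the packed T dimension, followed by the total."""
    offsets = [0]
    for length in seqlens:
        offsets.append(offsets[-1] + length)
    return offsets


if __name__ == "__main__":
    x = [
        [[1, 2, 3, 4], [5, 6, 7, 8], [0, 0, 0, 0]],     # sequence 0: 2 real tokens + 1 pad
        [[9, 10, 11, 12], [0, 0, 0, 0], [0, 0, 0, 0]],  # sequence 1: 1 real token + 2 pads
    ]
    print(bsh_to_tnd(x, seqlens=[2, 1], num_heads=2))
    print(cumulative_seqlens([2, 1]))
```

The packed layout drops padding entirely. Mixed prefill batches, such as some sequences hitting the prefix cache and others not, therefore need only per-sequence lengths (or offsets) and no padded masks.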

jinminxi104 requested changes Apr 2, 2026

lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py (outdated, resolved)

lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py (resolved)

jinminxi104 requested a review from Copilot April 2, 2026 12:06

Copilot started reviewing on behalf of jinminxi104 April 2, 2026 12:07

Copilot AI reviewed Apr 2, 2026

lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py (resolved)

lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py (outdated, resolved)

remove unused comments

5c89375

jinminxi104 marked this pull request as ready for review April 7, 2026 01:25

jinminxi104 approved these changes Apr 7, 2026

jinminxi104 requested review from grimoire and lvhan028 April 7, 2026 01:27

grimoire approved these changes Apr 7, 2026

lvhan028 added the Bug:P1 label Apr 8, 2026

lvhan028 merged commit 2ef9c6b into InternLM:main Apr 8, 2026
5 checks passed

lvhan028 merged 4 commits into InternLM:main from DeepLink-org:prefix_caching

5 participants
