[Iluvatar] Refactor transpose and reverse_transpose by wuyujiji · Pull Request #8065 · PaddlePaddle/FastDeploy

wuyujiji · 2026-06-22T03:00:36Z

Motivation

Fix a bug in mixed prefill/decode attention: when a prefill request's length for the current step equals 1, the code treated every seq_len == 1 as decode, so the shapes did not match when reordering the prefill and decode hidden states.
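
To make the failure mode concrete, here is a minimal host-side sketch that contrasts the length-based rule with the phase-based rule quoted in the review on this page (`seq_lens_encoder > 0` for prefill, `seq_lens_encoder == 0 && seq_lens_decoder > 0` for decode). The enum, the function names, and the treatment of a zero-length entry as an idle slot are assumptions made for illustration; they are not taken from the FastDeploy sources.

```cpp
#include <cstdint>

// Hypothetical names, for illustration only.
enum class RequestKind { Idle, Prefill, Decode };

// Length-based rule described in the Motivation: every request that
// contributes exactly one token this step is treated as decode.
inline RequestKind ClassifyByLength(int32_t seq_len_this_time) {
  if (seq_len_this_time == 0) return RequestKind::Idle;  // assumption: empty slot
  return seq_len_this_time == 1 ? RequestKind::Decode : RequestKind::Prefill;
}

// Phase-based rule: the encoder/decoder lengths decide the kind, so a
// prefill chunk of length 1 is still classified as prefill.
inline RequestKind ClassifyByPhase(int32_t seq_len_encoder,
                                   int32_t seq_len_decoder,
                                   int32_t seq_len_this_time) {
  if (seq_len_this_time == 0) return RequestKind::Idle;  // assumption: empty slot
  if (seq_len_encoder > 0) return RequestKind::Prefill;
  if (seq_len_decoder > 0) return RequestKind::Decode;   // encoder length is 0 here
  return RequestKind::Idle;
}
```

For a prefill request with `seq_len_encoder = 1` and `seq_len_this_time = 1`, the first function reports decode and the second reports prefill, which is the mismatch this PR targets.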

Modifications

N/A

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

codecov-commenter · 2026-06-22T05:11:42Z

Codecov Report

❌ Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@165f827). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/worker/iluvatar_model_runner.py	0.00%	8 Missing ⚠️
...rs/backends/iluvatar/attention/mha_attn_backend.py	0.00%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #8065   +/-   ##
==========================================
  Coverage           ?   67.51%           
==========================================
  Files              ?      475           
  Lines              ?    66879           
  Branches           ?    10312           
==========================================
  Hits               ?    45152           
  Misses             ?    18860           
  Partials           ?     2867

Flag	Coverage Δ
GPU	`77.53% <ø> (?)`
XPU	`6.95% <0.00%> (?)`

PaddlePaddle-bot · 2026-06-22T06:16:42Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-22 14:16:10 UTC+08:00

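CI report generated from the following code (refreshed every 30 minutes):
PR commit: c008c42 | Merge base: 165f827 (branch: develop)

1 Required jobs: 9/10 passed

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
42(0)	42	39	3	0	0	0

Job	Error type	Confidence	Log
`Approval`	Approval required	High	Job

2 Failure details

🔴 Approval: approval required (confidence: high)

This job requires manual approval; CI continues only after the approval is granted.

Suggested fix: obtain the manual approval.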

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-23 11:17:45

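📋 Review summary

PR overview: refactors the Iluvatar mixed attention hidden states transpose/reverse_transpose, adds a device-side reorder custom op, and adjusts example parameters.
Scope of changes: custom_ops/iluvatar_ops/, Iluvatar attention backend/model runner, Iluvatar installation docs
Impact tags: OP Iluvatar Docs

Issues

Severity	File	Summary
🟡 Suggestion	`custom_ops/iluvatar_ops/mixed_reorder_hidden_states.cu:97`	Shape/offset indices all use 32-bit `int` with no range check, so they may overflow under very large token/hidden configurations

Status of previous findings

Finding	Issue	Status
F1	Forcing `copy_seq_len = 1` when `is_decode` is set drops the extra tokens of speculative/MTP decode.	⚠️ Still present

📝 PR convention check

The title tag [Iluvatar] is compliant, but Modifications in the PR description is still N/A, and no unit tests or accuracy notes were found for the newly added custom op. Consider using the copy-ready title and description below.

Suggested title (copy-ready):

[Iluvatar] Fix mixed attention hidden state reorder

Suggested PR description (copy-ready):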

## Motivation
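Fix incorrect hidden states reordering in Iluvatar mixed attention when prefill and decode requests are batched together: a prefill whose length for the current step is 1 should still be identified as prefill via seq_lens_encoder > 0, so that shape and order stay consistent after reorder/reverse_reorder.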

## Modifications
- Add `custom_ops/iluvatar_ops/mixed_reorder_hidden_states.cu`, which reorders mixed attention hidden states on the device according to `seq_lens_encoder`, `seq_lens_decoder`, and `seq_lens_this_time`.
- Wire `mixed_reorder_hidden_states` into the Iluvatar MHA backend and change the decode batch check to `seq_lens_encoder == 0 && seq_lens_decoder > 0`.
- Add the new Iluvatar custom op to the build list in `custom_ops/setup_ops.py`.
- Update the example service parameters for PaddleOCR-VL/PaddleOCR-VL-1.6 in the Iluvatar GPU installation docs.

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

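Overall assessment

This round focused on the new Iluvatar reorder custom op, the mixed attention call chain, and the runner metadata initialization. One previously reported bug is still unfixed, and the index types in the new custom op need boundary protection. After those fixes, it is recommended to add regression tests covering prefill length = 1, mixed prefill/decode batches, and multi-token decode.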
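
As a starting point for such regression tests, a CPU reference of the reorder and its inverse could serve as an oracle for the device op. The sketch below is not the PR's kernel. It assumes a layout in which all prefill rows come first and all decode rows follow, with request order preserved inside each group; the actual layout produced by `mixed_reorder_hidden_states` is not stated on this page. `Request`, `BuildRowMap`, `Reorder`, and `ReverseReorder` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-request metadata mirroring seq_lens_encoder,
// seq_lens_decoder and seq_lens_this_time.
struct Request {
  int32_t encoder_len;
  int32_t decoder_len;
  int32_t this_time;
};

// Maps each original token row to its row in the reordered layout.
// ASSUMPTION: prefill rows first, decode rows after, request order kept.
std::vector<int64_t> BuildRowMap(const std::vector<Request>& reqs) {
  int64_t total = 0;
  int64_t prefill_tokens = 0;
  for (const Request& r : reqs) {
    total += r.this_time;
    if (r.encoder_len > 0) prefill_tokens += r.this_time;
  }
  std::vector<int64_t> row_map(static_cast<std::size_t>(total));
  int64_t src = 0;
  int64_t next_prefill = 0;
  int64_t next_decode = prefill_tokens;
  for (const Request& r : reqs) {
    const bool is_prefill = r.encoder_len > 0;
    // A non-prefill request that contributes tokens must be a decode request.
    assert(is_prefill || r.this_time == 0 || r.decoder_len > 0);
    int64_t& dst = is_prefill ? next_prefill : next_decode;
    // Map every token of the request, so multi-token decode rows are kept.
    for (int32_t t = 0; t < r.this_time; ++t) row_map[src++] = dst++;
  }
  return row_map;
}

// Moves original row i to row row_map[i].
std::vector<float> Reorder(const std::vector<float>& hidden, int64_t hidden_dim,
                           const std::vector<int64_t>& row_map) {
  assert(hidden.size() == row_map.size() * static_cast<std::size_t>(hidden_dim));
  std::vector<float> out(hidden.size());
  for (std::size_t row = 0; row < row_map.size(); ++row) {
    std::copy_n(hidden.begin() + static_cast<std::ptrdiff_t>(row) * hidden_dim,
                hidden_dim, out.begin() + row_map[row] * hidden_dim);
  }
  return out;
}

// Inverse of Reorder: moves row row_map[i] back to row i.
std::vector<float> ReverseReorder(const std::vector<float>& reordered,
                                  int64_t hidden_dim,
                                  const std::vector<int64_t>& row_map) {
  assert(reordered.size() == row_map.size() * static_cast<std::size_t>(hidden_dim));
  std::vector<float> out(reordered.size());
  for (std::size_t row = 0; row < row_map.size(); ++row) {
    std::copy_n(reordered.begin() + row_map[row] * hidden_dim, hidden_dim,
                out.begin() + static_cast<std::ptrdiff_t>(row) * hidden_dim);
  }
  return out;
}
```

A batch such as one single-token decode, one prefill of length 1, one prefill of length 4, and one two-token decode exercises all three cases named above; the round trip `ReverseReorder(Reorder(x))` should return `x` unchanged.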

PaddlePaddle-bot · 2026-06-23T03:19:49Z

+  auto stream = dev_ctx->stream();
+
+  const auto hidden_shape = hidden_states.shape();
+  const int hidden_dim = hidden_shape[1];


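🟡 Suggestion: the new reorder kernel narrows hidden_shape[1] and seq_lens_this_time.shape()[0] to int, and the subsequent copy_seq_len * hidden_dim, original_offset, and reordered_offset are also computed entirely with 32-bit indices.

These values come from tensor shapes and the number of scheduled tokens, and there is currently no `<= INT_MAX` check or multiplication-overflow check. Once max_num_batched_tokens * hidden_dim or an offset exceeds the 32-bit range, the index wraps around and the kernel reads or writes the wrong location. The FastDeploy checklist requires custom ops to carry shapes and offsets as int64_t.

Suggested fix:
Change the host-side hidden_dim/max_num_seqs and the kernel-side original_start, reordered_start, idx, and offset-related variables to int64_t. If the Iluvatar kernel must use 32-bit indexing, explicitly check hidden_shape[0] * hidden_shape[1] <= INT_MAX and hidden_shape[1] <= INT_MAX before launch and report a clear error.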
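
The second option (keep 32-bit indexing but validate before launch) could look roughly like the following host-side sketch. `CheckedElementCount` is a hypothetical helper, not part of the PR or of the Paddle API; it only illustrates computing the product in 64 bits without ever overflowing during the check itself.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns num_tokens * hidden_dim as int64_t, or throws
// if the result would not fit the 32-bit indices used inside the kernel.
inline int64_t CheckedElementCount(int64_t num_tokens, int64_t hidden_dim) {
  constexpr int64_t kMaxIndex = std::numeric_limits<int32_t>::max();
  if (num_tokens < 0 || hidden_dim < 0) {
    throw std::invalid_argument("shape dimensions must be non-negative");
  }
  if (hidden_dim > kMaxIndex) {
    throw std::overflow_error("hidden_dim " + std::to_string(hidden_dim) +
                              " exceeds INT_MAX");
  }
  // Divide instead of multiplying so the check itself cannot overflow.
  if (hidden_dim != 0 && num_tokens > kMaxIndex / hidden_dim) {
    throw std::overflow_error("num_tokens * hidden_dim exceeds INT_MAX: " +
                              std::to_string(num_tokens) + " * " +
                              std::to_string(hidden_dim));
  }
  return num_tokens * hidden_dim;
}
```

Calling this with `hidden_shape[0]` and `hidden_shape[1]` before the launch would turn a silent wrap-around into an explicit error. Switching the kernel indices to int64_t, the first option, removes the need for the check altogether.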

wuyujiji had a problem deploying to Metax_ci June 22, 2026 03:00 — with GitHub Actions Error

wuyujiji force-pushed the yuzhe_dev branch from 554b37d to cc3a6c4 Compare June 22, 2026 03:02

wuyujiji had a problem deploying to Metax_ci June 22, 2026 03:02 — with GitHub Actions Error

wuyujiji force-pushed the yuzhe_dev branch from cc3a6c4 to 39285bb Compare June 22, 2026 03:07

wuyujiji had a problem deploying to Metax_ci June 22, 2026 03:07 — with GitHub Actions Failure

wuyujiji force-pushed the yuzhe_dev branch from 39285bb to c008c42 Compare June 22, 2026 04:37

wuyujiji had a problem deploying to Metax_ci June 22, 2026 04:38 — with GitHub Actions Failure

[Iluvatar] Refactor transpose and reverse_transpose

5719306

wuyujiji force-pushed the yuzhe_dev branch from c008c42 to 5719306 Compare June 23, 2026 03:03

wuyujiji had a problem deploying to Metax_ci June 23, 2026 03:03 — with GitHub Actions Failure

PaddlePaddle-bot suggested changes Jun 23, 2026

Jiang-Jia-Jun approved these changes Jun 23, 2026

Jiang-Jia-Jun merged commit 4653221 into PaddlePaddle:develop Jun 23, 2026
40 of 43 checks passed

Jiang-Jia-Jun merged 1 commit into PaddlePaddle:develop from wuyujiji:yuzhe_dev

4 participants
