[Iluvatar] Refactor transpose and reverse_transpose #8065

Merged
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:develop from wuyujiji:yuzhe_dev on Jun 23, 2026
@wuyujiji wuyujiji commented Jun 22, 2026

Contributor

Motivation

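Fix a bug in mixed prefill/decode attention: a prefill request whose length in the current step equals 1 was handled as decode, because every request with seq_len == 1 was classified as decode. As a result, the shapes did not match when reordering the prefill and decode hidden states.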
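The difference between the two classification rules can be sketched as follows. This is a minimal illustration, not the PR's code: the `RequestLens` struct and both helper functions are hypothetical, and the field meanings in the comments are assumptions based on the names used in this conversation.

```cpp
#include <cstdint>

// Hypothetical per-request lengths, named after the fields mentioned in this PR.
struct RequestLens {
    int32_t seq_lens_encoder;    // assumed: prompt tokens being prefilled in this step
    int32_t seq_lens_decoder;    // assumed: tokens already cached for this request
    int32_t seq_lens_this_time;  // tokens scheduled for this request in this step
};

// Rule described in the Motivation as buggy: any single-token step counts as decode.
inline bool is_decode_by_length(const RequestLens& r) {
    return r.seq_lens_this_time == 1;
}

// Rule quoted in the review: decode only when nothing is being prefilled
// and the request already has cached context.
inline bool is_decode_by_encoder(const RequestLens& r) {
    return r.seq_lens_encoder == 0 && r.seq_lens_decoder > 0;
}
```

With a prefill of length 1 (`{1, 0, 1}`), the length-based rule reports decode while the encoder-based rule reports prefill, which is the mismatch described above.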

Modifications

N/A

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot

This comment was marked as outdated.

codecov-commenter commented Jun 22, 2026

Codecov Report

❌ Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@165f827). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/worker/iluvatar_model_runner.py | 0.00% | 8 Missing ⚠️ |
| ...rs/backends/iluvatar/attention/mha_attn_backend.py | 0.00% | 6 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #8065   +/-   ##
==========================================
  Coverage           ?   67.51%           
==========================================
  Files              ?      475           
  Lines              ?    66879           
  Branches           ?    10312           
==========================================
  Hits               ?    45152           
  Misses             ?    18860           
  Partials           ?     2867           

| Flag | Coverage Δ |
| --- | --- |
| GPU | 77.53% <ø> (?) |
| XPU | 6.95% <0.00%> (?) |

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-22 14:16:10 UTC+08:00

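The CI report is generated from the following code (updated every 30 minutes):
PR commit: c008c42 | Merge base: 165f827 (branch: develop)

1 Required jobs: 9/10 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 42(0) | 42 | 39 | 3 | 0 | 0 | 0 |

| Job | Error type | Confidence | Log |
| --- | --- | --- | --- |
| Approval | Approval required | | Job |

2 Failure details

🔴 Approval: approval required (confidence: high)

This job requires manual approval; CI continues only after the approval is completed.

Suggested fix: complete the manual approval.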

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-23 11:17:45

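📋 Review summary

PR overview: Refactors the Iluvatar mixed attention hidden states transpose/reverse_transpose, adds a device-side reorder custom op, and adjusts example parameters.
Scope of changes: custom_ops/iluvatar_ops/, Iluvatar attention backend/model runner, Iluvatar installation docs
Impact tags: OP, Iluvatar, Docs

Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | custom_ops/iluvatar_ops/mixed_reorder_hidden_states.cu:97 | Shape/offset indexing uses 32-bit int throughout with no range check; it may overflow with very large token/hidden configurations |

Status of previous findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | Forcing copy_seq_len = 1 when is_decode is set drops the extra tokens of speculative/MTP decode. | ⚠️ Still present |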
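
Finding F1 can be illustrated with a small sketch. Both helpers below are hypothetical and are not taken from the PR; they assume that seq_lens_this_time already counts every token scheduled for the request in this step, including any draft tokens in speculative/MTP decode.

```cpp
#include <cstdint>

// Pattern flagged by F1: a decode request always copies exactly one token,
// regardless of how many tokens were scheduled for it.
inline int64_t copy_len_forced(bool is_decode, int32_t seq_lens_this_time) {
    return is_decode ? 1 : seq_lens_this_time;
}

// Alternative under the stated assumption: copy what was scheduled,
// for prefill and decode alike.
inline int64_t copy_len_from_schedule(bool /*is_decode*/, int32_t seq_lens_this_time) {
    return seq_lens_this_time;
}
```

For a decode step with 4 scheduled tokens, the first helper returns 1 and the second returns 4, so the forced variant would leave 3 rows of hidden states uncopied.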

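📝 PR convention check

The title tag [Iluvatar] is compliant, but Modifications in the PR description is still N/A, and no unit tests or accuracy notes were found for the newly added custom op. Consider using the title and description below, which can be copied directly.

Suggested title (can be copied directly):

  • [Iluvatar] Fix mixed attention hidden state reorder
Suggested PR description (can be copied directly)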
## Motivation
Fix incorrect hidden states reordering in Iluvatar mixed attention when prefill and decode are mixed: a prefill whose length in the current step is 1 should still be identified as prefill via seq_lens_encoder > 0, so that shape and order stay consistent after reorder/reverse_reorder.

## Modifications
- Add `custom_ops/iluvatar_ops/mixed_reorder_hidden_states.cu`, which reorders mixed attention hidden states on the device according to `seq_lens_encoder`, `seq_lens_decoder`, and `seq_lens_this_time`.
- Wire `mixed_reorder_hidden_states` into the Iluvatar MHA backend and change the decode batch condition to `seq_lens_encoder == 0 && seq_lens_decoder > 0`.
- Add the new Iluvatar custom op to the build list in `custom_ops/setup_ops.py`.
- Update the example serving parameters for PaddleOCR-VL/PaddleOCR-VL-1.6 in the Iluvatar GPU installation docs.

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

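Overall assessment

This round focused on the new Iluvatar reorder custom op, the mixed attention call chain, and the runner metadata initialization. One previously reported bug is still unfixed, and the index types in the new custom op need boundary protection. After fixing these, consider adding regression tests that cover prefill length = 1, mixed prefill/decode, and multi-token decode.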
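
A host-side reference for such regression tests might look like the sketch below. It is not the PR's kernel: the `Plan` struct and the three functions are hypothetical, and the layout (all prefill rows first, then all decode rows, request order preserved within each group) is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// src_row[i] is the original row that ends up at reordered row i.
struct Plan {
    std::vector<int64_t> src_row;
};

Plan build_plan(const std::vector<int32_t>& seq_lens_encoder,
                const std::vector<int32_t>& seq_lens_decoder,
                const std::vector<int32_t>& seq_lens_this_time) {
    Plan plan;
    for (int pass = 0; pass < 2; ++pass) {  // pass 0: prefill rows, pass 1: decode rows
        int64_t row = 0;                    // first original row of request i
        for (std::size_t i = 0; i < seq_lens_this_time.size(); ++i) {
            const bool is_decode = seq_lens_encoder[i] == 0 && seq_lens_decoder[i] > 0;
            const int64_t len = seq_lens_this_time[i];  // all scheduled tokens are kept
            if (is_decode == (pass == 1)) {
                for (int64_t t = 0; t < len; ++t) plan.src_row.push_back(row + t);
            }
            row += len;
        }
    }
    return plan;
}

// Gather rows of x ([num_tokens, hidden_dim], row-major) into the reordered layout.
std::vector<float> reorder(const std::vector<float>& x, const Plan& p, int64_t hidden_dim) {
    std::vector<float> out(x.size());
    const int64_t n = static_cast<int64_t>(p.src_row.size());
    for (int64_t i = 0; i < n; ++i)
        for (int64_t d = 0; d < hidden_dim; ++d)
            out[i * hidden_dim + d] = x[p.src_row[i] * hidden_dim + d];
    return out;
}

// Scatter rows back to their original positions (inverse of reorder).
std::vector<float> reverse_reorder(const std::vector<float>& y, const Plan& p, int64_t hidden_dim) {
    std::vector<float> out(y.size());
    const int64_t n = static_cast<int64_t>(p.src_row.size());
    for (int64_t i = 0; i < n; ++i)
        for (int64_t d = 0; d < hidden_dim; ++d)
            out[p.src_row[i] * hidden_dim + d] = y[i * hidden_dim + d];
    return out;
}
```

A round-trip check (reverse_reorder of reorder equals the input) on a batch that mixes a decode request, a prefill of length 1, and a longer prefill would exercise the cases listed above.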

auto stream = dev_ctx->stream();

const auto hidden_shape = hidden_states.shape();
const int hidden_dim = hidden_shape[1];

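🟡 Suggestion: The new reorder kernel narrows hidden_shape[1] and seq_lens_this_time.shape()[0] to int, and the subsequent copy_seq_len * hidden_dim, original_offset, and reordered_offset are all computed with 32-bit indices.

These values come from tensor shapes and the number of scheduled tokens, and there is currently no <= INT_MAX check or multiplication-overflow check. Once max_num_batched_tokens * hidden_dim or an offset exceeds the 32-bit range, the index wraps around and reads or writes the wrong location. The FastDeploy checklist requires custom ops to hold shapes and offsets in int64_t.

Suggested fix:
Change the host-side hidden_dim/max_num_seqs and the kernel's original_start, reordered_start, idx, and offset-related variables to int64_t. If the Iluvatar kernel must use 32-bit indices, explicitly validate hidden_shape[0] * hidden_shape[1] <= INT_MAX and hidden_shape[1] <= INT_MAX before launch and report a clear error.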
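
Both options in the suggested fix can be sketched on the host side. The two helpers below are hypothetical names for illustration only and do not come from the PR; they assume a row-major [num_tokens, hidden_dim] buffer and a 32-bit int.

```cpp
#include <climits>
#include <cstdint>
#include <stdexcept>

// Option 1: compute the offset of element (row, col) entirely in 64-bit arithmetic.
inline int64_t element_offset(int64_t row, int64_t hidden_dim, int64_t col) {
    return row * hidden_dim + col;
}

// Option 2: if the kernel must keep 32-bit indices, reject shapes whose
// element count does not fit in int before launching it.
inline void check_fits_int32(int64_t num_tokens, int64_t hidden_dim) {
    if (num_tokens < 0 || hidden_dim < 0) {
        throw std::invalid_argument("negative dimension");
    }
    // Divide instead of multiplying so the check itself cannot overflow.
    if (hidden_dim > INT_MAX || (hidden_dim != 0 && num_tokens > INT_MAX / hidden_dim)) {
        throw std::overflow_error("num_tokens * hidden_dim exceeds INT_MAX");
    }
}
```

For example, 300000 tokens with hidden_dim 8192 give 2457600000 elements, which is above INT_MAX (2147483647), so the guard throws while the 64-bit offset stays exact.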

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 4653221 into PaddlePaddle:develop Jun 23, 2026
40 of 43 checks passed
4 participants