[Metax][CI]: skip trap asm on MetaX GPU to fix compile error #7905
Tryorish wants to merge 6 commits into
Thanks for your contribution!
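Pull request overview
This PR fixes a compile error that occurs when building custom operators for MetaX GPU. The error is caused by the inline-assembly trap instruction, and the PR uses conditional compilation to skip that instruction on the MetaX path.
Changes:
- In the consistency-check branch of `prefill_absorb_cache_kernel`, guard `asm volatile("trap;")` with conditional compilation on `PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU`.
Supplementary notes (process conventions):
- The PR title is currently `fix(metax): ...`, which does not follow the `[CLASS]Title` / tag form required by the template. Consider something like `[Metax][BugFix] Skip trap asm on MetaX GPU to fix compile error` (choose the appropriate tag per repository convention).
- The Motivation / Modifications / Tests sections of the PR description are empty. Please describe why the trap must be skipped, how behavior is affected, and whether the change was verified by a build or a run.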
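CI report, generated from the following code (refreshed every 30 minutes):
1 Task overview
All Required tasks passed (10/10). The 4 currently failing tasks are all Optional and do not block merging. From the Required CI perspective the PR can pass; the Optional failures are left for maintainers' reference.
2 Task status summary
Note on the log column: failed tasks link directly to the log; running tasks link to the Job.
2.1 Required tasks: 10/10 passed
2.2 Optional tasks: 28/32 passed
3 Failure details (Required only)
No Required tasks failed.
4 Code context check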
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7905 +/- ##
==========================================
Coverage ? 64.03%
==========================================
Files ? 467
Lines ? 64965
Branches ? 9962
==========================================
Hits ? 41599
Misses ? 20542
Partials ? 2824
    #ifndef PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU
    const int32_t block_idx1 = slot_mapping[token_idx] / block_size;
    if (block_idx1 != block_idx) {
      printf("block_idx1 %d != block_idx %d\n", block_idx1, block_idx);
      printf("token_idx %d\n", token_idx);
      printf("slot_mapping %d\n", slot_mapping[token_idx]);
      asm volatile("trap;");
    }
    #endif
This reverts commit 34dfe5c.
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-25 17:43:04
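📋 Review summary
PR overview: In `prefill_absorb_cache_kernel`, a preprocessor macro skips the `asm volatile("trap;")` instruction, which MetaX GPU does not support, fixing the MetaX CI compile error.
Scope of change: custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh
Impact tags: [Metax] [OP]
Issues
| Level | File | Summary |
|---|---|---|
| 📝 PR conventions | - | The PR description sections are not filled in; the title contains a stray colon |
| ❓ Question | mla_cache_kernel.cuh:215 | Is the `#ifndef` guard too wide? (If MetaX supports printf, only the trap line needs guarding.) |
📝 PR conventions check
The Motivation / Modifications / Usage or Command / Accuracy Tests sections of the PR description are all empty, with only the template comments left. The title prefix `[Metax][CI]:` contains a stray colon (`:` is not part of the standard format).
Suggested title (can be copied directly):
[Metax][CI] Skip trap asm on MetaX GPU to fix compile error
Suggested PR description (can be copied directly):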
## Motivation
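MetaX GPU does not support the NVIDIA-specific `asm volatile("trap;")` inline assembly instruction, so compiling `mla_cache_kernel.cuh` for MetaX devices fails. Guarding the instruction with the preprocessor macro `PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU` skips it and fixes the MetaX CI build failure.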
## Modifications
- `custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh`: in `prefill_absorb_cache_kernel`, wrap the block_idx consistency-check block that contains `asm volatile("trap;")` in `#ifndef PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU` so that MetaX builds no longer fail.
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
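- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The change is simple and clear, and it effectively fixes the MetaX compile error. Please confirm whether the `#ifndef` guard is too wide (if MetaX supports kernel printf, only the trap line should be guarded so that diagnostics are preserved), and fill in the PR description.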
    const uint32_t block_idx = block_table_now[ori_seq_id / block_size];
    const uint32_t block_offset = ori_seq_id % block_size;

    #ifndef PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU
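❓ Question: is the `#ifndef` guard too wide?
`asm volatile("trap;")` is an NVIDIA-specific instruction, so it is expected that MetaX does not support it. However, the current implementation uses the macro to skip the entire check block, including the printf diagnostics. If MetaX GPU supports kernel printf, a block_idx mismatch on MetaX will be completely silent and the diagnostic information is lost.
Please confirm: does MetaX support printf inside CUDA kernels?
- If it does, consider guarding only the `asm volatile("trap;")` line:
      const int32_t block_idx1 = slot_mapping[token_idx] / block_size;
      if (block_idx1 != block_idx) {
        printf("block_idx1 %d != block_idx %d\n", block_idx1, block_idx);
        printf("token_idx %d\n", token_idx);
        printf("slot_mapping %d\n", slot_mapping[token_idx]);
      #ifndef PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU
        asm volatile("trap;");
      #endif
      }
- If it does not, the current implementation is correct; please explain the reason in the PR description.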
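One way to keep the diagnostics on every platform while guarding only the trap is to put the platform switch in a single macro. The following is a minimal sketch, not code from the PR: `MLA_HOST_DEVICE`, `MLA_TRAP`, and `report_block_mismatch` are hypothetical names, and it assumes kernel `printf` is available on the target, which is the open question in this comment. How the MetaX toolchain defines `__CUDACC__` or `__CUDA_ARCH__` is not stated in the PR, so the MetaX macro is tested first.

```cpp
#include <cstdint>
#include <cstdio>

// All names below (MLA_HOST_DEVICE, MLA_TRAP, report_block_mismatch) are
// hypothetical and do not appear in the PR.
#if defined(__CUDACC__)
#define MLA_HOST_DEVICE __host__ __device__
#else
#define MLA_HOST_DEVICE  // plain C++ build of this sketch
#endif

// Only the trap is platform-specific, so only the trap is guarded.
#if defined(PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU)
#define MLA_TRAP() ((void)0)              // MetaX build: skip the trap
#elif defined(__CUDA_ARCH__)
#define MLA_TRAP() asm volatile("trap;")  // NVIDIA device code: abort the kernel
#else
#define MLA_TRAP() ((void)0)              // host pass: nothing to trap
#endif

// Returns true when the block index derived from `slot` disagrees with
// `block_idx`. The diagnostics are printed on every platform; only the
// trap itself depends on the build target.
MLA_HOST_DEVICE inline bool report_block_mismatch(int32_t slot,
                                                  int32_t block_size,
                                                  int32_t block_idx,
                                                  int32_t token_idx) {
  const int32_t block_idx1 = slot / block_size;
  if (block_idx1 == block_idx) {
    return false;
  }
  printf("block_idx1 %d != block_idx %d\n",
         static_cast<int>(block_idx1), static_cast<int>(block_idx));
  printf("token_idx %d\n", static_cast<int>(token_idx));
  printf("slot_mapping %d\n", static_cast<int>(slot));
  MLA_TRAP();
  return true;
}
```

Inside the kernel, the call would look roughly like `report_block_mismatch(slot_mapping[token_idx], block_size, block_idx, token_idx);`. Note that the kernel declares `block_idx` as `uint32_t`, so it would need an explicit cast to match this sketch's signed parameter.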
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist