
[Metax][CI]: skip trap asm on MetaX GPU to fix compile error #7905

Open
Tryorish wants to merge 6 commits into PaddlePaddle:develop from Tryorish:fix-ci


Conversation

@Tryorish
Contributor

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are no unit tests, explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings May 25, 2026 02:52
@paddle-bot

paddle-bot Bot commented May 25, 2026

Thanks for your contribution!

@paddle-bot paddle-bot Bot added the contributor External developers label May 25, 2026
@Tryorish Tryorish changed the title fix(metax): skip trap asm on MetaX GPU to fix compile error [Metax][CI]: skip trap asm on MetaX GPU to fix compile error May 25, 2026
Contributor

Copilot AI left a comment


Pull request overview

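This PR fixes a compile error that occurs when building custom operators for MetaX GPU. The error is caused by an inline-assembly trap instruction, and the PR uses conditional compilation to skip that instruction on the MetaX path.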

Changes:

  • In the consistency-check branch of prefill_absorb_cache_kernel, asm volatile("trap;") is now excluded by conditional compilation on PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU.

Additional notes (process conventions):

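  • The PR title is currently fix(metax): ..., which does not follow the [CLASS]Title / tag format required by the template. Consider, for example, [Metax][BugFix] Skip trap asm on MetaX GPU to fix compile error (choose tags according to repository conventions).
  • The Motivation / Modifications / Tests sections of the PR description are empty. Consider explaining why the trap must be skipped, how this affects behavior, and whether the change was verified by compiling or running it.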

Comment thread custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh Outdated
Comment thread custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh
PaddlePaddle-bot

This comment was marked as outdated.

Kane2011 previously approved these changes May 25, 2026

@PaddlePaddle-bot

PaddlePaddle-bot commented May 25, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-26 06:49:17

This CI report is generated from the following code (updated every 30 minutes):


1 Job overview

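All required jobs have passed (10/10). The 4 current failures are all optional jobs and do not block merging. From the required-CI perspective the PR can be approved; the optional failures are listed for maintainers' reference.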

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 42 (0) | 42 | 38 | 4 | 0 | 0 | 0 |

2 Job status summary

Log column: failed jobs link directly to their logs; running jobs link to the Job page.

2.1 Required jobs: 10/10 passed

Required jobs block merging; their failures must be addressed first.

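| Status | Job | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| | Remaining 10 required jobs passed | - | - | - | - | - |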

2.2 Optional jobs: 28/32 passed

Optional jobs do not block merging; their failures are for reference only.

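| Status | Job | Duration | Log | Rerun |
|---|---|---|---|---|
| | Run iluvatar Tests / run_iluvatar_cases | 1m59s | Job | - |
| | Check PR Template | 22s | Job | - |
| | Trigger Jenkins for PR | 1m59s | Job | - |
| | CI_HPU | 1h4m | Job | - |
| | Remaining 28 optional jobs passed | - | - | - |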

3 Failure details (required jobs only)

No required jobs failed.


4 Code context check

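  • This PR modifies only custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh, skipping the CUDA asm volatile("trap;") debug assertion when PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU is defined.
  • The MetaX build flags in custom_ops/setup_ops.py inject -DPADDLE_WITH_CUSTOM_DEVICE_METAX_GPU, and the repository already contains several similar MetaX conditional-compilation blocks.
  • All required CI jobs currently pass; no blocking CI failure caused by this change was found.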
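Because the build already injects -DPADDLE_WITH_CUSTOM_DEVICE_METAX_GPU, the guard could also be centralized rather than repeated at each call site. The following is a minimal sketch under stated assumptions: `FD_KERNEL_TRAP()`, `kIsMetaxBuild`, `kTrapCompiledIn`, and `fail_consistency_check()` are hypothetical names that do not exist in the repository, and the extra `__CUDA_ARCH__` condition is added only so the same code compiles in host passes, where the PTX `trap` instruction cannot be assembled.

```cpp
// FD_KERNEL_TRAP is a hypothetical helper, not an existing repository macro.
// It emits the NVIDIA PTX `trap` instruction only in a device compilation
// pass (__CUDA_ARCH__ defined) of a non-MetaX build. With the injected
// -DPADDLE_WITH_CUSTOM_DEVICE_METAX_GPU flag, or in any host-only pass, it
// expands to a no-op.
#if defined(__CUDA_ARCH__) && !defined(PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU)
#define FD_KERNEL_TRAP() asm volatile("trap;")
#define FD_KERNEL_TRAP_ENABLED 1
#else
#define FD_KERNEL_TRAP() ((void)0)
#define FD_KERNEL_TRAP_ENABLED 0
#endif

// Compile-time view of which backend this translation unit targets.
#ifdef PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU
constexpr bool kIsMetaxBuild = true;
#else
constexpr bool kIsMetaxBuild = false;
#endif

// True only when the trap is actually compiled into this pass.
constexpr bool kTrapCompiledIn = FD_KERNEL_TRAP_ENABLED != 0;

// Hypothetical call site: every use shares the single guard above instead of
// repeating #ifndef blocks. Written as plain C++ here; in a .cuh file it
// would be a __device__ function.
inline void fail_consistency_check() { FD_KERNEL_TRAP(); }
```

Under this assumption, a kernel would write `FD_KERNEL_TRAP();` where it now writes `asm volatile("trap;")`, and a MetaX build would compile the call away without a separate `#ifndef`.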


@codecov-commenter

codecov-commenter commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@e56d9ff). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7905   +/-   ##
==========================================
  Coverage           ?   64.03%           
==========================================
  Files              ?      467           
  Lines              ?    64965           
  Branches           ?     9962           
==========================================
  Hits               ?    41599           
  Misses             ?    20542           
  Partials           ?     2824           
| Flag | Coverage Δ |
|---|---|
| GPU | 73.14% <ø> (?) |
| XPU | 7.07% <ø> (?) |

Flags with carried forward coverage won't be shown.


Copilot AI review requested due to automatic review settings May 25, 2026 08:33
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +215 to +223
```cpp
#ifndef PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU
const int32_t block_idx1 = slot_mapping[token_idx] / block_size;
if (block_idx1 != block_idx) {
  printf("block_idx1 %d != block_idx %d\n", block_idx1, block_idx);
  printf("token_idx %d\n", token_idx);
  printf("slot_mapping %d\n", slot_mapping[token_idx]);
  asm volatile("trap;");
}
#endif
```
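The check compares two routes to the same physical block: a block-table lookup by sequence position (`block_table_now[ori_seq_id / block_size]` in the kernel) and a decode of the flat slot index (`slot_mapping[token_idx] / block_size`). The host-side sketch below assumes, as that arithmetic suggests but the PR does not state, that a slot is `physical_block * block_size + offset`, and it simplifies to a single sequence where `token_idx == ori_seq_id`. `first_block_mismatch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of the kernel's block_idx consistency check.
// Assumptions (not stated in the PR): slot_mapping holds flat slot indices
//   slot = physical_block * block_size + offset,
// block_table maps a logical block (position / block_size) to a physical
// block, and there is a single sequence, so token_idx == ori_seq_id.
// Returns the first token index whose two derived block indices differ,
// or -1 if every token is consistent.
inline int first_block_mismatch(const std::vector<uint32_t>& block_table,
                                const std::vector<int32_t>& slot_mapping,
                                int32_t block_size) {
  assert(block_size > 0);
  for (std::size_t token = 0; token < slot_mapping.size(); ++token) {
    const int32_t pos = static_cast<int32_t>(token);
    // Route 1: table lookup, like block_table_now[ori_seq_id / block_size].
    const uint32_t block_idx =
        block_table.at(static_cast<std::size_t>(pos / block_size));
    // Route 2: decode the slot, like slot_mapping[token_idx] / block_size.
    const int32_t block_idx1 = slot_mapping[token] / block_size;
    if (static_cast<uint32_t>(block_idx1) != block_idx) {
      return pos;
    }
  }
  return -1;
}
```

With `block_size = 4` and `block_table = {7, 2}`, positions 0 to 3 map to slots 28 to 31 and positions 4 and 5 map to slots 8 and 9. Changing the last slot to 13 decodes to block 3 instead of 2, so the function reports token 5.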
Comment on lines 218 to 220
```cpp
printf("block_idx1 %d != block_idx %d\n", block_idx1, block_idx);
printf("token_idx %d\n", token_idx);
printf("slot_mapping %d\n", slot_mapping[token_idx]);
```

This reverts commit 34dfe5c.

Copilot AI review requested due to automatic review settings May 25, 2026 09:12
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

```cpp
if (block_idx1 != block_idx) {
  printf("block_idx1 %d != block_idx %d\n", block_idx1, block_idx);
  printf("token_idx %d\n", token_idx);
  printf("slot_mapping %d\n", slot_mapping[token_idx]);
```


@PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-25 17:43:04

📋 Review summary

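PR overview: uses a preprocessor macro in prefill_absorb_cache_kernel to skip the asm volatile("trap;") instruction, which MetaX GPU does not support, fixing the MetaX CI compile error.
Scope of change: custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh
Affected tags: [Metax] [OP]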

Issues

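| Level | File | Summary |
|---|---|---|
| 📝 PR conventions | | The PR description sections are empty, and the title contains an extra colon |
| ❓ Question | mla_cache_kernel.cuh:215 | Is the #ifndef guard too broad? (If MetaX supports printf, only the trap line needs guarding.) |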

📝 PR conventions check

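The Motivation / Modifications / Usage or Command / Accuracy Tests sections of the PR description are all empty and contain only the template comments. The title [Metax][CI]: contains an extra colon (the : is not part of the standard format).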

Suggested title (ready to copy):

  • [Metax][CI] Skip trap asm on MetaX GPU to fix compile error

Suggested PR description (ready to copy):

## Motivation
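MetaX GPU does not support the NVIDIA-specific `asm volatile("trap;")` inline assembly instruction, so compiling `mla_cache_kernel.cuh` for MetaX devices fails. This change uses the preprocessor macro `PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU` to skip that instruction, fixing the MetaX CI build failure.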

## Modifications
- `custom_ops/gpu_ops/append_attn/mla_cache_kernel.cuh`: in `prefill_absorb_cache_kernel`, wrap the block_idx consistency-check block that contains `asm volatile("trap;")` in `#ifndef PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU` to avoid the compile failure on MetaX devices.

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least one tag to the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code and run `pre-commit` before committing.
- [ ] Add unit tests. If there are no unit tests, explain why in this PR.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall assessment

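The change is simple and clear, and it fixes the MetaX compile error. Please check whether the #ifndef guard is too broad (if MetaX supports kernel printf, only the trap line should be guarded so that the diagnostics are preserved), and complete the PR description.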

```cpp
const uint32_t block_idx = block_table_now[ori_seq_id / block_size];
const uint32_t block_offset = ori_seq_id % block_size;

#ifndef PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU
```

❓ Question: is the #ifndef guard too broad?

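asm volatile("trap;") is an NVIDIA-specific instruction, so it is expected that MetaX does not support it. However, the current implementation skips the entire check block under the macro, including the printf diagnostics. If MetaX GPU supports kernel printf, a block_idx mismatch on MetaX will now go completely silent and the diagnostic information is lost.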

Please confirm: does MetaX support printf in CUDA kernels?

  • If it does, consider guarding only the asm volatile("trap;") line:
```cpp
const int32_t block_idx1 = slot_mapping[token_idx] / block_size;
if (block_idx1 != block_idx) {
  printf("block_idx1 %d != block_idx %d\n", block_idx1, block_idx);
  printf("token_idx %d\n", token_idx);
  printf("slot_mapping %d\n", slot_mapping[token_idx]);
#ifndef PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU
  asm volatile("trap;");
#endif
}
```
  • If it does not, the current implementation is correct; please explain the reason in the PR description.
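Guarding only the trap line, as the first option suggests, keeps the diagnostics on every backend. A host-side sketch of that split follows. `kTrapOnMismatch` and `format_block_mismatch` are hypothetical names, the `__CUDA_ARCH__` condition is an added assumption so the snippet builds as plain C++, and `std::snprintf` into a string stands in for device-side `printf` so the message can be checked.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical flag: true only in an NVIDIA device pass, the one place where
// the PTX trap would be emitted. The __CUDA_ARCH__ test is an assumption
// added so this sketch also compiles as plain host C++.
constexpr bool kTrapOnMismatch =
#if defined(__CUDA_ARCH__) && !defined(PADDLE_WITH_CUSTOM_DEVICE_METAX_GPU)
    true;
#else
    false;
#endif

// Hypothetical helper that builds the same three lines the kernel prints.
// It is not guarded by any macro, so the diagnostics survive on MetaX
// builds; only the trap that would follow it depends on kTrapOnMismatch.
inline std::string format_block_mismatch(int32_t block_idx1, uint32_t block_idx,
                                         int32_t token_idx, int32_t slot) {
  assert(static_cast<uint32_t>(block_idx1) != block_idx);  // mismatch only
  char buf[160];
  std::snprintf(buf, sizeof(buf),
                "block_idx1 %" PRId32 " != block_idx %" PRIu32
                "\ntoken_idx %" PRId32 "\nslot_mapping %" PRId32 "\n",
                block_idx1, block_idx, token_idx, slot);
  return std::string(buf);
}
```

In a kernel, the message would go through device `printf`, and the guarded trap, where compiled in, would follow it.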


Labels

contributor External developers


6 participants