feat: support tritonPA for rocm decode #835

Closed
liaocz wants to merge 1 commit into main from features/rocm-support-triton-pa

@liaocz (Collaborator) commented Mar 26, 2026

No description provided.

liaocz requested a review from LLLLKKKK as a code owner on March 26, 2026, 11:01
liaocz force-pushed the features/rocm-support-triton-pa branch from b680f92 to 5bf289f on March 26, 2026, 11:09
@LLLLKKKK (Collaborator)

🤖 AI Code Review — PR #835

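PR Overview

Title: feat: support tritonPA for rocm decode
Author: liaocz
Size: 4 files (per GitHub), ~+20/-10

Core Goal

Add Triton PA (paged attention) support for the ROCm decode stage: AiterDecodeImplTriton is moved ahead of ASM in priority, ASM decode is disabled when use_triton_pa=true, and the feature is off by default.


Review Comments

No P0/P1 issues. A small change; config propagation is complete.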

LGTM ready to ci

@LLLLKKKK (Collaborator)

🤖 AI Code Review — PR #835

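PR Overview

Title: feat: support tritonPA for rocm decode
Author: liaocz
Size: 4 files, +15/-3

Core Goal

Introduce Triton PA support for the ROCm decode stage, using a priority adjustment and mutual-exclusion logic so that Triton PA replaces ASM decode when enabled. It is off by default (opt-in); users enable it explicitly with --use_triton_pa=true.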
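
For readers unfamiliar with opt-in boolean flags of the form --use_triton_pa=true, here is a minimal sketch of how such a flag could be declared with argparse. The str2bool helper, the group name, and the help text are assumptions made for illustration; the actual declaration in fmha_group_args.py may look different.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper: parse 'true'/'false'-style strings into a bool."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumed group name; only the flag name and its default come from the PR.
    group = parser.add_argument_group("fmha")
    group.add_argument(
        "--use_triton_pa",
        type=str2bool,
        default=False,  # opt-in: Triton PA stays off unless requested
        help="Enable Triton paged attention for ROCm decode (illustrative text).",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--use_triton_pa=true"])
    print(args.use_triton_pa)
```

Omitting the flag leaves use_triton_pa at False, which matches the opt-in behavior described above.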


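Breakdown of the Changes

1. Configuration layer

  • ConfigModules.h: use_triton_pa default changed from true to false
  • fmha_group_args.py: adds the --use_triton_pa command-line argument (default=False)

2. Priority adjustment

  • __init__.py: AiterDecodeImplTriton moved from the end of the list to the front, giving it the highest priority

3. Mutual-exclusion logic

  • attn_factory.py: AiterDecodeImplAsm split out into its own branch; ASM decode is force-disabled when use_triton_pa=True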
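
The interaction between the priority order and the mutual-exclusion rule described above can be sketched as follows. This is not the code from the PR: FmhaFlags, DECODE_IMPL_PRIORITY, is_impl_disabled, and select_decode_impl are hypothetical names, and the use_asm_pa switch with a default of True is an assumption. Only the two implementation names and the use_triton_pa flag come from the review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FmhaFlags:
    """Hypothetical stand-in for the FMHA config; field names are assumptions."""
    use_triton_pa: bool = False  # opt-in, matching the new default
    use_asm_pa: bool = True      # assumed pre-existing ASM switch


# Hypothetical priority list: Triton is checked first, ASM after it.
DECODE_IMPL_PRIORITY = ("AiterDecodeImplTriton", "AiterDecodeImplAsm")


def is_impl_disabled(impl_name: str, flags: FmhaFlags) -> bool:
    """Return True when the named decode implementation must be skipped."""
    if impl_name == "AiterDecodeImplTriton":
        # Triton PA only participates when explicitly enabled.
        return not flags.use_triton_pa
    if impl_name == "AiterDecodeImplAsm":
        # Mutual exclusion: enabling Triton PA force-disables ASM decode.
        return flags.use_triton_pa or not flags.use_asm_pa
    return False


def select_decode_impl(flags: FmhaFlags) -> Optional[str]:
    """Pick the first implementation in priority order that is not disabled."""
    for impl_name in DECODE_IMPL_PRIORITY:
        if not is_impl_disabled(impl_name, flags):
            return impl_name
    return None
```

Under these assumptions, the default flags select the ASM entry, while use_triton_pa=True selects the Triton entry and rejects ASM even if use_asm_pa is also True.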

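Review Comments

Issues

  1. Missing test coverage [P1]

    The mutual-exclusion logic (ASM decode disabled when use_triton_pa=True) and the priority reordering are new behavior, but there are no tests for them. Suggest adding a parametrized unit test for _is_fmha_impl_disabled that covers the combinations of use_triton_pa and use_asm_pa.

  2. Empty PR description [P2]

    Suggest adding the motivation: the advantages of Triton PA over ASM decode, and why it is off by default.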
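
As a rough illustration of the test requested in issue 1, the sketch below exercises every combination of the two flags with unittest subtests. The is_fmha_impl_disabled function here is a hypothetical stand-in with an assumed signature and assumed semantics; a real test would import _is_fmha_impl_disabled from attn_factory.py and adapt the arguments to its actual interface.

```python
import itertools
import unittest


def is_fmha_impl_disabled(impl_name: str, use_triton_pa: bool, use_asm_pa: bool) -> bool:
    """Hypothetical stand-in; the real _is_fmha_impl_disabled may differ."""
    if impl_name == "AiterDecodeImplTriton":
        return not use_triton_pa
    if impl_name == "AiterDecodeImplAsm":
        # Assumed rule: Triton PA enabled means ASM decode is disabled.
        return use_triton_pa or not use_asm_pa
    return False


class DecodeImplMutualExclusionTest(unittest.TestCase):
    def test_asm_disabled_whenever_triton_pa_enabled(self):
        for use_asm_pa in (False, True):
            with self.subTest(use_asm_pa=use_asm_pa):
                self.assertTrue(
                    is_fmha_impl_disabled("AiterDecodeImplAsm", True, use_asm_pa)
                )

    def test_triton_and_asm_never_both_enabled(self):
        # Cover all four flag combinations.
        for use_triton_pa, use_asm_pa in itertools.product((False, True), repeat=2):
            with self.subTest(use_triton_pa=use_triton_pa, use_asm_pa=use_asm_pa):
                triton_on = not is_fmha_impl_disabled(
                    "AiterDecodeImplTriton", use_triton_pa, use_asm_pa
                )
                asm_on = not is_fmha_impl_disabled(
                    "AiterDecodeImplAsm", use_triton_pa, use_asm_pa
                )
                self.assertFalse(triton_on and asm_on)

    def test_triton_disabled_by_default(self):
        self.assertTrue(is_fmha_impl_disabled("AiterDecodeImplTriton", False, True))
```

A test of this shape fails as soon as a reordering of conditions lets both implementations be enabled at once. It can be run with python -m unittest once saved to a test module.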

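Overall Assessment

A small, focused PR with clear logic and a complete, consistent config propagation chain. Defaulting to off is the safe choice. The main risk is that the mutual-exclusion logic is untested: a later change to the order of conditions in _is_fmha_impl_disabled could unintentionally break the mutual exclusion.

There is a P1 issue (missing tests), so merging is not recommended until tests are added.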

@LLLLKKKK (Collaborator)

AI Code Review — PR #835

Summary: P0/0 · P1/0 · P2/0 · P3/0

Review status: LGTM

lgtm ready to ci

Strengths

  • Adds targeted smoke test coverage for the ROCm Triton paged attention path (MI308X)
  • Golden JSON is well-formed with deterministic top_k=1 sampling for reproducibility

liaocz closed this on May 14, 2026

2 participants