[Feature] support blackwell gemm in ht #7053
Thanks for your contribution!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7053 +/- ##
==========================================
Coverage ? 73.30%
==========================================
Files ? 377
Lines ? 53138
Branches ? 8286
==========================================
Hits ? 38954
Misses ? 11475
Partials ? 2709
Flags with carried forward coverage won't be shown.
PaddlePaddle-bot left a comment
🤖 AI Code Review | 2026-04-02 19:56 CST
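📋 Review Summary
PR overview: Adds a high-performance MoE GEMM backend for the Blackwell architecture in high-throughput mode.
Scope of changes: model_executor/layers/moe/, model_executor/layers/quantization/, envs.py
Impact tags: OP, Quantization
📝 PR Convention Check
The PR title and description largely follow the conventions. Consider filling in the Modifications section with the specific changes.
Suggested description (can be copied directly):
## Modifications
1. Add the environment variable `FD_USE_BLACKWELL_GEMM` to toggle the Blackwell GEMM backend
2. Add `fused_moe_blackwell_backend.py`, which implements MoE computation for the Blackwell architecture
3. Update `block_wise_fp8.py` to add a Blackwell quantization-method branch
4. Update `fused_moe_triton_backend.py` to support Blackwell scale-format conversion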
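Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | envs.py:79 | Copy-pasted comment is wrong |
| 🟡 Suggestion | fused_moe_blackwell_backend.py:280 | Class docstring describes the wrong class |
| 🟡 Suggestion | fused_moe_blackwell_backend.py:482 | Duplicated unpacking statement |
| ❓ Question | fused_moe_triton_backend.py:1716 | Setting scale to None may affect other code paths |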
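Overall assessment
The implementation is clearly structured overall and follows the design pattern of the DeepGemm backend. The main issues are comment and docstring errors left over from copy-pasting, plus some redundant code. Recommend merging once these are fixed.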
"FD_PD_CHANGEABLE": lambda: os.getenv("FD_PD_CHANGEABLE", "0"),
# Whether to use DeepGemm for FP8 blockwise MoE.
"FD_USE_DEEP_GEMM": lambda: bool(int(os.getenv("FD_USE_DEEP_GEMM", "0"))),
# Whether to use DeepGemm for FP8 blockwise MoE.
🟡 Suggestion: copy-pasted comment is wrong
The current comment `# Whether to use DeepGemm for FP8 blockwise MoE.` is identical to the comment on the preceding `FD_USE_DEEP_GEMM` entry. It should describe what Blackwell GEMM is used for.
Suggested change:
# Whether to use Blackwell GEMM for FP8 blockwise MoE (SM100+).
"FD_USE_BLACKWELL_GEMM": lambda: bool(int(os.getenv("FD_USE_BLACKWELL_GEMM", "0"))),
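For readers unfamiliar with the `bool(int(os.getenv(...)))` pattern used by these entries, the following is a minimal standalone sketch of how such a flag is parsed. The helper name `read_flag` is hypothetical and is not part of envs.py; only the parsing expression is taken from the diff above.

```python
import os


def read_flag(name: str, default: str = "0") -> bool:
    """Hypothetical helper mirroring the envs.py parsing expression."""
    return bool(int(os.getenv(name, default)))


# Unset variable: falls back to the default "0", so the flag is off.
os.environ.pop("FD_USE_BLACKWELL_GEMM", None)
unset_value = read_flag("FD_USE_BLACKWELL_GEMM")

# "1" parses to the integer 1, so the flag is on.
os.environ["FD_USE_BLACKWELL_GEMM"] = "1"
enabled_value = read_flag("FD_USE_BLACKWELL_GEMM")

# Non-integer strings such as "true" are rejected, because int("true")
# raises ValueError.
os.environ["FD_USE_BLACKWELL_GEMM"] = "true"
try:
    read_flag("FD_USE_BLACKWELL_GEMM")
    rejected = False
except ValueError:
    rejected = True

# Leave the process environment as it was found.
os.environ.pop("FD_USE_BLACKWELL_GEMM", None)
```

One consequence of this pattern is that the variable has to be set to an integer string such as `0` or `1`; values like `true` or `yes` fail at parse time rather than silently disabling the backend.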
class BlackwellGemmFusedMoeMethod(MoEMethodBase):
    """
🟡 Suggestion: class docstring describes the wrong class
The current docstring describes DeepGemmFusedMoeMethod, but the actual class name is BlackwellGemmFusedMoeMethod.
Suggested change:
class BlackwellGemmFusedMoeMethod(MoEMethodBase):
    """
    BlackwellGemmFusedMoeMethod is a class that implements the MoEMethodBase interface for the Blackwell GEMM backend.
    """
global_values[thread_name] = {}
(recv_x_value, recv_x_scale) = recv_x
(recv_x_value, recv_x_scale) = recv_x
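🟡 Suggestion: duplicated unpacking statement
`(recv_x_value, recv_x_scale) = recv_x` is executed twice in a row (lines 481 and 482), which is clearly redundant.
Suggest deleting the duplicate line:
(recv_x_value, recv_x_scale) = recv_x
# Delete the duplicate unpacking on the following line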
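qingqing01 left a comment
As a follow-up, the way this package is used and its environment variables need to be standardized, and unit tests need to be added.
OK. Usage guidelines and monitoring will be added when the operator package is released.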
* [Feature] support blackwell gemm in ht
* [Feature] support ops for convert
* fix cuda error 716
* fix cuda error
* opt memory
* remove unused code
Motivation
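Support a high-performance MoE GEMM backend in high-throughput mode. It must be used together with the operator repository. Usage:
export FD_USE_BLACKWELL_GEMM=1
It currently has to be enabled together with DeepGemm.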
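Since the two flags currently have to be set together, a launcher script could check the combination up front. The sketch below is only an illustration of that idea: the helper `blackwell_gemm_requested` is hypothetical and does not exist in this PR, and it assumes that the DeepGemm switch is the `FD_USE_DEEP_GEMM` variable from envs.py.

```python
import os
from typing import Mapping


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Same parsing expression as the envs.py entries: integer string to bool.
    return bool(int(env.get(name, "0")))


def blackwell_gemm_requested(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical pre-flight check, not part of the PR.

    Returns True only when both flags are set, and raises if
    FD_USE_BLACKWELL_GEMM is set without FD_USE_DEEP_GEMM.
    """
    use_blackwell = _flag(env, "FD_USE_BLACKWELL_GEMM")
    use_deep_gemm = _flag(env, "FD_USE_DEEP_GEMM")
    if use_blackwell and not use_deep_gemm:
        raise RuntimeError(
            "FD_USE_BLACKWELL_GEMM=1 currently also requires FD_USE_DEEP_GEMM=1"
        )
    return use_blackwell


# Example with explicit mappings instead of the real process environment.
both_set = blackwell_gemm_requested(
    {"FD_USE_BLACKWELL_GEMM": "1", "FD_USE_DEEP_GEMM": "1"}
)
neither_set = blackwell_gemm_requested({})
```

Passing the environment as a mapping keeps the check easy to exercise without touching `os.environ`. Whether a mismatch should raise or only log a warning is a design choice the PR does not specify.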
Modifications
Usage or Command
Accuracy Tests
Checklist
Tag list: [FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]
Run pre-commit before commit.
For a PR targeting the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.