upgrade triton moe config. #7980
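CI report generated from the following code (refreshed every 30 minutes):
1. Required tasks: 9/10 passed
2. Failure details: 🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage
Error type: PR issue | Confidence: high
Key logs:
The PR will
Suggested fix:
Related changes: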
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #7980 +/- ##
==========================================
Coverage ? 77.86%
==========================================
Files ? 402
Lines ? 57315
Branches ? 9032
==========================================
Hits ? 44630
Misses ? 9839
Partials ? 2846
Flags with carried forward coverage won't be shown.
3ca8136 to 25702dd
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-06-03 20:47:45
📋 Review summary
PR overview: Adds an SM100 (B200) GPU-specific lookup table to `_get_default_config` in the Triton MoE backend, in place of the vLLM heuristic that was previously the only logic.
Scope of changes: fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py, tests/layers/test_fused_moe_triton_backend.py
Impact tag: [OP]
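Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fused_moe_triton_backend.py:1932 | In-function import statement on the hot path; suggest moving it to the top of the module |
| 🟡 Suggestion | test_fused_moe_triton_backend.py:1119 | Test docstring says "equidistant", but the values in the comments show the distances are not equal, so the two contradict each other |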
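Status of previous findings
| Finding | Issue | Status |
|---|---|---|
| F1 | Large config dict is recreated in the method body on every call | `_SM100_CONFIGS` (about 130 lines) is still defined inside the `if get_sm_version() >= 100:` branch and is rebuilt on every call. Suggest promoting it to a module-level constant `_SM100_CONFIGS = {...}` |
| F2 | Parameter `E` (number of experts) is completely unused in the new implementation | E (`tokens_per_expert = M // max(E, 1)`), but the SM100 path ignores `E` entirely, which does not match the semantics of the function signature |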
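📝 PR conventions check
The title lacks an official tag, and every section of the PR description is empty, so it does not meet the template requirements.
Suggested title (ready to copy):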
[OP] Upgrade triton MoE config with per-GPU tuned lookup tables
Suggested PR description (ready to copy):
## Motivation
Replace the heuristic tile config in the Triton MoE backend with GPU-aware (SM90/SM100) tuned lookup tables derived from SGLang's autotuning results, improving kernel performance on B200.
## Modifications
- Added SM100 (B200) lookup table `_SM100_CONFIGS` in `_get_default_config`, using nearest-M key selection
- Retained original vLLM-ported heuristic as fallback for non-SM100 GPUs
- Added `get_sm_version()` call to detect GPU architecture at runtime
- Updated unit tests: all existing SM90 tests patched with `_mock_sm90`; new SM100-specific test cases added
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [ ] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. SM100 lookup table tests added in `test_fused_moe_triton_backend.py`.
- [ ] Provide accuracy results.
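- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The PR is headed in the right direction: it adds a dedicated Triton config table for B200 and updates the tests to match. Two previously reported issues (`_SM100_CONFIGS` being rebuilt on every call, and the SM100 path ignoring the `E` parameter) are still unfixed; we recommend addressing them before merging.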
        M: number of tokens (pre-expansion token count).
        E: number of (local) experts.
    """
    from fastdeploy.model_executor.utils import get_sm_version
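🟡 Suggestion: in-function import statement on the hot path
Every call to `_get_default_config` re-executes the import of `get_sm_version`. Python caches the module object, but a `from ... import` statement still performs a `sys.modules` lookup plus an attribute fetch each time it runs. Since this sits on the MoE inference hot path, we suggest moving the import to module level at the top of the file: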
# Top of file (next to the other imports)
from fastdeploy.model_executor.utils import get_sm_version
Then call `get_sm_version()` directly inside the method.
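To gauge whether the per-call cost described above matters, a micro-benchmark along these lines compares the two import placements using only the standard library. `os.path.join` is just a stand-in for `get_sm_version`; absolute timings depend on the interpreter and machine, so read the result only as relative overhead per call.

```python
import timeit
from os.path import join  # module-level import: bound once at import time


def join_module_level(a: str, b: str) -> str:
    # Only a global name lookup happens per call.
    return join(a, b)


def join_in_function(a: str, b: str) -> str:
    # The import statement runs on every call: a sys.modules lookup plus an
    # attribute fetch, then the result is bound to a local name.
    from os.path import join as local_join
    return local_join(a, b)


if __name__ == "__main__":
    n = 200_000
    t_mod = timeit.timeit(lambda: join_module_level("a", "b"), number=n)
    t_fn = timeit.timeit(lambda: join_in_function("a", "b"), number=n)
    print(f"module-level: {t_mod:.3f}s  in-function: {t_fn:.3f}s")
```

One caveat before hoisting: function-level imports are sometimes placed deliberately to break a circular import, so it is worth confirming that `fastdeploy.model_executor.utils` can be imported at module load time of the backend.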
        assert cfg["BLOCK_SIZE_K"] == 64

    def test_get_default_config_sm100_nearest_key(self, monkeypatch):
        """SM100: M=100 is equidistant between 96 and 128; should pick one of them."""
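🟡 Suggestion: test docstring contradicts the code comment
The docstring says "M=100 is equidistant between 96 and 128", but a comment in the same function correctly computes abs(96-100)=4 and abs(128-100)=28. The second distance is 7 times the first, so the two keys are not equidistant.
We suggest correcting the docstring so it does not mislead future maintainers: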
"""SM100: M=100 nearest key is 96 (abs_diff=4) not 128 (abs_diff=28)."""