upgrade triton moe config. #7980

Open
xuanyuanminzheng wants to merge 1 commit into PaddlePaddle:develop from xuanyuanminzheng:develop-triton-moe-config

@xuanyuanminzheng
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

PaddlePaddle-bot commented Jun 2, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-03 15:35:39

This CI report was generated from the following code (refreshed every 30 minutes):
PR commit: 3ca8136 | Merge base: 529ec9ef (branch: develop)


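1 Required jobs: 9/10 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 41(0) | 41 | 36 | 5 | 0 | 0 | 0 |

| Job | Error type | Confidence | Log |
|---|---|---|---|
| Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | PR issue: stale test assertion value; expected num_stages=4 but got 5 | High | Job |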

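2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Error type: PR issue | Confidence: high
Analyzer: generic analysis (fallback)
Failed test cases:

| Test case | Error summary |
|---|---|
| tests/layers/test_fused_moe_triton_backend.py::TestTritonMoEMethod::test_get_default_config_num_stages | num_stages assertion failed: expected 4, got 5 |

Key log: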

def test_get_default_config_num_stages(self):
    """M<=32 → num_stages=4; M>32 → num_stages=3."""
    method = backend.TritonMoEMethod()
    cfg32 = method._get_default_config(M=32, E=8)
>   assert cfg32["num_stages"] == 4
E   assert 5 == 4

tests/layers/test_fused_moe_triton_backend.py:1064: AssertionError
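  • Root cause summary: the PR changed _get_default_config so that num_stages for M=32 went from 4 to 5, which makes the test assertion fail.

The PR replaces the simple M-threshold heuristic in _get_default_config with a GPU-aware lookup-table configuration (tiered by SM100/SM90). In the new configuration, M=32 maps to _SM100_CONFIGS[32]["num_stages"]=4 and _SM90_CONFIGS[32]["num_stages"]=5. The CI environment runs on SM90 (H100), so 5 is returned, while the test still expects the old logic's 4.

Suggested fixes:

  1. Update line 1064 of tests/layers/test_fused_moe_triton_backend.py (and the related assertions nearby), changing == 4 to the value expected under the new configuration logic (SM90: num_stages=5 for M=32; for M>32 the value depends on the specific configuration)
  2. Alternatively, mock get_sm_version() in the test to pin the SM version, then assert the expected values for the SM90 and SM100 configurations separately, so that the test does not depend on the runtime GPU environment

Related change: fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py lines 1920-2237, full replacement of the _get_default_config method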
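
The second suggested fix (pinning the SM version in the test) could look roughly like the sketch below. Everything in it is a stand-in: `utils`, `get_default_config`, and the one-entry tables are hypothetical and only mimic the shape described in this report; the two num_stages values for M=32 are the only numbers taken from the report itself. The real test would patch FastDeploy's own `get_sm_version` wherever the backend looks it up.

```python
import types
from unittest import mock


def _no_gpu():
    """Placeholder for the runtime GPU query; tests replace it with a mock."""
    raise RuntimeError("no GPU available in this sketch")


# Hypothetical stand-in for the module that exposes get_sm_version().
utils = types.SimpleNamespace(get_sm_version=_no_gpu)

# Hypothetical one-entry tables; only these two values come from the CI report.
_SM90_CONFIGS = {32: {"num_stages": 5}}
_SM100_CONFIGS = {32: {"num_stages": 4}}


def get_default_config(M):
    """Pick the table by SM version, then look up the entry for M."""
    table = _SM100_CONFIGS if utils.get_sm_version() >= 100 else _SM90_CONFIGS
    return table[M]


def test_num_stages_sm90():
    # Pin the SM version so the result does not depend on the CI machine.
    with mock.patch.object(utils, "get_sm_version", return_value=90):
        assert get_default_config(32)["num_stages"] == 5


def test_num_stages_sm100():
    with mock.patch.object(utils, "get_sm_version", return_value=100):
        assert get_default_config(32)["num_stages"] == 4


test_num_stages_sm90()
test_num_stages_sm100()
```

With the version pinned per test, the same suite passes on both an H100 and a B200 runner, and each architecture's table is asserted explicitly rather than whichever one the CI machine happens to select.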

@codecov-commenter

codecov-commenter commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@529ec9e). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7980   +/-   ##
==========================================
  Coverage           ?   77.86%           
==========================================
  Files              ?      402           
  Lines              ?    57315           
  Branches           ?     9032           
==========================================
  Hits               ?    44630           
  Misses             ?     9839           
  Partials           ?     2846           
| Flag | Coverage Δ |
|---|---|
| GPU | 77.86% <100.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-03 20:47:45

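📋 Review summary

PR overview: adds an SM100 (B200) GPU-specific lookup table to _get_default_config in the Triton MoE backend, replacing the previous logic, which consisted only of the vLLM heuristic.
Scope of changes: fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py, tests/layers/test_fused_moe_triton_backend.py
Impact tag: [OP]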

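Issues

| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | fused_moe_triton_backend.py:1932 | Function-local import statement on a hot path; consider moving it to the top of the module |
| 🟡 Suggestion | test_fused_moe_triton_backend.py:1119 | The test docstring says "equidistant", but the values in the comment show the keys are not equidistant, so the two contradict each other |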

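Status of previous findings

| Finding | Issue | Status |
|---|---|---|
| F1 | A large config dictionary is re-created inside the method body on every call | ⚠️ Still present: _SM100_CONFIGS (about 130 lines) is still defined inside the if get_sm_version() >= 100: branch and is rebuilt on every call. Consider hoisting it to a module-level constant _SM100_CONFIGS = {...} |
| F2 | The parameter E (number of experts) is entirely unused in the new implementation | ⚠️ Still present (SM100 path): the SM90 default path now uses E (tokens_per_expert = M // max(E, 1)), but the SM100 path ignores E entirely, which is inconsistent with the semantics of the function signature |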

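📝 PR convention check

The title lacks an official tag, and every section of the PR description is empty, which does not meet the template requirements.

Suggested title (can be copied directly):

  • [OP] Upgrade triton MoE config with per-GPU tuned lookup tables
Suggested PR description (can be copied directly)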
## Motivation

Replace the heuristic tile config in Triton MoE backend with GPU-aware (SM90/SM100) tuned lookup tables derived from SGLang's autotuning results, improving kernel performance on B200.

## Modifications

- Added SM100 (B200) lookup table `_SM100_CONFIGS` in `_get_default_config`, using nearest-M key selection
- Retained original vLLM-ported heuristic as fallback for non-SM100 GPUs
- Added `get_sm_version()` call to detect GPU architecture at runtime
- Updated unit tests: all existing SM90 tests patched with `_mock_sm90`; new SM100-specific test cases added

## Usage or Command

N/A

## Accuracy Tests

N/A

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. SM100 lookup table tests added in `test_fused_moe_triton_backend.py`.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

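Overall assessment

The PR is headed in the right direction: it adds a dedicated Triton config table for B200 and updates the tests accordingly. Two previously reported issues (_SM100_CONFIGS being rebuilt on every call, and the SM100 path ignoring the E parameter) are still unfixed; they should be addressed before merging.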

M: number of tokens (pre-expansion token count).
E: number of (local) experts.
"""
from fastdeploy.model_executor.utils import get_sm_version

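🟡 Suggestion: function-local import statement on a hot path

Every call to _get_default_config triggers a module-dictionary lookup for get_sm_version (even though Python caches the module object, a from ... import statement still performs a sys.modules lookup plus an attribute fetch each time it executes). Since this is the MoE inference hot path, consider moving the import to module level at the top of the file:

# Top of the file (near the other imports)
from fastdeploy.model_executor.utils import get_sm_version

Then call get_sm_version() directly inside the method.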
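
For anyone who wants to measure the overhead rather than take it on faith, a rough micro-benchmark sketch follows. It uses `math.sqrt` from the standard library as a stand-in for `get_sm_version`, so the absolute numbers say nothing about FastDeploy itself; the function names are illustrative, and timings will vary by machine and Python version.

```python
import timeit
from math import sqrt  # module-level import, resolved once at load time


def local_import(x):
    """Re-executes the import statement on every call."""
    from math import sqrt as local_sqrt
    return local_sqrt(x)


def module_import(x):
    """Uses the name that was bound once at module load."""
    return sqrt(x)


if __name__ == "__main__":
    n = 200_000
    t_local = timeit.timeit(lambda: local_import(2.0), number=n)
    t_module = timeit.timeit(lambda: module_import(2.0), number=n)
    print(f"function-local import: {t_local:.4f} s for {n} calls")
    print(f"module-level import:   {t_module:.4f} s for {n} calls")
```

Both functions return the same value; the only difference is the per-call cost of re-running the import statement, which is what the comment above is concerned with.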

assert cfg["BLOCK_SIZE_K"] == 64

def test_get_default_config_sm100_nearest_key(self, monkeypatch):
"""SM100: M=100 is equidistant between 96 and 128; should pick one of them."""

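🟡 Suggestion: the test docstring contradicts the code comment

The docstring says "M=100 is equidistant between 96 and 128", but the comment inside the same function correctly computes abs(96-100)=4 and abs(128-100)=28. The two distances differ by a factor of 7, so the keys are not equidistant.

Consider correcting the docstring so that it does not mislead future maintainers: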

"""SM100: M=100 nearest key is 96 (abs_diff=4) not 128 (abs_diff=28)."""
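
The arithmetic can be checked with a small nearest-key sketch. The helper `nearest_key` and the three-key list are hypothetical; in particular, the tie-breaking rule (smaller key wins) is an assumption made for this example, since the PR's actual tie behaviour is not shown in this thread.

```python
def nearest_key(keys, M):
    """Return the key closest to M; on a tie the smaller key wins.

    min() returns the first minimal element, and sorted() puts the
    smaller key first, which is what makes the tie rule deterministic.
    """
    return min(sorted(keys), key=lambda k: abs(k - M))


keys = [64, 96, 128]  # hypothetical subset of the table's M keys

# M=100: abs(96-100)=4 versus abs(128-100)=28, so 96 is clearly nearest.
assert nearest_key(keys, 100) == 96

# M=112 is the value that really is equidistant (16 from both 96 and 128).
assert abs(96 - 112) == abs(128 - 112) == 16
assert nearest_key(keys, 112) == 96
```

A test that is meant to exercise tie handling would therefore need M=112 rather than M=100.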
