[#9164][feat] AutoDeploy: noaux_tc MoE routing pattern matcher #13765
guan404ming wants to merge 1 commit into
📝 Walkthrough
This PR adds pattern-matching and fusion support for DeepSeek-V3's "noaux_tc" MoE routing operator. It introduces helper functions to identify the sigmoid→bias→grouped top-k→outer top-k→gather chain, a new …
Changes: NoAux TC Pattern Matcher
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tensorrt_llm/_torch/auto_deploy/transform/library/moe_routing.py`:
- Around line 402-427: The current matcher picks the first masked user of
scores_with_bias_node (masked_node) without verifying it is driven by
outer_grp_topk, so a different masked branch can be incorrectly fused; modify
the search for masked_node to only accept a user that is dataflow-descended from
outer_grp_topk (e.g., check that outer_grp_topk is an ancestor/input of the
candidate masked_node or that masked_node.args include outer_grp_topk or
outer_grp_topk.output), otherwise skip; update the loop that finds masked_node
(referencing outer_grp_topk, scores_with_bias_node, masked_node, and
topk_group/_TOPK_OPS) to perform this ancestry/input check before selecting the
masked_node.
- Around line 307-332: The _walk_div_then_mul function only matches a divisor
that is exactly sum(cur, ...), so it misses the stabilized form sum(cur,
...)+epsilon; update _walk_div_then_mul to accept a divisor that is either the
sum node or an aten.add/aten.add.Tensor node whose operands are the sum node and
a small numeric epsilon (or reversed operand order), treating that as the same
normalization; keep the existing checks that the sum node.args[0] is cur and
preserve the rest of the logic (i.e., advance cur to the div node and still
detect a following mul.Tensor with a numeric scalar).
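The two findings above can be sketched without the real matcher. The block below is only an illustration: `Node` is a hypothetical stand-in for a `torch.fx` node, the op names are plain strings rather than aten targets, and `depends_on`, `unwrap_sum_divisor`, and the `max_eps` threshold are assumed names and values that do not come from `moe_routing.py`.

```python
from dataclasses import dataclass
from numbers import Number


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a graph node: an op name plus positional args."""
    target: str
    args: tuple = ()


def depends_on(node, ancestor):
    """Return True if `ancestor` is reachable by walking `node`'s inputs."""
    stack, seen = [node], set()
    while stack:
        cur = stack.pop()
        if cur is ancestor:
            return True
        if id(cur) in seen:
            continue
        seen.add(id(cur))
        stack.extend(a for a in cur.args if isinstance(a, Node))
    return False


def unwrap_sum_divisor(divisor, cur, max_eps=1e-6):
    """Return the sum node if `divisor` is sum(cur, ...) or sum(cur, ...) + eps.

    `max_eps` is an assumed bound on what counts as a "small" epsilon.
    """
    def is_sum_of_cur(n):
        return (isinstance(n, Node) and n.target == "sum"
                and bool(n.args) and n.args[0] is cur)

    if is_sum_of_cur(divisor):
        return divisor
    if isinstance(divisor, Node) and divisor.target == "add" and len(divisor.args) == 2:
        lhs, rhs = divisor.args
        # Accept either operand order: sum + eps or eps + sum.
        for node_arg, eps in ((lhs, rhs), (rhs, lhs)):
            if is_sum_of_cur(node_arg) and isinstance(eps, Number) and 0 <= eps <= max_eps:
                return node_arg
    return None


# Finding 1: only a masked branch driven by the group top-k should be accepted.
scores = Node("sigmoid")
biased = Node("add", (scores, Node("bias")))
grp_topk = Node("topk", (biased, 2))
mask_ok = Node("masked_fill", (biased, Node("scatter", (grp_topk,)), 0.0))
mask_other = Node("masked_fill", (biased, Node("unrelated_mask"), 0.0))

# Finding 2: the normalization divisor may carry a stabilizing epsilon.
gathered = Node("gather", (scores,))
total = Node("sum", (gathered, -1))
stabilized = Node("add", (total, 1e-20))
```

Here `depends_on(mask_ok, grp_topk)` holds while `depends_on(mask_other, grp_topk)` does not, and `unwrap_sum_divisor` returns the same sum node for both `total` and `stabilized`. A real implementation would compare against the actual op targets and could bound the ancestry walk to keep matching cheap.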
📒 Files selected for processing (3)
- tensorrt_llm/_torch/auto_deploy/config/default.yaml
- tensorrt_llm/_torch/auto_deploy/transform/library/moe_routing.py
- tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_match_noaux_tc.py
38e911c to 9a6dd31
@guan404ming thanks for the PR! Could you please rebase and push?
9a6dd31 to 1334ce8
Hi @bmarimuthu-nv, I just updated.
7dda05e to 2c94fbc
@guan404ming One scoping concern: the changes to […] If you remove the […] Everything else stays inside auto_deploy where it belongs for this feature.
2c94fbc to b4e818d
Hi @bmarimuthu-nv, gentle ping. Please feel free to let me know if anything needs updating. Thanks!
Signed-off-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com>
b4e818d to eadbf06
Description
Adds a graph-level pattern matcher (`MatchNoAuxTCPattern`) that detects the `noaux_tc` MoE routing chain (sigmoid → +bias → group top-k → mask → top-k → gather → [norm] → scale) used by DeepSeek-V3, NemotronH, GLM4-MoE-Lite and Kimi-K2, and rewrites it into a single `torch.ops.trtllm.noaux_tc_op` call.
Lets HF upstream models that use this routing benefit from the fused kernel without forking their `modeling_*.py` into `auto_deploy/models/custom/`.
Closes #9164.
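For readers unfamiliar with the routing chain named above, here is a minimal pure-Python sketch of it for a single token. It is not the fused kernel and not the matcher. The function name, parameter names, and the `eps` value are hypothetical, and scoring each group by the sum of its two largest biased scores is an assumption based on the public DeepSeek-V3 modeling code rather than anything stated in this PR.

```python
import math


def noaux_tc_route(logits, bias, n_group, topk_group, top_k,
                   norm=True, scale=1.0, eps=1e-20):
    """Route one token: returns (expert indices, routing weights)."""
    # sigmoid
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # +bias (used only for selection, not for the final weights)
    biased = [s + b for s, b in zip(scores, bias)]

    # group top-k: score each group (assumed: sum of its top-2 biased scores)
    group_size = len(biased) // n_group
    group_scores = []
    for g in range(n_group):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))
    ranked_groups = sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)
    kept = set(ranked_groups[:topk_group])

    # mask: zero out experts whose group was not kept
    masked = [v if (i // group_size) in kept else 0.0 for i, v in enumerate(biased)]

    # top-k over the masked scores
    top_idx = sorted(range(len(masked)), key=lambda i: masked[i], reverse=True)[:top_k]

    # gather the un-biased sigmoid scores at the chosen experts
    weights = [scores[i] for i in top_idx]

    # [norm] then scale
    if norm:
        denom = sum(weights) + eps
        weights = [w / denom for w in weights]
    weights = [w * scale for w in weights]
    return top_idx, weights


idx_a, w_a = noaux_tc_route([0, 0, 0, 0, 2, 1, 0, 0], [0.0] * 8,
                            n_group=2, topk_group=1, top_k=2, scale=2.5)
idx_b, w_b = noaux_tc_route([0] * 8, [0, 0, 0, 0, 0, 0, 1, 1],
                            n_group=2, topk_group=1, top_k=2, norm=False)
```

In the second call all logits are equal, so the bias alone decides which experts are picked, while the returned weights are still the un-biased sigmoid scores. That separation between selection scores and gathered weights is the part of the chain a matcher has to preserve when rewiring outputs to a fused op.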
Test Coverage
positive match + constant extraction, output rewire to the fused op, and a negative case
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.