
[#9164][feat] AutoDeploy: noaux_tc MoE routing pattern matcher #13765

Open · guan404ming wants to merge 1 commit into NVIDIA:main from guan404ming:feat/ad-noaux-tc-pattern-matcher


guan404ming commented May 5, 2026 (Contributor)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added optimization support for DeepSeek-V3 MoE (Mixture of Experts) routing patterns, improving model compilation and inference efficiency through graph-level fusion.
  • Tests

    • Added test suite to validate pattern recognition and fusion behavior for the new routing optimization.

Description

Adds a graph-level pattern matcher (MatchNoAuxTCPattern) that detects the noaux_tc MoE routing chain (sigmoid → +bias → group top-k → mask → top-k → gather → [norm] → scale) used by DeepSeek-V3, NemotronH, GLM4-MoE-Lite and Kimi-K2, and rewrites it into a single torch.ops.trtllm.noaux_tc_op call.
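For readers unfamiliar with the chain, here is a minimal, unfused per-token reference in plain Python. It is a sketch modeled on the DeepSeek-V3 router as we understand it, not the PR's code or the semantics of torch.ops.trtllm.noaux_tc_op. Its assumptions are that the group score is the sum of the top-2 biased scores in each group, that experts in non-selected groups are masked to 0.0, that weights are gathered from the un-biased sigmoid scores, and that the optional renormalization adds a 1e-20 epsilon. The function name noaux_tc_route and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def noaux_tc_route(
    logits: Sequence[float],
    bias: Sequence[float],
    n_group: int,
    topk_group: int,
    top_k: int,
    norm_topk_prob: bool = True,
    routed_scaling_factor: float = 1.0,
    eps: float = 1e-20,
) -> Tuple[List[int], List[float]]:
    """Hypothetical unfused reference of noaux_tc routing for ONE token."""
    n_experts = len(logits)
    if n_experts % n_group != 0:
        raise ValueError("n_experts must be divisible by n_group")
    per_group = n_experts // n_group

    # sigmoid: these scores become the final routing weights
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # + bias: these scores are used only to *select* experts
    biased = [s + b for s, b in zip(scores, bias)]

    # group top-k: rank groups by the sum of their two best biased scores
    group_scores = [
        sum(sorted(biased[g * per_group:(g + 1) * per_group], reverse=True)[:2])
        for g in range(n_group)
    ]
    ranked_groups = sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)
    kept = set(ranked_groups[:topk_group])

    # mask: experts outside the kept groups are set to 0.0 (assumed semantics)
    masked = [v if i // per_group in kept else 0.0 for i, v in enumerate(biased)]

    # outer top-k over the masked, biased scores (stable sort: ties keep lower index)
    indices = sorted(range(n_experts), key=lambda i: masked[i], reverse=True)[:top_k]

    # gather from the UN-biased sigmoid scores
    weights = [scores[i] for i in indices]

    # optional renormalization (sum + eps), then scale
    if norm_topk_prob:
        denom = sum(weights) + eps
        weights = [w / denom for w in weights]
    weights = [w * routed_scaling_factor for w in weights]
    return indices, weights
```

For example, with logits [0, 0, 0, 0, 5, 4, 0, 0], zero bias, n_group=2, topk_group=1 and top_k=2, only the second group survives the mask and experts 4 and 5 are selected. In this sketch the bias only affects which experts are chosen; the returned weights are read from the sigmoid output, not from the +bias node.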

This lets upstream Hugging Face models that use this routing benefit from the fused kernel without forking their modeling_*.py into auto_deploy/models/custom/.

Closes #9164.

Test Coverage

The unit tests cover a positive match with constant extraction, rewiring of the outputs to the fused op, and a negative (non-matching) case.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@guan404ming guan404ming marked this pull request as ready for review May 5, 2026 12:23
@guan404ming guan404ming requested a review from a team as a code owner May 5, 2026 12:23
@guan404ming guan404ming requested a review from bmarimuthu-nv May 5, 2026 12:23
coderabbitai Bot commented May 5, 2026 (Contributor)

📝 Walkthrough

This PR adds pattern-matching and fusion support for DeepSeek-V3's "noaux_tc" MoE routing operator. It introduces helper functions to identify the sigmoid→bias→grouped top-k→outer top-k→gather chain, a new MatchNoAuxTCPattern transform to replace that subgraph with torch.ops.trtllm.noaux_tc_op, registers the transform in the default pipeline, and validates it with three unit tests.

Changes

NoAux TC Pattern Matcher

  • Pattern Matching Helpers (tensorrt_llm/_torch/auto_deploy/transform/library/moe_routing.py, lines 235–333)
    New constants (_TOPK_OPS, _VIEW_OPS, _ADD_OPS) and utility functions (_scalar_int, _find_bias_add_after_sigmoid, _find_group_topk, _find_outer_topk, _find_gather_from_indices, _walk_div_then_mul) to locate and extract noaux_tc routing subgraph components.

  • Transform Implementation (tensorrt_llm/_torch/auto_deploy/transform/library/moe_routing.py, lines 335–482)
    MatchNoAuxTCPattern class scans FX graphs for noaux_tc routing chains, extracts structural parameters (group counts, top-k values), inserts torch.ops.trtllm.noaux_tc_op, rewires downstream indices and weights outputs, eliminates dead code, and reports match statistics.

  • Pipeline Wiring (tensorrt_llm/_torch/auto_deploy/config/default.yaml)
    Registers match_noaux_tc_pattern transform in the pattern_matcher stage of the transform pipeline.

  • Tests and Utilities (tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_match_noaux_tc.py)
    Test modules (_NoAuxTCRouter, _PlainSoftmaxRouter), helper utilities (_apply_matcher, _is_noaux_tc, _find_noaux_tc_node), and three test cases validating pattern matching, constant extraction, output rewiring, and negative case (non-match).
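The constant-extraction step of the transform can be illustrated with a toy sketch. It is not the PR's code: Node is a hypothetical stand-in for torch.fx.Node, and the sketch assumes the group view and both top-k calls carry their sizes and k values positionally, following the ATen schemas view(self, size) and topk(self, k, dim). The helper name extract_noaux_tc_constants and its sanity checks are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for torch.fx.Node: an op name plus positional args."""
    target: str
    args: Tuple[Any, ...] = ()


def extract_noaux_tc_constants(
    group_view: Node, group_topk: Node, outer_topk: Node
) -> Optional[Tuple[int, int, int]]:
    """Return (n_group, topk_group, top_k), or None if the graph doesn't fit."""
    # view(scores_with_bias, [-1, n_group, experts_per_group])
    sizes = group_view.args[1]
    if len(sizes) != 3:
        return None
    n_group, experts_per_group = sizes[1], sizes[2]
    # topk(group_scores, topk_group, -1) and topk(masked_scores, top_k, -1)
    topk_group = group_topk.args[1]
    top_k = outer_topk.args[1]
    values = (n_group, experts_per_group, topk_group, top_k)
    # This sketch only accepts static, positive Python ints (no symbolic sizes).
    if not all(isinstance(v, int) and v > 0 for v in values):
        return None
    # Sanity checks assumed for this sketch: cannot keep more groups than exist,
    # and top_k should not exceed the experts left in the kept groups.
    if topk_group > n_group or top_k > topk_group * experts_per_group:
        return None
    return n_group, topk_group, top_k
```

For example, a view to [-1, 8, 32] followed by a group top-k of 4 and an outer top-k of 8 yields (8, 4, 8), while a 2-D view or a group top-k larger than the group count is rejected.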

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 31.58%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title clearly and specifically describes the main change: adding a noaux_tc MoE routing pattern matcher to AutoDeploy.
  • Description check: ✅ Passed. The description adequately explains the feature, motivation, test coverage, and author confirms PR checklist compliance.
  • Linked Issues check: ✅ Passed. The PR implementation addresses all coding objectives from #9164: pattern matcher implementation, operator fusion into noaux_tc_op, and coverage of relevant models.
  • Out of Scope Changes check: ✅ Passed. All changes are directly relevant to implementing the noaux_tc pattern matcher and its tests; no extraneous modifications detected.


coderabbitai Bot left a comment (Contributor)

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/auto_deploy/transform/library/moe_routing.py`:
- Around line 402-427: The current matcher picks the first masked user of
scores_with_bias_node (masked_node) without verifying it is driven by
outer_grp_topk, so a different masked branch can be incorrectly fused; modify
the search for masked_node to only accept a user that is dataflow-descended from
outer_grp_topk (e.g., check that outer_grp_topk is an ancestor/input of the
candidate masked_node or that masked_node.args include outer_grp_topk or
outer_grp_topk.output), otherwise skip; update the loop that finds masked_node
(referencing outer_grp_topk, scores_with_bias_node, masked_node, and
topk_group/_TOPK_OPS) to perform this ancestry/input check before selecting the
masked_node.
- Around line 307-332: The _walk_div_then_mul function only matches a divisor
that is exactly sum(cur, ...), so it misses the stabilized form sum(cur,
...)+epsilon; update _walk_div_then_mul to accept a divisor that is either the
sum node or an aten.add/aten.add.Tensor node whose operands are the sum node and
a small numeric epsilon (or reversed operand order), treating that as the same
normalization; keep the existing checks that the sum node.args[0] is cur and
preserve the rest of the logic (i.e., advance cur to the div node and still
detect a following mul.Tensor with a numeric scalar).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ca7744b1-f3ad-46a3-ae99-b2323c0dca75

📥 Commits

Reviewing files that changed from the base of the PR and between f8a9a29 and bc2ca49.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/auto_deploy/config/default.yaml
  • tensorrt_llm/_torch/auto_deploy/transform/library/moe_routing.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_match_noaux_tc.py

@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label May 5, 2026
@guan404ming guan404ming force-pushed the feat/ad-noaux-tc-pattern-matcher branch 2 times, most recently from 38e911c to 9a6dd31 Compare May 7, 2026 09:56
bmarimuthu-nv (Collaborator)

@guan404ming thanks for the PR! Could you please rebase and push?

@guan404ming guan404ming force-pushed the feat/ad-noaux-tc-pattern-matcher branch from 9a6dd31 to 1334ce8 Compare May 28, 2026 06:17
guan404ming (Contributor, Author)

Hi @bmarimuthu-nv, I just updated.

@guan404ming guan404ming force-pushed the feat/ad-noaux-tc-pattern-matcher branch 2 times, most recently from 7dda05e to 2c94fbc Compare June 3, 2026 10:32
bmarimuthu-nv (Collaborator)

@guan404ming One scoping concern: the changes to tensorrt_llm/_torch/pyexecutor/model_loader.py add registry-based defaults into the PyTorch model loading path, which is beyond what #13697 asks for. The issue is specifically about AutoDeploy config registry. Let's keep the PyTorch backend out of scope for now.

If you remove the model_loader.py changes, then model_config_loader.py no longer needs to be a shared module in tensorrt_llm/llmapi/. It can live inside tensorrt_llm/_torch/auto_deploy/ (closer to where it's actually consumed), which keeps the PR's footprint contained within the AD subsystem.

Everything else stays inside auto_deploy where it belongs for this feature.

@guan404ming guan404ming force-pushed the feat/ad-noaux-tc-pattern-matcher branch from 2c94fbc to b4e818d Compare June 16, 2026 16:03
guan404ming (Contributor, Author)

Hi @bmarimuthu-nv, gentle ping. Please let me know if any updates are needed. Thanks!

Signed-off-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com>
@guan404ming guan404ming force-pushed the feat/ad-noaux-tc-pattern-matcher branch from b4e818d to eadbf06 Compare June 23, 2026 15:42

Labels

Community want to contribute PRs initiated from Community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AutoDeploy][Feature]: Add pattern matcher to support the NemotronHTopkRouter

3 participants