[TRTLLM-13250][feat] Wave 5: Enable MX post-transform Llama receiver by chienchunhung · Pull Request #15432 · NVIDIA/TensorRT-LLM

chienchunhung · 2026-06-16T23:25:17Z

Summary

Stacked on Wave 4 / #15387.

This implements Wave 5 of the staged post-load hook rollout for MX:

- Publish MX sources after post-load transforms with SourceIdentity and transform-layout metadata.
- Let compatible, allow-listed Llama receivers consume post-transform MX bytes and run only setup_aliases() + cache_derived_state().
- Fail closed before P2P when SourceIdentity is missing/mismatched, transform protocol metadata is unsupported, or the model is not allow-listed.
- Add unit coverage for metadata fallback cases, publish metadata, GMS/MX post-load publish ordering, and a tiny real-Llama staged receiver equivalence check.
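
To make the fail-closed behavior concrete, here is a minimal sketch of a gate that rejects a receiver before any transfer starts. It is not the code in this PR: the class names, the `"v1"` protocol string, the allow-list contents, and the function names are all hypothetical and exist only for illustration. Only the three rejection conditions and the two stage names come from the summary above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical values for illustration; the real protocol ids and allow-list are not shown here.
SUPPORTED_TRANSFORM_PROTOCOLS = {"v1"}
ALLOW_LISTED_MODELS = {"LlamaForCausalLM"}


@dataclass(frozen=True)
class SourceIdentity:
    """Hypothetical stand-in for the identity a publisher attaches to its weights."""
    model_name: str
    weights_digest: str


@dataclass(frozen=True)
class PublishedMetadata:
    """Hypothetical stand-in for the metadata published next to the MX source."""
    source_identity: Optional[SourceIdentity]
    transform_protocol: Optional[str]


class GateRejected(RuntimeError):
    """Raised before any P2P transfer, so a rejected receiver never reads peer bytes."""


def check_gate(local: SourceIdentity, published: PublishedMetadata, model_class: str) -> None:
    # Every check raises on failure; there is no "best effort" branch.
    if published.source_identity is None:
        raise GateRejected("SourceIdentity is missing")
    if published.source_identity != local:
        raise GateRejected("SourceIdentity does not match the local model")
    if published.transform_protocol not in SUPPORTED_TRANSFORM_PROTOCOLS:
        raise GateRejected("transform protocol metadata is unsupported")
    if model_class not in ALLOW_LISTED_MODELS:
        raise GateRejected("model is not allow-listed")


def receiver_stages(local: SourceIdentity, published: PublishedMetadata,
                    model_class: str) -> Tuple[str, ...]:
    """Return the post-load stages to run once the gate has passed."""
    check_gate(local, published, model_class)
    # The bytes are already transformed, so only these two stages remain.
    return ("setup_aliases", "cache_derived_state")
```

Raising instead of returning a flag keeps the gate fail-closed: a caller that forgets to inspect a result cannot proceed to the transfer by accident.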

Dependency / prerequisite stack

This PR is Wave 5 in the staged post-load hooks rollout. The foundation PRs #14770 and #14878 are already merged. The wave PRs should merge in sequence; after each upstream wave lands, rebase the next wave onto main so review and CI focus on that wave's delta.

Arrows point from prerequisite to dependent. PR numbers in graph nodes are clickable.

graph TD
    PR14770["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14770'>#14770</a>: staged-hook contract (merged)"]
    PR14878["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14878'>#14878</a>: GMS SourceIdentity gate (merged)"]
    PR15014["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15014'>#15014</a>: Wave 1 aliases + GMS RO load (merged)"]
    PR15288["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15288'>#15288</a>: Wave 2 Linear/Attention transforms (merged)"]
    PR15386["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15386'>#15386</a>: Wave 3 MoE/Mamba staged hooks (open)"]
    PR15387["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15387'>#15387</a>: Wave 4 MX receiver cutover (open)"]
    PR15432["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15432'>#15432</a>: Wave 5 MX publisher + Llama receiver (this PR, open)"]
    VERIFY["post-migration verification / demo (planned)"]

    PR14770 -->|satisfied| PR15014
    PR14878 -->|satisfied| PR15014
    PR15014 -->|blocking| PR15288
    PR15288 -->|blocking| PR15386
    PR15386 -->|blocking| PR15387
    PR15387 -->|blocking| PR15432
    PR15432 -.->|planned| VERIFY

    classDef merged fill:#dcfce7,stroke:#16a34a,color:#14532d;
    classDef open fill:#dbeafe,stroke:#2563eb,color:#1e3a8a;
    classDef current fill:#ede9fe,stroke:#7c3aed,color:#3b0764,stroke-width:3px;
    classDef downstream fill:#f3f4f6,stroke:#6b7280,color:#374151,stroke-dasharray:5 5;
    linkStyle 0,1 stroke:#16a34a,stroke-width:2px;
    linkStyle 2,3,4,5 stroke:#ea580c,stroke-width:3px;
    linkStyle 6 stroke:#6b7280,stroke-width:2px,stroke-dasharray:5 5;

    class PR14770,PR14878,PR15014,PR15288 merged;
    class PR15386,PR15387 open;
    class PR15432 current;
    class VERIFY downstream;

Immediate merge dependency for this PR: #15387 must land first; after Wave 5 lands, run the post-migration verification/demo for the completed staged-hook rollout.
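
As a cross-check on the merge sequence, the prerequisite edges in the graph above can be fed to a topological sort. This is only an illustrative sketch using Python's standard `graphlib` module (Python 3.9+); the `deps` mapping is transcribed by hand from the graph, and the planned verification node is left out because it is not a PR.

```python
from graphlib import TopologicalSorter

# Each PR maps to the set of PRs that must merge before it.
deps = {
    "#15014": {"#14770", "#14878"},
    "#15288": {"#15014"},
    "#15386": {"#15288"},
    "#15387": {"#15386"},
    "#15432": {"#15387"},
}

# static_order() yields every node after all of its prerequisites.
merge_order = list(TopologicalSorter(deps).static_order())
print(merge_order)
```

The two foundation PRs have no ordering constraint between them, so either may appear first; from #15014 onward the chain is linear and ends with #15432.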

Validation

- git diff --check
- python -m py_compile tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py tensorrt_llm/_torch/pyexecutor/model_loader.py tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py tests/unittest/_torch/pyexecutor/test_model_loader_gms.py tests/unittest/_torch/pyexecutor/test_model_loader_mx.py tests/unittest/_torch/weight_sharing/test_mx_source_identity_gate.py
- pre-commit on commit, with the waive list check and validate-test-lists hooks skipped locally because scripts/check_test_list.py fails under this hook interpreter with TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Focused pytest runs are blocked in this local environment: the transformers package is missing, so the import fails before any tests are collected.
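
The TypeError quoted above is the message Python raises when the `X | None` union syntax is evaluated at runtime on an interpreter older than 3.10. That would be consistent with the hook running under an older Python, although the interpreter version is not stated here, so treat it as an assumption. A small sketch that reproduces the condition and shows a spelling that works on both old and new interpreters:

```python
import sys
from typing import Optional


def union_operator_supported() -> bool:
    """Return True if `type | None` can be evaluated on this interpreter."""
    try:
        int | None  # On Python < 3.10 this raises the TypeError quoted above.
    except TypeError:
        return False
    return True


# typing.Optional expresses the same type on any supported Python 3 interpreter.
MaybeInt = Optional[int]

print(sys.version_info[:2], union_operator_supported())
```

Adding `from __future__ import annotations` avoids the error only when the union appears in annotations; it does not help where the expression is evaluated at runtime.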

Summary by CodeRabbit

New Features
- Added support for staged post-transform weight transfers with source identity verification in ModelExpress transfers.
- Introduced source identity gating to validate weight compatibility across distributed systems.
Improvements
- Refactored weight transformation pipeline to separate weight transformation and state caching phases for improved clarity and maintainability.

chienchunhung · 2026-06-16T23:26:48Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-16T23:33:29Z

PR_Github #54683 [ run ] triggered by Bot. Commit: 756e717 Link to invocation

tensorrt-cicd · 2026-06-17T06:25:31Z

PR_Github #54683 [ run ] completed with state FAILURE. Commit: 756e717
/LLM/main/L0_MergeRequest_PR pipeline #43714 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-18T00:37:49Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-18T00:43:01Z

PR_Github #54888 [ run ] triggered by Bot. Commit: f123c77 Link to invocation

tensorrt-cicd · 2026-06-18T10:41:14Z

PR_Github #54888 [ run ] completed with state SUCCESS. Commit: f123c77
/LLM/main/L0_MergeRequest_PR pipeline #43893 completed with status: 'FAILURE'

chienchunhung · 2026-06-19T01:32:40Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-19T01:38:48Z

PR_Github #54955 [ run ] triggered by Bot. Commit: 14a4537 Link to invocation

tensorrt-cicd · 2026-06-19T08:05:01Z

PR_Github #54955 [ run ] completed with state SUCCESS. Commit: 14a4537
/LLM/main/L0_MergeRequest_PR pipeline #43955 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

chienchunhung · 2026-06-29T03:19:55Z

/bot run

chienchunhung · 2026-06-29T03:26:11Z

/bot run

tensorrt-cicd · 2026-06-29T03:26:12Z

PR_Github #56288 [ run ] triggered by Bot. Commit: b3705a0 Link to invocation

tensorrt-cicd · 2026-06-29T03:32:20Z

PR_Github #56290 [ run ] triggered by Bot. Commit: b3705a0 Link to invocation

tensorrt-cicd · 2026-06-29T03:35:41Z

PR_Github #56288 [ run ] completed with state ABORTED. Commit: b3705a0

Link to invocation

tensorrt-cicd · 2026-06-29T04:51:16Z

PR_Github #56290 [ run ] completed with state FAILURE. Commit: b3705a0
/LLM/main/L0_MergeRequest_PR pipeline #45143 completed with status: 'FAILURE'

chienchunhung · 2026-06-30T17:00:00Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-30T17:06:47Z

PR_Github #56673 [ run ] triggered by Bot. Commit: 40ed0b1 Link to invocation

tensorrt-cicd · 2026-06-30T22:27:36Z

PR_Github #56673 [ run ] completed with state SUCCESS. Commit: 40ed0b1
/LLM/main/L0_MergeRequest_PR pipeline #45496 completed with status: 'FAILURE'

chienchunhung · 2026-06-30T22:33:09Z

/bot run

tensorrt-cicd · 2026-06-30T22:38:33Z

PR_Github #56746 [ run ] triggered by Bot. Commit: edffee0 Link to invocation

tensorrt-cicd · 2026-07-01T00:21:06Z

PR_Github #56746 [ run ] completed with state SUCCESS. Commit: edffee0
/LLM/main/L0_MergeRequest_PR pipeline #45565 completed with status: 'FAILURE'

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

chienchunhung · 2026-07-02T16:11:20Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-07-02T16:18:18Z

PR_Github #57216 [ run ] triggered by Bot. Commit: c2f35a8 Link to invocation

tensorrt-cicd · 2026-07-02T22:04:29Z

PR_Github #57216 [ run ] completed with state SUCCESS. Commit: c2f35a8
/LLM/main/L0_MergeRequest_PR pipeline #45988 completed with status: 'FAILURE'

github-actions Bot assigned chienchunhung Jun 16, 2026

chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch 2 times, most recently from ae210cb to f123c77 Compare June 18, 2026 00:37

chienchunhung changed the title ~~[TRTLLM-13250][feat] Enable MX post-transform Llama receiver~~ [TRTLLM-13250][feat] Wav3 5: Enable MX post-transform Llama receiver Jun 18, 2026

chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from f123c77 to 14a4537 Compare June 19, 2026 01:32

chienchunhung changed the title ~~[TRTLLM-13250][feat] Wav3 5: Enable MX post-transform Llama receiver~~ [TRTLLM-13250][feat] Wave 5: Enable MX post-transform Llama receiver Jun 21, 2026

chienchunhung marked this pull request as ready for review June 22, 2026 18:04

chienchunhung requested review from a team as code owners June 22, 2026 18:04

chienchunhung requested review from byshiue, danielafrimi, dongjiyingdjy, hlu1 and symphonylyh June 22, 2026 18:04

xxi-nv approved these changes Jun 26, 2026

Comment thread tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py Outdated

Comment thread tests/unittest/_torch/modules/moe/test_moe_backend.py

chienchunhung enabled auto-merge (squash) June 26, 2026 22:40

chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from 63ea3d6 to 03e707f Compare June 29, 2026 03:19

chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from 03e707f to b3705a0 Compare June 29, 2026 03:25

chienchunhung requested a review from pcastonguay June 29, 2026 17:29

chienchunhung mentioned this pull request Jun 30, 2026

[TRTLLM-13248][feat] Wave 3: migrate MoE staged hooks #15386

Open

chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from b3705a0 to 40ed0b1 Compare June 30, 2026 16:58

chienchunhung added 7 commits July 2, 2026 09:07

[TRTLLM-13248][feat] Wave 3: migrate MoE staged hooks

9a30e43

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

[TRTLLM-13249][feat] Wave 4 add MX staged receiver cutover

b2a408a

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

[TRTLLM-13249][fix] address Wave 4 review feedback

037f896

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

[TRTLLM-13249][fix] add draft model config to MX loader test

45e24ee

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

[TRTLLM-13250][feat] Enable MX post-transform Llama receiver

0f31366

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

[TRTLLM-13250][fix] Avoid GMS MX double transforms

841278b

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

[TRTLLM-13250][fix] Mark staged MoE hook tests

c2f35a8

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from edffee0 to c2f35a8 Compare July 2, 2026 16:11
