[TRTLLM-13250][feat] Wave 5: Enable MX post-transform Llama receiver#15432

Open
chienchunhung wants to merge 7 commits into NVIDIA:main from chienchunhung:codex/staged-hooks-wave5-mx-publisher
Conversation

@chienchunhung

@chienchunhung chienchunhung commented Jun 16, 2026

Collaborator

Summary

Stacked on Wave 4 / #15387.

This implements Wave 5 of the staged post-load hook rollout for MX:

  • publish MX sources after post-load transforms with SourceIdentity and transform-layout metadata
  • let compatible, allow-listed Llama receivers consume post-transform MX bytes and run only setup_aliases() + cache_derived_state()
  • fail closed before P2P when SourceIdentity is missing/mismatched, transform protocol metadata is unsupported, or the model is not allow-listed
  • add unit coverage for metadata fallback cases, publish metadata, GMS/MX post-load publish ordering, and a tiny real-Llama staged receiver equivalence check
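
The fail-closed gate described in the bullets above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `SourceIdentity` is named in the summary, but its fields here, along with `PublishedMetadata`, `GateRejected`, `check_gate`, `receiver_stages`, the protocol version set, and the allow-list contents, are hypothetical names chosen for illustration. The summary also does not say what happens after a rejection (raising versus falling back to another load path); this sketch simply raises.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical values for illustration only; not taken from TensorRT-LLM.
SUPPORTED_TRANSFORM_PROTOCOLS = {1}
ALLOW_LISTED_MODELS = {"LlamaForCausalLM"}


@dataclass(frozen=True)
class SourceIdentity:
    # Assumed fields; the real SourceIdentity may carry different data.
    model_name: str
    checkpoint_hash: str
    dtype: str


@dataclass(frozen=True)
class PublishedMetadata:
    source_identity: Optional[SourceIdentity]
    transform_protocol: Optional[int]


class GateRejected(RuntimeError):
    """Raised before any P2P transfer is started."""


def check_gate(local: SourceIdentity, meta: PublishedMetadata) -> None:
    # Every check runs before P2P; any doubt means rejection (fail closed).
    if meta.source_identity is None:
        raise GateRejected("SourceIdentity missing")
    if meta.source_identity != local:
        raise GateRejected("SourceIdentity mismatch")
    if meta.transform_protocol not in SUPPORTED_TRANSFORM_PROTOCOLS:
        raise GateRejected("unsupported transform protocol metadata")
    if local.model_name not in ALLOW_LISTED_MODELS:
        raise GateRejected("model not allow-listed")


def receiver_stages(local: SourceIdentity, meta: PublishedMetadata) -> List[str]:
    # A receiver that passes the gate consumes post-transform bytes, so only
    # the two stages named in the summary remain to be run.
    check_gate(local, meta)
    return ["setup_aliases", "cache_derived_state"]
```

Ordering the checks so that nothing touches the network until all of them pass is what makes the gate "fail closed": a missing or unexpected field is treated the same as a mismatch.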

Dependency / prerequisite stack

This PR is Wave 5 in the staged post-load hooks rollout. The foundation PRs #14770 and #14878 are already merged. The wave PRs should merge in sequence; after each upstream wave lands, rebase the next wave onto main so review and CI focus on that wave's delta.

Arrows point from prerequisite to dependent. PR numbers in graph nodes are clickable.

graph TD
    PR14770["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14770'>#14770</a>: staged-hook contract (merged)"]
    PR14878["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14878'>#14878</a>: GMS SourceIdentity gate (merged)"]
    PR15014["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15014'>#15014</a>: Wave 1 aliases + GMS RO load (merged)"]
    PR15288["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15288'>#15288</a>: Wave 2 Linear/Attention transforms (merged)"]
    PR15386["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15386'>#15386</a>: Wave 3 MoE/Mamba staged hooks (open)"]
    PR15387["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15387'>#15387</a>: Wave 4 MX receiver cutover (open)"]
    PR15432["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15432'>#15432</a>: Wave 5 MX publisher + Llama receiver (this PR, open)"]
    VERIFY["post-migration verification / demo (planned)"]

    PR14770 -->|satisfied| PR15014
    PR14878 -->|satisfied| PR15014
    PR15014 -->|blocking| PR15288
    PR15288 -->|blocking| PR15386
    PR15386 -->|blocking| PR15387
    PR15387 -->|blocking| PR15432
    PR15432 -.->|planned| VERIFY

    classDef merged fill:#dcfce7,stroke:#16a34a,color:#14532d;
    classDef open fill:#dbeafe,stroke:#2563eb,color:#1e3a8a;
    classDef current fill:#ede9fe,stroke:#7c3aed,color:#3b0764,stroke-width:3px;
    classDef downstream fill:#f3f4f6,stroke:#6b7280,color:#374151,stroke-dasharray:5 5;
    linkStyle 0,1 stroke:#16a34a,stroke-width:2px;
    linkStyle 2,3,4,5 stroke:#ea580c,stroke-width:3px;
    linkStyle 6 stroke:#6b7280,stroke-width:2px,stroke-dasharray:5 5;

    class PR14770,PR14878,PR15014,PR15288 merged;
    class PR15386,PR15387 open;
    class PR15432 current;
    class VERIFY downstream;

Immediate merge dependency for this PR: #15387 must land first; after Wave 5 lands, run the post-migration verification/demo for the completed staged-hook rollout.
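
As an illustration of the merge sequence, the prerequisite edges from the graph above can be fed to a topological sort. This is only a sketch for readers: the edge list and merged set are copied from the graph, the planned verification node is left out because it is not a PR, and the standard-library `graphlib` module (Python 3.9+) is an arbitrary choice of tool.

```python
from graphlib import TopologicalSorter

# Keys are dependent PRs; values are their prerequisites (from the graph).
prerequisites = {
    "#15014": {"#14770", "#14878"},
    "#15288": {"#15014"},
    "#15386": {"#15288"},
    "#15387": {"#15386"},
    "#15432": {"#15387"},
}

# PRs the graph marks as already merged.
merged = {"#14770", "#14878", "#15014", "#15288"}

# static_order() yields every node after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())
remaining = [pr for pr in order if pr not in merged]

print(remaining)  # ['#15386', '#15387', '#15432']
```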

Validation

  • git diff --check
  • python -m py_compile tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py tensorrt_llm/_torch/pyexecutor/model_loader.py tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py tests/unittest/_torch/pyexecutor/test_model_loader_gms.py tests/unittest/_torch/pyexecutor/test_model_loader_mx.py tests/unittest/_torch/weight_sharing/test_mx_source_identity_gate.py
  • pre-commit ran at commit time; the waive list check and validate-test-lists hooks were skipped locally because scripts/check_test_list.py fails under the hook interpreter with TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Focused pytest runs are blocked in this local environment: the transformers package is missing, so collection fails before any test is collected.
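
The TypeError quoted in the pre-commit item is the message Python raises when a PEP 604 union such as `int | None` is evaluated at runtime on an interpreter older than 3.10. Whether that is what happens in scripts/check_test_list.py is an assumption; the PR does not state the hook interpreter's version. The sketch below uses hypothetical helper names (`pep604_union_supported`, `lookup`) to show the probe and a spelling that works on older interpreters.

```python
import sys
from typing import Optional


def pep604_union_supported() -> bool:
    """Return True if `int | None` can be evaluated on this interpreter."""
    try:
        int | None  # raises TypeError on Python < 3.10
    except TypeError:
        return False
    return True


def lookup(table: dict, key: str) -> Optional[int]:
    # typing.Optional[int] is equivalent to `int | None` and is accepted by
    # every Python 3 version, so this annotation never triggers the error.
    return table.get(key)


print(sys.version_info[:2], pep604_union_supported())
```

If the interpreter version is indeed the cause, running the hook under Python 3.10 or newer, or spelling the annotations with `typing.Optional`, would avoid the failure.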

Summary by CodeRabbit

  • New Features

    • Added support for staged post-transform weight transfers with source identity verification in ModelExpress transfers.
    • Introduced source identity gating to validate weight compatibility across distributed systems.
  • Improvements

    • Refactored weight transformation pipeline to separate weight transformation and state caching phases for improved clarity and maintainability.

@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #54683 [ run ] triggered by Bot. Commit: 756e717 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #54683 [ run ] completed with state FAILURE. Commit: 756e717
/LLM/main/L0_MergeRequest_PR pipeline #43714 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch 2 times, most recently from ae210cb to f123c77 Compare June 18, 2026 00:37

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #54888 [ run ] triggered by Bot. Commit: f123c77 Link to invocation

@chienchunhung chienchunhung changed the title [TRTLLM-13250][feat] Enable MX post-transform Llama receiver [TRTLLM-13250][feat] Wav3 5: Enable MX post-transform Llama receiver Jun 18, 2026
@tensorrt-cicd

Collaborator

PR_Github #54888 [ run ] completed with state SUCCESS. Commit: f123c77
/LLM/main/L0_MergeRequest_PR pipeline #43893 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from f123c77 to 14a4537 Compare June 19, 2026 01:32

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #54955 [ run ] triggered by Bot. Commit: 14a4537 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #54955 [ run ] completed with state SUCCESS. Commit: 14a4537
/LLM/main/L0_MergeRequest_PR pipeline #43955 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@chienchunhung chienchunhung changed the title [TRTLLM-13250][feat] Wav3 5: Enable MX post-transform Llama receiver [TRTLLM-13250][feat] Wave 5: Enable MX post-transform Llama receiver Jun 21, 2026
@chienchunhung chienchunhung marked this pull request as ready for review June 22, 2026 18:04
@chienchunhung chienchunhung requested review from a team as code owners June 22, 2026 18:04
Comment thread tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py Outdated
Comment thread tests/unittest/_torch/modules/moe/test_moe_backend.py
@chienchunhung chienchunhung enabled auto-merge (squash) June 26, 2026 22:40
@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from 63ea3d6 to 03e707f Compare June 29, 2026 03:19

Collaborator Author

/bot run

@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from 03e707f to b3705a0 Compare June 29, 2026 03:25

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #56288 [ run ] triggered by Bot. Commit: b3705a0 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #56290 [ run ] triggered by Bot. Commit: b3705a0 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #56288 [ run ] completed with state ABORTED. Commit: b3705a0

Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #56290 [ run ] completed with state FAILURE. Commit: b3705a0
/LLM/main/L0_MergeRequest_PR pipeline #45143 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #56673 [ run ] triggered by Bot. Commit: 40ed0b1 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #56673 [ run ] completed with state SUCCESS. Commit: 40ed0b1
/LLM/main/L0_MergeRequest_PR pipeline #45496 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung

Collaborator Author

/bot run

@tensorrt-cicd

Collaborator

PR_Github #56746 [ run ] triggered by Bot. Commit: edffee0 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #56746 [ run ] completed with state SUCCESS. Commit: edffee0
/LLM/main/L0_MergeRequest_PR pipeline #45565 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
@chienchunhung chienchunhung force-pushed the codex/staged-hooks-wave5-mx-publisher branch from edffee0 to c2f35a8 Compare July 2, 2026 16:11
@chienchunhung

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #57216 [ run ] triggered by Bot. Commit: c2f35a8 Link to invocation

@tensorrt-cicd

Collaborator

PR_Github #57216 [ run ] completed with state SUCCESS. Commit: c2f35a8
/LLM/main/L0_MergeRequest_PR pipeline #45988 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

3 participants