[TRTLLM-13246][feat] Wave 1: migrate aliases to setup_aliases and stage GMS RO load by chienchunhung · Pull Request #15014 · NVIDIA/TensorRT-LLM

chienchunhung · 2026-06-05T18:17:48Z

Summary

Wave 1 of the staged post-load hooks rollout. The staged-hook contract landed in #14770, and #14878 has now merged, so this PR is a single Wave 1 commit on top of main.

This change migrates alias-only model hooks from post_load_weights() to setup_aliases() and cuts the GMS read-only (RO) load path over from the old meta-tensor workaround to the staged-hook protocol.
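To make the alias migration concrete, here is a minimal sketch of what moving an alias-only hook might look like. It is not the PR's code: `StagedHooksBase`, `ToyForCausalLM`, `DecoderLayer`, `Norm`, and the `next_layer_layernorm` slot are hypothetical stand-ins written in plain Python, and the only names taken from the description are `setup_aliases()` and `post_load_weights()`.

```python
class Norm:
    """Hypothetical stand-in for a normalization module (holds no tensors)."""


class DecoderLayer:
    def __init__(self) -> None:
        self.input_layernorm = Norm()
        self.next_layer_layernorm = None  # alias slot, bound by setup_aliases()


class StagedHooksBase:
    """Hypothetical base class that owns the post-load orchestrator."""

    def setup_aliases(self) -> None:
        """Stage hook: bind attribute aliases only; must not read weight data."""

    def post_load_weights(self) -> None:
        # Standard load path: the base orchestrator drives the staged hooks,
        # so subclasses no longer override post_load_weights() for aliases.
        self.setup_aliases()


class ToyForCausalLM(StagedHooksBase):
    def __init__(self, num_layers: int = 3) -> None:
        self.layers = [DecoderLayer() for _ in range(num_layers)]
        self.norm = Norm()

    def setup_aliases(self) -> None:
        # Body that previously lived in an alias-only post_load_weights().
        for idx, layer in enumerate(self.layers):
            if idx == len(self.layers) - 1:
                layer.next_layer_layernorm = self.norm
            else:
                layer.next_layer_layernorm = self.layers[idx + 1].input_layernorm


model = ToyForCausalLM()
model.post_load_weights()  # standard path still works through the base class
```

Because the hook only rebinds references, it can presumably run before real tensors exist, which is what lets the GMS RO path call it ahead of materialization.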

What Changed

- Alias migration for top-level model classes across 7 modeling modules: modeling_llama (LlamaForCausalLM, Llama4ForConditionalGeneration), modeling_deepseekv3, modeling_glm, modeling_exaone_moe, modeling_qwen3_moe, modeling_qwen3_next, and modeling_gpt_oss. Their alias-only post_load_weights() bodies move verbatim into setup_aliases(), while standard load paths continue through the base post_load_weights() orchestrator.
- GMS RO ordering now runs staged hooks around zero-copy materialization:

post_load_apply
  -> _setup_aliases(model)                  # recursive alias walk
  -> _check_gms_source_identity(gms_backend) # STRICT pre-materialize gate from #14878
  -> materialize_module(model)
  -> _walk_cache_state(model)
  -> post_load_publish

- GMS docs now describe alias setup before materialization and derived-state refresh after real tensors are bound.
- Full reload now resets existing _weights_transformed flags before rebinding fresh weights, while partial reload keeps existing transform guards intact for untouched modules.
- Tests cover the staged walkers, reload reset, and GMS RO ordering in test_model_loader_gms.py and test_model_loader_mx.py.
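The GMS RO ordering above can be sketched as a small driver that walks a module tree in stages. This is an illustration under stated assumptions, not the loader's implementation: `ToyModule`, `walk`, and `load_gms_ro` are hypothetical, the identity gate is reduced to a boolean, and only the stage order (aliases, strict gate, materialize, cache-state walk) comes from the description.

```python
from typing import Iterator, List


class ToyModule:
    """Hypothetical module tree node; `bound` marks real tensors being attached."""

    def __init__(self, name: str, children: tuple = ()) -> None:
        self.name = name
        self.children = list(children)
        self.bound = False


def walk(module: ToyModule) -> Iterator[ToyModule]:
    """Pre-order traversal: parent first, then each child subtree."""
    yield module
    for child in module.children:
        yield from walk(child)


def load_gms_ro(model: ToyModule, source_identity_ok: bool) -> List[str]:
    """Run the staged order and return a log of the steps taken."""
    log: List[str] = []
    # 1. Recursive alias walk; safe before tensors exist.
    for m in walk(model):
        log.append(f"setup_aliases:{m.name}")
    # 2. Strict gate: fail before any zero-copy binding happens.
    if not source_identity_ok:
        raise RuntimeError("source identity mismatch; refusing to materialize")
    log.append("check_source_identity")
    # 3. Materialize: bind real tensors to every module.
    for m in walk(model):
        m.bound = True
    log.append("materialize")
    # 4. Derived-state refresh, only valid once tensors are bound.
    for m in walk(model):
        assert m.bound, "derived state needs real tensors"
        log.append(f"cache_state:{m.name}")
    return log


steps = load_gms_ro(ToyModule("model", (ToyModule("layer0"),)), True)
```

Placing the gate after the alias walk but before materialization means a mismatched source fails without binding any tensors, which is the property the sketch tries to show.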

Dependency / prerequisite stack

This PR is Wave 1 in the staged post-load hooks rollout. The foundation PRs #14770 and #14878 are already merged. The wave PRs should merge in sequence; after each upstream wave lands, rebase the next wave onto main so review and CI focus on that wave's delta.

Arrows point from prerequisite to dependent. PR numbers in graph nodes are clickable.

graph TD
    PR14770["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14770'>#14770</a>: staged-hook contract (merged)"]
    PR14878["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14878'>#14878</a>: GMS SourceIdentity gate (merged)"]
    PR15014["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15014'>#15014</a>: Wave 1 aliases + GMS RO load (this PR, open)"]
    PR15288["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15288'>#15288</a>: Wave 2 Linear/Attention transforms (draft)"]
    PR15386["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15386'>#15386</a>: Wave 3 MoE/Mamba staged hooks (draft)"]
    PR15387["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15387'>#15387</a>: Wave 4 MX receiver cutover (draft)"]
    PR15432["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15432'>#15432</a>: Wave 5 MX publisher + Llama receiver (draft)"]
    VERIFY["post-migration verification / demo (planned)"]

    PR14770 -->|satisfied| PR15014
    PR14878 -->|satisfied| PR15014
    PR15014 -->|blocking| PR15288
    PR15288 -->|blocking| PR15386
    PR15386 -->|blocking| PR15387
    PR15387 -->|blocking| PR15432
    PR15432 -.->|planned| VERIFY

    classDef merged fill:#dcfce7,stroke:#16a34a,color:#14532d;
    classDef inflight fill:#dbeafe,stroke:#2563eb,color:#1e3a8a;
    classDef draft fill:#ffedd5,stroke:#f97316,color:#7c2d12;
    classDef current fill:#ede9fe,stroke:#7c3aed,color:#3b0764,stroke-width:3px;
    classDef downstream fill:#f3f4f6,stroke:#6b7280,color:#374151,stroke-dasharray:5 5;
    linkStyle 0,1 stroke:#16a34a,stroke-width:2px;
    linkStyle 2,3,4,5 stroke:#ea580c,stroke-width:3px;
    linkStyle 6 stroke:#6b7280,stroke-width:2px,stroke-dasharray:5 5;

    class PR14770,PR14878 merged;
    class PR15288,PR15386,PR15387,PR15432 draft;
    class PR15014 current;
    class VERIFY downstream;

Immediate merge dependency for this PR: none beyond the already-merged #14770 and #14878 foundation PRs; downstream waves remain stacked on this branch until Wave 1 lands.

Test Plan

- pytest tests/unittest/_torch/pyexecutor/test_model_loader_gms.py tests/unittest/_torch/pyexecutor/test_model_loader_mx.py
- Full L0 CI with /bot run --disable-fail-fast before review.
- Local checks completed: git diff --check, py_compile on changed Python files, and pre-commit. Local pytest was not available in this macOS shell.
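For readers unfamiliar with how an ordering test can be written, the following is one possible shape using `unittest.mock`, which records calls on child mocks in order. It does not reproduce the tests in test_model_loader_gms.py; `run_staged_load` and the hook names on the mock are hypothetical stand-ins for the loader's staged steps.

```python
from unittest import mock


def run_staged_load(hooks) -> None:
    """Hypothetical driver: call the staged steps in the documented order."""
    hooks.setup_aliases()
    hooks.check_source_identity()
    hooks.materialize()
    hooks.walk_cache_state()


def test_stage_order() -> None:
    hooks = mock.Mock()
    run_staged_load(hooks)
    # A parent Mock records calls to its child mocks in invocation order.
    assert hooks.mock_calls == [
        mock.call.setup_aliases(),
        mock.call.check_source_identity(),
        mock.call.materialize(),
        mock.call.walk_cache_state(),
    ]


def test_failed_gate_blocks_materialize() -> None:
    hooks = mock.Mock()
    hooks.check_source_identity.side_effect = RuntimeError("identity mismatch")
    try:
        run_staged_load(hooks)
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected the gate to raise")
    # Nothing after the gate should have run.
    hooks.materialize.assert_not_called()
    hooks.walk_cache_state.assert_not_called()
```

Both functions follow pytest's naming convention, so they would be collected automatically if placed in a test module.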

Next Steps

- Wave 2: migrate Linear/Attention transforms into transform_weights() with _weights_transformed guards.
- Wave 3: migrate MoE and Mamba transforms.
- Wave 4: MX publish-after-transform flip, receiver cutover, and per-model allow-list.
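The `_weights_transformed` guard planned for Wave 2, together with the reload reset described under What Changed, can be illustrated roughly as follows. This is a sketch, not the planned implementation: `ToyLinear`, `reload`, and the doubling "transform" are hypothetical, and resetting the guard on modules touched by a partial reload is an assumption (the description only states that untouched modules keep their guards).

```python
from typing import Dict, List


class ToyLinear:
    """Hypothetical module whose weights need a one-time transform."""

    def __init__(self, weight: List[float]) -> None:
        self.weight = list(weight)
        self._weights_transformed = False

    def transform_weights(self) -> None:
        if self._weights_transformed:
            return  # guard: never transform the same weights twice
        self.weight = [2 * w for w in self.weight]  # placeholder transform
        self._weights_transformed = True


def reload(modules: Dict[str, ToyLinear],
           fresh: Dict[str, List[float]],
           full: bool) -> None:
    if full:
        # Full reload: clear every guard before rebinding fresh weights.
        for module in modules.values():
            module._weights_transformed = False
    for name, weight in fresh.items():
        modules[name].weight = list(weight)
        # Assumption: a rebound module must be transformed again.
        modules[name]._weights_transformed = False
    for module in modules.values():
        module.transform_weights()  # no-op where the guard is still set


modules = {"a": ToyLinear([1.0]), "b": ToyLinear([3.0])}
for module in modules.values():
    module.transform_weights()
reload(modules, {"a": [5.0]}, full=False)  # "b" keeps its transformed weights
```

Without the reset, a full reload would skip the transform on fresh weights because the stale guard would still read as set.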

PR Checklist

- PR description clearly explains what and why.
- Follows TRT-LLM coding guidelines to the best of my knowledge.
- Test cases are provided for new code paths.
- No public API changes.
- No new dependencies.

Summary by CodeRabbit

Refactor
- Restructured internal model initialization and layer alias resolution for improved stability during weight loading.
- Optimized GPU memory management during model setup and state caching.
- Updated model loading sequence for read-only GPU memory paths to ensure correct operation ordering.

chienchunhung · 2026-06-05T21:32:12Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-05T21:40:21Z

PR_Github #52438 [ run ] triggered by Bot. Commit: ac30c0a Link to invocation

chienchunhung · 2026-06-05T22:17:50Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-05T22:23:41Z

PR_Github #52445 [ run ] triggered by Bot. Commit: ac30c0a Link to invocation

tensorrt-cicd · 2026-06-05T22:27:56Z

PR_Github #52438 [ run ] completed with state ABORTED. Commit: ac30c0a

Link to invocation

tensorrt-cicd · 2026-06-06T09:00:59Z

PR_Github #52445 [ run ] completed with state FAILURE. Commit: ac30c0a
/LLM/main/L0_MergeRequest_PR pipeline #41738 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-09T00:03:48Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-09T00:09:21Z

PR_Github #52887 [ run ] triggered by Bot. Commit: eabb7c0 Link to invocation

tensorrt-cicd · 2026-06-09T06:18:32Z

PR_Github #52887 [ run ] completed with state FAILURE. Commit: eabb7c0
/LLM/main/L0_MergeRequest_PR pipeline #42137 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-11T21:23:51Z

/bot run

chienchunhung · 2026-06-11T21:29:26Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-11T21:30:14Z

PR_Github #53683 [ run ] triggered by Bot. Commit: c42781c Link to invocation

tensorrt-cicd · 2026-06-11T21:37:40Z

PR_Github #53687 [ run ] triggered by Bot. Commit: c42781c Link to invocation

tensorrt-cicd · 2026-06-11T21:41:42Z

PR_Github #53683 [ run ] completed with state ABORTED. Commit: c42781c

Link to invocation

tensorrt-cicd · 2026-06-16T14:26:58Z

PR_Github #54480 [ run ] completed with state SUCCESS. Commit: 85973b0
/LLM/main/L0_MergeRequest_PR pipeline #43545 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-16T16:47:09Z

/bot run --disable-fail-fast --stage-list "A100X-PyTorch-1, DGX_B200-8_GPUs-PyTorch-1"

tensorrt-cicd · 2026-06-16T16:54:26Z

PR_Github #54636 [ run ] triggered by Bot. Commit: 85973b0 Link to invocation

tensorrt-cicd · 2026-06-16T18:35:21Z

PR_Github #54636 [ run ] completed with state SUCCESS. Commit: 85973b0
/LLM/main/L0_MergeRequest_PR pipeline #43668 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

chienchunhung · 2026-06-16T20:41:18Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-16T20:47:20Z

PR_Github #54668 [ run ] triggered by Bot. Commit: 85973b0 Link to invocation

tensorrt-cicd · 2026-06-16T21:15:06Z

PR_Github #54668 [ run ] completed with state SUCCESS. Commit: 85973b0
/LLM/main/L0_MergeRequest_PR pipeline #43700 completed with status: 'SUCCESS'

CI Report

Link to invocation

brb-nv

Changes to model files under tensorrt_llm/_torch/models/ look good to me.

chienchunhung · 2026-06-23T05:06:59Z

/bot run

tensorrt-cicd · 2026-06-23T05:13:04Z

PR_Github #55160 [ run ] triggered by Bot. Commit: 7eec9fe Link to invocation

tensorrt-cicd · 2026-06-23T06:29:04Z

PR_Github #55160 [ run ] completed with state SUCCESS. Commit: 7eec9fe
/LLM/main/L0_MergeRequest_PR pipeline #44135 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-23T16:53:44Z

/bot run

tensorrt-cicd · 2026-06-23T17:00:24Z

PR_Github #55281 [ run ] triggered by Bot. Commit: a260f3b Link to invocation


chienchunhung · 2026-06-23T17:04:10Z

/bot run

tensorrt-cicd · 2026-06-23T17:10:30Z

PR_Github #55287 [ run ] triggered by Bot. Commit: a21d821 Link to invocation

tensorrt-cicd · 2026-06-23T17:14:01Z

PR_Github #55281 [ run ] completed with state ABORTED. Commit: a260f3b

Link to invocation

chienchunhung · 2026-06-23T17:41:12Z

/bot run

tensorrt-cicd · 2026-06-23T17:47:35Z

PR_Github #55303 [ run ] triggered by Bot. Commit: a21d821 Link to invocation

tensorrt-cicd · 2026-06-23T17:51:56Z

PR_Github #55287 [ run ] completed with state ABORTED. Commit: a21d821

Link to invocation

mikeiovine

Signing off on PyExecutor related changes

tensorrt-cicd · 2026-06-23T23:06:42Z

PR_Github #55303 [ run ] completed with state FAILURE. Commit: a21d821
/LLM/main/L0_MergeRequest_PR pipeline #44253 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-23T23:13:27Z

/bot run --disable-fail-fast --stage-list "DGX_B300-4_GPUs-PyTorch-1, GB200-4_GPUs-PyTorch-5"

tensorrt-cicd · 2026-06-23T23:20:01Z

PR_Github #55345 [ run ] triggered by Bot. Commit: a21d821 Link to invocation

github-actions Bot assigned chienchunhung Jun 5, 2026

chienchunhung changed the title ~~[TRTLLM-13077][feat] Wave 1: migrate aliases to setup_aliases and stage GMS RO load~~ [TRTLLM-13246][feat] Wave 1: migrate aliases to setup_aliases and stage GMS RO load Jun 5, 2026

chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from 6456b20 to ac30c0a Compare June 5, 2026 20:31

chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch 2 times, most recently from 690c0c8 to ac30c0a Compare June 5, 2026 22:16

chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from ac30c0a to eabb7c0 Compare June 8, 2026 23:53

chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from eabb7c0 to c42781c Compare June 11, 2026 21:22

chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from c42781c to 4352612 Compare June 12, 2026 00:28

chienchunhung marked this pull request as ready for review June 12, 2026 00:32

chienchunhung requested review from a team as code owners June 12, 2026 00:32

chienchunhung requested a review from dongjiyingdjy June 12, 2026 00:32

chienchunhung enabled auto-merge (squash) June 16, 2026 16:58

brb-nv approved these changes Jun 22, 2026


chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from 85973b0 to 7eec9fe Compare June 23, 2026 05:06

chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from 7eec9fe to a260f3b Compare June 23, 2026 16:53

[TRTLLM-13246][feat] Wave 1: migrate aliases to setup_aliases and sta…

a21d821

…ge GMS RO load Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

chienchunhung requested a review from litaotju June 23, 2026 17:03

chienchunhung force-pushed the feat/staged-hooks-wave1-gms-ro branch from a260f3b to a21d821 Compare June 23, 2026 17:03

chienchunhung requested a review from QiJune June 23, 2026 17:59

mikeiovine approved these changes Jun 23, 2026

