[TRTLLM-13246][feat] Wave 1: migrate aliases to setup_aliases and stage GMS RO load #15014
chienchunhung wants to merge 1 commit into
Conversation
6456b20 to ac30c0a
/bot run --disable-fail-fast
PR_Github #52438 [ run ] triggered by Bot. Commit:
690c0c8 to ac30c0a
/bot run --disable-fail-fast
PR_Github #52445 [ run ] triggered by Bot. Commit:
PR_Github #52438 [ run ] completed with state
PR_Github #52445 [ run ] completed with state
ac30c0a to eabb7c0
/bot run --disable-fail-fast
PR_Github #52887 [ run ] triggered by Bot. Commit:
PR_Github #52887 [ run ] completed with state
eabb7c0 to c42781c
/bot run
/bot run --disable-fail-fast
PR_Github #53683 [ run ] triggered by Bot. Commit:
PR_Github #53687 [ run ] triggered by Bot. Commit:
PR_Github #53683 [ run ] completed with state
c42781c to 4352612
PR_Github #54480 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "A100X-PyTorch-1, DGX_B200-8_GPUs-PyTorch-1"
PR_Github #54636 [ run ] triggered by Bot. Commit:
PR_Github #54636 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #54668 [ run ] triggered by Bot. Commit:
PR_Github #54668 [ run ] completed with state
brb-nv left a comment
Changes to model files under tensorrt_llm/_torch/models/ look good to me.
85973b0 to 7eec9fe
/bot run
PR_Github #55160 [ run ] triggered by Bot. Commit:
PR_Github #55160 [ run ] completed with state
7eec9fe to a260f3b
/bot run
PR_Github #55281 [ run ] triggered by Bot. Commit:
…ge GMS RO load Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
a260f3b to a21d821
/bot run
PR_Github #55287 [ run ] triggered by Bot. Commit:
PR_Github #55281 [ run ] completed with state
/bot run
PR_Github #55303 [ run ] triggered by Bot. Commit:
PR_Github #55287 [ run ] completed with state
mikeiovine left a comment
Signing off on PyExecutor related changes
PR_Github #55303 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "DGX_B300-4_GPUs-PyTorch-1, GB200-4_GPUs-PyTorch-5"
PR_Github #55345 [ run ] triggered by Bot. Commit:
Summary
Wave 1 of the staged post-load hooks rollout. The staged-hook contract landed in #14770, and #14878 has now merged, so this PR is a single Wave 1 commit on top of main.

This change migrates alias-only model hooks from post_load_weights() to setup_aliases() and cuts the GMS read-only (RO) load path over from the old meta-tensor workaround to the staged-hook protocol.

What Changed
- modeling_llama (LlamaForCausalLM, Llama4ForConditionalGeneration), modeling_deepseekv3, modeling_glm, modeling_exaone_moe, modeling_qwen3_moe, modeling_qwen3_next, and modeling_gpt_oss. Their alias-only post_load_weights() bodies move verbatim into setup_aliases(), while standard load paths continue through the base post_load_weights() orchestrator.
- …_weights_transformed flags before rebinding fresh weights, while partial reload keeps existing transform guards intact for untouched modules.
- …test_model_loader_gms.py and test_model_loader_mx.py.

Dependency / prerequisite stack
This PR is Wave 1 in the staged post-load hooks rollout. The foundation PRs #14770 and #14878 are already merged. The wave PRs should merge in sequence; after each upstream wave lands, rebase the next wave onto main so review and CI focus on that wave's delta.

Arrows point from prerequisite to dependent. PR numbers in graph nodes are clickable.
graph TD
    PR14770["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14770'>#14770</a>: staged-hook contract (merged)"]
    PR14878["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14878'>#14878</a>: GMS SourceIdentity gate (merged)"]
    PR15014["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15014'>#15014</a>: Wave 1 aliases + GMS RO load (this PR, open)"]
    PR15288["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15288'>#15288</a>: Wave 2 Linear/Attention transforms (draft)"]
    PR15386["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15386'>#15386</a>: Wave 3 MoE/Mamba staged hooks (draft)"]
    PR15387["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15387'>#15387</a>: Wave 4 MX receiver cutover (draft)"]
    PR15432["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15432'>#15432</a>: Wave 5 MX publisher + Llama receiver (draft)"]
    VERIFY["post-migration verification / demo (planned)"]
    PR14770 -->|satisfied| PR15014
    PR14878 -->|satisfied| PR15014
    PR15014 -->|blocking| PR15288
    PR15288 -->|blocking| PR15386
    PR15386 -->|blocking| PR15387
    PR15387 -->|blocking| PR15432
    PR15432 -.->|planned| VERIFY
    classDef merged fill:#dcfce7,stroke:#16a34a,color:#14532d;
    classDef inflight fill:#dbeafe,stroke:#2563eb,color:#1e3a8a;
    classDef draft fill:#ffedd5,stroke:#f97316,color:#7c2d12;
    classDef current fill:#ede9fe,stroke:#7c3aed,color:#3b0764,stroke-width:3px;
    classDef downstream fill:#f3f4f6,stroke:#6b7280,color:#374151,stroke-dasharray:5 5;
    linkStyle 0,1 stroke:#16a34a,stroke-width:2px;
    linkStyle 2,3,4,5 stroke:#ea580c,stroke-width:3px;
    linkStyle 6 stroke:#6b7280,stroke-width:2px,stroke-dasharray:5 5;
    class PR14770,PR14878 merged;
    class PR15288,PR15386,PR15387,PR15432 draft;
    class PR15014 current;
    class VERIFY downstream;

Immediate merge dependency for this PR: none beyond the already-merged #14770 and #14878 foundation PRs; downstream waves remain stacked on this branch until Wave 1 lands.
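The merge sequence implied by the graph can be checked mechanically. The following is a minimal sketch that assumes only the prerequisite edges between PRs listed in the graph; the variable names are illustrative and not part of this PR. It uses the standard-library graphlib module (Python 3.9+) to order the PRs so that each one comes after its prerequisites.

```python
from graphlib import TopologicalSorter

# Map each PR to the set of PRs it depends on (edges copied from the graph).
prerequisites = {
    15014: {14770, 14878},
    15288: {15014},
    15386: {15288},
    15387: {15386},
    15432: {15387},
}

# static_order() yields every node with its prerequisites first.
merge_order = list(TopologicalSorter(prerequisites).static_order())

# The two foundation PRs are already merged, so drop them from the plan.
already_merged = {14770, 14878}
remaining = [pr for pr in merge_order if pr not in already_merged]

print(remaining)  # [15014, 15288, 15386, 15387, 15432]
```

Because the waves form a single chain, the remaining order is unique: Wave 1 (#15014) first, then each later wave rebased onto main in turn.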
Test Plan
- pytest tests/unittest/_torch/pyexecutor/test_model_loader_gms.py tests/unittest/_torch/pyexecutor/test_model_loader_mx.py
- …/bot run --disable-fail-fast before review.
- …git diff --check, py_compile on changed Python files, and pre-commit. Local pytest was not available in this macOS shell.

Next Steps
- …transform_weights() with _weights_transformed guards.
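To illustrate the guard behavior mentioned under What Changed and Next Steps, here is a toy sketch of a staged post-load flow. Everything in it is hypothetical: the StagedModule class, the reload() helper, the doubling "transform", and the hook ordering are assumptions made for illustration, not the TensorRT-LLM implementation. Only the hook names setup_aliases(), transform_weights(), post_load_weights() and the _weights_transformed flag come from this PR description. The sketch also assumes that reloading a module resets its flag before fresh weights are rebound.

```python
class StagedModule:
    """Hypothetical stand-in for a model module (not the real base class)."""

    def __init__(self, weight):
        self.weight = list(weight)
        self.alias = None
        self._weights_transformed = False

    def transform_weights(self):
        # Guard: skip if this module's weights were already transformed.
        if self._weights_transformed:
            return
        self.weight = [2 * w for w in self.weight]  # placeholder transform
        self._weights_transformed = True

    def setup_aliases(self):
        # Alias-only work: rebind a reference, never touch weight data.
        self.alias = self.weight

    def post_load_weights(self):
        # Orchestrator; the transform-then-alias order is an assumption.
        self.transform_weights()
        self.setup_aliases()


def reload(modules, fresh, names=None):
    """Rebind fresh weights: all modules if names is None, else a subset."""
    targets = list(modules) if names is None else names
    for name in targets:
        module = modules[name]
        module._weights_transformed = False  # clear guard before rebinding
        module.weight = list(fresh[name])
    for module in modules.values():
        module.post_load_weights()  # untouched modules are skipped by guard


modules = {"a": StagedModule([1, 2]), "b": StagedModule([3])}
for m in modules.values():
    m.post_load_weights()

# Partial reload: only "a" gets fresh weights; "b" keeps its guard.
reload(modules, {"a": [10, 20]}, names=["a"])
print(modules["a"].weight, modules["b"].weight)  # [20, 40] [6]
```

In this sketch the guard is what makes the orchestrator safe to call repeatedly: module "b" is visited again during the partial reload, but its weights are not transformed a second time, while its alias is simply rebound.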