[TRTLLM-13250][feat] Wave 5: Enable MX post-transform Llama receiver #15432
chienchunhung wants to merge 7 commits into
Conversation
/bot run --disable-fail-fast
PR_Github #54683 [ run ] triggered by Bot. Commit:
PR_Github #54683 [ run ] completed with state
ae210cb to f123c77
/bot run --disable-fail-fast
PR_Github #54888 [ run ] triggered by Bot. Commit:
PR_Github #54888 [ run ] completed with state
f123c77 to 14a4537
/bot run --disable-fail-fast
PR_Github #54955 [ run ] triggered by Bot. Commit:
PR_Github #54955 [ run ] completed with state
63ea3d6 to 03e707f
/bot run
03e707f to b3705a0
/bot run
PR_Github #56288 [ run ] triggered by Bot. Commit:
PR_Github #56290 [ run ] triggered by Bot. Commit:
PR_Github #56288 [ run ] completed with state
PR_Github #56290 [ run ] completed with state
b3705a0 to 40ed0b1
/bot run --disable-fail-fast
PR_Github #56673 [ run ] triggered by Bot. Commit:
PR_Github #56673 [ run ] completed with state
/bot run
PR_Github #56746 [ run ] triggered by Bot. Commit:
PR_Github #56746 [ run ] completed with state
Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
edffee0 to c2f35a8
/bot run --disable-fail-fast
PR_Github #57216 [ run ] triggered by Bot. Commit:
PR_Github #57216 [ run ] completed with state
Summary
Stacked on Wave 4 / #15387.
This implements Wave 5 of the staged post-load hook rollout for MX:
setup_aliases() + cache_derived_state()
Dependency / prerequisite stack
This PR is Wave 5 in the staged post-load hooks rollout. The foundation PRs #14770 and #14878 are already merged. The wave PRs should merge in sequence; after each upstream wave lands, rebase the next wave onto main so review and CI focus on that wave's delta.
Arrows point from prerequisite to dependent. PR numbers in graph nodes are clickable.
graph TD
    PR14770["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14770'>#14770</a>: staged-hook contract (merged)"]
    PR14878["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14878'>#14878</a>: GMS SourceIdentity gate (merged)"]
    PR15014["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15014'>#15014</a>: Wave 1 aliases + GMS RO load (merged)"]
    PR15288["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15288'>#15288</a>: Wave 2 Linear/Attention transforms (merged)"]
    PR15386["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15386'>#15386</a>: Wave 3 MoE/Mamba staged hooks (open)"]
    PR15387["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15387'>#15387</a>: Wave 4 MX receiver cutover (open)"]
    PR15432["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15432'>#15432</a>: Wave 5 MX publisher + Llama receiver (this PR, open)"]
    VERIFY["post-migration verification / demo (planned)"]
    PR14770 -->|satisfied| PR15014
    PR14878 -->|satisfied| PR15014
    PR15014 -->|blocking| PR15288
    PR15288 -->|blocking| PR15386
    PR15386 -->|blocking| PR15387
    PR15387 -->|blocking| PR15432
    PR15432 -.->|planned| VERIFY
    classDef merged fill:#dcfce7,stroke:#16a34a,color:#14532d;
    classDef open fill:#dbeafe,stroke:#2563eb,color:#1e3a8a;
    classDef current fill:#ede9fe,stroke:#7c3aed,color:#3b0764,stroke-width:3px;
    classDef downstream fill:#f3f4f6,stroke:#6b7280,color:#374151,stroke-dasharray:5 5;
    linkStyle 0,1 stroke:#16a34a,stroke-width:2px;
    linkStyle 2,3,4,5 stroke:#ea580c,stroke-width:3px;
    linkStyle 6 stroke:#6b7280,stroke-width:2px,stroke-dasharray:5 5;
    class PR14770,PR14878,PR15014,PR15288 merged;
    class PR15386,PR15387 open;
    class PR15432 current;
    class VERIFY downstream;
Immediate merge dependency for this PR: #15387 must land first; after Wave 5 lands, run the post-migration verification/demo for the completed staged-hook rollout.
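The merge sequence implied by the graph can be derived mechanically. The sketch below is illustrative only and is not part of this PR: it copies the prerequisite edges from the graph into a plain dictionary (the names prerequisites and merge_order are hypothetical) and uses the standard-library graphlib module to compute an order in which every PR comes after its prerequisites.

```python
from graphlib import TopologicalSorter

# Each PR maps to the set of PRs that must land before it.
# Edges are copied from the dependency graph in this PR description.
prerequisites = {
    "#15014": {"#14770", "#14878"},
    "#15288": {"#15014"},
    "#15386": {"#15288"},
    "#15387": {"#15386"},
    "#15432": {"#15387"},
}

# static_order() yields every node after all of its prerequisites.
# The two foundation PRs have no mutual dependency, so their relative
# order is not fixed; everything from Wave 1 onward is a strict chain.
merge_order = list(TopologicalSorter(prerequisites).static_order())

for pr in merge_order:
    print(pr)
```

Because the waves form a single chain after the two foundation PRs, the only freedom in the result is the relative position of #14770 and #14878.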
Validation
git diff --check
python -m py_compile tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py tensorrt_llm/_torch/pyexecutor/model_loader.py tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py tests/unittest/_torch/pyexecutor/test_model_loader_gms.py tests/unittest/_torch/pyexecutor/test_model_loader_mx.py tests/unittest/_torch/weight_sharing/test_mx_source_identity_gate.py
waive list check and validate-test-lists skipped locally because scripts/check_test_list.py fails under this hook interpreter with TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
Focused pytest collection is blocked in this local environment by missing transformers before tests are collected.
Summary by CodeRabbit
New Features
Improvements
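A note on the TypeError quoted under Validation: that message is what CPython raises when a PEP 604 union such as int | None is evaluated at runtime on an interpreter older than Python 3.10. Whether this is the actual cause in scripts/check_test_list.py is an assumption, since the PR does not state the hook interpreter's version. The minimal sketch below (the helper name supports_pep604_unions is hypothetical) shows how to probe for the behavior and the spelling that works on older interpreters.

```python
import sys
from typing import Optional, Union


def supports_pep604_unions() -> bool:
    """Return True if `type | None` can be evaluated at runtime here."""
    try:
        _ = int | None  # raises TypeError on Python < 3.10
    except TypeError:
        return False
    return True


# Portable spelling that evaluates on older interpreters as well.
PortableOptionalInt = Optional[int]

print(sys.version_info[:2], supports_pep604_unions())
```

If the hook interpreter does turn out to predate 3.10, the usual options are to run the hook under a newer interpreter or to spell the annotation with Optional[...]; adding from __future__ import annotations only helps where the union appears in an annotation that is never evaluated at runtime.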