[TRTLLM-13247][feat] Wave 2: stage Linear and Attention transforms by chienchunhung · Pull Request #15288 · NVIDIA/TensorRT-LLM

chienchunhung · 2026-06-12T01:08:55Z

Summary

Wave 2 of the staged post-load hooks rollout, stacked on #15014.

This migrates the remaining Linear and MLA tensor-layout post-load work into transform_weights() with _weights_transformed guards, while keeping post_load_weights() as the backward-compatible shim for existing full post-load walks.

What Changed

Added Linear.transform_weights() and a quant-method-level transform_weights() hook, with post_load_weights() delegating through the staged hook.
Moved FP8 block-scale resmoothing, NVFP4 padding, and W4A16 NVFP4 scale unswizzling from Linear post_load_weights() implementations into transform_weights().
Added _weights_transformed state for Linear and MLA, reset when fresh Linear weights or auxiliary MLA weight tensors are created/loaded.
Moved MLA SM120 FP8 resmoothing into MLA.transform_weights() and kept MLA.post_load_weights() as a shim.
Clarified the GMS RO documentation: RO readers run setup_aliases(), materialize_module(), then cache_derived_state(); writer-only tensor layout changes belong in transform_weights().
Updated/added pyexecutor unit coverage for transform idempotency and the GMS RW source_identity call shape.
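The staged-hook split described above can be sketched as follows. This is a simplified sketch, not the PR's code. It assumes plain lists in place of torch tensors, and it uses a hypothetical `PaddedLinearMethod` as a stand-in for a real quant method such as NVFP4. It shows three things: the base quant method's no-op `transform_weights` hook, the `_weights_transformed` guard on `Linear`, and `post_load_weights()` acting as a shim that delegates to the staged hook.

```python
# Hypothetical, simplified stand-ins for the classes named in the PR.
# Real TensorRT-LLM modules operate on torch tensors; plain lists keep this runnable.


class LinearMethodBase:
    """Base quant method; transform_weights is an optional hook."""

    def transform_weights(self, module: "Linear") -> None:
        """Default: this quant method needs no tensor-layout change."""

    def post_load_weights(self, module: "Linear") -> None:
        # Backward-compatible shim: the legacy hook routes through the staged one.
        self.transform_weights(module)


class PaddedLinearMethod(LinearMethodBase):
    """Hypothetical quant method whose layout transform pads each row."""

    def transform_weights(self, module: "Linear") -> None:
        # Stand-in for a real transform such as NVFP4 padding.
        module.weight = [row + [0] for row in module.weight]


class Linear:
    def __init__(self, quant_method: LinearMethodBase) -> None:
        self.quant_method = quant_method
        self.weight: list = []
        self._weights_transformed = False

    def load_weights(self, weight: list) -> None:
        # Fresh weights invalidate any earlier transform.
        self.weight = [list(row) for row in weight]
        self._weights_transformed = False

    def transform_weights(self) -> None:
        if self._weights_transformed:
            return  # Guard: repeated calls must not re-apply the layout change.
        self.quant_method.transform_weights(self)
        self._weights_transformed = True

    def post_load_weights(self) -> None:
        # Shim for existing full post-load walks.
        self.transform_weights()


layer = Linear(PaddedLinearMethod())
layer.load_weights([[1, 2], [3, 4]])
layer.post_load_weights()  # legacy walk applies the transform once
layer.transform_weights()  # staged call is now a no-op
print(layer.weight)  # [[1, 2, 0], [3, 4, 0]]
```

Resetting the flag in `load_weights` is what allows a reload to transform fresh weights again, while repeated walks over the same weights stay idempotent.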

Dependency / prerequisite stack

This PR is Wave 2 in the staged post-load hooks rollout. The foundation PRs #14770 and #14878 are already merged. The wave PRs should merge in sequence; after each upstream wave lands, rebase the next wave onto main so review and CI focus on that wave's delta.

Arrows point from prerequisite to dependent. PR numbers in graph nodes are clickable.

graph TD
    PR14770["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14770'>#14770</a>: staged-hook contract (merged)"]
    PR14878["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/14878'>#14878</a>: GMS SourceIdentity gate (merged)"]
    PR15014["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15014'>#15014</a>: Wave 1 aliases + GMS RO load (open)"]
    PR15288["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15288'>#15288</a>: Wave 2 Linear/Attention transforms (this PR, draft)"]
    PR15386["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15386'>#15386</a>: Wave 3 MoE/Mamba staged hooks (draft)"]
    PR15387["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15387'>#15387</a>: Wave 4 MX receiver cutover (draft)"]
    PR15432["<a href='https://github.com/NVIDIA/TensorRT-LLM/pull/15432'>#15432</a>: Wave 5 MX publisher + Llama receiver (draft)"]
    VERIFY["post-migration verification / demo (planned)"]

    PR14770 -->|satisfied| PR15014
    PR14878 -->|satisfied| PR15014
    PR15014 -->|blocking| PR15288
    PR15288 -->|blocking| PR15386
    PR15386 -->|blocking| PR15387
    PR15387 -->|blocking| PR15432
    PR15432 -.->|planned| VERIFY

    classDef merged fill:#dcfce7,stroke:#16a34a,color:#14532d;
    classDef inflight fill:#dbeafe,stroke:#2563eb,color:#1e3a8a;
    classDef draft fill:#ffedd5,stroke:#f97316,color:#7c2d12;
    classDef current fill:#ede9fe,stroke:#7c3aed,color:#3b0764,stroke-width:3px;
    classDef downstream fill:#f3f4f6,stroke:#6b7280,color:#374151,stroke-dasharray:5 5;
    linkStyle 0,1 stroke:#16a34a,stroke-width:2px;
    linkStyle 2,3,4,5 stroke:#ea580c,stroke-width:3px;
    linkStyle 6 stroke:#6b7280,stroke-width:2px,stroke-dasharray:5 5;

    class PR14770,PR14878 merged;
    class PR15014 inflight;
    class PR15386,PR15387,PR15432 draft;
    class PR15288 current;
    class VERIFY downstream;

Immediate merge dependency for this PR: #15014 must land first; after it lands, rebase this branch onto main so the PR diff collapses to the Wave 2 delta.
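The remaining merge sequence follows mechanically from the edges in the dependency graph. The sketch below copies those edges and uses Python's standard `graphlib.TopologicalSorter` to order them. The planned verification node is omitted because it is not a PR, and `merge_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Edges copied from the dependency graph: (prerequisite, dependent).
EDGES = [
    ("#14770", "#15014"),
    ("#14878", "#15014"),
    ("#15014", "#15288"),
    ("#15288", "#15386"),
    ("#15386", "#15387"),
    ("#15387", "#15432"),
]
MERGED = {"#14770", "#14878"}


def merge_order(edges, merged):
    """Return the not-yet-merged PRs in an order that respects every edge."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {}
    for prereq, dependent in edges:
        graph.setdefault(dependent, set()).add(prereq)
        graph.setdefault(prereq, set())
    order = TopologicalSorter(graph).static_order()
    return [pr for pr in order if pr not in merged]


print(merge_order(EDGES, MERGED))
# ['#15014', '#15288', '#15386', '#15387', '#15432']
```

Because the open waves form a single chain, only one order is valid. This is why each wave is rebased onto main after its predecessor lands.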

Test Plan

PYTHONPYCACHEPREFIX=/tmp/trtllm-wave2-pycache python3 -m py_compile tensorrt_llm/_torch/modules/linear.py tensorrt_llm/_torch/modules/attention.py tensorrt_llm/_torch/memory/gpu_memory_backend.py tests/unittest/_torch/pyexecutor/test_model_loader_mx.py tests/unittest/_torch/pyexecutor/test_model_loader_gms.py
git diff --check
PATH=/Users/chienchunh/.cache/codex-runtimes/codex-primary-runtime/dependencies/python/bin:$PATH pre-commit run --files tensorrt_llm/_torch/memory/gpu_memory_backend.py tensorrt_llm/_torch/modules/attention.py tensorrt_llm/_torch/modules/linear.py tests/unittest/_torch/pyexecutor/test_model_loader_gms.py tests/unittest/_torch/pyexecutor/test_model_loader_mx.py
The focused pytest run was attempted but was blocked in this macOS shell because the transformers package is not installed: PYTHONPATH=. PYTHONPYCACHEPREFIX=/tmp/trtllm-wave2-pycache pytest tests/unittest/_torch/pyexecutor/test_model_loader_mx.py tests/unittest/_torch/pyexecutor/test_model_loader_gms.py

Next Steps

Wave 3: migrate MoE and Mamba post-load transforms/cache state.
After [TRTLLM-13246][feat] Wave 1: migrate aliases to setup_aliases and stage GMS RO load #15014 lands, rebase this branch onto main so the PR diff collapses to the Wave 2 commit only.

Summary by CodeRabbit

Release Notes

Bug Fixes
- Fixed weight initialization sequencing in read-only weight sharing models to ensure structural aliases are properly configured before tensor materialization.
Refactor
- Restructured weight loading and transformation pipeline with enhanced state tracking and idempotency checks to prevent redundant transformations.
- Updated and consolidated weight initialization hooks across multiple model implementations for improved lifecycle management and consistency.

chienchunhung · 2026-06-12T01:11:46Z

Superseded stack note: this branch was rebased again after the initial draft PR setup. The current stack is now documented in #15288 (comment).

chienchunhung · 2026-06-12T01:16:21Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-12T01:23:10Z

PR_Github #53738 [ run ] triggered by Bot. Commit: d309240 Link to invocation

tensorrt-cicd · 2026-06-12T08:04:01Z

PR_Github #53738 [ run ] completed with state SUCCESS. Commit: d309240
/LLM/main/L0_MergeRequest_PR pipeline #42864 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-12T18:02:11Z

/bot run --disable-fail-fast

chienchunhung · 2026-06-12T18:03:46Z

CI investigation update: the failed build 42864 ran on old head d309240d0b and old base 2dd5c67358. The dominant Ray failures were caused by the upstream/main regression from #14970: BaseWorker.reset_prefix_cache() conflicted with rlhf_utils.WorkerExtension.reset_prefix_cache(), producing ValueError: Worker class RayGPUWorker already defines 'reset_prefix_cache', which conflicts with extension WorkerExtension.

upstream/main has since reverted that regression in #15306 (db7161b675), so this branch has been rebased and force-pushed onto latest main:

Base: b03b78f300
Dependency commit replayed from [TRTLLM-13246][feat] Wave 1: migrate aliases to setup_aliases and stage GMS RO load #15014: d21c68baf8
Current [TRTLLM-13247][feat] Wave 2: stage Linear and Attention transforms #15288 head commit: 67267df3e0

A fresh /bot run --disable-fail-fast was requested in #15288 (comment).
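The Ray failure mode in the investigation note can be illustrated with a minimal sketch. It assumes that the worker/extension check rejects any extension method name that the worker class already exposes, including names inherited from a base class. `check_extension_conflicts` is a hypothetical function, and the real check in the TensorRT-LLM Ray worker setup is not reproduced here.

```python
# Hypothetical re-creation of the conflict check; the real check lives in the
# TensorRT-LLM Ray worker setup and is not reproduced here.


def check_extension_conflicts(worker_cls: type, extension_cls: type) -> None:
    """Reject an extension whose public methods shadow ones the worker already has."""
    for name, attr in vars(extension_cls).items():
        if name.startswith("__") or not callable(attr):
            continue
        if hasattr(worker_cls, name):  # hasattr also sees inherited methods
            raise ValueError(
                f"Worker class {worker_cls.__name__} already defines '{name}', "
                f"which conflicts with extension {extension_cls.__name__}"
            )


class WorkerExtension:
    def reset_prefix_cache(self) -> None: ...


class BaseWorker:  # after the regression: the base class gained the same name
    def reset_prefix_cache(self) -> None: ...


class RayGPUWorker(BaseWorker):
    pass


try:
    check_extension_conflicts(RayGPUWorker, WorkerExtension)
except ValueError as err:
    print(err)
```

Removing `reset_prefix_cache` from the base class (what the revert in #15306 did) makes the same check pass without any change to the extension.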

tensorrt-cicd · 2026-06-12T18:07:47Z

PR_Github #53933 [ run ] triggered by Bot. Commit: 67267df Link to invocation

tensorrt-cicd · 2026-06-13T01:23:23Z

PR_Github #53933 [ run ] completed with state FAILURE. Commit: 67267df
/LLM/main/L0_MergeRequest_PR pipeline #43026 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-13T02:30:12Z

/bot run --disable-fail-fast --stage-list "DGX_B200-4_GPUs-PyTorch-Ray-1, DGX_B200-8_GPUs-PyTorch-1"

tensorrt-cicd · 2026-06-13T02:35:53Z

PR_Github #53990 [ run ] triggered by Bot. Commit: 67267df Link to invocation

tensorrt-cicd · 2026-06-13T03:30:49Z

PR_Github #53990 [ run ] completed with state SUCCESS. Commit: 67267df
/LLM/main/L0_MergeRequest_PR pipeline #43076 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

chienchunhung · 2026-06-13T06:11:34Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-13T06:17:17Z

PR_Github #54020 [ run ] triggered by Bot. Commit: 67267df Link to invocation

tensorrt-cicd · 2026-06-13T06:40:52Z

PR_Github #54020 [ run ] completed with state FAILURE. Commit: 67267df
/LLM/main/L0_MergeRequest_PR pipeline #43104 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-14T17:20:02Z

/bot run --disable-fail-fast --stage-list "SBSA-Linux"

tensorrt-cicd · 2026-06-14T17:25:39Z

PR_Github #54145 [ run ] triggered by Bot. Commit: 67267df Link to invocation

tensorrt-cicd · 2026-06-14T17:46:51Z

PR_Github #54145 [ run ] completed with state FAILURE. Commit: 67267df
/LLM/main/L0_MergeRequest_PR pipeline #43228 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-15T17:15:50Z

/bot run

tensorrt-cicd · 2026-06-15T17:23:32Z

PR_Github #54338 [ run ] triggered by Bot. Commit: bfebf3a Link to invocation

tensorrt-cicd · 2026-06-15T20:56:21Z

PR_Github #54338 [ run ] completed with state SUCCESS. Commit: bfebf3a
/LLM/main/L0_MergeRequest_PR pipeline #43409 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-15T22:16:24Z

/bot run --disable-fail-fast --stage-list "DGX_B200-PyTorch-1"

tensorrt-cicd · 2026-06-15T22:25:53Z

PR_Github #54369 [ run ] triggered by Bot. Commit: bfebf3a Link to invocation

tensorrt-cicd · 2026-06-15T23:10:39Z

PR_Github #54369 [ run ] completed with state SUCCESS. Commit: bfebf3a
/LLM/main/L0_MergeRequest_PR pipeline #43439 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

chienchunhung · 2026-06-15T23:21:09Z

/bot run --disable-fail-fast

coderabbitai · 2026-06-22T18:12:08Z

📝 Walkthrough

Walkthrough

The PR splits the post_load_weights lifecycle hook into two distinct hooks: setup_aliases (structural tensor aliasing, must run before GMS materialization) and transform_weights (idempotent weight transformations, guarded by _weights_transformed). Model classes across seven modeling files rename their hook; Linear and MLA gain idempotency flags; the GMS RO load pipeline is re-ordered accordingly.

Changes

Lifecycle Hook Refactor

Layer / File(s)	Summary
`transform_weights` idempotency in `LinearMethodBase`, `Linear`, `MLA` `tensorrt_llm/_torch/modules/linear.py`, `tensorrt_llm/_torch/modules/attention.py`	`LinearMethodBase` adds `transform_weights` (no-op) and routes `post_load_weights` through it. Quant subclasses (`FP8BlockScalesLinearMethod`, `NVFP4LinearMethod`, `W4A16NVFP4LinearMethod`) override `transform_weights` instead of `post_load_weights`. `Linear` adds `_weights_transformed` flag reset on init/create/load and implements `transform_weights` with idempotency guard. `MLA` similarly extracts resmooth logic into `transform_weights` with a `_weights_transformed` guard.
`post_load_weights` → `setup_aliases` rename across model classes `tensorrt_llm/_torch/models/modeling_deepseekv3.py`, `tensorrt_llm/_torch/models/modeling_exaone_moe.py`, `tensorrt_llm/_torch/models/modeling_glm.py`, `tensorrt_llm/_torch/models/modeling_gpt_oss.py`, `tensorrt_llm/_torch/models/modeling_llama.py`, `tensorrt_llm/_torch/models/modeling_qwen3_moe.py`, `tensorrt_llm/_torch/models/modeling_qwen3_next.py`	Eight model classes rename their layer-norm aliasing hook from `post_load_weights` to `setup_aliases` with no logic changes. Related constructor comments in Llama decoder layers and models are updated to reference `setup_aliases`.
GMS RO pipeline re-ordering and `_setup_aliases` recursive walk `tensorrt_llm/_torch/pyexecutor/model_loader.py`, `tensorrt_llm/_torch/memory/gpu_memory_backend.py`	On the GMS RO path, the per-module `post_load_weights` loop is replaced with: `_setup_aliases(model)` → `_check_gms_source_identity` gate → `materialize_module` → `_walk_cache_state`. `_setup_aliases` changes from a single root-model call to a recursive module walk skipping `_weights_removed` modules. `reload()` resets `_weights_transformed` flags before weight loading. Docs in `gpu_memory_backend.py` update the stated call-order contract.
Tests: GMS ordering, recursive walk, idempotency `tests/unittest/_torch/pyexecutor/test_model_loader_gms.py`, `tests/unittest/_torch/pyexecutor/test_model_loader_mx.py`	GMS tests add `setup_aliases`/`cache_derived_state` to `_TinyModel`, stub `SourceIdentity.from_model_config`, set RO `get_source_identity` to `None`, assert the new RO event sequence, and add a dedicated ordering test asserting `post_load_weights` is absent on RO. MX tests add imports, stub identity, assert `_weights_transformed` reset on reload, replace the top-level-only `setup_aliases` test with a recursive-walk test, and add idempotency tests for `Linear.transform_weights` and `MLA.transform_weights`.
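The recursive `_setup_aliases` walk and the flag reset on reload from the table above can be sketched as follows. This is a sketch under stated assumptions. `Module` is a hypothetical stand-in that mimics the pre-order traversal of `torch.nn.Module.modules()`, and `setup_aliases_walk` and `reset_transform_flags` are illustrative names. The sketch assumes that a module flagged `_weights_removed` skips only its own hook and that its children are still visited. The table does not say whether the real walk also skips the subtree.

```python
# Hypothetical stand-ins: Module mimics the pre-order traversal of
# torch.nn.Module.modules(); it is not the TensorRT-LLM implementation.


class Module:
    def __init__(self, name: str, submodules=()) -> None:
        self.name = name
        self.submodules = list(submodules)
        self._weights_removed = False
        self._weights_transformed = False

    def modules(self):
        yield self
        for sub in self.submodules:
            yield from sub.modules()


class AliasingLayer(Module):
    def setup_aliases(self) -> None:
        # Stand-in for structural aliasing (e.g. sharing a layer-norm tensor).
        self.aliased = True


def setup_aliases_walk(model: Module) -> list:
    """Call setup_aliases() on every module that has it, skipping removed ones."""
    visited = []
    for module in model.modules():
        if getattr(module, "_weights_removed", False):
            continue  # Assumption: only the flagged module's own hook is skipped.
        hook = getattr(module, "setup_aliases", None)
        if callable(hook):
            hook()
            visited.append(module.name)
    return visited


def reset_transform_flags(model: Module) -> None:
    """Before a reload, mark every module as needing its transforms again."""
    for module in model.modules():
        if hasattr(module, "_weights_transformed"):
            module._weights_transformed = False


model = Module("model", [AliasingLayer("layers.0"), AliasingLayer("layers.1")])
model.submodules[1]._weights_removed = True
print(setup_aliases_walk(model))  # ['layers.0']
```

Walking every module, rather than calling the hook once on the root model, lets nested layers own their aliasing logic.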

Sequence Diagram(s)

sequenceDiagram
  participant ModelLoader
  participant CheckpointLoader
  participant GMSBackend
  participant Model
  rect rgba(173, 216, 230, 0.5)
    Note over ModelLoader,GMSBackend: GMS RO Load Path
  end

  ModelLoader->>CheckpointLoader: post_load_apply(weights_preloaded=True)
  ModelLoader->>Model: _setup_aliases() — recursive walk, skip _weights_removed
  ModelLoader->>GMSBackend: _check_gms_source_identity() — SourceIdentity gate
  ModelLoader->>GMSBackend: materialize_module(model) — bind real tensors
  ModelLoader->>Model: _walk_cache_state() — refresh derived state
  ModelLoader->>CheckpointLoader: post_load_publish()
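The RO ordering in the sequence diagram can be sketched as a small driver that records events, similar in spirit to the ordering test the walkthrough describes. Everything here is hypothetical. `FakeModel`, `FakeBackend`, `gms_ro_load`, and `SourceIdentityMismatch` are illustrative names, and the identity gate is reduced to a string comparison. The sketch shows that aliases are set up before any tensor is bound, that the gate runs before materialization, and that `transform_weights()` never runs on the reader.

```python
# Hypothetical driver and fakes; names follow the PR text, bodies are illustrative.


class SourceIdentityMismatch(RuntimeError):
    """Raised when the reader's source identity differs from the published one."""


class FakeModel:
    def __init__(self, identity: str, events: list) -> None:
        self.identity = identity
        self.events = events

    def setup_aliases(self) -> None:
        self.events.append("setup_aliases")

    def cache_derived_state(self) -> None:
        self.events.append("cache_derived_state")

    def post_load_weights(self) -> None:
        self.events.append("post_load_weights")  # must never run on the RO path


class FakeBackend:
    def __init__(self, published_identity: str, events: list) -> None:
        self.published_identity = published_identity
        self.events = events

    def materialize_module(self, model: FakeModel) -> None:
        self.events.append("materialize_module")  # bind already-transformed tensors


def gms_ro_load(model: FakeModel, backend: FakeBackend) -> None:
    model.setup_aliases()  # 1. structural aliases before any tensor is bound
    model.events.append("check_source_identity")  # 2. gate before touching tensors
    if backend.published_identity != model.identity:
        raise SourceIdentityMismatch(
            f"{model.identity!r} != {backend.published_identity!r}")
    backend.materialize_module(model)  # 3. bind the writer's tensors
    model.cache_derived_state()  # 4. refresh derived (non-weight) state
    # transform_weights() is deliberately absent: it is writer-only.


events: list = []
gms_ro_load(FakeModel("ckpt-a", events), FakeBackend("ckpt-a", events))
print(events)
# ['setup_aliases', 'check_source_identity', 'materialize_module', 'cache_derived_state']
```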

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#14878: Introduces the SourceIdentity / _check_gms_source_identity gate in the GMS RO pipeline that this PR now positions after setup_aliases and before materialize_module.

Suggested labels

api-breaking

Suggested reviewers

chang-l
brb-nv
byshiue
galletas1712
pcastonguay
yechank-nvidia

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.39% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
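The docstring-coverage figure is a ratio of documented definitions to all definitions. CodeRabbit's exact counting rules are not stated here, so the sketch below rests on an assumption. It counts only function and method definitions, using the standard-library `ast` module, and ignores classes and modules. `docstring_coverage` is a hypothetical helper.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of function and method definitions with a docstring."""
    tree = ast.parse(source)
    defs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if not defs:
        return 100.0
    documented = sum(1 for node in defs if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(defs)


SAMPLE = '''
class LinearMethodBase:
    def transform_weights(self, module):
        """Optional hook; the default does nothing."""

    def post_load_weights(self, module):
        self.transform_weights(module)
'''
print(docstring_coverage(SAMPLE))  # 50.0
```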

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title '[TRTLLM-13247][feat] Wave 2: stage Linear and Attention transforms' clearly and concisely summarizes the main change: implementing Wave 2 of a staged rollout to refactor tensor-layout transformations for Linear and Attention modules.
Description check	✅ Passed	The PR description comprehensively covers what changed, why it changed, testing performed, and dependencies/prerequisites. It includes a detailed Summary section, What Changed section with specific technical details, clear Test Plan, and dependency graph showing Wave 2's place in the larger rollout.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/models/modeling_gpt_oss.py`:
- Line 634: The setup_aliases method is missing an explicit return type
annotation which violates the coding guidelines requiring all functions to be
annotated with their return types. Add -> None after the closing parenthesis of
the setup_aliases method signature to explicitly indicate that this method does
not return any value. This should be placed between the closing parenthesis and
the colon in the method definition.

In `@tensorrt_llm/_torch/models/modeling_qwen3_next.py`:
- Line 983: Add an explicit `-> None` return type annotation to the
`setup_aliases` method definition. Locate the method definition for
`setup_aliases` and modify it from `def setup_aliases(self):` to `def
setup_aliases(self) -> None:` to comply with the coding guideline that requires
all functions to have return type annotations.

In `@tensorrt_llm/_torch/modules/linear.py`:
- Around line 3145-3149: The `_weights_transformed` flag in the Linear class
becomes inaccurate when using GMS RO (Read-Only) materialization because
`materialize_module()` binds already transformed parameters but the flag remains
False, causing layout transforms to be incorrectly re-applied later. Add a RO
cache-state hook in the Linear module that sets `_weights_transformed = True`
when weights are materialized through the RO path, ensuring the flag truthfully
reflects the actual state of weight transformation. Apply the same state
handling logic to the MLA module to maintain consistency across both modules.
- Around line 383-384: The transform_weights method in LinearMethodBase class
currently violates Ruff rule B027 because it only contains a pass statement in a
concrete (non-abstract) method. Since this is an intentional optional hook that
should remain concrete rather than abstract, replace the pass statement with a
non-empty body such as ellipsis (...) or a docstring to satisfy the Ruff linter
while maintaining the optional hook functionality.
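The `_weights_transformed` finding for linear.py can be made concrete with a small sketch. It shows one possible shape of the RO cache-state hook CodeRabbit describes. This page does not show whether the follow-up commit ("Address CodeRabbit review comments") adopted this exact approach. Plain lists stand in for tensors, and doubling each value is a hypothetical stand-in for a real layout transform.

```python
# Hypothetical sketch of the suggestion; plain lists stand in for tensors.


class Linear:
    def __init__(self, weight: list) -> None:
        self.weight = list(weight)
        self._weights_transformed = False

    def transform_weights(self) -> None:
        if self._weights_transformed:
            return
        self.weight = [w * 2 for w in self.weight]  # stand-in layout transform
        self._weights_transformed = True

    def cache_derived_state(self) -> None:
        # RO path: materialize_module() bound tensors that the writer already
        # transformed, so record that instead of leaving the flag False.
        self._weights_transformed = True


# Writer: loads raw weights and transforms them once before publishing.
writer = Linear([1.0, 2.0])
writer.transform_weights()

# Reader: binds the published (already-transformed) tensors, then refreshes state.
reader = Linear(writer.weight)
reader.cache_derived_state()
reader.transform_weights()  # no-op: the layout change is not applied twice
print(reader.weight)  # [2.0, 4.0]
```

Without the hook, a reader whose flag is still `False` would apply the transform a second time on top of the writer's output.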


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 402ced13-7870-45cc-83ca-6fa005ee6211

📥 Commits

Reviewing files that changed from the base of the PR and between 09449d4 and cf883fc.

📒 Files selected for processing (13)

tensorrt_llm/_torch/memory/gpu_memory_backend.py
tensorrt_llm/_torch/models/modeling_deepseekv3.py
tensorrt_llm/_torch/models/modeling_exaone_moe.py
tensorrt_llm/_torch/models/modeling_glm.py
tensorrt_llm/_torch/models/modeling_gpt_oss.py
tensorrt_llm/_torch/models/modeling_llama.py
tensorrt_llm/_torch/models/modeling_qwen3_moe.py
tensorrt_llm/_torch/models/modeling_qwen3_next.py
tensorrt_llm/_torch/modules/attention.py
tensorrt_llm/_torch/modules/linear.py
tensorrt_llm/_torch/pyexecutor/model_loader.py
tests/unittest/_torch/pyexecutor/test_model_loader_gms.py
tests/unittest/_torch/pyexecutor/test_model_loader_mx.py

chienchunhung · 2026-06-23T05:11:31Z

/bot run

tensorrt-cicd · 2026-06-23T05:17:22Z

PR_Github #55161 [ run ] triggered by Bot. Commit: e5d3175 Link to invocation

tensorrt-cicd · 2026-06-23T07:45:27Z

PR_Github #55161 [ run ] completed with state SUCCESS. Commit: e5d3175
/LLM/main/L0_MergeRequest_PR pipeline #44136 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

chienchunhung · 2026-06-23T17:04:14Z

/bot run

tensorrt-cicd · 2026-06-23T17:11:56Z

PR_Github #55288 [ run ] triggered by Bot. Commit: 896e764 Link to invocation

chienchunhung · 2026-06-23T17:41:15Z

/bot run

tensorrt-cicd · 2026-06-23T18:12:01Z

PR_Github #55288 [ run ] completed with state SUCCESS. Commit: 896e764
/LLM/main/L0_MergeRequest_PR pipeline #44239 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

chienchunhung · 2026-06-24T21:15:29Z

/bot run

tensorrt-cicd · 2026-06-24T21:23:40Z

PR_Github #55595 [ run ] triggered by Bot. Commit: 9450f16 Link to invocation

tensorrt-cicd · 2026-06-24T22:45:11Z

PR_Github #55595 [ run ] completed with state FAILURE. Commit: 9450f16
/LLM/main/L0_MergeRequest_PR pipeline #44513 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-06-24T22:58:44Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-24T23:04:12Z

PR_Github #55610 [ run ] triggered by Bot. Commit: 9450f16 Link to invocation

tensorrt-cicd · 2026-06-25T03:14:14Z

PR_Github #55610 [ run ] completed with state FAILURE. Commit: 9450f16
/LLM/main/L0_MergeRequest_PR pipeline #44528 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

github-actions Bot assigned chienchunhung Jun 12, 2026

chienchunhung force-pushed the codex/staged-hooks-wave2-transform-weights branch from 379b212 to d309240 June 12, 2026 01:10

chienchunhung force-pushed the codex/staged-hooks-wave2-transform-weights branch from d309240 to 67267df June 12, 2026 18:01

chienchunhung force-pushed the codex/staged-hooks-wave2-transform-weights branch from 67267df to bfebf3a June 15, 2026 17:13

chienchunhung changed the title ~~[TRTLLM-13246][feat] Wave 2: stage Linear and Attention transforms~~ [TRTLLM-13247][feat] Wave 2: stage Linear and Attention transforms Jun 15, 2026

chienchunhung mentioned this pull request Jun 15, 2026

[TRTLLM-13248][feat] Wave 3: migrate MoE staged hooks #15386

Open

chienchunhung requested review from byshiue, dongjiyingdjy, hlu1, symphonylyh and yuxianq June 22, 2026 18:03

coderabbitai Bot reviewed Jun 22, 2026


Comment thread tensorrt_llm/_torch/models/modeling_gpt_oss.py Outdated

Comment thread tensorrt_llm/_torch/models/modeling_qwen3_next.py Outdated

Comment thread tensorrt_llm/_torch/modules/linear.py Outdated

Comment thread tensorrt_llm/_torch/modules/linear.py

chienchunhung force-pushed the codex/staged-hooks-wave2-transform-weights branch from cf883fc to e5d3175 June 23, 2026 05:11

chienchunhung force-pushed the codex/staged-hooks-wave2-transform-weights branch 2 times, most recently from abce27d to 896e764 June 23, 2026 17:03

chienchunhung requested a review from litaotju June 23, 2026 17:43

chienchunhung requested review from QiJune and xxi-nv June 23, 2026 21:39

chienchunhung added 2 commits June 24, 2026 14:13

[TRTLLM-13247][feat] Wave 2: stage Linear and Attention transforms

ffd91fb

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

[TRTLLM-13247][fix] Address CodeRabbit review comments

9450f16

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

chienchunhung force-pushed the codex/staged-hooks-wave2-transform-weights branch from 71027a6 to 9450f16 June 24, 2026 21:15

juney-nvidia approved these changes Jun 25, 2026

