[TRTLLM-12500][feat] Add support for Qwen3.5 VL MoE - REVERTED by #14599 #14164
@coderabbitai full review |
✅ Actions performed
Full review triggered. |
📝 Walkthrough
This PR adds Qwen3.5 MoE Vision Language Model support through dtype resolution utilities, config normalization for multimodal architectures, a new VLModel wrapper class, weight mapper registration, and comprehensive unit and integration tests.
Changes
Qwen3.5 MoE VLM Implementation
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/config_utils.py`:
- Around line 48-52: resolve_hf_torch_dtype and resolve_mamba_ssm_cache_dtype
call _coerce_torch_dtype on each candidate attribute but immediately return its
result, so a returned None from _coerce_torch_dtype (the "auto" sentinel)
prematurely stops the fallback chain; change both functions to only return the
coerced dtype when _coerce_torch_dtype(...) is not None, otherwise continue
scanning the remaining attributes (i.e., call getattr for each attr, call
_coerce_torch_dtype, and if the result is truthy/not None then return it; if
None keep looping and finally return None).
In `@tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py`:
- Around line 438-447: The new test method test_auto_dtype lacks an explicit
return type; update its signature to include "-> None" (i.e., def
test_auto_dtype(self) -> None:) to comply with repository typing rules and
mypy-friendly guidelines—locate the test_auto_dtype method that constructs
LLM(...) and calls task.evaluate on MMMU(...) and add the return annotation
there.
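The fallback behavior requested in the first inline comment can be sketched as follows. This is a minimal illustration, not the code from `config_utils.py`: the names `coerce_dtype` and `resolve_first_dtype` and the attribute names `dtype` and `torch_dtype` are hypothetical, and dtypes are represented as plain strings so the example runs without PyTorch.

```python
from types import SimpleNamespace
from typing import Iterable, Optional

# Stand-ins for torch dtypes so the sketch runs without PyTorch (assumption).
_KNOWN_DTYPES = {"float16", "bfloat16", "float32"}


def coerce_dtype(value: object) -> Optional[str]:
    """Return a normalized dtype name, or None for "auto", missing, or unknown values."""
    if value is None or value == "auto":
        return None
    name = str(value).removeprefix("torch.")  # requires Python 3.9+
    return name if name in _KNOWN_DTYPES else None


def resolve_first_dtype(config: object, attrs: Iterable[str]) -> Optional[str]:
    """Scan attrs in order and return the first one that coerces to a concrete dtype."""
    for attr in attrs:
        dtype = coerce_dtype(getattr(config, attr, None))
        if dtype is not None:
            return dtype  # only a concrete dtype ends the scan
    return None  # every candidate was missing or "auto"


# "auto" on the first attribute no longer hides the concrete value on the second.
config = SimpleNamespace(dtype="auto", torch_dtype="torch.bfloat16")
print(resolve_first_dtype(config, ("dtype", "torch_dtype")))  # prints "bfloat16"
```

The key point is that a `None` from the coercion step means "keep looking" rather than "stop"; `None` is returned only after every candidate attribute has been tried.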
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b09424f2-ec07-43fb-b7b5-e73404064a0a
📒 Files selected for processing (11)
tensorrt_llm/_torch/configs/__init__.py
tensorrt_llm/_torch/models/__init__.py
tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
tensorrt_llm/_torch/models/modeling_qwen3_5.py
tensorrt_llm/_torch/models/modeling_qwen3_next.py
tensorrt_llm/_torch/models/modeling_qwen3vl.py
tensorrt_llm/_torch/pyexecutor/config_utils.py
tensorrt_llm/_torch/pyexecutor/model_loader.py
tests/integration/defs/accuracy/references/mmmu.yaml
tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
tests/unittest/_torch/modeling/test_modeling_qwen3_5_vl_moe.py
|
/bot run |
|
/bot run |
|
PR_Github #48959 [ run ] triggered by Bot. Commit: |
|
PR_Github #48959 [ run ] completed with state
|
|
/bot run |
|
PR_Github #48976 [ run ] triggered by Bot. Commit: |
|
PR_Github #48976 [ run ] completed with state
|
|
/bot run |
|
/bot kill |
|
PR_Github #48987 [ run ] triggered by Bot. Commit: |
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
|
/bot help |
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server.
See details below for each supported subcommand.
run
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with pull request.
skip
Skip testing for latest commit on pull request.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break. |
|
/bot reuse-pipeline |
|
PR_Github #49571 [ reuse-pipeline ] triggered by Bot. Commit: |
|
PR_Github #49571 [ reuse-pipeline ] completed with state |
Tabrizian left a comment
Reviewed py_executor/* changes and LGTM.
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Michal Guzek <mguzek@nvidia.com> Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
This PR was later reverted due to MTP issues; see the follow-up PR here: #14599 |
Summary by CodeRabbit
New Features
Tests
Description
Adds support for Qwen3.5-MoE-VL (`Qwen3_5MoeForConditionalGeneration`) on top of #12611.
Uses `transformers.Qwen3_5MoeConfig` (present in `5.3.0`), adds a thin post-load normalizer that materializes the handful of aliases the reused `Qwen3Next` runtime expects on `text_config` (`intermediate_size` from the MoE fields; `rope_theta`, `partial_rotary_factor`, and `rope_scaling` from `rope_parameters`), and centralizes hybrid-cache dtype resolution in two helpers.
Test Coverage
Accuracy & unit tests
TODO: Comparison unit tests against HF
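To illustrate the post-load normalizer mentioned in the Description, here is a rough sketch of how such aliases could be materialized on `text_config`. It is not the PR's implementation: the function name `normalize_text_config` is hypothetical, the use of `moe_intermediate_size` as the source for `intermediate_size` is an assumption, and so is the rule that a non-default `rope_type` turns `rope_parameters` into `rope_scaling`.

```python
from types import SimpleNamespace


def normalize_text_config(text_config):
    """Fill in legacy aliases on text_config, but only where they are missing."""
    rope = getattr(text_config, "rope_parameters", None) or {}

    # Assumption: the dense-style alias falls back to the per-expert MoE width.
    if getattr(text_config, "intermediate_size", None) is None:
        text_config.intermediate_size = getattr(
            text_config, "moe_intermediate_size", None)

    # Copy scalar RoPE settings out of rope_parameters when present there.
    for key in ("rope_theta", "partial_rotary_factor"):
        if getattr(text_config, key, None) is None and key in rope:
            setattr(text_config, key, rope[key])

    # Assumption: only a non-default rope_type implies a rope_scaling dict.
    if getattr(text_config, "rope_scaling", None) is None:
        is_scaled = rope.get("rope_type", "default") != "default"
        text_config.rope_scaling = dict(rope) if is_scaled else None
    return text_config


# Stand-in for a loaded config; the values are made up for the example.
cfg = normalize_text_config(SimpleNamespace(
    moe_intermediate_size=512,
    rope_parameters={"rope_type": "default", "rope_theta": 1e7,
                     "partial_rotary_factor": 0.25},
))
print(cfg.intermediate_size, cfg.rope_theta, cfg.partial_rotary_factor)
```

Filling aliases only when they are absent keeps the normalizer idempotent, so it is safe to run on a config that already carries the legacy fields.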
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either `api-compatible` or `api-breaking`. For `api-breaking`, include `BREAKING` in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.