[None][feat] Cosmos3 reasoner only support#15117
Conversation
Add Cosmos3ForConditionalGeneration as a Qwen3-VL derivative with unified checkpoint layout (transformer/ + vision_encoder/), weight mapper, config registration, modeling tests, and supported-models entry. Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
📝 WalkthroughWalkthroughThis PR adds comprehensive Cosmos3 omni multimodal model support to TensorRT-LLM. It introduces ChangesCosmos3 Omni Model Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tensorrt_llm/_torch/models/modeling_cosmos3_omni.py`:
- Around line 82-85: If omni_config is None, the instance attributes
self._checkpoint_root, self.llm_path and self._vision_encoder_path are never set
and later access (e.g. in llm_checkpoint_dir or load_weights) raises
AttributeError; in __init__ (where omni_config is handled and
_get_cosmos3_model_paths is called) add a fail-fast check: if omni_config is
None raise a clear ValueError (or TypeError) describing that
omni_config/pretrained_config is required, so __init__ either sets valid paths
via _get_cosmos3_model_paths or aborts; reference symbols: __init__,
omni_config, _get_cosmos3_model_paths, self._checkpoint_root, self.llm_path,
self._vision_encoder_path, llm_checkpoint_dir, load_weights.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ce3203c4-1b97-46eb-8b7e-ba4fb224bde3
📒 Files selected for processing (14)
.github/CODEOWNERSdocs/source/models/supported-models.mdtensorrt_llm/_torch/configs/__init__.pytensorrt_llm/_torch/configs/cosmos3_omni.pytensorrt_llm/_torch/models/__init__.pytensorrt_llm/_torch/models/checkpoints/__init__.pytensorrt_llm/_torch/models/checkpoints/hf/cosmos3_weight_mapper.pytensorrt_llm/_torch/models/modeling_cosmos3_omni.pytensorrt_llm/_torch/models/modeling_qwen3vl.pytensorrt_llm/_torch/pyexecutor/config_utils.pytests/integration/test_lists/test-db/l0_l40s.ymltests/test_common/llm_data.pytests/unittest/_torch/modeling/test_modeling_cosmos3_omni.pytests/unittest/_torch/test_custom_config_registration.py
…sion model, added tests, updated supported_models.md Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
| if is_registered_trtllm_model_path(model_path): | ||
| logger.info( | ||
| "Diffusers layout detected, but the checkpoint also advertises a " | ||
| "registered TRT-LLM architecture; treating it as a language model." | ||
| ) | ||
| return False |
There was a problem hiding this comment.
this would return False for Cosmos3 checkpoints so wouldn't it always serve the VLM model and never the diffusion model?
There was a problem hiding this comment.
Yes, it would return False for Cosmos3 checkpoints. We check the condition here.
Previously we would load diffusion model if we provided visual_gen_args or checkpoint contained model_index.json which contained _diffusers_version which was fine until we had a checkpoint that supports both diffusion and VLM. That way it was impossible to load VLM using trtllm-serve. Now if we load checkpoint that contains both model_index.json and config.json (that contains architectures entry registered in TRT-LLM) by default we will load VLM/LLM, to load diffusion model we need to provide visual_gen_args.
Pure diffusion-only checkpoints (only model_index.json, no registered TRT-LLM architecture) still auto-route to VisualGen as before.
So to load:
- VLM:
trtllm-serve nvidia/Cosmos3-Nano - Diffusion:
trtllm-serve nvidia/Cosmos3-Nano --visual_gen_args=examples/visual_gen/configs/cosmos3-nano-1gpu.yaml
There was a problem hiding this comment.
yeah I'm not sure if this is preferred since we want users to run diffusion models without providing visual gen args as well. We might need to find a different way
There was a problem hiding this comment.
@NVShreyas
Do you see another way to do it? In vllm-omni we pass an --omni flag to vllm serve to enable Diffusion model - it's always required for diffusion models, so we'd need to pass it anyway even if the Reasoner-only was not added, but it lets us distinguish the Reasoner and Generator models. Is there anything in TRT-LLM that we can do?
There was a problem hiding this comment.
yeah there's no arg like that right now. We may have to add something like that. Since it would only affect cosmos3, I think it is okay for now but I'm waiting to get some feedback from the team to see how we can improve it.
There was a problem hiding this comment.
Thanks @NVShreyas for looping us! Can we add --visual_gen in trtllm-serve?
We can docstring like this "Enable VisualGen runtime for model checkpoints that support both LLM and Visual Generation. Not required if --visual_gen_args specified or the model supports Visual Generation only."
There was a problem hiding this comment.
I added this flag, also added example in docs
Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
…fusion without visual_gen_args for checkpoint that supports both diffusion and llm Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
…asoner-onboard Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
Signed-off-by: bastefaniak <bstefaniak@nvidia.com>
|
@2ez4bz |
New Features
trtllm-servewith checkpoint that supports both diffusion and LLM/VLM only, LLM/VLM it is preferred, to enable diffusion we need to passvisual_gen_args/extra_visual_gen_optionsDocumentation
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.Summary by CodeRabbit
Release Notes
New Features
Documentation
Tests