[None][feat] Cosmos3 reasoner only support by bastefaniak · Pull Request #15117 · NVIDIA/TensorRT-LLM

bastefaniak · 2026-06-08T17:07:46Z

New Features
- Added support for the NVIDIA Cosmos3-Nano and Cosmos3-Super models in reasoner-only (VLM) mode. Their architecture is the same as Qwen3VL, and the weights are loaded from the unified Generator + Reasoner checkpoint.
- When trtllm-serve is given a checkpoint that supports both diffusion and LLM/VLM, the LLM/VLM is preferred. To enable diffusion, pass visual_gen_args/extra_visual_gen_options.
Documentation
- Updated supported models

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Release Notes

New Features
- Added support for Cosmos3 Omni multimodal vision-language model with PyTorch backend
- Integrated Cosmos3-Nano and Cosmos3-Super model variants
Documentation
- Updated supported models list to include Cosmos3ForConditionalGeneration
Tests
- Added comprehensive integration and unit tests for Cosmos3 model functionality

Add Cosmos3ForConditionalGeneration as a Qwen3-VL derivative with unified checkpoint layout (transformer/ + vision_encoder/), weight mapper, config registration, modeling tests, and supported-models entry. Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>

coderabbitai · 2026-06-08T17:20:47Z

Walkthrough

This PR adds comprehensive Cosmos3 omni multimodal model support to TensorRT-LLM. It introduces Cosmos3OmniConfig for checkpoint root resolution, Cosmos3HfWeightMapper for HuggingFace weight transformation, and Cosmos3OmniModel extending Qwen3VLModel with unified checkpoint path handling and vision-tower weight loading.
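The checkpoint root resolution described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `_get_cosmos3_model_paths`: the function name `resolve_cosmos3_paths`, the `Cosmos3Paths` container, and the assumption that `transformer/` holds the language model are all hypothetical, and only the `transformer/` + `vision_encoder/` layout comes from the commit message. It handles local directories only and skips the HuggingFace validation the PR mentions.

```python
from pathlib import Path
from typing import NamedTuple


class Cosmos3Paths(NamedTuple):
    """Hypothetical container for the resolved checkpoint locations."""
    checkpoint_root: Path
    llm_path: Path
    vision_encoder_path: Path


def resolve_cosmos3_paths(name_or_path: str) -> Cosmos3Paths:
    """Resolve the sub-directories of a unified checkpoint on local disk.

    Assumes a `transformer/` directory (taken here to be the language
    model) and a `vision_encoder/` directory directly under the root.
    """
    root = Path(name_or_path)
    llm_path = root / "transformer"
    vision_path = root / "vision_encoder"
    # Collect every missing sub-directory so the error names all of them.
    missing = [p.name for p in (llm_path, vision_path) if not p.is_dir()]
    if missing:
        raise FileNotFoundError(
            f"{root} is not a unified checkpoint; missing: {', '.join(missing)}"
        )
    return Cosmos3Paths(root, llm_path, vision_path)
```

Failing early on a missing sub-directory keeps the error close to the configuration mistake instead of surfacing later during weight loading.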

Changes

Cosmos3 Omni Model Support

| Layer / File(s) | Summary |
|---|---|
| Config definition and initialization: `tensorrt_llm/_torch/configs/cosmos3_omni.py` | `Cosmos3OmniConfig` class defines model metadata, nested vision/text configs, Cosmos3-specific token IDs, and overrides `from_dict`/`from_pretrained` to propagate checkpoint root `_name_or_path` for downstream resolution. |
| Config module registration and wiring: `tensorrt_llm/_torch/configs/__init__.py`, `tensorrt_llm/_torch/pyexecutor/config_utils.py` | Register `Cosmos3OmniConfig` in custom config mappings, module exports, and executor config registry; add conditional `qwen3_vl_vision` alias registration. |
| HuggingFace weight mapper for checkpoint transformation: `tensorrt_llm/_torch/models/checkpoints/hf/cosmos3_weight_mapper.py`, `tensorrt_llm/_torch/models/checkpoints/__init__.py` | `Cosmos3HfWeightMapper` extends `Qwen3VLHfWeightMapper` to drop diffusion/audio/action parameters and remap unified Cosmos3 checkpoint prefixes and Diffusers attention projections into nested Qwen3-VL structure. |
| Cosmos3OmniModel implementation and Qwen3VL integration: `tensorrt_llm/_torch/models/modeling_cosmos3_omni.py`, `tensorrt_llm/_torch/models/__init__.py`, `tensorrt_llm/_torch/models/modeling_qwen3vl.py` | `_get_cosmos3_model_paths` resolves unified checkpoint root from config `_name_or_path` with HuggingFace validation; `Cosmos3OmniModel` extends `Qwen3VLModel` with path storage, `llm_checkpoint_dir` property, and vision-encoder weight loading. Treat `Cosmos3ForConditionalGeneration` equivalently to `Qwen3VLForConditionalGeneration` in base class LLM architecture selection. |
| Unit tests and config registration tests: `tests/unittest/_torch/modeling/test_modeling_cosmos3_omni.py`, `tests/unittest/_torch/test_custom_config_registration.py` | `TestCosmos3Omni` extends `TestQwen3VL` with weight mapper integration, scenario setup, quant_config preservation assertions, and finite-logits validation. Generalize AutoConfig registration test coverage to parameterized `cosmos3_omni` with nested config structure verification and `_name_or_path` assertions. |
| Test data setup and integration tests: `tests/test_common/llm_data.py`, `tests/integration/test_lists/test-db/l0_l40s.yml` | Add `nvidia/Cosmos3-Nano` and `nvidia/Cosmos3-Super` to HF snapshot directory mapping for local model resolution; register `TestCosmos3Omni::test_all` in pre-merge test suite. |
| CODEOWNERS and model support documentation: `.github/CODEOWNERS`, `docs/source/models/supported-models.md` | Add CODEOWNERS entries for `modeling_cosmos3_omni.py` and `cosmos3_weight_mapper.py`; document `Cosmos3ForConditionalGeneration` in supported models matrix as untested. |
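To illustrate the kind of key filtering and prefix remapping the weight mapper row describes, here is a small standalone sketch. It is not `Cosmos3HfWeightMapper`: every prefix string below is a placeholder chosen for illustration, since the real checkpoint key names are not shown on this page, and the Diffusers attention-projection remapping is left out.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical prefixes: placeholders, not the real Cosmos3 checkpoint keys.
DROPPED_PREFIXES: Tuple[str, ...] = ("diffusion.", "audio.", "action.")
RENAMED_PREFIXES: Dict[str, str] = {
    "transformer.": "model.language_model.",
    "vision_encoder.": "model.visual.",
}


def remap_keys(keys: Iterable[str]) -> Dict[str, str]:
    """Map each kept source key to its target name; dropped keys are omitted."""
    mapping: Dict[str, str] = {}
    for key in keys:
        if key.startswith(DROPPED_PREFIXES):
            continue  # generator-only parameter, not needed by the reasoner
        for old, new in RENAMED_PREFIXES.items():
            if key.startswith(old):
                mapping[key] = new + key[len(old):]
                break
        else:
            mapping[key] = key  # no known prefix: keep the name unchanged
    return mapping
```

Returning a source-to-target mapping rather than mutating a state dict keeps the renaming logic testable without loading any tensors.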

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

api-compatible

Suggested reviewers

kaiyux
chang-l
marinayanov

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.34% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description is incomplete and does not follow the required template structure with full sections. | Add a proper 'Description' section explaining the changes, provide detailed 'Test Coverage' section listing all test cases, and complete all PR Checklist items with clear status indicators. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title is clear and specific about the main feature being added: support for Cosmos3 reasoner-only models. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/models/modeling_cosmos3_omni.py`:
- Around line 82-85: If omni_config is None, the instance attributes
self._checkpoint_root, self.llm_path and self._vision_encoder_path are never set
and later access (e.g. in llm_checkpoint_dir or load_weights) raises
AttributeError; in __init__ (where omni_config is handled and
_get_cosmos3_model_paths is called) add a fail-fast check: if omni_config is
None raise a clear ValueError (or TypeError) describing that
omni_config/pretrained_config is required, so __init__ either sets valid paths
via _get_cosmos3_model_paths or aborts; reference symbols: __init__,
omni_config, _get_cosmos3_model_paths, self._checkpoint_root, self.llm_path,
self._vision_encoder_path, llm_checkpoint_dir, load_weights.
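
The fail-fast check suggested in this comment might look like the sketch below. `Cosmos3ModelSketch` is a stand-in class written for illustration, not the reviewed `Cosmos3OmniModel`; the sub-directory names and the extra `_name_or_path` check are assumptions.

```python
from typing import Any, Optional


class Cosmos3ModelSketch:
    """Stand-in for the reviewed class; only the path handling is modeled."""

    def __init__(self, omni_config: Optional[Any]) -> None:
        # Abort before any attribute is used, instead of failing later
        # with AttributeError in llm_checkpoint_dir or load_weights.
        if omni_config is None:
            raise ValueError(
                "omni_config is required to locate the Cosmos3 checkpoint"
            )
        root = getattr(omni_config, "_name_or_path", None)
        if not root:
            raise ValueError(
                "omni_config._name_or_path must point at the checkpoint root"
            )
        self._checkpoint_root = root
        # Assumed sub-directory names, following the unified layout.
        self.llm_path = f"{root}/transformer"
        self._vision_encoder_path = f"{root}/vision_encoder"

    @property
    def llm_checkpoint_dir(self) -> str:
        return self.llm_path
```

With this shape, every instance that survives `__init__` has all three path attributes set, so later accessors need no defensive checks.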

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ce3203c4-1b97-46eb-8b7e-ba4fb224bde3

📥 Commits

Reviewing files that changed from the base of the PR and between 8036cde and 6c91cab.

📒 Files selected for processing (14)

.github/CODEOWNERS
docs/source/models/supported-models.md
tensorrt_llm/_torch/configs/__init__.py
tensorrt_llm/_torch/configs/cosmos3_omni.py
tensorrt_llm/_torch/models/__init__.py
tensorrt_llm/_torch/models/checkpoints/__init__.py
tensorrt_llm/_torch/models/checkpoints/hf/cosmos3_weight_mapper.py
tensorrt_llm/_torch/models/modeling_cosmos3_omni.py
tensorrt_llm/_torch/models/modeling_qwen3vl.py
tensorrt_llm/_torch/pyexecutor/config_utils.py
tests/integration/test_lists/test-db/l0_l40s.yml
tests/test_common/llm_data.py
tests/unittest/_torch/modeling/test_modeling_cosmos3_omni.py
tests/unittest/_torch/test_custom_config_registration.py

NVShreyas · 2026-06-09T19:42:30Z

+    if is_registered_trtllm_model_path(model_path):
+        logger.info(
+            "Diffusers layout detected, but the checkpoint also advertises a "
+            "registered TRT-LLM architecture; treating it as a language model."
+        )
+        return False


This would return False for Cosmos3 checkpoints, so wouldn't it always serve the VLM model and never the diffusion model?

Yes, it would return False for Cosmos3 checkpoints. We check the condition here.

Previously, we loaded the diffusion model if visual_gen_args was provided or if the checkpoint contained a model_index.json with a _diffusers_version entry. That worked until we had a checkpoint that supports both diffusion and VLM, because it made loading the VLM through trtllm-serve impossible. Now, if a checkpoint contains both model_index.json and a config.json whose architectures entry is registered in TRT-LLM, we load the VLM/LLM by default; to load the diffusion model, visual_gen_args must be provided.

Pure diffusion-only checkpoints (only model_index.json, no registered TRT-LLM architecture) still auto-route to VisualGen as before.

So to load:

VLM: trtllm-serve nvidia/Cosmos3-Nano

Diffusion: trtllm-serve nvidia/Cosmos3-Nano --visual_gen_args=examples/visual_gen/configs/cosmos3-nano-1gpu.yaml
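
The detection rule described in this reply can be sketched as a standalone function. This is an illustration under stated assumptions, not the trtllm-serve code: `REGISTERED_ARCHITECTURES` is a hypothetical stand-in for the TRT-LLM model registry, and the helper names are invented for the example.

```python
import json
from pathlib import Path
from typing import Set

# Hypothetical stand-in for the TRT-LLM model registry.
REGISTERED_ARCHITECTURES: Set[str] = {"Cosmos3ForConditionalGeneration"}


def _read_json(path: Path) -> dict:
    """Return the parsed JSON object, or {} if missing or malformed."""
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def has_diffusers_layout(model_dir: Path) -> bool:
    return "_diffusers_version" in _read_json(model_dir / "model_index.json")


def has_registered_architecture(model_dir: Path) -> bool:
    archs = _read_json(model_dir / "config.json").get("architectures") or []
    return any(arch in REGISTERED_ARCHITECTURES for arch in archs)


def should_serve_visual_gen(model_dir: Path, visual_gen_args_given: bool) -> bool:
    if visual_gen_args_given:
        return True  # explicit request always selects diffusion
    if not has_diffusers_layout(model_dir):
        return False  # plain LLM/VLM checkpoint
    # Diffusers layout: auto-route only when no LLM/VLM is advertised.
    return not has_registered_architecture(model_dir)
```

A diffusion-only directory still routes to VisualGen, while a directory that also carries a registered `config.json` falls back to the LLM/VLM unless the arguments are supplied.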

Yeah, I'm not sure this is preferred, since we want users to be able to run diffusion models without providing visual gen args as well. We might need to find a different way.

@zhenhuaw-me @chang-l

@MaciejBalaNV

@NVShreyas
Do you see another way to do it? In vllm-omni, we pass an --omni flag to vllm serve to enable the diffusion model. It is always required for diffusion models, so we would need to pass it even if the reasoner-only model had not been added, but it lets us distinguish the Reasoner and Generator models. Is there anything in TRT-LLM that we can do?

Yeah, there's no argument like that right now. We may have to add something like it. Since it would only affect Cosmos3, I think it is okay for now, but I'm waiting for feedback from the team to see how we can improve it.

Thanks @NVShreyas for looping us in! Can we add --visual_gen to trtllm-serve?

We can use a docstring like this: "Enable VisualGen runtime for model checkpoints that support both LLM and Visual Generation. Not required if --visual_gen_args specified or the model supports Visual Generation only."

I added this flag and also added an example to the docs.
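
The runtime choice implied by the proposed docstring can be written as a small decision function. This is a sketch of the described behaviour, not the merged implementation: `Runtime`, `choose_runtime`, and the parameter names are hypothetical, and raising an error when VisualGen is requested for a checkpoint that lacks it is an assumption.

```python
from enum import Enum


class Runtime(Enum):
    LLM = "llm"
    VISUAL_GEN = "visual_gen"


def choose_runtime(
    supports_llm: bool,
    supports_visual_gen: bool,
    visual_gen_flag: bool = False,
    visual_gen_args_given: bool = False,
) -> Runtime:
    """Pick a runtime from checkpoint capabilities and user options."""
    # Either the flag or the args file counts as an explicit request.
    if visual_gen_flag or visual_gen_args_given:
        if not supports_visual_gen:
            raise ValueError("checkpoint has no visual generation model")
        return Runtime.VISUAL_GEN
    if supports_llm:
        return Runtime.LLM  # default when both are present
    if supports_visual_gen:
        return Runtime.VISUAL_GEN  # diffusion-only needs no flag
    raise ValueError("checkpoint supports neither LLM nor visual generation")
```

The flag only changes the outcome for checkpoints that support both runtimes, which matches the "not required" cases in the docstring.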

MaciejBalaNV · 2026-06-23T13:09:09Z

@2ez4bz
Could you please take a look at the updated version with the separate flag?

bastefaniak added 2 commits June 8, 2026 16:45

Lint

6c91cab

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>

bastefaniak requested review from a team as code owners June 8, 2026 17:07

bastefaniak requested review from Wanli-Jiang, byshiue, dpitman-nvda, joyang-nv, laikhtewari, symphonylyh, venkywonka, yechank-nvidia and yiqingy0 June 8, 2026 17:07

github-actions Bot assigned bastefaniak Jun 8, 2026

coderabbitai Bot reviewed Jun 8, 2026

Comment thread tensorrt_llm/_torch/models/modeling_cosmos3_omni.py Outdated

Updated trtllm-serve to prefer LLM/VLM when it is present along diffusion model, added tests, updated supported_models.md

1ed327c

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>

bastefaniak requested a review from a team as a code owner June 9, 2026 13:17

MaciejBalaNV reviewed Jun 9, 2026

Comment thread tensorrt_llm/_torch/configs/__init__.py

Comment thread tensorrt_llm/_torch/configs/__init__.py Outdated

Comment thread tensorrt_llm/_torch/configs/__init__.py Outdated

Comment thread tensorrt_llm/_torch/models/__init__.py Outdated

Rename Cosmos3Omni to just Cosmos3

08a1e42

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>

NVShreyas reviewed Jun 9, 2026

2ez4bz approved these changes Jun 10, 2026

bastefaniak added 2 commits June 10, 2026 11:38

PR fixes

8783f6a

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>

Registered input processor for cosmos3_omni as backwards compatibility

16bd6d5

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>

bastefaniak added 3 commits June 17, 2026 17:32

Add enable_visual_gen flag to trtllm-serve in case we want to run diffusion without visual_gen_args for checkpoint that supports both diffusion and llm

b338fc2

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>

Merge remote-tracking branch 'origin/main' into bstefaniak/cosmos3-reasoner-onboard

439bdaf

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>

Merge branch 'main' into bstefaniak/cosmos3-reasoner-onboard

997c829

Signed-off-by: bastefaniak <bstefaniak@nvidia.com>

2ez4bz approved these changes Jun 23, 2026

5 participants
