[None][feat] Cosmos3 reasoner only support #15117

Open
bastefaniak wants to merge 9 commits into NVIDIA:main from bastefaniak:bstefaniak/cosmos3-reasoner-onboard

Conversation

@bastefaniak bastefaniak commented Jun 8, 2026

  • New Features

    • Added support for the NVIDIA Cosmos3-Nano and Cosmos3-Super reasoner-only (VLM) models. Their architecture is the same as Qwen3VL, and the weights are loaded from the unified Generator + Reasoner checkpoint.
    • When trtllm-serve is given a checkpoint that supports both diffusion and LLM/VLM, the LLM/VLM path is preferred. To enable diffusion, pass visual_gen_args/extra_visual_gen_options.
  • Documentation

    • Updated supported models

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Cosmos3 Omni multimodal vision-language model with PyTorch backend
    • Integrated Cosmos3-Nano and Cosmos3-Super model variants
  • Documentation

    • Updated supported models list to include Cosmos3ForConditionalGeneration
  • Tests

    • Added comprehensive integration and unit tests for Cosmos3 model functionality

Add Cosmos3ForConditionalGeneration as a Qwen3-VL derivative with
unified checkpoint layout (transformer/ + vision_encoder/), weight mapper,
config registration, modeling tests, and supported-models entry.

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>

coderabbitai Bot commented Jun 8, 2026

Contributor

Walkthrough

This PR adds comprehensive Cosmos3 omni multimodal model support to TensorRT-LLM. It introduces Cosmos3OmniConfig for checkpoint root resolution, Cosmos3HfWeightMapper for HuggingFace weight transformation, and Cosmos3OmniModel extending Qwen3VLModel with unified checkpoint path handling and vision-tower weight loading.

Changes

Cosmos3 Omni Model Support

| Layer | File(s) | Summary |
|---|---|---|
| Config definition and initialization | `tensorrt_llm/_torch/configs/cosmos3_omni.py` | Cosmos3OmniConfig class defines model metadata, nested vision/text configs, Cosmos3-specific token IDs, and overrides from_dict/from_pretrained to propagate checkpoint root _name_or_path for downstream resolution. |
| Config module registration and wiring | `tensorrt_llm/_torch/configs/__init__.py`, `tensorrt_llm/_torch/pyexecutor/config_utils.py` | Register Cosmos3OmniConfig in custom config mappings, module exports, and executor config registry; add conditional qwen3_vl_vision alias registration. |
| HuggingFace weight mapper for checkpoint transformation | `tensorrt_llm/_torch/models/checkpoints/hf/cosmos3_weight_mapper.py`, `tensorrt_llm/_torch/models/checkpoints/__init__.py` | Cosmos3HfWeightMapper extends Qwen3VLHfWeightMapper to drop diffusion/audio/action parameters and remap unified Cosmos3 checkpoint prefixes and Diffusers attention projections into nested Qwen3-VL structure. |
| Cosmos3OmniModel implementation and Qwen3VL integration | `tensorrt_llm/_torch/models/modeling_cosmos3_omni.py`, `tensorrt_llm/_torch/models/__init__.py`, `tensorrt_llm/_torch/models/modeling_qwen3vl.py` | _get_cosmos3_model_paths resolves unified checkpoint root from config _name_or_path with HuggingFace validation; Cosmos3OmniModel extends Qwen3VLModel with path storage, llm_checkpoint_dir property, and vision-encoder weight loading. Treat Cosmos3ForConditionalGeneration equivalently to Qwen3VLForConditionalGeneration in base class LLM architecture selection. |
| Unit tests and config registration tests | `tests/unittest/_torch/modeling/test_modeling_cosmos3_omni.py`, `tests/unittest/_torch/test_custom_config_registration.py` | TestCosmos3Omni extends TestQwen3VL with weight mapper integration, scenario setup, quant_config preservation assertions, and finite-logits validation. Generalize AutoConfig registration test coverage to parameterized cosmos3_omni with nested config structure verification and _name_or_path assertions. |
| Test data setup and integration tests | `tests/test_common/llm_data.py`, `tests/integration/test_lists/test-db/l0_l40s.yml` | Add nvidia/Cosmos3-Nano and nvidia/Cosmos3-Super to HF snapshot directory mapping for local model resolution; register TestCosmos3Omni::test_all in pre-merge test suite. |
| CODEOWNERS and model support documentation | `.github/CODEOWNERS`, `docs/source/models/supported-models.md` | Add CODEOWNERS entries for modeling_cosmos3_omni.py and cosmos3_weight_mapper.py; document Cosmos3ForConditionalGeneration in supported models matrix as untested. |
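
The path-resolution step summarized above can be illustrated with a small sketch. It is not the PR's `_get_cosmos3_model_paths`: the function name `resolve_cosmos3_paths`, the `Cosmos3Paths` container, and the assumption that `transformer/` holds the language-model weights while `vision_encoder/` holds the vision tower are all hypothetical, and the sketch only handles a checkpoint that is already on local disk (no HuggingFace validation).

```python
from pathlib import Path
from typing import NamedTuple


class Cosmos3Paths(NamedTuple):
    """Hypothetical container for the resolved checkpoint locations."""

    checkpoint_root: Path
    llm_path: Path
    vision_encoder_path: Path


def resolve_cosmos3_paths(name_or_path: str) -> Cosmos3Paths:
    """Resolve the sub-directories of a unified checkpoint on local disk.

    Assumption: transformer/ contains the LLM weights and vision_encoder/
    contains the vision tower, following the layout named in the commit message.
    """
    root = Path(name_or_path)
    if not root.is_dir():
        raise FileNotFoundError(f"Checkpoint root not found: {root}")

    llm_path = root / "transformer"
    vision_path = root / "vision_encoder"

    # Report every missing sub-directory at once rather than failing on the first.
    missing = [p.name for p in (llm_path, vision_path) if not p.is_dir()]
    if missing:
        raise ValueError(
            f"Unified checkpoint at {root} is missing: {', '.join(missing)}"
        )
    return Cosmos3Paths(root, llm_path, vision_path)
```

Validating both sub-directories up front means a malformed checkpoint is rejected before any weights are read.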

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

api-compatible

Suggested reviewers

  • kaiyux
  • chang-l
  • marinayanov
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.34% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description is incomplete and does not follow the required template structure with full sections. | Add a proper 'Description' section explaining the changes, provide detailed 'Test Coverage' section listing all test cases, and complete all PR Checklist items with clear status indicators. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title is clear and specific about the main feature being added: support for Cosmos3 reasoner-only models. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/models/modeling_cosmos3_omni.py`:
- Around line 82-85: If omni_config is None, the instance attributes
self._checkpoint_root, self.llm_path and self._vision_encoder_path are never set
and later access (e.g. in llm_checkpoint_dir or load_weights) raises
AttributeError; in __init__ (where omni_config is handled and
_get_cosmos3_model_paths is called) add a fail-fast check: if omni_config is
None raise a clear ValueError (or TypeError) describing that
omni_config/pretrained_config is required, so __init__ either sets valid paths
via _get_cosmos3_model_paths or aborts; reference symbols: __init__,
omni_config, _get_cosmos3_model_paths, self._checkpoint_root, self.llm_path,
self._vision_encoder_path, llm_checkpoint_dir, load_weights.
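
The fail-fast check requested in this comment could look roughly like the sketch below. The class `Cosmos3ModelSketch` is a hypothetical stand-in rather than the PR's `Cosmos3OmniModel`, the config is modeled as a plain dict, and the sub-directory names are assumptions taken from the commit message.

```python
from typing import Optional


class Cosmos3ModelSketch:
    """Hypothetical stand-in used only to illustrate the fail-fast pattern."""

    def __init__(self, omni_config: Optional[dict]) -> None:
        # Abort immediately instead of leaving the path attributes unset,
        # which would surface later as an unrelated AttributeError.
        if omni_config is None:
            raise ValueError(
                "omni_config (pretrained_config) is required to resolve "
                "Cosmos3 checkpoint paths"
            )
        root = omni_config["_name_or_path"]
        self._checkpoint_root = root
        # Assumed layout: transformer/ for the LLM, vision_encoder/ for vision.
        self.llm_path = f"{root}/transformer"
        self._vision_encoder_path = f"{root}/vision_encoder"

    @property
    def llm_checkpoint_dir(self) -> str:
        return self.llm_path
```

With this shape, `__init__` either sets all three attributes or raises, so `llm_checkpoint_dir` can never observe a half-initialized instance.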

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ce3203c4-1b97-46eb-8b7e-ba4fb224bde3

📥 Commits

Reviewing files that changed from the base of the PR and between 8036cde and 6c91cab.

📒 Files selected for processing (14)
  • .github/CODEOWNERS
  • docs/source/models/supported-models.md
  • tensorrt_llm/_torch/configs/__init__.py
  • tensorrt_llm/_torch/configs/cosmos3_omni.py
  • tensorrt_llm/_torch/models/__init__.py
  • tensorrt_llm/_torch/models/checkpoints/__init__.py
  • tensorrt_llm/_torch/models/checkpoints/hf/cosmos3_weight_mapper.py
  • tensorrt_llm/_torch/models/modeling_cosmos3_omni.py
  • tensorrt_llm/_torch/models/modeling_qwen3vl.py
  • tensorrt_llm/_torch/pyexecutor/config_utils.py
  • tests/integration/test_lists/test-db/l0_l40s.yml
  • tests/test_common/llm_data.py
  • tests/unittest/_torch/modeling/test_modeling_cosmos3_omni.py
  • tests/unittest/_torch/test_custom_config_registration.py

Comment thread tensorrt_llm/_torch/models/modeling_cosmos3_omni.py Outdated
…sion model, added tests, updated supported_models.md

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
@bastefaniak bastefaniak requested a review from a team as a code owner June 9, 2026 13:17
Comment thread tensorrt_llm/_torch/configs/__init__.py
Comment thread tensorrt_llm/_torch/configs/__init__.py Outdated
Comment thread tensorrt_llm/_torch/configs/__init__.py Outdated
Comment thread tensorrt_llm/_torch/models/__init__.py Outdated
Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
Comment on lines +115 to +120
if is_registered_trtllm_model_path(model_path):
    logger.info(
        "Diffusers layout detected, but the checkpoint also advertises a "
        "registered TRT-LLM architecture; treating it as a language model."
    )
    return False

Collaborator

This would return False for Cosmos3 checkpoints, so wouldn't it always serve the VLM model and never the diffusion model?

Author

Yes, it would return False for Cosmos3 checkpoints. We check the condition here.

Previously, we loaded the diffusion model if visual_gen_args was provided or if the checkpoint contained a model_index.json with a _diffusers_version entry. That was fine until we had a checkpoint that supports both diffusion and VLM, because it made loading the VLM through trtllm-serve impossible. Now, if the checkpoint contains both model_index.json and a config.json whose architectures entry is registered in TRT-LLM, we load the VLM/LLM by default. To load the diffusion model, visual_gen_args must be provided.

Pure diffusion-only checkpoints (only model_index.json, no registered TRT-LLM architecture) still auto-route to VisualGen as before.

So to load:

  • VLM: trtllm-serve nvidia/Cosmos3-Nano
  • Diffusion: trtllm-serve nvidia/Cosmos3-Nano --visual_gen_args=examples/visual_gen/configs/cosmos3-nano-1gpu.yaml
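
The routing rule described in this comment can be sketched as a small decision function. This is an illustration under stated assumptions, not the code in the PR: `REGISTERED_ARCHITECTURES`, `has_registered_architecture`, `is_diffusers_layout`, and `should_use_visual_gen` are hypothetical names, and the real check goes through `is_registered_trtllm_model_path`, whose implementation is not shown in this thread.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical registry; the real set of registered architectures lives in TRT-LLM.
REGISTERED_ARCHITECTURES = {"Cosmos3ForConditionalGeneration"}


def has_registered_architecture(model_dir: Path) -> bool:
    """True if config.json names at least one registered architecture."""
    config_file = model_dir / "config.json"
    if not config_file.is_file():
        return False
    config = json.loads(config_file.read_text())
    architectures = config.get("architectures") or []
    return any(arch in REGISTERED_ARCHITECTURES for arch in architectures)


def is_diffusers_layout(model_dir: Path) -> bool:
    """True if model_index.json exists and carries a _diffusers_version entry."""
    index_file = model_dir / "model_index.json"
    if not index_file.is_file():
        return False
    return "_diffusers_version" in json.loads(index_file.read_text())


def should_use_visual_gen(model_dir: Path,
                          visual_gen_args: Optional[str] = None) -> bool:
    # Explicit visual-gen arguments always select the diffusion path.
    if visual_gen_args is not None:
        return True
    # Without a Diffusers layout there is nothing to route to VisualGen.
    if not is_diffusers_layout(model_dir):
        return False
    # Dual-purpose checkpoints default to the LLM/VLM path;
    # diffusion-only checkpoints still auto-route to VisualGen.
    return not has_registered_architecture(model_dir)
```

The ordering matters: the explicit argument is checked first so that a dual-purpose checkpoint can still be served as a diffusion model on request.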

Collaborator

Yeah, I'm not sure this is preferred, since we also want users to be able to run diffusion models without providing visual gen args. We might need to find a different way.

@NVShreyas
Do you see another way to do it? In vllm-omni, we pass an --omni flag to vllm serve to enable the diffusion model. The flag is always required for diffusion models, so we would need to pass it even if Reasoner-only support had not been added, but it also lets us distinguish the Reasoner and Generator models. Is there anything similar we can do in TRT-LLM?

Collaborator

Yeah, there's no arg like that right now. We may have to add something like that. Since it would only affect Cosmos3, I think it is okay for now, but I'm waiting for feedback from the team on how we can improve it.

Member

Thanks @NVShreyas for looping us in! Can we add --visual_gen to trtllm-serve?

We can document it like this: "Enable VisualGen runtime for model checkpoints that support both LLM and Visual Generation. Not required if --visual_gen_args is specified or the model supports Visual Generation only."
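
As a rough illustration of the proposal, the sketch below defines such a flag with the suggested help text. `argparse` is used only to keep the example dependency-free; it is not a claim about how trtllm-serve declares its options, and `build_parser` and `wants_visual_gen` are hypothetical names. The diffusion-only checkpoint case, where the flag is not required, is outside this sketch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the serve command's option parsing."""
    parser = argparse.ArgumentParser(prog="serve-sketch")
    parser.add_argument("model", help="Model name or checkpoint path.")
    parser.add_argument(
        "--visual_gen",
        action="store_true",
        help=(
            "Enable VisualGen runtime for model checkpoints that support both "
            "LLM and Visual Generation. Not required if --visual_gen_args is "
            "specified or the model supports Visual Generation only."
        ),
    )
    parser.add_argument(
        "--visual_gen_args",
        default=None,
        help="Path to a VisualGen configuration file.",
    )
    return parser


def wants_visual_gen(args: argparse.Namespace) -> bool:
    # Either the explicit flag or a VisualGen config selects the diffusion path.
    return bool(args.visual_gen or args.visual_gen_args is not None)
```

Treating the flag and the config argument as equivalent signals keeps existing `--visual_gen_args` invocations working unchanged.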

Author

I added this flag and also added an example to the docs.

Comment thread tensorrt_llm/_torch/configs/__init__.py
Comment thread tensorrt_llm/_torch/models/modeling_cosmos3.py Outdated
Comment thread tensorrt_llm/_torch/models/modeling_cosmos3.py Outdated
Comment thread tensorrt_llm/_torch/models/modeling_cosmos3.py Outdated
Comment thread tensorrt_llm/commands/utils.py Outdated
Comment thread tests/unittest/_torch/modeling/test_modeling_cosmos3.py Outdated
Comment thread tests/unittest/_torch/modeling/test_modeling_cosmos3.py
Comment thread tests/unittest/_torch/modeling/test_modeling_cosmos3.py Outdated
Comment thread tests/unittest/_torch/modeling/test_modeling_cosmos3.py Outdated
Comment thread tests/unittest/_torch/test_custom_config_registration.py Outdated
Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
…fusion without visual_gen_args for checkpoint that supports both diffusion and llm

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
…asoner-onboard

Signed-off-by: Bartosz Stefaniak <bstefaniak@nvidia.com>
Signed-off-by: bastefaniak <bstefaniak@nvidia.com>
@MaciejBalaNV

@2ez4bz
Could you please take a look at the updated version with the separate flag?

5 participants