NVIDIA
diff --git a/examples/visual_gen/configs/cosmos3-nano-1gpu.yaml b/examples/visual_gen/configs/cosmos3-nano-1gpu.yaml
Lines changed: 2 additions & 8 deletions
diff --git a/examples/visual_gen/configs/cosmos3-super-4gpu.yaml b/examples/visual_gen/configs/cosmos3-super-4gpu.yaml
Lines changed: 1 addition & 5 deletions
diff --git a/examples/visual_gen/models/cosmos3/README.md b/examples/visual_gen/models/cosmos3/README.md
Lines changed: 77 additions & 0 deletions
diff --git a/examples/visual_gen/models/cosmos3/cosmos3.py b/examples/visual_gen/models/cosmos3/cosmos3.py
Lines changed: 207 additions & 0 deletions
@@ -13,20 +13,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# 1-GPU Cosmos3 (Nano / Super) with FP8 dynamic quantization.
+# 1-GPU Cosmos3 (Nano / Super).
 # Model: nvidia/Cosmos3-Nano or nvidia/Cosmos3-Super
 # Shared by offline examples (--visual_gen_args) and trtllm-serve.
 #
 # Cosmos3 constraints: VANILLA attention only;
-# no Attention2D / Ring. Use CFG + Ulysses for multi-GPU (see cosmos3-super-4gpu.yaml).
-quant_config:
-  quant_algo: FP8
-  dynamic: true
-  ignore: ["language_model.*", "vae2llm", "llm2vae", "time_embedder.*"]
+# Use CFG + Ulysses for multi-GPU (see cosmos3-super-4gpu.yaml).
 attention_config:
   backend: VANILLA
 parallel_config:
   cfg_size: 1
   ulysses_size: 1
-cuda_graph_config:
-  enable: false
@@ -13,16 +13,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# 4-GPU Cosmos3-Super with FP8 dynamic quantization (CFG + Ulysses + parallel VAE).
+# 4-GPU Cosmos3-Super (CFG + Ulysses + parallel VAE).
 # Launch with 4 processes, e.g. torchrun --nproc_per_node=4 ...
 # Model: nvidia/Cosmos3-Super
 # Shared by offline examples (--visual_gen_args) and trtllm-serve.
 #
 # GPU layout: cfg_size=2 (positive | negative) x ulysses_size=2 (sequence split).
-quant_config:
-  quant_algo: FP8
-  dynamic: true
-  ignore: ["language_model.*", "vae2llm", "llm2vae", "time_embedder.*"]
 attention_config:
   backend: VANILLA
 parallel_config:
 
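Both configs encode their GPU layout in `parallel_config`, and the 4-GPU file documents it as `cfg_size=2 (positive | negative) x ulysses_size=2`. As a minimal sketch, assuming the launcher must start exactly `cfg_size * ulysses_size` processes and that CFG parallelism has at most two branches, a pre-launch sanity check could look like this. The helpers `required_world_size` and `check_launch` are hypothetical; they operate on an already-parsed YAML mapping, not on TensorRT-LLM's own config classes.

```python
from typing import Any, Mapping


def required_world_size(config: Mapping[str, Any]) -> int:
    """Return the process count implied by a Cosmos3 config mapping.

    Assumption: world size = cfg_size * ulysses_size; missing keys count as 1.
    """
    backend = config.get("attention_config", {}).get("backend")
    if backend is not None and backend != "VANILLA":
        # The config comments state that Cosmos3 supports VANILLA attention only.
        raise ValueError(f"Cosmos3 requires attention backend VANILLA, got {backend!r}")
    parallel = config.get("parallel_config", {})
    cfg_size = int(parallel.get("cfg_size", 1))
    ulysses_size = int(parallel.get("ulysses_size", 1))
    if cfg_size not in (1, 2):
        # Assumption: CFG parallelism splits only the positive and negative branches.
        raise ValueError(f"cfg_size must be 1 or 2, got {cfg_size}")
    if ulysses_size < 1:
        raise ValueError(f"ulysses_size must be >= 1, got {ulysses_size}")
    return cfg_size * ulysses_size


def check_launch(config: Mapping[str, Any], nproc_per_node: int) -> None:
    """Raise if the process count given to torchrun does not match the config."""
    expected = required_world_size(config)
    if nproc_per_node != expected:
        raise ValueError(
            f"config needs {expected} process(es), got --nproc_per_node={nproc_per_node}"
        )


# Only the keys this check reads; the real files hold more (e.g. parallel VAE settings).
nano_1gpu = {
    "attention_config": {"backend": "VANILLA"},
    "parallel_config": {"cfg_size": 1, "ulysses_size": 1},
}
super_4gpu = {
    "attention_config": {"backend": "VANILLA"},
    "parallel_config": {"cfg_size": 2, "ulysses_size": 2},
}

check_launch(nano_1gpu, nproc_per_node=1)
check_launch(super_4gpu, nproc_per_node=4)
```

In practice the mapping would come from loading the YAML file (for example with PyYAML's `yaml.safe_load`), so a mismatch with `torchrun --nproc_per_node=...` surfaces before any model weights are loaded.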
@@ -0,0 +1,77 @@
+# Cosmos3 Text(+Image)-to-Video(+Audio) generation
+
+Cosmos3 supports four generation modes from a single checkpoint:
+
+- **T2V**: text-to-video (`prompts/t2v.json`).
+- **T2I**: text-to-image (`prompts/t2i.json`); emits a single still frame. Pass `--output_type image` or a non-video `--output_path` such as `output.png`.
+- **I2V / TI2V**: image-conditioned video (`prompts/i2v.json`). Condition on a reference frame via the prompt file's `vision_path`, or via `--image_path`, which takes precedence. The image may be a local path, a `file://` / `http(s)://` URL, or a `data:` URI.
+- **T2AV**: text-to-video with synchronized audio (`prompts/t2av.json` with `"enable_audio": true`, or pass `--enable_audio`). Add a `vision_path` for image-conditioned audio-video (TI2AV).
+
+## Checkpoints
+
+Pass the Hub ID or local path via `--model`:
+
+- [`nvidia/Cosmos3-Nano`](https://huggingface.co/nvidia/Cosmos3-Nano)
+- [`nvidia/Cosmos3-Super`](https://huggingface.co/nvidia/Cosmos3-Super)
+
+## Guardrails
+
+Guardrails are enabled by default (required by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license)). Install and authenticate as follows:
+
+```bash
+pip install cosmos_guardrail==0.3.0 && pip uninstall -y opencv-python
+```
+
+Accept the terms for the guardrail checkpoint at https://huggingface.co/nvidia/Cosmos-1.0-Guardrail and set a valid `HF_TOKEN` (the checkpoint is downloaded automatically on first run).
+
+To run without guardrails (you are responsible for safe deployment):
+
+```bash
+export TRTLLM_DISABLE_COSMOS3_GUARDRAILS=1
+```
+
+## Deployment configs
+
+See `examples/visual_gen/configs/`:
+
+- `cosmos3-nano-1gpu.yaml` — 1 GPU
+- `cosmos3-super-4gpu.yaml` — 4 GPU, CFG + Ulysses + parallel VAE
+
+Example prompts live under `prompts/` (mirroring `cosmos3-internal/inputs/omni`).
+
+## Usage
+
+```bash
+# T2V: text-to-video
+python cosmos3.py --model nvidia/Cosmos3-Nano \
+    --prompt_file prompts/t2v.json \
+    --visual_gen_args ../../configs/cosmos3-nano-1gpu.yaml
+
+# I2V/TI2V: image-conditioned video (vision_path is read from the prompt file;
+# local path, file://, http(s):// URL, or data: URI are all accepted)
+python cosmos3.py --model nvidia/Cosmos3-Nano \
+    --prompt_file prompts/i2v.json \
+    --visual_gen_args ../../configs/cosmos3-nano-1gpu.yaml
+
+# I2V with an explicit conditioning image (overrides the prompt file)
+python cosmos3.py --model nvidia/Cosmos3-Nano \
+    --prompt_file prompts/i2v.json \
+    --image_path https://example.com/frame.jpg \
+    --visual_gen_args ../../configs/cosmos3-nano-1gpu.yaml
+
+# T2AV: text-to-video with synchronized audio
+python cosmos3.py --model nvidia/Cosmos3-Nano \
+    --prompt_file prompts/t2av.json \
+    --visual_gen_args ../../configs/cosmos3-nano-1gpu.yaml
+
+# T2I: text-to-image
+python cosmos3.py --model nvidia/Cosmos3-Nano \
+    --prompt_file prompts/t2i.json \
+    --visual_gen_args ../../configs/cosmos3-nano-1gpu.yaml \
+    --output_path output.png
+
+# Inline prompt (--prompt replaces the prompt text from --prompt_file)
+python cosmos3.py --model nvidia/Cosmos3-Nano \
+    --prompt "A cute puppy playing with a ball in a park" \
+    --visual_gen_args ../../configs/cosmos3-nano-1gpu.yaml
+```
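`cosmos3.py` reads only a few keys from a prompt JSON: `prompt` (required and non-empty), `vision_path` (or `image_path`), `enable_audio`, and `model_mode`, where `"text2image"` switches the default `video` output type to `image`. A minimal sketch for writing custom prompt files restricted to those keys follows; `write_prompt_file` is a hypothetical helper, and any additional keys the shipped `prompts/*.json` files may contain are not covered.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def write_prompt_file(
    path: Path,
    prompt: str,
    vision_path: Optional[str] = None,
    enable_audio: bool = False,
    text_to_image: bool = False,
) -> Dict[str, Any]:
    """Write a prompt JSON containing only keys that cosmos3.py reads."""
    if not prompt:
        # cosmos3.py rejects prompt files without a non-empty "prompt" field.
        raise ValueError("'prompt' must be a non-empty string.")
    data: Dict[str, Any] = {"prompt": prompt}
    if vision_path is not None:
        # Conditioning image for I2V/TI2V/TI2AV; --image_path overrides it at run time.
        data["vision_path"] = vision_path
    if enable_audio:
        data["enable_audio"] = True
    if text_to_image:
        # With the default --output_type video, this switches the output type to "image".
        data["model_mode"] = "text2image"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data


# Example: write a T2AV prompt file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    prompt_path = Path(tmp) / "my_t2av.json"
    write_prompt_file(prompt_path, "A cute puppy playing with a ball in a park", enable_audio=True)
    loaded = json.loads(prompt_path.read_text(encoding="utf-8"))
```

A file written this way can be passed with `--prompt_file`; `cosmos3.py` resolves that path against the current directory first and then against the script's own directory.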
@@ -0,0 +1,207 @@
+#!/usr/bin/env python3
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import json
+import os
+from pathlib import Path
+from typing import Any, Dict, Optional
+
+from tensorrt_llm import VisualGen, VisualGenArgs
+
+_SCRIPT_DIR = Path(__file__).resolve().parent
+
+
+def _resolve_path(path: str) -> str:
+    candidate = Path(path)
+    if candidate.is_file():
+        return str(candidate.resolve())
+    relative_to_script = _SCRIPT_DIR / path
+    if relative_to_script.is_file():
+        return str(relative_to_script.resolve())
+    return path
+
+
+def load_prompt_file(path: str) -> Dict[str, Any]:
+    """Load a Cosmos3 omni prompt JSON (``prompt``, optional ``vision_path``, etc.)."""
+    resolved = _resolve_path(path)
+    with open(resolved, encoding="utf-8") as f:
+        data = json.load(f)
+    if not isinstance(data, dict):
+        raise ValueError(f"Prompt file must be a JSON object, got {type(data)!r}.")
+    if not data.get("prompt"):
+        raise ValueError(f"Prompt file {resolved!r} is missing a non-empty 'prompt' field.")
+    return data
+
+
+def resolve_prompt_and_options(
+    *,
+    prompt: Optional[str],
+    prompt_file: Optional[str],
+    image_path: Optional[str],
+    enable_audio: bool,
+    output_type: str,
+) -> tuple[str, Optional[str], bool, str]:
+    """Merge CLI args with optional prompt-file defaults."""
+    prompt_data: Dict[str, Any] = {}
+    if prompt_file is not None:
+        prompt_data = load_prompt_file(prompt_file)
+
+    resolved_prompt = prompt
+    if resolved_prompt is None:
+        resolved_prompt = prompt_data.get("prompt")
+    if not resolved_prompt:
+        raise ValueError("Provide --prompt or --prompt_file with a 'prompt' field.")
+
+    resolved_image = image_path
+    if resolved_image is None:
+        resolved_image = prompt_data.get("vision_path") or prompt_data.get("image_path")
+
+    resolved_enable_audio = enable_audio or bool(prompt_data.get("enable_audio", False))
+
+    resolved_output_type = output_type
+    model_mode = str(prompt_data.get("model_mode", "")).lower()
+    if model_mode == "text2image" and output_type == "video":
+        resolved_output_type = "image"
+
+    return resolved_prompt, resolved_image, resolved_enable_audio, resolved_output_type
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Cosmos3 Text(+Image)-to-Video(+Audio) example")
+    parser.add_argument(
+        "--model",
+        type=str,
+        default="nvidia/Cosmos3-Nano",
+        help="Model path or HuggingFace Hub ID (nvidia/Cosmos3-Nano, nvidia/Cosmos3-Super)",
+    )
+    parser.add_argument(
+        "--visual_gen_args",
+        dest="visual_gen_args",
+        type=str,
+        default=None,
+        help="Path to YAML config (same as trtllm-serve --visual_gen_args)",
+    )
+    parser.add_argument(
+        "--prompt",
+        type=str,
+        default=None,
+        help="Text prompt for generation (overrides the prompt text from --prompt_file)",
+    )
+    parser.add_argument(
+        "--prompt_file",
+        type=str,
+        default="prompts/t2v.json",
+        help="Path to a JSON prompt file (default: prompts/t2v.json)",
+    )
+    parser.add_argument(
+        "--negative_prompt",
+        type=str,
+        default="cosmos3_negative_prompt.json",
+        help="Text prompt or path to JSON file for negative prompt",
+    )
+    parser.add_argument(
+        "--image_path",
+        type=str,
+        default=None,
+        help="Conditioning image path or URL for I2V/TI2V; overrides the prompt file's vision_path",
+    )
+    parser.add_argument(
+        "--output_path",
+        type=str,
+        default="cosmos3_output.mp4",
+        help="Path to save the output (video, or an image for T2I)",
+    )
+    parser.add_argument(
+        "--disable_duration_template",
+        action="store_true",
+        help="Disable duration metadata template (enabled by default, matching cosmos-framework CLI)",
+    )
+    parser.add_argument(
+        "--disable_resolution_template",
+        action="store_true",
+        help="Disable resolution metadata template (enabled by default, matching cosmos-framework CLI)",
+    )
+    parser.add_argument(
+        "--use_system_prompt", action="store_true", help="Use system prompt in prompt"
+    )
+    parser.add_argument("--enable_audio", action="store_true", help="Enable audio generation")
+    parser.add_argument(
+        "--output_type", type=str, default="video", choices=["video", "image"], help="Output type"
+    )
+
+    # Guardrails
+    parser.add_argument(
+        "--disable_guardrails", action="store_true", help="NOT RECOMMENDED: Disable guardrails"
+    )
+    args = parser.parse_args()
+
+    prompt, image_path, enable_audio, output_type = resolve_prompt_and_options(
+        prompt=args.prompt,
+        prompt_file=args.prompt_file,
+        image_path=args.image_path,
+        enable_audio=args.enable_audio,
+        output_type=args.output_type,
+    )
+
+    # Engine config from shared YAML (optional); model-specific defaults apply otherwise.
+    extra_args = VisualGenArgs.from_yaml(args.visual_gen_args) if args.visual_gen_args else None
+    visual_gen = VisualGen(model=args.model, args=extra_args)
+
+    # --- Model-specific: T2V / I2V / T2AV / T2I request construction ---
+    # Query per-model defaults (resolution, steps, guidance, seed, etc.).
+    params = visual_gen.default_params
+    if image_path is not None:
+        params.image = image_path
+
+    negative_prompt = None
+    if args.negative_prompt is not None:
+        # Accept either a literal negative prompt or a path to a JSON file.
+        negative_prompt_path = _resolve_path(args.negative_prompt)
+        if os.path.isfile(negative_prompt_path) and negative_prompt_path.endswith(".json"):
+            with open(negative_prompt_path, encoding="utf-8") as f:
+                negative_prompt = json.load(f)
+        else:
+            negative_prompt = args.negative_prompt
+
+    if args.disable_duration_template:
+        params.extra_params["use_duration_template"] = False
+    if args.disable_resolution_template:
+        params.extra_params["use_resolution_template"] = False
+    params.extra_params["use_system_prompt"] = args.use_system_prompt
+    params.extra_params["enable_audio"] = enable_audio
+    params.extra_params["use_guardrails"] = not args.disable_guardrails
+    params.extra_params["output_type"] = output_type
+
+    if negative_prompt is None:
+        params.negative_prompt = None
+    elif isinstance(negative_prompt, str):
+        params.negative_prompt = negative_prompt
+    else:
+        params.negative_prompt = json.dumps(negative_prompt)
+
+    output = visual_gen.generate(
+        inputs=prompt,
+        params=params,
+    )
+
+    output.save(args.output_path)
+    print(f"Saved: {args.output_path}")
+    print(output.metrics)
+
+
+if __name__ == "__main__":
+    main()
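The README states that a conditioning image may be a local path, a `file://` or `http(s)://` URL, or a `data:` URI, and `cosmos3.py` passes the value straight to `params.image`, leaving the actual loading to TensorRT-LLM. As a sketch under that assumption, a caller could classify an `--image_path` value up front and fail fast on a missing local file before the model is loaded. `classify_image_source` is hypothetical and is not part of `cosmos3.py` or the `tensorrt_llm` API; it does not download or decode anything.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def classify_image_source(value: str) -> str:
    """Classify a conditioning image reference as "local", "file", "http", or "data".

    Raises ValueError for unsupported or malformed references and
    FileNotFoundError when a local or file:// image does not exist.
    """
    if value.startswith("data:"):
        # Inline image, e.g. data:image/png;base64,<payload>
        if "," not in value:
            raise ValueError("Malformed data: URI (missing ',' before the payload).")
        return "data"
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):
        if not parsed.netloc:
            raise ValueError(f"URL has no host: {value!r}")
        return "http"
    if parsed.scheme == "file":
        local = url2pathname(parsed.path)
        if not os.path.isfile(local):
            raise FileNotFoundError(local)
        return "file"
    if len(parsed.scheme) > 1:
        # One-letter "schemes" are Windows drive letters (C:\...), handled as paths below.
        raise ValueError(f"Unsupported scheme {parsed.scheme!r} in {value!r}")
    if not os.path.isfile(value):
        raise FileNotFoundError(value)
    return "local"


# Example: classify the four accepted forms against a temporary local file.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    local_image = tmp.name
try:
    kinds = [
        classify_image_source(local_image),
        classify_image_source(Path(local_image).as_uri()),
        classify_image_source("https://example.com/frame.jpg"),
        classify_image_source("data:image/png;base64,iVBORw0KGgo="),
    ]
finally:
    os.remove(local_image)
```

Remote URLs are only checked for a host, not fetched, so the check stays cheap; a reachable-but-invalid image would still be reported by the library at generation time.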