Commit 64b5c79

[None][doc] Add visual generation models to supported models page (#12464)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
1 parent: 7ee9e8b

3 files changed: 34 additions & 8 deletions

docs/source/commands/trtllm-serve/trtllm-serve.rst

Lines changed: 1 addition & 1 deletion
@@ -218,7 +218,7 @@ Visual Generation Serving
 ``trtllm-serve`` supports diffusion-based visual generation models (FLUX.1, FLUX.2, Wan2.1, Wan2.2) for image and video generation. When a diffusion model directory is provided (detected by the presence of ``model_index.json``), the server automatically launches in visual generation mode with dedicated endpoints.
 
 .. note::
-   VisualGen is in **prototype** stage. APIs, supported models, and optimization options are actively evolving and may change in future releases.
+   VisualGen is in **beta** stage. APIs, supported models, and optimization options are actively evolving and may change in future releases.
 
 .. code-block:: bash

docs/source/models/supported-models.md

Lines changed: 28 additions & 1 deletion
@@ -87,4 +87,31 @@ Note:
 
 # Visual Generation Models
 
-For diffusion-based image and video generation models, see the [Visual Generation](./visual-generation.md) documentation.
+TensorRT-LLM provides beta support for diffusion-based image and video generation.
+For full documentation, see the [Visual Generation](./visual-generation.md) page.
+
+## Supported Models
+
+| HuggingFace Model ID | Tasks |
+|---|---|
+| `black-forest-labs/FLUX.1-dev` | Text-to-Image |
+| `black-forest-labs/FLUX.2-dev` | Text-to-Image |
+| `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` | Text-to-Video |
+| `Wan-AI/Wan2.1-T2V-14B-Diffusers` | Text-to-Video |
+| `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` | Image-to-Video |
+| `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers` | Image-to-Video |
+| `Wan-AI/Wan2.2-T2V-A14B-Diffusers` | Text-to-Video |
+| `Wan-AI/Wan2.2-I2V-A14B-Diffusers` | Image-to-Video |
+| `Lightricks/LTX-2` | Text-to-Video (with Audio), Image-to-Video (with Audio) |
+
+## Feature Matrix
+
+| Model | TeaCache | CFG Parallelism | Ulysses Parallelism | Parallel VAE | CUDA Graph | torch.compile | trtllm-serve |
+|---|---|---|---|---|---|---|---|
+| **FLUX.1** | Yes | No [^vg1] | Yes | No | Yes | Yes | Yes |
+| **FLUX.2** | Yes | No [^vg1] | Yes | No | Yes | Yes | Yes |
+| **Wan 2.1** | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+| **Wan 2.2** | No | Yes | Yes | Yes | Yes | Yes | Yes |
+| **LTX-2** | No | Yes | Yes | No | No | Yes | Yes |
+
+[^vg1]: FLUX models use embedded guidance and do not have a separate negative prompt path, so CFG parallelism is not applicable.
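The `[^vg1]` footnote is the key distinction behind the CFG Parallelism column: classical classifier-free guidance runs two independent denoiser passes per step (one conditional, one negative-prompt) and combines them, and CFG parallelism places those two passes on different devices. A minimal illustrative sketch of the combine step follows; this is not TensorRT-LLM code, and the function name and scalar-list tensors are hypothetical simplifications.

```python
def cfg_combine(uncond, cond, guidance_scale=5.0):
    """Classical classifier-free guidance: extrapolate from the
    unconditional (negative-prompt) prediction toward the conditional
    one. The two denoiser passes producing `uncond` and `cond` are
    independent, which is what CFG parallelism exploits."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

# Models with embedded guidance (e.g. FLUX) run only the conditional
# pass, so there is no second branch to parallelize.
```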

docs/source/models/visual-generation.md

Lines changed: 5 additions & 6 deletions
@@ -1,7 +1,7 @@
-# Visual Generation (Prototype)
+# Visual Generation (Beta)
 
 ```{note}
-This feature is in **prototype** stage. APIs, supported models, and optimization options are
+This feature is in **beta** stage. APIs, supported models, and optimization options are
 actively evolving and may change in future releases.
 ```
 
@@ -30,7 +30,7 @@ TensorRT-LLM **VisualGen** provides a unified inference stack for diffusion mode
 | `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers` | Image-to-Video |
 | `Wan-AI/Wan2.2-T2V-A14B-Diffusers` | Text-to-Video |
 | `Wan-AI/Wan2.2-I2V-A14B-Diffusers` | Image-to-Video |
-| `Lightricks/LTX-Video` | Text-to-Video (with Audio), Image-to-Video (with Audio) |
+| `Lightricks/LTX-2` | Text-to-Video (with Audio), Image-to-Video (with Audio) |
 
 Models are auto-detected from the checkpoint directory. Diffusers-format models are detected via `model_index.json`; LTX-2 monolithic safetensors checkpoints are detected via embedded metadata. The `AutoPipeline` registry selects the appropriate pipeline class automatically.
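The Diffusers-format half of that auto-detection rule is a simple filesystem check, as the docs describe: a `model_index.json` at the checkpoint root marks a Diffusers-format model. A minimal sketch under that assumption; the helper name is hypothetical, and the real `AutoPipeline` registry logic in TensorRT-LLM does more (including LTX-2 metadata inspection).

```python
from pathlib import Path

def is_diffusers_checkpoint(model_dir: str) -> bool:
    """Diffusers-format checkpoints keep a model_index.json at the
    directory root; per the docs above, its presence is what triggers
    diffusion-model handling."""
    return (Path(model_dir) / "model_index.json").is_file()
```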
 
@@ -50,9 +50,8 @@ Models are auto-detected from the checkpoint directory. Diffusers-format models
 
 Here is a simple example to generate a video with Wan 2.1:
 
-```{literalinclude} ../../../examples/visual_gen/quickstart_example.py
-:language: python
-:linenos:
+```bash
+python examples/visual_gen/quickstart_example.py
 ```
 
 To learn more about VisualGen, see [`examples/visual_gen/`](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/visual_gen) for more examples including text-to-image, image-to-video, and batch generation.
