Commit 79f41b3

pthombre, thomasdhc, and claude committed
feat: Add diffusion finetuning CI pipeline for nightly runs (#1728)
* feat: Add diffusion pipelines for nightly runs

* Reduce CI runtime to 30 minutes

* debug: Check if HF_TOKEN is set

* test: revert test variables

* feat: add HunyuanVideo nightly CI test and parameterize diffusion launcher

  Add HunyuanVideo-1.5 to the diffusion finetuning CI pipeline alongside Wan2.1.
  Parameterize the launcher script to derive model-specific settings (processor,
  generate config, model name, frame counts) from the recipe config name. Also
  fix a pre-existing T5 layer norm compatibility issue in finetune.py that
  affects Hunyuan training with incompatible apex builds.

* style: ruff format on modified files

* revert: remove patch_t5_layer_norm from finetune.py

  The patch was a workaround for an ABI-incompatible apex build on a specific
  compute node, not a code issue. CI Docker builds apex from source, so it is
  not needed there.

* feat: add Flux and QwenImage T2I nightly CI tests

  Extend the diffusion nightly CI pipeline to support text-to-image models
  (Flux and QwenImage) alongside the existing text-to-video models (Wan,
  HunyuanVideo). Uses the diffusers/tuxemon dataset for image CI smoke tests.

  Changes:
  - Add MEDIA_TYPE branching in launcher for image vs video stages
  - Add tuxemon dataset download/extraction with JSONL captions
  - Add image preprocessing and .png inference verification paths
  - Add ci: sections to flux_t2i_flow.yaml and qwen_image_t2i_flow.yaml
  - Register QwenImagePipeline in generate.py output type mapping

---------

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 115f85c commit 79f41b3

9 files changed

Lines changed: 381 additions & 51 deletions

examples/diffusion/finetune/flux_t2i_flow.yaml

Lines changed: 4 additions & 0 deletions
@@ -77,3 +77,7 @@ dist_env:
   init_method: "env://"
 
 seed: 42
+
+ci:
+  recipe_owner: pthombre
+  time: "00:30:00"

examples/diffusion/finetune/hunyuan_t2v_flow.yaml

Lines changed: 4 additions & 0 deletions
@@ -80,3 +80,7 @@ dist_env:
   init_method: "env://"
 
 seed: 42
+
+ci:
+  recipe_owner: pthombre
+  time: "01:30:00"
examples/diffusion/finetune/qwen_image_t2i_flow.yaml

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
model:
  pretrained_model_name_or_path: "Qwen/Qwen-Image"
  mode: "finetune"
  cache_dir: null
  attention_backend: "flash"

optim:
  learning_rate: 1e-5

optimizer:
  weight_decay: 0.01
  betas: [0.9, 0.999]

# Adjust dp_size to the total number of GPUs
fsdp:
  dp_size: 8
  tp_size: 1
  cp_size: 1
  pp_size: 1
  activation_checkpointing: false
  cpu_offload: false

flow_matching:
  adapter_type: "qwen_image"
  adapter_kwargs:
    guidance_scale: 3.5
    use_guidance_embeds: false
  timestep_sampling: "logit_normal"
  logit_mean: 0.0
  logit_std: 1.0
  flow_shift: 2.23
  mix_uniform_ratio: 0.0
  sigma_min: 0.02
  sigma_max: 1.0
  num_train_timesteps: 1000
  i2v_prob: 0.0
  use_loss_weighting: true
  loss_weighting_scheme: "bsmntw"
  log_interval: 100
  summary_log_interval: 10

step_scheduler:
  num_epochs: 10
  local_batch_size: 1
  global_batch_size: 8
  ckpt_every_steps: 500
  save_checkpoint_every_epoch: false
  log_every: 1
  # max_steps: null  # Set to limit training to a specific number of steps

data:
  dataloader:
    _target_: nemo_automodel.components.datasets.diffusion.build_text_to_image_multiresolution_dataloader
    cache_dir: PATH_TO_YOUR_DATA
    train_text_encoder: false
    num_workers: 2
    # Supported resolutions include [256x256], [512x512], and [1024x1024].
    base_resolution: [512, 512]
    dynamic_batch_size: false
    shuffle: true
    drop_last: false

checkpoint:
  enabled: true
  checkpoint_dir: PATH_TO_YOUR_CKPT_DIR
  model_save_format: safetensors
  save_consolidated: true
  diffusers_compatible: true
  restore_from: null

wandb:
  project: qwen-image-finetuning
  mode: online
  name: qwen_image_finetune_run_1

dist_env:
  backend: "nccl"
  init_method: "env://"

seed: 42

ci:
  recipe_owner: pthombre
  time: "00:30:00"

examples/diffusion/finetune/wan2_1_t2v_flow.yaml

Lines changed: 4 additions & 0 deletions
@@ -73,3 +73,7 @@ checkpoint:
   save_consolidated: true
   diffusers_compatible: true
   restore_from: null
+
+ci:
+  recipe_owner: pthombre
+  time: "00:30:00"

examples/diffusion/generate/generate.py

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@
 # Pipeline class name -> output type mapping
 _PIPELINE_OUTPUT_TYPES = {
     "FluxPipeline": "image",
+    "QwenImagePipeline": "image",
     "WanPipeline": "video",
     "HunyuanVideoPipeline": "video",
     "HunyuanVideo15Pipeline": "video",
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

examples_dir: diffusion/finetune
configs:
  - wan2_1_t2v_flow.yaml
  - hunyuan_t2v_flow.yaml
  - flux_t2i_flow.yaml
  - qwen_image_t2i_flow.yaml
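
This manifest is what the nightly pipeline iterates over. A minimal sketch of
a driver consuming it is below; the actual CI template is not part of this
commit, so the manifest file name and the launcher invocation are assumptions
(CONFIG_PATH, at least, matches the env contract documented in the launcher
script further down).

# Minimal sketch of a driver over the manifest above; file names and the
# launcher call are assumptions, not code from this commit.
import os
import subprocess
import yaml

with open("nightly_configs.yaml") as f:  # hypothetical manifest path
    manifest = yaml.safe_load(f)

for cfg in manifest["configs"]:
    config_path = os.path.join("examples", manifest["examples_dir"], cfg)
    env = {**os.environ, "CONFIG_PATH": config_path}
    subprocess.run(["bash", "diffusion_finetune_launcher.sh"], env=env, check=True)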
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

exempt_models:

exempt_configs:

known_issue:
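
This file pairs with the config manifest: recipes listed under exempt_configs
or known_issue would presumably be skipped or soft-failed by the harness; all
three keys ship empty here. A possible filtering step (hypothetical, but
matching the empty-key shape above):

# Hypothetical filtering step; the harness is not in this commit. Empty
# YAML keys (e.g. "exempt_configs:") load as None, hence the `or []` guards.
import yaml

with open("exemptions.yaml") as f:  # hypothetical file path
    exemptions = yaml.safe_load(f) or {}

exempt = set(exemptions.get("exempt_configs") or [])
known_issue = set(exemptions.get("known_issue") or [])

configs = ["wan2_1_t2v_flow.yaml", "flux_t2i_flow.yaml"]
runnable = [c for c in configs if c not in exempt | known_issue]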
Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -euo pipefail

# Environment variables expected from CI template:
#   CONFIG_PATH, TEST_LEVEL, NPROC_PER_NODE, TEST_NODE_COUNT,
#   MASTER_ADDR, MASTER_PORT, SLURM_JOB_ID, PIPELINE_DIR, TEST_NAME

DATA_DIR="$PIPELINE_DIR/$TEST_NAME/data"
CKPT_DIR="$PIPELINE_DIR/$TEST_NAME/checkpoint"
INFER_DIR="$PIPELINE_DIR/$TEST_NAME/inference_output"

cd /opt/Automodel

# ============================================
# Derive model-specific settings from config
# ============================================
RECIPE_NAME=$(basename "$CONFIG_PATH" .yaml)
case "$RECIPE_NAME" in
  wan2_1_t2v_flow*)
    MEDIA_TYPE="video"
    PROCESSOR="wan"
    GENERATE_CONFIG="examples/diffusion/generate/configs/generate_wan.yaml"
    MODEL_NAME="Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
    INFER_NUM_FRAMES=9
    PREPROCESS_EXTRA_ARGS=""
    ;;
  hunyuan_t2v_flow*)
    MEDIA_TYPE="video"
    PROCESSOR="hunyuan"
    GENERATE_CONFIG="examples/diffusion/generate/configs/generate_hunyuan.yaml"
    MODEL_NAME="hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v"
    INFER_NUM_FRAMES=5
    PREPROCESS_EXTRA_ARGS="--target_frames 13"
    ;;
  flux_t2i_flow*)
    MEDIA_TYPE="image"
    PROCESSOR="flux"
    GENERATE_CONFIG="examples/diffusion/generate/configs/generate_flux.yaml"
    MODEL_NAME="black-forest-labs/FLUX.1-dev"
    PREPROCESS_EXTRA_ARGS=""
    ;;
  qwen_image_t2i_flow*)
    MEDIA_TYPE="image"
    PROCESSOR="qwen_image"
    GENERATE_CONFIG="examples/diffusion/generate/configs/generate_qwen_image.yaml"
    MODEL_NAME="Qwen/Qwen-Image"
    PREPROCESS_EXTRA_ARGS=""
    ;;
  *)
    echo "ERROR: Unknown recipe '$RECIPE_NAME'. Add a case to diffusion_finetune_launcher.sh."
    exit 1
    ;;
esac
echo "[config] Recipe=$RECIPE_NAME MediaType=$MEDIA_TYPE Processor=$PROCESSOR Model=$MODEL_NAME"

# ============================================
# Stage 1: Download dataset
# ============================================
echo "============================================"
echo "[data] Downloading dataset..."
echo "============================================"
if [ "$MEDIA_TYPE" = "image" ]; then
  uv run --extra diffusion python -c "
from datasets import load_dataset
from pathlib import Path
import json

ds = load_dataset('diffusers/tuxemon', split='train')
out_dir = Path('$DATA_DIR/raw')
out_dir.mkdir(parents=True, exist_ok=True)

# Save each image and collect a JSONL caption entry per file.
jsonl_entries = []
for i, row in enumerate(ds):
    fname = f'tuxemon_sample_{i:04d}.png'
    row['image'].save(out_dir / fname)
    jsonl_entries.append({'file_name': fname, 'internvl': row['gpt4_turbo_caption']})

jsonl_path = out_dir / 'tuxemon_internvl.json'
with open(jsonl_path, 'w') as jf:
    for entry in jsonl_entries:
        jf.write(json.dumps(entry) + '\n')

print(f'Extracted {len(ds)} images to {out_dir}')
"
else
  uv run --extra diffusion python -c "
from huggingface_hub import snapshot_download
snapshot_download('modal-labs/dissolve', repo_type='dataset', local_dir='$DATA_DIR/raw')
print('Dataset downloaded successfully')
"
fi

# ============================================
# Stage 2: Preprocess to latents
# ============================================
echo "============================================"
echo "[preprocess] Converting ${MEDIA_TYPE}s to latents..."
echo "============================================"
if [ "$MEDIA_TYPE" = "image" ]; then
  uv run --extra diffusion python -m tools.diffusion.preprocessing_multiprocess image \
    --image_dir "$DATA_DIR/raw" \
    --output_dir "$DATA_DIR/cache" \
    --processor "$PROCESSOR" \
    $PREPROCESS_EXTRA_ARGS
else
  uv run --extra diffusion python -m tools.diffusion.preprocessing_multiprocess video \
    --video_dir "$DATA_DIR/raw" \
    --output_dir "$DATA_DIR/cache" \
    --processor "$PROCESSOR" \
    --resolution_preset 512p \
    --caption_format sidecar \
    $PREPROCESS_EXTRA_ARGS
fi

# ============================================
# Stage 3: Finetune
# ============================================
echo "============================================"
echo "[finetune] Running finetuning..."
echo "============================================"
CONFIG="--config /opt/Automodel/${CONFIG_PATH} \
  --data.dataloader.cache_dir $DATA_DIR/cache \
  --checkpoint.checkpoint_dir $CKPT_DIR \
  --step_scheduler.max_steps ${MAX_STEPS:-100} \
  --step_scheduler.ckpt_every_steps 100 \
  --step_scheduler.save_checkpoint_every_epoch false \
  --fsdp.dp_size ${NPROC_PER_NODE} \
  --wandb.mode disabled"

CMD="uv run --extra diffusion torchrun --nproc-per-node=${NPROC_PER_NODE} \
  --nnodes=${TEST_NODE_COUNT} \
  --rdzv_backend=c10d \
  --rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} \
  --rdzv_id=${SLURM_JOB_ID}"

eval $CMD examples/diffusion/finetune/finetune.py $CONFIG

# ============================================
# Stage 4: Inference smoke test
# ============================================
echo "============================================"
echo "[inference] Running inference smoke test..."
echo "============================================"
# Pick the checkpoint directory with the highest step number.
CKPT_STEP_DIR=$(ls -d $CKPT_DIR/epoch_*_step_* | sort -t_ -k4 -n | tail -1)

if [ "$MEDIA_TYPE" = "image" ]; then
  uv run --extra diffusion python examples/diffusion/generate/generate.py \
    --config "$GENERATE_CONFIG" \
    --model.pretrained_model_name_or_path "$MODEL_NAME" \
    --model.checkpoint "$CKPT_STEP_DIR" \
    --inference.num_inference_steps 5 \
    --output.output_dir "$INFER_DIR" \
    --vae.enable_slicing true \
    --vae.enable_tiling true

  if ls $INFER_DIR/sample_*.png 1>/dev/null 2>&1; then
    echo "[inference] SUCCESS: Output image(s) generated"
  else
    echo "[inference] FAILURE: No output images found"
    exit 1
  fi
else
  uv run --extra diffusion python examples/diffusion/generate/generate.py \
    --config "$GENERATE_CONFIG" \
    --model.pretrained_model_name_or_path "$MODEL_NAME" \
    --model.checkpoint "$CKPT_STEP_DIR" \
    --inference.num_inference_steps 5 \
    --inference.pipeline_kwargs.num_frames "$INFER_NUM_FRAMES" \
    --output.output_dir "$INFER_DIR" \
    --vae.enable_slicing true \
    --vae.enable_tiling true

  if ls $INFER_DIR/sample_*.mp4 1>/dev/null 2>&1; then
    echo "[inference] SUCCESS: Output video(s) generated"
  else
    echo "[inference] FAILURE: No output videos found"
    exit 1
  fi
fi
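
One detail in Stage 4 worth unpacking: CKPT_STEP_DIR relies on
sort -t_ -k4 -n, i.e. checkpoint directories named epoch_<E>_step_<S> are
split on underscores and ordered numerically by the step field. An
illustrative Python equivalent of that selection (mirroring the intent on
directory basenames, not a drop-in replacement):

# Illustrative equivalent of:
#   ls -d $CKPT_DIR/epoch_*_step_* | sort -t_ -k4 -n | tail -1
# i.e. pick the epoch_<E>_step_<S> directory with the largest step.
from pathlib import Path

def latest_step_dir(ckpt_dir: str) -> Path:
    candidates = Path(ckpt_dir).glob("epoch_*_step_*")
    # name.split("_")[3] is <S> in "epoch_<E>_step_<S>".
    return max(candidates, key=lambda p: int(p.name.split("_")[3]))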
