
Commit f5b0bc6

Merge branch 'main' into autoencoderkl-tests-refactor

2 parents 646ab6e + a851ce1

58 files changed: 6489 additions & 172 deletions


.github/workflows/pr_labeler.yml
Lines changed: 2 additions & 0 deletions

@@ -20,6 +20,8 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          ref: ${{ github.event.pull_request.base.sha }}
       - name: Check for missing tests
         id: check
         env:
.github/workflows/pr_style_bot.yml
Lines changed: 4 additions & 3 deletions

@@ -5,13 +5,14 @@ on:
   types: [created]
 
 permissions:
-  contents: write
   pull-requests: write
+  contents: read
 
 jobs:
   style:
-    uses: huggingface/huggingface_hub/.github/workflows/style-bot-action.yml@e000c1c89c65aee188041723456ac3a479416d4c # main
+    uses: huggingface/huggingface_hub/.github/workflows/style-bot-action.yml@e2867e92c07d15e1bf18994d0a945ef5ad6b8d65
     with:
       python_quality_dependencies: "[quality]"
     secrets:
-      bot_token: ${{ secrets.HF_STYLE_BOT_ACTION }}
+      app_id: ${{ secrets.HF_BOT_STYLE_APP_ID }}
+      app_private_key: ${{ secrets.HF_BOT_STYLE_SECRET_PEM }}
docs/source/en/api/loaders/lora.md
Lines changed: 4 additions & 0 deletions

@@ -132,6 +132,10 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 
 [[autodoc]] loaders.lora_pipeline.ZImageLoraLoaderMixin
 
+## CosmosLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.CosmosLoraLoaderMixin
+
 ## KandinskyLoraLoaderMixin
 [[autodoc]] loaders.lora_pipeline.KandinskyLoraLoaderMixin
 
docs/source/en/optimization/attention_backends.md
Lines changed: 1 addition & 1 deletion

@@ -35,7 +35,7 @@ The [`~ModelMixin.set_attention_backend`] method iterates through all the module
 The example below demonstrates how to enable the `_flash_3_hub` implementation for FlashAttention-3 from the [`kernels`](https://github.com/huggingface/kernels) library, which allows you to instantly use optimized compute kernels from the Hub without requiring any setup.
 
 > [!NOTE]
-> FlashAttention-3 is not supported for non-Hopper architectures, in which case, use FlashAttention with `set_attention_backend("flash")`.
+> FlashAttention-3 requires Ampere GPUs at a minimum.
 
 ```py
 import torch
examples/cosmos/README.md
Lines changed: 97 additions & 0 deletions

@@ -0,0 +1,97 @@
+# LoRA fine-tuning for Cosmos Predict 2.5
+
+This example shows how to fine-tune [Cosmos Predict 2.5](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B) using LoRA on a custom video dataset.
+
+## Requirements
+
+Install the library from source and the example-specific dependencies:
+
+```bash
+git clone https://github.com/huggingface/diffusers
+cd diffusers
+pip install -e ".[dev]"
+cd examples/cosmos
+pip install -r requirements.txt
+```
+
+## Data preparation
+
+The training script expects a dataset directory with the following layout:
+
+```
+<dataset_dir>/
+├── videos/   # .mp4 files
+└── metas/    # one .txt prompt file per video (same stem)
+    ├── 0.txt
+    ├── 1.txt
+    └── ...
+```
+
+### GR1 dataset (quick start)
+
+The `download_and_preprocess_datasets.sh` script downloads the GR1-100 training set and the EVAL-175 test set, then runs the preprocessing script to create the per-video prompt files.
+
+```bash
+bash download_and_preprocess_datasets.sh
+```
+
+This produces:
+- `gr1_dataset/train/`: training videos + prompts
+- `gr1_dataset/test/`: evaluation images + prompts
+
+## Training
+
+Launch LoRA training with `accelerate`:
+
+```bash
+export MODEL_NAME="nvidia/Cosmos-Predict2.5-2B"
+export DATA_DIR="gr1_dataset/train"
+export OUT_DIR="lora-output"
+
+accelerate launch --mixed_precision="bf16" train_cosmos_predict25_lora.py \
+  --pretrained_model_name_or_path=$MODEL_NAME \
+  --revision diffusers/base/post-trained \
+  --train_data_dir=$DATA_DIR \
+  --output_dir=$OUT_DIR \
+  --train_batch_size=1 \
+  --num_train_epochs=500 \
+  --checkpointing_epochs=100 \
+  --seed=0 \
+  --height 432 --width 768 \
+  --allow_tf32 \
+  --gradient_checkpointing \
+  --lora_rank 32 --lora_alpha 32 \
+  --report_to=wandb
+```
+
+Or use the provided shell script:
+
+```bash
+bash train_lora.sh
+```
+
+## Evaluation
+
+Run inference with the trained LoRA adapter:
+
+```bash
+export DATA_DIR="gr1_dataset/test"
+export LORA_DIR="lora-output"
+export OUT_DIR="eval-output"
+
+python eval_cosmos_predict25_lora.py \
+  --data_dir $DATA_DIR \
+  --output_dir $OUT_DIR \
+  --lora_dir $LORA_DIR \
+  --revision diffusers/base/post-trained \
+  --height 432 --width 768 \
+  --num_output_frames 93 \
+  --num_steps 36 \
+  --seed 0
+```
+
+Or use the provided shell script:
+
+```bash
+bash eval_lora.sh
+```
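The `videos/` plus `metas/` layout described in the README above pairs each `.mp4` with a same-stem `.txt` prompt file. A minimal sketch of a layout check (the helper name `missing_prompts` is illustrative, not part of this commit):

```python
from pathlib import Path


def missing_prompts(dataset_dir: str) -> list[str]:
    """Return video stems that lack a matching prompt file in metas/."""
    videos = {p.stem for p in Path(dataset_dir, "videos").glob("*.mp4")}
    prompts = {p.stem for p in Path(dataset_dir, "metas").glob("*.txt")}
    return sorted(videos - prompts)
```

Running this on `gr1_dataset/train` before launching training surfaces any video whose prompt file is missing.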
examples/cosmos/create_prompts_for_gr1_dataset.py
Lines changed: 63 additions & 0 deletions

@@ -0,0 +1,63 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+
+from tqdm import tqdm
+
+
+"""example command
+python create_prompts_for_gr1_dataset.py --dataset_path datasets/benchmark_train/gr1
+"""
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Create text prompts for GR1 dataset")
+    parser.add_argument(
+        "--dataset_path", type=str, default="datasets/benchmark_train/gr1", help="Root path to the dataset"
+    )
+    parser.add_argument(
+        "--prompt_prefix", type=str, default="The robot arm is performing a task. ", help="Prefix of the prompt"
+    )
+    parser.add_argument(
+        "--meta_csv", type=str, default=None, help="Metadata csv file (defaults to <dataset_path>/metadata.csv)"
+    )
+    return parser.parse_args()
+
+
+def main(args) -> None:
+    meta_csv = args.meta_csv or os.path.join(args.dataset_path, "metadata.csv")
+    meta_lines = open(meta_csv).readlines()[1:]
+    meta_txt_dir = os.path.join(args.dataset_path, "metas")
+    os.makedirs(meta_txt_dir, exist_ok=True)
+
+    for meta_line in tqdm(meta_lines):
+        video_filename, prompt = meta_line.split(",", 1)
+        prompt = prompt.strip("\n")
+        if prompt.startswith('"') and prompt.endswith('"'):
+            # Remove the quotes
+            prompt = prompt[1:-1]
+        prompt = args.prompt_prefix + prompt
+        meta_txt_filename = os.path.join(meta_txt_dir, os.path.basename(video_filename).replace(".mp4", ".txt"))
+        with open(meta_txt_filename, "w") as fp:
+            fp.write(prompt)
+
+        print(f"encoding prompt: {prompt}")
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    main(args)
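The per-row transformation in the script above (split on the first comma, strip the trailing newline and CSV quotes, prepend the prefix) can be sketched standalone; `build_prompt` is a hypothetical helper for illustration, not a function from the commit:

```python
def build_prompt(
    meta_line: str, prefix: str = "The robot arm is performing a task. "
) -> tuple[str, str]:
    """Split a metadata.csv row into (video filename, prefixed prompt)."""
    video_filename, prompt = meta_line.split(",", 1)  # caption may contain commas
    prompt = prompt.strip("\n")
    if prompt.startswith('"') and prompt.endswith('"'):
        prompt = prompt[1:-1]  # drop CSV quoting around the caption
    return video_filename, prefix + prompt
```

Splitting with `maxsplit=1` matters because quoted captions can themselves contain commas.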
examples/cosmos/download_and_preprocess_datasets.sh
Lines changed: 25 additions & 0 deletions

@@ -0,0 +1,25 @@
+dataset_dir='gr1_dataset'
+train_dir=$dataset_dir/train
+test_dir=$dataset_dir/test
+
+# Download and Preprocess Training Dataset
+hf download nvidia/GR1-100 --repo-type dataset --local-dir datasets/benchmark_train/hf_gr1/ && \
+mkdir -p datasets/benchmark_train/gr1/videos && \
+mv datasets/benchmark_train/hf_gr1/gr1/*mp4 datasets/benchmark_train/gr1/videos && \
+mv datasets/benchmark_train/hf_gr1/metadata.csv datasets/benchmark_train/gr1/
+
+python create_prompts_for_gr1_dataset.py --dataset_path datasets/benchmark_train/gr1
+
+# Download Eval Dataset
+hf download nvidia/EVAL-175 --repo-type dataset --local-dir dream_gen_benchmark
+
+
+# Rename dataset directory
+mkdir $dataset_dir
+mv datasets/benchmark_train/gr1 $train_dir
+mv dream_gen_benchmark/gr1_object $test_dir
+echo "Downloaded training data to $train_dir"
+echo "Downloaded test data to $test_dir"
+
+# Clean up staging directories
+rm -rf datasets/ dream_gen_benchmark/
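After the download script finishes, a quick sanity check on the final layout can confirm both splits landed where the training and evaluation commands expect them; `dataset_ready` is a hypothetical helper, and the directory names follow the script above:

```python
from pathlib import Path


def dataset_ready(root: str = "gr1_dataset") -> bool:
    """Check that train/ has videos/ and metas/ subdirs and that test/ exists."""
    train = Path(root, "train")
    return (
        (train / "videos").is_dir()
        and (train / "metas").is_dir()
        and Path(root, "test").is_dir()
    )
```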
