Commit b069220

docs: emphasize high-quality filtered prompts in highlights
- Add highlight about ~200k carefully filtered prompts for optimal RL training
- Update LongCat section to mention filtered prompts dataset
1 parent a3e2b53 commit b069220

1 file changed

Lines changed: 2 additions & 1 deletion

README.md

@@ -33,6 +33,7 @@
 - 🎯 **Unified RL for Visual Generation** — A single framework covering text-to-image (T2I), text-to-video (T2V), and image-to-video (I2V) generation
 - 🔄 **Multi-Paradigm Support** — Native support for both **Diffusion** and **Rectified Flow** generation paradigms via unified SDE formulation
 - 🧩 **Modular Reward System** — Plug-and-play reward functions: aesthetic scores, text-alignment, motion quality, OCR accuracy, and custom user-defined rewards
+- 📝 **High-Quality Training Data** — Carefully curated **~200k filtered prompts** for optimal RL training performance
 - **Scalable & Efficient** — Multi-node FSDP training with activation checkpointing, LoRA / full fine-tune, EMA, 8-bit Adam, and memory-efficient reward model offloading
 - 🎛️ **YAML-Driven Configuration** — Everything from model choice, reward weights, training schedule to FSDP sharding strategy is controlled via a single YAML config
 - 🔬 **Reproducible by Design** — Deterministic seeding across sampling, training, and logging for bit-exact experiment reproduction
@@ -136,7 +137,7 @@

 </div>

-**LongCat Reproduction**: Our GenRL implementation successfully reproduces **LongCat** (not yet open-sourced) on the **Wan2.1-T2V 1.3B** model. Training with **64 H100 GPUs** up to **1.5k steps**, all four reward metrics continue to improve normally, demonstrating stable and effective multi-reward RLHF training.
+**LongCat Reproduction**: Our GenRL implementation successfully reproduces **LongCat** (not yet open-sourced) on the **Wan2.1-T2V 1.3B** model. Training with **64 H100 GPUs** up to **1.5k steps**, all four reward metrics continue to improve normally, demonstrating stable and effective multi-reward RLHF training. The training dataset consists of **~200k carefully filtered prompts** (`datasets/filtered_prompts/`), ensuring high-quality training data for optimal RL performance.

 ### 📈 Visual Example

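The README highlights in this diff describe plug-and-play reward functions whose weights come from a YAML config. The following is a minimal sketch of how such a weighted multi-reward combination could work. It is not GenRL's actual API: the registry `REWARDS`, the decorator `register_reward`, the function `combined_reward`, and both toy text-based rewards are hypothetical names introduced for illustration, and a plain dict stands in for the parsed YAML reward weights.

```python
from typing import Callable, Dict

# Hypothetical registry of reward functions; GenRL's real interface may differ.
RewardFn = Callable[[str, str], float]
REWARDS: Dict[str, RewardFn] = {}


def register_reward(name: str) -> Callable[[RewardFn], RewardFn]:
    """Register a reward function under a name so a config can refer to it."""
    def decorator(fn: RewardFn) -> RewardFn:
        REWARDS[name] = fn
        return fn
    return decorator


@register_reward("length_match")
def length_match(prompt: str, output: str) -> float:
    """Toy stand-in reward: 1.0 when prompt and output have equal word counts."""
    p, o = len(prompt.split()), len(output.split())
    return 1.0 - abs(p - o) / max(p, o, 1)


@register_reward("keyword_overlap")
def keyword_overlap(prompt: str, output: str) -> float:
    """Toy stand-in reward: fraction of prompt words that appear in the output."""
    words = set(prompt.lower().split())
    if not words:
        return 0.0
    return len(words & set(output.lower().split())) / len(words)


def combined_reward(prompt: str, output: str, weights: Dict[str, float]) -> float:
    """Weighted average of the named rewards, normalized by the total weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("reward weights must sum to a positive value")
    score = sum(w * REWARDS[name](prompt, output) for name, w in weights.items())
    return score / total


# Stand-in for the reward-weight section of a parsed YAML config (assumed shape).
weights = {"length_match": 1.0, "keyword_overlap": 3.0}
score = combined_reward("a red cat", "a cat", weights)
```

Normalizing by the total weight keeps the combined score on the same scale as the individual rewards, so changing one weight in the config shifts the balance between rewards without rescaling the overall signal.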