By comparing these results to the baseline performance of the original model, you will see the benefits of applying EditScore as a reranker.
## Application 2: Reinforcement Fine-Tuning
Use EditScore to provide a high-quality reward signal for training models toward significantly better image-editing performance. We combine the FlowGRPO algorithm with EditScore's accurate evaluation to achieve end-to-end reinforcement learning fine-tuning.
### 1. Data and Model Download
Download the RL training data from [EditScore-RL-Data](https://huggingface.co/datasets/EditScore/EditScore-RL-Data), place `rl.jsonl` in `data/`, and update its path in `data_configs/train/train.yml`.
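
For reference, the relevant entry in `data_configs/train/train.yml` might look like the fragment below. The key name is an assumption for illustration; match whatever key your copy of the file actually uses.

```yaml
# Hypothetical key name; adjust to the actual train.yml schema.
data_path: data/rl.jsonl
```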
Download the base model from [OmniGen2](https://huggingface.co/OmniGen2/OmniGen2), convert the weights to `pytorch_model.bin` format, and set `model.pretrained_model_path` in `options/omnigen2_edit_rl.yml` accordingly.
### 2. Start Reward Server
Before starting training, launch the EditScore reward server so it can provide real-time reward evaluation for RL training.
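
The client-server reward flow can be illustrated with a minimal mock. Everything here is an assumption for illustration: the real EditScore server's endpoint, payload schema, and scoring logic will differ; the mock just returns a constant reward to show the round trip.

```python
# Minimal mock of a reward-server round trip (illustrative only).
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockRewardHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        # A real server would score the edited image with EditScore here;
        # the mock returns a constant reward.
        body = json.dumps({"reward": 0.5, "prompt": payload["prompt"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def query_reward(port: int, prompt: str) -> dict:
    """POST a prompt to the (mock) reward server and return its JSON reply."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/reward",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


server = HTTPServer(("127.0.0.1", 0), MockRewardHandler)
port = server.server_address[1]  # ephemeral port chosen by the OS
threading.Thread(target=server.serve_forever, daemon=True).start()
result = query_reward(port, "make the sky purple")
server.shutdown()
print(result["reward"])
```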
### 3. Start Training
**Configure Training Parameters**
Edit the `options/omnigen2_edit_rl.yml` configuration file, focusing on these key parameters:
- `train.global_batch_size`: Global batch size (`num_machines * num_unique_prompts_per_sampling * num_images_per_prompt`)
- `train.rl.num_images_per_prompt`: Number of rollouts per prompt
- `train.rl.num_unique_prompts_per_sampling`: Number of globally unique prompts per sampling step
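
The relationship between these parameters can be sanity-checked with simple arithmetic. The values below are illustrative, not recommendations:

```python
# Illustrative values only: check global_batch_size consistency.
num_machines = 1
num_unique_prompts_per_sampling = 8
num_images_per_prompt = 8

global_batch_size = (
    num_machines * num_unique_prompts_per_sampling * num_images_per_prompt
)
print(global_batch_size)  # 64
```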
**Launch Distributed Training**
```bash
# Single machine training (8*H100 GPUs)
bash scripts/train/omnigen2_edit_rl.sh
# Multi-machine distributed training
```
> **⚠️ Training Configuration Key Points**
>
> **Reward Server IP**: Ensure the `REWARD_SERVER_IP` environment variable in training scripts points to the correct reward server address
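
For example, before launching the training script (the address below is a placeholder; use your reward-server host):

```shell
# Placeholder address; replace with the host running the EditScore reward server.
export REWARD_SERVER_IP=10.0.0.5
```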
### 4. Training Outputs and Monitoring
Logs and saved model checkpoints are written to `experiments/`.