By comparing these results to the baseline performance of the original model, you will see the benefits of applying EditScore as a reranker.
## Application 2: Reinforcement Fine-Tuning
Beyond evaluation, **EditScore** can be used as a high-quality reward signal to fine-tune your image editing models using Reinforcement Learning (RL), leading to significantly improved performance.

We employ the **FlowGRPO** algorithm, combining its strengths with EditScore's accurate, real-time feedback to create a powerful end-to-end fine-tuning pipeline. This process effectively guides the model toward generating better edits.
### 1. Prepare Training Data
First, set up the dataset for RL fine-tuning.
1. **Download the Data**

   Download the official RL training data from [EditScore-RL-Data](https://huggingface.co/datasets/EditScore/EditScore-RL-Data).

2. **Create the Meta File**

   The uploaded dataset uses relative image paths. Run the following script to convert them to absolute paths based on your local environment:
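The repository ships its own conversion script; purely as an illustrative sketch, a minimal version might look like the following (the image-path field names `input_images` and `output_image` are assumptions, not the dataset's confirmed schema — adjust them to match the actual meta file):

```python
import json
from pathlib import Path


def make_absolute(meta_in: str, meta_out: str, image_root: str,
                  path_keys=("input_images", "output_image")) -> int:
    """Rewrite relative image paths in a JSONL meta file as absolute paths.

    path_keys are assumed field names; change them to the dataset's schema.
    Returns the number of records written.
    """
    root = Path(image_root).resolve()
    count = 0
    with open(meta_in) as fin, open(meta_out, "w") as fout:
        for line in fin:
            record = json.loads(line)
            for key in path_keys:
                value = record.get(key)
                if isinstance(value, str):
                    record[key] = str(root / value)
                elif isinstance(value, list):  # some schemas store a list of images
                    record[key] = [str(root / v) for v in value]
            fout.write(json.dumps(record) + "\n")
            count += 1
    return count
```

For example, `make_absolute("data/rl.jsonl", "data/rl_abs.jsonl", "/path/to/images")` would write a new meta file with every path anchored at `/path/to/images`.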
### 2. Download the Base Model

Download the base model from [OmniGen2](https://huggingface.co/OmniGen2/OmniGen2), convert its weight files to the `pytorch_model.bin` format, and update `model.pretrained_model_path` in `options/omnigen2_edit_rl.yml` to point to your local copy.
### 3. Launch the Reward Server
RL training requires a live reward signal. Before starting the training process, you must launch the **EditScore Reward Server**. This server will provide real-time scores for the generated images during training.

Our reward server is built with two components: a **proxy** and one or more **reward servers**. The proxy receives requests from the training node, distributes them to the individual reward servers for computation, and then collects the results to send back. This architecture allows for easy scaling across multiple machines.

We provide a convenient script to launch the entire server stack across multiple machines, assuming you have `ssh` access to all reward server nodes.
> * Before running the script, you **must** specify the IP addresses of your reward server machines in the corresponding `.yml` configuration file.
> * If you cannot use `ssh` to control the nodes, please refer to the logic in `reward_server/start_multi_machines.sh` to manually start the proxy and server processes on each machine.
> * You can monitor the status of the proxy and servers by checking the log files in the `reward_server/logs/` directory.
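The real proxy lives in `reward_server/`; purely as an illustration of the fan-out pattern described above (not the actual implementation), the proxy's dispatch logic might be sketched as:

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class RewardProxy:
    """Round-robins scoring requests over a pool of reward-server workers.

    Illustrative only: real workers would be HTTP calls to the reward
    servers listed in the .yml config, not in-process callables.
    """

    def __init__(self, score_fns):
        self.score_fns = score_fns                      # one callable per reward server
        self.rr = itertools.cycle(range(len(score_fns)))
        self.pool = ThreadPoolExecutor(max_workers=len(score_fns))

    def score_batch(self, requests):
        # Dispatch each request to the next server in round-robin order,
        # then gather the results back in the original request order.
        futures = [self.pool.submit(self.score_fns[next(self.rr)], r)
                   for r in requests]
        return [f.result() for f in futures]
```

Because results are collected per-future, adding more reward servers only changes the pool size — which is what makes the multi-machine scaling described above straightforward.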
### 3.5 (Optional) Reward Server Sanity Check
To ensure the reward server is configured correctly and running as expected, we provide a sanity check script.
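The sanity-check script itself ships with the repository; as a rough standalone sketch (the endpoint URL, request payload, and the `score` response field are all assumptions, not the server's confirmed API), such a check boils down to posting one request to the proxy and validating the reply:

```python
import json
from urllib import request


def check_reward_server(proxy_url: str, payload: dict) -> float:
    """POST one scoring request and verify a numeric score comes back."""
    req = request.Request(
        proxy_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=30) as resp:
        body = json.loads(resp.read())
    return float(body["score"])  # "score" is an assumed response field
```

If the call raises or the response is not a valid number, the server stack (proxy IPs, per-node server processes) is worth re-checking via the logs in `reward_server/logs/` before launching a full training run.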