Once these steps are complete, your environment is ready to begin the reinforcement learning fine-tuning process.
### 4. Start RL Fine-Tuning
**Configure Training Parameters**
Before launching, you may need to adjust key parameters in the configuration file: `options/omnigen2_edit_rl_4machine_editscore7b_avg4.yml`.
Here are some important settings:
- `train.global_batch_size`: The total number of images generated across all GPUs in a single sampling phase before the policy is updated. It is calculated as `num_unique_prompts_per_sampling * num_images_per_prompt`.
- `train.batch_size`: Batch size per GPU, calculated as `batch_size_per_forward * gradient_accumulation_steps * num_update_steps_per_sampling`.
- `train.rl.num_images_per_prompt`: The number of candidate images to generate for each unique prompt.
- `train.rl.num_unique_prompts_per_sampling`: The number of unique prompts in a global batch.
- `train.rl.num_update_steps_per_sampling`: The number of gradient updates to perform in each sampling phase. Set this to `> 1` to enable off-policy RL, which improves sample efficiency.
- `train.rl.batch_size_per_forward`: Batch size for each forward pass. Together with `num_update_steps_per_sampling`, it defines the total number of samples processed per policy update.
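To make the arithmetic between these settings concrete, here is a minimal Python sketch using hypothetical values; the real numbers live in the YAML config, and the even-split consistency check across GPUs is an assumption, not something the config enforces.

```python
# Hypothetical values for illustration only; the actual settings live in the
# YAML config (e.g. options/omnigen2_edit_rl_4machine_editscore7b_avg4.yml).
num_unique_prompts_per_sampling = 16  # unique prompts per sampling phase
num_images_per_prompt = 4             # rollouts generated per prompt
batch_size_per_forward = 2            # samples per forward pass on one GPU
gradient_accumulation_steps = 2       # forward passes per gradient update
num_update_steps_per_sampling = 2     # > 1 enables off-policy updates

# Total images generated across all GPUs in one sampling phase
global_batch_size = num_unique_prompts_per_sampling * num_images_per_prompt

# Samples each GPU processes per sampling phase
batch_size = (batch_size_per_forward
              * gradient_accumulation_steps
              * num_update_steps_per_sampling)

# Assumed consistency check: if the sampled images are split evenly across
# GPUs, the per-GPU batch times the GPU count should cover the global batch.
num_gpus = 8  # hypothetical single-machine setup
assert global_batch_size == batch_size * num_gpus  # 64 == 8 * 8
```

With these example values, every image sampled in a phase is consumed exactly once per policy update; scaling any one setting requires rebalancing the others to keep the two sides equal.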
**Launch Distributed Training**
We provide scripts for both single and multi-machine distributed training based on **FSDP**.
```bash
# Single-machine training (8 GPUs) using EditScore-7B as the reward model
bash scripts/train/omnigen2_edit_rl.sh
```