
Commit 3b09385

prepare shell script, add micro ppo batch, fix bug inside vllm_rollout, update README
1 parent 8a0d3d4 commit 3b09385

19 files changed

Lines changed: 96 additions & 967 deletions

.gitignore

Lines changed: 9 additions & 0 deletions
@@ -178,3 +178,12 @@ wandb/
 dataset/
 generation_results/
 backups/
+
+examples/baselines/qwen2_5_vl_3b_clevr.sh
+examples/baselines/qwen2_5_vl_3b_geoqa8k.sh
+examples/baselines/qwen2_5_vl_7b_doc_agent.sh
+examples/baselines/qwen2_5_vl_7b_doc_agent_generation_SCC.sh
+examples/baselines/qwen2_5_vl_7b_doc_agent_NHR.sh
+examples/baselines/qwen2_5_vl_7b_doc_agent_ppo_NHR.sh
+examples/baselines/qwen2_5_vl_7b_doc_agent_ppo_SCC.sh
+examples/baselines/qwen2_5_vl_7b_doc_agent_SCC.sh

.idea/deployment.xml

Lines changed: 8 additions & 1 deletion
Some generated files are not rendered by default.

.idea/sshConfigs.xml

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

.idea/webServers.xml

Lines changed: 7 additions & 0 deletions
Some generated files are not rendered by default.

README.md

Lines changed: 5 additions & 3 deletions
@@ -40,7 +40,7 @@ pip install -e .
 
 ### 1. Corpus Building
 
-We provide the processed training corpus on Hugging Face: **[SkyFishQ/ALDEN](https://www.google.com/search?q=https://huggingface.co/SkyFishQ/ALDEN)**.
+We provide the processed training corpus on Hugging Face: **[SkyFishQ/ALDEN](https://huggingface.co/datasets/SkyFishQ/ALDEN/tree/main)**.
 
 If you wish to build the corpus from scratch using your own data:
 
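The link removed in this hunk is a Google search URL that carries the intended Hugging Face address in its `q` parameter, so it opened a search page and not the dataset. A minimal sketch using Python's standard `urllib.parse` makes that visible. Both URLs are copied from the hunk; the helper name `wrapped_target` is illustrative only and not part of the repository.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Both URLs are copied verbatim from the README hunk.
old_link = "https://www.google.com/search?q=https://huggingface.co/SkyFishQ/ALDEN"
new_link = "https://huggingface.co/datasets/SkyFishQ/ALDEN/tree/main"


def wrapped_target(url: str) -> Optional[str]:
    """Return the URL carried in a `q` query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("q")
    return values[0] if values else None


print(urlparse(old_link).netloc)  # host that the old link actually opened
print(wrapped_target(old_link))   # address buried inside the search query
print(wrapped_target(new_link))   # None, because the new link is direct
```

Note that the address wrapped in the old link also lacks the `datasets/` path segment that the corrected link includes.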
@@ -147,6 +147,8 @@ First, launch the RAG environment server which handles the `<search>` and `<fetc
 --port 42354
 ```
 
+*We initially set two retrievers for each GPU. Adapt the number of GPU and retriever according to the specific devices in the yaml file.*
+
 ### Step 2: RL Training
 
 Once the tool server is running, start the training. Ensure the server URL in the training script points to the IP obtained in Step 1.
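The note added in this hunk describes a default of two retrievers per GPU. The YAML schema itself is not shown in this commit, so the sketch below only illustrates the arithmetic of that default. The function name `assign_retrievers` and its parameters are hypothetical and are not taken from the repository.

```python
from typing import List


def assign_retrievers(num_gpus: int, retrievers_per_gpu: int = 2) -> List[int]:
    """Return the GPU index that each retriever worker would be placed on.

    Hypothetical helper: the default of 2 mirrors the README note, but the
    real configuration keys in the YAML file are not visible in this diff.
    """
    if num_gpus < 1 or retrievers_per_gpu < 1:
        raise ValueError("num_gpus and retrievers_per_gpu must both be >= 1")
    return [gpu for gpu in range(num_gpus) for _ in range(retrievers_per_gpu)]


placement = assign_retrievers(num_gpus=4)
print(len(placement))  # total number of retriever workers
print(placement)       # GPU index for each worker, in launch order
```

When moving to a machine with a different GPU count, the total number of retrievers scales with `num_gpus`, so both values should be changed together in the configuration.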
@@ -168,7 +170,7 @@ bash examples/baselines/qwen2_5_vl_7b_doc_agent_generation.sh
 ### Merge Checkpoints in the Hugging Face Format
 
 ```bash
-python3 scripts/model_merger.py \
+python scripts/model_merger.py \
 --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
 ```
 
@@ -194,4 +196,4 @@ This work is built upon the following excellent open-source projects:
 - [verl](https://github.com/volcengine/verl): For efficient RL training.
 - [ReCall](https://github.com/Agent-RL/ReCall): For RAG integration concepts.
 
-We greatly appreciate their valuable contributions to the community.
+We greatly appreciate their valuable contributions to the community.
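The merge command in the README diff points at `checkpoints/easy_r1/exp_name/global_step_1/actor`, where `exp_name` and the step number are placeholders. As a rough sketch, the same invocation can be assembled for any experiment and step. Only the script path, the `--local_dir` flag, and the directory layout come from the diff; the helper `merger_command` and its parameters are hypothetical.

```python
import shlex
from pathlib import Path
from typing import List


def merger_command(
    exp_name: str,
    global_step: int,
    root: str = "checkpoints/easy_r1",  # directory prefix shown in the README
) -> List[str]:
    """Build the argument list for scripts/model_merger.py (hypothetical helper)."""
    actor_dir = Path(root) / exp_name / f"global_step_{global_step}" / "actor"
    return ["python", "scripts/model_merger.py", "--local_dir", actor_dir.as_posix()]


cmd = merger_command("exp_name", 1)
print(shlex.join(cmd))  # shell-quoted form of the command
```

The list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Since this commit switches the README from `python3` to `python`, the command assumes a `python` executable on `PATH` that resolves to a Python 3 interpreter.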

examples/baselines/qwen2_5_vl_3b_clevr.sh

Lines changed: 0 additions & 14 deletions
This file was deleted.

examples/baselines/qwen2_5_vl_3b_geoqa8k.sh

Lines changed: 0 additions & 88 deletions
This file was deleted.

examples/baselines/qwen2_5_vl_7b_doc_agent.sh

Lines changed: 0 additions & 126 deletions
This file was deleted.
