|
1 | | -# EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework |
| 1 | +# RAVQA |
2 | 2 |
|
3 | 3 | [](https://github.com/hiyouga/EasyR1/stargazers) |
4 | 4 | [](https://twitter.com/llamafactory_ai) |
5 | 5 |
|
6 | | -This project is a clean fork of the original [veRL](https://github.com/volcengine/verl) project to support vision language models, we thank all the authors for providing such a high-performance RL training framework. |
| 6 | +## Document VQA |
7 | 7 |
|
8 | | -EasyR1 is efficient and scalable due to the design of **[HybirdEngine](https://arxiv.org/abs/2409.19256)** and the latest release of **[vLLM](https://github.com/vllm-project/vllm)**'s SPMD mode. |
| 8 | +### Dataset Preprocessing |
9 | 9 |
|
10 | | -## Features |
| 10 | +#### Corpus Building |
11 | 11 |
|
12 | | -- Supported models |
13 | | - - Llama3/Qwen2/Qwen2.5 language models |
14 | | - - Qwen2/Qwen2.5-VL vision language models |
15 | | - - DeepSeek-R1 distill models |
| 12 | +Set the raw data path and the target path in `rag_serving/build_corpus.py`, then build the corpus:
16 | 13 |
|
17 | | -- Supported algorithms |
18 | | - - GRPO |
19 | | - - Reinforce++ |
20 | | - - ReMax |
21 | | - - RLOO |
22 | | - |
23 | | -- Supported datasets |
24 | | - - Any text, vision-text dataset in a [specific format](#custom-dataset) |
25 | | - |
26 | | -- Supported tricks |
27 | | - - Padding-free training |
28 | | - - Resuming from checkpoint |
29 | | - - Wandb & SwanLab & Mlflow & Tensorboard tracking |
30 | | - |
31 | | -## Requirements |
32 | | - |
33 | | -### Software Requirements |
34 | | - |
35 | | -- Python 3.9+ |
36 | | -- transformers>=4.51.0 |
37 | | -- flash-attn>=2.4.3 |
38 | | -- vllm>=0.8.3 |
39 | | - |
40 | | -We provide a [Dockerfile](./Dockerfile) to easily build environments. |
41 | | - |
42 | | -We recommend using the [pre-built docker image](https://hub.docker.com/r/hiyouga/verl) in EasyR1. |
43 | | - |
44 | | -```bash |
45 | | -docker pull hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.3-flashinfer0.2.2-cxx11abi0 |
46 | | -``` |
47 | | - |
48 | | -### Hardware Requirements |
49 | | - |
50 | | -\* *estimated* |
51 | | - |
52 | | -| Method | Bits | 1.5B | 3B | 7B | 32B | |
53 | | -| ------------------------ | ---- | ------ | ------ | ------ | ------- | |
54 | | -| GRPO Full Fine-Tuning | AMP | 2*24GB | 4*40GB | 8*40GB | 16*80GB | |
55 | | -| GRPO Full Fine-Tuning | BF16 | 1*24GB | 1*40GB | 4*40GB | 8*80GB | |
56 | | - |
57 | | -> [!NOTE] |
58 | | -> Use `worker.actor.fsdp.torch_dtype=bf16` and `worker.actor.optim.strategy=adamw_bf16` to enable bf16 training. |
59 | | -> |
60 | | -> We are working hard to reduce the VRAM in RL training, LoRA support will be integrated in next updates. |
61 | | -
|
62 | | -## Tutorial: Run Qwen2.5-VL GRPO on [Geometry3K](https://huggingface.co/datasets/hiyouga/geometry3k) Dataset in Just 3 Steps |
63 | | - |
64 | | - |
65 | | - |
66 | | -### Installation |
67 | | - |
68 | | -```bash |
69 | | -git clone https://github.com/hiyouga/EasyR1.git |
70 | | -cd EasyR1 |
71 | | -pip install -e . |
72 | | -``` |
73 | | - |
74 | | -### GRPO Training |
75 | | - |
76 | | -```bash |
77 | | -bash examples/qwen2_5_vl_7b_geo3k_grpo.sh |
78 | | -``` |
79 | | - |
80 | | -### Merge Checkpoint in Hugging Face Format |
81 | | - |
82 | | -```bash |
83 | | -python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor |
84 | | -``` |
85 | | - |
86 | | -> [!TIP] |
87 | | -> If you encounter issues with connecting to Hugging Face, consider using `export HF_ENDPOINT=https://hf-mirror.com`. |
88 | | -> |
89 | | -> If you want to use SwanLab logger, consider using `bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh`. |
90 | | -
|
91 | | -## Custom Dataset |
92 | | - |
93 | | -Please refer to the example datasets to prepare your own dataset. |
94 | | - |
95 | | -- Text dataset: https://huggingface.co/datasets/hiyouga/math12k |
96 | | -- Image-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k |
97 | | -- Multi-image-text dataset: https://huggingface.co/datasets/hiyouga/journeybench-multi-image-vqa |
98 | | - |
99 | | -## How to Understand GRPO in EasyR1 |
100 | | - |
101 | | - |
102 | | - |
103 | | -- To learn about the GRPO algorithm, you can refer to [Hugging Face's blog](https://huggingface.co/docs/trl/v0.16.1/en/grpo_trainer). |
104 | | - |
105 | | -## How to Run 70B+ Model in Multi-node Environment |
106 | | - |
107 | | -1. Start the Ray head node. |
108 | | - |
109 | | -```bash |
110 | | -ray start --head --port=6379 --dashboard-host=0.0.0.0 |
111 | | -``` |
112 | | - |
113 | | -2. Start the Ray worker node and connect to the head node. |
114 | | - |
115 | | -```bash |
116 | | -ray start --address=<head_node_ip>:6379 |
| 14 | +```shell |
| 15 | +python rag_serving/build_corpus.py |
117 | 16 | ``` |
118 | 17 |
|
119 | | -3. Check the Ray resource pool. |
| 18 | +#### Image Index Building |
120 | 19 |
|
121 | | -```bash |
122 | | -ray status |
| 20 | +```shell |
| 21 | +python index_builder.py --retrieval_method vdr-2b-v1 --model_path llamaindex/vdr-2b-v1 --corpus_path /scratch-scc/projects/scc_ulsb_fe/yang/images_corpus/images.parquet --save_dir /scratch-scc/projects/scc_ulsb_fe/yang/images_index --max_length 512 --batch_size 128 --faiss_type Flat --index_modal image --sentence_transformer --save_embedding |
123 | 22 | ``` |
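The index-building command above hard-codes cluster-specific paths. A minimal sketch of a parameterized variant follows; it passes the same flags as the command above, one per line. The `CORPUS_PATH` and `SAVE_DIR` variables and their `/path/to/...` placeholder defaults are assumptions for illustration, not part of the repository.

```shell
# Hypothetical variables: point these at your own corpus and output directory.
CORPUS_PATH="${CORPUS_PATH:-/path/to/images_corpus/images.parquet}"
SAVE_DIR="${SAVE_DIR:-/path/to/images_index}"

# Collect the flags from the command above as positional parameters.
set -- \
  --retrieval_method vdr-2b-v1 \
  --model_path llamaindex/vdr-2b-v1 \
  --corpus_path "${CORPUS_PATH}" \
  --save_dir "${SAVE_DIR}" \
  --max_length 512 \
  --batch_size 128 \
  --faiss_type Flat \
  --index_modal image \
  --sentence_transformer \
  --save_embedding

if [ -f index_builder.py ]; then
  # Run the real builder when the script is in the current directory.
  python index_builder.py "$@"
else
  # Dry run: only print the command that would be executed.
  echo "python index_builder.py $*"
fi
```

Overriding the variables on the command line (for example `CORPUS_PATH=... SAVE_DIR=... sh build_index.sh`, where `build_index.sh` is a hypothetical file name) avoids editing the script for each machine.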
124 | 23 |
|
125 | | -4. Run training script on the Ray head node only. |
126 | | - |
127 | | -```bash |
128 | | -bash examples/qwen2_5_vl_7b_geo3k_grpo.sh |
129 | | -``` |
130 | | - |
131 | | -See the **[veRL's official doc](https://verl.readthedocs.io/en/latest/start/multinode.html)** for more details about multi-node training and Ray debugger. |
132 | | - |
133 | | -## Other Baselines |
134 | | - |
135 | | -We also reproduced the following two baselines of the [R1-V](https://github.com/deep-agent/R1-V) project. |
136 | | -- [CLEVR-70k-Counting](examples/baselines/qwen2_5_vl_3b_clevr.sh): Train the Qwen2.5-VL-3B-Instruct model on counting problem. |
137 | | -- [GeoQA-8k](examples/baselines/qwen2_5_vl_3b_geoqa8k.sh): Train the Qwen2.5-VL-3B-Instruct model on GeoQA problem. |
138 | | - |
139 | | -## Awesome Work using EasyR1 |
140 | | - |
141 | | -- **MMR1**: Advancing the Frontiers of Multimodal Reasoning. [![[code]](https://img.shields.io/github/stars/LengSicong/MMR1)](https://github.com/LengSicong/MMR1) |
142 | | -- **Vision-R1**: Incentivizing Reasoning Capability in Multimodal Large Language Models. [![[code]](https://img.shields.io/github/stars/Osilly/Vision-R1)](https://github.com/Osilly/Vision-R1) [![[arxiv]](https://img.shields.io/badge/arxiv-2503.06749-blue)](https://arxiv.org/abs/2503.06749) |
143 | | -- **Seg-Zero**: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. [![[code]](https://img.shields.io/github/stars/dvlab-research/Seg-Zero)](https://github.com/dvlab-research/Seg-Zero) [![[arxiv]](https://img.shields.io/badge/arxiv-2503.06520-blue)](https://arxiv.org/abs/2503.06520) |
144 | | -- **MetaSpatial**: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse. [![[code]](https://img.shields.io/github/stars/PzySeere/MetaSpatial)](https://github.com/PzySeere/MetaSpatial) [![[arxiv]](https://img.shields.io/badge/arxiv-2503.18470-blue)](https://arxiv.org/abs/2503.18470) |
145 | | -- **Temporal-R1**: Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward. [![[code]](https://img.shields.io/github/stars/appletea233/Temporal-R1)](https://github.com/appletea233/Temporal-R1) |
146 | | -- **NoisyRollout**: Reinforcing Visual Reasoning with Data Augmentation. [![[code]](https://img.shields.io/github/stars/John-AI-Lab/NoisyRollout)](https://github.com/John-AI-Lab/NoisyRollout) [![[arxiv]](https://img.shields.io/badge/arxiv-2504.13055-blue)](https://arxiv.org/pdf/2504.13055) |
147 | | -- **GUI-R1**: A Generalist R1-Style Vision-Language Action Model For GUI Agents. [![[code]](https://img.shields.io/github/stars/ritzz-ai/GUI-R1)](https://github.com/ritzz-ai/GUI-R1) [![[arxiv]](https://img.shields.io/badge/arxiv-2504.10458-blue)](https://arxiv.org/abs/2504.10458) |
148 | | - |
149 | | -## TODO |
150 | | - |
151 | | -- Support LoRA (high priority). |
152 | | -- Support ulysses parallelism for VLMs (middle priority). |
153 | | -- Support more VLM architectures. |
154 | | - |
155 | | -> [!NOTE] |
156 | | -> We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). |
157 | | -
|
158 | | -### Known bugs |
159 | | - |
160 | | -These features are temporarily disabled for now, we plan to fix them one-by-one in the future updates. |
161 | | - |
162 | | -- Vision language models are not compatible with ulysses parallelism yet. |
| 24 | +### Launch RL |
163 | 25 |
|
164 | | -## Discussion Group |
| 26 | +#### Tool Environment Serving |
165 | 27 |
|
166 | | -👋 Join our [WeChat group](assets/wechat.jpg). |
| 28 | +1. Get the IP address of the server:
167 | 29 |
|
168 | | -## FAQs |
| 30 | + ```shell |
| 31 | + hostname --ip-address |
| 32 | + ``` |
169 | 33 |
|
170 | | -> ValueError: Image features and image tokens do not match: tokens: 8192, features 9800 |
| 34 | +2. Start the serving process:
171 | 35 |
|
172 | | -Increase the `data.max_prompt_length` or reduce the `data.max_pixels`. |
| 36 | + ```shell |
| 37 | + python rag_serving/serving.py --config rag_serving/serving_config.yaml --num_retriever 4 --port 42354 |
| 38 | + ``` |
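The two steps above yield an IP address and a port that the training side needs in order to reach the tool server. A minimal sketch of combining them into one address is shown below. The port `42354` comes from the serving command; the `http://` scheme, the `TOOL_SERVER_URL` variable name, and the `127.0.0.1` fallback are assumptions for illustration, and the README does not say how the training script consumes this address.

```shell
# Take the first address reported by hostname; fall back to localhost
# if the flag is unsupported or nothing is returned (assumed fallback).
SERVER_IP="$(hostname --ip-address 2>/dev/null | awk '{print $1}')"
SERVER_IP="${SERVER_IP:-127.0.0.1}"

# Port passed to rag_serving/serving.py via --port.
PORT=42354

# Hypothetical variable holding the address of the tool server.
TOOL_SERVER_URL="http://${SERVER_IP}:${PORT}"
echo "${TOOL_SERVER_URL}"
```

On hosts with several network interfaces, `hostname --ip-address` may list more than one address, so check that the first one is reachable from the training nodes.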
173 | 39 |
|
174 | | -> RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62 |
| 40 | +#### RL Training |
175 | 41 |
|
176 | | -Reduce the `worker.rollout.gpu_memory_utilization` and enable `worker.actor.offload.offload_params`. |
177 | 42 |
|
178 | | -> RuntimeError: 0 active drivers ([]). There should only be one. |
179 | 43 |
|
180 | | -Uninstall `deepspeed` from the current python environment. |
| 44 | +## General VQA |