@@ -99,7 +99,16 @@ Here is post-training result which **over 50% SFT data** comes from GraphGen and
 | Math | AIME24 | **20.6** | 16.7 |
 |      | AIME25 | **22.7** | 7.2 |
 
+### RLVR
+We applied reinforcement learning directly to the Qwen2.5-7B base model without any prior SFT. Here are the results.
+| Domain   | Dataset   | Ours     | Qwen2.5-7B-Instruct (baseline) |
+| :------: | :-------: | :------: | :----------------------------: |
+| Plant    | SeedBench | **66.8** | 51.5                           |
+| Law      | LawBench  | **55.2** | 54.76                          |
+| Medicine | MedQA     | **87.1** | 80.7                           |
+| General  | BBH       | **55.3** | 49.6                           |
 
+More details can be found at `examples/generate/generate_masked_fill_in_blank_qa`
 
 ## ⚙️ Support List
 
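As a reading aid for the RLVR table added in the hunk above, the following minimal sketch computes the absolute gain of "Ours" over the Qwen2.5-7B-Instruct baseline for each dataset. The scores are copied from the table; the names `RLVR_RESULTS` and `absolute_gains` are hypothetical and are not part of GraphGen, and the sketch assumes both columns are reported on the same scale.

```python
# Scores copied from the RLVR table: (ours, Qwen2.5-7B-Instruct baseline).
# RLVR_RESULTS and absolute_gains are hypothetical names used only in this sketch.
RLVR_RESULTS = {
    "SeedBench": (66.8, 51.5),
    "LawBench": (55.2, 54.76),
    "MedQA": (87.1, 80.7),
    "BBH": (55.3, 49.6),
}


def absolute_gains(results):
    """Return ours minus baseline for each dataset, rounded to 2 decimals."""
    return {name: round(ours - base, 2) for name, (ours, base) in results.items()}


if __name__ == "__main__":
    for name, gain in absolute_gains(RLVR_RESULTS).items():
        print(f"{name}: {gain:+.2f} points")
```

Read this way, the gap is widest on SeedBench and narrowest on LawBench, where the two models are less than half a point apart.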