
Commit 67281d5

superfarther, beanbun, ChenZiHong-Gavin, and gemini-code-assist[bot] authored
update README (#186)
* update README
* Apply suggestion from @gemini-code-assist[bot]
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Apply suggestion from @gemini-code-assist[bot]
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Apply suggestion from @gemini-code-assist[bot]
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: beanbun <yuanzhonghang@pjlab.org.cn>
Co-authored-by: chenzihong <58508660+ChenZiHong-Gavin@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent 39e2352 commit 67281d5

1 file changed

Lines changed: 9 additions & 0 deletions


README.md

@@ -99,7 +99,16 @@ Here is post-training result which **over 50% SFT data** comes from GraphGen and
 | Math | AIME24 | **20.6** | 16.7 |
 | | AIME25 | **22.7** | 7.2 |
 
+### RLVR
+We applied reinforcement learning directly to the Qwen2.5-7B base model without any prior SFT. Here are the results.
+| Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
+|:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
+| Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |
+| Law | LawBench | **55.2** | 54.76 |
+| Medicine | MedQA | **87.1** | 80.7 |
+| General | BBH | **55.3** | 49.6 |
 
+More details can be found at [`examples/generate/generate_masked_fill_in_blank_qa`](./examples/generate/generate_masked_fill_in_blank_qa).
 
 ## ⚙️ Support List
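
To make the added RLVR table easier to read at a glance, the scores can be turned into absolute gains over the Qwen2.5-7B-Instruct baseline. The following is a minimal sketch: the scores are copied from the diff, while `RLVR_RESULTS` and `absolute_gains` are illustrative names that do not come from the GraphGen repository.

```python
# Scores copied from the RLVR table added in this commit: (ours, baseline).
# The dictionary and function names are hypothetical, for illustration only.
RLVR_RESULTS = {
    "SeedBench": (66.8, 51.5),
    "LawBench": (55.2, 54.76),
    "MedQA": (87.1, 80.7),
    "BBH": (55.3, 49.6),
}


def absolute_gains(results):
    """Return (ours - baseline) per dataset, rounded to 2 decimals to hide float noise."""
    return {name: round(ours - base, 2) for name, (ours, base) in results.items()}


if __name__ == "__main__":
    for name, gain in absolute_gains(RLVR_RESULTS).items():
        print(f"{name}: +{gain} points")
```

Under these numbers the largest gain is on SeedBench (15.3 points) and the smallest on LawBench (0.44 points). The LawBench baseline is reported to two decimals in the table while the other scores use one.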
