update README

beanbun · beanbun · commit 421776514eeb · 2026-04-13T12:53:53.000+08:00
diff --git a/README.md b/README.md
@@ -99,7 +99,16 @@ Here is post-training result which **over 50% SFT data** comes from GraphGen and
 |   Math    |                          AIME24                           | **20.6** |              16.7              |
 |           |                          AIME25                           | **22.7** |              7.2               |
 
+### RLVR
+We applied reinforcement learning with verifiable rewards (RLVR) directly to the Qwen2.5-7B base model, without any prior SFT. The results are shown below.
+|  Domain   |                          Dataset                          |   Ours   | Qwen2.5-7B-Instruct (baseline) |
+|:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
+|   Plant   |                           SeedBench                       | **66.8** |              51.5              |
+|    Law    |                           LawBench                        | **55.2** |              54.76             |
+|  Medicine |                            MedQA                          | **87.1** |              80.7              |
+|  General  |                             BBH                           | **55.3** |              49.6              |
 
+More details can be found in `examples/generate/generate_masked_fill_in_blank_qa`.
 
 ## ⚙️ Support List