Commit 34c5d79

Fix link in README for ByteDance Seed paper
1 parent c474f3b commit 34c5d79

1 file changed: +1 -1 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LL
 ## Effectiveness of GraphGen
 ### Pretrain
 
-Inspired by Kimi-K2's [technical report](https://arxiv.org/pdf/2507.20534) (Improving Token Utility with Rephrasing) and ByteDance Seed's [Reformulation for Pretraining Data Augmentation](https://arxiv.org/abs/2507.15752) (MGA framework), GraphGen added a **rephrase pipeline** — using LLM-driven reformulation to generate diverse variants of the same corpus instead of redundant repetition.
+Inspired by Kimi-K2's [technical report](https://arxiv.org/pdf/2507.20534) (Improving Token Utility with Rephrasing) and ByteDance Seed's [Reformulation for Pretraining Data Augmentation](https://arxiv.org/abs/2502.04235) (MGA framework), GraphGen added a **rephrase pipeline** — using LLM-driven reformulation to generate diverse variants of the same corpus instead of redundant repetition.
 
 **Setup:** Qwen3-0.6B trained from scratch on [SlimPajama-6B](https://huggingface.co/datasets/DKYoon/SlimPajama-6B).
