By Learning Path

Path 1: Open-Source Base Models

Llama 2
Llama 3
Qwen2
Qwen2.5
Mistral 7B / Mixtral 8x7B

Goal: build an overall picture of how mainstream open-source base models and instruct models have evolved.

Path 2: Post-Training

InstructGPT / RLHF
PPO
DPO
GRPO
topics/post_training.md

Goal: understand the progression from SFT to preference optimization and then to reasoning-oriented RL.
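As a concrete anchor for the preference-optimization step, here is a minimal sketch of the per-pair DPO loss, -log σ(β · margin), where the margin compares how much the policy and a frozen reference model each prefer the chosen response over the rejected one. It assumes you already have the summed log-probabilities of each full response under both models; the function name `dpo_loss` and the default `beta = 0.1` are illustrative choices, not taken from any particular library.

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    # Hypothetical helper. Inputs are summed log-probs log pi(y|x) of each full response.
    # Implicit reward margin: how much more the policy (relative to the frozen
    # reference model) prefers the chosen response over the rejected one.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z), computed without overflow for large |z|
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy == reference -> log(2) ≈ 0.6931
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))   # policy already prefers chosen -> smaller loss
```

Note that no reward model or sampling loop appears here: the reference log-probs play the role that the KL penalty and reward model play in the PPO-based RLHF pipeline.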

Path 3: Long Context

RoFormer / RoPE
ALiBi
Position Interpolation
YaRN
Ring Attention
topics/long_context.md

Goal: understand the common approaches to extending context length and the engineering cost each one carries.
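A minimal sketch of the mechanism the RoPE and Position Interpolation entries build on: RoPE rotates each pair of query/key dimensions by an angle proportional to the token position, so attention scores depend only on relative offsets, and Position Interpolation linearly shrinks positions so a longer window maps back into the position range seen in training. This assumes the interleaved-pair convention (dimensions 2i and 2i+1 form a pair) and base 10000; some codebases pair the first and second halves of the vector instead. The helper names are hypothetical.

```python
import math

def rope_rotate(x: list[float], position: float, base: float = 10000.0) -> list[float]:
    # Rotate each (x[2i], x[2i+1]) pair by angle position * base^(-2i/d); len(x) must be even
    d = len(x)
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out

def interpolate_position(position: int, train_len: int, target_len: int) -> float:
    # Position Interpolation: scale positions by train_len / target_len so every
    # position in the longer window falls back inside the trained range
    return position * min(1.0, train_len / target_len)

def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

q, k = [0.3, -1.2, 0.8, 0.5], [1.0, 0.4, -0.7, 0.2]
# Same relative offset (3) at different absolute positions gives the same score
s1 = dot(rope_rotate(q, 10), rope_rotate(k, 7))
s2 = dot(rope_rotate(q, 110), rope_rotate(k, 107))
print(abs(s1 - s2) < 1e-9)                      # True
print(interpolate_position(8191, 4096, 8192))   # 4095.5
```

The interpolated positions are fractional, which is why Position Interpolation typically needs some fine-tuning at the new length; YaRN refines this by treating different frequency bands differently rather than applying one uniform scale.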

Path 4: Architecture and Efficiency

Attention Is All You Need
FlashAttention
Switch Transformer
Mixtral 8x7B
DeepSeek-V3
topics/moe.md

Goal: understand the mainstream progression from dense Transformers to efficient attention and then to MoE.
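To make the MoE step concrete, here is a minimal sketch of sparse top-k routing in the Mixtral style: keep the k experts with the highest router logits, softmax over just those k, and return the gate-weighted sum of only the selected experts' outputs. It is a simplification under stated assumptions: the router logits are passed in rather than computed from a learned projection, the "experts" are toy scaling functions, and there is no load-balancing loss. Switch Transformer (k = 1) and DeepSeek-V3 (shared experts, a different scoring function) differ in these details. All names are hypothetical.

```python
import math
from typing import Callable

Expert = Callable[[list[float]], list[float]]

def top_k_route(router_logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    # Keep the k experts with the largest router logits
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k gate weights sum to 1
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x: list[float], router_logits: list[float],
              experts: list[Expert], k: int = 2) -> list[float]:
    # Only the k routed experts run; the output is their gate-weighted sum
    out = [0.0] * len(x)
    for i, w in top_k_route(router_logits, k):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# Four toy "experts" that just scale the input by 1, 2, 3, 4
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
routes = top_k_route([0.0, 2.0, 1.0, -1.0], k=2)
print([i for i, _ in routes])  # [1, 2]
print(moe_layer([1.0], [0.0, 2.0, 1.0, -1.0], experts))
```

The efficiency argument is visible in `moe_layer`: parameter count grows with the number of experts, but per-token compute grows only with k.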

Path 5: Reasoning Models

DeepSeekMath
DeepSeek-R1
GRPO
GSPO
topics/reasoning_rl.md

Goal: understand how gains in reasoning ability relate to the design choices of the reinforcement learning setup.
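One RL design choice that recurs in this path is GRPO's group-relative advantage, introduced in DeepSeekMath and used in DeepSeek-R1: sample several responses to the same prompt, score each one, and standardize the rewards within that group instead of training a separate value model as PPO does. The sketch below covers only that advantage computation under outcome-level rewards (each response's advantage would then apply to all of its tokens). It leaves out the clipped policy-ratio objective and the KL term. The use of population standard deviation and the `eps` value are assumptions, and implementations vary on both.

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # rewards: one scalar reward per sampled response, all for the SAME prompt.
    # Each response's advantage is its reward standardized within the group,
    # which stands in for the learned value baseline (critic) used in PPO.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]

# 4 sampled answers to one math problem, reward 1.0 if the final answer is correct
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # ≈ [1, -1, -1, 1]
```

A consequence worth noticing: if every response in a group gets the same reward (all correct or all wrong), every advantage is 0 and that prompt contributes no learning signal, which is one reason prompt difficulty matters in reasoning RL.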