
Commit 5f6c43e

Merge pull request #858 from sergiopaniego/fix-name
Fixed links inside `<Tip>` rendering
2 parents 8c7a64f + 3603032 · commit 5f6c43e

7 files changed

Lines changed: 13 additions & 1 deletion

chapters/en/chapter12/1.mdx

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ Don't worry if you're missing some of these – we'll explain key concepts as we
 
 <Tip>
 
-If you don't have all the prerequisites, check out this [course](chapter1/1.mdx) from units 1 to 11
+If you don't have all the prerequisites, check out this [course](chapter1/1.mdx) from units 1 to 11.
 
 </Tip>
 
chapters/en/chapter12/2.mdx

Lines changed: 2 additions & 0 deletions
@@ -5,7 +5,9 @@ Welcome to the first page!
 We're going to start our journey into the exciting world of Reinforcement Learning (RL) and discover how it's revolutionizing the way we train Language Models like the ones you might use every day.
 
 <Tip>
+
 In this chapter, we are focusing on reinforcement learning for language models. However, reinforcement learning is a broad field with many applications beyond language models. If you're interested in learning more about reinforcement learning, you should check out the [Deep Reinforcement Learning course](https://huggingface.co/courses/deep-rl-course/en/unit1/introduction).
+
 </Tip>
 
 This page will give you a friendly and clear introduction to RL, even if you've never encountered it before. We'll break down the core ideas and see why RL is becoming so important in the field of Large Language Models (LLMs).

chapters/en/chapter12/3.mdx

Lines changed: 2 additions & 0 deletions
@@ -11,7 +11,9 @@ In the next chapter, we will build on this knowledge and implement GRPO in pract
 The initial goal of the paper was to explore whether pure reinforcement learning could develop reasoning capabilities without supervised fine-tuning.
 
 <Tip>
+
 Up until that point, all the popular LLMs required some supervised fine-tuning, which we explored in [chapter 11](/chapters/en/chapter11/1).
+
 </Tip>
 
 ## The Breakthrough 'Aha' Moment

chapters/en/chapter12/3a.mdx

Lines changed: 2 additions & 0 deletions
@@ -1,7 +1,9 @@
 # Advanced Understanding of Group Relative Policy Optimization (GRPO) in DeepSeekMath
 
 <Tip>
+
 This section dives into the technical and mathematical details of GRPO. It was authored by [Shirin Yamani](https://github.com/shirinyamani).
+
 </Tip>
 
 Let's deepen our understanding of GRPO so that we can improve our model's training process.

chapters/en/chapter12/4.mdx

Lines changed: 2 additions & 0 deletions
@@ -5,7 +5,9 @@ In this page, we'll learn how to implement Group Relative Policy Optimization (G
 We'll explore the core concepts of GRPO as they are embodied in TRL's GRPOTrainer, using snippets from the official TRL documentation to guide us.
 
 <Tip>
+
 This chapter is aimed at TRL beginners. If you are already familiar with TRL, you might want to also check out the [Open R1 implementation](https://github.com/huggingface/open-r1/blob/main/src/open_r1/grpo.py) of GRPO.
+
 </Tip>
 
 First, let's remind ourselves of some of the important concepts of GRPO algorithm:
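For orientation while reading this hunk: the chapter it edits is built around TRL's `GRPOTrainer`. A minimal training setup, sketched here on the assumption that it follows the quickstart-style API in the TRL documentation, looks roughly like this; the model id, dataset, and reward function are placeholders for illustration and are not taken from this commit.

```python
# Rough GRPO sketch with TRL (illustrative only; model, dataset, and reward are placeholders).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: score each completion by how many distinct characters it uses.
def reward_unique_chars(completions, **kwargs):
    return [float(len(set(completion))) for completion in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # example prompt dataset

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",         # any causal LM checkpoint id
    reward_funcs=reward_unique_chars,         # one or more reward functions
    args=GRPOConfig(output_dir="grpo-demo"),
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions for each prompt, scores them with the reward function(s), and normalizes the scores within the group to obtain relative advantages.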

chapters/en/chapter12/5.mdx

Lines changed: 2 additions & 0 deletions
@@ -9,7 +9,9 @@
 Now that you've seen the theory, let's put it into practice! In this exercise, you'll fine-tune a model with GRPO.
 
 <Tip>
+
 This exercise was written by LLM fine-tuning expert [@mlabonne](https://huggingface.co/mlabonne).
+
 </Tip>
 
 ## Install dependencies

chapters/en/chapter12/6.mdx

Lines changed: 2 additions & 0 deletions
@@ -71,7 +71,9 @@ model = FastLanguageModel.get_peft_model(
 This code loads the model in 4-bit quantization to save memory and applies LoRA (Low-Rank Adaptation) for efficient fine-tuning. The `target_modules` parameter specifies which layers of the model to fine-tune, and `use_gradient_checkpointing` enables training with longer contexts.
 
 <Tip>
+
 We won't cover the details of LoRA in this chapter, but you can learn more in [Chapter 11](/en/chapter11/3).
+
 </Tip>
 
 ## Data Preparation
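The hunk context above (`model = FastLanguageModel.get_peft_model(`) and the line describing 4-bit loading refer to an Unsloth-based setup. For reference only, such a setup typically looks something like the sketch below; the model name, LoRA rank, and `target_modules` list are illustrative assumptions, not values taken from this commit.

```python
# Illustrative Unsloth setup: 4-bit base model plus LoRA adapters (values are assumptions).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-3B-Instruct",  # placeholder model id
    max_seq_length=1024,
    load_in_4bit=True,                      # 4-bit quantization to save memory
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                   # LoRA rank
    target_modules=[                        # layers that receive LoRA adapters
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",   # enables training with longer contexts
)
```

The base weights stay frozen in 4-bit precision while only the small LoRA adapter matrices on the listed projection layers are trained, which is what keeps memory usage low enough for longer-context fine-tuning.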
