In Chapter 12 of the LLM Course, in the file `3b.mdx` where the target function is explained, there is a missing comma after the first $A_i$. This can be misleading, because the $\min$ is taken over two separate terms. Here is the corrected equation:
$$J_{GRPO}(\theta) = \left[\frac{1}{G} \sum_{i=1}^{G} \min \left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} A_i, \text{clip}\left( \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)}, 1 - \epsilon, 1 + \epsilon \right) A_i \right)\right]- \beta D_{KL}(\pi_{\theta} || \pi_{ref})$$
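To illustrate why the comma matters, here is a minimal sketch of the corrected objective. The $\min$ compares two arguments per output: the unclipped term $r_i A_i$ and the clipped term $\text{clip}(r_i, 1-\epsilon, 1+\epsilon) A_i$, where $r_i$ is the probability ratio. It assumes the ratios and advantages $A_i$ are already computed, and it treats $D_{KL}(\pi_{\theta} || \pi_{ref})$ as a precomputed scalar. The names `grpo_objective`, `ratios`, and `kl` are hypothetical and do not come from the course code.

```python
def clip(x: float, low: float, high: float) -> float:
    """Clamp x into the interval [low, high]."""
    return max(low, min(x, high))


def grpo_objective(ratios, advantages, epsilon, beta, kl):
    """Hypothetical sketch of J_GRPO for one group of G outputs.

    ratios[i]     -- assumed precomputed pi_theta(o_i|q) / pi_theta_old(o_i|q)
    advantages[i] -- assumed precomputed A_i
    kl            -- assumed precomputed scalar D_KL(pi_theta || pi_ref)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        unclipped = r * a                                 # first argument of min
        clipped = clip(r, 1 - epsilon, 1 + epsilon) * a   # second argument of min
        total += min(unclipped, clipped)
    # Average over the group, then subtract the KL penalty.
    return total / len(ratios) - beta * kl


# With a positive advantage the clipped term caps the gain (1.5 -> 1.2);
# with a negative advantage the unclipped-vs-clipped min keeps the worse value.
print(grpo_objective([1.5, 0.5], [1.0, -1.0], epsilon=0.2, beta=0.0, kl=0.0))
```

Reading the equation without the comma, it could look as if $\min$ had a single argument, which makes no sense. The sketch makes the two-argument structure explicit.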