Hi, thanks for releasing the code.
I am trying to reproduce the DR.KERNEL-14B RL training using drkernel/kernel/scripts/rl/14b_coldstart_trloo_mrs_pr_prs.sh. However, my training dynamics look different from those reported in the paper.
In my run, entropy keeps decreasing, and the accuracy stays below the reported result across 170 steps.
- actor/avg_entropy: about 0.45 -> 0.13, while the paper reports relatively stable entropy around 0.5
- best last-turn Fast@1.2: about 0.20, while the paper reports 0.25
Would it be possible to share a successful W&B run, training log, or a partial/sanitized RL log?