Request for W&B/logs for RL reproduction

Hi, thanks for releasing the code.

I am trying to reproduce the DR.KERNEL-14B RL training using `drkernel/kernel/scripts/rl/14b_coldstart_trloo_mrs_pr_prs.sh`. However, my training dynamics look different from those reported in the paper.

In my run, entropy keeps decreasing and accuracy stays below the reported result across 170 steps:

* `actor/avg_entropy`: about `0.45 -> 0.13`, while the paper reports relatively stable entropy around `0.5`
* best last-turn Fast@1.2: about `0.20`, while the paper reports `0.25`

Would it be possible to share a successful W&B run, training log, or a partial/sanitized RL log?

