Request for W&B/logs for RL reproduction #5

@slchenchn

Hi, thanks for releasing the code.

I am trying to reproduce the DR.KERNEL-14B RL training using drkernel/kernel/scripts/rl/14b_coldstart_trloo_mrs_pr_prs.sh. However, my training dynamics look different from those reported in the paper.

Over 170 steps, entropy decreases steadily in my run, and accuracy stays below the reported result throughout.

  • actor/avg_entropy: drops from about 0.45 to 0.13, whereas the paper reports relatively stable entropy of around 0.5
  • best last-turn Fast@1.2: peaks at about 0.20, whereas the paper reports 0.25

Would it be possible to share the W&B run or training log from a successful run, or even a partial or sanitized RL log?