perf(rlhf): batch chosen+rejected forwards to cut DPO from 4 to 2 passes

669b64b
Draft

Implement RLHF DPO (Direct Preference Optimization) training #1403

Workflow runs completed with no jobs