Commit 641008c
committed
feat(trainer): add DPO trainer with FSDP backend
Add Direct Preference Optimization (Rafailov et al. 2023) as a new
trainer. The policy is directly optimized to prefer chosen over
rejected responses via a contrastive loss on log-probability ratios
against a frozen reference model, removing the need for a separately
trained reward model.
Reference logprobs are computed online each step by a colocated ref
engine, following the PPO/GRPO pattern. FSDP is the supported backend;
Megatron and Archon variants raise NotImplementedError as placeholders.
Verified on Qwen2.5-7B-Base + Anthropic/hh-rlhf (1 epoch, no SFT):
reward_accuracy rises from 0.50 to ~0.70 and reward_margin grows
monotonically, matching the original DPO paper's HH-RLHF results.1 parent e6f3c3c commit 641008c
24 files changed
Lines changed: 2051 additions & 10 deletions
File tree
- areal
- api
- dataset
- engine
- experimental/engine
- trainer
- dpo
- utils/functional
- docs
- en
- algorithms
- zh
- algorithms
- examples/alignment
- tests
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
197 | 197 | | |
198 | 198 | | |
199 | 199 | | |
200 | | - | |
| 200 | + | |
| 201 | + | |
201 | 202 | | |
202 | 203 | | |
203 | 204 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
18 | | - | |
19 | | - | |
| 18 | + | |
| 19 | + | |
20 | 20 | | |
21 | 21 | | |
| 22 | + | |
22 | 23 | | |
23 | 24 | | |
24 | 25 | | |
| |||
29 | 30 | | |
30 | 31 | | |
31 | 32 | | |
| 33 | + | |
32 | 34 | | |
33 | 35 | | |
34 | 36 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2637 | 2637 | | |
2638 | 2638 | | |
2639 | 2639 | | |
| 2640 | + | |
| 2641 | + | |
| 2642 | + | |
| 2643 | + | |
| 2644 | + | |
| 2645 | + | |
| 2646 | + | |
| 2647 | + | |
| 2648 | + | |
| 2649 | + | |
| 2650 | + | |
| 2651 | + | |
| 2652 | + | |
| 2653 | + | |
| 2654 | + | |
| 2655 | + | |
| 2656 | + | |
| 2657 | + | |
| 2658 | + | |
| 2659 | + | |
| 2660 | + | |
| 2661 | + | |
| 2662 | + | |
| 2663 | + | |
| 2664 | + | |
| 2665 | + | |
| 2666 | + | |
| 2667 | + | |
| 2668 | + | |
| 2669 | + | |
| 2670 | + | |
| 2671 | + | |
| 2672 | + | |
| 2673 | + | |
| 2674 | + | |
| 2675 | + | |
| 2676 | + | |
| 2677 | + | |
| 2678 | + | |
| 2679 | + | |
| 2680 | + | |
| 2681 | + | |
| 2682 | + | |
| 2683 | + | |
| 2684 | + | |
2640 | 2685 | | |
2641 | 2686 | | |
2642 | 2687 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
113 | 113 | | |
114 | 114 | | |
115 | 115 | | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
116 | 126 | | |
117 | 127 | | |
118 | 128 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
| 9 | + | |
9 | 10 | | |
10 | 11 | | |
11 | 12 | | |
12 | 13 | | |
13 | 14 | | |
| 15 | + | |
14 | 16 | | |
15 | 17 | | |
16 | 18 | | |
| |||
21 | 23 | | |
22 | 24 | | |
23 | 25 | | |
| 26 | + | |
24 | 27 | | |
25 | 28 | | |
26 | 29 | | |
27 | 30 | | |
28 | 31 | | |
| 32 | + | |
29 | 33 | | |
30 | 34 | | |
31 | 35 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1958 | 1958 | | |
1959 | 1959 | | |
1960 | 1960 | | |
| 1961 | + | |
| 1962 | + | |
| 1963 | + | |
| 1964 | + | |
| 1965 | + | |
| 1966 | + | |
| 1967 | + | |
| 1968 | + | |
| 1969 | + | |
| 1970 | + | |
| 1971 | + | |
| 1972 | + | |
| 1973 | + | |
| 1974 | + | |
| 1975 | + | |
| 1976 | + | |
| 1977 | + | |
| 1978 | + | |
| 1979 | + | |
| 1980 | + | |
| 1981 | + | |
| 1982 | + | |
| 1983 | + | |
| 1984 | + | |
| 1985 | + | |
| 1986 | + | |
| 1987 | + | |
| 1988 | + | |
| 1989 | + | |
| 1990 | + | |
| 1991 | + | |
| 1992 | + | |
| 1993 | + | |
| 1994 | + | |
| 1995 | + | |
| 1996 | + | |
| 1997 | + | |
| 1998 | + | |
| 1999 | + | |
| 2000 | + | |
| 2001 | + | |
| 2002 | + | |
| 2003 | + | |
| 2004 | + | |
| 2005 | + | |
| 2006 | + | |
| 2007 | + | |
| 2008 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1930 | 1930 | | |
1931 | 1931 | | |
1932 | 1932 | | |
| 1933 | + | |
| 1934 | + | |
| 1935 | + | |
| 1936 | + | |
| 1937 | + | |
| 1938 | + | |
| 1939 | + | |
| 1940 | + | |
| 1941 | + | |
| 1942 | + | |
| 1943 | + | |
| 1944 | + | |
| 1945 | + | |
| 1946 | + | |
| 1947 | + | |
| 1948 | + | |
| 1949 | + | |
| 1950 | + | |
| 1951 | + | |
| 1952 | + | |
| 1953 | + | |
| 1954 | + | |
| 1955 | + | |
| 1956 | + | |
| 1957 | + | |
| 1958 | + | |
| 1959 | + | |
| 1960 | + | |
| 1961 | + | |
| 1962 | + | |
| 1963 | + | |
| 1964 | + | |
| 1965 | + | |
| 1966 | + | |
| 1967 | + | |
| 1968 | + | |
| 1969 | + | |
| 1970 | + | |
| 1971 | + | |
| 1972 | + | |
| 1973 | + | |
| 1974 | + | |
| 1975 | + | |
| 1976 | + | |
| 1977 | + | |
| 1978 | + | |
| 1979 | + | |
| 1980 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1468 | 1468 | | |
1469 | 1469 | | |
1470 | 1470 | | |
| 1471 | + | |
| 1472 | + | |
| 1473 | + | |
| 1474 | + | |
| 1475 | + | |
| 1476 | + | |
| 1477 | + | |
| 1478 | + | |
| 1479 | + | |
| 1480 | + | |
| 1481 | + | |
| 1482 | + | |
| 1483 | + | |
| 1484 | + | |
| 1485 | + | |
| 1486 | + | |
| 1487 | + | |
| 1488 | + | |
| 1489 | + | |
| 1490 | + | |
| 1491 | + | |
| 1492 | + | |
| 1493 | + | |
| 1494 | + | |
| 1495 | + | |
| 1496 | + | |
| 1497 | + | |
| 1498 | + | |
| 1499 | + | |
| 1500 | + | |
| 1501 | + | |
| 1502 | + | |
| 1503 | + | |
| 1504 | + | |
| 1505 | + | |
| 1506 | + | |
| 1507 | + | |
| 1508 | + | |
| 1509 | + | |
| 1510 | + | |
| 1511 | + | |
| 1512 | + | |
| 1513 | + | |
| 1514 | + | |
| 1515 | + | |
| 1516 | + | |
| 1517 | + | |
| 1518 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
| 3 | + | |
3 | 4 | | |
4 | 5 | | |
5 | 6 | | |
6 | 7 | | |
7 | | - | |
| 8 | + | |
0 commit comments