Commit e35e80c
[Feature] Support share MTP weights. (#1672)
* Refactor the MTP configuration to support weight sharing across layers. Update the MoE and MTPBlock classes to handle shared weights and adjust layer initialization accordingly. Add a share_weights parameter to MTPConfig to control whether MTP layers share weights.
* Update the checkpointing mechanism so that shared MTP heads are recomputed as necessary.
* Resolve review comments.

1 parent 714483a commit e35e80c
3 files changed
Lines changed: 27 additions & 8 deletions
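The weight sharing described in the commit message can be illustrated with a minimal, dependency-free sketch. It is not the code from this commit: only the names MTPConfig and share_weights come from the message, while `num_layers`, `ToyLayer`, `build_mtp_layers`, and `count_unique_layers` are hypothetical names introduced here. The assumption is that sharing means every MTP layer position reuses one layer instance, so all positions read and update the same parameters.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MTPConfig:
    # Only `share_weights` is named in the commit message;
    # `num_layers` is a hypothetical field added for this sketch.
    num_layers: int = 2
    share_weights: bool = False


class ToyLayer:
    """Hypothetical stand-in for a real MTP block; holds one mutable weight list."""

    def __init__(self) -> None:
        self.weight = [0.0]


def build_mtp_layers(
    config: MTPConfig,
    make_layer: Callable[[], ToyLayer] = ToyLayer,
) -> List[ToyLayer]:
    """Return one layer per MTP position, reusing a single instance when sharing."""
    if config.share_weights:
        shared = make_layer()
        # Every position refers to the same object, so the weights are shared.
        return [shared] * config.num_layers
    # Otherwise each position gets its own independently initialized layer.
    return [make_layer() for _ in range(config.num_layers)]


def count_unique_layers(layers: List[ToyLayer]) -> int:
    """Count distinct layer objects, i.e. distinct sets of weights."""
    return len({id(layer) for layer in layers})
```

With `share_weights=True`, an update made through one position is visible through every other position, which is also why a recomputation step such as activation checkpointing has to treat the shared layer as a single module rather than as independent copies.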
Per-file change summary (file names and diff contents were not captured):

| File | Additions | Deletions | Changed lines (new numbering) |
|---|---|---|---|
| 1 | 10 | 4 | 858-859, 898, 1019-1021, 1240-1243 |
| 2 | 5 | 0 | 23-25, 36, 45 |
| 3 | 12 | 4 | 10, 29, 48, 63, 68-72, 106-108; original line 67 removed |