Commit 17a7a16
FIX Error when prefix tuning Gemma 4 (#3205)
There was an issue with applying prefix tuning to Gemma 4 because the
model uses different head dimensions for layers that use sliding window
attention. As prefix tuning only initializes a single projection matrix
that is used for all layers, this would lead to a shape mismatch.
The solution is to "overprovision" the matrix and then slice the prefix
down to size if the layer is smaller. This is not quite as parameter
efficient as it could be, but the overhead shouldn't be too large.
For robustness, we also skip layers if the matrix is underprovisioned,
but we warn about it and raise an error if all layers are skipped.
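
The slicing and skipping behavior described above can be sketched in plain Python. This is a minimal illustration, not the code from this commit: the function name `slice_prefix_per_layer`, the nested-list layout of the prefix, and the example head dimensions are all assumptions made for the sketch.

```python
import warnings


def slice_prefix_per_layer(prefix, layer_head_dims, provisioned_dim):
    """Slice an overprovisioned prefix down to each layer's head dimension.

    Hypothetical helper: `prefix` holds one matrix per layer, shaped
    [num_virtual_tokens][provisioned_dim], stored as nested lists.
    """
    sliced = {}
    for idx, (layer_prefix, head_dim) in enumerate(zip(prefix, layer_head_dims)):
        if head_dim > provisioned_dim:
            # Underprovisioned: the prefix is too narrow for this layer, so skip it.
            warnings.warn(
                f"Skipping layer {idx}: head_dim={head_dim} exceeds "
                f"provisioned_dim={provisioned_dim}."
            )
            continue
        # Overprovisioned or exact fit: keep only the first head_dim values.
        sliced[idx] = [row[:head_dim] for row in layer_prefix]
    if not sliced:
        raise ValueError("The prefix is too small for every layer.")
    return sliced


# Assumed example: two head dimensions alternating across four layers.
layer_head_dims = [256, 512, 256, 512]
provisioned_dim = max(layer_head_dims)  # overprovision to the widest layer
num_virtual_tokens = 3
prefix = [
    [[0.0] * provisioned_dim for _ in range(num_virtual_tokens)]
    for _ in layer_head_dims
]

sliced = slice_prefix_per_layer(prefix, layer_head_dims, provisioned_dim)
print({idx: len(rows[0]) for idx, rows in sliced.items()})
# → {0: 256, 1: 512, 2: 256, 3: 512}
```

Sizing the shared prefix to the widest layer means every layer can be served by slicing, so the skip path only triggers when the prefix was created with a smaller dimension than some layer needs.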
Alternatively, we could implement one projection per layer, each with the
right size, like in google-deepmind/gemma#631.
However, this would be a big refactor and also very hard to make
backwards compatible with existing checkpoints, so going with the less
efficient solution is preferable.
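
To give a rough feel for the trade-off, the sketch below counts how many key and value prefix entries each virtual token needs under the two approaches. The function name, the head dimensions, and the number of key/value heads are illustrative assumptions, not values taken from Gemma 4 or from this commit, and the real projection layout may differ.

```python
def prefix_values_per_token(layer_head_dims, num_kv_heads, overprovision):
    """Count key + value prefix entries needed per virtual token.

    Hypothetical helper: assumes every layer has the same number of
    key/value heads and differs only in head dimension.
    """
    if overprovision:
        # Shared sizing: every layer is provisioned for the widest head.
        widest = max(layer_head_dims)
        dims = [widest] * len(layer_head_dims)
    else:
        # Per-layer sizing: each layer gets exactly what it needs.
        dims = layer_head_dims
    # Factor 2 accounts for keys and values.
    return sum(2 * num_kv_heads * dim for dim in dims)


# Assumed example: three narrow layers for every wide one.
layer_head_dims = [256, 256, 256, 512]
shared = prefix_values_per_token(layer_head_dims, num_kv_heads=4, overprovision=True)
exact = prefix_values_per_token(layer_head_dims, num_kv_heads=4, overprovision=False)
print(shared, exact, shared / exact)
# → 16384 10240 1.6
```

With these assumed numbers the overprovisioned prefix is 1.6 times the exact size; the actual overhead depends on the ratio of narrow to wide layers in the model.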
This PR also contains an independent, single line fix to a prefix tuning
test that was referencing a non-existing model.
1 parent 9cda9e3 commit 17a7a16
4 files changed
Lines changed: 160 additions & 6 deletions