Commit 2afd94e
fix(gemma4): Gemma4 packing, attention mask, and MoE routing fixes (#2116)
* propagate image_position_ids through VLM neat packing
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
* propagate mm_token_type_ids through VLM neat packing
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
* fix(models): convert 4D bool attention mask to additive format for eager attention
The packed collater emits a 4D block-causal bool mask. Eager attention adds
this directly to attn_weights (0/1 instead of 0/-inf), so no positions are
masked: the model attends across sequence boundaries and to future tokens.
Also fixes _derive_padding_mask, which was applying logical_not to all mask
shapes; for 4D masks the pad positions come from the diagonal.
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
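A minimal sketch of the mask problem described above, not the code from this commit. It assumes a bool mask where True means "may attend"; the helper names `bool_to_additive_mask` and `padding_from_diagonal` are hypothetical, and using `torch.finfo(dtype).min` as the masked value is an assumption.

```python
import torch


def bool_to_additive_mask(mask: torch.Tensor, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Turn a bool mask (True = may attend) into additive form (0 = keep, large negative = masked)."""
    if mask.dtype != torch.bool:
        return mask  # assumption: a non-bool mask is already additive
    additive = torch.zeros(mask.shape, dtype=dtype, device=mask.device)
    return additive.masked_fill(~mask, torch.finfo(dtype).min)


def padding_from_diagonal(mask_4d: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: a position that cannot attend to itself is treated as padding."""
    diag = torch.diagonal(mask_4d, dim1=-2, dim2=-1)  # (batch, heads, seq)
    return ~diag[:, 0]  # (batch, seq), True = pad


# Two packed sequences (lengths 2 and 1) plus one pad slot, marked with id -1.
seq_ids = torch.tensor([0, 0, 1, -1])
pad = seq_ids < 0
same_seq = seq_ids[:, None] == seq_ids[None, :]
causal = torch.tril(torch.ones(4, 4, dtype=torch.bool))
bool_mask = (same_seq & causal & ~pad[:, None] & ~pad[None, :])[None, None]  # (1, 1, 4, 4)

scores = torch.zeros(1, 1, 4, 4)  # stand-in for attention logits

# Adding the bool mask as 0/1 only shifts the logits; nothing is masked.
leaky = torch.softmax(scores + bool_mask.to(scores.dtype), dim=-1)

# Adding the converted mask drives masked positions to zero probability.
fixed = torch.softmax(scores + bool_to_additive_mask(bool_mask), dim=-1)

pad_positions = padding_from_diagonal(bool_mask)
```

In `leaky`, the first token still puts weight on later tokens and on the other sequence; in `fixed`, that weight is zero.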
* fix(gemma4): select top-k experts from router_probs not expert_scores
Consistent with HF Gemma4Router which applies top-k on softmax
probabilities, not raw logits.
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
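A generic sketch of top-k expert selection over softmax probabilities, as the message above describes. It is not the HF Gemma4Router or this repository's code: the function name `route_top_k` is hypothetical, and renormalizing the selected probabilities is an assumption.

```python
import torch


def route_top_k(router_logits: torch.Tensor, top_k: int):
    """Select top_k experts per token from softmax probabilities."""
    router_probs = torch.softmax(router_logits, dim=-1)
    top_probs, top_idx = torch.topk(router_probs, k=top_k, dim=-1)
    # Assumption: renormalize so the selected experts' weights sum to 1 per token.
    weights = top_probs / top_probs.sum(dim=-1, keepdim=True)
    return top_idx, weights


# One token, four experts.
logits = torch.tensor([[2.0, 0.5, 1.0, -1.0]])
top_idx, weights = route_top_k(logits, top_k=2)
```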
* linter
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
* add tests
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
* cleanup
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
* fix(vlm): add configurable Gemma4 thinking-prefix injection and packed example config
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
* fix(gemma4): handle EP_SHARD mesh in state dict adapter checkpoint load
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
* update model in example
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
* linter
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
---------
Signed-off-by: shruthan <shrutha.radhakrishna@servicenow.com>
1 parent 4767694 commit 2afd94e
11 files changed
Lines changed: 445 additions & 34 deletions
File tree
- examples/vlm_finetune/gemma4
- nemo_automodel
  - components
    - datasets/vlm
    - models/gemma4_moe
  - recipes/vlm
- tests/unit_tests
  - datasets/vlm
  - models/gemma4
Lines changed: 107 additions & 0 deletions
Lines changed: 24 additions & 1 deletion