Commit 9bbcf4a
cp: 1813 fix: FSDP2 meta-device crash for Qwen3.5 GatedDeltaNet fp32 params (#1869)
* fix: FSDP2 meta-device crash for Qwen3.5 GatedDeltaNet fp32 params (#1813)
* fix: FSDP2 meta-device crash for Qwen3.5 GatedDeltaNet fp32 params
PR #1711 changed _should_load_before_shard to return False for multi-GPU
DP, so models stay on meta device through FSDP wrapping. This broke the
__dict__ trick in PR #1710's patch_hf_model.
Move the gate computation into _Fp32ParamHolder.forward() so FSDP's
unshard/reshard lifecycle fires naturally. Override CPAwareGatedDeltaNet
forward for both CP and non-CP paths to route through the holder.
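The idea of computing the gate inside the holder's own forward() can be sketched as follows. This is a minimal illustration, not the code from this commit: the class name Fp32ParamHolderSketch, the parameter names A_log and dt_bias, and the gate formula are assumptions modeled on the usual GatedDeltaNet gate. The point is only that parameters read inside a submodule's forward() are touched while that submodule's pre/post-forward hooks (where FSDP2 unshards and reshards) are active.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Fp32ParamHolderSketch(nn.Module):
    """Hypothetical stand-in for _Fp32ParamHolder: owns the fp32 params."""

    def __init__(self, num_heads: int):
        super().__init__()
        # Assumed parameter names; kept in fp32 regardless of model dtype.
        self.A_log = nn.Parameter(torch.zeros(num_heads, dtype=torch.float32))
        self.dt_bias = nn.Parameter(torch.ones(num_heads, dtype=torch.float32))

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        # Assumed gate formula; the computation runs in fp32 and happens
        # inside forward(), so module forward hooks wrap the parameter reads.
        return -self.A_log.float().exp() * F.softplus(a.float() + self.dt_bias)


holder = Fp32ParamHolderSketch(num_heads=4)
a = torch.zeros(2, 3, 4, dtype=torch.bfloat16)  # (batch, seq, heads)
g = holder(a)  # call the module, not a raw attribute read, so hooks fire
```

Calling holder(a) rather than reading holder.A_log from the parent layer is what lets a wrapper such as FSDP2 manage the parameter lifecycle around the computation.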
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* chore: remove test yaml not intended for PR
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* fix: add sentinel to prevent __getattr__ re-wrapping
Address Claude review: guard against re-wrapping __getattr__ on
repeated patch_hf_model calls by checking a class-level sentinel
attribute.
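A class-level sentinel guard of this kind might look like the sketch below. All names here (SENTINEL, Base, Layer, patch_class, and the dict-based holder) are hypothetical and chosen for illustration; the actual patch_hf_model code is not shown in this message.

```python
SENTINEL = "_fp32_getattr_patched"  # hypothetical sentinel attribute name


class Base:
    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        raise AttributeError(name)


class Layer(Base):
    def __init__(self):
        # Hypothetical holder; a plain dict stands in for a submodule.
        self.holder = {"A_log": 1.0}


wrap_count = {"n": 0}


def patch_class(cls):
    """Wrap cls.__getattr__ once; later calls are no-ops."""
    if cls.__dict__.get(SENTINEL, False):
        return  # already patched: do not stack another wrapper
    original = cls.__getattr__

    def wrapped(self, name):
        holder = self.__dict__.get("holder", {})
        if name in holder:
            return holder[name]
        return original(self, name)  # fall back to the previous behaviour

    cls.__getattr__ = wrapped
    setattr(cls, SENTINEL, True)
    wrap_count["n"] += 1


patch_class(Layer)
patch_class(Layer)  # second call is ignored because of the sentinel
layer = Layer()
```

Without the guard, each repeated call would wrap the previous wrapper, adding one extra level of indirection per call.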
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* fix: add upstream version comment to _forward_no_cp
Address Claude review: note the transformers version the forward was
copied from to ease future upstream diffing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* fix: update MoE test expectations for _forward_no_cp path
TestForwardFastPath tests expected super().forward() to be called,
but the non-CP path now uses _forward_no_cp(). Update mocks to match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* test: add coverage for _Fp32ParamHolder, _compute_gate, and sentinel guard
Add unit tests for:
- _Fp32ParamHolder.forward gate computation and dtype preservation
- _compute_gate routing through holder vs inline fallback
- patch_hf_model sentinel preventing __getattr__ re-wrapping
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* test: add coverage for _forward_no_cp and forward() dispatch paths
Add 14 new tests covering the critical _forward_no_cp method (lines
91-193) and forward() dispatch logic (lines 207-213) to satisfy
codecov/patch requirements for PR #1813:
- _forward_no_cp basic forward, cache_params=None, causal_conv1d_fn
fallback, causal_conv1d_fn set, attention_mask, GQA repeat-interleave,
_compute_gate delegation, and output dtype
- forward() dispatch when _cp_mesh is None or size <= 1, parameter
pass-through, and extra CP kwargs
- _make_fp32_getattr fallback to AttributeError and real attr resolution
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: update MoE test_no_cp_does_not_forward_cache_position to use _forward_no_cp
The fast path in CPAwareGatedDeltaNet.forward was refactored to call
self._forward_no_cp() instead of super().forward(), but this test still
mocked the base class forward, so the mock was called 0 times. Update
the mock target to match the new dispatch, and apply ruff format to the
two test files.
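Why the old mock saw zero calls can be shown with a small stand-alone sketch. The classes below are hypothetical stand-ins, not the real CPAwareGatedDeltaNet or its base class; only the dispatch shape (non-CP path calls _forward_no_cp instead of super().forward()) is taken from the message above.

```python
from unittest import mock


class Base:
    def forward(self, x):
        return ("base", x)


class CPAwareSketch(Base):
    _cp_mesh = None  # hypothetical: None means context parallelism is off

    def _forward_no_cp(self, x):
        return ("no_cp", x)

    def forward(self, x):
        if self._cp_mesh is None:
            return self._forward_no_cp(x)  # new fast path
        raise NotImplementedError("CP path omitted from this sketch")


layer = CPAwareSketch()
with mock.patch.object(Base, "forward") as base_fwd, mock.patch.object(
    CPAwareSketch, "_forward_no_cp", return_value="ok"
) as no_cp:
    out = layer.forward(1)
    base_calls = base_fwd.call_count  # stays 0: base forward is bypassed
    no_cp_calls = no_cp.call_count
```

A test asserting on the base-class mock therefore fails after the refactor, while one that patches _forward_no_cp observes the call.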
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
---------
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 09b91f0 commit 9bbcf4a
3 files changed
Lines changed: 679 additions & 69 deletions
File tree
- nemo_automodel/components/models/qwen3_5_moe
- tests/unit_tests/models
- qwen3_5_moe
- qwen3_5
Lines changed: 171 additions & 13 deletions