Commit d348f1b
fix: vision loss forward pass falls back to exclude on crash (#223)
Qwen3's vision-language merge changes the internal sequence length
unpredictably. Both include and checkpoint modes crash intermittently
with attention mask mismatches (the mask is too large OR too small,
depending on the generated sequence length).
Fix: catch IndexError/RuntimeError from the vision forward pass and
retry with exclude mode (text-only, no vision tensors) for that step.
Training no longer crashes on these errors: some steps get vision-aware
gradients, others get text-only gradients, but every step still
contributes to learning.
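A minimal sketch of the fallback described above, in Python. `loss_with_fallback`, `strip_vision`, the `forward_fn(batch, mode)` signature, and the `pixel_values`/`image_grid_thw` batch keys are hypothetical names chosen for illustration, not the repository's actual code.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger(__name__)

# Exception types the commit reports catching from the vision forward pass.
FALLBACK_ERRORS = (IndexError, RuntimeError)

# Hypothetical batch keys holding vision tensors; adjust to the real collator.
VISION_KEYS = ("pixel_values", "image_grid_thw")

Batch = Dict[str, Any]
ForwardFn = Callable[[Batch, str], float]


def strip_vision(batch: Batch) -> Batch:
    """Return a shallow copy of the batch with vision tensors removed."""
    return {k: v for k, v in batch.items() if k not in VISION_KEYS}


def loss_with_fallback(forward_fn: ForwardFn, batch: Batch, mode: str) -> Tuple[float, str]:
    """Compute the loss in `mode`, retrying text-only ("exclude") on a known crash.

    Returns (loss, mode_actually_used) so callers can track how often the
    fallback fires. Errors outside FALLBACK_ERRORS, or a crash during the
    text-only retry itself, still propagate.
    """
    if mode != "exclude":
        try:
            return forward_fn(batch, mode), mode
        except FALLBACK_ERRORS as err:
            logger.warning("vision forward failed in %r mode (%s); "
                           "retrying this step text-only", mode, err)
    # Reached when exclude mode was requested or the vision forward crashed.
    return forward_fn(strip_vision(batch), "exclude"), "exclude"


if __name__ == "__main__":
    def toy_forward(batch: Batch, mode: str) -> float:
        # Simulate the intermittent mask mismatch whenever vision inputs are present.
        if "pixel_values" in batch:
            raise RuntimeError("attention mask size mismatch")
        return 0.5

    print(loss_with_fallback(toy_forward, {"input_ids": [1, 2], "pixel_values": [0.1]}, "include"))
```

Returning the mode actually used lets the trainer count how often steps fall back to text-only. Catching `RuntimeError` is broad: in PyTorch, CUDA out-of-memory errors also subclass `RuntimeError`, so a stricter version might re-raise those instead of retrying.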
This is the pragmatic fix. The proper fix (capturing logits during
generation to avoid re-forward entirely) is future work.
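A toy, pure-Python sketch of the idea behind the proposed proper fix: record each step's logits while generating, then score the generated tokens from those recorded logits instead of running a second forward pass over the merged vision-language sequence. `generate_greedy_with_logits`, `sequence_logprob`, and the `step_fn` interface are hypothetical. In Hugging Face `transformers`, `generate(..., output_scores=True, return_dict_in_generate=True)` returns per-step scores, but whether this project would use that API is an assumption. This sketch does not address whether gradients can flow through logits captured during generation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Logits = List[float]
# Hypothetical model interface: token ids so far -> logits for the next token.
StepFn = Callable[[Sequence[int]], Logits]


def log_softmax(logits: Logits) -> List[float]:
    """Numerically stable log-softmax over one step's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def generate_greedy_with_logits(
    step_fn: StepFn, prompt: Sequence[int], max_new_tokens: int
) -> Tuple[List[int], List[Logits]]:
    """Greedy decoding that keeps every step's logits alongside the tokens."""
    tokens = list(prompt)
    captured: List[Logits] = []
    for _ in range(max_new_tokens):
        logits = step_fn(tokens)
        captured.append(logits)
        next_tok = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_tok)
    return tokens[len(prompt):], captured


def sequence_logprob(generated: Sequence[int], captured: Sequence[Logits]) -> float:
    """Sum of log-probabilities of the generated tokens, using only captured logits."""
    return sum(log_softmax(step)[tok] for tok, step in zip(generated, captured))
```

Because the loss is computed from logits recorded at generation time, the sequence-length change introduced by the vision merge never has to be reproduced in a separate forward pass.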
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent: 02e8216
1 file changed
Lines changed: 29 additions & 15 deletions
Single hunk `@@ -446,24 +446,38 @@` (line contents not captured): removed old lines 449-454, 456-462, 464 and 466; added new lines 451-478 and 480.