Commit b41d055
Qwen 3.5 MoE: bypass on-device sampler conditional in non-CUDA export

Qwen35MoE.forward currently routes through an Optional[Tensor] temperature
parameter and an if/else that picks between the on-device fused Gumbel-max
sampler (CUDA) and raw logits (non-CUDA). The sampling branch is dead code
for MLX and Metal exports, since those backends sample on the host.

Even though torch.export statically eliminates the branch when temperature
defaults to None, the parameter, default value, and unused else-branch leak
into the exported program: extra placeholder nodes, different graph hashes,
and shifted kernel selection in the lowered MLX/Metal graph. On the tiny
test model this slows MLX prefill ~9-37% and decode ~5-19%, and shows up as
~10-25% noise on Metal.

Bind model.forward to a minimal (tokens, input_pos) -> logits variant inside
_export_mlx and _export_metal before torch.export, so the captured program
matches what the backend kernels are tuned for. Eager-mode callers and the
CUDA export path are unaffected.

1 parent b32eae7 commit b41d055
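
The rebinding described in the commit message can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code added by this commit: `TinyModel`, `_minimal_forward`, and `bind_minimal_forward` are hypothetical names, the stand-in class is not a `torch.nn.Module`, and the sampling branch is a greedy placeholder rather than the fused Gumbel-max sampler. It only shows how overriding `forward` on one instance removes the optional parameter from the signature that an exporter would see.

```python
import inspect
import types
from typing import Optional


class TinyModel:
    """Hypothetical stand-in for the real model (not Qwen35MoE itself)."""

    def forward(self, tokens, input_pos, temperature: Optional[float] = None):
        logits = self._logits(tokens, input_pos)
        if temperature is None:
            # Non-sampling path: return raw logits.
            return logits
        # Placeholder for the on-device sampling branch (greedy pick here,
        # NOT the Gumbel-max sampler mentioned in the commit message).
        return max(range(len(logits)), key=lambda i: logits[i] / temperature)

    def _logits(self, tokens, input_pos):
        # Toy computation so the example runs without any dependencies.
        return [float(t + input_pos) for t in tokens]


def _minimal_forward(self, tokens, input_pos):
    """Minimal (tokens, input_pos) -> logits variant with no sampling branch."""
    return self._logits(tokens, input_pos)


def bind_minimal_forward(model):
    """Override forward on this instance only; the class is left untouched."""
    model.forward = types.MethodType(_minimal_forward, model)
    return model


if __name__ == "__main__":
    model = bind_minimal_forward(TinyModel())
    # The bound method no longer exposes the temperature parameter.
    print(list(inspect.signature(model.forward).parameters))
```

Because the override lives on the instance, other instances and the class-level `forward` keep the `temperature` parameter, which mirrors the commit's claim that eager-mode callers and the CUDA export path are unaffected.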
1 file changed
Lines changed: 46 additions & 0 deletions
| Original file lines | Diff lines | Change (line contents not captured) |
|---|---|---|
| 554-556 | 554-556 | unchanged context |
| | 557-594 | 38 lines added |
| 557-559 | 595-597 | unchanged context |
| 568-570 | 606-608 | unchanged context |
| | 609-612 | 4 lines added |
| 571-573 | 613-615 | unchanged context |
| 651-653 | 693-695 | unchanged context |
| | 696-699 | 4 lines added |
| 654-656 | 700-702 | unchanged context |