Qwen 3.5 MoE: Add Metal source transformations #18879
Helpful Links: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18879

Note: Links to docs will display an error until the docs builds have been completed.

No Failures, 166 Pending as of commit 187e4f5 with merge base d408a10. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
# Metal custom op: handles both T=1 and T>1
import executorch.backends.apple.metal.ops.gated_delta_rule as _  # noqa: F401

output = torch.ops.metal.gated_delta_rule(
    ...  # arguments not shown in this review excerpt
)
```
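To make the "T=1 and T>1" remark concrete, here is a naive single-head reference of the standard gated delta rule recurrence (as described for Gated DeltaNet). It is a sketch, not the Metal kernel's code: the function name `gated_delta_rule_ref`, the argument layout, the `[d_k, d_v]` state convention, and the omission of any query scaling are all assumptions. The usage shows why one op can serve both cases: a T>1 "prefill" call gives the same outputs and final state as a sequence of T=1 "decode" calls that carry the state forward.

```python
import numpy as np


def gated_delta_rule_ref(q, k, v, g, beta, state=None):
    """Hypothetical naive gated delta rule for one head.

    q, k: [T, d_k]; v: [T, d_v]; g (log-space decay) and beta: [T].
    state: [d_k, d_v] recurrent state, zeros if None.
    Returns (outputs [T, d_v], final state [d_k, d_v]).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v)) if state is None else state.copy()
    out = np.empty((T, d_v))
    for t in range(T):
        S = S * np.exp(g[t])                     # gated decay of the old state
        delta = v[t] - k[t] @ S                  # error between v_t and the state's prediction for k_t
        S = S + beta[t] * np.outer(k[t], delta)  # delta-rule write, strength beta_t
        out[t] = q[t] @ S                        # read out with the query
    return out, S


rng = np.random.default_rng(0)
T, d_k, d_v = 4, 8, 6
q = rng.standard_normal((T, d_k))
k = rng.standard_normal((T, d_k))
k /= np.linalg.norm(k, axis=-1, keepdims=True)  # L2-normalised keys
v = rng.standard_normal((T, d_v))
g = -rng.random(T)  # log decay <= 0, so exp(g) lies in (0, 1]
beta = rng.random(T)

# T>1 in a single call...
out_full, S_full = gated_delta_rule_ref(q, k, v, g, beta)

# ...matches T=1 steps that carry the recurrent state forward.
S = None
steps = []
for t in range(T):
    o, S = gated_delta_rule_ref(q[t:t + 1], k[t:t + 1], v[t:t + 1], g[t:t + 1], beta[t:t + 1], S)
    steps.append(o)
```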
For future work, maybe we should fuse the q/k expansion into the kernel.
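A sketch of what such a fusion could save, assuming "q/k expansion" means the GQA-style `repeat_interleave` of q/k heads up to the number of v heads (an assumption; the excerpt does not show the expansion code). Instead of materialising the expanded tensor, a kernel handling v-head `h` could read k-head `h // ratio` directly. Both helper names are hypothetical.

```python
import numpy as np


def expand_heads(x, num_v_heads):
    """Materialised expansion [T, H_k, d] -> [T, H_v, d].

    np.repeat along axis 1 repeats each head consecutively,
    like torch.repeat_interleave(x, ratio, dim=1).
    """
    ratio = num_v_heads // x.shape[1]
    return np.repeat(x, ratio, axis=1)


def k_head_for(v_head, num_k_heads, num_v_heads):
    """Index a fused kernel could compute instead of reading an expanded copy."""
    return v_head // (num_v_heads // num_k_heads)


T, H_k, H_v, d = 3, 2, 4, 5
k = np.arange(T * H_k * d, dtype=np.float32).reshape(T, H_k, d)
k_expanded = expand_heads(k, H_v)
```

The index mapping reproduces every expanded head without the extra `[T, H_v, d]` buffer or the copy that fills it.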
Adds metal_source_transformations.py with module replacements for Metal:

- … SiLU gating, replacing torch.ops.triton.fused_moe)
- … T=1 native path and T>1 Triton kernel)
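For reference, the computation an MoE experts replacement with SiLU gating has to reproduce can be sketched as a naive top-k loop over SwiGLU experts. This is an illustration under stated assumptions, not the PR's module: the name `moe_experts_ref`, the separate (unfused) gate/up weight layout, and passing routing decisions in as `topk_ids`/`topk_weights` are all hypothetical.

```python
import numpy as np


def silu(x):
    return x / (1.0 + np.exp(-x))


def moe_experts_ref(x, w_gate, w_up, w_down, topk_ids, topk_weights):
    """Hypothetical reference for top-k MoE with SiLU-gated (SwiGLU) experts.

    x: [T, D]; w_gate, w_up: [E, I, D]; w_down: [E, D, I];
    topk_ids, topk_weights: [T, K] routing decisions per token.
    """
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e, w in zip(topk_ids[t], topk_weights[t]):
            h = silu(w_gate[e] @ x[t]) * (w_up[e] @ x[t])  # SiLU gating
            out[t] += w * (w_down[e] @ h)                  # weighted sum over selected experts
    return out


rng = np.random.default_rng(0)
T, D, I, E = 2, 4, 3, 5
x = rng.standard_normal((T, D))
w_gate = rng.standard_normal((E, I, D))
w_up = rng.standard_normal((E, I, D))
w_down = rng.standard_normal((E, D, I))
topk_ids = np.array([[1, 3], [0, 1]])
topk_weights = np.array([[0.7, 0.3], [0.5, 0.5]])
out = moe_experts_ref(x, w_gate, w_up, w_down, topk_ids, topk_weights)
```

A fused kernel gathers only the selected experts' weights per token; this loop makes that selection explicit, which is handy as a numerical oracle when checking a replacement module.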
Also includes quantize_experts_metal(), which quantizes expert weights to
MLX affine INT4 format (unsigned 4-bit values with a scale and bias per group),
compatible with the Metal gather_qmv kernel.
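The affine INT4 scheme can be sketched as a per-group min/max quantizer: each group of weights stores a scale and a bias, values are rounded to unsigned 4-bit codes, and dequantization is `w ≈ scale * q + bias`. This is a simplified illustration, not quantize_experts_metal() itself: the helper names, the group size of 64, and packing 8 codes per uint32 with the first element in the lowest bits are assumptions, and MLX's own quantizer may differ in rounding details.

```python
import numpy as np


def quantize_affine_int4(w, group_size=64):
    """Hypothetical per-group affine INT4 quantizer (simplified, MLX-style).

    w: [N, K] float weights, K divisible by group_size and by 8.
    Returns packed uint32 [N, K // 8], scales and biases [N, K // group_size].
    """
    N, K = w.shape
    groups = w.reshape(N, K // group_size, group_size)
    w_min = groups.min(axis=-1, keepdims=True)
    w_max = groups.max(axis=-1, keepdims=True)
    scale = (w_max - w_min) / 15.0                 # 16 levels: codes 0..15
    scale = np.where(scale == 0, 1.0, scale)       # constant groups: avoid divide-by-zero
    q = np.clip(np.round((groups - w_min) / scale), 0, 15).astype(np.uint32)
    q = q.reshape(N, K // 8, 8)
    shifts = np.arange(8, dtype=np.uint32) * 4     # element i -> bits [4i, 4i + 4)
    packed = np.bitwise_or.reduce(q << shifts, axis=-1)
    return packed, scale[..., 0], w_min[..., 0]    # bias = group minimum


def dequantize_affine_int4(packed, scales, biases, group_size=64):
    N = packed.shape[0]
    shifts = np.arange(8, dtype=np.uint32) * 4
    q = (packed[..., None] >> shifts) & 0xF        # unpack 8 nibbles per uint32
    q = q.reshape(N, -1, group_size).astype(np.float64)
    w = q * scales[..., None] + biases[..., None]  # w ≈ scale * q + bias
    return w.reshape(N, -1)


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 128)).astype(np.float32)
packed, scales, biases = quantize_affine_int4(w)
w_hat = dequantize_affine_int4(packed, scales, biases)
```

Because the bias absorbs each group's offset, the codes stay unsigned and the reconstruction error per element is bounded by half a quantization step (`scale / 2`).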
Authored with Claude.