Commit fcb09bf

[NVBug: 6038899] Fix MoE export crash on meta tensors with CPU offload (#1155)
## Summary

Fixes `NotImplementedError` in `sync_moe_gate_up_amax` when quantizing MoE models (e.g. Qwen3-30B-A3B) on a single GPU with insufficient VRAM.

When GPU memory is insufficient, ModelOpt enables CPU offload via accelerate, leaving uncalibrated expert parameters on the `meta` device. During export, `sync_moe_gate_up_amax` calls `torch.equal()` on these meta tensors, which raises `NotImplementedError` because `aten::equal` does not support meta tensors, even though calibration itself completed successfully.

## Changes

- Add a guard in `sync_moe_gate_up_amax` to skip amax sync for meta tensors (which have no real data to sync) and emit a warning explaining the root cause.

Bug: https://nvbugspro.nvidia.com/bug/6038899

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **Bug Fixes**
  * Added warning messages for unsupported tensor configurations in quantization workflows.
  * Improved edge case detection to gracefully skip processing in incompatible scenarios.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
1 parent 7cd7d18 commit fcb09bf
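
The failure mode described above is easy to reproduce outside ModelOpt. A minimal standalone sketch (not from the ModelOpt codebase): meta tensors carry only shape and dtype metadata, so data-dependent ops such as `torch.equal` have no real elements to compare, while metadata-only queries like `.is_meta` remain safe.

```python
import torch

# Meta tensors have shape and dtype but no storage, so any op that must
# read element data cannot run on them.
a = torch.empty(4, device="meta")
b = torch.empty(4, device="meta")

print(a.is_meta, a.shape, a.dtype)  # metadata-only queries are fine

try:
    torch.equal(a, b)  # data-dependent: needs to compare actual elements
except NotImplementedError as e:
    print(f"torch.equal failed on meta tensors: {e}")
```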

1 file changed

Lines changed: 12 additions & 0 deletions

File tree

modelopt/torch/export/layer_utils.py

```diff
@@ -1184,6 +1184,18 @@ def sync_moe_gate_up_amax(model: nn.Module) -> int:
         up_amax = getattr(up_wq, "amax", None)
         if gate_amax is None or up_amax is None:
             break
+        # Meta tensors have no storage (e.g. CPU-offloaded experts that
+        # were never activated during calibration). Skip — there is no
+        # real amax data to sync.
+        if gate_amax.is_meta or up_amax.is_meta:
+            warn(
+                f"Skipping gate/up amax sync for expert with meta tensors "
+                f"(gate_amax.is_meta={gate_amax.is_meta}, "
+                f"up_amax.is_meta={up_amax.is_meta}). "
+                f"This typically means the expert was CPU-offloaded and "
+                f"not activated during calibration."
+            )
+            break
         if not torch.equal(gate_amax, up_amax):
             shared_amax = torch.max(gate_amax, up_amax)
             gate_wq.amax = shared_amax
```
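
For reference, the guard can be read as a standalone helper. This is an illustrative sketch only (`sync_expert_amax` and its signature are hypothetical names, not ModelOpt API): check `.is_meta`, which inspects pure metadata and is always safe, before any data-dependent comparison.

```python
from warnings import warn

import torch


def sync_expert_amax(
    gate_amax: torch.Tensor, up_amax: torch.Tensor
) -> torch.Tensor | None:
    """Return the shared amax for a gate/up pair, or None if it cannot be synced."""
    # .is_meta only reads tensor metadata, so it is safe even when the tensor
    # has no storage (e.g. a CPU-offloaded expert never seen during calibration).
    if gate_amax.is_meta or up_amax.is_meta:
        warn("Skipping amax sync: tensor(s) on meta device, no data to compare.")
        return None
    # Only now is a data-dependent comparison safe to run.
    if not torch.equal(gate_amax, up_amax):
        return torch.max(gate_amax, up_amax)
    return gate_amax
```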
