Commit fcb09bf
[NVBug: 6038899] Fix MoE export crash on meta tensors with CPU offload (#1155)
## Summary
Fixes `NotImplementedError` in `sync_moe_gate_up_amax` when quantizing
MoE models (e.g. Qwen3-30B-A3B) on a single GPU with insufficient VRAM.
When GPU memory is insufficient, ModelOpt enables CPU offload via
accelerate, leaving uncalibrated expert parameters on the `meta` device.
During export, `sync_moe_gate_up_amax` calls `torch.equal()` on these
meta tensors, which raises `NotImplementedError` because `aten::equal`
does not support meta tensors — even though calibration itself completed
successfully.
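The failure mode can be reproduced without any MoE model: `aten::equal` has no meta-backend kernel, so comparing two meta tensors raises immediately. A minimal sketch (the tensor shapes here are arbitrary and only for illustration):

```python
import torch

# Meta tensors carry shape/dtype metadata but no storage.
a = torch.empty(4, device="meta")
b = torch.empty(4, device="meta")

try:
    torch.equal(a, b)  # aten::equal needs real data to compare
except NotImplementedError:
    # This is the error surfaced from sync_moe_gate_up_amax during export
    # when offloaded expert parameters are still on the meta device.
    print("aten::equal is not implemented for the meta backend")
```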
## Changes
- Add a guard in `sync_moe_gate_up_amax` to skip amax sync for meta
tensors (which have no real data to sync) and emit a warning explaining
the root cause.
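The guard can be sketched as follows. The function name, warning text, and the max-merge sync logic below are illustrative assumptions, not the exact ModelOpt implementation; the key point is checking `Tensor.is_meta` before any data-dependent op such as `torch.equal`:

```python
import warnings

import torch


def sync_gate_up_amax(gate_amax: torch.Tensor, up_amax: torch.Tensor) -> None:
    """Hypothetical sketch: sync gate/up amax values, skipping meta tensors."""
    if gate_amax.is_meta or up_amax.is_meta:
        # Meta tensors have no real data to sync. This happens when
        # accelerate offloads uncalibrated expert weights to CPU and
        # leaves meta placeholders on the GPU.
        warnings.warn(
            "Skipping gate/up amax sync for meta tensors; the expert "
            "parameters were offloaded and hold no calibration data."
        )
        return
    if not torch.equal(gate_amax, up_amax):
        # Share one scale between gate and up projections by taking
        # the elementwise maximum of the two amax tensors.
        merged = torch.maximum(gate_amax, up_amax)
        gate_amax.copy_(merged)
        up_amax.copy_(merged)
```

With the guard in place, export proceeds past offloaded experts instead of aborting, and the warning points users at the CPU-offload root cause.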
Bug: https://nvbugspro.nvidia.com/bug/6038899
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>

Parent: 7cd7d18
1 file changed
Lines changed: 12 additions & 0 deletions