Commit b3d6df2
[qwen3_5_moe][ci] Track export GPU peak memory and gate it in CI
## Summary
Add a GPU memory regression guard so that the Qwen3.5 MoE export keeps
fitting on consumer-grade 24 GB GPUs (RTX 4090 / 3090 / A5000 …).
## What this diff does
1. `examples/models/qwen3_5_moe/export.py`
   - Reset CUDA peak memory stats at the start of the CUDA backend setup.
   - At the end of `main()`, when running with `--backend cuda`, print a
     stable, machine-parseable marker line:
     `EXPORT_GPU_PEAK_MEMORY_MB: <peak_in_MB>`
     This makes the actual peak GPU memory consumed by the entire
     load + quantize + lower pipeline visible to both humans and CI.
2. `.ci/scripts/export_model_artifact.sh` (qwen3_5_moe path)
   - Tee the export output to a temp log.
   - Grep the `EXPORT_GPU_PEAK_MEMORY_MB` marker and compare against
     `EXPORT_GPU_PEAK_MB_LIMIT` (default 20480 MB = 20 GB; overridable
     via env var).
   - Fail the job with an explanatory error if the budget is exceeded,
     so any future regression that reintroduces the ~18 GB unnecessary
     GPU clone (or comparable leak) is caught at PR time rather than
     silently breaking 24 GB-class GPUs.
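
For readers who want the shape of the export-side change without opening the diff, here is a minimal sketch of how the marker could be produced. It is not the code from this commit: the helper names (`format_peak_marker`, `reset_peak_stats`, `print_peak_marker`) are hypothetical, and it assumes the peak is read with `torch.cuda.max_memory_allocated()`, that 1 MB means 1024 * 1024 bytes, and that the value is printed with one decimal place.

```python
MARKER = "EXPORT_GPU_PEAK_MEMORY_MB"


def format_peak_marker(peak_bytes: int) -> str:
    """Render the marker line from a peak byte count.

    Assumption: 1 MB = 1024 * 1024 bytes, one decimal place.
    """
    return f"{MARKER}: {peak_bytes / (1024 * 1024):.1f}"


def reset_peak_stats() -> None:
    """Intended for the start of the CUDA backend setup."""
    import torch  # lazy import so the formatting helper works without torch

    if torch.cuda.is_available():
        # Discard peaks recorded before this point.
        torch.cuda.reset_peak_memory_stats()


def print_peak_marker() -> None:
    """Intended for the end of main() when --backend cuda is selected."""
    import torch

    if torch.cuda.is_available():
        # Assumption: peak *allocated* bytes on the current device.
        print(format_peak_marker(torch.cuda.max_memory_allocated()))


if __name__ == "__main__":
    # Formatting only; no GPU required.
    print(format_peak_marker(18 * 1024**3))
```

Keeping the formatting separate from the CUDA calls means the marker format can be checked on a machine without a GPU.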
## Notes
- Current measured peak with the CUDA backend memory fixes (see prior
commit on this branch) is ~18 GB, leaving ~2 GB headroom under the
20 GB limit. Without those fixes the peak shoots to ~37 GB and CI
will fail loudly.
- The threshold is intentionally tighter than the 24 GB physical cap
to leave room for measurement noise and small allocator overhead.
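
To make these numbers concrete, the gate check can be sketched in Python. The real gate lives in the shell script and may differ in detail; the function names here are hypothetical, and treating a missing marker as an error is an assumption, not something this commit message states.

```python
import os
import re
from typing import Mapping, Optional, Tuple

MARKER_RE = re.compile(
    r"^EXPORT_GPU_PEAK_MEMORY_MB:\s*([0-9]+(?:\.[0-9]+)?)\s*$", re.MULTILINE
)
DEFAULT_LIMIT_MB = 20480.0  # 20 GB default from the commit message


def parse_peak_mb(log_text: str) -> float:
    """Return the value from the last marker line in the export log."""
    matches = MARKER_RE.findall(log_text)
    if not matches:
        # Assumption: a missing marker is treated as an error.
        raise ValueError("EXPORT_GPU_PEAK_MEMORY_MB marker not found in log")
    return float(matches[-1])


def check_budget(
    log_text: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[bool, str]:
    """Compare the logged peak against EXPORT_GPU_PEAK_MB_LIMIT (or the default)."""
    env = os.environ if env is None else env
    limit_mb = float(env.get("EXPORT_GPU_PEAK_MB_LIMIT", DEFAULT_LIMIT_MB))
    peak_mb = parse_peak_mb(log_text)
    ok = peak_mb <= limit_mb
    verdict = "within" if ok else "exceeds"
    return ok, f"peak {peak_mb:.0f} MB {verdict} limit {limit_mb:.0f} MB"


if __name__ == "__main__":
    # ~18 GB (with the fixes) and ~37 GB (without), expressed in MB.
    for peak_mb in (18 * 1024, 37 * 1024):
        log = f"exporting...\nEXPORT_GPU_PEAK_MEMORY_MB: {peak_mb}\n"
        print(check_budget(log, env={}))
```

With the default limit, an ~18 GB peak passes and an ~37 GB peak fails. Setting `EXPORT_GPU_PEAK_MB_LIMIT` below the measured peak forces the failure path.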
## Test Plan
- Manual: ran `python -m executorch.examples.models.qwen3_5_moe.export
--prequantized <hqq-int4-bundle> --backend cuda` and confirmed the
marker line is printed at the end with a sensible value (~18 GB).
- Manual: simulated CI gate logic locally with the marker line and
confirmed both the success path and the failure path (forced
threshold below the actual peak) behave as expected.

1 parent e3751bc commit b3d6df2
2 files changed
Lines changed: 41 additions & 1 deletion