Add test sharding for Frontier CI; switch to batch/hackathon partition
Split Frontier GPU test configs into 2 shards (~75 min each) so they
fit within the batch partition's 2h wall-time limit. This allows all
Frontier SLURM jobs to run concurrently instead of serially on the
extended partition (which has a 1-job-per-user limit), reducing total
CI wall-clock time from ~4.5h to ~2h.
Changes:
- Add --shard CLI argument (e.g., --shard 1/2) with modulo-based
  round-robin distribution across shards
- Switch Frontier submit scripts from extended to batch/hackathon
  (CFD154 account, 1h59m wall time)
- Shard the 3 Frontier GPU matrix entries into 6 (2 shards each)
- CPU entries remain unsharded
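
For illustration only, a minimal sketch of how a --shard i/n argument with
modulo-based round-robin selection might work. This is not the code from
this commit: parse_shard, select_shard, and build_parser are hypothetical
names, and the 1-based shard index is an assumption inferred from the
"--shard 1/2" example.

```python
import argparse
from typing import List, Sequence, Tuple


def parse_shard(spec: str) -> Tuple[int, int]:
    """Parse 'i/n' into (index, count). The index is assumed to be 1-based."""
    index_str, sep, count_str = spec.partition("/")
    if not sep:
        raise argparse.ArgumentTypeError(f"expected 'i/n', got {spec!r}")
    try:
        index, count = int(index_str), int(count_str)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected integers in {spec!r}")
    if count < 1 or not 1 <= index <= count:
        raise argparse.ArgumentTypeError(f"shard index out of range in {spec!r}")
    return index, count


def select_shard(cases: Sequence, index: int, count: int) -> List:
    """Round-robin: keep the cases whose position modulo count matches this shard."""
    return [case for pos, case in enumerate(cases) if pos % count == index - 1]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the --shard option."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--shard", type=parse_shard, default=None,
                        help="run only shard i of n, e.g. 1/2")
    return parser
```

With five cases and two shards, shard 1/2 would take positions 0, 2, 4 and
shard 2/2 positions 1, 3, so every case runs in exactly one shard as long
as all shards see the cases in the same order.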
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>