Commit 4c3fce5
committed
Fix VM image selection for SXM instance types
The _get_image function checked gpu_type (e.g. 'A100') for 'SXM', but
gpuhunt normalizes GPU names and strips the SXM qualifier. Check the
instance type name instead (e.g. 'a100-80gb-sxm-ib.8x') which
preserves the '-sxm' indicator.
Without this fix, SXM-IB instances used the PCIe docker image which
lacks IB drivers, HPC-X, and NCCL topology files. Verified with a
2-node A100-SXM-IB NCCL all_reduce test: 193 GB/s bus bandwidth.
Made-with: Cursor1 parent 8cd5ec1 commit 4c3fce5
1 file changed
Lines changed: 5 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
94 | 94 | | |
95 | 95 | | |
96 | 96 | | |
97 | | - | |
| 97 | + | |
98 | 98 | | |
99 | 99 | | |
100 | | - | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
101 | 103 | | |
102 | 104 | | |
103 | 105 | | |
| |||
216 | 218 | | |
217 | 219 | | |
218 | 220 | | |
219 | | - | |
| 221 | + | |
220 | 222 | | |
221 | 223 | | |
222 | 224 | | |
| |||
0 commit comments