Commit fbb361a

Author: Han Wang
fix(test): skip DDP tests when NCCL is selected with fewer than 2 GPUs
NCCL rejects two ranks sharing the same GPU device, causing all DDP tests to fail on single-GPU CI runners. Skip the entire module when the backend is NCCL and device_count < 2.
1 parent 28fbcac commit fbb361a

1 file changed

Lines changed: 4 additions & 0 deletions

source/tests/pt_expt/test_training_ddp.py

@@ -63,6 +63,10 @@
 # Auto-detect DDP backend based on device availability.
 _DDP_BACKEND = "nccl" if torch.cuda.is_available() else "gloo"
 
+# NCCL requires at least 2 GPUs for multi-rank tests.
+if _DDP_BACKEND == "nccl" and torch.cuda.device_count() < 2:
+    raise unittest.SkipTest("NCCL DDP tests require at least 2 GPUs")
+
 
 
 def _find_free_port():
     """Find a free TCP port on localhost."""
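
The guard added above can be exercised without a GPU by factoring the condition into a plain function. The following is a minimal sketch, not code from this commit: `nccl_skip_reason` and `require_ddp_capable` are hypothetical helper names, and the backend string and device count are passed in explicitly instead of being read from `torch.cuda`.

```python
import unittest


def nccl_skip_reason(backend, device_count):
    """Return a skip message when NCCL is selected with fewer than 2 GPUs.

    Hypothetical helper mirroring the condition in the diff; returns None
    when the tests can run.
    """
    if backend == "nccl" and device_count < 2:
        return "NCCL DDP tests require at least 2 GPUs"
    return None


def require_ddp_capable(backend, device_count):
    """Raise unittest.SkipTest when the guard condition holds.

    Called at module import time, this makes the unittest loader report
    the whole module as skipped instead of failing each test.
    """
    reason = nccl_skip_reason(backend, device_count)
    if reason is not None:
        raise unittest.SkipTest(reason)
```

In a test module, the call would sit near the top, for example `require_ddp_capable(_DDP_BACKEND, torch.cuda.device_count())`. The gloo path is never skipped by this check, which matches the commit's intent of only guarding the NCCL case.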
