Commit 8a7aa04

Enable P2P transport for AMD systems with >2 GPUs at PHB level
On AMD multi-socket systems, GPUs on the same NUMA node connect through separate PCIe root complexes under the same PCIe Host Bridge (PATH_PHB). The default P2P level (PATH_PXB) disables P2P for these paths, forcing shared memory transport with 24-42% bandwidth loss.

Extend the existing AMD P2P exception to allow PHB-level P2P for configurations with more than 2 GPUs. The original SYS-level P2P for ≤2 GPU configurations is preserved.

Benchmarked on dual-socket AMD EPYC 9575F (Turin) with 4x RTX PRO 6000 on the same socket (NCCL 2.29.7+cuda13.2):

  Transport change: SHM/direct/direct -> P2P/direct pointer
  Throughput: +24-42% across 256K-128M message sizes
  Latency: up to 19% lower at 128K

Signed-off-by: Martin Vit <martin@voipmonitor.org>
1 parent 6da4220 commit 8a7aa04

1 file changed

Lines changed: 7 additions & 2 deletions

src/graph/paths.cc

@@ -339,8 +339,13 @@ ncclResult_t ncclTopoCheckP2p(struct ncclComm* comm, struct ncclTopoSystem* syst

   int arch, vendor, model;
   NCCLCHECK(ncclTopoCpuType(system, &arch, &vendor, &model));
-  // Allow P2P between pairs of GPU devices on AMD systems
-  if ((arch == NCCL_TOPO_CPU_ARCH_X86 && vendor == NCCL_TOPO_CPU_VENDOR_AMD) && system->nodes[DEV].count <= 2) p2pLevel = PATH_SYS;
+  // Allow P2P on AMD systems: SYS level for ≤2 GPU devices (original behavior),
+  // PHB level for >2 to enable same-socket P2P through the PCIe Host Bridge.
+  // Without this, GPUs under separate root complexes on the same NUMA node
+  // (PATH_PHB) fall back to shared memory transport, losing 24-46% bandwidth.
+  if (arch == NCCL_TOPO_CPU_ARCH_X86 && vendor == NCCL_TOPO_CPU_VENDOR_AMD) {
+    p2pLevel = (system->nodes[DEV].count <= 2) ? PATH_SYS : PATH_PHB;
+  }

   // User override
   NCCLCHECK(ncclGetUserP2pLevel(&p2pLevel));
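
The effect of the new level selection can be illustrated with a small standalone sketch. It is not NCCL code: the enum values, `CpuVendor`, `defaultP2pLevel`, and `p2pAllowed` are hypothetical stand-ins. It assumes that path types are ordered from closest to farthest (PIX < PXB < PHB < SYS) and that P2P is permitted when the path type between two GPUs does not exceed the chosen level. The user override applied via `ncclGetUserP2pLevel` is left out.

```cpp
// Simplified stand-in for NCCL's path types (assumption: only the relative
// ordering matters here; the real header defines more types and other values).
enum PathType { PATH_PIX = 0, PATH_PXB = 1, PATH_PHB = 2, PATH_SYS = 3 };

// Hypothetical vendor tag used only for this illustration.
enum CpuVendor { VENDOR_OTHER, VENDOR_AMD };

// Mirrors the selection in the patch: on AMD, SYS level for <= 2 GPUs and
// PHB level otherwise; every other case keeps the PXB default.
constexpr PathType defaultP2pLevel(CpuVendor vendor, int gpuCount) {
  return (vendor == VENDOR_AMD)
             ? (gpuCount <= 2 ? PATH_SYS : PATH_PHB)
             : PATH_PXB;
}

// Assumed rule: P2P is allowed when the path between the two GPUs is no
// farther than the selected level.
constexpr bool p2pAllowed(PathType pathBetweenGpus, PathType p2pLevel) {
  return pathBetweenGpus <= p2pLevel;
}

// Four AMD GPUs behind the same host bridge: P2P is allowed after the patch.
static_assert(p2pAllowed(PATH_PHB, defaultP2pLevel(VENDOR_AMD, 4)),
              "PHB path should qualify for P2P on AMD with >2 GPUs");
// Cross-socket (SYS) paths with more than two GPUs still do not qualify.
static_assert(!p2pAllowed(PATH_SYS, defaultP2pLevel(VENDOR_AMD, 4)),
              "SYS path should not qualify for P2P on AMD with >2 GPUs");
```

Under these assumptions, the two-GPU AMD case still resolves to `PATH_SYS`, so the pre-existing behavior is unchanged. Only same-host-bridge pairs in larger configurations gain P2P.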
