Commit bcd0da0

Enable P2P transport for AMD systems with >2 GPUs at PHB level

On AMD multi-socket systems, GPUs on the same NUMA node connect through separate PCIe root complexes under the same PCIe Host Bridge (PATH_PHB). The default P2P level (PATH_PXB) disables P2P for these paths, forcing shared memory transport with 24-42% bandwidth loss.

Extend the existing AMD P2P exception to allow PHB-level P2P for configurations with more than 2 GPUs. The original SYS-level P2P for ≤2 GPU configurations is preserved.

Benchmarked on dual-socket AMD EPYC 9575F (Turin) with 4x RTX PRO 6000 on the same socket (NCCL 2.29.7+cuda13.2):

  Transport change: SHM/direct/direct -> P2P/direct pointer
  Throughput: +24-42% across 256K-128M message sizes
  Latency: up to 19% lower at 128K

Signed-off-by: Martin Vit <martin@voipmonitor.org>

1 parent 3619159 commit bcd0da0

1 file changed

Lines changed: 7 additions & 2 deletions

src/graph/paths.cc

@@ -324,8 +324,13 @@ ncclResult_t ncclTopoCheckP2p(struct ncclComm* comm, struct ncclTopoSystem* syst
 
   int arch, vendor, model;
   NCCLCHECK(ncclTopoCpuType(system, &arch, &vendor, &model));
-  // Allow P2P between pairs of GPUs on AMD systems
-  if ((arch == NCCL_TOPO_CPU_ARCH_X86 && vendor == NCCL_TOPO_CPU_VENDOR_AMD) && system->nodes[GPU].count <= 2) p2pLevel = PATH_SYS;
+  // Allow P2P on AMD systems: SYS level for ≤2 GPUs (original behavior),
+  // PHB level for >2 GPUs to enable same-socket P2P through the PCIe Host Bridge.
+  // Without this, GPUs under separate root complexes on the same NUMA node
+  // (PATH_PHB) fall back to shared memory transport, losing 24-46% bandwidth.
+  if (arch == NCCL_TOPO_CPU_ARCH_X86 && vendor == NCCL_TOPO_CPU_VENDOR_AMD) {
+    p2pLevel = (system->nodes[GPU].count <= 2) ? PATH_SYS : PATH_PHB;
+  }
 
   // User override
   NCCLCHECK(ncclGetUserP2pLevel(&p2pLevel));
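
The effect of the change can be illustrated with a small standalone sketch. It is not NCCL code: the enum below is a simplified stand-in that keeps only the relative ordering of four path types (the real enum has more entries and different values), `CpuVendor`, `defaultP2pLevel`, and `p2pAllowed` are hypothetical names, the x86 architecture check and the user override are left out, and the rule "P2P is used when the path type is no farther than the P2P level" is an assumption about how the level is consumed.

```cpp
#include <cassert>

// Simplified stand-in for NCCL's path types, ordered from closest to farthest.
// Only the relative order matters here; the real enum differs.
enum PathType { PATH_PIX, PATH_PXB, PATH_PHB, PATH_SYS };

// Hypothetical vendor tag replacing the arch/vendor/model lookup in the patch.
enum CpuVendor { VENDOR_INTEL, VENDOR_AMD };

// Mirrors the patched selection: PATH_PXB by default; on AMD, PATH_SYS for
// up to 2 GPUs (original behavior) and PATH_PHB for more than 2 GPUs.
constexpr PathType defaultP2pLevel(CpuVendor vendor, int gpuCount) {
  return vendor == VENDOR_AMD ? (gpuCount <= 2 ? PATH_SYS : PATH_PHB)
                              : PATH_PXB;
}

// Assumed rule: P2P is used when the GPU-to-GPU path is no farther than
// the selected level.
constexpr bool p2pAllowed(PathType path, PathType level) {
  return path <= level;
}

// 4 GPUs on AMD: same-host-bridge paths now qualify, cross-socket paths do not.
static_assert(p2pAllowed(PATH_PHB, defaultP2pLevel(VENDOR_AMD, 4)), "PHB allowed");
static_assert(!p2pAllowed(PATH_SYS, defaultP2pLevel(VENDOR_AMD, 4)), "SYS rejected");
// 2 GPUs on AMD: the pre-existing SYS-level exception is unchanged.
static_assert(p2pAllowed(PATH_SYS, defaultP2pLevel(VENDOR_AMD, 2)), "SYS allowed");
// Non-AMD default: PHB paths stay above the PXB level.
static_assert(!p2pAllowed(PATH_PHB, defaultP2pLevel(VENDOR_INTEL, 4)), "PHB rejected");
```

Under these assumptions, the only behavioral difference from the previous code is the first assertion: with more than 2 GPUs on an AMD host, a PATH_PHB pair is no longer excluded by the default level.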
