Commit 8a7aa04
Enable P2P transport for AMD systems with >2 GPUs at PHB level
On AMD multi-socket systems, GPUs on the same NUMA node connect through
separate PCIe root complexes under the same PCIe Host Bridge (PATH_PHB).
The default P2P level (PATH_PXB) disables P2P for these paths, forcing
shared memory (SHM) transport; P2P measured 24-42% higher throughput
than SHM in the benchmark below.
Extend the existing AMD P2P exception to allow PHB-level P2P for
configurations with more than 2 GPUs. The original SYS-level P2P
for ≤2 GPU configurations is preserved.
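
The selection rule described above can be sketched as follows. This is a minimal, self-contained illustration and not the patched NCCL source: the `PathType` enum is a simplified stand-in that only keeps the relative ordering PXB < PHB < SYS, and `selectP2pLevel`, `p2pAllowed`, and the `isAmdX86` flag are hypothetical names introduced for the example.

```cpp
// Simplified stand-in for NCCL's path types, ordered from nearest to
// farthest. Only the relative order PXB < PHB < SYS matters here; the real
// enum has more entries.
enum PathType { PATH_PIX = 0, PATH_PXB = 1, PATH_PHB = 2, PATH_SYS = 3 };

// Hypothetical helper: choose the farthest path type over which P2P is
// still permitted.
constexpr PathType selectP2pLevel(bool isAmdX86, int gpuCount) {
  // Default: no P2P across a PCIe Host Bridge or anything farther.
  if (!isAmdX86) return PATH_PXB;
  // Existing exception, preserved: with at most 2 GPUs on AMD, allow P2P
  // all the way up to SYS.
  if (gpuCount <= 2) return PATH_SYS;
  // Extension described in this commit: with more than 2 GPUs on AMD,
  // allow P2P up to PHB.
  return PATH_PHB;
}

// Hypothetical helper: P2P is used when the path between two GPUs is no
// farther than the selected level.
constexpr bool p2pAllowed(PathType path, PathType level) {
  return path <= level;
}

// Compile-time checks of the three cases.
static_assert(!p2pAllowed(PATH_PHB, selectP2pLevel(false, 4)),
              "default: PHB paths fall back to another transport");
static_assert(p2pAllowed(PATH_PHB, selectP2pLevel(true, 4)),
              "AMD, >2 GPUs: PHB paths use P2P");
static_assert(!p2pAllowed(PATH_SYS, selectP2pLevel(true, 4)),
              "AMD, >2 GPUs: SYS paths still do not use P2P");
static_assert(p2pAllowed(PATH_SYS, selectP2pLevel(true, 2)),
              "AMD, <=2 GPUs: SYS-level P2P is preserved");
```

Under these assumptions, a 4-GPU AMD system gets P2P for same-socket (PHB) pairs but not for cross-socket (SYS) pairs, while the ≤2 GPU behavior is unchanged.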
Benchmarked on dual-socket AMD EPYC 9575F (Turin) with 4x RTX PRO 6000
on the same socket (NCCL 2.29.7+cuda13.2):
  - Transport change: SHM/direct/direct -> P2P/direct pointer
  - Throughput: +24-42% across 256K-128M message sizes
  - Latency: up to 19% lower at 128K
Signed-off-by: Martin Vit <martin@voipmonitor.org>
1 parent 6da4220 commit 8a7aa04
1 file changed
Lines changed: 7 additions & 2 deletions
Diff hunk (line content not captured): original lines 342-343 replaced by new lines 342-348, with unchanged context at original lines 339-341 and 344-346.