Commit bcd0da0
Enable P2P transport for AMD systems with >2 GPUs at PHB level

On multi-socket AMD systems, GPUs on the same NUMA node connect through
separate PCIe root complexes under the same PCIe Host Bridge (PATH_PHB).
The default P2P level (PATH_PXB) disables P2P for these paths and forces
the shared memory (SHM) transport. In the benchmark below, P2P delivers
24-42% higher throughput than SHM.

Extend the existing AMD P2P exception to allow PHB-level P2P for
configurations with more than 2 GPUs. The original SYS-level P2P
for ≤2 GPU configurations is preserved.

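The selection rule described above can be sketched as follows. This is a minimal, self-contained model and not the NCCL source: the enum is a simplified subset of NCCL's path types (kept in the same near-to-far order), and `CpuVendor`, `selectP2pLevel`, and `p2pAllowed` are hypothetical names introduced for illustration. It assumes no user override of the P2P level and omits the other checks NCCL performs (for example, whether CUDA reports peer access as possible).

```cpp
// Simplified subset of path types, ordered from closest to farthest.
// Assumption: only the relative order matters for this sketch.
enum PathType { PATH_LOC, PATH_NVL, PATH_PIX, PATH_PXB, PATH_PHB, PATH_SYS };

// Hypothetical vendor tag; the real code inspects CPU arch and vendor.
enum class CpuVendor { Intel, Amd, Other };

// Returns the farthest path type over which P2P is still permitted.
PathType selectP2pLevel(CpuVendor vendor, int gpuCount) {
  PathType level = PATH_PXB;  // default: no P2P across the host bridge
  if (vendor == CpuVendor::Amd) {
    // Pre-existing exception: up to 2 GPUs may use P2P at any distance.
    // Extension described in this commit: more than 2 GPUs may use P2P
    // through a PCIe Host Bridge, but not across sockets.
    level = (gpuCount <= 2) ? PATH_SYS : PATH_PHB;
  }
  return level;
}

// P2P is allowed when the path between two GPUs is no farther than the level.
bool p2pAllowed(PathType pathBetweenGpus, PathType level) {
  return pathBetweenGpus <= level;
}
```

Under these assumptions, `p2pAllowed(PATH_PHB, selectP2pLevel(CpuVendor::Amd, 4))` is true, while a `PATH_SYS` path on the same 4-GPU system still falls back to another transport.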
Benchmarked on dual-socket AMD EPYC 9575F (Turin) with 4x RTX PRO 6000
on the same socket (NCCL 2.29.7+cuda13.2):

  Transport change: SHM/direct/direct -> P2P/direct pointer
  Throughput: +24-42% across 256K-128M message sizes
  Latency: up to 19% lower at 128K

Signed-off-by: Martin Vit <martin@voipmonitor.org>

1 parent 3619159 commit bcd0da0
1 file changed
Lines changed: 7 additions & 2 deletions
@@ -324,8 +324,13 @@
(diff body not preserved: 2 lines removed at original lines 327-328, 7 lines added at new lines 327-333)