What's the cost of the UMA fault latency? #2

@rajesh-s

My query on the Nvidia forums brought me here.

Here are the data points from a GH200 and an H100-SXM5. I'm not sure the measurement is reliable, though: the p50 fault latency shows no significant difference between the PCIe and hardware-coherent configurations. The p99 values highlight the difference much better, but p99 is not the common-case behavior.
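To illustrate why I suspect the p50 may simply be hiding the faults, here is a minimal sketch using synthetic numbers (not measurements). It assumes a hypothetical two-level distribution: a fast access at 12.6 ns (borrowed from the p50 in the table) and a made-up 50,000 ns fault cost. The helper names `percentile` and `synthetic_latencies` are mine, not from the benchmark.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def synthetic_latencies(n, fault_fraction, hit_ns=12.6, fault_ns=50_000.0):
    """Build n synthetic samples where a given fraction pays a (hypothetical) fault cost."""
    n_faults = round(n * fault_fraction)
    return [fault_ns] * n_faults + [hit_ns] * (n - n_faults)

# Assumed fault fractions, chosen only to show the three regimes.
for fraction in (0.005, 0.02, 0.60):
    samples = synthetic_latencies(1000, fraction)
    print(f"faulting={fraction:.1%}  p50={percentile(samples, 50)} ns  "
          f"p99={percentile(samples, 99)} ns")
```

Under these assumptions the median only moves once more than half of the timed accesses fault, and the p99 only moves once more than 1% do. If the benchmark faults once per page but times many accesses per page, a flat p50 would be expected regardless of platform.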

Would it be possible to infer the number of page faults from the CUPTI counters to verify that the access patterns are consistent with expectation?

| Metric | gh200.txt | h100 sxm.txt |
| --- | --- | --- |
| GPU | NVIDIA GH200 120GB | NVIDIA H100 80GB HBM3 |
| Platform | HARDWARE_COHERENT_UMA | DISCRETE_PCIE |
| Coherent | yes, hardware | no |
| Atomic GPU scope p50 | 14.6 ns, 29 cycles | 11.1 ns, 22 cycles |
| Atomic GPU scope p99 | 20.7 ns | 13.6 ns |
| Atomic SYS scope p50 | 15.2 ns, 30 cycles | 11.1 ns, 22 cycles |
| Atomic SYS scope p99 | 21.2 ns | 13.6 ns |
| Atomic contention p50 | 14.6 ns, 29 cycles | 11.1 ns, 22 cycles |
| Atomic contention p99 | 21.2 ns | 13.6 ns |
| Atomic SYS to GPU p50 ratio | 1.03x | 1.00x |
| Atomic SYS to GPU p99 ratio | 1.02x | 1.00x |
| Coherence cost | 0.5 ns overhead | 0.0 ns overhead |
| Fault COLD p50 | 12.6 ns, 25 cycles | 13.1 ns, 26 cycles |
| Fault COLD p99 | 485.4 ns | 52335.4 ns |
| Fault WARM p50 | 12.6 ns, 25 cycles | 12.1 ns, 24 cycles |
| Fault WARM p99 | 28.3 ns | 27.8 ns |
| Fault PRESSURE p50 | 12.6 ns, 25 cycles | 12.6 ns, 25 cycles |
| Fault PRESSURE p99 | 181.8 ns | 33068.2 ns |
| Fault COLD to WARM p50 ratio | 1.00x | 1.08x |
| Fault COLD to WARM p99 ratio | 17.15x | 1882.57x |
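As a sanity check on how I'm reading the table, the COLD to WARM ratios can be recomputed from the fault percentiles above. This is a small sketch. The dictionary layout and the `cold_to_warm` helper are my own naming, and the values are copied from the table after rounding, so ratios derived from them may differ in the last digit from what the tool prints.

```python
# Fault latencies in ns, copied from the COLD and WARM rows of the table above.
fault_ns = {
    "gh200": {
        "cold": {"p50": 12.6, "p99": 485.4},
        "warm": {"p50": 12.6, "p99": 28.3},
    },
    "h100_sxm": {
        "cold": {"p50": 13.1, "p99": 52335.4},
        "warm": {"p50": 12.1, "p99": 27.8},
    },
}

def cold_to_warm(platform, pct):
    """Ratio of COLD to WARM fault latency for one platform and percentile."""
    return fault_ns[platform]["cold"][pct] / fault_ns[platform]["warm"][pct]

for platform in fault_ns:
    for pct in ("p50", "p99"):
        print(f"{platform} {pct}: {cold_to_warm(platform, pct):.2f}x")
```

The recomputed ratios agree with the reported ones to two decimals, so the cold-path cost shows up only in the tail on both platforms. The p99 gap between the two (485.4 ns vs 52335.4 ns) is about two orders of magnitude.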
