Platform: NVIDIA GB10 (SM 12.1), AArch64
The platform on which these tests are intended to run reports:
cudaDevAttrPageableMemoryAccessUsesHostPageTables = 1
Hardware measurements collected across three independent GB10 systems:
Fault COLD/WARM p50 ratio: 1.00x
Atomic SYS/GPU scope ratio: 1.00x
NVLink-C2C coherence overhead: 0.0 ns
Source: https://github.com/parallelArchitect/nvidia-uma-fault-probe
These measurements are consistent with a hardware-coherent memory model
and motivate evaluation of OpenMP unified shared memory behavior on
this platform.
OpenMP map-clause behavior under #pragma omp requires unified_shared_memory
has not yet been characterized on GB10. Two candidate tests and platform
notes are available at:
https://github.com/parallelArchitect/OpenMP_VV/tree/gb10-uma-aware
test_gb10_requires_unified_shared_memory_nomap.c
Evaluates heap allocation access and interleaved CPU/GPU updates under
requires unified_shared_memory without explicit map clauses.
test_gb10_uma_pointer_validity.c
Evaluates host stack and heap pointer accessibility inside an OpenMP
target region when unified shared memory is active and neither map
nor is_device_ptr clauses are provided.
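The two access patterns described above can be sketched as follows. This is a minimal illustration, not the code in the linked branch: the helper names interleaved_updates and pointer_validity, the array size, and the test values are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Declares that the program relies on unified shared memory, so host
   pointers may be dereferenced inside target regions. */
#pragma omp requires unified_shared_memory

/* Hypothetical helper (not taken from the repository): alternates device
   and host increments on one malloc'd array with no map clauses.
   Returns the number of mismatching elements, or -1 if malloc fails. */
int interleaved_updates(int n, int rounds)
{
    assert(n > 0 && rounds > 0);
    int *data = malloc((size_t)n * sizeof *data);
    if (data == NULL)
        return -1;
    for (int i = 0; i < n; ++i)
        data[i] = 0;

    for (int r = 0; r < rounds; ++r) {
        /* Device update through the raw host pointer. */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i)
            data[i] += 1;

        /* Host update on the same allocation. */
        for (int i = 0; i < n; ++i)
            data[i] += 1;
    }

    /* Each round adds 1 on the device and 1 on the host. */
    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (data[i] != 2 * rounds)
            ++errors;
    free(data);
    return errors;
}

/* Hypothetical helper: writes through a host stack pointer and a host
   heap pointer inside a target region, with neither map nor
   is_device_ptr, then checks the results on the host.
   Returns the number of failed checks, or -1 if malloc fails. */
int pointer_validity(void)
{
    int stack_value = 41;
    int *stack_ptr = &stack_value;
    int *heap_ptr = malloc(sizeof *heap_ptr);
    if (heap_ptr == NULL)
        return -1;
    *heap_ptr = 99;

    #pragma omp target
    {
        *stack_ptr += 1;
        *heap_ptr += 1;
    }

    int errors = (stack_value != 42) + (*heap_ptr != 100);
    free(heap_ptr);
    return errors;
}
```

Both helpers return 0 when every check passes. If the file is built without offloading support, the target regions execute on the host, so a passing result in that configuration says nothing about device behavior.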
Both tests include a runtime platform gate keyed on
cudaDevAttrPageableMemoryAccessUsesHostPageTables.
Systems reporting 0: SKIP. Systems reporting 1: RUN.
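A gate of this kind could look roughly like the sketch below. It assumes the value being checked is cudaDevAttrPageableMemoryAccessUsesHostPageTables on device 0, read with cudaDeviceGetAttribute. The helper names and the USE_CUDA_RUNTIME build macro are hypothetical and are not taken from the linked branch.

```c
/* USE_CUDA_RUNTIME is a hypothetical build macro for this sketch:
   define it (and link against the CUDA runtime) to query the device. */
#ifdef USE_CUDA_RUNTIME
#include <cuda_runtime.h>
#endif

/* Hypothetical helper: returns the attribute value for device 0,
   or 0 when the query fails or the CUDA runtime is not compiled in. */
int query_host_page_table_attr(void)
{
#ifdef USE_CUDA_RUNTIME
    int value = 0;
    cudaError_t status = cudaDeviceGetAttribute(
        &value, cudaDevAttrPageableMemoryAccessUsesHostPageTables, 0);
    return (status == cudaSuccess) ? value : 0;
#else
    return 0;
#endif
}

/* Gate decision as stated in the text: 1 means RUN, anything else SKIP. */
int gate_should_run(int attr_value)
{
    return attr_value == 1;
}
```

A test would call gate_should_run(query_host_page_table_attr()) first and report SKIP without entering any target region when it returns 0. Treating a failed query as 0 is a conservative choice made for this sketch.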
Neither test has been executed on GB10 hardware at the time of writing.
Execution would provide conformance data for OpenMP unified shared memory
constructs on a platform reporting:
cudaDevAttrPageableMemoryAccessUsesHostPageTables = 1