Improve counter reporting on heterogeneous Arm systems #10

Open
kieranhejmadi01 wants to merge 1 commit into ArmDeveloperEcosystem:main from kieranhejmadi01:dgx-spark-test
@kieranhejmadi01
Summary

Running sysreport.py on an NVIDIA DGX Spark (a heterogeneous arm64 system with 10 Cortex-A725 and 10 Cortex-X925 cores), I believe sysreport outputs incorrect perf counter data.

I was able to confirm that hardware counters are available with:

perf stat -x, -e "{instructions:u}" -- true

which printed:

WARNING: events were regrouped to match PMUs
<not counted>,,armv8_pmuv3_0/instructions/u,0,0.00,,
134652,,armv8_pmuv3_1/instructions/u,212064,100.00,,
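
For reference, a minimal sketch of how this CSV output could be read per PMU. The helper names `parse_perf_stat_csv` and `pmu_of` are hypothetical and are not part of sysreport.py; the sketch assumes the `perf stat -x,` field order seen above (count first, event name third).

```python
def parse_perf_stat_csv(text):
    """Parse `perf stat -x,` output into (event_name, count_or_None) pairs."""
    results = []
    for line in text.splitlines():
        fields = line.split(",")
        # Warnings and blank lines are not CSV records; skip them.
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        # "<not counted>" (or any non-numeric value) is reported as None.
        count = int(value) if value.isdigit() else None
        results.append((event, count))
    return results


def pmu_of(event):
    """Return the PMU prefix of an event such as 'armv8_pmuv3_1/instructions/u'."""
    return event.split("/", 1)[0] if "/" in event else None
```

Applied to the output above, this yields one record per PMU: no count for `armv8_pmuv3_0` and a count of 134652 for `armv8_pmuv3_1`, consistent with `true` having run on a core that belongs to the second PMU.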

However, running sysreport on the DGX Spark shows None for perf counters:

System feature report:
  Collected:           2026-04-09 11:11:35.108513
  Script version:      2026-04-09 11:11:29.368312
  Running as root:     False
System hardware:
  Architecture:        aarch64
  CPUs:                20
  CPU types:           10 x Arm Part 0xd87 r0p1, 10 x Arm Part 0xd85 r0p1
  cache info:          size, associativity, sharing
  cache line size:     64
  Caches:
    20 x L1D 64K 4-way 64b-line
    20 x L1I 64K 4-way 64b-line
    10 x L2U 2M 8-way 64b-line
    10 x L2U 512K 8-way 64b-line
    1 x L3U 16M 16-way 64b-line
    1 x L3U 8M 16-way 64b-line
  System memory:       120G
  Atomic operations:   True
  interconnect:        unknown x 1
  NUMA nodes:          1
  Sockets:             1
OS configuration:
  Kernel:              6.14.0
  config:              /boot/config-6.14.0-1013-nvidia
  build dir:           /lib/modules/6.14.0-1013-nvidia/build
  uses atomics:        True
  page size:           4K
  huge pages:          2048kB: 0, 32768kB: 0, 64kB: 0, 1048576kB: 0
  transparent HP:      madvise
  MPAM configured:     False
  resctrl:             False
  Distribution:        Ubuntu 24.04.3 LTS
  libc version:        glibc 2.39
  boot info:           ACPI
  KPTI enforced:       False
  Lockdown:            landlock, lockdown, yama, integrity, apparmor
  Mitigations:         spectre_v2:CSV2, BHB; spec_store_bypass:Speculative Store Bypass disabled via prctl; spectre_v1:__user pointer sanitization
Performance features:
  perf tools:          True
  perf installed at:   /usr/lib/linux-tools/6.14.0-1013-nvidia/perf
  perf with OpenCSD:   False
  perf counters:       None
  perf sampling:       SPE
  perf HW trace:       None
  perf paranoid:       0
  CAP_PERFMON:         disabled
  kptr_restrict:       1
  perf in userspace:   disabled
  interconnect perf:   None
  /proc/kcore:         True
  /dev/mem:            True
  eBPF:
    kernel configured for BPF: True
    bpftool installed:         True
      bpftool v7.6.0 using libbpf v1.6 features: 
    bpftrace installed:        bpftrace v0.20.2


Actions that can be taken to improve performance tools experience:
  perf tools cannot decode hardware trace
    build with CORESIGHT=1
  Hardware perf events are not available
    ensure APIC table describes PMU interrupt
  hardware trace not enabled
    ensure ACPI describes CoreSight trace fabric

Root Cause

I believe the system redistributes events across PMUs rather than scheduling them as a single group. This means the group saturation point, which is how we detect the counter limit, is never reached, so detection returns None even though counters are available.
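
To illustrate the idea, here is a rough sketch of a saturation probe that pins every event in the group to a single PMU, so that perf has nothing to regroup. This is not the detection code in sysreport.py: `group_spec`, `count_counters`, and `perf_group_is_counted` are hypothetical names, the choice of the `instructions` event and the limit of 32 are assumptions, and the `perf` runner assumes that a group that cannot be scheduled shows up as `<not counted>` or `<not supported>` in the output.

```python
import subprocess


def group_spec(pmu, n, event="instructions"):
    """Build a perf event group of n copies of `event`, all on one PMU."""
    return "{" + ",".join(f"{pmu}/{event}/u" for _ in range(n)) + "}"


def count_counters(pmu, group_is_counted, limit=32):
    """Return the largest group size the PMU accepts, or None if none works.

    `group_is_counted` is a callback taking a group spec and returning True
    when every event in the group was counted.
    """
    best = None
    for n in range(1, limit + 1):
        if not group_is_counted(group_spec(pmu, n)):
            break  # saturation point: the group no longer fits
        best = n
    return best


def perf_group_is_counted(spec, cpu):
    """Run the group on `cpu` (assumed to belong to the PMU named in spec)."""
    cmd = ["taskset", "-c", str(cpu), "perf", "stat", "-x,", "-e", spec,
           "--", "true"]
    # perf stat writes its counter report to stderr by default.
    out = subprocess.run(cmd, capture_output=True, text=True).stderr
    return "<not counted>" not in out and "<not supported>" not in out
```

A call would look like `count_counters("armv8_pmuv3_1", lambda s: perf_group_is_counted(s, cpu))`, where `cpu` is a core listed in that PMU's `cpus` file under `/sys/bus/event_source/devices/`. Pinning matters here because an event on a given PMU is only counted while the workload runs on one of that PMU's cores.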

Improvement

Print a more informative message about the perf counters when detection returns None and the system is heterogeneous.
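
A minimal sketch of what such a message could look like. The names `core_pmus` and `describe_counters` and the message wording are illustrative only and are not taken from the patch. It assumes the core PMUs appear as `armv8_pmuv3_*` under `/sys/bus/event_source/devices`, as in the perf output above; other systems may name them differently.

```python
import glob
import os


def core_pmus(sysfs="/sys/bus/event_source/devices"):
    """List core PMU names, e.g. ['armv8_pmuv3_0', 'armv8_pmuv3_1']."""
    pattern = os.path.join(sysfs, "armv8_pmuv3*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def describe_counters(counters, pmus):
    """Format the 'perf counters' value for the report."""
    if counters is not None:
        return str(counters)
    if len(pmus) > 1:
        # Heterogeneous system: explain why the probe found nothing.
        return ("None (heterogeneous system with %d core PMUs: %s; "
                "perf regroups events per PMU, so the counter limit "
                "could not be detected)" % (len(pmus), ", ".join(pmus)))
    return "None"
```

On a homogeneous system the output is unchanged; only the None case with more than one core PMU gets the extra explanation.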

@kieranhejmadi01 changed the title from "Improve counter reporting on heterogeneous ARM systems" to "Improve counter reporting on heterogeneous Arm systems" on Apr 9, 2026