Improve counter reporting on heterogeneous Arm systems #10
Open
kieranhejmadi01 wants to merge 1 commit into ArmDeveloperEcosystem:main from
Summary
Running `sysreport.py` on an NVIDIA DGX Spark (a heterogeneous arm64 system with 10 Cortex-A725 and 10 Cortex-X925 cores), I believe sysreport reports incorrect perf counter data. I was able to confirm the hardware counters with
which outputted:
However, running sysreport on the DGX shows `None` for perf counters:

Root Cause
I believe the system distributes the events across its PMUs instead of scheduling them as a single group. As a result, the group saturation point, which is how we detect the counter limit, is never reached, so detection returns `None` even though hardware counters are available.
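To illustrate why a group that never saturates produces `None`, here is a minimal sketch of saturation-based limit detection. It assumes a hypothetical probe `group_fits(n)` that runs a perf group of `n` events and reports whether every event was counted without multiplexing; `detect_counter_limit` and `max_events` are likewise hypothetical names, not sysreport's actual code.

```python
from typing import Callable, Optional


def detect_counter_limit(
    group_fits: Callable[[int], bool], max_events: int = 32
) -> Optional[int]:
    """Return the largest event group that schedules in full, or None.

    `group_fits(n)` is a hypothetical probe: it should run a perf event
    group of n events and return True only if all n were counted
    without multiplexing.
    """
    for n in range(1, max_events + 1):
        if not group_fits(n):
            # Saturation point reached: n events no longer fit in one
            # group, so the counter limit is the previous group size.
            return n - 1
    # The group never saturated (for example, because the events were
    # spread across several PMUs), so no limit can be inferred.
    return None
```

On a homogeneous core with, say, 6 programmable counters the probe starts failing at 7 events and the sketch returns 6. If the events are instead spread across multiple PMUs, the probe may never fail within `max_events`, which matches the `None` result described above.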
Improvement
Print a more informative message about the perf counters when detection returns `None` and the system is heterogeneous.
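A rough sketch of this reporting logic, assuming heterogeneity is detected by finding more than one distinct `CPU part` value in `/proc/cpuinfo` text. The function names `distinct_cpu_parts` and `counter_report` and the message wording are hypothetical, not the exact code or text in this PR.

```python
import re
from typing import Optional, Set


def distinct_cpu_parts(cpuinfo_text: str) -> Set[str]:
    """Collect the distinct 'CPU part' IDs listed in /proc/cpuinfo text."""
    return set(
        re.findall(r"^CPU part\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.MULTILINE)
    )


def counter_report(limit: Optional[int], cpuinfo_text: str) -> str:
    """Format the perf counter line, explaining a None on mixed-core systems."""
    if limit is not None:
        return f"perf counters: {limit}"
    if len(distinct_cpu_parts(cpuinfo_text)) > 1:
        # More than one core type: events may be spread across several
        # PMUs, so group saturation (and hence the limit) was not observed.
        return (
            "perf counters: unknown (heterogeneous system; events may be "
            "spread across multiple PMUs, so the limit could not be detected)"
        )
    return "perf counters: None"
```

In practice the text would come from reading `/proc/cpuinfo`; passing it in as a string keeps the sketch easy to test without Arm hardware.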