Enhancement: Add rocprofv2 trace support for AMD GPUs #817
- runner.py: Add `SB_ENABLE_ROCPROF`/`SB_ROCPROF_TRACE_DIR` env vars to enable rocprofv2 profiling (`--hip-trace --kernel-trace --plugin json`) in local, torch.distributed, and mpi modes
- pytorch_base.py: Extend GPU guard to support ROCm (`torch.version.hip`) so PyTorch profiler works on AMD GPUs
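The description names the two variables but does not show how runner.py parses them. A minimal sketch, assuming `SB_ENABLE_ROCPROF` is read as a boolean flag and a fallback directory is used when `SB_ROCPROF_TRACE_DIR` is unset. The helper name `read_rocprof_env`, the accepted truthy strings, and the default path are all hypothetical; only the variable names come from the PR.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical: the PR does not say which strings count as "enabled".
_TRUTHY = {'1', 'true', 'yes', 'on'}
# Hypothetical: the PR does not state a default trace directory.
_DEFAULT_TRACE_DIR = '/tmp/rocprof_traces'


def read_rocprof_env(env: Optional[Mapping[str, str]] = None) -> Tuple[bool, str]:
    """Return (enable_rocprof, rocprof_trace_dir).

    Hypothetical helper; only the two variable names are taken from the PR.
    """
    env = os.environ if env is None else env
    enable_rocprof = env.get('SB_ENABLE_ROCPROF', '').strip().lower() in _TRUTHY
    rocprof_trace_dir = env.get('SB_ROCPROF_TRACE_DIR', _DEFAULT_TRACE_DIR)
    return enable_rocprof, rocprof_trace_dir


if __name__ == '__main__':
    # Pass an explicit mapping so the example does not depend on the real environment.
    print(read_rocprof_env({'SB_ENABLE_ROCPROF': '1', 'SB_ROCPROF_TRACE_DIR': '/data/traces'}))
    # (True, '/data/traces')
```

Accepting an explicit mapping keeps the helper testable without mutating `os.environ`.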
@shcho please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds ROCm profiling support alongside existing Nsight Systems tracing and updates model benchmark GPU detection to include AMD (HIP) builds.
Changes:
- Add `rocprofv2` command injection (gated by env vars) for local/distributed/mpi runner modes.
- Introduce ROCm trace directory env var support (`SB_ROCPROF_TRACE_DIR`).
- Expand PyTorch GPU check to treat HIP builds as GPU-capable.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| superbench/runner/runner.py | Adds optional rocprofv2 profiling prefixes/trace commands controlled by environment variables. |
| superbench/benchmarks/model_benchmarks/pytorch_base.py | Updates GPU detection to include ROCm/HIP PyTorch builds. |
Address PR review: wrap all interpolated path/name segments in shlex.quote() to prevent command injection or broken commands when paths contain whitespace or shell metacharacters. Applied to both nsys and rocprofv2 trace commands across all three execution modes (local, torch.distributed, mpi).
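To illustrate what the quoting fixes, the snippet below builds the same `-d` argument with and without `shlex.quote()` for a trace directory containing a space, then tokenizes both strings with `shlex.split()`, which follows POSIX shell word-splitting rules. The directory and benchmark names are made up for the example.

```python
import shlex

rocprof_trace_dir = '/data/my traces'   # hypothetical directory containing a space
benchmark_name = 'resnet50'             # hypothetical benchmark name

# Before the fix: the path is interpolated as-is.
unquoted = f'rocprofv2 -d {rocprof_trace_dir}/{benchmark_name}_traces'

# After the fix: the path is wrapped in shlex.quote() first.
trace_output = shlex.quote(f'{rocprof_trace_dir}/{benchmark_name}_traces')
quoted = f'rocprofv2 -d {trace_output}'

# shlex.split() tokenizes the way a POSIX shell would.
print(shlex.split(unquoted))  # ['rocprofv2', '-d', '/data/my', 'traces/resnet50_traces']
print(shlex.split(quoted))    # ['rocprofv2', '-d', '/data/my traces/resnet50_traces']
```

Strings made only of ASCII letters, digits, and `_@%+=:,./-` come back from `shlex.quote()` unchanged, so ordinary trace paths look the same as before the fix.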
Address PR review: the variable holds either an nsys or rocprofv2 prefix, so rename to trace_prefix to avoid implying Nsight-only behavior.
```diff
-# Check if this is a Nvidia GPU
-if not (torch.cuda.is_available() and torch.version.cuda is not None):
+# Check if this is a Nvidia or AMD GPU
+if not (torch.cuda.is_available() and (torch.version.cuda is not None or torch.version.hip is not None)):
```
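The updated guard works because ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API, with `torch.version.hip` set to a version string and `torch.version.cuda` set to `None` (the reverse holds for CUDA builds). The sketch below factors the condition into a hypothetical helper, `is_supported_gpu_build`, and uses `SimpleNamespace` stand-ins for `torch.version` so it runs without PyTorch installed; the version strings are illustrative only.

```python
from types import SimpleNamespace


def is_supported_gpu_build(cuda_available, version):
    """Same condition as the updated guard in pytorch_base.py (hypothetical helper).

    cuda_available: result of torch.cuda.is_available()
    version: object with .cuda and .hip attributes, like torch.version
    """
    return cuda_available and (version.cuda is not None or version.hip is not None)


cuda_build = SimpleNamespace(cuda='12.1', hip=None)   # illustrative CUDA build
rocm_build = SimpleNamespace(cuda=None, hip='6.0')    # illustrative ROCm build
cpu_build = SimpleNamespace(cuda=None, hip=None)      # CPU-only build

print(is_supported_gpu_build(True, cuda_build))   # True
print(is_supported_gpu_build(True, rocm_build))   # True
print(is_supported_gpu_build(False, cpu_build))   # False
```

With PyTorch installed, the real call would be `is_supported_gpu_build(torch.cuda.is_available(), torch.version)`.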
```python
elif enable_rocprof and mode.proc_rank == 0:
    trace_output = shlex.quote(f'{rocprof_trace_dir}/{benchmark_name}_{mode.proc_rank}_traces')
    trace_command = (
        f'rocprofv2 --hip-trace --kernel-trace --plugin json '
        f'-d {trace_output} '
    )
```
```python
trace_command = ''
if enable_nsys and mode.proc_rank == 0:
    trace_output = shlex.quote(f'{trace_dir}/{benchmark_name}_{mode.proc_rank}_traces')
    trace_command = (
        f'nsys profile --output {trace_output} '
        f'--backtrace none --sample none --force-overwrite true --cpuctxsw none --trace cuda,nvtx '
    )
elif enable_rocprof and mode.proc_rank == 0:
    trace_output = shlex.quote(f'{rocprof_trace_dir}/{benchmark_name}_{mode.proc_rank}_traces')
    trace_command = (
        f'rocprofv2 --hip-trace --kernel-trace --plugin json '
        f'-d {trace_output} '
    )
```
```python
trace_prefix = ''
if enable_nsys:
    trace_output = shlex.quote(f'{trace_dir}/{benchmark_name}_traces')
    trace_prefix = (
        f'nsys profile --output {trace_output} '
        f'--backtrace none --sample none --force-overwrite true --cpuctxsw none --trace cuda,nvtx '
    )
elif enable_rocprof:
    trace_output = shlex.quote(f'{rocprof_trace_dir}/{benchmark_name}_traces')
    trace_prefix = (
        f'rocprofv2 --hip-trace --kernel-trace --plugin json '
        f'-d {trace_output} '
    )
```
Description
Extend trace generation to support AMD GPUs using rocprofv2