[ALMIOPEN-1856] Simplify dnn-benchmarking engine comparison CLI #7866

Draft
SamuelReeder wants to merge 2 commits into develop from users/sareeder/simplify-benchmark-engine-comparison

Conversation

SamuelReeder (Contributor) commented May 28, 2026

Summary

This PR replaces the old A/B-specific dnn-benchmarking workflow with an explicit multi-engine comparison path. It keeps engine selection on --engine, allows comma-delimited plugin paths, and makes comparison output opt-in through --compare-engines.
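As a rough illustration of the CLI shape described above, the following sketch parses comma-delimited `--engine` and `--plugin-path` values and an opt-in `--compare-engines` switch with `argparse`. It is not taken from the PR: the helper names (`split_csv`, `parse_engine_ids`, `pair_engines_with_paths`), the `prog` name, and the error messages are hypothetical, and the real parser may be structured differently. The pairing rule (one shared path, or one path per engine) follows the validation described in this PR; how an omitted `--plugin-path` is handled is not stated, so the sketch leaves that case alone.

```python
import argparse
from typing import List, Optional, Tuple


def split_csv(value: str) -> List[str]:
    """Split a comma-delimited CLI value and reject empty items."""
    items = [item.strip() for item in value.split(",")]
    if not all(items):
        raise argparse.ArgumentTypeError(f"empty item in {value!r}")
    return items


def parse_engine_ids(value: str) -> List[int]:
    """Parse comma-delimited engine IDs as signed integers, keeping caller order."""
    try:
        return [int(item) for item in split_csv(value)]
    except ValueError as exc:
        raise argparse.ArgumentTypeError(f"invalid engine ID list {value!r}") from exc


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the three flags named in this PR are modeled.
    parser = argparse.ArgumentParser(prog="dnn_benchmarking_sketch")
    parser.add_argument("--engine", type=parse_engine_ids, default=[])
    parser.add_argument("--plugin-path", type=split_csv, default=[])
    parser.add_argument("--compare-engines", action="store_true")
    return parser


def pair_engines_with_paths(
    engine_ids: List[int], plugin_paths: List[str]
) -> List[Tuple[int, Optional[str]]]:
    """Pair each engine with a plugin path: one shared path or one per engine."""
    if len(plugin_paths) == 1:
        return [(engine_id, plugin_paths[0]) for engine_id in engine_ids]
    if len(plugin_paths) == len(engine_ids):
        return list(zip(engine_ids, plugin_paths))
    raise ValueError(
        f"expected 1 or {len(engine_ids)} plugin paths, got {len(plugin_paths)}"
    )
```

The first engine in the parsed list stays first after pairing, which matches the caller-order baseline behavior this PR describes.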

Risk Assessment

Medium risk. This changes CLI behavior and removes legacy A/B flags in a benchmark tool, but the new path is covered by parser, config, execution, reporting, non-GPU tests, and gfx90a hardware smoke validation.

Testing Summary

  • Static Python compile validation covered the changed source and tests.
  • Focused unit tests covered CLI parsing, config validation, engine/plugin ordering, comparison deltas, output shape, and serialization.
  • Full non-GPU test suite covered existing CPU-safe dnn-benchmarking behavior.
  • CLI help was manually checked for new flags and absence of old A/B flags.
  • Alola gfx90a validation built the hipDNN/MIOpen provider through setup.sh --force-build -y and ran dnn-benchmarking smoke tests for the no-comparison, comparison, and comma-delimited plugin-path flows.

Testing Checklist

  • Python compile - python3 -m compileall -q projects/hipdnn/tools/dnn-benchmarking/src/dnn_benchmarking projects/hipdnn/tools/dnn-benchmarking/tests - Status: Passed
  • Focused dnn-benchmarking tests - .venv/bin/python -m pytest tests/unit/cli/test_internal_profiling.py tests/unit/cli/test_suite_cli.py tests/unit/config/test_benchmark_config.py tests/unit/reporting/test_reporter.py tests/unit/reporting/test_suite_results.py tests/unit/execution/test_suite_runner.py - Status: Passed
  • Non-GPU dnn-benchmarking suite - .venv/bin/python -m pytest -m 'not gpu' - Status: Passed
  • CLI help check - .venv/bin/python -m dnn_benchmarking --help - Status: Passed
  • Alola setup - bash setup.sh --force-build -y - ASICs: gfx90a - Status: Passed
  • Alola single-engine smoke - python -m dnn_benchmarking --graph graphs/sample_conv_fwd.json --warmup 1 --iters 2 --engine 1563989756945604898 --plugin-path /opt/rocm/lib/hipdnn_plugins/engines - ASICs: gfx90a - Status: Passed
  • Alola comparison smoke - python -m dnn_benchmarking --graph graphs/sample_conv_fwd.json --warmup 1 --iters 2 --engine 1563989756945604898,-6748551569128940061 --plugin-path /opt/rocm/lib/hipdnn_plugins/engines --compare-engines - ASICs: gfx90a - Status: Passed
  • Alola comma-delimited plugin-path smoke - python -m dnn_benchmarking --graph graphs/sample_conv_fwd.json --warmup 1 --iters 1 --engine 1563989756945604898,-6748551569128940061 --plugin-path /opt/rocm/lib/hipdnn_plugins/engines,/opt/rocm/lib/hipdnn_plugins/engines --compare-engines - ASICs: gfx90a - Status: Passed
  • PR CI - GitHub PR checks - Status: Pending

Technical Changes

  • Removes legacy A/B runner/configuration code and old --AId, --BId, --APath, and --BPath flags.
  • Adds --compare-engines and comma-delimited --plugin-path parsing with validation for one shared plugin path or one path per selected engine.
  • Runs explicit --engine IDs in caller order so the first selected engine is the comparison baseline.
  • Adds per-engine summary rows with raw kernel/E2E means and medians, plus opt-in percent delta columns for kernel mean, kernel median, E2E mean, and E2E median.
  • Serializes median timing data and comparison-to-baseline metadata for JSON consumers.
  • Keeps tarball graph extraction compatible with older Python tarfile APIs while preserving path traversal protections.
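
The baseline and percent-delta behavior listed above could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's code: the function names, the row keys, and the sign convention (positive means slower than the baseline) are hypothetical, and only kernel timings are modeled; E2E timings would presumably follow the same pattern.

```python
from statistics import mean, median


def percent_delta(value, baseline):
    """Signed percent change of value relative to baseline (assumed convention)."""
    if baseline == 0:
        return None  # avoid dividing by zero; the real tool may handle this differently
    return (value - baseline) / baseline * 100.0


def summarize_engines(results, compare=False):
    """Build one summary row per engine.

    results is a list of (engine_id, kernel_samples_ms) in caller order;
    the first entry is treated as the comparison baseline.
    """
    rows = []
    baseline = None
    for engine_id, samples in results:
        row = {
            "engine_id": engine_id,
            "kernel_mean_ms": mean(samples),
            "kernel_median_ms": median(samples),
        }
        if baseline is None:
            baseline = row
        if compare:
            # Delta columns are added only when comparison is requested.
            row["kernel_mean_delta_pct"] = percent_delta(
                row["kernel_mean_ms"], baseline["kernel_mean_ms"]
            )
            row["kernel_median_delta_pct"] = percent_delta(
                row["kernel_median_ms"], baseline["kernel_median_ms"]
            )
        rows.append(row)
    return rows
```

Keeping the delta columns behind a flag mirrors the opt-in nature of `--compare-engines`: without it, each row carries only the raw means and medians.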

codecov-commenter commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7866      +/-   ##
===========================================
+ Coverage    62.06%   62.08%   +0.02%     
===========================================
  Files         2085     2087       +2     
  Lines       357573   357886     +313     
  Branches     54060    54097      +37     
===========================================
+ Hits        221918   222176     +258     
- Misses      116866   116897      +31     
- Partials     18789    18813      +24     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| TensileLite | 27.29% <ø> (ø) | Carriedforward from 5dbd0d3 |
| hipBLAS | 90.65% <ø> (ø) | Carriedforward from 5dbd0d3 |
| hipBLASLt | 41.27% <ø> (ø) | Carriedforward from 5dbd0d3 |
| hipCUB | 82.21% <ø> (ø) | Carriedforward from 5dbd0d3 |
| hipDNN | 86.58% <ø> (-0.03%) ⬇️ | |
| hipFFT | 50.00% <ø> (ø) | Carriedforward from 5dbd0d3 |
| hipRAND | 76.12% <ø> (ø) | Carriedforward from 5dbd0d3 |
| hipSOLVER | 69.24% <ø> (ø) | Carriedforward from 5dbd0d3 |
| hipSPARSE | 85.42% <ø> (ø) | Carriedforward from 5dbd0d3 |
| rocBLAS | 48.09% <ø> (ø) | Carriedforward from 5dbd0d3 |
| rocFFT | 52.07% <ø> (ø) | Carriedforward from 5dbd0d3 |
| rocRAND | 57.04% <ø> (ø) | Carriedforward from 5dbd0d3 |
| rocSOLVER | 77.83% <ø> (ø) | Carriedforward from 5dbd0d3 |
| rocSPARSE | 72.68% <ø> (ø) | Carriedforward from 5dbd0d3 |

*This pull request uses carry forward flags.
see 8 files with indirect coverage changes

SamuelReeder changed the title from "Simplify dnn-benchmarking engine comparison CLI" to "[ALMIOPEN-1856] Simplify dnn-benchmarking engine comparison CLI" May 29, 2026
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2 participants