Add multi-shape profiling support to perftest tool (`--data_shape`) by adrianlizarraga · Pull Request #28638 · microsoft/onnxruntime

adrianlizarraga · 2026-05-22T08:35:07Z

Description

Adds a --data_shape CLI flag to onnxruntime_perf_test that allows benchmarking multiple input shapes within a single session. The tool cycles through shape groups round-robin, collecting and reporting per-shape latency statistics. This is useful for profiling models with dynamic dimensions across representative input sizes without running separate benchmark sessions.

Syntax:

--data_shape "input_name:[d0,d1,...][d0,d1,...] input_name2:[d0,d1][d0,d1]"

Requires -I (generated test input mode).
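To make the syntax concrete, here is a minimal sketch of parsing a single `name:[d0,d1,...][d0,d1,...]` entry. `ParseShapeEntry` is a hypothetical helper written for illustration; it is not the PR's `ParseDataShapeGroups`, whose signature and error handling are not shown on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (assumption, not the PR's code): parses one
// "name:[d0,d1,...][d0,d1,...]" entry into the input name and its shapes.
std::pair<std::string, std::vector<std::vector<int64_t>>> ParseShapeEntry(const std::string& entry) {
  // Search for ":[" rather than ":" so input names containing ':' still work.
  const size_t colon = entry.find(":[");
  if (colon == std::string::npos || colon == 0) {
    throw std::invalid_argument("expected name:[dims]... in '" + entry + "'");
  }
  std::vector<std::vector<int64_t>> shapes;
  size_t pos = colon + 1;  // index of the first '['
  while (pos < entry.size()) {
    if (entry[pos] != '[') throw std::invalid_argument("expected '[' in '" + entry + "'");
    const size_t close = entry.find(']', pos);
    if (close == std::string::npos) throw std::invalid_argument("missing ']' in '" + entry + "'");
    std::vector<int64_t> dims;
    size_t start = pos + 1;
    while (start < close) {
      size_t comma = entry.find(',', start);
      if (comma == std::string::npos || comma > close) comma = close;
      // std::stoll throws std::invalid_argument on an empty or non-numeric field.
      dims.push_back(std::stoll(entry.substr(start, comma - start)));
      start = comma + 1;
    }
    shapes.push_back(std::move(dims));
    pos = close + 1;
  }
  return {entry.substr(0, colon), std::move(shapes)};
}
```

Under these assumptions, `ParseShapeEntry("A:[1,16,4][2,32,4]")` yields the name `A` and two shapes, `{1,16,4}` and `{2,32,4}`.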

Example

onnxruntime_perf_test.exe -I -t 5 \
    --data_shape "A:[1,16,4][2,32,4] B:[1,16,4][2,32,4]" \
    dynamic_shape_add.onnx

Output:

Session creation time cost: 0.227376 s
First inference time cost: 1 ms
Total inference time cost: 5.00025 s
Total inference requests: 14548
Average inference time cost total: 0.343707 ms
Total inference run time: 5.11098 s
Number of inferences per second: 2846.42
Avg CPU usage: 5 %
Peak working set size: 101904384 bytes

Latency per shape group:
  1. A : [1,16,4], B : [1,16,4]
      Iterations: 7274
      Average Latency: 0.343136 ms
      Min Latency: 0.3173 ms
      Max Latency: 1.2603 ms
      P50 Latency: 0.3353 ms
      P90 Latency: 0.3656 ms
      P95 Latency: 0.3737 ms
      P99 Latency: 0.4426 ms
  2. A : [2,32,4], B : [2,32,4]
      Iterations: 7274
      Average Latency: 0.344277 ms
      Min Latency: 0.3181 ms
      Max Latency: 1.1894 ms
      P50 Latency: 0.336 ms
      P90 Latency: 0.3664 ms
      P95 Latency: 0.3746 ms
      P99 Latency: 0.4402 ms
Avg CPU usage:5
Peak working set size:101904384
Runs:14548
Min Latency: 0.0003173 s
Max Latency: 0.0012603 s
P50 Latency: 0.0003357 s
P90 Latency: 0.000366 s
P95 Latency: 0.000374 s
P99 Latency: 0.0004403 s
P999 Latency: 0.0006384 s
Shape group 1 (7274 iterations):
  Min Latency: 0.0003173 s
  Max Latency: 0.0012603 s
  P50 Latency: 0.0003353 s
  P90 Latency: 0.0003656 s
  P95 Latency: 0.0003737 s
  P99 Latency: 0.0004426 s
Shape group 2 (7274 iterations):
  Min Latency: 0.0003181 s
  Max Latency: 0.0011894 s
  P50 Latency: 0.000336 s
  P90 Latency: 0.0003664 s
  P95 Latency: 0.0003746 s
  P99 Latency: 0.0004402 s

The first block is the main stdout report (latencies in ms). The second block is DumpToFile() output printed to stdout when no -o file is specified (latencies in seconds, matching the CSV file format).
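The per-shape statistics above can be reproduced in spirit by bucketing each iteration's latency by its shape-group index and taking percentiles per bucket. The sketch below uses the nearest-rank percentile definition, which is an assumption; the page does not show which method perf_test uses. `Timing`, `GroupByShape`, and `Percentile` are illustrative names, not the tool's API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative record: one latency sample (in seconds) tagged with its shape group.
struct Timing {
  double seconds;
  size_t shape_group;
};

// Buckets samples by shape-group index; throws std::out_of_range on a bad index.
std::vector<std::vector<double>> GroupByShape(const std::vector<Timing>& timings, size_t num_groups) {
  std::vector<std::vector<double>> buckets(num_groups);
  for (const Timing& t : timings) {
    buckets.at(t.shape_group).push_back(t.seconds);
  }
  return buckets;
}

// Nearest-rank percentile for p in (0, 100]; returns 0 for an empty sample set.
// Takes the vector by value because it sorts its own copy.
double Percentile(std::vector<double> samples, double p) {
  if (samples.empty()) return 0.0;
  std::sort(samples.begin(), samples.end());
  size_t rank = static_cast<size_t>(std::ceil(p / 100.0 * static_cast<double>(samples.size())));
  rank = std::max<size_t>(rank, 1);
  rank = std::min(rank, samples.size());
  return samples[rank - 1];
}
```

Multiplying a result by 1000 converts the seconds used in the second block to the milliseconds shown in the first.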

Copilot

Pull request overview

Adds multi-shape profiling to onnxruntime_perf_test via a new --data_shape flag, enabling round-robin benchmarking across multiple input-shape groups within a single session and reporting per-shape latency stats. This fits into the perftest tooling by extending its generated-input path (-I) and result reporting without requiring separate benchmark runs per shape.

Changes:

Introduces --data_shape CLI parsing and stores shape-group specs in RunConfig.
Generates and cycles through multiple preloaded input sets (round-robin) and records per-shape-group latency.
Extends perf result reporting (stdout + DumpToFile) with per-shape latency summaries and adds a small dynamic-shape ONNX model generator for manual testing.
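The round-robin selection described above could look roughly like the following. This is a sketch under stated assumptions: `RoundRobinSelector` is a hypothetical class, and the use of an atomic counter (so concurrent callers each get a distinct slot) is a design choice made here, not something the page confirms about the PR.

```cpp
#include <atomic>
#include <cstddef>
#include <stdexcept>

// Hypothetical selector (not the PR's session code): hands out shape-group
// indices 0, 1, ..., n-1, 0, 1, ... in order.
class RoundRobinSelector {
 public:
  explicit RoundRobinSelector(size_t num_groups) : num_groups_(num_groups) {
    if (num_groups == 0) throw std::invalid_argument("need at least one shape group");
  }

  // Returns the shape-group index for the next iteration. The caller would
  // store this index alongside the measured latency for per-shape reporting.
  size_t Next() { return counter_.fetch_add(1, std::memory_order_relaxed) % num_groups_; }

 private:
  const size_t num_groups_;
  std::atomic<size_t> counter_{0};
};
```

With two shape groups, successive calls return 0, 1, 0, 1, and so on, which is consistent with the example output's even split of 14548 runs into 7274 iterations per group.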

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| onnxruntime/test/testdata/dynamic_shape_add.py | Adds a small ONNX model generator with dynamic dims to exercise `--data_shape`. |
| onnxruntime/test/perftest/test_session.h | Extends `RunTiming` to carry the shape-group index used for the iteration. |
| onnxruntime/test/perftest/test_configuration.h | Adds `RunConfig::data_shape_groups` to hold per-input shape groups. |
| onnxruntime/test/perftest/strings_helper.h | Declares `ParseDataShapeGroups` helper for parsing `--data_shape`. |
| onnxruntime/test/perftest/strings_helper.cc | Implements parsing of multi-shape specs into shape groups. |
| onnxruntime/test/perftest/performance_runner.h | Adds per-shape timing storage and plumbing to record per-iteration shape-group timings. |
| onnxruntime/test/perftest/performance_runner.cc | Prints per-shape latency stats and warms up one iteration per shape group; wires multi-shape input generation. |
| onnxruntime/test/perftest/ort_test_session.h | Adds state for round-robin shape selection and multi-shape input population API. |
| onnxruntime/test/perftest/ort_test_session.cc | Implements multi-shape input generation and round-robin selection logic, and returns the shape-group index. |
| onnxruntime/test/perftest/command_args_parser.cc | Adds the `--data_shape` flag and enforces `-I` requirement. |

Comments suppressed due to low confidence (1)

onnxruntime/test/perftest/strings_helper.h:11

strings_helper.h declares APIs using std::string but does not include <string>. Relying on transitive includes is fragile and can break builds when include order changes; please add #include <string> here (alongside the newly added <cstdint>).

#include <cstdint>
#include <string_view>
#include <map>
#include <unordered_map>
#include <unordered_set>
#include <vector>

+  for (char c : input) {
+    if (c == '[') {
+      bracket_depth++;
+      current += c;
+    } else if (c == ']') {
+      bracket_depth--;
+      current += c;
+    } else if ((c == ' ' || c == '\t') && bracket_depth == 0) {
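The fragment above is cut off mid-loop. As a rough sketch of how such a tokenizer might be completed, the function below splits the spec on whitespace that falls outside brackets. `SplitOutsideBrackets` is a hypothetical name, and the check for unbalanced brackets is an addition made here, not something shown in the diff.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical completion of the loop above: splits on spaces and tabs that
// are not inside [...], so "A:[1, 2] B:[3]" yields two tokens.
std::vector<std::string> SplitOutsideBrackets(const std::string& input) {
  std::vector<std::string> tokens;
  std::string current;
  int bracket_depth = 0;
  for (char c : input) {
    if (c == '[') {
      ++bracket_depth;
      current += c;
    } else if (c == ']') {
      // Added guard (assumption): reject a ']' with no matching '['.
      if (--bracket_depth < 0) throw std::invalid_argument("unmatched ']' in shape spec");
      current += c;
    } else if ((c == ' ' || c == '\t') && bracket_depth == 0) {
      // Separator between entries; consecutive separators produce no empty tokens.
      if (!current.empty()) {
        tokens.push_back(current);
        current.clear();
      }
    } else {
      current += c;
    }
  }
  if (bracket_depth != 0) throw std::invalid_argument("unmatched '[' in shape spec");
  if (!current.empty()) tokens.push_back(current);
  return tokens;
}
```

Tracking depth with a counter rather than a boolean keeps the behavior well defined if nested brackets ever appear in the input.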


+  // warm up — run one iteration per shape group to ensure all shapes are warmed
+  size_t warmup_count = 1;
+  if (!performance_test_config_.run_config.data_shape_groups.empty()) {
+    warmup_count = performance_test_config_.run_config.data_shape_groups.begin()->second.size();
+  }
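The warm-up count above is read from the first input's shape list, which implicitly assumes every input lists the same number of shapes. A minimal sketch of making that assumption explicit follows. The `ShapeGroups` alias is a guess at the container type (the page does not show the declaration of `data_shape_groups`), and `WarmupCount` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed container shape: input name -> list of shapes, one per shape group.
using ShapeGroups = std::map<std::string, std::vector<std::vector<int64_t>>>;

// Returns how many warm-up iterations are needed so that every shape group
// runs once; falls back to a single warm-up when no groups were specified.
size_t WarmupCount(const ShapeGroups& groups) {
  if (groups.empty()) return 1;
  const size_t expected = groups.begin()->second.size();
  for (const auto& [name, shapes] : groups) {
    if (shapes.size() != expected) {
      throw std::invalid_argument("input '" + name + "' lists a different number of shapes");
    }
  }
  return expected;
}
```

Whether the PR validates this at parse time or elsewhere is not visible in the fragments shown here.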


+  } else {
+    const std::uniform_int_distribution<int>::param_type p(0, static_cast<int>(test_inputs_.size() - 1));
+    id = static_cast<size_t>(dist_(rand_engine_, p));
+  }


+      auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
+      std::vector<int64_t> input_node_dim;
+
+      // Use user-specified shape if available, otherwise fall back to model metadata
+      auto it = data_shape_groups.find(input_names_str_[i]);
+      if (it != data_shape_groups.end()) {
+        input_node_dim = it->second[g];
+      } else {
+        input_node_dim = tensor_info.GetShape();
+        auto transform_fcn = [](int64_t input) { return (input == -1) ? -input : input; };
+        std::transform(input_node_dim.begin(), input_node_dim.end(), input_node_dim.begin(), transform_fcn);
+      }


adrianlizarraga added 8 commits May 21, 2026 23:16

Add CLI option (phase 1)

c1eb5b3

Implement phase 2 ( Multi-shape input gen )

2159606

Implement phase 3 (Deterministic Shape Cycling)

13bacdc

Implement phase 4 (reporting)

c8676df

Improve error messaging for use of stoll()

d20a8b4

Add warmup per shape group (all warmups happen at once)

fcf0594

Add test model

75574e9

Remove temporary implementation plan

94556ab

adrianlizarraga requested a review from Copilot May 22, 2026 08:35

Copilot started reviewing on behalf of adrianlizarraga May 22, 2026 08:36

adrianlizarraga mentioned this pull request May 22, 2026

[Feature Request] Multi-Shape Profiling in Perftest Tool #28628

Open

github-advanced-security AI found potential problems May 22, 2026

Comment thread onnxruntime/test/testdata/dynamic_shape_add.py Fixed

Copilot AI reviewed May 22, 2026

Comment thread onnxruntime/test/perftest/performance_runner.cc Outdated

Comment thread onnxruntime/test/perftest/strings_helper.cc Outdated

Comment thread onnxruntime/test/perftest/test_configuration.h

Comment thread onnxruntime/test/perftest/strings_helper.cc

adrianlizarraga added 2 commits May 22, 2026 01:45

run linter

030002a

Address review comments

f1f3bcf

adrianlizarraga requested a review from Copilot May 22, 2026 09:03

Copilot started reviewing on behalf of adrianlizarraga May 22, 2026 09:04

Copilot AI reviewed May 22, 2026

adrianlizarraga wants to merge 10 commits into microsoft:main from adrianlizarraga:adrianl/PerfTest_DynShapes

3 participants
