
Add multi-shape profiling support to perftest tool (--data_shape)#28638

Draft
adrianlizarraga wants to merge 10 commits into
microsoft:mainfrom
adrianlizarraga:adrianl/PerfTest_DynShapes

@adrianlizarraga
Contributor

Description

Adds a --data_shape CLI flag to onnxruntime_perf_test that allows benchmarking multiple input shapes within a single session. The tool cycles through shape groups round-robin, collecting and reporting per-shape latency statistics. This is useful for profiling models with dynamic dimensions across representative input sizes without running separate benchmark sessions.

Syntax:

--data_shape "input_name:[d0,d1,...][d0,d1,...] input_name2:[d0,d1][d0,d1]"

Requires -I (generated test input mode).
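As a rough illustration of the syntax above, the following sketch parses a `--data_shape` value into a map from input name to one shape per group. It is not the PR's `ParseDataShapeGroups`; the names `ShapeGroups`, `ParseInputSpec`, and `ParseDataShape` are hypothetical, and the sketch assumes there is no whitespace inside the brackets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: input name -> one shape per shape group.
using ShapeGroups = std::map<std::string, std::vector<std::vector<int64_t>>>;

// Parses one "name:[d0,d1][d0,d1]" token and stores its shapes in `out`.
// Throws std::invalid_argument on malformed input.
inline void ParseInputSpec(const std::string& token, ShapeGroups& out) {
  // Search for ":[" so that input names containing ':' still work.
  const size_t colon = token.find(":[");
  if (colon == std::string::npos || colon == 0) {
    throw std::invalid_argument("expected name:[dims]... in '" + token + "'");
  }
  const std::string name = token.substr(0, colon);
  std::vector<std::vector<int64_t>> shapes;
  size_t pos = colon + 1;
  while (pos < token.size()) {
    if (token[pos] != '[') {
      throw std::invalid_argument("expected '[' in '" + token + "'");
    }
    const size_t close = token.find(']', pos);
    if (close == std::string::npos) {
      throw std::invalid_argument("missing ']' in '" + token + "'");
    }
    std::vector<int64_t> dims;
    std::stringstream dim_stream(token.substr(pos + 1, close - pos - 1));
    std::string dim;
    while (std::getline(dim_stream, dim, ',')) {
      dims.push_back(std::stoll(dim));  // std::stoll throws on non-numeric text
    }
    shapes.push_back(std::move(dims));
    pos = close + 1;
  }
  out[name] = std::move(shapes);
}

// Splits the whole flag value on whitespace and parses each token.
inline ShapeGroups ParseDataShape(const std::string& spec) {
  ShapeGroups groups;
  std::istringstream tokens(spec);
  std::string token;
  while (tokens >> token) {
    ParseInputSpec(token, groups);
  }
  return groups;
}
```

For a value such as `"A:[1,16,4][2,32,4] B:[1,16,4][2,32,4]"`, this sketch yields two inputs with two shapes each; shape group `g` is formed by taking the `g`-th shape of every input.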

Example

onnxruntime_perf_test.exe -I -t 5 \
    --data_shape "A:[1,16,4][2,32,4] B:[1,16,4][2,32,4]" \
    dynamic_shape_add.onnx

Output:

Session creation time cost: 0.227376 s
First inference time cost: 1 ms
Total inference time cost: 5.00025 s
Total inference requests: 14548
Average inference time cost total: 0.343707 ms
Total inference run time: 5.11098 s
Number of inferences per second: 2846.42
Avg CPU usage: 5 %
Peak working set size: 101904384 bytes

Latency per shape group:
  1. A : [1,16,4], B : [1,16,4]
      Iterations: 7274
      Average Latency: 0.343136 ms
      Min Latency: 0.3173 ms
      Max Latency: 1.2603 ms
      P50 Latency: 0.3353 ms
      P90 Latency: 0.3656 ms
      P95 Latency: 0.3737 ms
      P99 Latency: 0.4426 ms
  2. A : [2,32,4], B : [2,32,4]
      Iterations: 7274
      Average Latency: 0.344277 ms
      Min Latency: 0.3181 ms
      Max Latency: 1.1894 ms
      P50 Latency: 0.336 ms
      P90 Latency: 0.3664 ms
      P95 Latency: 0.3746 ms
      P99 Latency: 0.4402 ms
Avg CPU usage:5
Peak working set size:101904384
Runs:14548
Min Latency: 0.0003173 s
Max Latency: 0.0012603 s
P50 Latency: 0.0003357 s
P90 Latency: 0.000366 s
P95 Latency: 0.000374 s
P99 Latency: 0.0004403 s
P999 Latency: 0.0006384 s
Shape group 1 (7274 iterations):
  Min Latency: 0.0003173 s
  Max Latency: 0.0012603 s
  P50 Latency: 0.0003353 s
  P90 Latency: 0.0003656 s
  P95 Latency: 0.0003737 s
  P99 Latency: 0.0004426 s
Shape group 2 (7274 iterations):
  Min Latency: 0.0003181 s
  Max Latency: 0.0011894 s
  P50 Latency: 0.000336 s
  P90 Latency: 0.0003664 s
  P95 Latency: 0.0003746 s
  P99 Latency: 0.0004402 s

The first block is the main stdout report (latencies in ms). The second block is DumpToFile() output printed to stdout when no -o file is specified (latencies in seconds, matching the CSV file format).
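The per-shape numbers in both blocks can be thought of as a summary over each group's recorded latencies. The sketch below shows one way to compute them; the nearest-rank percentile definition is an assumption (the description does not say which definition the tool uses), and `LatencyStats`, `Percentile`, `Summarize`, and `SecondsToMilliseconds` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary of one shape group's latencies (same unit as the input).
struct LatencyStats {
  size_t iterations = 0;
  double avg = 0.0, min = 0.0, max = 0.0;
  double p50 = 0.0, p90 = 0.0, p95 = 0.0, p99 = 0.0;
};

// Nearest-rank percentile over an ascending-sorted, non-empty vector.
// Assumption: the real tool may use a different percentile definition.
inline double Percentile(const std::vector<double>& sorted, double pct) {
  assert(!sorted.empty() && pct > 0.0 && pct <= 100.0);
  const auto rank = static_cast<size_t>(
      std::ceil(pct * static_cast<double>(sorted.size()) / 100.0));
  return sorted[std::max<size_t>(rank, 1) - 1];
}

// Takes the latencies by value so the caller's ordering is left untouched.
inline LatencyStats Summarize(std::vector<double> latencies) {
  LatencyStats s;
  if (latencies.empty()) return s;
  std::sort(latencies.begin(), latencies.end());
  s.iterations = latencies.size();
  s.avg = std::accumulate(latencies.begin(), latencies.end(), 0.0) /
          static_cast<double>(latencies.size());
  s.min = latencies.front();
  s.max = latencies.back();
  s.p50 = Percentile(latencies, 50.0);
  s.p90 = Percentile(latencies, 90.0);
  s.p95 = Percentile(latencies, 95.0);
  s.p99 = Percentile(latencies, 99.0);
  return s;
}

// The stdout report uses milliseconds; the DumpToFile() block uses seconds.
inline double SecondsToMilliseconds(double seconds) { return seconds * 1e3; }
```

Under this sketch, the two report blocks differ only in unit: a minimum of 0.0003173 s in the second block corresponds to 0.3173 ms in the first.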

Comment thread onnxruntime/test/testdata/dynamic_shape_add.py Fixed
Contributor

Copilot AI left a comment


Pull request overview

Adds multi-shape profiling to onnxruntime_perf_test via a new --data_shape flag, enabling round-robin benchmarking across multiple input-shape groups within a single session and reporting per-shape latency stats. This fits into the perftest tooling by extending its generated-input path (-I) and result reporting without requiring separate benchmark runs per shape.

Changes:

  • Introduces --data_shape CLI parsing and stores shape-group specs in RunConfig.
  • Generates and cycles through multiple preloaded input sets (round-robin) and records per-shape-group latency.
  • Extends perf result reporting (stdout + DumpToFile) with per-shape latency summaries and adds a small dynamic-shape ONNX model generator for manual testing.
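The round-robin cycling and per-group recording described in these changes can be sketched as follows. This is a simplified stand-in rather than the PR's code: `RoundRobinSelector` and `RunRoundRobin` are hypothetical names, and `run_one` stands for whatever runs a single inference with the preloaded inputs of the given group.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper that hands out shape-group indices 0, 1, ..., n-1, 0, ...
class RoundRobinSelector {
 public:
  explicit RoundRobinSelector(size_t num_groups) : num_groups_(num_groups) {
    assert(num_groups > 0);
  }

  size_t Next() {
    const size_t id = next_;
    next_ = (next_ + 1) % num_groups_;
    return id;
  }

 private:
  size_t num_groups_;
  size_t next_ = 0;
};

// Runs `iterations` calls of run_one(group_index), cycling through the groups,
// and returns the measured latencies in seconds, bucketed by shape group.
template <typename RunFn>
std::vector<std::vector<double>> RunRoundRobin(size_t num_groups,
                                               size_t iterations,
                                               RunFn run_one) {
  RoundRobinSelector selector(num_groups);
  std::vector<std::vector<double>> per_group(num_groups);
  for (size_t i = 0; i < iterations; ++i) {
    const size_t group = selector.Next();
    const auto start = std::chrono::steady_clock::now();
    run_one(group);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    per_group[group].push_back(elapsed.count());
  }
  return per_group;
}
```

With two groups, strict alternation splits the iterations evenly (or within one of even), which is consistent with the 7274 iterations reported for each group out of 14548 total requests in the example output.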

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.

Summary per file:

| File | Description |
| --- | --- |
| onnxruntime/test/testdata/dynamic_shape_add.py | Adds a small ONNX model generator with dynamic dims to exercise --data_shape. |
| onnxruntime/test/perftest/test_session.h | Extends RunTiming to carry the shape-group index used for the iteration. |
| onnxruntime/test/perftest/test_configuration.h | Adds RunConfig::data_shape_groups to hold per-input shape groups. |
| onnxruntime/test/perftest/strings_helper.h | Declares ParseDataShapeGroups helper for parsing --data_shape. |
| onnxruntime/test/perftest/strings_helper.cc | Implements parsing of multi-shape specs into shape groups. |
| onnxruntime/test/perftest/performance_runner.h | Adds per-shape timing storage and plumbing to record per-iteration shape-group timings. |
| onnxruntime/test/perftest/performance_runner.cc | Prints per-shape latency stats and warms up one iteration per shape group; wires multi-shape input generation. |
| onnxruntime/test/perftest/ort_test_session.h | Adds state for round-robin shape selection and multi-shape input population API. |
| onnxruntime/test/perftest/ort_test_session.cc | Implements multi-shape input generation and round-robin selection logic, and returns the shape-group index. |
| onnxruntime/test/perftest/command_args_parser.cc | Adds the --data_shape flag and enforces -I requirement. |
Comments suppressed due to low confidence (1)

onnxruntime/test/perftest/strings_helper.h:11

  • strings_helper.h declares APIs using std::string but does not include <string>. Relying on transitive includes is fragile and can break builds when include order changes; please add #include <string> here (alongside the newly added <cstdint>).
#include <cstdint>
#include <string_view>
#include <map>
#include <unordered_map>
#include <unordered_set>
#include <vector>



Comment thread onnxruntime/test/perftest/performance_runner.cc Outdated
Comment thread onnxruntime/test/perftest/strings_helper.cc Outdated
Comment thread onnxruntime/test/perftest/test_configuration.h
Comment thread onnxruntime/test/perftest/strings_helper.cc
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.

Comment on lines +165 to +172
for (char c : input) {
  if (c == '[') {
    bracket_depth++;
    current += c;
  } else if (c == ']') {
    bracket_depth--;
    current += c;
  } else if ((c == ' ' || c == '\t') && bracket_depth == 0) {
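The excerpt above is only part of a loop. A self-contained splitter in the same spirit might look like the sketch below, which breaks the input on spaces or tabs only when outside brackets. The function name `SplitOutsideBrackets` and the unbalanced-bracket checks are assumptions for illustration and are not taken from the PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Splits `input` on spaces/tabs that occur outside of [...] groups.
// Whitespace inside brackets is kept as part of the current token.
// Throws std::invalid_argument if the brackets are unbalanced (an assumed policy).
inline std::vector<std::string> SplitOutsideBrackets(const std::string& input) {
  std::vector<std::string> tokens;
  std::string current;
  int bracket_depth = 0;
  for (char c : input) {
    if (c == '[') {
      ++bracket_depth;
      current += c;
    } else if (c == ']') {
      if (bracket_depth == 0) {
        throw std::invalid_argument("unmatched ']' in '" + input + "'");
      }
      --bracket_depth;
      current += c;
    } else if ((c == ' ' || c == '\t') && bracket_depth == 0) {
      // A separator ends the current token; consecutive separators are skipped.
      if (!current.empty()) {
        tokens.push_back(current);
        current.clear();
      }
    } else {
      current += c;
    }
  }
  if (bracket_depth != 0) {
    throw std::invalid_argument("unmatched '[' in '" + input + "'");
  }
  if (!current.empty()) {
    tokens.push_back(current);
  }
  return tokens;
}
```

Tracking the depth is what lets a value like `"A:[1, 16,4] B:[2,32,4]"` split into two tokens even though the first shape contains a space.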
Comment on lines +207 to +211
// warm up — run one iteration per shape group to ensure all shapes are warmed
size_t warmup_count = 1;
if (!performance_test_config_.run_config.data_shape_groups.empty()) {
  warmup_count = performance_test_config_.run_config.data_shape_groups.begin()->second.size();
}
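Because the warm-up count is read from the first entry only, the excerpt implicitly relies on every input listing the same number of shapes. Whether the PR validates this elsewhere is not visible here. A hedged sketch of such a check follows, with a hypothetical `ShapeGroups` alias and `WarmupCount` helper, and the container type assumed to be a `std::map`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical alias: input name -> one shape per shape group.
using ShapeGroups = std::map<std::string, std::vector<std::vector<int64_t>>>;

// Returns the number of warm-up iterations: one per shape group, or 1 when no
// shape groups were given. Throws if the inputs disagree on the group count.
inline size_t WarmupCount(const ShapeGroups& groups) {
  if (groups.empty()) return 1;
  const size_t expected = groups.begin()->second.size();
  for (const auto& entry : groups) {
    if (entry.second.size() != expected) {
      throw std::invalid_argument("input '" + entry.first + "' has " +
                                  std::to_string(entry.second.size()) +
                                  " shapes, expected " +
                                  std::to_string(expected));
    }
  }
  return expected;
}
```

Failing early in this way would turn a mismatched specification such as `"A:[1,4][2,4] B:[1,4]"` into a clear error instead of an out-of-range access when group 1 is looked up for `B`.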
Comment on lines +49 to +52
} else {
  const std::uniform_int_distribution<int>::param_type p(0, static_cast<int>(test_inputs_.size() - 1));
  id = static_cast<size_t>(dist_(rand_engine_, p));
}
Comment on lines +1169 to +1180
auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
std::vector<int64_t> input_node_dim;

// Use user-specified shape if available, otherwise fall back to model metadata
auto it = data_shape_groups.find(input_names_str_[i]);
if (it != data_shape_groups.end()) {
  input_node_dim = it->second[g];
} else {
  input_node_dim = tensor_info.GetShape();
  auto transform_fcn = [](int64_t input) { return (input == -1) ? -input : input; };
  std::transform(input_node_dim.begin(), input_node_dim.end(), input_node_dim.begin(), transform_fcn);
}

3 participants