Add multi-shape profiling support to perftest tool (--data_shape) #28638
Draft
adrianlizarraga wants to merge 10 commits into
Pull request overview
Adds multi-shape profiling to onnxruntime_perf_test via a new --data_shape flag, enabling round-robin benchmarking across multiple input-shape groups within a single session and reporting per-shape latency stats. This fits into the perftest tooling by extending its generated-input path (-I) and result reporting without requiring separate benchmark runs per shape.
Changes:
- Introduces `--data_shape` CLI parsing and stores shape-group specs in `RunConfig`.
- Generates and cycles through multiple preloaded input sets (round-robin) and records per-shape-group latency.
- Extends perf result reporting (`stdout` + `DumpToFile`) with per-shape latency summaries and adds a small dynamic-shape ONNX model generator for manual testing.
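The per-shape-group latency bookkeeping described in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `Sample` and `MeanLatencyPerGroup` are hypothetical names, and the PR's actual `RunTiming` fields are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one timed iteration: which shape group ran and how long it took.
struct Sample {
  std::size_t shape_group;  // index of the shape group used for this iteration
  double seconds;           // wall-clock duration of the iteration
};

// Returns the mean latency (in seconds) for each shape group.
// Groups that received no samples report 0.
std::vector<double> MeanLatencyPerGroup(const std::vector<Sample>& samples,
                                        std::size_t num_groups) {
  std::vector<double> sums(num_groups, 0.0);
  std::vector<std::size_t> counts(num_groups, 0);
  for (const Sample& s : samples) {
    assert(s.shape_group < num_groups);  // precondition: index is in range
    sums[s.shape_group] += s.seconds;
    ++counts[s.shape_group];
  }
  for (std::size_t g = 0; g < num_groups; ++g) {
    if (counts[g] > 0) sums[g] /= static_cast<double>(counts[g]);
  }
  return sums;
}
```

Keeping the group index next to each duration means a single pass over the samples is enough to split the totals by shape, whatever order the groups ran in.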
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| onnxruntime/test/testdata/dynamic_shape_add.py | Adds a small ONNX model generator with dynamic dims to exercise --data_shape. |
| onnxruntime/test/perftest/test_session.h | Extends RunTiming to carry the shape-group index used for the iteration. |
| onnxruntime/test/perftest/test_configuration.h | Adds RunConfig::data_shape_groups to hold per-input shape groups. |
| onnxruntime/test/perftest/strings_helper.h | Declares ParseDataShapeGroups helper for parsing --data_shape. |
| onnxruntime/test/perftest/strings_helper.cc | Implements parsing of multi-shape specs into shape groups. |
| onnxruntime/test/perftest/performance_runner.h | Adds per-shape timing storage and plumbing to record per-iteration shape-group timings. |
| onnxruntime/test/perftest/performance_runner.cc | Prints per-shape latency stats and warms up one iteration per shape group; wires multi-shape input generation. |
| onnxruntime/test/perftest/ort_test_session.h | Adds state for round-robin shape selection and multi-shape input population API. |
| onnxruntime/test/perftest/ort_test_session.cc | Implements multi-shape input generation and round-robin selection logic, and returns the shape-group index. |
| onnxruntime/test/perftest/command_args_parser.cc | Adds the --data_shape flag and enforces -I requirement. |
Comments suppressed due to low confidence (1)
onnxruntime/test/perftest/strings_helper.h:11
`strings_helper.h` declares APIs using `std::string` but does not include `<string>`. Relying on transitive includes is fragile and can break builds when include order changes; please add `#include <string>` here (alongside the newly added `<cstdint>`).
#include <cstdint>
#include <string_view>
#include <map>
#include <unordered_map>
#include <unordered_set>
#include <vector>
Comment on lines +165 to +172
for (char c : input) {
  if (c == '[') {
    bracket_depth++;
    current += c;
  } else if (c == ']') {
    bracket_depth--;
    current += c;
  } else if ((c == ' ' || c == '\t') && bracket_depth == 0) {
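The quoted loop is cut off at the whitespace branch, so the rest of the PR's tokenizer is not visible here. As a hedged sketch of the general technique (splitting on whitespace only when outside brackets), one self-contained version might look like this. `SplitOutsideBrackets` is a hypothetical name, and guarding against a stray `]` is an assumption of this sketch, not something the PR is known to do.

```cpp
#include <string>
#include <vector>

// Splits `input` on spaces and tabs that occur outside [...], so that a token
// such as "B:[1, 16,4]" stays in one piece even with whitespace inside brackets.
std::vector<std::string> SplitOutsideBrackets(const std::string& input) {
  std::vector<std::string> tokens;
  std::string current;
  int bracket_depth = 0;
  for (char c : input) {
    if (c == '[') {
      ++bracket_depth;
      current += c;
    } else if (c == ']') {
      // Assumption: tolerate a stray ']' rather than let the depth go negative.
      if (bracket_depth > 0) --bracket_depth;
      current += c;
    } else if ((c == ' ' || c == '\t') && bracket_depth == 0) {
      // Separator outside brackets: close the current token, skip empty ones.
      if (!current.empty()) {
        tokens.push_back(current);
        current.clear();
      }
    } else {
      current += c;
    }
  }
  if (!current.empty()) tokens.push_back(current);
  return tokens;
}
```

Without the depth guard, an unmatched `]` would leave `bracket_depth` negative, and every later space would be treated as part of a token.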
Comment on lines +207 to +211
// warm up - run one iteration per shape group to ensure all shapes are warmed
size_t warmup_count = 1;
if (!performance_test_config_.run_config.data_shape_groups.empty()) {
  warmup_count = performance_test_config_.run_config.data_shape_groups.begin()->second.size();
}
Comment on lines +49 to +52
} else {
  const std::uniform_int_distribution<int>::param_type p(0, static_cast<int>(test_inputs_.size() - 1));
  id = static_cast<size_t>(dist_(rand_engine_, p));
}
Comment on lines +1169 to +1180
auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
std::vector<int64_t> input_node_dim;

// Use user-specified shape if available, otherwise fall back to model metadata
auto it = data_shape_groups.find(input_names_str_[i]);
if (it != data_shape_groups.end()) {
  input_node_dim = it->second[g];
} else {
  input_node_dim = tensor_info.GetShape();
  auto transform_fcn = [](int64_t input) { return (input == -1) ? -input : input; };
  std::transform(input_node_dim.begin(), input_node_dim.end(), input_node_dim.begin(), transform_fcn);
}
Description
Adds a `--data_shape` CLI flag to `onnxruntime_perf_test` that allows benchmarking multiple input shapes within a single session. The tool cycles through shape groups round-robin, collecting and reporting per-shape latency statistics. This is useful for profiling models with dynamic dimensions across representative input sizes without running separate benchmark sessions.
Syntax:
Requires `-I` (generated test input mode).
Example
onnxruntime_perf_test.exe -I -t 5 \
  --data_shape "A:[1,16,4][2,32,4] B:[1,16,4][2,32,4]" \
  dynamic_shape_add.onnx
Output:
The first block is the main stdout report (latencies in ms). The second block is `DumpToFile()` output printed to stdout when no `-o` file is specified (latencies in seconds, matching the CSV file format).
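The syntax block did not survive on this page, so the grammar below is inferred only from the single example spec (`NAME:[d0,d1,...][d0,d1,...]`) and may not match what the PR's `ParseDataShapeGroups` accepts. Under that assumption, parsing one whitespace-separated token into an input name and its per-group shapes could be sketched as follows. `ParseShapeToken` and `Shape` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Shape = std::vector<int64_t>;

// Parses one token such as "A:[1,16,4][2,32,4]" into the input name and one
// shape per shape group. Throws std::invalid_argument on input that does not
// match the assumed NAME:[dims][dims]... grammar.
std::pair<std::string, std::vector<Shape>> ParseShapeToken(const std::string& token) {
  const std::size_t colon = token.find(':');
  if (colon == std::string::npos || colon == 0) {
    throw std::invalid_argument("expected NAME:[dims]...: " + token);
  }
  std::vector<Shape> shapes;
  std::size_t pos = colon + 1;
  while (pos < token.size()) {
    const std::size_t close = token.find(']', pos);
    if (token[pos] != '[' || close == std::string::npos) {
      throw std::invalid_argument("malformed shape list: " + token);
    }
    Shape shape;
    std::size_t start = pos + 1;
    while (start < close) {
      std::size_t comma = token.find(',', start);
      if (comma == std::string::npos || comma > close) comma = close;
      // std::stoll throws std::invalid_argument on an empty or non-numeric field.
      shape.push_back(std::stoll(token.substr(start, comma - start)));
      start = comma + 1;
    }
    shapes.push_back(std::move(shape));
    pos = close + 1;
  }
  if (shapes.empty()) throw std::invalid_argument("no shapes given: " + token);
  return {token.substr(0, colon), std::move(shapes)};
}
```

Applied to the example, the token `A:[1,16,4][2,32,4]` yields the name `A` with two shapes, so shape group 0 uses `[1,16,4]` and shape group 1 uses `[2,32,4]`.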