
Commit e589ac8

Integrate Automated QDQ benchmark - part 3.1 (#837)
## What does this PR do?

This PR integrates the benchmark module into the QDQ autotuner. The benchmark module is used to evaluate ONNX model performance.

This PR is 1/3 of #703. Once all of the small PRs are merged, #703 can be closed.

- PR 3.1: #837
- PR 3.2: #838
- PR 3.3: #839

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No, documentation will be added in part 4.
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No, the changelog will be updated when all changes are merged.

## Additional Information

## Summary by CodeRabbit

* **New Features**
  * Added ONNX quantization autotuning capabilities with a consolidated module providing streamlined import paths for core components.
  * Introduced a unified benchmarking framework supporting TensorRT-based model evaluation with both command-line and Python API implementations.
  * Added support for timing cache persistence, custom plugin libraries, shape validation, and dynamic input shape configuration for flexible model testing and optimization.

---------

Signed-off-by: Will Guo <willg@nvidia.com>
1 parent d78797b commit e589ac8

File tree

4 files changed

+1049
-0
lines changed

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Pattern-Based Q/DQ Autotuning for ONNX Models.

This package provides automated optimization of Quantize/Dequantize (Q/DQ) node placement
in ONNX computation graphs to minimize TensorRT inference latency. It uses pattern-based
region analysis to efficiently explore and optimize Q/DQ insertion strategies.
"""

# Core data structures
from .benchmark import TensorRTPyBenchmark, TrtExecBenchmark
from .common import (
    AutotunerError,
    AutotunerNotInitializedError,
    InsertionScheme,
    InvalidSchemeError,
    Region,
    RegionType,
)
from .insertion_points import (
    ChildRegionInputInsertionPoint,
    ChildRegionOutputInsertionPoint,
    NodeInputInsertionPoint,
    ResolvedInsertionPoint,
)
from .region_pattern import RegionPattern
from .region_search import CombinedRegionSearch

__all__ = [
    "AutotunerError",
    "AutotunerNotInitializedError",
    "ChildRegionInputInsertionPoint",
    "ChildRegionOutputInsertionPoint",
    "CombinedRegionSearch",
    "InsertionScheme",
    "InvalidSchemeError",
    "NodeInputInsertionPoint",
    "Region",
    "RegionPattern",
    "RegionType",
    "ResolvedInsertionPoint",
    "TensorRTPyBenchmark",
    "TrtExecBenchmark",
]
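
Because this file only re-exports names, the main thing a reviewer might want to confirm is that `__all__` and the import statements stay in sync. The following is a minimal sketch of such a check, not part of the commit: `check_exports` is a hypothetical helper, and `INIT_SOURCE` is an abbreviated stand-in for the file above rather than the real module. It uses only the standard-library `ast` module, so it does not require TensorRT or the package itself to be installed.

```python
import ast

# Abbreviated stand-in for the package __init__.py (assumption: only a few
# of the real names are kept here to keep the example short).
INIT_SOURCE = '''
from .benchmark import TensorRTPyBenchmark, TrtExecBenchmark
from .common import (
    AutotunerError,
    Region,
    RegionType,
)
from .region_pattern import RegionPattern

__all__ = [
    "AutotunerError",
    "Region",
    "RegionPattern",
    "RegionType",
    "TensorRTPyBenchmark",
    "TrtExecBenchmark",
]
'''


def check_exports(source: str) -> tuple[set[str], set[str]]:
    """Hypothetical helper: compare `from ... import` names with `__all__`.

    Returns (names listed in __all__ but never imported,
             names imported but missing from __all__).
    """
    tree = ast.parse(source)
    imported: set[str] = set()
    exported: set[str] = set()
    for node in tree.body:
        if isinstance(node, ast.ImportFrom):
            # Use the alias if one is given, otherwise the imported name.
            imported.update(alias.asname or alias.name for alias in node.names)
        elif isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "__all__" in targets:
                # __all__ is a literal list of strings, so literal_eval is safe.
                exported.update(ast.literal_eval(node.value))
    return exported - imported, imported - exported


missing, unexported = check_exports(INIT_SOURCE)
print("in __all__ but not imported:", sorted(missing))
print("imported but not in __all__:", sorted(unexported))
```

Parsing the source instead of importing the package keeps the check independent of heavy runtime dependencies; the trade-off is that it only sees top-level `from ... import` statements and a literal `__all__` list, which is all this file contains.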
