@@ -4,22 +4,22 @@ This example demonstrates automated Q/DQ (Quantize/Dequantize) node placement op
 
 ## Table of Contents
 
-- [Prerequisites](#prerequisites)
-- [Get the Model](#get-the-model)
-- [Set Fixed Batch Size](#set-fixed-batch-size)
-- [What's in This Directory](#whats-in-this-directory)
-- [Quick Start](#quick-start)
-- [Basic Usage](#basic-usage)
-- [FP8 Quantization](#fp8-quantization)
-- [Faster Exploration](#faster-exploration)
-- [Output Structure](#output-structure)
-- [Region Inspection](#region-inspection)
-- [Using the Optimized Model](#using-the-optimized-model)
-- [Pattern Cache](#pattern-cache)
-- [Optimize from Existing QDQ Model](#optimize-from-existing-qdq-model)
-- [Remote Autotuning with TensorRT](#remote-autotuning-with-tensorrt)
-- [Programmatic API Usage](#programmatic-api-usage)
-- [Documentation](#documentation)
+<div align="center">
+
+| **Section** | **Description** | **Link** | **Docs** |
+| :------------: | :------------: | :------------: | :------------: |
+| Prerequisites | Get the model, set fixed batch size, and directory overview | [Link](#prerequisites) | |
+| Quick Start | Basic usage, FP8 quantization, and faster exploration | [Link](#quick-start) | |
+| Output Structure | Output workspace layout and files | [Link](#output-structure) | |
+| Region Inspection | Debug region discovery and partitioning | [Link](#region-inspection) | |
+| Using the Optimized Model | Deploy with TensorRT | [Link](#using-the-optimized-model) | |
+| Pattern Cache | Reuse learned patterns on similar models | [Link](#pattern-cache) | |
+| Optimize from Existing QDQ Model | Start from an existing quantized model | [Link](#optimize-from-existing-qdq-model) | |
+| Remote Autotuning with TensorRT | Offload autotuning to remote hardware | [Link](#remote-autotuning-with-tensorrt) | |
+| Programmatic API Usage | Python API and low-level control | [Link](#programmatic-api-usage) | |
+| Documentation | User guide and API reference | [Link](#documentation) | [docs](https://nvidia.github.io/Model-Optimizer/) |
+
+</div>
 
 ## Prerequisites
 
@@ -34,23 +34,16 @@ curl -L -o resnet50_Opset17.onnx https://github.com/onnx/models/raw/main/Compute
 
 ### Set Fixed Batch Size
 
-The downloaded model has a dynamic batch size. For best performance with TensorRT benchmarking, set a fixed batch size:
+The downloaded model has a dynamic batch size. For best performance with TensorRT benchmarking, set a fixed batch size using Polygraphy:
 
 ```bash
-# Set batch size to 128 using the provided script
-python3 set_batch_size.py resnet50_Opset17.onnx --batch-size 128 --output resnet50.bs128.onnx
-
-# Or for other batch sizes
-python3 set_batch_size.py resnet50_Opset17.onnx --batch-size 1 --output resnet50.bs1.onnx
+polygraphy surgeon sanitize --override-input-shapes x:[128,3,1024,1024] -o resnet50_Opset17_bs128.onnx resnet50_Opset17.onnx
 ```
 
-This creates `resnet50.bs128.onnx` with a fixed batch size of 128, which is optimal for TensorRT performance benchmarking.
-
-**Note:** The script requires the `onnx` package.
+For other batch sizes, change the first dimension in the shape (e.g. `x:[1,3,1024,1024]` for batch size 1).
 
 ### What's in This Directory
 
-- `set_batch_size.py` - Script to convert dynamic batch size models to fixed batch size
 - `README.md` - This guide
 
 **Note:** ONNX model files are not included in the repository (excluded via `.gitignore`). Download and prepare them using the instructions above.
@@ -64,7 +57,7 @@ Optimize the ResNet50 model with INT8 quantization:
 ```bash
 # Using the fixed batch size model
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_results \
     --quant_type int8 \
     --schemes_per_region 30
@@ -92,7 +85,7 @@ For FP8 quantization:
 
 ```bash
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_fp8_results \
     --quant_type fp8 \
     --schemes_per_region 50
@@ -104,7 +97,7 @@ For quick experiments, reduce the number of schemes:
 
 ```bash
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_quick \
     --schemes_per_region 15
 ```
@@ -133,16 +126,16 @@ To debug how the autotuner discovers and partitions regions in your model, use t
 
 ```bash
 # Basic inspection (regions with quantizable ops only)
-python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50.bs128.onnx
+python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50_Opset17_bs128.onnx
 
 # Verbose mode for detailed debug logging
-python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50.bs128.onnx --verbose
+python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50_Opset17_bs128.onnx --verbose
 
 # Custom maximum sequence region size
-python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50.bs128.onnx --max-sequence-size 20
+python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50_Opset17_bs128.onnx --max-sequence-size 20
 
 # Include all regions (including those without Conv/MatMul etc.)
-python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50.bs128.onnx --include-all-regions
+python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50_Opset17_bs128.onnx --include-all-regions
 ```
 
 Short options: `-m` for `--model`, `-v` for `--verbose`. Use this to verify region boundaries and counts before or during autotuning.
@@ -164,16 +157,16 @@ Reuse learned patterns on similar models (warm-start):
 ```bash
 # First optimization on ResNet50
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_run
 
 # Download and prepare ResNet101 (or any similar model)
-curl -L -o resnet101_Opset17.onnx https://github.com/onnx/models/blob/main/Computer_Vision/resnet101_Opset17_torch_hub/resnet101_Opset17.onnx
-python3 set_batch_size.py resnet101_Opset17.onnx --batch-size 128 --output resnet101.bs128.onnx
+curl -L -o resnet101_Opset17.onnx https://github.com/onnx/models/raw/main/Computer_Vision/resnet101_Opset17_torch_hub/resnet101_Opset17.onnx
+polygraphy surgeon sanitize --override-input-shapes x:[128,3,1024,1024] -o resnet101_Opset17_bs128.onnx resnet101_Opset17.onnx
 
 # Reuse patterns from ResNet50 on ResNet101
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet101.bs128.onnx \
+    --onnx_path resnet101_Opset17_bs128.onnx \
     --output_dir ./resnet101_run \
     --pattern_cache ./resnet50_run/autotuner_state_pattern_cache.yaml
 ```
@@ -185,7 +178,7 @@ If you already have a quantized model, you can use it as a starting point to
 ```bash
 # Use an existing QDQ model as baseline (imports quantization patterns)
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_improved \
     --qdq_baseline resnet50_quantized.onnx \
     --schemes_per_region 40
@@ -216,7 +209,7 @@ from modelopt.onnx.quantization import quantize
 # Create dummy calibration data (replace with real data for production)
 dummy_input = np.random.randn(128, 3, 224, 224).astype(np.float32)
 quantize(
-    'resnet50.bs128.onnx',
+    'resnet50_Opset17_bs128.onnx',
     calibration_data=dummy_input,
     calibration_method='entropy',
     output_path='resnet50_quantized.onnx'
@@ -226,7 +219,7 @@ quantize(
 # Step 2: Use the quantized baseline for autotuning
 # The autotuner will try to find better Q/DQ placements than the initial quantization
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_autotuned \
     --qdq_baseline resnet50_quantized.onnx \
     --schemes_per_region 50
@@ -242,7 +235,7 @@ To use remote autotuning during Q/DQ placement optimization, run with `trtexec`
 
 ```bash
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_remote_autotuned \
     --schemes_per_region 50 \
     --use_trtexec \
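The batch-size step in this diff relies on Polygraphy's `name:[d0,d1,...]` shape-spec syntax, where the first dimension is the batch. As a rough illustration of the substitution the note describes ("change the first dimension in the shape"), here is a small hypothetical helper; it is not part of Polygraphy or Model Optimizer, just a sketch of the string manipulation involved:

```python
def with_batch_size(shape_spec: str, batch: int) -> str:
    """Rewrite the batch (first) dimension of a Polygraphy-style
    "name:[d0,d1,...]" shape spec, e.g. the argument passed to
    `polygraphy surgeon sanitize --override-input-shapes`."""
    name, dims = shape_spec.split(":", 1)
    values = dims.strip("[]").split(",")
    values[0] = str(batch)  # first dimension is the batch dimension
    return f"{name}:[{','.join(values)}]"

print(with_batch_size("x:[128,3,1024,1024]", 1))  # x:[1,3,1024,1024]
```

The same helper could generate specs for a sweep of batch sizes when benchmarking several fixed-batch variants of the model.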