Commit ee48afc

resolve comments
Signed-off-by: Will Guo <willg@nvidia.com>
1 parent cda5eb9 commit ee48afc

File tree

2 files changed: +34 −183 lines changed

examples/onnx/autoqdq/README.md

Lines changed: 34 additions & 41 deletions
@@ -4,22 +4,22 @@ This example demonstrates automated Q/DQ (Quantize/Dequantize) node placement op
 
 ## Table of Contents
 
-- [Prerequisites](#prerequisites)
-- [Get the Model](#get-the-model)
-- [Set Fixed Batch Size](#set-fixed-batch-size)
-- [What's in This Directory](#whats-in-this-directory)
-- [Quick Start](#quick-start)
-- [Basic Usage](#basic-usage)
-- [FP8 Quantization](#fp8-quantization)
-- [Faster Exploration](#faster-exploration)
-- [Output Structure](#output-structure)
-- [Region Inspection](#region-inspection)
-- [Using the Optimized Model](#using-the-optimized-model)
-- [Pattern Cache](#pattern-cache)
-- [Optimize from Existing QDQ Model](#optimize-from-existing-qdq-model)
-- [Remote Autotuning with TensorRT](#remote-autotuning-with-tensorrt)
-- [Programmatic API Usage](#programmatic-api-usage)
-- [Documentation](#documentation)
+<div align="center">
+
+| **Section** | **Description** | **Link** | **Docs** |
+| :------------: | :------------: | :------------: | :------------: |
+| Prerequisites | Get the model, set fixed batch size, and directory overview | [Link](#prerequisites) | |
+| Quick Start | Basic usage, FP8 quantization, and faster exploration | [Link](#quick-start) | |
+| Output Structure | Output workspace layout and files | [Link](#output-structure) | |
+| Region Inspection | Debug region discovery and partitioning | [Link](#region-inspection) | |
+| Using the Optimized Model | Deploy with TensorRT | [Link](#using-the-optimized-model) | |
+| Pattern Cache | Reuse learned patterns on similar models | [Link](#pattern-cache) | |
+| Optimize from Existing QDQ Model | Start from an existing quantized model | [Link](#optimize-from-existing-qdq-model) | |
+| Remote Autotuning with TensorRT | Offload autotuning to remote hardware | [Link](#remote-autotuning-with-tensorrt) | |
+| Programmatic API Usage | Python API and low-level control | [Link](#programmatic-api-usage) | |
+| Documentation | User guide and API reference | [Link](#documentation) | [docs](https://nvidia.github.io/Model-Optimizer/) |
+
+</div>
 
 ## Prerequisites
 
@@ -34,23 +34,16 @@ curl -L -o resnet50_Opset17.onnx https://github.com/onnx/models/raw/main/Compute
 
 ### Set Fixed Batch Size
 
-The downloaded model has a dynamic batch size. For best performance with TensorRT benchmarking, set a fixed batch size:
+The downloaded model has a dynamic batch size. For best performance with TensorRT benchmarking, set a fixed batch size using Polygraphy:
 
 ```bash
-# Set batch size to 128 using the provided script
-python3 set_batch_size.py resnet50_Opset17.onnx --batch-size 128 --output resnet50.bs128.onnx
-
-# Or for other batch sizes
-python3 set_batch_size.py resnet50_Opset17.onnx --batch-size 1 --output resnet50.bs1.onnx
+polygraphy surgeon sanitize --override-input-shapes x:[128,3,1024,1024] -o resnet50_Opset17_bs128.onnx resnet50_Opset17.onnx
 ```
 
-This creates `resnet50.bs128.onnx` with a fixed batch size of 128, which is optimal for TensorRT performance benchmarking.
-
-**Note:** The script requires the `onnx` package.
+For other batch sizes, change the first dimension in the shape (e.g. `x:[1,3,1024,1024]` for batch size 1).
 
 ### What's in This Directory
 
-- `set_batch_size.py` - Script to convert dynamic batch size models to fixed batch size
 - `README.md` - This guide
 
 **Note:** ONNX model files are not included in the repository (excluded via `.gitignore`). Download and prepare them using the instructions above.
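As a mental model for what the Polygraphy command does: overriding an input shape simply pins the symbolic batch dimension to a concrete value. A minimal stdlib-only sketch (the `override_batch_dim` helper is hypothetical, not part of Model Optimizer or Polygraphy):

```python
def override_batch_dim(shape, batch_size):
    """Replace a dynamic leading dimension (e.g. 'N' or None) with a fixed batch size."""
    if not shape:
        raise ValueError("expected a non-empty shape")
    fixed = list(shape)
    fixed[0] = batch_size  # the batch dimension is assumed to come first (NCHW)
    return fixed

# Mirrors `--override-input-shapes x:[128,3,1024,1024]` for an input shaped [N,3,1024,1024]
print(override_batch_dim(["N", 3, 1024, 1024], 128))  # [128, 3, 1024, 1024]
```

Polygraphy does the same substitution directly on the ONNX graph's input tensor shapes, then re-runs shape inference.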
@@ -64,7 +57,7 @@ Optimize the ResNet50 model with INT8 quantization:
 ```bash
 # Using the fixed batch size model
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_results \
     --quant_type int8 \
     --schemes_per_region 30
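As a rough mental model (a deliberate simplification, not the actual autotuner), `--schemes_per_region` bounds how many candidate Q/DQ placement schemes are benchmarked per region before the fastest is kept; every name below is illustrative:

```python
import random

def autotune_region(benchmark, schemes, schemes_per_region, seed=0):
    """Benchmark up to `schemes_per_region` candidate schemes; keep the fastest."""
    rng = random.Random(seed)
    candidates = rng.sample(schemes, min(schemes_per_region, len(schemes)))
    return min(candidates, key=benchmark)

# Toy latency table: more Q/DQ is not monotonically better, which is why search helps.
latency = {"no_qdq": 1.0, "qdq_inputs": 0.7, "qdq_all": 0.85}
best = autotune_region(latency.get, list(latency), schemes_per_region=3)
print(best)  # qdq_inputs
```

A larger budget explores more placements per region at the cost of more benchmark builds.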
@@ -92,7 +85,7 @@ For FP8 quantization:
 
 ```bash
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
    --output_dir ./resnet50_fp8_results \
     --quant_type fp8 \
     --schemes_per_region 50
@@ -104,7 +97,7 @@ For quick experiments, reduce the number of schemes:
 
 ```bash
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_quick \
     --schemes_per_region 15
 ```
@@ -133,16 +126,16 @@ To debug how the autotuner discovers and partitions regions in your model, use t
 
 ```bash
 # Basic inspection (regions with quantizable ops only)
-python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50.bs128.onnx
+python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50_Opset17_bs128.onnx
 
 # Verbose mode for detailed debug logging
-python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50.bs128.onnx --verbose
+python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50_Opset17_bs128.onnx --verbose
 
 # Custom maximum sequence region size
-python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50.bs128.onnx --max-sequence-size 20
+python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50_Opset17_bs128.onnx --max-sequence-size 20
 
 # Include all regions (including those without Conv/MatMul etc.)
-python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50.bs128.onnx --include-all-regions
+python3 -m modelopt.onnx.quantization.autotune.region_inspect --model resnet50_Opset17_bs128.onnx --include-all-regions
 ```
 
 Short option: `-m` for `--model`, `-v` for `--verbose`. Use this to verify region boundaries and counts before or during autotuning.
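Region inspection is easiest to reason about with a toy: split a linear op sequence into bounded regions, then keep only those containing quantizable ops (what the default mode reports, versus `--include-all-regions`). This is an illustrative sketch, not the real partitioning algorithm:

```python
QUANTIZABLE = {"Conv", "MatMul", "Gemm"}  # illustrative subset of quantizable op types

def partition_regions(ops, max_sequence_size):
    """Greedily split a linear op sequence into regions of bounded size."""
    return [ops[i:i + max_sequence_size] for i in range(0, len(ops), max_sequence_size)]

def quantizable_regions(regions):
    """Keep regions with at least one quantizable op (default inspect mode)."""
    return [r for r in regions if QUANTIZABLE & set(r)]

ops = ["Conv", "Relu", "MaxPool", "Conv", "Relu", "Add", "GlobalAveragePool", "Flatten", "Gemm"]
regions = partition_regions(ops, max_sequence_size=4)
print(len(regions), len(quantizable_regions(regions)))  # 3 2
```

The `--max-sequence-size` flag plays the role of `max_sequence_size` here: smaller values yield more, finer-grained regions.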
@@ -164,16 +157,16 @@ Reuse learned patterns on similar models (warm-start):
 ```bash
 # First optimization on ResNet50
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_run
 
 # Download and prepare ResNet101 (or any similar model)
-curl -L -o resnet101_Opset17.onnx https://github.com/onnx/models/blob/main/Computer_Vision/resnet101_Opset17_torch_hub/resnet101_Opset17.onnx
-python3 set_batch_size.py resnet101_Opset17.onnx --batch-size 128 --output resnet101.bs128.onnx
+curl -L -o resnet101_Opset17.onnx https://github.com/onnx/models/raw/main/Computer_Vision/resnet101_Opset17_torch_hub/resnet101_Opset17.onnx
+polygraphy surgeon sanitize --override-input-shapes x:[128,3,1024,1024] -o resnet101_Opset17_bs128.onnx resnet101_Opset17.onnx
 
 # Reuse patterns from ResNet50 on ResNet101
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet101.bs128.onnx \
+    --onnx_path resnet101_Opset17_bs128.onnx \
     --output_dir ./resnet101_run \
     --pattern_cache ./resnet50_run/autotuner_state_pattern_cache.yaml
 ```
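The pattern cache is essentially a warm-start: region patterns tuned in a previous run map to their chosen scheme, so a similar model skips the search for matching regions. A dictionary-level sketch (the real cache is the YAML file above, with presumably richer keys; these helpers are hypothetical):

```python
def region_signature(region_ops):
    """Pattern key: the op-type sequence of a region (a simplification)."""
    return tuple(region_ops)

def warm_start(regions, cache, search):
    """Reuse a cached scheme when the pattern was seen before, else run the search."""
    chosen = {}
    for region in regions:
        key = region_signature(region)
        if key not in cache:
            cache[key] = search(region)  # cold path: expensive benchmarking
        chosen[key] = cache[key]         # warm path: free lookup
    return chosen

cache = {}
warm_start([["Conv", "Relu"]], cache, search=lambda r: "qdq_inputs")       # first model: searches
result = warm_start([["Conv", "Relu"]], cache, search=lambda r: "unused")  # similar model: cache hit
print(result)  # {('Conv', 'Relu'): 'qdq_inputs'}
```

This is why warm-starting works best across architecturally similar models such as ResNet50 and ResNet101: they share many region patterns.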
@@ -185,7 +178,7 @@ If you already have a quantized model, you can use it as a starting point to
 ```bash
 # Use an existing QDQ model as baseline (imports quantization patterns)
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_improved \
     --qdq_baseline resnet50_quantized.onnx \
     --schemes_per_region 40
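`--qdq_baseline` imports the Q/DQ placements already present in the quantized model as a starting point. In spirit, that import is a scan for tensors that already feed a `QuantizeLinear` node; a toy sketch over a hypothetical flattened node list (not the real importer):

```python
def baseline_quantized_tensors(nodes):
    """Collect tensors that already feed a QuantizeLinear node."""
    return {node["input"] for node in nodes if node["op"] == "QuantizeLinear"}

# Hypothetical flattened view of a QDQ model's nodes
nodes = [
    {"op": "QuantizeLinear", "input": "conv1_out"},
    {"op": "Relu", "input": "conv1_out"},
    {"op": "QuantizeLinear", "input": "fc_in"},
]
print(sorted(baseline_quantized_tensors(nodes)))  # ['conv1_out', 'fc_in']
```

The autotuner then treats these placements as the baseline scheme and tries to beat its measured latency.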
@@ -216,7 +209,7 @@ from modelopt.onnx.quantization import quantize
 # Create dummy calibration data (replace with real data for production)
 dummy_input = np.random.randn(128, 3, 224, 224).astype(np.float32)
 quantize(
-    'resnet50.bs128.onnx',
+    'resnet50_Opset17_bs128.onnx',
     calibration_data=dummy_input,
     calibration_method='entropy',
     output_path='resnet50_quantized.onnx'
@@ -226,7 +219,7 @@ quantize(
 # Step 2: Use the quantized baseline for autotuning
 # The autotuner will try to find better Q/DQ placements than the initial quantization
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_autotuned \
     --qdq_baseline resnet50_quantized.onnx \
     --schemes_per_region 50
@@ -242,7 +235,7 @@ To use remote autotuning during Q/DQ placement optimization, run with `trtexec`
 
 ```bash
 python3 -m modelopt.onnx.quantization.autotune \
-    --onnx_path resnet50.bs128.onnx \
+    --onnx_path resnet50_Opset17_bs128.onnx \
     --output_dir ./resnet50_remote_autotuned \
     --schemes_per_region 50 \
     --use_trtexec \
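Remote autotuning ultimately reduces to running `trtexec` on the target hardware and reading a latency back. A toy parser for a trtexec-style latency line (the exact log line format here is assumed for illustration, not guaranteed):

```python
def parse_mean_latency_ms(log):
    """Return the mean latency (ms) from a trtexec-style log, or None if absent."""
    for line in log.splitlines():
        if "mean" in line.lower() and "ms" in line:
            for token in line.replace(",", " ").split():
                if token.startswith("mean="):
                    return float(token[len("mean="):])
    return None

# Hypothetical log line in the assumed `mean=<value>` form
log = "GPU Compute Time: min=1.10 ms, mean=1.23 ms, max=1.40 ms"
print(parse_mean_latency_ms(log))  # 1.23
```

The autotuner compares such measurements across candidate schemes; only the ONNX file and the resulting numbers need to cross the wire.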

examples/onnx/autoqdq/set_batch_size.py

Lines changed: 0 additions & 142 deletions
This file was deleted.
