
Commit dec2952

[6034518] Downgrade TRT support for remote autotuning in Autotune from 10.16 to 10.15 (#1259)
### What does this PR do?

**Type of change:** Bug fix

Remote autotuning is supported in TensorRT from version 10.15, but fails with Autotune as it's checking for 10.16+. This PR fixes that check and updates documentation accordingly.

### Usage

```python
# Add a code snippet demonstrating how to use this
```

### Testing

See bug 6034518.

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: N/A
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅

## Summary by CodeRabbit

* **Documentation**
  * Added a Remote Autotuning guide for TensorRT 10.15+ with CLI examples; updated examples to require `--safe --skipInference`.
* **Updates**
  * Lowered TensorRT minimum requirement for remote autotuning from 10.16 to 10.15.
  * Clarified CLI help text for trtexec/autotune arguments.
* **Bug Fixes**
  * trtexec-based autotuning now verifies the trtexec executable version when checking compatibility.

---------

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Co-authored-by: dmoodie <dmoodie@nvidia.com>
1 parent 7c85571 commit dec2952

File tree

6 files changed: +109, -11 lines


CHANGELOG.rst

Lines changed: 1 addition & 0 deletions

```diff
@@ -22,6 +22,7 @@ Changelog
 **Bug Fixes**
 
 - Fix Minitron pruning (``mcore_minitron``) for MoE models. Importance estimation hooks were incorrectly registered for MoE modules and NAS step was hanging before this.
+- Fix TRT support for remote autotuning in ONNX Autotune from 10.16+ to 10.15+ and fix TRT versioning check to the ``trtexec`` version instead of the TRT Python API when using ``trtexec`` backend.
 
 **Misc**
```

docs/source/guides/9_autotune.rst

Lines changed: 25 additions & 0 deletions

```diff
@@ -221,6 +221,31 @@ If the model uses custom TensorRT operations, provide the plugin libraries:
         --output_dir ./results \
         --plugin_libraries /path/to/plugin1.so /path/to/plugin2.so
 
+Remote Autotuning
+-----------------------
+
+TensorRT 10.15+ supports remote autotuning in safety mode (``--safe``), which allows TensorRT's optimization process to be offloaded to a remote hardware. This is useful when optimizing models for different target GPUs without having direct access to them.
+
+To use remote autotuning during Q/DQ placement optimization, run with ``trtexec`` and pass extra args:
+
+.. code-block:: bash
+
+    python -m modelopt.onnx.quantization.autotune \
+        --onnx_path model.onnx \
+        --output_dir ./model_remote_autotuned \
+        --schemes_per_region 50 \
+        --use_trtexec \
+        --trtexec_benchmark_args "--remoteAutoTuningConfig=\"<remote autotuning config>\" --safe --skipInference"
+
+**Requirements:**
+
+* TensorRT 10.15 or later
+* Valid remote autotuning configuration
+* ``--use_trtexec`` must be set (benchmarking uses ``trtexec`` instead of the TensorRT Python API)
+* ``--safe --skipInference`` must be enabled via ``--trtexec_benchmark_args``
+
+Replace ``<remote autotuning config>`` with an actual remote autotuning configuration string (see ``trtexec --help`` for more details). Other TensorRT benchmark options (e.g. ``--timing_cache``, ``--warmup_runs``, ``--timing_runs``, ``--plugin_libraries``) are also available; run ``--help`` for details.
+
 Low-Level API Usage
 ===================
```

examples/onnx_ptq/autotune/README.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -229,7 +229,7 @@ python3 -m modelopt.onnx.quantization.autotune \
 
 ## Remote Autotuning with TensorRT
 
-TensorRT 10.16+ supports remote autotuning in safety mode (`--safe`), which allows TensorRT's optimization process to be offloaded to a remote hardware. This is useful when optimizing models for different target GPUs without having direct access to them.
+TensorRT 10.15+ supports remote autotuning in safety mode (`--safe`), which allows TensorRT's optimization process to be offloaded to a remote hardware. This is useful when optimizing models for different target GPUs without having direct access to them.
 
 To use remote autotuning during Q/DQ placement optimization, run with `trtexec` and pass extra args:
 
@@ -239,15 +239,15 @@ python3 -m modelopt.onnx.quantization.autotune \
     --output_dir ./resnet50_remote_autotuned \
     --schemes_per_region 50 \
     --use_trtexec \
-    --trtexec_benchmark_args "--remoteAutoTuningConfig=\"<remote autotuning config>\" --safe"
+    --trtexec_benchmark_args "--remoteAutoTuningConfig=\"<remote autotuning config>\" --safe --skipInference"
 ```
 
 **Requirements:**
 
-- TensorRT 10.16 or later
+- TensorRT 10.15 or later
 - Valid remote autotuning configuration
 - `--use_trtexec` must be set (benchmarking uses `trtexec` instead of the TensorRT Python API)
-- `--safe` must be enabled via `--trtexec_benchmark_args`
+- `--safe --skipInference` must be enabled via `--trtexec_benchmark_args`
 
 Replace `<remote autotuning config>` with an actual remote autotuning configuration string (see `trtexec --help` for more details).
 Other TensorRT benchmark options (e.g. `--timing_cache`, `--warmup_runs`, `--timing_runs`, `--plugin_libraries`) are also available; run `--help` for details.
````

modelopt/onnx/quantization/__main__.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -398,8 +398,8 @@ def get_parser() -> argparse.ArgumentParser:
         type=str,
         default=None,
         help=(
-            "Additional trtexec arguments as a single quoted string. "
-            "Example: --autotune_trtexec_args '--fp16 --workspace=4096'"
+            "Additional 'trtexec' arguments as a single quoted string. Only relevant when '--autotune_use_trtexec' is "
+            "set. Example: '--fp16 --workspace=4096'"
         ),
     )
     return argparser
```
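The help text above says the extra `trtexec` arguments travel as one quoted string. As an illustration only (this snippet is not from the PR, and `shlex` is an assumption about how such a string could be tokenized), Python's standard `shlex` splits it into argv-style tokens the way a POSIX shell would:

```python
import shlex

# A single quoted string of extra trtexec arguments, as the CLI help suggests.
raw_args = "--fp16 --workspace=4096"

# shlex.split tokenizes shell-style, preserving embedded quoting.
tokens = shlex.split(raw_args)
print(tokens)  # ['--fp16', '--workspace=4096']
```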

modelopt/onnx/quantization/autotune/benchmark.py

Lines changed: 9 additions & 4 deletions

```diff
@@ -42,7 +42,7 @@
 import torch
 
 from modelopt.onnx.logging_config import logger
-from modelopt.onnx.quantization.ort_utils import _check_for_tensorrt
+from modelopt.onnx.quantization.ort_utils import _check_for_trtexec
 
 TRT_AVAILABLE = importlib.util.find_spec("tensorrt") is not None
 if TRT_AVAILABLE:
@@ -208,17 +208,22 @@ def __init__(
 
         if has_remote_config:
             try:
-                _check_for_tensorrt(min_version="10.16")
-                self.logger.debug("TensorRT Python API version >= 10.16 detected")
+                _check_for_trtexec(min_version="10.15")
+                self.logger.debug("TensorRT Python API version >= 10.15 detected")
                 if "--safe" not in trtexec_args:
                     self.logger.warning(
                         "Remote autotuning requires '--safe' to be set. Adding it to trtexec arguments."
                     )
                     self.trtexec_args.append("--safe")
+                if "--skipInference" not in trtexec_args:
+                    self.logger.warning(
+                        "Remote autotuning requires '--skipInference' to be set. Adding it to trtexec arguments."
+                    )
+                    self.trtexec_args.append("--skipInference")
                 return
             except ImportError:
                 self.logger.warning(
-                    "Remote autotuning is not supported with TensorRT version < 10.16. "
+                    "Remote autotuning is not supported with TensorRT version < 10.15. "
                     "Removing --remoteAutoTuningConfig from trtexec arguments"
                 )
                 trtexec_args = [
```
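The hunk above checks for `--safe` and `--skipInference` separately, warning before appending each missing flag. A minimal standalone sketch of that flag-enforcement behavior (the helper name `ensure_remote_flags` is hypothetical, not part of the module):

```python
def ensure_remote_flags(trtexec_args: list[str]) -> list[str]:
    """Append --safe and --skipInference if missing, mirroring the checks above."""
    args = list(trtexec_args)
    for flag in ("--safe", "--skipInference"):
        if flag not in args:
            # The real code logs a warning before appending each missing flag.
            args.append(flag)
    return args

print(ensure_remote_flags(["--remoteAutoTuningConfig=cfg"]))
# ['--remoteAutoTuningConfig=cfg', '--safe', '--skipInference']
```

Already-present flags are left untouched, so user-supplied argument order is preserved.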

modelopt/onnx/quantization/ort_utils.py

Lines changed: 68 additions & 1 deletion

```diff
@@ -19,14 +19,17 @@
 import io
 import os
 import platform
+import re
+import shutil
+import subprocess  # nosec B404
 import sys
 from collections.abc import Sequence
 from contextlib import redirect_stderr, redirect_stdout
 
 import onnxruntime as ort
 from onnxruntime.quantization.operators.qdq_base_operator import QDQOperatorBase
 from onnxruntime.quantization.registry import QDQRegistry, QLinearOpsRegistry
-from packaging.version import Version
+from packaging.version import InvalidVersion, Version
 
 from modelopt.onnx.logging_config import logger
 from modelopt.onnx.quantization.operators import QDQConvTranspose, QDQCustomOp, QDQNormalization
@@ -41,6 +44,70 @@ def _check_lib_in_ld_library_path(ld_library_path, lib_pattern):
     return False, None
 
 
+def _check_for_trtexec(min_version: str = "10.0") -> str:
+    """Check if the `trtexec` CLI tool is available in PATH and is >= min_version.
+
+    Args:
+        min_version (str): Minimum required version (e.g., "10.0")
+
+    Returns:
+        str: The resolved path to the `trtexec` binary.
+
+    Raises:
+        ImportError: If `trtexec` is not found or the version is too low.
+    """
+
+    def _parse_version_from_string(version_str: str) -> str | None:
+        # Try canonical TensorRT x.x.x.x strings first
+        match = re.search(
+            r"TensorRT(?:\s+version)?\s*[:=]\s*(\d+(?:\.\d+)+)",
+            version_str,
+            flags=re.IGNORECASE,
+        )
+        if match:
+            return match.group(1)
+
+        # Fallback: look for "[TensorRT v101502]" pattern and convert to "10.15"
+        match = re.search(r"\[TensorRT v(\d{6,8})\]", version_str)
+        if match:
+            vnum = match.group(1)
+            # Use only major and minor, e.g., v101502 -> 10.15
+            if len(vnum) >= 4:
+                major = int(vnum[0:2])
+                minor = int(vnum[2:4])
+                return f"{major}.{minor}"
+            return None
+        return None
+
+    trtexec_path = shutil.which("trtexec")
+    if trtexec_path is None:
+        logger.error("trtexec executable not found in PATH.")
+        raise ImportError(
+            "Could not find the `trtexec` executable. Please install TensorRT and ensure `trtexec` is in your PATH."
+        )
+
+    try:
+        result = subprocess.run([trtexec_path], capture_output=True, text=True, timeout=5)  # nosec B603
+        banner_output = result.stdout + result.stderr
+        parsed_version = _parse_version_from_string(banner_output)
+
+        if not parsed_version:
+            raise ValueError("Could not parse version from trtexec output.")
+
+        if Version(parsed_version) < Version(min_version):
+            logger.error(
+                f"trtexec version found ({parsed_version}) is lower than required ({min_version})"
+            )
+            raise ImportError(f"`trtexec` version must be >= {min_version}, found {parsed_version}")
+        logger.info(f"trtexec found at {trtexec_path} (version {parsed_version})")
+        return trtexec_path
+    except (subprocess.SubprocessError, FileNotFoundError, ValueError, InvalidVersion) as err:
+        logger.error(f"Failed to check trtexec version: {err}")
+        raise ImportError(
+            "Could not determine the version of `trtexec`. Please ensure the CLI is installed and available."
+        )
+
+
 def _check_for_tensorrt(min_version: str = "10.0"):
     """Check if the `tensorrt` python package is installed and that it's >= min_version."""
     try:
```
