
Commit dec2952

[6034518] Downgrade TRT support for remote autotuning in Autotune from 10.16 to 10.15 (#1259)
### What does this PR do?

**Type of change:** Bug fix

Remote autotuning is supported in TensorRT from version 10.15, but fails with Autotune as it's checking for 10.16+. This PR fixes that check and updates documentation accordingly.

### Usage

```python
# Add a code snippet demonstrating how to use this
```

### Testing

See bug 6034518.

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: N/A
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅

## Summary by CodeRabbit

* **Documentation**
  * Added a Remote Autotuning guide for TensorRT 10.15+ with CLI examples; updated examples to require `--safe --skipInference`.
* **Updates**
  * Lowered TensorRT minimum requirement for remote autotuning from 10.16 to 10.15.
  * Clarified CLI help text for trtexec/autotune arguments.
* **Bug Fixes**
  * trtexec-based autotuning now verifies the trtexec executable version when checking compatibility.

---------

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Co-authored-by: dmoodie <dmoodie@nvidia.com>
1 parent 7c85571 commit dec2952

File tree

6 files changed: +109, -11 lines


CHANGELOG.rst

Lines changed: 1 addition & 0 deletions

```diff
@@ -22,6 +22,7 @@ Changelog
 **Bug Fixes**
 
 - Fix Minitron pruning (``mcore_minitron``) for MoE models. Importance estimation hooks were incorrectly registered for MoE modules and NAS step was hanging before this.
+- Fix TRT support for remote autotuning in ONNX Autotune from 10.16+ to 10.15+ and fix TRT versioning check to the ``trtexec`` version instead of the TRT Python API when using ``trtexec`` backend.
 
 **Misc**
```

docs/source/guides/9_autotune.rst

Lines changed: 25 additions & 0 deletions

```diff
@@ -221,6 +221,31 @@ If the model uses custom TensorRT operations, provide the plugin libraries:
         --output_dir ./results \
         --plugin_libraries /path/to/plugin1.so /path/to/plugin2.so
 
+Remote Autotuning
+-----------------------
+
+TensorRT 10.15+ supports remote autotuning in safety mode (``--safe``), which allows TensorRT's optimization process to be offloaded to a remote hardware. This is useful when optimizing models for different target GPUs without having direct access to them.
+
+To use remote autotuning during Q/DQ placement optimization, run with ``trtexec`` and pass extra args:
+
+.. code-block:: bash
+
+    python -m modelopt.onnx.quantization.autotune \
+        --onnx_path model.onnx \
+        --output_dir ./model_remote_autotuned \
+        --schemes_per_region 50 \
+        --use_trtexec \
+        --trtexec_benchmark_args "--remoteAutoTuningConfig=\"<remote autotuning config>\" --safe --skipInference"
+
+**Requirements:**
+
+* TensorRT 10.15 or later
+* Valid remote autotuning configuration
+* ``--use_trtexec`` must be set (benchmarking uses ``trtexec`` instead of the TensorRT Python API)
+* ``--safe --skipInference`` must be enabled via ``--trtexec_benchmark_args``
+
+Replace ``<remote autotuning config>`` with an actual remote autotuning configuration string (see ``trtexec --help`` for more details). Other TensorRT benchmark options (e.g. ``--timing_cache``, ``--warmup_runs``, ``--timing_runs``, ``--plugin_libraries``) are also available; run ``--help`` for details.
+
 Low-Level API Usage
 ===================
```

examples/onnx_ptq/autotune/README.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -229,7 +229,7 @@ python3 -m modelopt.onnx.quantization.autotune \
 
 ## Remote Autotuning with TensorRT
 
-TensorRT 10.16+ supports remote autotuning in safety mode (`--safe`), which allows TensorRT's optimization process to be offloaded to a remote hardware. This is useful when optimizing models for different target GPUs without having direct access to them.
+TensorRT 10.15+ supports remote autotuning in safety mode (`--safe`), which allows TensorRT's optimization process to be offloaded to a remote hardware. This is useful when optimizing models for different target GPUs without having direct access to them.
 
 To use remote autotuning during Q/DQ placement optimization, run with `trtexec` and pass extra args:
 
@@ -239,15 +239,15 @@ python3 -m modelopt.onnx.quantization.autotune \
     --output_dir ./resnet50_remote_autotuned \
     --schemes_per_region 50 \
     --use_trtexec \
-    --trtexec_benchmark_args "--remoteAutoTuningConfig=\"<remote autotuning config>\" --safe"
+    --trtexec_benchmark_args "--remoteAutoTuningConfig=\"<remote autotuning config>\" --safe --skipInference"
 ```
 
 **Requirements:**
 
-- TensorRT 10.16 or later
+- TensorRT 10.15 or later
 - Valid remote autotuning configuration
 - `--use_trtexec` must be set (benchmarking uses `trtexec` instead of the TensorRT Python API)
-- `--safe` must be enabled via `--trtexec_benchmark_args`
+- `--safe --skipInference` must be enabled via `--trtexec_benchmark_args`
 
 Replace `<remote autotuning config>` with an actual remote autotuning configuration string (see `trtexec --help` for more details).
 Other TensorRT benchmark options (e.g. `--timing_cache`, `--warmup_runs`, `--timing_runs`, `--plugin_libraries`) are also available; run `--help` for details.
````

modelopt/onnx/quantization/__main__.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -398,8 +398,8 @@ def get_parser() -> argparse.ArgumentParser:
         type=str,
         default=None,
         help=(
-            "Additional trtexec arguments as a single quoted string. "
-            "Example: --autotune_trtexec_args '--fp16 --workspace=4096'"
+            "Additional 'trtexec' arguments as a single quoted string. Only relevant when '--autotune_use_trtexec' is "
+            "set. Example: '--fp16 --workspace=4096'"
         ),
     )
     return argparser
```
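The help text above says the extra `trtexec` arguments travel as one quoted string. As an illustration only (this snippet is not from the PR, and `shlex` is an assumption about how such a string could be tokenized), Python's standard `shlex` splits it into argv-style tokens the way a POSIX shell would:

```python
import shlex

# A single quoted string of extra trtexec arguments, as the CLI help suggests.
raw_args = "--fp16 --workspace=4096"

# shlex.split tokenizes shell-style, preserving embedded quoting.
tokens = shlex.split(raw_args)
print(tokens)  # ['--fp16', '--workspace=4096']
```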

modelopt/onnx/quantization/autotune/benchmark.py

Lines changed: 9 additions & 4 deletions

```diff
@@ -42,7 +42,7 @@
 import torch
 
 from modelopt.onnx.logging_config import logger
-from modelopt.onnx.quantization.ort_utils import _check_for_tensorrt
+from modelopt.onnx.quantization.ort_utils import _check_for_trtexec
 
 TRT_AVAILABLE = importlib.util.find_spec("tensorrt") is not None
 if TRT_AVAILABLE:
@@ -208,17 +208,22 @@ def __init__(
 
         if has_remote_config:
             try:
-                _check_for_tensorrt(min_version="10.16")
-                self.logger.debug("TensorRT Python API version >= 10.16 detected")
+                _check_for_trtexec(min_version="10.15")
+                self.logger.debug("TensorRT Python API version >= 10.15 detected")
                 if "--safe" not in trtexec_args:
                     self.logger.warning(
                         "Remote autotuning requires '--safe' to be set. Adding it to trtexec arguments."
                     )
                     self.trtexec_args.append("--safe")
+                if "--skipInference" not in trtexec_args:
+                    self.logger.warning(
+                        "Remote autotuning requires '--skipInference' to be set. Adding it to trtexec arguments."
+                    )
+                    self.trtexec_args.append("--skipInference")
                 return
             except ImportError:
                 self.logger.warning(
-                    "Remote autotuning is not supported with TensorRT version < 10.16. "
+                    "Remote autotuning is not supported with TensorRT version < 10.15. "
                     "Removing --remoteAutoTuningConfig from trtexec arguments"
                 )
                 trtexec_args = [
```
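The hunk above checks for `--safe` and `--skipInference` separately, warning before appending each missing flag. A minimal standalone sketch of that flag-enforcement behavior (the helper name `ensure_remote_flags` is hypothetical, not part of the module):

```python
def ensure_remote_flags(trtexec_args: list[str]) -> list[str]:
    """Append --safe and --skipInference if missing, mirroring the checks above."""
    args = list(trtexec_args)
    for flag in ("--safe", "--skipInference"):
        if flag not in args:
            # The real code logs a warning before appending each missing flag.
            args.append(flag)
    return args

print(ensure_remote_flags(["--remoteAutoTuningConfig=cfg"]))
# ['--remoteAutoTuningConfig=cfg', '--safe', '--skipInference']
```

Already-present flags are left untouched, so user-supplied argument order is preserved.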

modelopt/onnx/quantization/ort_utils.py

Lines changed: 68 additions & 1 deletion

```diff
@@ -19,14 +19,17 @@
 import io
 import os
 import platform
+import re
+import shutil
+import subprocess  # nosec B404
 import sys
 from collections.abc import Sequence
 from contextlib import redirect_stderr, redirect_stdout
 
 import onnxruntime as ort
 from onnxruntime.quantization.operators.qdq_base_operator import QDQOperatorBase
 from onnxruntime.quantization.registry import QDQRegistry, QLinearOpsRegistry
-from packaging.version import Version
+from packaging.version import InvalidVersion, Version
 
 from modelopt.onnx.logging_config import logger
 from modelopt.onnx.quantization.operators import QDQConvTranspose, QDQCustomOp, QDQNormalization
@@ -41,6 +44,70 @@ def _check_lib_in_ld_library_path(ld_library_path, lib_pattern):
     return False, None
 
 
+def _check_for_trtexec(min_version: str = "10.0") -> str:
+    """Check if the `trtexec` CLI tool is available in PATH and is >= min_version.
+
+    Args:
+        min_version (str): Minimum required version (e.g., "10.0")
+
+    Returns:
+        str: The resolved path to the `trtexec` binary.
+
+    Raises:
+        ImportError: If `trtexec` is not found or the version is too low.
+    """
+
+    def _parse_version_from_string(version_str: str) -> str | None:
+        # Try canonical TensorRT x.x.x.x strings first
+        match = re.search(
+            r"TensorRT(?:\s+version)?\s*[:=]\s*(\d+(?:\.\d+)+)",
+            version_str,
+            flags=re.IGNORECASE,
+        )
+        if match:
+            return match.group(1)
+
+        # Fallback: look for "[TensorRT v101502]" pattern and convert to "10.15"
+        match = re.search(r"\[TensorRT v(\d{6,8})\]", version_str)
+        if match:
+            vnum = match.group(1)
+            # Use only major and minor, e.g., v101502 -> 10.15
+            if len(vnum) >= 4:
+                major = int(vnum[0:2])
+                minor = int(vnum[2:4])
+                return f"{major}.{minor}"
+            return None
+        return None
+
+    trtexec_path = shutil.which("trtexec")
+    if trtexec_path is None:
+        logger.error("trtexec executable not found in PATH.")
+        raise ImportError(
+            "Could not find the `trtexec` executable. Please install TensorRT and ensure `trtexec` is in your PATH."
+        )
+
+    try:
+        result = subprocess.run([trtexec_path], capture_output=True, text=True, timeout=5)  # nosec B603
+        banner_output = result.stdout + result.stderr
+        parsed_version = _parse_version_from_string(banner_output)
+
+        if not parsed_version:
+            raise ValueError("Could not parse version from trtexec output.")
+
+        if Version(parsed_version) < Version(min_version):
+            logger.error(
+                f"trtexec version found ({parsed_version}) is lower than required ({min_version})"
+            )
+            raise ImportError(f"`trtexec` version must be >= {min_version}, found {parsed_version}")
+        logger.info(f"trtexec found at {trtexec_path} (version {parsed_version})")
+        return trtexec_path
+    except (subprocess.SubprocessError, FileNotFoundError, ValueError, InvalidVersion) as err:
+        logger.error(f"Failed to check trtexec version: {err}")
+        raise ImportError(
+            "Could not determine the version of `trtexec`. Please ensure the CLI is installed and available."
+        )
+
+
 def _check_for_tensorrt(min_version: str = "10.0"):
     """Check if the `tensorrt` python package is installed and that it's >= min_version."""
     try:
```
