1 change: 1 addition & 0 deletions CHANGELOG.rst
@@ -22,6 +22,7 @@ Changelog
**Bug Fixes**

- Fix Minitron pruning (``mcore_minitron``) for MoE models. Importance estimation hooks were incorrectly registered for MoE modules and NAS step was hanging before this.
- Lower the minimum TensorRT version required for remote autotuning in ONNX Autotune from 10.16 to 10.15.

**Misc**

26 changes: 26 additions & 0 deletions docs/source/guides/9_autotune.rst
@@ -221,6 +221,32 @@ If the model uses custom TensorRT operations, provide the plugin libraries:
--output_dir ./results \
--plugin_libraries /path/to/plugin1.so /path/to/plugin2.so

Remote Autotuning
-----------------------

TensorRT 10.15+ supports remote autotuning in safety mode (``--safe``), which allows TensorRT's optimization process to be offloaded to remote hardware. This is useful for optimizing models for different target GPUs without direct access to them.

To use remote autotuning during Q/DQ placement optimization, run with the ``trtexec`` workflow and pass the extra arguments:

.. code-block:: bash

    python -m modelopt.onnx.quantization.autotune \
        --onnx_path resnet50_Opset17_bs128.onnx \
        --output_dir ./resnet50_remote_autotuned \
        --schemes_per_region 50 \
        --use_trtexec \
        --trtexec_benchmark_args "--remoteAutoTuningConfig=\"<remote autotuning config>\" --safe --skipInference"

**Requirements:**

* TensorRT 10.15 or later
* Valid remote autotuning configuration
* ``--use_trtexec`` must be set (benchmarking uses ``trtexec`` instead of the TensorRT Python API)
* ``--safe --skipInference`` must be enabled via ``--trtexec_benchmark_args``

Replace ``<remote autotuning config>`` with an actual remote autotuning configuration string (see ``trtexec --help`` for details).
Other TensorRT benchmark options (e.g. ``--timing_cache``, ``--warmup_runs``, ``--timing_runs``, ``--plugin_libraries``) are also available; run the tool with ``--help`` for the full list.
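As a sketch, the flag string passed to ``--trtexec_benchmark_args`` can be assembled programmatically. The helper below is hypothetical (not part of the modelopt API); it only illustrates which flags must travel together:

```python
# Hypothetical helper (not part of the modelopt API): assembles the extra
# trtexec flags needed for remote autotuning into the single quoted string
# that --trtexec_benchmark_args expects.
def build_remote_autotune_args(remote_config: str) -> str:
    flags = [
        f'--remoteAutoTuningConfig="{remote_config}"',
        "--safe",           # remote autotuning requires safety mode
        "--skipInference",  # timing happens on the remote target, not locally
    ]
    return " ".join(flags)
```

The returned string is then passed verbatim as the value of ``--trtexec_benchmark_args``.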

Low-Level API Usage
===================

8 changes: 4 additions & 4 deletions examples/onnx_ptq/autotune/README.md
@@ -229,7 +229,7 @@ python3 -m modelopt.onnx.quantization.autotune \

## Remote Autotuning with TensorRT

TensorRT 10.16+ supports remote autotuning in safety mode (`--safe`), which allows TensorRT's optimization process to be offloaded to a remote hardware. This is useful when optimizing models for different target GPUs without having direct access to them.
TensorRT 10.15+ supports remote autotuning in safety mode (`--safe`), which allows TensorRT's optimization process to be offloaded to remote hardware. This is useful for optimizing models for different target GPUs without direct access to them.

To use remote autotuning during Q/DQ placement optimization, run with the `trtexec` workflow and pass the extra arguments:

@@ -239,15 +239,15 @@ python3 -m modelopt.onnx.quantization.autotune \
--output_dir ./resnet50_remote_autotuned \
--schemes_per_region 50 \
--use_trtexec \
--trtexec_benchmark_args "--remoteAutoTuningConfig=\"<remote autotuning config>\" --safe"
--trtexec_benchmark_args "--remoteAutoTuningConfig=\"<remote autotuning config>\" --safe --skipInference"
```

**Requirements:**

- TensorRT 10.16 or later
- TensorRT 10.15 or later
- Valid remote autotuning configuration
- `--use_trtexec` must be set (benchmarking uses `trtexec` instead of the TensorRT Python API)
- `--safe` must be enabled via `--trtexec_benchmark_args`
- `--safe --skipInference` must be enabled via `--trtexec_benchmark_args`

Replace `<remote autotuning config>` with an actual remote autotuning configuration string (see `trtexec --help` for more details).
Other TensorRT benchmark options (e.g. `--timing_cache`, `--warmup_runs`, `--timing_runs`, `--plugin_libraries`) are also available; run the tool with `--help` for the full list.
4 changes: 2 additions & 2 deletions modelopt/onnx/quantization/__main__.py
@@ -398,8 +398,8 @@ def get_parser() -> argparse.ArgumentParser:
type=str,
default=None,
help=(
"Additional trtexec arguments as a single quoted string. "
"Example: --autotune_trtexec_args '--fp16 --workspace=4096'"
"Additional 'trtexec' arguments as a single quoted string. Only relevant when the 'trtexec' "
"workflow is enabled. Example: '--fp16 --workspace=4096'"
),
)
return argparser
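How the single quoted string is later tokenized into individual `trtexec` flags is not shown in this hunk; a shlex-style split is one plausible approach (an assumption, not necessarily what the parser does), and it has the useful property of preserving quoted values such as the remote autotuning config:

```python
import shlex

# Splitting the quoted argument string the way a shell would; the embedded
# quotes around the remote config value are honored and then stripped.
raw = '--remoteAutoTuningConfig="host=remote-box" --safe --skipInference'
flags = shlex.split(raw)
# flags == ['--remoteAutoTuningConfig=host=remote-box', '--safe', '--skipInference']
```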
6 changes: 3 additions & 3 deletions modelopt/onnx/quantization/autotune/benchmark.py
@@ -208,8 +208,8 @@ def __init__(

if has_remote_config:
try:
_check_for_tensorrt(min_version="10.16")
self.logger.debug("TensorRT Python API version >= 10.16 detected")
_check_for_tensorrt(min_version="10.15")
self.logger.debug("TensorRT Python API version >= 10.15 detected")
if "--safe" not in trtexec_args:
self.logger.warning(
"Remote autotuning requires '--safe' to be set. Adding it to trtexec arguments."
@@ -218,7 +218,7 @@ def __init__(
return
except ImportError:
self.logger.warning(
"Remote autotuning is not supported with TensorRT version < 10.16. "
"Remote autotuning is not supported with TensorRT version < 10.15. "
"Removing --remoteAutoTuningConfig from trtexec arguments"
)
trtexec_args = [
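The control flow of the `benchmark.py` hunk above can be sketched as a standalone function. The boolean parameter stands in for the `_check_for_tensorrt(min_version="10.15")` probe, which is assumed to raise `ImportError` on older installations; function and parameter names here are illustrative, not the module's actual API:

```python
def sanitize_remote_args(trtexec_args: list, trt_new_enough: bool) -> list:
    """Sketch of the version gate: keep remote autotuning only when the
    TensorRT version check passes, and make sure --safe accompanies it."""
    has_remote = any(a.startswith("--remoteAutoTuningConfig") for a in trtexec_args)
    if not has_remote:
        return trtexec_args
    if trt_new_enough:
        if "--safe" not in trtexec_args:
            # Remote autotuning requires safety mode; add it rather than fail.
            return [*trtexec_args, "--safe"]
        return trtexec_args
    # TensorRT too old: drop the remote config flag and benchmark locally.
    return [a for a in trtexec_args if not a.startswith("--remoteAutoTuningConfig")]
```

This mirrors the two branches in the hunk: the warning-then-append path when TensorRT is new enough, and the flag-stripping path taken on `ImportError`.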