Integrate TorchRegionBuilder to AutoQDQ #963
Conversation
📝 Walkthrough

Adds a TorchRegionBuilder and naming-convention detector to discover hierarchical regions from PyTorch-exported ONNX graphs, integrates it into the autotuner's region search (conditional on naming detection), and provides inspection and CLI utilities for region analysis.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Autotuner as Autotuner._search_regions
    participant Detector as check_torch_naming_convention
    participant Torch as TorchRegionBuilder
    participant Combined as CombinedRegionSearch
    Autotuner->>Detector: check_torch_naming_convention(graph)
    alt PyTorch naming detected
        Detector-->>Autotuner: true
        Autotuner->>Torch: TorchRegionBuilder(graph).build_regions(linearize=True, only_quantizable=True)
        Torch-->>Autotuner: regions_list
    else Fallback
        Detector-->>Autotuner: false
        Autotuner->>Combined: CombinedRegionSearch(...).search_regions()
        Combined-->>Autotuner: regions_list
    end
    Autotuner->>Autotuner: assign ids / reassign/flatten/sort / compute type_counts
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around line 101-104: The Torch branch double-flattens regions: when
check_torch_naming_convention(self.graph) is true you call
TorchRegionBuilder(self.graph).build_regions(linearize=True, ...) which already
returns linearized descendants, but the subsequent recursive flatten loop (the
code that iterates over self.regions and extends a flat list) flattens again and
creates duplicates; fix by detecting the linearize=True case and skipping the
extra recursive flatten (i.e., assign self.regions directly to the build_regions
result when linearize is requested) or, alternatively, guard the recursive
flatten with a check for already-linearized entries or deduplicate by region
identity; reference check_torch_naming_convention, TorchRegionBuilder,
build_regions and self.regions when applying the change.
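The guard described above can be sketched with stand-in types. `Region`, `flatten`, and `collect_regions` below are illustrative names only, not the actual modelopt API; the point is that flattening an already-linearized list duplicates every nested region:

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    rid: int
    children: list["Region"] = field(default_factory=list)


def flatten(region: Region) -> list[Region]:
    """Pre-order recursive flatten, mirroring what _visit_region_recursively does."""
    out = [region]
    for child in region.children:
        out.extend(flatten(child))
    return out


def collect_regions(built: list[Region], already_linearized: bool) -> list[Region]:
    # Guard: if the builder already returned a linearized list, assign it
    # directly -- flattening again would duplicate every nested region.
    if already_linearized:
        return built
    flat: list[Region] = []
    for region in built:
        flat.extend(flatten(region))
    return flat
```

With a root region holding one child, the linearized list has 2 entries; flattening it again yields 3, the child appearing twice.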
In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Line 756: The code in torch_region_builder.py is forcibly overriding the
only_quantizable flag (setting only_quantizable = True), which ignores the
caller/CLI value; update the places that set only_quantizable (both occurrences)
to respect the incoming parameter/argument (e.g., remove the hardcoded
assignment and use the function parameter or pass-through value) so the
--only-quantizable option and the function argument control behavior instead of
being overridden; ensure any default remains set in the function signature/CLI
parsing and not by assigning True inside functions like the region-building
routine that references only_quantizable.
- Around line 622-627: torch_node_ratio can raise ZeroDivisionError when there
are no non-Constant nodes; update torch_node_ratio to guard against an empty
non_constant_nodes list by checking its length and returning 0.0 (or another
safe default) when it's zero, otherwise compute slash_count /
len(non_constant_nodes). Modify the method in torch_region_builder.py (function
torch_node_ratio, using self.graph.nodes and non_constant_nodes) to perform this
early-return check before the division.
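A standalone sketch of the guarded computation, under the assumption that `torch_node_ratio` counts '/'-containing names among non-Constant nodes; `nodes` here is a list of (name, op_type) pairs standing in for the real graph nodes:

```python
def torch_node_ratio(nodes: list[tuple[str, str]]) -> float:
    """Fraction of non-Constant nodes whose names contain '/'."""
    non_constant = [name for name, op in nodes if op != "Constant"]
    if not non_constant:
        # Guard: an all-Constant (or empty) graph would otherwise divide by zero.
        return 0.0
    slash_count = sum("/" in name for name in non_constant)
    return slash_count / len(non_constant)
```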
- Around line 547-549: The loop in _probe_epilogues_recursive currently appends
every consumer_idx to epilogue_ops before validation; change it to validate each
consumer for fusibility and non-divergence first and only append + recurse when
it passes. Replace the direct epilogue_ops.append(consumer_idx) with a
conditional that calls the appropriate checks (e.g.,
self._is_fusible(consumer_idx) and self._is_non_divergent(consumer_idx) or a
combined self._is_fusible_non_divergent(consumer_idx)) and only then call
epilogue_ops.append(consumer_idx) followed by
self._probe_epilogues_recursive(consumer_idx, ...).
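The validate-before-append shape can be sketched as a free function; `consumers` and `is_fusible_non_divergent` are hypothetical callables standing in for the real graph helpers:

```python
def probe_epilogues(node_idx, consumers, is_fusible_non_divergent, epilogue_ops):
    """Append a consumer and recurse into it only after it passes validation."""
    for consumer_idx in consumers(node_idx):
        if not is_fusible_non_divergent(consumer_idx):
            continue  # rejected consumers are neither appended nor recursed into
        epilogue_ops.append(consumer_idx)
        probe_epilogues(consumer_idx, consumers, is_fusible_non_divergent, epilogue_ops)
```

For a toy graph where node 0 feeds nodes 1 and 2, node 1 feeds node 3, and only 1 and 3 are fusible, the probe collects [1, 3] and never touches node 2.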
- Around line 698-700: After calling
self._filter_out_non_quantizable_nodes(root_region), the region membership has
changed but the region boundary information (inputs/outputs) computed earlier is
stale; fix by recomputing boundaries for root_region and its child regions
immediately after the filter step—either call the same helper used at the
earlier boundary computation or iterate the affected regions and recalculate
their inputs and outputs so downstream insertion logic reads up-to-date
boundaries (update region.inputs and region.outputs for root_region and nested
regions).
- Around line 313-315: The three methods _build_id_to_region_map,
_build_tensor_to_regions_map, and _merge_neighboring_regions use mutable default
arguments ({} and set()) which leak state across calls; change their signatures
to accept None (e.g., id_to_region_map: Optional[dict[int, Region]] = None,
tensor_to_regions_map: Optional[dict[str, set[int]]] = None, to_remove:
Optional[set[int]] = None) and inside each function initialize them to empty
dict/set when None is passed, then use those local containers; adjust any
internal calls (including where these methods are invoked without arguments) to
rely on the new None-default behavior so no shared mutable object is reused
across invocations.
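The Python pitfall behind this finding is that default arguments are evaluated once at definition time, so a `{}` or `set()` default is shared by every call. A minimal sketch of both patterns (names are illustrative, not the real method signatures):

```python
from typing import Optional


def merge_regions(to_remove: Optional[set[int]] = None) -> set[int]:
    """Safe pattern: default to None, create a fresh set inside each call."""
    if to_remove is None:
        to_remove = set()
    to_remove.add(1)
    return to_remove


def merge_regions_buggy(to_remove: set = set()):  # noqa: B006 -- one shared set!
    """Buggy pattern: the default set is created once and reused across calls."""
    to_remove.add(len(to_remove))
    return to_remove
```

Two calls to the safe version each return a fresh `{1}`; two calls to the buggy version return the same object, which keeps growing.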
- Around line 68-69: The graph is being topologically sorted after
RegionSearchBase.__init__ runs, so the precomputed tensor_users_map and
forward_reachable_nodes_map in RegionSearchBase are using stale node indices;
fix this by calling self.graph.toposort() before invoking
super().__init__(graph, root=None) so that RegionSearchBase builds its
node-indexed maps against the final node order (alternatively, if reordering
before super is not possible, explicitly rebuild the index maps immediately
after self.graph.toposort() by re-running the same logic that sets
tensor_users_map and forward_reachable_nodes_map).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- modelopt/onnx/quantization/autotune/__init__.py
- modelopt/onnx/quantization/autotune/autotuner.py
- modelopt/onnx/quantization/autotune/torch_region_builder.py
Signed-off-by: Will Guo <willg@nvidia.com>
Force-pushed from a550ed7 to 2d3d90f
♻️ Duplicate comments (2)
modelopt/onnx/quantization/autotune/torch_region_builder.py (1)
63-67: ⚠️ Potential issue | 🔴 Critical: Node index maps become stale after toposort reorders the graph.

`RegionSearchBase.__init__()` precomputes `tensor_users_map` and `forward_reachable_nodes_map` using node indices from the current graph order. Calling `toposort()` afterward reorders all nodes, leaving the precomputed indices pointing to wrong positions.

Suggested fix: toposort before calling `super().__init__()`:

```diff
 def __init__(self, graph: gs.Graph):
     """Initialize the TorchRegionBuilder with a computation graph."""
+    graph.toposort()
     super().__init__(graph, root=None)
-    self.graph.toposort()
     self.regions: list[Region] = []
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 63-67, the precomputed node-indexed maps (tensor_users_map, forward_reachable_nodes_map) in RegionSearchBase are invalidated by calling graph.toposort() after RegionSearchBase.__init__; move the graph.toposort() call before invoking super().__init__ in TorchRegionBuilder.__init__ so the base-class initialization computes maps against the final node order (i.e., call graph.toposort() first, then super().__init__(graph, root=None), then initialize self.regions).

modelopt/onnx/quantization/autotune/autotuner.py (1)
101-112: ⚠️ Potential issue | 🟠 Major: Double-flattening issue persists: `linearize=True` combined with `_visit_region_recursively` produces duplicates.

When `check_torch_naming_convention` returns true, `build_regions(linearize=True)` returns a flat list where COMPOSITE regions still retain their children. The subsequent loop at lines 116-121 calls `_visit_region_recursively` on each region, which traverses into children again, causing regions that appear both in the linearized list and as children of other regions to be duplicated.

Suggested fix: use `linearize=False` and let the existing flattening handle it:

```diff
 if check_torch_naming_convention(self.graph):
     torch_search = TorchRegionBuilder(self.graph)
-    self.regions = torch_search.build_regions(linearize=True, only_quantizable=True)
+    self.regions = torch_search.build_regions(linearize=False, only_quantizable=True)
     self._reassign_region_ids(self.regions)
 else:
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/autotuner.py` around lines 101-112, the code double-flattens regions when check_torch_naming_convention(...) is true because TorchRegionBuilder.build_regions(linearize=True, ...) returns a flat list while COMPOSITE regions still have children, and the later _visit_region_recursively(...) loop traverses children again; change build_regions to use linearize=False so you get the hierarchical regions and let the existing _visit_region_recursively(...) flattening run once, keeping the calls to _reassign_region_ids(...) and leaving the CombinedRegionSearch.search_regions() path unchanged.
🧹 Nitpick comments (1)
modelopt/onnx/quantization/autotune/torch_region_builder.py (1)

626-628: Consider extracting magic numbers as class constants.

The iteration count `10` and target region size `12` are hardcoded values that control region merging behavior. Consider defining these as class-level constants for clarity and easier tuning.

Suggested refactor:

```diff
 class TorchRegionBuilder(RegionSearchBase):
     """Region builder that creates hierarchical regions from PyTorch-exported ONNX node names."""

+    _MAX_MERGE_ITERATIONS = 10
+    _MIN_COMPOSITE_REGION_SIZE = 12
+
     def __init__(self, graph: gs.Graph):
         ...

     def build_regions(self, linearize: bool = True, only_quantizable: bool = False) -> list[Region]:
         ...
-        for _ in range(10):
+        for _ in range(self._MAX_MERGE_ITERATIONS):
             self._merge_neighboring_regions(root_region)
-        self._merge_small_composite_regions(root_region, target_region_size=12)
+        self._merge_small_composite_regions(root_region, target_region_size=self._MIN_COMPOSITE_REGION_SIZE)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 626-628, extract the hardcoded literals used in the merge loop into class-level constants to improve clarity and tunability: replace the literal 10 (loop iterations) and 12 (target_region_size) in the block that calls self._merge_neighboring_regions(root_region) and self._merge_small_composite_regions(root_region, target_region_size=12) with named constants (e.g., MERGE_ITERATIONS and TARGET_REGION_SIZE) defined on the containing class, and use those constants in the for loop and the _merge_small_composite_regions call so configuration is centralized and self-documenting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around line 101-112: The code double-flattens regions when
check_torch_naming_convention(...) is true because
TorchRegionBuilder.build_regions(linearize=True, ...) returns a flat list while
COMPOSITE regions still have children, and the later
_visit_region_recursively(...) loop traverses children again; change
build_regions to use linearize=False so you get the hierarchical regions and
allow the existing _visit_region_recursively(...) flattening to run once,
keeping calls to _reassign_region_ids(...) and leaving
CombinedRegionSearch.search_regions() path unchanged.
In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Around line 63-67: The precomputed node-indexed maps (tensor_users_map,
forward_reachable_nodes_map) in RegionSearchBase are invalidated by calling
graph.toposort() after RegionSearchBase.__init__; move the graph.toposort() call
to before invoking super().__init__ in TorchRegionBuilder.__init__ so the
base-class initialization computes maps against the final node order (i.e., call
self.graph.toposort() first, then super().__init__(graph, root=None) and then
initialize self.regions).
---
Nitpick comments:
In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Around line 626-628: Extract the hardcoded literals used in the merge loop
into class-level constants to improve clarity and tunability: replace the
literal 10 (loop iterations) and 12 (target_region_size) in the block that calls
self._merge_neighboring_regions(root_region) and
self._merge_small_composite_regions(root_region, target_region_size=12) with
named constants (e.g., MERGE_ITERATIONS and TARGET_REGION_SIZE) defined on the
containing class (torch_region_builder module), and use those constants in the
for loop and the _merge_small_composite_regions call so configuration is
centralized and self-documenting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 91d78196-5029-4668-b9d2-f2e05a8b9e51
📒 Files selected for processing (3)
- modelopt/onnx/quantization/autotune/__init__.py
- modelopt/onnx/quantization/autotune/autotuner.py
- modelopt/onnx/quantization/autotune/torch_region_builder.py
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/onnx/quantization/autotune/__init__.py
Signed-off-by: Will Guo <willg@nvidia.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Around line 227-233: The sorting in _sort_regions uses
max(region.get_region_nodes_and_descendants()) as a key which will raise
ValueError for regions with no nodes; update the key to handle empty sequences
(e.g., use max(..., default=-inf) or fall back to a sentinel value) so empty
regions sort deterministically and do not throw. Modify the lambda in
_sort_regions that references region.get_region_nodes_and_descendants() to
return a safe default when the iterable is empty, leaving the recursive call to
_sort_regions(child) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e51bd1be-c114-478d-a7ce-bdfc57b7f1e0
📒 Files selected for processing (2)
- modelopt/onnx/quantization/autotune/autotuner.py
- modelopt/onnx/quantization/autotune/torch_region_builder.py
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/onnx/quantization/autotune/autotuner.py
```python
def _sort_regions(self, region: Region) -> None:
    """Sort regions by topological order."""
    region.children = sorted(
        region.children, key=lambda r: max(r.get_region_nodes_and_descendants())
    )
    for child in region.get_children():
        self._sort_regions(child)
```
Potential ValueError on empty region.
If a region has no nodes and no children (i.e. `get_region_nodes_and_descendants()` is empty), `max()` on an empty sequence raises `ValueError`.
Proposed fix:

```diff
 def _sort_regions(self, region: Region) -> None:
     """Sort regions by topological order."""
     region.children = sorted(
-        region.children, key=lambda r: max(r.get_region_nodes_and_descendants())
+        region.children, key=lambda r: max(r.get_region_nodes_and_descendants(), default=-1)
     )
     for child in region.get_children():
         self._sort_regions(child)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 227
- 233, The sorting in _sort_regions uses
max(region.get_region_nodes_and_descendants()) as a key which will raise
ValueError for regions with no nodes; update the key to handle empty sequences
(e.g., use max(..., default=-inf) or fall back to a sentinel value) so empty
regions sort deterministically and do not throw. Modify the lambda in
_sort_regions that references region.get_region_nodes_and_descendants() to
return a safe default when the iterable is empty, leaving the recursive call to
_sort_regions(child) unchanged.
Review: Integrate TorchRegionBuilder to AutoQDQ
Code Review Findings
🔴 Critical
1. No unit tests for TorchRegionBuilder
There is no test_torch_region_builder.py. The existing test_region_inspect.py tests region_inspect.py (old module) using mocks — it exercises none of the new code paths. TorchRegionBuilder has ~15 methods involving complex multi-pass tree manipulation (trie build, merge, flatten, filter, epilogue probe, linearize). Without tests there is no way to verify correctness or detect regressions.
At minimum, tests should cover:
- `check_torch_naming_convention()` — true/false, edge cases (empty graph, all Constants, threshold boundary)
- `build_regions()` — round-trip with a synthetic PyTorch-named graph, checking node coverage and region types
- `_sort_regions()` — empty region case (see issue #2 below)
- The autotuner dispatch: given a torch-named graph, `_search_regions()` must route to `TorchRegionBuilder`, not `CombinedRegionSearch`
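A test for the detector's edge cases could start from a standalone stand-in like the one below. The 0.8 threshold is an assumption inferred from the "79%" remark later in this review, and the function here is a simplified mirror of the real detector, not the modelopt implementation:

```python
def check_torch_naming_convention(node_names, op_types, threshold=0.8):
    """True when at least `threshold` of non-Constant nodes carry a
    '/'-separated PyTorch-style name. Threshold value is an assumption."""
    names = [n for n, op in zip(node_names, op_types) if op != "Constant"]
    if not names:
        return False
    return sum("/" in n for n in names) / len(names) >= threshold


def test_detector_edge_cases():
    assert not check_torch_naming_convention([], [])               # empty graph
    assert not check_torch_naming_convention(["w"], ["Constant"])  # all Constants
    names = ["/model/linear/MatMul"] * 4 + ["plain"]
    assert check_torch_naming_convention(names, ["MatMul"] * 5)    # 80% boundary
```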
2. _sort_regions raises ValueError on empty region (torch_region_builder.py:233) — open unresolved thread

```python
region.children = sorted(
    region.children, key=lambda r: max(r.get_region_nodes_and_descendants())  # 💥 ValueError if empty
)
```

After filtering, a child region can have zero nodes and zero descendants. Fix:

```python
key=lambda r: max(r.get_region_nodes_and_descendants(), default=-1)
```

🟠 Important
3. boundary_op_types / is_quantizable_node duplicated between TorchRegionBuilder and TopDownRegionBuilder
TorchRegionBuilder.is_quantizable_node() (torch_region_builder.py:349) lists exactly the same ops as TopDownRegionBuilder.boundary_op_types in region_search.py. If one is updated (e.g. adding a new quantizable op) and the other is not, behavior will silently diverge. This should be extracted to a shared constant in common.py.
4. _remove_empty_regions doesn't clean up empty COMPOSITE regions (torch_region_builder.py:555–580)
The function removes LEAF children with no quantizable backbone, but does NOT remove COMPOSITE children that had all their sub-children emptied by the recursive pass. After the filter, a COMPOSITE with no remaining children or nodes stays in the tree. While linearize_regions may skip it in practice, this leaves the hierarchy in an inconsistent state. COMPOSITE children where child.get_region_nodes_and_descendants() is empty should also be removed.
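A bottom-up prune handles both cases at once: recursing first means a COMPOSITE whose sub-children were all emptied is itself empty by the time its parent filters it. The `Region` type below is a minimal stand-in, not the actual modelopt class:

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    nodes: list[int] = field(default_factory=list)
    children: list["Region"] = field(default_factory=list)

    def nodes_and_descendants(self) -> list[int]:
        out = list(self.nodes)
        for child in self.children:
            out.extend(child.nodes_and_descendants())
        return out


def remove_empty_regions(region: Region) -> None:
    """Recurse first so COMPOSITEs emptied by the pass are caught, then
    drop any child with no nodes and no remaining descendants."""
    for child in region.children:
        remove_empty_regions(child)
    region.children = [c for c in region.children if c.nodes_and_descendants()]
```

A COMPOSITE whose only child is an empty LEAF gets pruned along with that leaf, while siblings that still own nodes survive.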
5. Torch and default paths produce structurally different region lists in _search_regions
- Torch path (autotuner.py:101–108): `linearize=True` returns post-order DFS, leaves before innermost composites. IDs assigned sequentially by list position.
- Default path (autotuner.py:110–124): pre-order DFS via `_visit_region_recursively`, then sorted so LEAFs come first. IDs assigned breadth-first via `_reassign_region_ids`.
Ordering and ID semantics differ between paths. Downstream pattern-matching and scheme-generation logic likely depends on LEAF ordering. This asymmetry should be explicitly documented or aligned.
6. check_torch_naming_convention is silent when a model barely misses the threshold
If a well-preprocessed PyTorch model falls to 79% torch-named nodes (e.g. after ONNX simplification renames some ops), it silently falls back to CombinedRegionSearch. This is hard to debug. Recommend adding a log at INFO level indicating which builder was chosen and the measured ratio:
```python
logger.info(
    f"Torch naming ratio {slash_count / total:.2f} "
    f"({'≥' if result else '<'} {threshold} threshold); "
    f"using {'TorchRegionBuilder' if result else 'CombinedRegionSearch'}"
)
```
)💡 Suggestions
7. Magic numbers in build_regions

```python
for _ in range(10):  # Why 10?
    ...
self._merge_small_composite_regions(root_region, target_region_size=12)  # Why 12?
```

These should be class-level constants (`_MAX_MERGE_ITERATIONS`, `_MIN_REGION_SIZE`) with a comment on how they were chosen.
8. _probe_epilogues_recursive docstring/contract mismatch
The method docstring says "fusible non-divergent epilogue nodes," but the implementation adds ALL consumers (including non-fusible) as "boundary nodes." The in-code comment explains the intent correctly — update the docstring for _probe_epilogues_recursive and _probe_epilogues to match the actual semantics.
What does this PR do?
This PR implements a RegionSearch implementation that creates autotune regions based on node names for ONNX models exported by torch.
Usage

```python
# Add a code snippet demonstrating how to use this
```

Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, using `torch.load(..., weights_only=True)`, avoiding `pickle`, etc.).

Additional Information
Summary by CodeRabbit