
Integrate TorchRegionBuilder to AutoQDQ#963

Open
willg-nv wants to merge 3 commits into NVIDIA:main from willg-nv:dev-willg-integrate-torch-region-builder

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Mar 3, 2026

What does this PR do?

This PR implements a RegionSearch variant that creates autotune regions based on node names for ONNX models exported by torch.

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other source, did you follow IP policy in CONTRIBUTING.md?: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features
    • Autotuner now recognizes PyTorch-exported ONNX layouts and uses a Torch-specific region builder for improved hierarchical region discovery and fusion-friendly grouping.
    • Added an inspection CLI to print and return discovered regions, with options to include all regions or only quantizable ones.
    • The Torch-based region builder is exposed through the public API when available.

@willg-nv willg-nv requested a review from a team as a code owner March 3, 2026 08:00
@willg-nv willg-nv changed the title Integrate TorchRegionBuilder to AutoQDQ Draft: Integrate TorchRegionBuilder to AutoQDQ Mar 3, 2026
@copy-pr-bot

copy-pr-bot Bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

Adds a TorchRegionBuilder and naming-convention detector to discover hierarchical regions from PyTorch-exported ONNX graphs, integrates it into the autotuner's region search (conditional on naming detection), and provides inspection and CLI utilities for region analysis.

Changes

  • API Export — modelopt/onnx/quantization/autotune/__init__.py: adds TorchRegionBuilder to _OPTIONAL_EXPORTS and conditionally imports/exports it in __all__.
  • Autotuner Integration — modelopt/onnx/quantization/autotune/autotuner.py: _search_regions now calls check_torch_naming_convention(graph) and, if true, uses TorchRegionBuilder(...).build_regions(...) (assigning sequential ids); otherwise it retains the existing CombinedRegionSearch flow, followed by region sorting/counting.
  • Torch-based Region Builder — modelopt/onnx/quantization/autotune/torch_region_builder.py: new module (large): check_torch_naming_convention, TorchRegionBuilder (builds/normalizes/merges/flattens leaf/composite regions from slash-prefixed node names, with quantizability filtering and linearization/search helpers), inspect_torch_regions(), and a CLI main().
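The detector described above can be sketched in plain Python. This is an illustrative reconstruction, not the shipped code: the real check_torch_naming_convention operates on a gs.Graph, and the 0.8 threshold, the slash-prefix test, and the Node shape below are assumptions inferred from the review discussion (which mentions a ~80% threshold, skipping Constant nodes, and guarding the empty-graph division).

```python
# Hypothetical sketch of the torch naming-convention detector.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    op: str


def torch_node_ratio(nodes: list[Node]) -> float:
    """Fraction of non-Constant nodes whose names use the torch '/' prefix."""
    non_constant = [n for n in nodes if n.op != "Constant"]
    if not non_constant:  # guard: avoids ZeroDivisionError on constant-only graphs
        return 0.0
    slash_count = sum(1 for n in non_constant if n.name.startswith("/"))
    return slash_count / len(non_constant)


def check_torch_naming_convention(nodes: list[Node], threshold: float = 0.8) -> bool:
    """True when enough nodes follow the PyTorch-exporter slash naming."""
    return torch_node_ratio(nodes) >= threshold
```

PyTorch's ONNX exporter names nodes after the module hierarchy (e.g. `/model/layer1/Gemm`), which is what makes a ratio test like this workable as a cheap dispatch heuristic.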

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Autotuner as Autotuner._search_regions
    participant Detector as check_torch_naming_convention
    participant Torch as TorchRegionBuilder
    participant Combined as CombinedRegionSearch
    Autotuner->>Detector: check_torch_naming_convention(graph)
    alt PyTorch naming detected
        Detector-->>Autotuner: true
        Autotuner->>Torch: TorchRegionBuilder(graph).build_regions(linearize=True, only_quantizable=True)
        Torch-->>Autotuner: regions_list
    else Fallback
        Detector-->>Autotuner: false
        Autotuner->>Combined: CombinedRegionSearch(...).search_regions()
        Combined-->>Autotuner: regions_list
    end
    Autotuner->>Autotuner: assign ids / reassign/flatten/sort / compute type_counts
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'Integrate TorchRegionBuilder to AutoQDQ' directly and accurately summarizes the main change: integrating the TorchRegionBuilder component into the AutoQDQ system by updating imports and conditionally applying it in region search logic.
  • Docstring Coverage — ✅ Passed: docstring coverage is 91.89%, above the required 80.00% threshold.
  • Security Anti-Patterns — ✅ Passed: the pull request contains none of the six critical security anti-patterns (no unsafe torch.load, numpy.load, eval/exec, trust_remote_code, nosec comments, or non-permissive dependencies).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around line 101-104: The Torch branch double-flattens regions: when
check_torch_naming_convention(self.graph) is true you call
TorchRegionBuilder(self.graph).build_regions(linearize=True, ...) which already
returns linearized descendants, but the subsequent recursive flatten loop (the
code that iterates over self.regions and extends a flat list) flattens again and
creates duplicates; fix by detecting the linearize=True case and skipping the
extra recursive flatten (i.e., assign self.regions directly to the build_regions
result when linearize is requested) or, alternatively, guard the recursive
flatten with a check for already-linearized entries or deduplicate by region
identity; reference check_torch_naming_convention, TorchRegionBuilder,
build_regions and self.regions when applying the change.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Line 756: The code in torch_region_builder.py is forcibly overriding the
only_quantizable flag (setting only_quantizable = True), which ignores the
caller/CLI value; update the places that set only_quantizable (both occurrences)
to respect the incoming parameter/argument (e.g., remove the hardcoded
assignment and use the function parameter or pass-through value) so the
--only-quantizable option and the function argument control behavior instead of
being overridden; ensure any default remains set in the function signature/CLI
parsing and not by assigning True inside functions like the region-building
routine that references only_quantizable.
- Around line 622-627: torch_node_ratio can raise ZeroDivisionError when there
are no non-Constant nodes; update torch_node_ratio to guard against an empty
non_constant_nodes list by checking its length and returning 0.0 (or another
safe default) when it's zero, otherwise compute slash_count /
len(non_constant_nodes). Modify the method in torch_region_builder.py (function
torch_node_ratio, using self.graph.nodes and non_constant_nodes) to perform this
early-return check before the division.
- Around line 547-549: The loop in _probe_epilogues_recursive currently appends
every consumer_idx to epilogue_ops before validation; change it to validate each
consumer for fusibility and non-divergence first and only append + recurse when
it passes. Replace the direct epilogue_ops.append(consumer_idx) with a
conditional that calls the appropriate checks (e.g.,
self._is_fusible(consumer_idx) and self._is_non_divergent(consumer_idx) or a
combined self._is_fusible_non_divergent(consumer_idx)) and only then call
epilogue_ops.append(consumer_idx) followed by
self._probe_epilogues_recursive(consumer_idx, ...).
- Around line 698-700: After calling
self._filter_out_non_quantizable_nodes(root_region), the region membership has
changed but the region boundary information (inputs/outputs) computed earlier is
stale; fix by recomputing boundaries for root_region and its child regions
immediately after the filter step—either call the same helper used at the
earlier boundary computation or iterate the affected regions and recalculate
their inputs and outputs so downstream insertion logic reads up-to-date
boundaries (update region.inputs and region.outputs for root_region and nested
regions).
- Around line 313-315: The three methods _build_id_to_region_map,
_build_tensor_to_regions_map, and _merge_neighboring_regions use mutable default
arguments ({} and set()) which leak state across calls; change their signatures
to accept None (e.g., id_to_region_map: Optional[dict[int, Region]] = None,
tensor_to_regions_map: Optional[dict[str, set[int]]] = None, to_remove:
Optional[set[int]] = None) and inside each function initialize them to empty
dict/set when None is passed, then use those local containers; adjust any
internal calls (including where these methods are invoked without arguments) to
rely on the new None-default behavior so no shared mutable object is reused
across invocations.
- Around line 68-69: The graph is being topologically sorted after
RegionSearchBase.__init__ runs, so the precomputed tensor_users_map and
forward_reachable_nodes_map in RegionSearchBase are using stale node indices;
fix this by calling self.graph.toposort() before invoking
super().__init__(graph, root=None) so that RegionSearchBase builds its
node-indexed maps against the final node order (alternatively, if reordering
before super is not possible, explicitly rebuild the index maps immediately
after self.graph.toposort() by re-running the same logic that sets
tensor_users_map and forward_reachable_nodes_map).
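The mutable-default-argument finding above is a general Python pitfall worth seeing in isolation. The functions below are hypothetical stand-ins, not the builder's actual methods: a default like `seen=set()` is evaluated once at definition time, so every call that omits the argument shares the same object.

```python
# Minimal illustration of the pitfall flagged in the review.

def collect_bad(item, seen=set()):  # one shared set, created at def time
    seen.add(item)
    return seen


def collect_good(item, seen=None):  # fresh set per call when omitted
    if seen is None:
        seen = set()
    seen.add(item)
    return seen
```

The `None`-default pattern in `collect_good` is the standard fix the review prompt asks for in `_build_id_to_region_map` and friends.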

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 860d0b4 and a550ed7.

📒 Files selected for processing (3)
  • modelopt/onnx/quantization/autotune/__init__.py
  • modelopt/onnx/quantization/autotune/autotuner.py
  • modelopt/onnx/quantization/autotune/torch_region_builder.py

Comment thread modelopt/onnx/quantization/autotune/autotuner.py
Comment thread modelopt/onnx/quantization/autotune/torch_region_builder.py Outdated
Comment thread modelopt/onnx/quantization/autotune/torch_region_builder.py
Comment thread modelopt/onnx/quantization/autotune/torch_region_builder.py Outdated
Comment thread modelopt/onnx/quantization/autotune/torch_region_builder.py
Comment thread modelopt/onnx/quantization/autotune/torch_region_builder.py Outdated
Comment thread modelopt/onnx/quantization/autotune/torch_region_builder.py Outdated
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv force-pushed the dev-willg-integrate-torch-region-builder branch from a550ed7 to 2d3d90f Compare April 15, 2026 06:50
Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (2)
modelopt/onnx/quantization/autotune/torch_region_builder.py (1)

63-67: ⚠️ Potential issue | 🔴 Critical

Critical: Node index maps become stale after toposort reorders the graph.

RegionSearchBase.__init__() precomputes tensor_users_map and forward_reachable_nodes_map using node indices from the current graph order. Calling toposort() afterward reorders all nodes, leaving the precomputed indices pointing to wrong positions.

Suggested fix: Toposort before calling super().__init__
     def __init__(self, graph: gs.Graph):
         """Initialize the TorchRegionBuilder with a computation graph."""
+        graph.toposort()
         super().__init__(graph, root=None)
-        self.graph.toposort()
         self.regions: list[Region] = []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 63
- 67, The precomputed node-indexed maps (tensor_users_map,
forward_reachable_nodes_map) in RegionSearchBase are invalidated by calling
graph.toposort() after RegionSearchBase.__init__; move the graph.toposort() call
to before invoking super().__init__ in TorchRegionBuilder.__init__ so the
base-class initialization computes maps against the final node order (i.e., call
self.graph.toposort() first, then super().__init__(graph, root=None) and then
initialize self.regions).
modelopt/onnx/quantization/autotune/autotuner.py (1)

101-112: ⚠️ Potential issue | 🟠 Major

Double-flattening issue persists: linearize=True combined with _visit_region_recursively produces duplicates.

When check_torch_naming_convention returns true, build_regions(linearize=True) returns a flat list where COMPOSITE regions still retain their children. The subsequent loop at lines 116-121 calls _visit_region_recursively on each region, which traverses into children again—causing regions that appear both in the linearized list and as children of other regions to be duplicated.

Suggested fix: Use linearize=False and let the existing flattening handle it
         if check_torch_naming_convention(self.graph):
             torch_search = TorchRegionBuilder(self.graph)
-            self.regions = torch_search.build_regions(linearize=True, only_quantizable=True)
+            self.regions = torch_search.build_regions(linearize=False, only_quantizable=True)
             self._reassign_region_ids(self.regions)
         else:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/autotuner.py` around lines 101 - 112, The
code double-flattens regions when check_torch_naming_convention(...) is true
because TorchRegionBuilder.build_regions(linearize=True, ...) returns a flat
list while COMPOSITE regions still have children, and the later
_visit_region_recursively(...) loop traverses children again; change
build_regions to use linearize=False so you get the hierarchical regions and
allow the existing _visit_region_recursively(...) flattening to run once,
keeping calls to _reassign_region_ids(...) and leaving
CombinedRegionSearch.search_regions() path unchanged.
🧹 Nitpick comments (1)
modelopt/onnx/quantization/autotune/torch_region_builder.py (1)

626-628: Consider extracting magic numbers as class constants.

The iteration count 10 and target region size 12 are hardcoded values that control region merging behavior. Consider defining these as class-level constants for clarity and easier tuning.

Suggested refactor
 class TorchRegionBuilder(RegionSearchBase):
     """Region builder that creates hierarchical regions from PyTorch-exported ONNX node names."""

+    _MAX_MERGE_ITERATIONS = 10
+    _MIN_COMPOSITE_REGION_SIZE = 12
+
     def __init__(self, graph: gs.Graph):
         ...

     def build_regions(self, linearize: bool = True, only_quantizable: bool = False) -> list[Region]:
         ...
-        for _ in range(10):
+        for _ in range(self._MAX_MERGE_ITERATIONS):
             self._merge_neighboring_regions(root_region)
-            self._merge_small_composite_regions(root_region, target_region_size=12)
+            self._merge_small_composite_regions(root_region, target_region_size=self._MIN_COMPOSITE_REGION_SIZE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 626
- 628, Extract the hardcoded literals used in the merge loop into class-level
constants to improve clarity and tunability: replace the literal 10 (loop
iterations) and 12 (target_region_size) in the block that calls
self._merge_neighboring_regions(root_region) and
self._merge_small_composite_regions(root_region, target_region_size=12) with
named constants (e.g., MERGE_ITERATIONS and TARGET_REGION_SIZE) defined on the
containing class (torch_region_builder module), and use those constants in the
for loop and the _merge_small_composite_regions call so configuration is
centralized and self-documenting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around line 101-112: The code double-flattens regions when
check_torch_naming_convention(...) is true because
TorchRegionBuilder.build_regions(linearize=True, ...) returns a flat list while
COMPOSITE regions still have children, and the later
_visit_region_recursively(...) loop traverses children again; change
build_regions to use linearize=False so you get the hierarchical regions and
allow the existing _visit_region_recursively(...) flattening to run once,
keeping calls to _reassign_region_ids(...) and leaving
CombinedRegionSearch.search_regions() path unchanged.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Around line 63-67: The precomputed node-indexed maps (tensor_users_map,
forward_reachable_nodes_map) in RegionSearchBase are invalidated by calling
graph.toposort() after RegionSearchBase.__init__; move the graph.toposort() call
to before invoking super().__init__ in TorchRegionBuilder.__init__ so the
base-class initialization computes maps against the final node order (i.e., call
self.graph.toposort() first, then super().__init__(graph, root=None) and then
initialize self.regions).

---

Nitpick comments:
In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Around line 626-628: Extract the hardcoded literals used in the merge loop
into class-level constants to improve clarity and tunability: replace the
literal 10 (loop iterations) and 12 (target_region_size) in the block that calls
self._merge_neighboring_regions(root_region) and
self._merge_small_composite_regions(root_region, target_region_size=12) with
named constants (e.g., MERGE_ITERATIONS and TARGET_REGION_SIZE) defined on the
containing class (torch_region_builder module), and use those constants in the
for loop and the _merge_small_composite_regions call so configuration is
centralized and self-documenting.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 91d78196-5029-4668-b9d2-f2e05a8b9e51

📥 Commits

Reviewing files that changed from the base of the PR and between a550ed7 and 2d3d90f.

📒 Files selected for processing (3)
  • modelopt/onnx/quantization/autotune/__init__.py
  • modelopt/onnx/quantization/autotune/autotuner.py
  • modelopt/onnx/quantization/autotune/torch_region_builder.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/quantization/autotune/__init__.py

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv changed the title Draft: Integrate TorchRegionBuilder to AutoQDQ Integrate TorchRegionBuilder to AutoQDQ Apr 15, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Around line 227-233: The sorting in _sort_regions uses
max(region.get_region_nodes_and_descendants()) as a key which will raise
ValueError for regions with no nodes; update the key to handle empty sequences
(e.g., use max(..., default=-inf) or fall back to a sentinel value) so empty
regions sort deterministically and do not throw. Modify the lambda in
_sort_regions that references region.get_region_nodes_and_descendants() to
return a safe default when the iterable is empty, leaving the recursive call to
_sort_regions(child) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e51bd1be-c114-478d-a7ce-bdfc57b7f1e0

📥 Commits

Reviewing files that changed from the base of the PR and between 2d3d90f and 1229f10.

📒 Files selected for processing (2)
  • modelopt/onnx/quantization/autotune/autotuner.py
  • modelopt/onnx/quantization/autotune/torch_region_builder.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/quantization/autotune/autotuner.py

Comment on lines +227 to +233
def _sort_regions(self, region: Region) -> None:
    """Sort regions by topological order."""
    region.children = sorted(
        region.children, key=lambda r: max(r.get_region_nodes_and_descendants())
    )
    for child in region.get_children():
        self._sort_regions(child)
Contributor


⚠️ Potential issue | 🟡 Minor

Potential ValueError on empty region.

If a region has no nodes and no children (empty get_region_nodes_and_descendants()), max() on an empty sequence raises ValueError.

Proposed fix
     def _sort_regions(self, region: Region) -> None:
         """Sort regions by topological order."""
         region.children = sorted(
-            region.children, key=lambda r: max(r.get_region_nodes_and_descendants())
+            region.children, key=lambda r: max(r.get_region_nodes_and_descendants(), default=-1)
         )
         for child in region.get_children():
             self._sort_regions(child)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 227
- 233, The sorting in _sort_regions uses
max(region.get_region_nodes_and_descendants()) as a key which will raise
ValueError for regions with no nodes; update the key to handle empty sequences
(e.g., use max(..., default=-inf) or fall back to a sentinel value) so empty
regions sort deterministically and do not throw. Modify the lambda in
_sort_regions that references region.get_region_nodes_and_descendants() to
return a safe default when the iterable is empty, leaving the recursive call to
_sort_regions(child) unchanged.

@gcunhase gcunhase requested a review from ajrasane April 15, 2026 14:17
Contributor

@ajrasane ajrasane left a comment


Review: Integrate TorchRegionBuilder to AutoQDQ

Code Review Findings

🔴 Critical

1. No unit tests for TorchRegionBuilder

There is no test_torch_region_builder.py. The existing test_region_inspect.py tests region_inspect.py (old module) using mocks — it exercises none of the new code paths. TorchRegionBuilder has ~15 methods involving complex multi-pass tree manipulation (trie build, merge, flatten, filter, epilogue probe, linearize). Without tests there is no way to verify correctness or detect regressions.

At minimum, tests should cover:

  • check_torch_naming_convention() — true/false, edge cases (empty graph, all Constants, threshold boundary)
  • build_regions() — round-trip with a synthetic PyTorch-named graph, checking node coverage and region types
  • _sort_regions() — empty region case (see issue #2 below)
  • The autotuner dispatch: given a torch-named graph, _search_regions() must route to TorchRegionBuilder, not CombinedRegionSearch

2. _sort_regions raises ValueError on empty region (torch_region_builder.py:233) — open unresolved thread

region.children = sorted(
    region.children, key=lambda r: max(r.get_region_nodes_and_descendants())  # 💥 ValueError if empty
)

After filtering, a child region can have zero nodes and zero descendants. Fix:

key=lambda r: max(r.get_region_nodes_and_descendants(), default=-1)

🟠 Important

3. boundary_op_types / is_quantizable_node duplicated between TorchRegionBuilder and TopDownRegionBuilder

TorchRegionBuilder.is_quantizable_node() (torch_region_builder.py:349) lists exactly the same ops as TopDownRegionBuilder.boundary_op_types in region_search.py. If one is updated (e.g. adding a new quantizable op) and the other is not, behavior will silently diverge. This should be extracted to a shared constant in common.py.
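One way the extraction could look, sketched under the assumption that both builders can consume a plain set of op type names (the concrete op list and class shapes below are illustrative, not the actual region_search.py contents):

```python
# common.py (sketch): single source of truth for quantizable op types.
# The op list here is a small illustrative subset.
QUANTIZABLE_OP_TYPES: frozenset = frozenset({"Conv", "Gemm", "MatMul", "ConvTranspose"})


# Both builders then reference the shared constant instead of private copies:
class TorchRegionBuilderSketch:
    def is_quantizable_node(self, op_type: str) -> bool:
        return op_type in QUANTIZABLE_OP_TYPES


class TopDownRegionBuilderSketch:
    boundary_op_types = QUANTIZABLE_OP_TYPES
```

With one shared constant, adding a new quantizable op updates both code paths at once, which is exactly the silent-divergence risk this finding is about.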


4. _remove_empty_regions doesn't clean up empty COMPOSITE regions (torch_region_builder.py:555–580)

The function removes LEAF children with no quantizable backbone, but does NOT remove COMPOSITE children that had all their sub-children emptied by the recursive pass. After the filter, a COMPOSITE with no remaining children or nodes stays in the tree. While linearize_regions may skip it in practice, this leaves the hierarchy in an inconsistent state. COMPOSITE children where child.get_region_nodes_and_descendants() is empty should also be removed.
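The suggested cleanup amounts to a bottom-up prune that drops any child left fully empty, COMPOSITE or LEAF alike. The Region shape below is a simplified assumption (integer node ids, a children list), not the project's actual Region class:

```python
# Sketch of a bottom-up empty-region prune, on a simplified Region stand-in.
from dataclasses import dataclass, field


@dataclass
class RegionSketch:
    nodes: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def nodes_and_descendants(self) -> list:
        out = list(self.nodes)
        for child in self.children:
            out.extend(child.nodes_and_descendants())
        return out


def prune_empty(region: RegionSketch) -> None:
    """Prune children first, then drop any child whose whole subtree is empty."""
    for child in region.children:
        prune_empty(child)
    region.children = [c for c in region.children if c.nodes_and_descendants()]
```

Because children are pruned before the parent filters them, a COMPOSITE whose sub-children were all emptied by the filter is itself removed, leaving the hierarchy consistent.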


5. Torch and default paths produce structurally different region lists in _search_regions

  • Torch path (autotuner.py:101–108): linearize=True returns post-order DFS, leaves before innermost composites. IDs assigned sequentially by list position.
  • Default path (autotuner.py:110–124): pre-order DFS via _visit_region_recursively, then sorted so LEAFs come first. IDs assigned breadth-first via _reassign_region_ids.

Ordering and ID semantics differ between paths. Downstream pattern-matching and scheme-generation logic likely depends on LEAF ordering. This asymmetry should be explicitly documented or aligned.
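The ordering asymmetry is easy to see on a toy tree; the traversal helpers below are illustrative stand-ins for the two code paths, not the actual linearize/_visit_region_recursively implementations:

```python
# Pre-order vs post-order DFS produce structurally different flat lists.
from dataclasses import dataclass, field


@dataclass
class R:
    name: str
    children: list = field(default_factory=list)


def preorder(r: R) -> list:  # default path: parent before children
    out = [r.name]
    for c in r.children:
        out.extend(preorder(c))
    return out


def postorder(r: R) -> list:  # torch path (linearize=True): children before parent
    out = []
    for c in r.children:
        out.extend(postorder(c))
    out.append(r.name)
    return out
```

Sequential IDs assigned over these two lists will label the same regions differently, which is why downstream logic keyed to LEAF ordering needs the asymmetry documented or the paths aligned.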


6. check_torch_naming_convention is silent when a model barely misses the threshold

If a well-preprocessed PyTorch model falls to 79% torch-named nodes (e.g. after ONNX simplification renames some ops), it silently falls back to CombinedRegionSearch. This is hard to debug. Recommend adding a log at INFO level indicating which builder was chosen and the measured ratio:

logger.info(
    f"Torch naming ratio {slash_count / total:.2f} "
    f"({'≥' if result else '<'} {threshold} threshold); "
    f"using {'TorchRegionBuilder' if result else 'CombinedRegionSearch'}"
)

💡 Suggestions

7. Magic numbers in build_regions

for _ in range(10):  # Why 10?
    ...
    self._merge_small_composite_regions(root_region, target_region_size=12)  # Why 12?

These should be class-level constants (_MAX_MERGE_ITERATIONS, _MIN_REGION_SIZE) with a comment on how they were chosen.


8. _probe_epilogues_recursive docstring/contract mismatch

The method docstring says "fusible non-divergent epilogue nodes," but the implementation adds ALL consumers (including non-fusible) as "boundary nodes." The in-code comment explains the intent correctly — update the docstring for _probe_epilogues_recursive and _probe_epilogues to match the actual semantics.
