Skip to content

[match] optimized match_named_modules#697

Open
brian-dellabetta wants to merge 1 commit into
mainfrom
bdellabe/match-named-modules-optimized
Open

[match] optimized match_named_modules#697
brian-dellabetta wants to merge 1 commit into
mainfrom
bdellabe/match-named-modules-optimized

Conversation

@brian-dellabetta
Copy link
Copy Markdown
Collaborator

Potentially resolves #695

I asked Claude how match_named_modules can be optimized to resolve issue #695, it suggested the following changes:

  • cache compiled regexes rather than recompiling for every check
  • use set-based checks for exact target/ignore matches
  • wait until end to loop over regex-based matches only if needed.
  • include a mock MoE test for match

Claude suggests the v0.10.0 pathway is quicker because only leaf modules are checked for matching. We may want to add this as a toggle to match_named_modules, we may be safe to only check leaf modules during apply_quantization_config (k/v cache may prohibit this?)

Will check if this resolves user issue. Depends on the contents of their config.json how much of a speedup this will provide. We may want to extend this to other matching utils if helps out for their use case

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Summary by CodeRabbit

  • Performance

    • Optimized module matching algorithm to improve performance when working with large model architectures.
  • Tests

    • Added performance validation test suite for module matching operations on large-scale models to ensure reliability.

Walkthrough

The PR optimizes match_named_modules from O(N×M) quadratic complexity to near-linear by building lookup structures once (exact match sets, precompiled regex, class targets) and performing O(1) exact matching before attempting regex and class-based matching on each module. A performance test validates the optimization on a MoE-like model topology.

Changes

Cohort / File(s) Summary
Performance Optimization
src/compressed_tensors/utils/match.py
Rewrote match_named_modules to eliminate nested loops: builds one-time lookup structures (exact targets/ignores as sets, precompiled regex), then processes each module with O(1) exact matching, suffix-based fused reconstruction, regex checks, and class-based matching. Added early InternalModule skipping and uses discard for unmatched target tracking.
Performance Testing
tests/test_utils/test_match.py
Added performance-focused unit test for match_named_modules using DummyMoEModel with mixed exact/regex targets and ignore patterns. Validates runtime stays below threshold and confirms matched modules include torch.nn.Linear instances.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title '[match] optimized match_named_modules' clearly and concisely summarizes the main change: optimization of the match_named_modules function.
Description check ✅ Passed The description is directly related to the changeset, explaining the optimizations implemented (caching regexes, set-based checks, deferred regex matching) and references the linked issue #695.
Linked Issues check ✅ Passed The PR implements optimizations to reduce O(N×M) complexity in match_named_modules by caching compiled regexes, using set-based checks for exact matches, and deferring regex matching, directly addressing issue #695's performance problem on large MoE models.
Out of Scope Changes check ✅ Passed All changes are scoped to optimizing match_named_modules and adding a performance test; no unrelated modifications are present outside the stated objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bdellabe/match-named-modules-optimized

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (5)
src/compressed_tensors/utils/match.py (2)

58-60: Use ASCII x instead of Unicode multiplication sign ×.

The Unicode character × (U+00D7 MULTIPLICATION SIGN) can cause issues in some environments. Use the ASCII lowercase x instead.

Proposed fix
-    # Performance optimization: Pre-process targets and ignores to avoid O(N × M) loop
-    # For large MoE models (18K+ modules), this reduces ~925K iterations to ~18K
+    # Performance optimization: Pre-process targets and ignores to avoid O(N x M) loop
+    # For large MoE models (18K+ modules), this reduces ~925K iterations to ~18K
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/utils/match.py` around lines 58 - 60, Update the
inline comment in the "Performance optimization: Pre-process targets and
ignores..." block to replace the Unicode multiplication sign "×" with the ASCII
letter "x" so the phrase "O(N × M)" becomes "O(N x M)"; locate the comment near
the pre-compiling regex / set-based lookup logic in match.py (the block
mentioning large MoE models and transformers#44276) and make the
single-character change to avoid non-ASCII characters.

73-79: Heuristic may miss class names containing underscores.

The classification assumes targets without "." or "_" are class names, but class names can contain underscores (e.g., _ConvNd, LinearBase_v2). Such targets would only get exact name matching, not class matching.

This is likely acceptable for typical usage patterns (most class targets like Linear, LayerNorm don't have underscores), but consider adding a brief comment documenting this limitation.

Suggested comment addition
         elif "." in target or "_" in target:
-            # Likely a module name path (e.g., "model.layers.0.mlp")
+            # Likely a module name path (e.g., "model.layers.0.mlp")
+            # Note: This heuristic may miss class names with underscores (rare in practice)
             exact_targets.add(target)
tests/test_utils/test_match.py (3)

366-367: Replace Unicode × with ASCII x in docstrings and comments.

Multiple occurrences of Unicode multiplication sign × in this test. Replace with ASCII x for consistency and portability (also flagged by Ruff RUF001/RUF002/RUF003).

Affected lines and fix
-        This test ensures that match_named_modules doesn't regress to O(N × M)
+        This test ensures that match_named_modules doesn't regress to O(N x M)
         ...
-        # 48 layers × 128 experts × 3 projections = ~18K modules
+        # 48 layers x 128 experts x 3 projections = ~18K modules
         ...
-        # With optimization: O(N + M) should be <1s for 768 modules
-        # Without optimization: O(N × M) would be several seconds
+        # With optimization: O(N + M) should be <1s for 768 modules
+        # Without optimization: O(N x M) would be several seconds
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_utils/test_match.py` around lines 366 - 367, Replace all
occurrences of the Unicode multiplication sign "×" with the ASCII letter "x" in
comments/docstrings around the test that discusses match_named_modules (the
docstring/comment that reads "O(N × M)" should be "O(N x M)"); update any
related comments in tests referencing fine-grained MoE or complexity to use "x"
so linters (Ruff RUF001/RUF002/RUF003) stop flagging them and text remains
consistent.

416-423: Performance threshold seems reasonable but may be flaky on slow CI runners.

The 2-second threshold is generous (comment notes optimization should complete in <1s), but CI environments under heavy load could still occasionally exceed this. Consider:

  1. Adding a @pytest.mark.timeout(5) as a hard upper bound
  2. Or documenting this as an expected intermittent failure with instructions to re-run

The current implementation is acceptable for detecting major regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_utils/test_match.py` around lines 416 - 423, The performance
assertion using max_time with elapsed_time and matches can be flaky on slow CI;
annotate the test that contains max_time/elapsed_time/matches (the performance
block in tests/test_utils/test_match.py) with a pytest timeout decorator (e.g.,
`@pytest.mark.timeout`(5)) to provide a hard upper bound, or alternatively add a
brief comment / skip logic documenting that intermittent failures may be re-run;
ensure you import pytest and apply the decorator to the test function that runs
this assertion so the suite fails with a clear timeout instead of intermittent
assertion flakiness.

399-402: Consider using time.perf_counter() for more accurate timing.

time.perf_counter() provides higher resolution and is the recommended choice for measuring short durations in performance tests. time.time() can have lower precision on some platforms.

Proposed fix
         # Time the matching operation
-        start_time = time.time()
+        start_time = time.perf_counter()
         matches = list(match_named_modules(model, targets, ignore))
-        elapsed_time = time.time() - start_time
+        elapsed_time = time.perf_counter() - start_time
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_utils/test_match.py` around lines 399 - 402, Replace use of
time.time() with time.perf_counter() when timing the matching operation in the
test around match_named_modules; specifically update the start_time and
elapsed_time calculations in the block that calls
list(match_named_modules(model, targets, ignore)) so they use
time.perf_counter() for higher-resolution timing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/compressed_tensors/utils/match.py`:
- Around line 58-60: Update the inline comment in the "Performance optimization:
Pre-process targets and ignores..." block to replace the Unicode multiplication
sign "×" with the ASCII letter "x" so the phrase "O(N × M)" becomes "O(N x M)";
locate the comment near the pre-compiling regex / set-based lookup logic in
match.py (the block mentioning large MoE models and transformers#44276) and make
the single-character change to avoid non-ASCII characters.

In `@tests/test_utils/test_match.py`:
- Around line 366-367: Replace all occurrences of the Unicode multiplication
sign "×" with the ASCII letter "x" in comments/docstrings around the test that
discusses match_named_modules (the docstring/comment that reads "O(N × M)"
should be "O(N x M)"); update any related comments in tests referencing
fine-grained MoE or complexity to use "x" so linters (Ruff RUF001/RUF002/RUF003)
stop flagging them and text remains consistent.
- Around line 416-423: The performance assertion using max_time with
elapsed_time and matches can be flaky on slow CI; annotate the test that
contains max_time/elapsed_time/matches (the performance block in
tests/test_utils/test_match.py) with a pytest timeout decorator (e.g.,
`@pytest.mark.timeout`(5)) to provide a hard upper bound, or alternatively add a
brief comment / skip logic documenting that intermittent failures may be re-run;
ensure you import pytest and apply the decorator to the test function that runs
this assertion so the suite fails with a clear timeout instead of intermittent
assertion flakiness.
- Around line 399-402: Replace use of time.time() with time.perf_counter() when
timing the matching operation in the test around match_named_modules;
specifically update the start_time and elapsed_time calculations in the block
that calls list(match_named_modules(model, targets, ignore)) so they use
time.perf_counter() for higher-resolution timing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3865494c-1efc-4a0c-a972-299cc3e7d26d

📥 Commits

Reviewing files that changed from the base of the PR and between 4d563e8 and 0ee78a8.

📒 Files selected for processing (2)
  • src/compressed_tensors/utils/match.py
  • tests/test_utils/test_match.py

Copy link
Copy Markdown
Collaborator

@kylesayrs kylesayrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brian-dellabetta
Copy link
Copy Markdown
Collaborator Author

See #695 (comment)

@kylesayrs the changes in this PR are still basic efficiency gains for a frequently used pathway. Even if you think #695 is due to other reasons, I don't understand why changes are requested on this PR

@kylesayrs
Copy link
Copy Markdown
Collaborator

kylesayrs commented May 8, 2026

I don't think 200+ lines of AI-written code without any performance metrics across platforms is worth improving a function that already has a throughput of 244K it/s

@brian-dellabetta
Copy link
Copy Markdown
Collaborator Author

I don't think 200+ lines of AI-written code without any performance metrics across platforms is worth improving a function that already has a throughput of 244K it/s

Changing O(NxM) to O(Nxm) (where m << M because almost all ignores in a checkpoint are layer names rather than regexes) is a straightforward feature, at the cost of 100 lines of code. Is the code unclear somewhere? The matching logic is used everywhere, and this is a straightforward improvement. Your 244K it/s is entirely dependent on inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

match_named_modules O(N×M) causes multi-minute silent load tail on fine-grained MoE checkpoints (Qwen3-30B-A3B-AWQ, ~18K Linear modules)

2 participants