[match] optimized match_named_modules#697
Conversation
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Summary by CodeRabbit
WalkthroughThe PR optimizes Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (5)
src/compressed_tensors/utils/match.py (2)
58-60: Use ASCIIxinstead of Unicode multiplication sign×.The Unicode character
×(U+00D7 MULTIPLICATION SIGN) can cause issues in some environments. Use the ASCII lowercasexinstead.Proposed fix
- # Performance optimization: Pre-process targets and ignores to avoid O(N × M) loop - # For large MoE models (18K+ modules), this reduces ~925K iterations to ~18K + # Performance optimization: Pre-process targets and ignores to avoid O(N x M) loop + # For large MoE models (18K+ modules), this reduces ~925K iterations to ~18K🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/compressed_tensors/utils/match.py` around lines 58 - 60, Update the inline comment in the "Performance optimization: Pre-process targets and ignores..." block to replace the Unicode multiplication sign "×" with the ASCII letter "x" so the phrase "O(N × M)" becomes "O(N x M)"; locate the comment near the pre-compiling regex / set-based lookup logic in match.py (the block mentioning large MoE models and transformers#44276) and make the single-character change to avoid non-ASCII characters.
73-79: Heuristic may miss class names containing underscores.The classification assumes targets without "." or "_" are class names, but class names can contain underscores (e.g.,
_ConvNd,LinearBase_v2). Such targets would only get exact name matching, not class matching.This is likely acceptable for typical usage patterns (most class targets like
Linear,LayerNormdon't have underscores), but consider adding a brief comment documenting this limitation.Suggested comment addition
elif "." in target or "_" in target: - # Likely a module name path (e.g., "model.layers.0.mlp") + # Likely a module name path (e.g., "model.layers.0.mlp") + # Note: This heuristic may miss class names with underscores (rare in practice) exact_targets.add(target)tests/test_utils/test_match.py (3)
366-367: Replace Unicode×with ASCIIxin docstrings and comments.Multiple occurrences of Unicode multiplication sign
×in this test. Replace with ASCIIxfor consistency and portability (also flagged by Ruff RUF001/RUF002/RUF003).Affected lines and fix
- This test ensures that match_named_modules doesn't regress to O(N × M) + This test ensures that match_named_modules doesn't regress to O(N x M) ... - # 48 layers × 128 experts × 3 projections = ~18K modules + # 48 layers x 128 experts x 3 projections = ~18K modules ... - # With optimization: O(N + M) should be <1s for 768 modules - # Without optimization: O(N × M) would be several seconds + # With optimization: O(N + M) should be <1s for 768 modules + # Without optimization: O(N x M) would be several seconds🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils/test_match.py` around lines 366 - 367, Replace all occurrences of the Unicode multiplication sign "×" with the ASCII letter "x" in comments/docstrings around the test that discusses match_named_modules (the docstring/comment that reads "O(N × M)" should be "O(N x M)"); update any related comments in tests referencing fine-grained MoE or complexity to use "x" so linters (Ruff RUF001/RUF002/RUF003) stop flagging them and text remains consistent.
416-423: Performance threshold seems reasonable but may be flaky on slow CI runners.The 2-second threshold is generous (comment notes optimization should complete in <1s), but CI environments under heavy load could still occasionally exceed this. Consider:
- Adding a
@pytest.mark.timeout(5)as a hard upper bound- Or documenting this as an expected intermittent failure with instructions to re-run
The current implementation is acceptable for detecting major regressions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils/test_match.py` around lines 416 - 423, The performance assertion using max_time with elapsed_time and matches can be flaky on slow CI; annotate the test that contains max_time/elapsed_time/matches (the performance block in tests/test_utils/test_match.py) with a pytest timeout decorator (e.g., `@pytest.mark.timeout`(5)) to provide a hard upper bound, or alternatively add a brief comment / skip logic documenting that intermittent failures may be re-run; ensure you import pytest and apply the decorator to the test function that runs this assertion so the suite fails with a clear timeout instead of intermittent assertion flakiness.
399-402: Consider usingtime.perf_counter()for more accurate timing.
time.perf_counter()provides higher resolution and is the recommended choice for measuring short durations in performance tests.time.time()can have lower precision on some platforms.Proposed fix
# Time the matching operation - start_time = time.time() + start_time = time.perf_counter() matches = list(match_named_modules(model, targets, ignore)) - elapsed_time = time.time() - start_time + elapsed_time = time.perf_counter() - start_time🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_utils/test_match.py` around lines 399 - 402, Replace use of time.time() with time.perf_counter() when timing the matching operation in the test around match_named_modules; specifically update the start_time and elapsed_time calculations in the block that calls list(match_named_modules(model, targets, ignore)) so they use time.perf_counter() for higher-resolution timing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/compressed_tensors/utils/match.py`:
- Around line 58-60: Update the inline comment in the "Performance optimization:
Pre-process targets and ignores..." block to replace the Unicode multiplication
sign "×" with the ASCII letter "x" so the phrase "O(N × M)" becomes "O(N x M)";
locate the comment near the pre-compiling regex / set-based lookup logic in
match.py (the block mentioning large MoE models and transformers#44276) and make
the single-character change to avoid non-ASCII characters.
In `@tests/test_utils/test_match.py`:
- Around line 366-367: Replace all occurrences of the Unicode multiplication
sign "×" with the ASCII letter "x" in comments/docstrings around the test that
discusses match_named_modules (the docstring/comment that reads "O(N × M)"
should be "O(N x M)"); update any related comments in tests referencing
fine-grained MoE or complexity to use "x" so linters (Ruff RUF001/RUF002/RUF003)
stop flagging them and text remains consistent.
- Around line 416-423: The performance assertion using max_time with
elapsed_time and matches can be flaky on slow CI; annotate the test that
contains max_time/elapsed_time/matches (the performance block in
tests/test_utils/test_match.py) with a pytest timeout decorator (e.g.,
`@pytest.mark.timeout`(5)) to provide a hard upper bound, or alternatively add a
brief comment / skip logic documenting that intermittent failures may be re-run;
ensure you import pytest and apply the decorator to the test function that runs
this assertion so the suite fails with a clear timeout instead of intermittent
assertion flakiness.
- Around line 399-402: Replace use of time.time() with time.perf_counter() when
timing the matching operation in the test around match_named_modules;
specifically update the start_time and elapsed_time calculations in the block
that calls list(match_named_modules(model, targets, ignore)) so they use
time.perf_counter() for higher-resolution timing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3865494c-1efc-4a0c-a972-299cc3e7d26d
📒 Files selected for processing (2)
src/compressed_tensors/utils/match.pytests/test_utils/test_match.py
@kylesayrs the changes in this PR are still basic efficiency gains for a frequently used pathway. Even if you think #695 is due to other reasons, I don't understand why changes are requested on this PR |
|
I don't think 200+ lines of AI-written code without any performance metrics across platforms is worth improving a function that already has a throughput of |
Changing O(NxM) to O(Nxm) (where m << M because almost all ignores in a checkpoint are layer names rather than regexes) is a straightforward feature, at the cost of 100 lines of code. Is the code unclear somewhere? The matching logic is used everywhere, and this is a straightforward improvement. Your 244K it/s is entirely dependent on inputs. |
Potentially resolves #695
I asked Claude how
match_named_modulescan be optimized to resolve issue #695, it suggested the following changes:Claude suggests the v0.10.0 pathway is quicker because only leaf modules are checked for matching. We may want to add this as a toggle to
match_named_modules, we may be safe to only check leaf modules during apply_quantization_config (k/v cache may prohibit this?)Will check if this resolves user issue. Depends on the contents of their config.json how much of a speedup this will provide. We may want to extend this to other matching utils if helps out for their use case