fix(engine): close unpinned rule-cache torn-read + review follow-ups to #49#50
fix(engine): close unpinned rule-cache torn-read + review follow-ups to #49#50marcinpsk wants to merge 3 commits into
Conversation
…pinned torn-read The unpinned path returned _RULE_CACHE["exact"], ["regex"], ["memo"] as three separate dict reads while a reload mutated those keys in place, so a concurrent reload on another worker thread could pair one rule-set version's rules with a different version's memo and then memoize a stale-version result into the live memo (persisting until the next rule edit). _get_enabled_rules() now reads the module global once into a local, and a reload publishes the new set by rebinding _RULE_CACHE to a fresh dict in a single atomic assignment. A reader sees the whole old set or the whole new one — never exact from V1 paired with memo from V2. The pin-prime path captures from the same consistent local, so it can no longer snapshot a torn pair either.
PR #49 raised the stated compatibility floor to NetBox >= 4.3.0 in README.md and docs/installation.md but left the enforced PluginConfig.min_version at 4.2.0, so a 4.2.x install would still load despite the docs declaring it unsupported. Align the gate with the docs and pin them together with a test.
…ustiveness - Add test_reload_publishes_a_fresh_cache_dict_atomically: asserts a reload swaps in a new cache dict instead of mutating the old one in place — the invariant behind the unpinned torn-read fix. - Rewrite test_pinned_block_holds_snapshot_across_concurrent_cache_reload to use a real second thread, proving _pin is thread-local (a plain global now fails it), with try/finally cache restoration so the simulated reload can't leak. - test_memo_is_bounded now checks the cap after every insert, catching a mid-loop leak toward 2*cap that the old end-of-loop snapshot missed. - Add EnabledRuleFingerprintColumnsTest: fails if any concrete model column is neither fingerprinted nor explicitly excluded, so a future match-affecting field can't silently break fingerprint-based cache invalidation.
WalkthroughThe plugin raises its minimum supported NetBox version to 4.3.0 and changes enabled-rule cache reloads to publish a fresh shared dict atomically. Tests now cover the version gate, memo bounds, concurrent snapshot stability, and fingerprint column coverage. ChangesRule cache version floor and atomic reload
Sequence Diagram(s)sequenceDiagram
participant pinned_rule_cache
participant _get_enabled_rules
participant _RULE_CACHE
participant _enabled_rules_version
pinned_rule_cache->>_get_enabled_rules: resolve exact, regex, and memo snapshot
_get_enabled_rules->>_RULE_CACHE: read current cache dict
_get_enabled_rules->>_enabled_rules_version: compare fingerprint version
_get_enabled_rules->>_RULE_CACHE: publish fresh dict when version changes
_get_enabled_rules-->>pinned_rule_cache: return exact, regex, and memo
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@netbox_interface_name_rules/engine.py`:
- Around line 234-242: The pinned branch in the cache-resolving logic is still
sharing the mutable memo dict, so a concurrent clear can race with memo lookups
and trigger KeyError. Update the pinning path in the function that sets
_pin.exact/_pin.regex/_pin.memo so pinned requests use an isolated copy of
cache["memo"] (or otherwise synchronized/sentinel-based access) instead of
aliasing the shared global memo.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 20577c4c-99cd-45e9-9f71-3c76291f7c01
📒 Files selected for processing (4)
netbox_interface_name_rules/__init__.pynetbox_interface_name_rules/engine.pynetbox_interface_name_rules/tests/test_misc.pynetbox_interface_name_rules/tests/test_rules.py
| if pinned: | ||
| # Pin this thread to the freshly-resolved set for the rest of the block. The tuples are | ||
| # immutable and the memo dict is now this thread's own reference, so a later global reload | ||
| # leaves our snapshot — and the entries we memoize into it — untouched. | ||
| _pin.exact = _RULE_CACHE["exact"] | ||
| _pin.regex = _RULE_CACHE["regex"] | ||
| _pin.memo = _RULE_CACHE["memo"] | ||
| _pin.exact = cache["exact"] | ||
| _pin.regex = cache["regex"] | ||
| _pin.memo = cache["memo"] | ||
| _pin.primed = True | ||
| return _pin.exact, _pin.regex, _pin.memo | ||
| return _RULE_CACHE["exact"], _RULE_CACHE["regex"], _RULE_CACHE["memo"] | ||
| return cache["exact"], cache["regex"], cache["memo"] |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Don’t pin the shared mutable memo dict.
_pin.memo = cache["memo"] still aliases the global cache memo; another thread can clear that dict at the memo cap while this pinned request is between sig in memo and memo[sig], causing a sporadic KeyError. Copy the memo for pinned blocks, or make memo access synchronized/sentinel-based.
Suggested localized fix
if pinned:
# Pin this thread to the freshly-resolved set for the rest of the block. The tuples are
- # immutable and the memo dict is now this thread's own reference, so a later global reload
+ # immutable and the memo dict is copied for this thread, so a later global reload
# leaves our snapshot — and the entries we memoize into it — untouched.
_pin.exact = cache["exact"]
_pin.regex = cache["regex"]
- _pin.memo = cache["memo"]
+ _pin.memo = dict(cache["memo"])
_pin.primed = True
- return cache["exact"], cache["regex"], cache["memo"]
+ return _pin.exact, _pin.regex, _pin.memo
+ return cache["exact"], cache["regex"], cache["memo"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if pinned: | |
| # Pin this thread to the freshly-resolved set for the rest of the block. The tuples are | |
| # immutable and the memo dict is now this thread's own reference, so a later global reload | |
| # leaves our snapshot — and the entries we memoize into it — untouched. | |
| _pin.exact = _RULE_CACHE["exact"] | |
| _pin.regex = _RULE_CACHE["regex"] | |
| _pin.memo = _RULE_CACHE["memo"] | |
| _pin.exact = cache["exact"] | |
| _pin.regex = cache["regex"] | |
| _pin.memo = cache["memo"] | |
| _pin.primed = True | |
| return _pin.exact, _pin.regex, _pin.memo | |
| return _RULE_CACHE["exact"], _RULE_CACHE["regex"], _RULE_CACHE["memo"] | |
| return cache["exact"], cache["regex"], cache["memo"] | |
| if pinned: | |
| # Pin this thread to the freshly-resolved set for the rest of the block. The tuples are | |
| # immutable and the memo dict is copied for this thread, so a later global reload | |
| # leaves our snapshot — and the entries we memoize into it — untouched. | |
| _pin.exact = cache["exact"] | |
| _pin.regex = cache["regex"] | |
| _pin.memo = dict(cache["memo"]) | |
| _pin.primed = True | |
| return _pin.exact, _pin.regex, _pin.memo | |
| return cache["exact"], cache["regex"], cache["memo"] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@netbox_interface_name_rules/engine.py` around lines 234 - 242, The pinned
branch in the cache-resolving logic is still sharing the mutable memo dict, so a
concurrent clear can race with memo lookups and trigger KeyError. Update the
pinning path in the function that sets _pin.exact/_pin.regex/_pin.memo so pinned
requests use an isolated copy of cache["memo"] (or otherwise
synchronized/sentinel-based access) instead of aliasing the shared global memo.



Follow-ups from code review of the (already-merged) #49 rule-cache work.
Production fixes
engine.py) —find_matching_rule's unpinned path returned_RULE_CACHE["exact"]/["regex"]/["memo"]as three separate dict reads while a reload mutated those keys in place. Under threaded workers a concurrent reload (V1→V2) could pair V1's rules with V2's memo and memoize a stale result into the live memo (persisting until the next rule edit)._get_enabled_rules()now reads the global once into a local and a reload publishes a fresh dict via one atomic rebind, so a reader sees the whole old set or the whole new one — never a torn mix. The pin-prime path captures from the same consistent local.min_version4.2.0 → 4.3.0 (__init__.py) — perf(engine): load the enabled rule set once and memoize find_matching_rule #49 raised the documented floor to NetBox ≥ 4.3.0 in README/docs but left the enforced gate at 4.2.0; a 4.2.x install would load despite the docs. Gate now matches the docs.Test hardening
test_reload_publishes_a_fresh_cache_dict_atomically— asserts the no-in-place-mutation invariant behind the torn-read fix (red-checked: fails against the old in-place code).test_pinned_block_holds_snapshot_across_concurrent_cache_reloadrewritten to spawn a real second thread, proving_pinis genuinely thread-local — a plain global now fails it (the old same-thread version could not catch that). Addstry/finallycache restoration.test_memo_is_boundednow checks the cap after every insert, catching a mid-loop leak toward 2×cap that the old end-of-loop snapshot missed.EnabledRuleFingerprintColumnsTest— fails if any concrete model column is neither in_VERSION_COLUMNSnor an explicitly-justified exclusion, so a future match-affecting field can't silently break fingerprint-based cache invalidation.Verification
Each fix red-checked (fails against un-fixed code, passes against the fix). Full suite 333 OK, coverage 97% (gate met), ruff clean.
Summary by CodeRabbit
Bug Fixes
Chores
Tests