fix: CacheWarmer hardening (#65)#68
Merged
Merged
Conversation
Add module-level _read_max_tid(conn) helper and a public PGJsonbStorage.current_max_tid() method that logs on failure and returns None. Route _restore_state, poll_invalidations, and the warmer's load_current_tid_fn through them. Warmer skips warmup when current_max_tid returns None instead of installing consensus=0. Closes I1 and I2 of #65. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matches the existing import-style block (6 other names from .storage already imported at module top). Avoids a per-call name lookup in the poll_invalidations hot path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both are wanted by the cache warmer's race-detection fix (next commit). The existing instance.load / load_multiple call sites ignore the new return value; no behaviour change there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#65) Closes I3 of #65. After poll_advance, re-read consensus_tid and use that as the polled_tid for subsequent set() calls — if another instance advanced consensus ahead of our sampled TID, this ensures our sets still pass the gate. Track accepted vs. attempted writes via the new set() return value; log a WARNING when the entire warmup was rejected despite non-empty results. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage gap flagged in #63 final review. Pins the invariant that the direct-use write path advances consensus and invalidates changed zoids in the shared cache, independent of any instance's tpc_finish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #65.
Summary
Three hardening items + one coverage gap, all flagged during the #63 final review and filed as follow-ups in #65.
I1 — silent error swallow
The warmer's inline
_current_max_tidclosure onPGJsonbStorage.__init__caught any exception and silently returned0, making operator-visible failures disappear (closed connection, dropped `transaction_log`, wrong role). Replaced with a module-level_read_max_tid(conn)helper (raises to caller) and a publicPGJsonbStorage.current_max_tid()method that logs at WARNING and returnsNone. The warmer now skips warmup gracefully onNoneinstead of installing a fabricated `consensus=0`.I2 — DRY the 3 copies of the SELECT
The same
SELECT COALESCE(MAX(tid), 0) FROM transaction_loglived in three places with three different error-handling policies (_restore_state,poll_invalidations, and the inline closure). All three now route through_read_max_tid. Each caller keeps its own error policy: startup +poll_invalidationslet the exception propagate (fatal paths), warmer uses the wrappedcurrent_max_tid()(best-effort path).grep -rn "SELECT COALESCE(MAX(tid)" src/returns exactly one hit (inside the helper).I3 — startup race
Warmer samples
current_tid=T0, a concurrent instance poll advances shared-cache consensus to T1 > T0, warmer'spoll_advance(T0, [])becomes a no-op, every subsequentset(..., polled_tid=T0)is rejected becauseT0 < consensus=T1. Mitigation:poll_advance(current_tid, []), re-read `self._shared_cache.consensus_tid` →effective_tid.effective_tidas `polled_tid` for subsequentset()calls so they pass the gate even when the sampled TID is now stale.Plus observability:
SharedLoadCache.set()now returns `bool` (accepted/rejected). The warmer counts `written` via the bool and logs a WARNING when `written == 0` despite a non-empty result set — so if the mitigation's window isn't enough (consensus advances mid-loop), the silent-failure is no longer silent.Required API additions:
SharedLoadCache.consensus_tid— read-only property (lock-guarded).SharedLoadCache.set()— returns `bool`.Both are additive. Existing callers (
instance.load,load_multiple) discard the new return value, so no behaviour change elsewhere.Coverage gap
New regression tests in
TestMainStorageFinishAdvancesConsensuspin the `PGJsonbStorage._finish → shared.poll_advance` wiring (direct-use write path, exercised only implicitly by conformance tests before this PR).Commits
Each commit is green (verified for `git bisect`).
Test plan
Release note
Tagged for 1.12.1. This PR unblocks the release that was held pending #65.
🤖 Generated with Claude Code