fix: stabilize flaky CI (stage-exec-test (from-0, paral)#21886
Open
erigon-copilot[bot] wants to merge 2 commits into
…eads

Flush() did not touch the BranchCache, so the from-0 integration-path loop (Flush + ClearRam + tx.Commit) left stale commitment branches in the cache across batch boundaries. Subsequent reads hit the cache instead of MDBX, returning outdated branch nodes and producing wrong trie roots at blocks 263641 (parallel) / 513814 (serial).

Flush now uses flushMemWithCallback to invalidate every commitment key it writes, forcing the next read to fall through to the DB.

Co-authored-by: Giulio Rebuffo <giulio.rebuffo@gmail.com>

Actions fixed: stage-exec-test (from-0, parallel)

Co-authored-by: Giulio Rebuffo <giulio.rebuffo@gmail.com>
Title
db/state: invalidate BranchCache entries on Flush to fix stale trie reads
Root Cause
The BranchCache introduced in #21380 (State Cache Consolidation) is an aggregator-scoped commitment-branch cache that lives across batch boundaries. SharedDomains.Commit() correctly updates the cache after a successful commit, but the Flush() path, used by the integration stage_exec from-0 loop, did not invalidate or update the cache at all.

In the from-0 execution loop (cmd/integration/commands/stages.go:802–837):

1. sd.memFlush() writes sd.mem to the MDBX tx (but does NOT touch the BranchCache)
2. ClearRam() clears sd.mem (but NOT the BranchCache)
3. tx.Commit() persists to disk
4. Reads in the next batch hit the BranchCache and get the stale branch that step 1 left in place, bypassing MDBX, which has the correct value from step 3

This produced wrong trie roots because the trie computation used stale intermediate branch nodes instead of the freshly committed ones.
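The sequence above can be reproduced in a toy model. The sketch below is not Erigon code: every type and method name in it (sharedDomains, flushBuggy, clearRam and so on) is a hypothetical stand-in, and the read order (in-memory batch, then cache, then DB, with read-through population) is an assumption made for illustration.

```go
package main

import "fmt"

// All names below are hypothetical stand-ins, not Erigon's real types.

// sharedDomains models an in-memory batch (mem) sitting in front of a
// long-lived cache and a backing store.
type sharedDomains struct {
	mem   map[string]string // pending writes for the current batch
	cache map[string]string // stand-in for the BranchCache
	store map[string]string // stand-in for MDBX
}

// read checks mem first, then the cache, then the store. A store read
// populates the cache (read-through); this ordering is an assumption.
func (sd *sharedDomains) read(key string) string {
	if v, ok := sd.mem[key]; ok {
		return v
	}
	if v, ok := sd.cache[key]; ok {
		return v
	}
	v := sd.store[key]
	sd.cache[key] = v
	return v
}

func (sd *sharedDomains) write(key, val string) { sd.mem[key] = val }

// flushBuggy writes mem to the store but leaves the cache untouched.
func (sd *sharedDomains) flushBuggy() {
	for k, v := range sd.mem {
		sd.store[k] = v
	}
}

// clearRam drops the in-memory batch but not the cache.
func (sd *sharedDomains) clearRam() { sd.mem = map[string]string{} }

// staleReadAfterFlush replays the loop and returns what the next batch reads.
func staleReadAfterFlush() string {
	sd := &sharedDomains{
		mem:   map[string]string{},
		cache: map[string]string{},
		store: map[string]string{"branch": "v1"},
	}
	_ = sd.read("branch")    // read-through: the cache now holds v1
	sd.write("branch", "v2") // the batch updates the branch
	sd.flushBuggy()          // step 1: store gets v2, cache keeps v1
	sd.clearRam()            // step 2: mem is emptied
	// Step 3 (commit) is implicit here: the store already holds v2.
	return sd.read("branch") // step 4: served from the cache
}

func main() {
	fmt.Println(staleReadAfterFlush()) // prints "v1"
}
```

The store holds v2 at the end, yet the read returns v1 because nothing removed the cached entry.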
Fix
Flush() now uses the existing flushMemWithCallback path to invalidate every commitment-domain key it writes. Subsequent reads fall through the (now-empty) cache to MDBX and get the correct value.

The invalidation-only approach (vs. populating the cache like Commit() does) is deliberately conservative: Flush() callers own the commit, so we cannot guarantee the tx will actually be committed. Invalidation is safe regardless of commit outcome.
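A minimal sketch of why invalidating on flush is safe whether or not the caller commits. This is a toy model under stated assumptions: store, flushWithCallback, commit and rollback are hypothetical names, and the callback signature is invented for illustration and is not claimed to match Erigon's flushMemWithCallback.

```go
package main

import "fmt"

// Hypothetical toy model; none of these names are Erigon's real API.
type store struct {
	committed map[string]string // durable state, as seen after a commit
	pending   map[string]string // writes sitting in the open tx
	cache     map[string]string // stand-in for the BranchCache
}

func newStore(key, val string) *store {
	return &store{
		committed: map[string]string{key: val},
		pending:   map[string]string{},
		cache:     map[string]string{},
	}
}

// flushWithCallback writes every mem entry into the open tx and reports
// each written key to onWrite.
func (s *store) flushWithCallback(mem map[string]string, onWrite func(key string)) {
	for k, v := range mem {
		s.pending[k] = v
		if onWrite != nil {
			onWrite(k)
		}
	}
}

// flush drops the cache entry for every key it writes; it never populates.
func (s *store) flush(mem map[string]string) {
	s.flushWithCallback(mem, func(k string) { delete(s.cache, k) })
}

func (s *store) commit() {
	for k, v := range s.pending {
		s.committed[k] = v
	}
	s.pending = map[string]string{}
}

func (s *store) rollback() { s.pending = map[string]string{} }

// read serves from the cache, falling through to committed state on a miss.
func (s *store) read(key string) string {
	if v, ok := s.cache[key]; ok {
		return v
	}
	v := s.committed[key]
	s.cache[key] = v
	return v
}

// readAfterFlush warms the cache with v1, flushes v2, ends the tx either
// way, and returns what the next read sees.
func readAfterFlush(commitTx bool) string {
	s := newStore("branch", "v1")
	_ = s.read("branch") // the cache now holds v1
	s.flush(map[string]string{"branch": "v2"})
	if commitTx {
		s.commit()
	} else {
		s.rollback()
	}
	return s.read("branch")
}

func main() {
	fmt.Println(readAfterFlush(true))  // prints "v2"
	fmt.Println(readAfterFlush(false)) // prints "v1"
}
```

In both outcomes the read matches the durable state. Had flush written v2 into the cache instead of deleting the entry, the rollback case would have returned a value that was never committed.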
Verification
- TestBranchCacheFlushInvalidatesStaleEntries: new unit test that reproduces the exact stale-cache scenario. Confirmed: FAILS without the fix (returns stale v1), PASSES with the fix (returns correct v2).
- TestBranchCacheCommitRefreshesAfterReadThrough: existing test, still passes.
- TestFromZero_GenesisAllocPreservedAfterResetReExec: passes (serial + parallel).
- db/state/execctx tests pass with -count=3 -race.
- execution/commitment tests pass with -count=3 -race.
- make lint clean, make erigon integration build clean.