[SharovBot] fix race in DecryptedTxnsPool.Wait causing flaky TestShutterBlockBuilding #21899

Closed
erigon-copilot[bot] wants to merge 1 commit into main from fix-shutter-decryption-wait-race

@erigon-copilot

Contributor

Summary

  • Fix a race condition in DecryptedTxnsPool.Wait where decryption keys arriving at the deadline boundary were incorrectly reported as missing, causing ProvideTxns to fall back to the base provider and produce blocks without shutter transactions.
  • The root cause: Wait returned ctx.Err() unconditionally — when both the done channel (mark found by goroutine) and ctx.Done() (deadline expired) fired simultaneously, Go's select picked one at random, often returning DeadlineExceeded even though the mark was in the pool.
  • The fix tracks whether the mark was actually found, waits for the goroutine to finish in the ctx.Done() case, and returns nil when the mark is present regardless of context state.

Test plan

  • TestShutterBlockBuilding passes 10 consecutive runs without failure
  • No *_test.go files modified
  • go build ./... passes with no errors

…terBlockBuilding

Wait returned ctx.Err() unconditionally, so when both the done channel
(mark found) and ctx.Done() (deadline expired) fired simultaneously,
Go's select picked one at random — often returning DeadlineExceeded even
though the decryption mark was already in the pool. ProvideTxns then fell
back to the base provider, producing a block without shutter transactions.

Track whether the mark was actually found, wait for the goroutine to
finish in the ctx.Done() case, and return nil when the mark is present
regardless of context state.

Co-authored-by: Giulio Rebuffo <giulio.rebuffo@gmail.com>

Copilot AI left a comment

Contributor

Pull request overview

Fixes a race in txnprovider/shutter where DecryptedTxnsPool.Wait could return context.DeadlineExceeded even when the requested decryption mark became available at the deadline boundary, causing shutter block building to incorrectly fall back to the base txn provider.

Changes:

  • Track whether the decryption mark was actually found before returning from Wait.
  • Ensure the waiting goroutine finishes in the ctx.Done() path so the result is deterministic when deadline and completion happen concurrently.

Comment on lines 67 to +71
    case <-ctx.Done():
        // note the below will wake up all waiters prematurely, but thanks to the for loop condition
        // in the waiting goroutine the ones that still need to wait will go back to sleep
        p.decryptionCond.Broadcast()
        <-done

@taratorio

Member

Closing since there is no link to a failing CI run in the PR. We should not be reviewing changes blindly.

@taratorio closed this Jun 25, 2026