fix: eliminate data race in SleeperTask context lifecycle#1984
Closed
fix: eliminate data race in SleeperTask context lifecycle#1984
Conversation
Replace StopChan.NewCtx() (which spawns a detached goroutine that calls
cancel() when the stop channel closes) with a simple context backed
directly by the stop channel. This eliminates a data race between the
async cancel() writing to context internals and concurrent reflect-based
reads (e.g. testify mock argument formatting via fmt.Sprintf).
The race manifests in the EVM balance monitor's Worker, where SleeperTask
calls Work(ctx) which spawns goroutines that call BalanceAt(ctx, ...) on
a mock client. When Stop() closes chStop, the CtxCancel goroutine fires
cancel() concurrently with testify's callString() reflecting into the
context via fmt.Sprintf("%#v", ctx).
The new stopChanCtx is a read-only context implementation with no mutable
state — closing the channel is the cancellation signal, and no cancel()
function ever modifies internal fields. Child contexts (WithCancel,
WithTimeout) work correctly as the Done() channel propagates.
|
👋 Fletch153, thanks for creating this pull request! To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team. Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks! |
📊 API Diff Results
|
Adds TestSleeperTask_NoConcurrentContextRace which reproduces the exact race scenario: Work() spawns goroutines that read the context via fmt.Sprintf (simulating testify's mock.callString), while Stop() concurrently triggers context cancellation. With the old NewCtx()/CtxCancel pattern, this test reliably triggers WARNING: DATA RACE. With the stopChanCtx fix, it passes cleanly.
jmank88
reviewed
Apr 16, 2026
Contributor
There was a problem hiding this comment.
A custom context.Context make me nervous. I do see some in the wild now, though most contain an embedded context.Context that they are extending and sometimes don't even override any methods 🤔
Let's wait and see how the other fixes help before trying this.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Defense-in-depth against testify/mock's unsynchronized reflect reads of context arguments. Replaces
SleeperTask's context implementation with a read-onlystopChanCtxthat has no mutable internal state.Background
testify's mock library has a fundamentally broken code path: when a mock method is called without a matching expectation,
callString()formats all arguments viafmt.Sprintf("%#v", args...), which walks them withreflect. This is completely unsynchronized — any concurrent mutation of an argument (likecontext.cancel()writing tocancelCtxinternals) causes a data race. An upstream fix was attempted but rejected.What This Changes
SleeperTask.workerLoop()previously created its context viaStopChan.NewCtx(), which internally callscontext.WithCancel()+ spawns a detached goroutine that callscancel()when the stop channel closes. Thatcancel()writes tocancelCtxinternal fields (sync/atomic.AddInt32, mutex state, etc.) — racing with any concurrentfmt.Sprintf("%#v", ctx)from testify's error path.The new
stopChanCtxis a read-onlycontext.Contextimplementation backed directly by the stop channel:Done()returns the stop channel — closing it signals cancellationErr()checks if the channel is closedcancel()function, no detached goroutine, no mutable internal statecontext.WithCancel,context.WithTimeout) propagate correctly via Go's standardDone()mechanismSince there's nothing to mutate, testify can
fmt.Sprintf("%#v")this context all day without racing.Test Evidence
TestSleeperTask_NoConcurrentContextRacereproduces the exact scenario:Work()goroutines read the context viafmt.SprintfwhileStop()concurrently closes the stop channel.Old code (NewCtx/CtxCancel):
WARNING: DATA RACE—reflect.typedmemmovevssync/atomic.AddInt32New code (stopChanCtx): 10 runs × all tests, zero races
Related