Fix intermittently failing leader-related tests#862
Merged
Conversation
There's a bunch of clock-based testing happening in the leader tests for
drivers, resulting in intermittent failures like this one [1]:
--- FAIL: TestDriverRiverPgxV5 (23.18s)
driver_test.go:73: Reusing idle schema "river_2025_04_26t17_52_50_schema_07" after cleaning in 15.347655ms [7 generated] [6 reused]
--- FAIL: TestDriverRiverPgxV5/LeaderAttemptElect (0.00s)
--- FAIL: TestDriverRiverPgxV5/LeaderAttemptElect/ElectsLeader (0.22s)
riverdrivertest.go:2244: TestTx using schema: river_2025_04_26t17_52_50_schema_06
riverdrivertest.go:2258:
Error Trace: /home/runner/work/river/river/internal/riverinternaltest/riverdrivertest/riverdrivertest.go:2258
Error: Max difference between 2025-04-26 17:53:14.813635844 +0000 UTC m=+24.487676847 and 2025-04-26 17:53:14.59636 +0000 UTC allowed is 100ms, but difference was 217.275844ms
Test: TestDriverRiverPgxV5/LeaderAttemptElect/ElectsLeader
FAIL
FAIL github.com/riverqueue/river 31.602s
As with all these other clock-based failures, the problem is that in
slower environments like GitHub Actions, goroutines may be paused for
extended periods, so by the time they wake up `time.Now()` might be
significantly different, and easily off by a few hundred milliseconds.
Here, make the clock for these functions injectable so we can inject our
own time and get to better test reliability. I'm not necessarily against
other approaches either, but I'm still getting reasonable frequent
failures on these, so I'm definitely in favor of getting some sort of
fix into place.
A side benefit is that it makes clock injection from the client-level a
little more thorough as well.
[1] https://github.com/riverqueue/river/actions/runs/14683687209/job/41209541920
brandur
commented
Apr 26, 2025
|
|
||
| const leaderTTL = 10 * time.Second | ||
|
|
||
| t.Run("LeaderDeleteExpired", func(t *testing.T) { |
Contributor
Author
There was a problem hiding this comment.
I just moved this set of tests down so it's in alphabetical order and easier to find.
bgentry
approved these changes
Apr 27, 2025
Contributor
bgentry
left a comment
There was a problem hiding this comment.
Good find and fix, thanks 🚀
Contributor
Author
|
ty. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
There's a bunch of clock-based testing happening in the leader tests for
drivers, resulting in intermittent failures like this one [1]:
As with all these other clock-based failures, the problem is that in
slower environments like GitHub Actions, goroutines may be paused for
extended periods, so by the time they wake up
time.Now()might besignificantly different, and easily off by a few hundred milliseconds.
Here, make the clock for these functions injectable so we can inject our
own time and get to better test reliability. I'm not necessarily against
other approaches either, but I'm still getting reasonable frequent
failures on these, so I'm definitely in favor of getting some sort of
fix into place.
A side benefit is that it makes clock injection from the client-level a
little more thorough as well.
[1] https://github.com/riverqueue/river/actions/runs/14683687209/job/41209541920