Skip to content

Commit 0559e64

Browse files
committed
test(keyviz): replace fixed sleep with eventually in TestHotKeysAggregatorRaceFree
CI flake observed on Actions run 26765510693 (PR #902 branch but unrelated to that PR -- the keyviz/ change is on main and surfaced there by chance): --- FAIL: TestHotKeysAggregatorRaceFree (0.11s) hot_keys_test.go:356: Expected value not to be nil. Root cause: same wall-clock racing pattern as the SQS-throttle and Redis-TTL flake series (PR #891, #903). The test: 1. Launches an aggregator goroutine (Step=5ms tick). 2. Spawns 8 workers each calling Observe 500 times. 3. Sleeps 50ms. 4. Asserts both per-route snapshots are non-nil. The aggregator publishes a snapshot only on a tick (or on ctx-done drain). Under -race on a slow CI runner, the Run goroutine itself can be slow to schedule -- the 50ms wait may not contain a single actual tick worth of scheduler time, so hotKeysSnap stays at its initial nil atomic.Pointer load and the assertion fires. Fix: require.Eventually with a 3-second budget + 5ms poll cadence, asserting both snapshots non-nil. The test still pins the load-bearing property (snapshot reads are lock-free; the atomic.Pointer.Load works without contention with the publisher); only the WAIT mechanism changes from "sleep blindly and hope" to "poll until ready or fail". Total time bound is at most 3 s on a slow runner, ~5-50 ms in the common case. Validation: go test ./keyviz/ -run TestHotKeysAggregatorRaceFree -race -count=5 -> ok 1.1s gofmt + golangci-lint -> 0 issues Caller audit: test-only file; no Go callers affected. The flake linage matches the structural fixes in PR #891/#903 (replace fixed-time-window sleep with require.Eventually); future flakes in this category should follow the same pattern.
1 parent c40e0c9 commit 0559e64

1 file changed

Lines changed: 13 additions & 7 deletions

File tree

keyviz/hot_keys_test.go

Lines changed: 13 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -348,13 +348,19 @@ func TestHotKeysAggregatorRaceFree(t *testing.T) {
348348
}(w)
349349
}
350350
wg.Wait()
351-
// Let the aggregator drain + publish a few ticks.
352-
time.Sleep(50 * time.Millisecond)
353-
// Snapshot reads must be lock-free and consistent.
354-
s1 := s.HotKeysSnapshot(1)
355-
s2 := s.HotKeysSnapshot(2)
356-
require.NotNil(t, s1)
357-
require.NotNil(t, s2)
351+
// Wait for the aggregator to drain the queue and publish at least
352+
// one tick-driven snapshot for each route. The previous
353+
// time.Sleep(50ms) raced the aggregator's tick (Step=5ms) on slow
354+
// -race CI runners — the Run goroutine could be slow to schedule
355+
// at all, leaving hotKeysSnap as its initial nil atomic.Pointer
356+
// load by the time the assertions fired. require.Eventually with
357+
// a 3-second budget + 5ms poll cadence rides out any scheduler
358+
// jitter; snapshot reads themselves remain lock-free
359+
// (atomic.Pointer.Load), which is what this test pins. Observed
360+
// failure: Actions run 26765510693.
361+
require.Eventually(t, func() bool {
362+
return s.HotKeysSnapshot(1) != nil && s.HotKeysSnapshot(2) != nil
363+
}, 3*time.Second, 5*time.Millisecond, "aggregator must publish a snapshot for each route within the budget")
358364
cancel()
359365
wgRun.Wait()
360366
}

0 commit comments

Comments
 (0)