Skip to content

[fix][test] Fix flaky RateLimiterTest.testDispatchRate#25846

Merged
nodece merged 1 commit into
apache:masterfrom
lhotari:lh-fix-rate-limiter-test-flake
May 21, 2026
Merged

[fix][test] Fix flaky RateLimiterTest.testDispatchRate#25846
nodece merged 1 commit into
apache:masterfrom
lhotari:lh-fix-rate-limiter-test-flake

Conversation

@lhotari
Copy link
Copy Markdown
Member

@lhotari lhotari commented May 20, 2026

Motivation

RateLimiterTest.testDispatchRate is flaky in CI. Recent example: PR #25844 run 26175306430 job 77005756965.

Gradle suite > Gradle test > org.apache.pulsar.common.util.RateLimiterTest > testDispatchRate FAILED
    java.lang.AssertionError: expected [true] but found [false]
        at org.apache.pulsar.common.util.RateLimiterTest.testDispatchRate(RateLimiterTest.java:210)

The test invokes tryAcquire(100) three times on a dispatch-rate RateLimiter (which has no upper bound on acquiredPermits, only back-pressure), leaving acquiredPermits = 300. It then sleeps 3 * rateTimeMSec (3 s) and asserts getAvailablePermits() > 0, expecting the scheduled renew task to have run three times and subtracted 3 * permits from acquiredPermits.

The renew is scheduled via ScheduledExecutorService.scheduleAtFixedRate(...) at a 1 s rate. Under CI load, the third tick can fire slightly after the assertion runs, so acquiredPermits is still ≥ 100 and getAvailablePermits() is still 0, failing the assertion.

Modifications

Replace the fixed Thread.sleep(rateTimeMSec) + assertion with Awaitility.await().atMost(10 * rateTimeMSec, MILLISECONDS).until(() -> rate.getAvailablePermits() > 0). This polls for the expected post-condition instead of relying on the scheduler firing on a fixed wall-clock budget. Also drops the intermediate assertEquals(getAvailablePermits(), 0) check after 2 s, since it is the same kind of timing-sensitive snapshot.

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as the modified testDispatchRate itself, plus the other timing tests in RateLimiterTest.

The test waited a fixed 3 seconds and then asserted that permits had
been renewed, but the renew task is scheduled at fixed rate and can
fire late under CI load, leaving acquiredPermits == permits when the
assertion runs. Replace the fixed sleep with an Awaitility poll so the
test waits until the renew task has actually freed permits.
@nodece nodece merged commit d3aeb55 into apache:master May 21, 2026
42 of 43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants