Skip to content

fix: Try to fix flakes in LookupCoordinatorManagerTest#19309

Open
capistrant wants to merge 1 commit intoapache:masterfrom
capistrant:LookupCooridnatorManagerTest-flakes
Open

fix: Try to fix flakes in LookupCoordinatorManagerTest#19309
capistrant wants to merge 1 commit intoapache:masterfrom
capistrant:LookupCooridnatorManagerTest-flakes

Conversation

@capistrant
Copy link
Copy Markdown
Contributor

Description

Noticed LookupCoordinatorManagerTest as being pretty flakey when looking at CI runs and then running locally on loops. About every ~200 iterations I would get a test fail on the assertion in the teardown. With the proposed change I was able to get multiple runs above 3k iterations without a failure and gave up on waiting.

Analysis and solution notes

  • LookupCoordinatorManagerTest has 17 tests that call manager.start() (which spawns a ScheduledExecutorService running lookupManagementLoop() in the background) but never call manager.stop(). When the background loop fires after EasyMock expectations are exhausted, it throws, which gets caught and emitted as an alert via EmittingLogger. The @after teardown then fails its assertEquals(0, SERVICE_EMITTER.getNumEmittedEvents()) assertion.
  • Wraps post-start() logic in each affected test with try/finally blocks that call manager.stop() and manager.waitForBackgroundTermination().

The per-test try/finally pattern was chosen over a centralized teardown approach (e.g. a class-level field stopped in @after) because each test constructs its manager differently (varying constructors, overridden methods, and config) making a shared field seem maybe awkward. try/finally keeps the lifecycle visible right where start() is called.


Key changed/added classes in this PR
  • LookupCoordinatorManagerTest

This PR has:

  • been self-reviewed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant