Fix flaky jaeger test by fully closing managed channel to isolate tests #8322

Merged
jack-berg merged 3 commits into open-telemetry:main from jack-berg:jaeger-test-flake on Apr 23, 2026

@jack-berg
Member

The JaegerRemoteSamplerGrpcNettyTest.unimplemented_error_server_response test fails 57% of the time: https://develocity.opentelemetry.io/scans/tests?search.timeZoneId=America%2FChicago&tests.container=io.opentelemetry.sdk.extension.trace.jaeger.sampler.JaegerRemoteSamplerGrpcNettyTest

Right now the theory is:

  • Other tests configure tiny polling intervals (1ms), queueing up lots of requests to the shared Armeria test server
  • unimplemented_error_server_response starts, queues up a mock UNIMPLEMENTED error response, and waits for a log to assert that the jaeger sampler received and handled it correctly, but...
  • the sampler never receives that error response, because queued-up requests from the jaeger samplers of other tests are served the response from the queue first, leaving the jaeger sampler under test to get the default OK response
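The theory above can be reproduced in miniature. The sketch below is not the actual test code: `FakeServer` and `observedBySamplerUnderTest` are hypothetical stand-ins, and the race is made deterministic by having the leaked request arrive first. It only illustrates how a leftover request from an earlier test can consume the queued error before the sampler under test sends its own request.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SharedQueueLeak {
  // Hypothetical stand-in for the shared test server: serves queued canned
  // responses in order and falls back to "OK" when the queue is empty.
  static final class FakeServer {
    private final Queue<String> canned = new ConcurrentLinkedQueue<>();

    void enqueue(String response) {
      canned.add(response);
    }

    String handle() {
      String next = canned.poll();
      return next != null ? next : "OK";
    }
  }

  // Returns the response seen by the sampler under test.
  static String observedBySamplerUnderTest(boolean leakedPollerStillRunning) {
    FakeServer server = new FakeServer();
    server.enqueue("UNIMPLEMENTED"); // the test queues its mock error
    if (leakedPollerStillRunning) {
      server.handle(); // request from a sampler left over from a previous test
    }
    return server.handle(); // request from the sampler under test
  }

  public static void main(String[] args) {
    System.out.println("isolated: " + observedBySamplerUnderTest(false)); // isolated: UNIMPLEMENTED
    System.out.println("leaked:   " + observedBySamplerUnderTest(true));  // leaked:   OK
  }
}
```

In the leaked case the sampler under test only ever sees the default response, so a log assertion waiting for the UNIMPLEMENTED handling would time out.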

The solution is to isolate tests better by making sure requests from one test don't leak into another. This is accomplished with a more thorough managed channel shutdown routine.
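For readers unfamiliar with the pattern, a thorough shutdown usually means: request a graceful shutdown, wait a bounded time for termination, and force shutdown if the wait expires. gRPC's ManagedChannel exposes shutdown(), awaitTermination(long, TimeUnit) and shutdownNow() for this. The sketch below is not the code from this PR; to stay runnable without gRPC on the classpath it applies the same three steps to a plain ExecutorService, and `shutdownAndWait` is a hypothetical helper name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThoroughShutdown {
  // Graceful shutdown, bounded wait, then forced shutdown.
  // Returns true only if the resource is fully terminated on return.
  static boolean shutdownAndWait(ExecutorService resource, long timeout, TimeUnit unit) {
    resource.shutdown(); // stop accepting new work, let in-flight work finish
    try {
      if (!resource.awaitTermination(timeout, unit)) {
        resource.shutdownNow(); // interrupt whatever is still running
        return resource.awaitTermination(timeout, unit);
      }
      return true;
    } catch (InterruptedException e) {
      resource.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }

  public static void main(String[] args) {
    ExecutorService poller = Executors.newSingleThreadExecutor();
    poller.submit(() -> {
      try {
        Thread.sleep(50); // simulated in-flight request
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    boolean terminated = shutdownAndWait(poller, 10, TimeUnit.SECONDS);
    System.out.println("terminated=" + terminated);
  }
}
```

The point of blocking until termination is that the next test only starts once no request from the previous one can still reach the shared server.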

@jack-berg jack-berg requested a review from a team as a code owner April 23, 2026 14:43
  pollFuture.cancel(true);
  pollExecutor.shutdownNow();
- grpcSender.shutdown();
+ grpcSender.shutdown().join(10, TimeUnit.SECONDS);
Member Author

This is consistent with the close implementations of SpanExporter, MetricExporter, and LogRecordExporter.

JaegerRemoteSamplerTest would be prone to the same test flake as JaegerRemoteSamplerGrpcNettyTest if it weren't for a bug in which the server called grpcErrors.peek() instead of poll(). Because peek() never removes the error from the queue, the server returns the same error over and over, which is how JaegerRemoteSamplerTest has avoided the flake.
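The difference between the two Queue methods is easy to see in isolation. This is a minimal sketch, not the test server's code: `serve` and `oneQueuedError` are hypothetical helpers that answer a fixed number of requests from an error queue, falling back to "OK" when no error is available.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PeekVersusPoll {
  // Answers `requests` requests from the error queue.
  // consume=true uses poll() (removes the head); consume=false uses peek() (leaves it).
  static List<String> serve(Queue<String> errors, int requests, boolean consume) {
    List<String> responses = new ArrayList<>();
    for (int i = 0; i < requests; i++) {
      String error = consume ? errors.poll() : errors.peek();
      responses.add(error != null ? error : "OK");
    }
    return responses;
  }

  static Queue<String> oneQueuedError() {
    Queue<String> queue = new ArrayDeque<>();
    queue.add("UNIMPLEMENTED");
    return queue;
  }

  public static void main(String[] args) {
    // peek: [UNIMPLEMENTED, UNIMPLEMENTED, UNIMPLEMENTED]
    System.out.println("peek: " + serve(oneQueuedError(), 3, false));
    // poll: [UNIMPLEMENTED, OK, OK]
    System.out.println("poll: " + serve(oneQueuedError(), 3, true));
  }
}
```

With peek() every caller sees the error, including stray requests from other tests, so the order of arrival does not matter. With poll() only the first caller sees it, which is the intended behavior but makes test isolation necessary.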

I fixed the peek / poll bug, but now shutdown needs to be thorough enough to isolate test cases, or else JaegerRemoteSamplerTest will start being flaky.

@jack-berg jack-berg changed the title lFix flaky jaeger test by fully closing managed channel to isolate tests Fix flaky jaeger test by fully closing managed channel to isolate tests Apr 23, 2026
@codecov

codecov Bot commented Apr 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.27%. Comparing base (f998f3f) to head (1434a41).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #8322   +/-   ##
=========================================
  Coverage     90.27%   90.27%           
  Complexity     7695     7695           
=========================================
  Files           850      850           
  Lines         23207    23207           
  Branches       2356     2356           
=========================================
  Hits          20951    20951           
  Misses         1531     1531           
  Partials        725      725           

@jack-berg jack-berg merged commit 3e41e01 into open-telemetry:main Apr 23, 2026
62 of 66 checks passed

2 participants