Skip to content

Reduce flakiness in TomcatServlet5Test.requestWithException()#18486

Merged
laurit merged 1 commit into
open-telemetry:mainfrom
trask:otelbot/flaky-fix-io-opentelemetry-instrumentation-servlet-v5-0-tomcat-TomcatS-20260501143407
May 1, 2026
Merged

Reduce flakiness in TomcatServlet5Test.requestWithException()#18486
laurit merged 1 commit into
open-telemetry:mainfrom
trask:otelbot/flaky-fix-io-opentelemetry-instrumentation-servlet-v5-0-tomcat-TomcatS-20260501143407

Conversation

@trask

@trask trask commented May 1, 2026

Copy link
Copy Markdown
Member

Automated attempt at fixing flakiness in io.opentelemetry.instrumentation.servlet.v5_0.tomcat.TomcatServlet5Test.requestWithException()[2].

Recent failed/flaky scans

  • anvud3rfqmlsa (flaky, :instrumentation:servlet:servlet-5.0:library:test)
  • e55s5qcsl44us (flaky, :instrumentation:servlet:servlet-5.0:library:test)
  • peei4r4hswlfm (flaky, :instrumentation:servlet:servlet-5.0:library:test)
  • e6y5tqp6qmix6 (flaky, :instrumentation:servlet:servlet-5.0:library:test)
  • xwo3bxevfasbg (flaky, :instrumentation:servlet:servlet-5.0:library:test)

Flake history (per UTC day)

Day flaky failed passed
2026-04-28 0 0 80
2026-04-29 99 0 42
2026-04-30 114 0 58
2026-05-01 16 0 11

Sample failure (from Develocity)

java.util.concurrent.CompletionException: io.opentelemetry.testing.internal.armeria.common.ClosedSessionException
> io.opentelemetry.testing.internal.armeria.common.ClosedSessionException: (No message provided)

Copilot diagnosis

Flaky Test Fix Diagnosis

Root cause

The FakeAsync servlet's EXCEPTION endpoint handler was missing a critical mitigation for Tomcat's connection reset behavior. When the servlet throws an exception after writing a response, Tomcat can reset the connection before the client fully receives the response body, causing the Armeria HTTP client to fail with ClosedSessionException during the aggregate().join() call.

Fix

  • Added explicit PrintWriter.close() call before throwing the exception in TestServlet5.FakeAsync.EXCEPTION handler
  • Applied the same Tomcat-specific mitigation that already existed in the Async.EXCEPTION handler (lines 159-164)
  • The fix uses req.getClass().getName().contains("catalina") to detect Tomcat and close the writer only on that server, ensuring the response is fully flushed before the exception is thrown

Why this addresses the root cause

The explicit close() call forces Tomcat to flush and commit the response to the network before the exception propagates to the servlet container. Without this, there's a race condition where Tomcat's error handling (triggered by the exception) can reset the connection while the client is still reading the response stream, resulting in ClosedSessionException at the client side. The Async servlet already had this mitigation and was not flaky; this fix brings FakeAsync to parity.

Risks / follow-ups

  • The fix is defensive and should not mask legitimate issues since it only affects test code behavior, not production instrumentation
  • If legitimate connection timeouts were being masked, they would now surface as test failures with different symptoms (timeouts rather than ClosedSessionException)
  • The Tomcat-specific check using class name reflection is a heuristic; if test environments change server implementations mid-test or use custom wrappers, this detection might not trigger. However, this same pattern has been stable in the Async servlet since it was introduced

Review the diagnosis and the diff carefully before merging - automated fixes can mask flakiness instead of addressing the root cause.

…cat.TomcatServlet5Test.requestWithException()[2]

Automated fix attempt based on Develocity flaky-test analysis.
@trask trask requested a review from a team as a code owner May 1, 2026 14:38
@trask trask marked this pull request as draft May 1, 2026 14:49
@trask trask marked this pull request as draft May 1, 2026 14:49
Comment on lines 157 to 164
PrintWriter writer = resp.getWriter();
writer.print(endpoint.getBody());
if (req.getClass().getName().contains("catalina")) {
// on tomcat close the writer to ensure response is sent immediately,
// otherwise there is a chance that tomcat resets the connection before the
// response is sent
writer.close();
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same fix already exists here

@trask trask marked this pull request as ready for review May 1, 2026 14:52
@trask trask changed the title Reduce flakiness in io.opentelemetry.instrumentation.servlet.v5_0.tomcat.TomcatServlet5Test.requestWithException()[2] Reduce flakiness in TomcatServlet5Test.requestWithException May 1, 2026
@trask trask changed the title Reduce flakiness in TomcatServlet5Test.requestWithException Reduce flakiness in TomcatServlet5Test.requestWithException() May 1, 2026
@laurit laurit merged commit f4ac638 into open-telemetry:main May 1, 2026
92 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants