Skip to content

Reduce flakiness in TomcatDispatchTest.requestWithException()#18498

Closed
trask wants to merge 8 commits into
open-telemetry:mainfrom
trask:otelbot/flaky-test-remediation-io-opentelemetry-instrumentation-servlet-v5-0-tomcat-TomcatD-20260501220909
Closed

Reduce flakiness in TomcatDispatchTest.requestWithException()#18498
trask wants to merge 8 commits into
open-telemetry:mainfrom
trask:otelbot/flaky-test-remediation-io-opentelemetry-instrumentation-servlet-v5-0-tomcat-TomcatD-20260501220909

Conversation

@trask

@trask trask commented May 1, 2026

Copy link
Copy Markdown
Member

Automated attempt at fixing flakiness in io.opentelemetry.instrumentation.servlet.v5_0.tomcat.TomcatDispatchTest.requestWithException()[1].

Recent failed/flaky scans

  • 5dnn3twqvjiiq (flaky, :instrumentation:servlet:servlet-5.0:library:test)
  • c76i2hfjhae7m (flaky, :instrumentation:servlet:servlet-5.0:library:test)
  • mcuo2nz6tuiqk (flaky, :instrumentation:servlet:servlet-5.0:library:test)
  • crdj4dy4iov3s (flaky, :instrumentation:servlet:servlet-5.0:library:test)
  • 47k6x73eiz7jo (flaky, :instrumentation:servlet:servlet-5.0:library:test)

Flake history (per UTC day)

Day flaky failed passed
2026-04-28 0 0 80
2026-04-29 57 0 84
2026-04-30 68 0 104
2026-05-01 19 0 46

Sample failure (from Develocity)

java.util.concurrent.CompletionException: io.opentelemetry.testing.internal.armeria.common.ClosedSessionException
> io.opentelemetry.testing.internal.armeria.common.ClosedSessionException: (No message provided)

Copilot diagnosis

Root cause

The flaky failures all occurred in TomcatDispatchTest.requestWithException()[1], which is the DISPATCH_ASYNC invocation, and the sampled build scans consistently report CompletionException: ClosedSessionException from the Armeria test client. The async exception servlet intentionally writes a 500 response and then throws so instrumentation can observe the exception, but the response did not include a Content-Length. On Tomcat this left the client dependent on connection-close timing to know the response was complete, so the intentional server-side exception could race with response aggregation and surface as a closed session instead of a valid 500 response.

Fix

  • Added resp.setContentLength(endpoint.getBody().length()) before writing the async exception response body in the servlet 5 test fixture.
  • Kept the existing Tomcat-specific writer close and the intentional IllegalStateException, preserving the exception-path coverage and span assertions.

Why this addresses the root cause

The response is now self-delimiting before the servlet throws, so the client can aggregate the complete 500 response without relying on Tomcat closing the connection cleanly. The exception is still thrown after the response is committed, so the test continues to verify exception handling instrumentation rather than converting the endpoint into a normal error response.

Risks / follow-ups

  • If Tomcat changes how it handles committed async responses that later throw, this may expose a real status/body assertion failure instead of a transport-level closed-session failure.
  • Maintainers may want to watch the analogous servlet 3 async exception fixture, which has the same response-framing pattern but was not the failing container in the analyzed window.

Review the diagnosis and the diff carefully before merging - automated fixes can mask flakiness instead of addressing the root cause.

trask added 4 commits May 1, 2026 15:13
…cat.TomcatDispatchTest.requestWithException()[1]

Automated fix attempt based on Develocity flaky-test analysis.
Comment on lines +65 to +69
if (!testLatestDeps()) {
// Older Tomcat versions may close the connection before sending an async
// response when the servlet throws after writing the response body.
writer.close();
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure it's only on old tomcat versions, but I didn't see any failures on latest dep test, so 🤞

@trask trask marked this pull request as ready for review May 2, 2026 16:45
@trask trask requested a review from a team as a code owner May 2, 2026 16:45
@trask trask changed the title Reduce flakiness in io.opentelemetry.instrumentation.servlet.v5_0.tomcat.TomcatDispatchTest.requestWithException()[1] Reduce flakiness in TomcatDispatchTest.requestWithException() May 2, 2026
@github-actions github-actions Bot mentioned this pull request May 5, 2026
@trask trask requested a review from Copilot May 9, 2026 20:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reduces flaky failures in the servlet 5.0 Tomcat dispatch async exception test path by making the test-servlet behavior more deterministic across Tomcat versions / dependency sets.

Changes:

  • Adjusts async exception response handling to better control response completion timing (notably via Content-Length on older Tomcat in the servlet-5 fixture).
  • Makes the “close writer to force response send before throwing” workaround conditional based on testLatestDeps (to avoid relying on it for newer dependency runs).
  • Refactors Tomcat detection in the servlet-5 fixture into a small helper (isOldTomcat).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
instrumentation/tomcat/tomcat-7.0/javaagent/src/test/java/io/opentelemetry/javaagent/instrumentation/tomcat/v7_0/AsyncServlet.java Gates writer-closing workaround based on testLatestDeps for async exception responses.
instrumentation/tomcat/tomcat-10.0/javaagent/src/test/java/io/opentelemetry/javaagent/instrumentation/tomcat/v10_0/AsyncServlet.java Same conditional writer-closing change for the Tomcat 10 async exception servlet fixture.
instrumentation/servlet/servlet-5.0/testing/src/main/java/io/opentelemetry/javaagent/instrumentation/servlet/v5_0/TestServlet5.java Adds Content-Length + conditional writer close for old Tomcat in async exception handling; introduces isOldTomcat.
instrumentation/servlet/servlet-3.0/testing/src/main/java/io/opentelemetry/instrumentation/servlet/v3_0/TestServlet3.java Makes the Tomcat writer-close workaround conditional on !testLatestDeps.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

// handle the basic write-then-throw pattern fine, so this workaround is
// scoped to old Tomcat only: set Content-Length so the response is
// self-delimiting, and close the writer to force the flush before the throw.
if (req.getClass().getName().contains("catalina") && !testLatestDeps()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I wouldn't bother with the latestDep check and apply to all tomcat versions. To me the content length feels arbitrary. I debugged this a bit and turns out that close doesn't really flush the changes to network. See https://github.com/apache/tomcat/blob/b4484fee2e939cf90529805339567e754a9d64f7/java/org/apache/catalina/connector/OutputBuffer.java#L283 Perhaps that is the issue? If so then I'd try adding resp.flushBuffer(); after writer.close();. Note that writer.close(); sets the content length.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trask

trask commented Jun 3, 2026

Copy link
Copy Markdown
Member Author

Closing in favor of #18804

@trask trask closed this Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants