Reduce flakiness in TomcatDispatchTest.requestWithException()#18498
Conversation
…cat.TomcatDispatchTest.requestWithException()[1] Automated fix attempt based on Develocity flaky-test analysis.
| if (!testLatestDeps()) { | ||
| // Older Tomcat versions may close the connection before sending an async | ||
| // response when the servlet throws after writing the response body. | ||
| writer.close(); | ||
| } |
There was a problem hiding this comment.
I'm not 100% sure it's only on old tomcat versions, but I didn't see any failures on latest dep test, so 🤞
There was a problem hiding this comment.
Pull request overview
Reduces flaky failures in the servlet 5.0 Tomcat dispatch async exception test path by making the test-servlet behavior more deterministic across Tomcat versions / dependency sets.
Changes:
- Adjusts async exception response handling to better control response completion timing (notably via
Content-Lengthon older Tomcat in the servlet-5 fixture). - Makes the “close writer to force response send before throwing” workaround conditional based on
testLatestDeps(to avoid relying on it for newer dependency runs). - Refactors Tomcat detection in the servlet-5 fixture into a small helper (
isOldTomcat).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| instrumentation/tomcat/tomcat-7.0/javaagent/src/test/java/io/opentelemetry/javaagent/instrumentation/tomcat/v7_0/AsyncServlet.java | Gates writer-closing workaround based on testLatestDeps for async exception responses. |
| instrumentation/tomcat/tomcat-10.0/javaagent/src/test/java/io/opentelemetry/javaagent/instrumentation/tomcat/v10_0/AsyncServlet.java | Same conditional writer-closing change for the Tomcat 10 async exception servlet fixture. |
| instrumentation/servlet/servlet-5.0/testing/src/main/java/io/opentelemetry/javaagent/instrumentation/servlet/v5_0/TestServlet5.java | Adds Content-Length + conditional writer close for old Tomcat in async exception handling; introduces isOldTomcat. |
| instrumentation/servlet/servlet-3.0/testing/src/main/java/io/opentelemetry/instrumentation/servlet/v3_0/TestServlet3.java | Makes the Tomcat writer-close workaround conditional on !testLatestDeps. |
…try-instrumentation-servlet-v5-0-tomcat-TomcatD-20260501220909
| // handle the basic write-then-throw pattern fine, so this workaround is | ||
| // scoped to old Tomcat only: set Content-Length so the response is | ||
| // self-delimiting, and close the writer to force the flush before the throw. | ||
| if (req.getClass().getName().contains("catalina") && !testLatestDeps()) { |
There was a problem hiding this comment.
I think I wouldn't bother with the latestDep check and apply to all tomcat versions. To me the content length feels arbitrary. I debugged this a bit and turns out that close doesn't really flush the changes to network. See https://github.com/apache/tomcat/blob/b4484fee2e939cf90529805339567e754a9d64f7/java/org/apache/catalina/connector/OutputBuffer.java#L283 Perhaps that is the issue? If so then I'd try adding resp.flushBuffer(); after writer.close();. Note that writer.close(); sets the content length.
There was a problem hiding this comment.
|
Closing in favor of #18804 |
Automated attempt at fixing flakiness in
io.opentelemetry.instrumentation.servlet.v5_0.tomcat.TomcatDispatchTest.requestWithException()[1].instrumentation/servlet/servlet-5.0/library/src/test/java/io/opentelemetry/instrumentation/servlet/v5_0/tomcat/TomcatDispatchTest.javaRecent failed/flaky scans
:instrumentation:servlet:servlet-5.0:library:test):instrumentation:servlet:servlet-5.0:library:test):instrumentation:servlet:servlet-5.0:library:test):instrumentation:servlet:servlet-5.0:library:test):instrumentation:servlet:servlet-5.0:library:test)Flake history (per UTC day)
Sample failure (from Develocity)
Copilot diagnosis
Root cause
The flaky failures all occurred in
TomcatDispatchTest.requestWithException()[1], which is theDISPATCH_ASYNCinvocation, and the sampled build scans consistently reportCompletionException: ClosedSessionExceptionfrom the Armeria test client. The async exception servlet intentionally writes a 500 response and then throws so instrumentation can observe the exception, but the response did not include aContent-Length. On Tomcat this left the client dependent on connection-close timing to know the response was complete, so the intentional server-side exception could race with response aggregation and surface as a closed session instead of a valid 500 response.Fix
resp.setContentLength(endpoint.getBody().length())before writing the async exception response body in the servlet 5 test fixture.IllegalStateException, preserving the exception-path coverage and span assertions.Why this addresses the root cause
The response is now self-delimiting before the servlet throws, so the client can aggregate the complete 500 response without relying on Tomcat closing the connection cleanly. The exception is still thrown after the response is committed, so the test continues to verify exception handling instrumentation rather than converting the endpoint into a normal error response.
Risks / follow-ups
Review the diagnosis and the diff carefully before merging - automated fixes can mask flakiness instead of addressing the root cause.