Add exception handler on HTTP/2 parent channel to suppress WARN logs#48687
Pull request overview
This PR addresses noisy Netty WARN logs that occur on HTTP/2 parent (TCP) connections when idle connections are reset, by installing a parent-channel exception handler to consume those exceptions at DEBUG level and close the channel (matching HTTP/1.1 behavior).
Changes:
- Install an HTTP/2 parent-channel exception handler during `doOnConnected` by accessing `connection.channel().parent()`.
- Add `Http2ParentChannelExceptionHandler` to log parent-channel exceptions at DEBUG and close the channel.
- Document the fix in the `azure-cosmos` CHANGELOG.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/ReactorNettyClient.java | Adds logic to install a parent (TCP) channel exception handler for HTTP/2 to suppress TailContext WARN logs. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2ParentChannelExceptionHandler.java | Introduces a Netty inbound handler that consumes/logs exceptions at DEBUG and closes the parent channel. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Adds a “Bugs Fixed” entry describing the WARN suppression for HTTP/2 parent-channel exceptions. |
sdk/cosmos/azure-cosmos/CHANGELOG.md
| #### Bugs Fixed | ||
| Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/) | ||
| * Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior. - See [PR TBD](https://github.com/Azure/azure-sdk-for-java/pull/TBD) |
CHANGELOG entry still contains a placeholder PR reference ("PR TBD" and a /pull/TBD link). This will be a broken link in released notes; please replace with the actual PR number (or remove the link if not available).
| * Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior. - See [PR TBD](https://github.com/Azure/azure-sdk-for-java/pull/TBD) | |
| * Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior. |
sdk/cosmos/azure-cosmos/CHANGELOG.md
| @@ -8,6 +8,7 @@ | |||
| #### Bugs Fixed | |||
| Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/) | |||
In the Unreleased 'Bugs Fixed' section, the first entry is not formatted as a bullet while the newly added entry is. Please make the formatting consistent (typically all entries in these sections are bulleted).
| Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/) | |
| * Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/) |
| Channel parent = connection.channel().parent(); | ||
| if (parent != null | ||
| && parent.pipeline().get(Http2ParentChannelExceptionHandler.HANDLER_NAME) == null) { | ||
| parent.pipeline().addLast( | ||
| Http2ParentChannelExceptionHandler.HANDLER_NAME, | ||
| new Http2ParentChannelExceptionHandler()); |
The handler installation on the shared HTTP/2 parent channel uses a non-atomic check-then-add (pipeline().get(name) == null then addLast). If multiple stream channels attempt this concurrently, addLast can throw due to a duplicate handler name. Consider making installation robust (e.g., run the add on the parent channel's event loop and/or catch the duplicate-name exception and ignore it) so new stream creation can’t fail because of a race.
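One way to read the suggestion above is sketched below without any Netty dependency. `NamedPipeline` is a hypothetical stand-in that only mimics the one property that matters here (adding a duplicate handler name throws `IllegalArgumentException`), and `HANDLER_NAME` is an illustrative value, not the constant from the PR. The sketch keeps the cheap existence check as a fast path and treats the duplicate-name exception as "someone else already installed it".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class IdempotentInstall {
    // Illustrative name only; the real constant lives in the PR's handler class.
    static final String HANDLER_NAME = "parentExceptionHandler";

    // Hypothetical stand-in for a pipeline: names are unique, duplicates throw.
    static final class NamedPipeline {
        private final Map<String, Object> handlers = new LinkedHashMap<>();

        synchronized Object get(String name) {
            return handlers.get(name);
        }

        synchronized void addLast(String name, Object handler) {
            if (handlers.containsKey(name)) {
                throw new IllegalArgumentException("Duplicate handler name: " + name);
            }
            handlers.put(name, handler);
        }

        synchronized int size() {
            return handlers.size();
        }
    }

    // Returns true if this call installed the handler, false if it was already present.
    static boolean installOnce(NamedPipeline pipeline) {
        if (pipeline.get(HANDLER_NAME) != null) {
            return false; // fast path: nothing to do
        }
        try {
            pipeline.addLast(HANDLER_NAME, new Object());
            return true;
        } catch (IllegalArgumentException duplicate) {
            // Another caller won the race between get() and addLast(); ignore.
            return false;
        }
    }

    // Releases `threads` callers at once against one pipeline and counts successful installs.
    static int raceInstall(NamedPipeline pipeline, int threads) {
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger installs = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await();
                    if (installOnce(pipeline)) {
                        installs.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return installs.get();
    }

    public static void main(String[] args) {
        NamedPipeline pipeline = new NamedPipeline();
        int installs = raceInstall(pipeline, 8);
        System.out.println("installs=" + installs + ", handlers=" + pipeline.size());
    }
}
```

In this model exactly one caller installs the handler and the others return quietly, so stream setup never fails because of the race. Whether the same shape is appropriate on a real pipeline (as opposed to scheduling the add on the parent's event loop) is a judgment call for the PR author.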
| // Install exception handler on the HTTP/2 parent (TCP) channel. | ||
| // In H2, doOnConnected fires for stream (child) channels — channel.parent() | ||
| // is the TCP connection. The parent pipeline has no ChannelOperationsHandler | ||
| // (unlike H1.1), so TCP-level exceptions (RST, broken pipe) propagate to | ||
| // Netty's TailContext and get logged as WARN. This handler matches H1.1 | ||
| // behavior by consuming exceptions at DEBUG level. | ||
| Channel parent = connection.channel().parent(); | ||
| if (parent != null | ||
| && parent.pipeline().get(Http2ParentChannelExceptionHandler.HANDLER_NAME) == null) { | ||
| parent.pipeline().addLast( | ||
| Http2ParentChannelExceptionHandler.HANDLER_NAME, | ||
| new Http2ParentChannelExceptionHandler()); | ||
| } |
This change introduces new behavior (consuming parent-channel exceptions and closing the parent connection) without accompanying test coverage. There are existing Netty/transport tests in azure-cosmos-tests (e.g., ones that use EmbeddedChannel); please add a unit/integration test that asserts the handler is installed on the H2 parent pipeline and that an exception on the parent is consumed (no TailContext WARN) and results in the parent channel closing.
| public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) { | ||
| if (logger.isDebugEnabled()) { | ||
| logger.debug("Exception on HTTP/2 parent connection [id:{}]: {}", | ||
| ctx.channel().id().asShortText(), cause.getMessage(), cause); |
exceptionCaught currently logs only cause.getMessage() in the formatted message. For many Netty exceptions the message can be null/empty, reducing diagnostics. Consider logging the exception class (e.g., cause.toString()) in the formatted portion so the DEBUG log remains useful even when getMessage() is null.
| ctx.channel().id().asShortText(), cause.getMessage(), cause); | |
| ctx.channel().id().asShortText(), cause.toString(), cause); |
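The difference between the two forms can be seen with plain JDK exceptions. This is a minimal sketch using `java.io.IOException` as a stand-in for a Netty I/O exception; the class and method names (`CauseLogging`, `byMessage`, `byToString`) are hypothetical and exist only for illustration.

```java
import java.io.IOException;

final class CauseLogging {
    // What a "{}" placeholder would render for getMessage(): the text "null" when no message is set.
    static String byMessage(Throwable cause) {
        return String.valueOf(cause.getMessage());
    }

    // toString() always includes the exception class, with the message appended when present.
    static String byToString(Throwable cause) {
        return cause.toString();
    }

    public static void main(String[] args) {
        Throwable noMessage = new IOException();
        Throwable withMessage = new IOException("Connection reset by peer");

        System.out.println(byMessage(noMessage));    // null
        System.out.println(byToString(noMessage));   // java.io.IOException
        System.out.println(byToString(withMessage)); // java.io.IOException: Connection reset by peer
    }
}
```

With `toString()` the DEBUG line still identifies the exception type when the message is absent, which is the point of the suggestion.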
d4c849b to 0ec7d77
In HTTP/2, reactor-netty multiplexes streams on a shared parent TCP connection. The parent channel pipeline has no ChannelOperationsHandler (unlike HTTP/1.1), so TCP-level exceptions like Connection reset by peer (ECONNRESET) propagate to Netty's TailContext, which logs them as WARN.

This adds Http2ParentChannelExceptionHandler to the parent channel via doOnConnected (accessing channel.parent()). The handler consumes exceptions at DEBUG level WITHOUT closing the channel or altering connection lifecycle, matching HTTP/1.1 logging behavior.

Changes:
- Handler logs cause.toString() (not getMessage()) for null-safe diagnostics
- Defensive try-catch for duplicate handler name on concurrent stream creation
- Before/after verified with EmbeddedChannel unit tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0ec7d77 to e5f9537
Problem
When HTTP/2 is enabled, customers see Netty WARN logs such as "An exceptionCaught() event was fired, and it reached at the tail of the pipeline".
These appear when the server (or Azure load balancer/middlebox) resets idle HTTP/2 TCP connections. The warnings are cosmetic only — the connection pool evicts dead connections transparently — but they trigger monitoring alerts.
Root Cause
In HTTP/2, reactor-netty multiplexes streams on a shared parent TCP connection.
In HTTP/1.1, `ChannelOperationsHandler` sits directly on the connection pipeline and catches exceptions. In HTTP/2, `ChannelOperationsHandler` is only on child stream channels; the parent TCP channel has no exception handler.
Fix
Added `Http2ParentChannelExceptionHandler` to the parent channel via the existing `doOnConnected` callback (accessing `channel.parent()`). The handler:
- Consumes exceptions at DEBUG level, in line with `RntbdRequestManager.exceptionCaught` and reactor-netty H1.1 behavior
- Does not close the channel; existing checks (`!channel.isActive()`) handle connection cleanup independently
- Catches `IllegalArgumentException` on concurrent stream creation (duplicate handler name)
- Logs `cause.toString()` for null-safe diagnostics, since `NativeIoException.getMessage()` can be null
Testing
Three `EmbeddedChannel` unit tests prove before/after behavior:
| Test | Result |
|---|---|
| `withoutHandler_exceptionReachesTail` | `checkException()` throws: exception hit TailContext (WARN) |
| `withHandler_exceptionConsumedAndChannelStaysOpen` | `checkException()` clean + channel stays open |
| `withHandler_runtimeExceptionAlsoConsumed` | `RuntimeException` (like `NativeIoException`) also consumed |
Impact
!channel.isActive())