Add exception handler on HTTP/2 parent channel to suppress WARN logs #48687

Draft
jeet1995 wants to merge 1 commit into main from AzCosmos_Http2ParentChannelExceptionHandler

jeet1995 commented Apr 3, 2026

Problem

When HTTP/2 is enabled, customers see Netty WARN logs:

An exceptionCaught() event was fired, and it reached at the tail of the pipeline.
It usually means the last handler in the pipeline did not handle the exception.
io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed with error(-104): Connection reset by peer

These appear when the server (or Azure load balancer/middlebox) resets idle HTTP/2 TCP connections. The warnings are cosmetic only — the connection pool evicts dead connections transparently — but they trigger monitoring alerts.

Root Cause

In HTTP/2, reactor-netty multiplexes streams on a shared parent TCP connection:

Parent TCP Channel pipeline (managed by reactor-netty):
  SslHandler -> Http2FrameCodec -> Http2MultiplexHandler -> [TAIL]
  ^ TCP RST fires exceptionCaught here -> no handler -> WARN log

Child Stream Channel pipeline (per-request):
  ... -> reactor.left.httpCodec -> HttpClientOperations -> ...
  ^ Request-level exceptions handled here (matching HTTP/1.1)

In HTTP/1.1, ChannelOperationsHandler sits directly on the connection pipeline and catches exceptions. In HTTP/2, ChannelOperationsHandler is only on child stream channels — the parent TCP channel has no exception handler.

Fix

Added Http2ParentChannelExceptionHandler to the parent channel via the existing doOnConnected callback (accessing channel.parent()). The handler:

  1. Logs at DEBUG — consistent with RNTBD direct path (RntbdRequestManager.exceptionCaught) and reactor-netty H1.1 behavior
  2. Does NOT close the channel — no lifecycle changes; reactor-netty and the pool eviction predicate (!channel.isActive()) handle connection cleanup independently
  3. Installs once per parent — guards against duplicate installation with check + defensive try-catch for IllegalArgumentException on concurrent stream creation
  4. Uses cause.toString() — null-safe diagnostics since NativeIoException.getMessage() can be null
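
A minimal sketch of a handler with the properties listed above, assuming Netty 4.1 is on the classpath. The class name `SketchParentExceptionHandler` and the handler name string are hypothetical, and `java.util.logging` at `FINE` stands in for the SDK's DEBUG logger; this is an illustration, not the source of the PR's `Http2ParentChannelExceptionHandler`.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: consumes connection-level exceptions on a parent channel.
final class SketchParentExceptionHandler extends ChannelInboundHandlerAdapter {

    // Hypothetical pipeline name, used to detect an existing installation.
    static final String HANDLER_NAME = "sketchParentExceptionHandler";

    // Assumption: java.util.logging FINE plays the role of DEBUG here.
    private static final Logger LOGGER =
        Logger.getLogger(SketchParentExceptionHandler.class.getName());

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        if (LOGGER.isLoggable(Level.FINE)) {
            // String concatenation calls cause.toString(), which always includes
            // the exception class name even when getMessage() is null.
            LOGGER.log(Level.FINE,
                "Exception on HTTP/2 parent connection [id:"
                    + ctx.channel().id().asShortText() + "]: " + cause,
                cause);
        }
        // Intentionally no ctx.fireExceptionCaught(cause): the event stops here
        // and never reaches the tail of the pipeline.
        // Intentionally no ctx.close(): the channel lifecycle is left untouched.
    }
}
```

Because the method neither forwards the event nor closes the channel, its only observable effect is the log line; connection cleanup is left to whatever already manages the channel.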

Testing

Three EmbeddedChannel unit tests prove before/after behavior:

| Test | What it proves |
| --- | --- |
| `withoutHandler_exceptionReachesTail` | BEFORE: `checkException()` throws because the exception reached TailContext (WARN) |
| `withHandler_exceptionConsumedAndChannelStaysOpen` | AFTER: `checkException()` is clean and the channel stays open |
| `withHandler_runtimeExceptionAlsoConsumed` | A `RuntimeException` is consumed as well, not only `IOException` subclasses such as `NativeIoException` |
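
The before/after check can be sketched with Netty's `EmbeddedChannel`, which records any exception that reaches the end of its pipeline and rethrows it from `checkException()`. The class and method names below are hypothetical, Netty 4.1 is assumed on the classpath, and the anonymous handler is only a stand-in for the real one.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.embedded.EmbeddedChannel;

import java.io.IOException;

// Hypothetical sketch of the before/after behavior described in the table.
final class TailExceptionDemo {

    // Fires an exception into the pipeline and reports whether it reached the tail.
    static boolean reachesTail(EmbeddedChannel channel) {
        channel.pipeline().fireExceptionCaught(new IOException("Connection reset by peer"));
        try {
            channel.checkException(); // rethrows an exception nobody handled
            return false;
        } catch (Exception rethrown) {
            return true;
        }
    }

    // BEFORE: an empty pipeline has nothing to consume the exception.
    static boolean withoutHandler() {
        return reachesTail(new EmbeddedChannel());
    }

    // AFTER: a handler that overrides exceptionCaught without forwarding consumes it.
    static boolean withHandler() {
        return reachesTail(new EmbeddedChannel(new ChannelInboundHandlerAdapter() {
            @Override
            public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
                // Consume: do not forward, do not close.
            }
        }));
    }
}
```

Under these assumptions `withoutHandler()` returns `true` and `withHandler()` returns `false`, which mirrors the BEFORE and AFTER rows.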

Impact

  • Zero operational change — dead connections are already evicted by the pool (!channel.isActive())
  • Zero lifecycle change — handler does not close the channel or alter connection management
  • Suppresses noisy WARNs — prevents monitoring alert fatigue
  • Matches HTTP/1.1 parity — same exception logging behavior across protocols

jeet1995 requested review from a team and kirankumarkolli as code owners April 3, 2026 20:36
Copilot AI review requested due to automatic review settings April 3, 2026 20:36
github-actions bot added the Cosmos label Apr 3, 2026

Copilot AI left a comment

Pull request overview

This PR addresses noisy Netty WARN logs that occur on HTTP/2 parent (TCP) connections when idle connections are reset, by installing a parent-channel exception handler to consume those exceptions at DEBUG level and close the channel (matching HTTP/1.1 behavior).

Changes:

  • Install an HTTP/2 parent-channel exception handler during doOnConnected by accessing connection.channel().parent().
  • Add Http2ParentChannelExceptionHandler to log parent-channel exceptions at DEBUG and close the channel.
  • Document the fix in azure-cosmos CHANGELOG.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/ReactorNettyClient.java | Adds logic to install a parent (TCP) channel exception handler for HTTP/2 to suppress TailContext WARN logs. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2ParentChannelExceptionHandler.java | Introduces a Netty inbound handler that consumes/logs exceptions at DEBUG and closes the parent channel. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Adds a “Bugs Fixed” entry describing the WARN suppression for HTTP/2 parent-channel exceptions. |


#### Bugs Fixed
Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)
* Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior. - See [PR TBD](https://github.com/Azure/azure-sdk-for-java/pull/TBD)

Copilot AI Apr 3, 2026

CHANGELOG entry still contains a placeholder PR reference ("PR TBD" and a /pull/TBD link). This will be a broken link in released notes; please replace with the actual PR number (or remove the link if not available).

Suggested change
- * Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior. - See [PR TBD](https://github.com/Azure/azure-sdk-for-java/pull/TBD)
+ * Fixed Netty WARN log "An exceptionCaught() event was fired, and it reached at the tail of the pipeline" appearing on HTTP/2 connections when the server resets idle TCP connections. Added an exception handler on the HTTP/2 parent channel to consume connection-level exceptions at DEBUG level, matching HTTP/1.1 behavior.

Copilot uses AI. Check for mistakes.
@@ -8,6 +8,7 @@

#### Bugs Fixed
Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)

Copilot AI Apr 3, 2026

In the Unreleased 'Bugs Fixed' section, the first entry is not formatted as a bullet while the newly added entry is. Please make the formatting consistent (typically all entries in these sections are bulleted).

Suggested change
- Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)
+ * Fixing an NPE caused due to boxed Boolean conversion. - See [PR 48656](https://github.com/Azure/azure-sdk-for-java/pull/48656/)

Comment on lines +174 to +180
Channel parent = connection.channel().parent();
if (parent != null
    && parent.pipeline().get(Http2ParentChannelExceptionHandler.HANDLER_NAME) == null) {

    parent.pipeline().addLast(
        Http2ParentChannelExceptionHandler.HANDLER_NAME,
        new Http2ParentChannelExceptionHandler());

Copilot AI Apr 3, 2026

The handler installation on the shared HTTP/2 parent channel uses a non-atomic check-then-add (pipeline().get(name) == null then addLast). If multiple stream channels attempt this concurrently, addLast can throw due to a duplicate handler name. Consider making installation robust (e.g., run the add on the parent channel's event loop and/or catch the duplicate-name exception and ignore it) so new stream creation can’t fail because of a race.
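
One way to make the installation tolerant of that race is sketched below, assuming Netty 4.1 on the classpath. It relies on `ChannelPipeline.addLast(String, ChannelHandler)` throwing `IllegalArgumentException` when the name is already taken. The class `ParentHandlerInstaller`, the method `installOnce`, and the handler name string are hypothetical.

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelPipeline;

import java.util.function.Supplier;

// Hypothetical sketch: install a named handler on a shared parent channel at most once.
final class ParentHandlerInstaller {

    // Hypothetical pipeline name used as the duplicate-detection key.
    static final String HANDLER_NAME = "sketchParentExceptionHandler";

    // Returns true only if this call added the handler.
    static boolean installOnce(Channel parent, Supplier<ChannelHandler> factory) {
        if (parent == null) {
            return false; // no parent channel, e.g. an HTTP/1.1 connection
        }
        ChannelPipeline pipeline = parent.pipeline();
        if (pipeline.get(HANDLER_NAME) != null) {
            return false; // fast path: already installed
        }
        try {
            pipeline.addLast(HANDLER_NAME, factory.get());
            return true;
        } catch (IllegalArgumentException duplicateName) {
            // Another stream installed the handler between the check and the add.
            // The handler is in place either way, so stream setup can continue.
            return false;
        }
    }
}
```

The up-front `get` check keeps the common case cheap, while the `catch` covers the window between check and add, so a lost race degrades to a no-op instead of failing stream creation.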

Comment on lines +168 to +181
// Install exception handler on the HTTP/2 parent (TCP) channel.
// In H2, doOnConnected fires for stream (child) channels — channel.parent()
// is the TCP connection. The parent pipeline has no ChannelOperationsHandler
// (unlike H1.1), so TCP-level exceptions (RST, broken pipe) propagate to
// Netty's TailContext and get logged as WARN. This handler matches H1.1
// behavior by consuming exceptions at DEBUG level.
Channel parent = connection.channel().parent();
if (parent != null
    && parent.pipeline().get(Http2ParentChannelExceptionHandler.HANDLER_NAME) == null) {

    parent.pipeline().addLast(
        Http2ParentChannelExceptionHandler.HANDLER_NAME,
        new Http2ParentChannelExceptionHandler());
}

Copilot AI Apr 3, 2026

This change introduces new behavior (consuming parent-channel exceptions and closing the parent connection) without accompanying test coverage. There are existing Netty/transport tests in azure-cosmos-tests (e.g., ones that use EmbeddedChannel); please add a unit/integration test that asserts the handler is installed on the H2 parent pipeline and that an exception on the parent is consumed (no TailContext WARN) and results in the parent channel closing.

Copilot generated this review using guidance from repository custom instructions.
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
    if (logger.isDebugEnabled()) {
        logger.debug("Exception on HTTP/2 parent connection [id:{}]: {}",
            ctx.channel().id().asShortText(), cause.getMessage(), cause);

Copilot AI Apr 3, 2026

exceptionCaught currently logs only cause.getMessage() in the formatted message. For many Netty exceptions the message can be null/empty, reducing diagnostics. Consider logging the exception class (e.g., cause.toString()) in the formatted portion so the DEBUG log remains useful even when getMessage() is null.

Suggested change
- ctx.channel().id().asShortText(), cause.getMessage(), cause);
+ ctx.channel().id().asShortText(), cause.toString(), cause);
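
The difference can be seen with plain JDK exceptions. In this small sketch the class `NullMessageDemo` and its method names are hypothetical; an `IOException` built without a message stands in for a Netty exception whose message is missing.

```java
import java.io.IOException;

// Hypothetical sketch: getMessage() versus toString() for log diagnostics.
final class NullMessageDemo {

    // What a log line gets when it formats only the message.
    static String messageOnly(Throwable cause) {
        return cause.getMessage(); // null when no message was supplied
    }

    // What a log line gets when it formats the throwable itself.
    static String describe(Throwable cause) {
        return cause.toString(); // class name, plus ": message" when one exists
    }

    // Stand-in for an exception that carries no message.
    static Throwable withoutMessage() {
        return new IOException();
    }
}
```

For `withoutMessage()`, `messageOnly` returns `null` while `describe` returns `java.io.IOException`, so the exception type still appears in the formatted log line.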

jeet1995 marked this pull request as draft April 3, 2026 20:42
jeet1995 force-pushed the AzCosmos_Http2ParentChannelExceptionHandler branch from d4c849b to 0ec7d77 April 3, 2026 21:08
In HTTP/2, reactor-netty multiplexes streams on a shared parent TCP connection.
The parent channel pipeline has no ChannelOperationsHandler (unlike HTTP/1.1),
so TCP-level exceptions like Connection reset by peer (ECONNRESET) propagate to
Netty's TailContext, which logs them as WARN.

This adds Http2ParentChannelExceptionHandler to the parent channel via
doOnConnected (accessing channel.parent()). The handler consumes exceptions
at DEBUG level WITHOUT closing the channel or altering connection lifecycle,
matching HTTP/1.1 logging behavior.

Changes:
- Handler logs cause.toString() (not getMessage()) for null-safe diagnostics
- Defensive try-catch for duplicate handler name on concurrent stream creation
- Before/after verified with EmbeddedChannel unit tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jeet1995 force-pushed the AzCosmos_Http2ParentChannelExceptionHandler branch from 0ec7d77 to e5f9537 April 3, 2026 21:31