Skip to content

Commit c905251

Browse files
jeet1995Copilot
andauthored
Gate HTTP/2 PING failure-based connection close exception rewraps to PING enablement flag. (#49403)
* perf: install Http2PingCloseRewrapHandler in doOnConnected, not doOnRequest The previous .doOnRequest() chained onto the HTTP/2 HttpClient builder wrapped every outgoing request's Mono in a Reactor operator. That paid per-request costs even when the rewrap handler was already installed: * MonoPeekTerminal wrap + per-request subscriber allocation (InternalMonoOperator.subscribe +2.60% in alloc profile) * per-request BiConsumer lambda invocation (Http2Pool\$\...run +1.16% in alloc profile) * per-request channel pipeline String-key walk via pipeline.get(...) Long-run async-profiler diff vs v4.80.0 on D2 attributed the PR's -7.6% QPS / +14.5% mean / +166% tail-max regression primarily to a +10.2% allocation rate -- with the operators above dominating the delta. Move the install into the existing .doOnConnected() block, which already runs once per HTTP/2 child stream (the customHeaderCleaner install right above it relies on the same contract). The rewrap handler is @sharable, so installing once per stream is correct and amortizes the install + pipeline-walk away from the hot path. The ch.parent() != null guard is kept so the install only targets child streams, never the parent TCP channel (which uses Http2ParentChannelExceptionHandler instead). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * perf: gate Http2PingCloseRewrapHandler install on isPingHealthEffectivelyEnabled Follow-up to the doOnConnected install move (previous commit on this PR). The rewrap handler translates Http2MultiplexHandler's per-child-stream channelInactive into a typed Http2PingTimeoutChannelClosedException when the PING sender closes the parent TCP channel after consecutive PING-ACK timeouts. If the PING sender is disabled there is no such signal to translate, so the install adds per-child-stream pipeline-add cost and an extra pipeline hop on every inbound frame for zero behavioral benefit. Empirical motivation: long-run perf matrix on a 2 vCPU D2s_v5 VM (point-read workload, Gateway+H2, ~98% CPU steady state, baseline = v4.80.0): baseline (v4.80.0) : 9010 QPS ping-prefix (PR before doOnConnected fix) : 8301 QPS (-7.9%) ping-postfix (doOnConnected install only) : 8170 QPS (-9.3%) ping-off (postfix + HTTP2_PING_HEALTH=false): 8102 QPS (-10.1%) Toggling the documented kill switch did not recover baseline because the rewrap install was unconditional -- the kill switch only suppressed the PING sender, not the per-child-stream rewrap install. async-profiler CPU diffs ping-off-vs-baseline and ping-off-vs-postfix were essentially identical, confirming the rewrap install dominates the residual cost. Consolidating both PING-sender install and rewrap install under Http2PingHandler.isPingHealthEffectivelyEnabled makes HTTP2_PING_HEALTH_ENABLED=false a true revert-to-baseline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(cosmos): install HTTP/2 PING rewrap handler via observe(STREAM_CONFIGURED) The rewrap handler install lived inside .doOnConnected(...), which reactor-netty fires only on the H2 parent TCP channel (State.CONFIGURED, parent==null). The install was gated on ch.parent()!=null, so it never ran on the parent and never reached a child stream -- the handler was installed 0 times and PING-timeout parent closes surfaced as bare PrematureCloseException instead of the typed Http2PingTimeoutChannelClosedException. Move the install to .observe((connection, state) -> ...) keyed on HttpClientState.STREAM_CONFIGURED, the per-child-stream lifecycle event. The handler is now added at the head of each H2 child-stream pipeline, where it intercepts channelInactive ahead of HttpClientOperations and rewraps the close into the typed exception so ClientRetryPolicy can suppress region mark-down. Keeps the PING-health gate, ch.parent()!=null defensive check, get(HANDLER_NAME)==null guard, and TOCTOU try/catch. Also fixes stale doc-comments referencing the old install hook. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs(cosmos): tighten Http2PingCloseRewrapHandler class-doc wording The handler is a single @sharable instance installed per H2 child stream, not a per-request object. Reword the class-doc summary from 'Per-request' to 'Per-child-stream' for consistency with the rest of the doc. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add CI unit test for HTTP/2 PING close-rewrap install gate Extract the inline .observe() install decision into a package-private static installHttp2PingCloseRewrapHandlerIfNeeded(...) so the per-child-stream gate is unit-testable without a live HTTP/2 connection. Add a TestNG unit test (group "unit") that drives the real method via an EmbeddedChannel and proves the disablement path (kill-switch off, non-positive PING interval, HTTP/2 disabled, non-STREAM_CONFIGURED state, parent-less channel) is a true revert-to-baseline that installs nothing -- plus positive install, head-of-pipeline position, idempotency, and state-before-predicate short-circuit ordering. Behavior-preserving refactor; no user-facing change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Update CHANGELOG.md --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 4ff3741 commit c905251

4 files changed

Lines changed: 354 additions & 32 deletions

File tree

Lines changed: 256 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,256 @@
1+
// Copyright (c) Microsoft Corporation. All rights reserved.
2+
// Licensed under the MIT License.
3+
4+
package com.azure.cosmos.implementation.http;
5+
6+
import com.azure.cosmos.Http2ConnectionConfig;
7+
import io.netty.channel.Channel;
8+
import io.netty.channel.embedded.EmbeddedChannel;
9+
import org.testng.annotations.AfterMethod;
10+
import org.testng.annotations.BeforeMethod;
11+
import org.testng.annotations.Test;
12+
import reactor.netty.http.client.HttpClientState;
13+
14+
import static org.assertj.core.api.Assertions.assertThat;
15+
16+
/**
17+
* Unit tests for {@link ReactorNettyClient#installHttp2PingCloseRewrapHandlerIfNeeded} --
18+
* the per-child-stream install gate for {@link Http2PingCloseRewrapHandler}.
19+
* <p>
20+
* In production the handler is installed from reactor-netty's {@code .observe(...)} hook
21+
* only when ALL of the following hold:
22+
* <ul>
23+
* <li>the connection-lifecycle state is {@link HttpClientState#STREAM_CONFIGURED}
24+
* (a child stream was just opened);</li>
25+
* <li>PING-health is effectively enabled -- kill-switch
26+
* {@code COSMOS.HTTP2_PING_HEALTH_ENABLED} on, a positive PING interval, and HTTP/2
27+
* enabled for the client;</li>
28+
* <li>the channel is a child stream ({@code parent() != null}).</li>
29+
* </ul>
30+
* These tests drive the real production method with an {@link EmbeddedChannel} so the
31+
* disablement path -- flipping the kill-switch (or any gate input) off must be a true
32+
* revert-to-baseline that installs nothing on the child pipeline -- is guarded in CI
33+
* without needing a live HTTP/2 server.
34+
*/
35+
public class Http2PingCloseRewrapInstallTest {
36+
37+
private static final String PING_HEALTH_ENABLED = "COSMOS.HTTP2_PING_HEALTH_ENABLED";
38+
private static final String PING_INTERVAL_SECONDS = "COSMOS.HTTP2_PING_INTERVAL_IN_SECONDS";
39+
40+
private String priorPingHealthEnabled;
41+
private String priorPingIntervalSeconds;
42+
43+
@BeforeMethod(groups = "unit")
44+
public void before_Method() {
45+
// Snapshot the two PING system properties the gate reads so each test sets an
46+
// explicit, isolated state and never leaks into sibling tests or CI defaults.
47+
this.priorPingHealthEnabled = System.getProperty(PING_HEALTH_ENABLED);
48+
this.priorPingIntervalSeconds = System.getProperty(PING_INTERVAL_SECONDS);
49+
}
50+
51+
@AfterMethod(groups = "unit", alwaysRun = true)
52+
public void after_Method() {
53+
restore(PING_HEALTH_ENABLED, this.priorPingHealthEnabled);
54+
restore(PING_INTERVAL_SECONDS, this.priorPingIntervalSeconds);
55+
}
56+
57+
@Test(groups = "unit")
58+
public void installsHandler_whenStreamConfiguredAndPingHealthEnabled() {
59+
enablePingHealth();
60+
ChildChannel child = newChildStream();
61+
try {
62+
ReactorNettyClient.installHttp2PingCloseRewrapHandlerIfNeeded(
63+
child, HttpClientState.STREAM_CONFIGURED, http2Enabled());
64+
65+
// Presence AND head-of-pipeline position: the rewrap handler must see
66+
// channelInactive before reactor-netty's stream operations handler, so a
67+
// regression from addFirst(...) to addLast(...) must fail this test.
68+
assertThat(child.pipeline().get(Http2PingCloseRewrapHandler.HANDLER_NAME)).isNotNull();
69+
assertThat(child.pipeline().first()).isSameAs(Http2PingCloseRewrapHandler.INSTANCE);
70+
assertThat(child.pipeline().names().get(0)).isEqualTo(Http2PingCloseRewrapHandler.HANDLER_NAME);
71+
} finally {
72+
releaseAll(child);
73+
}
74+
}
75+
76+
@Test(groups = "unit")
77+
public void skipsInstall_whenKillSwitchOff() {
78+
// Disablement path: COSMOS.HTTP2_PING_HEALTH_ENABLED=false must remove the work,
79+
// not just its effect -- nothing is added to the child pipeline.
80+
System.setProperty(PING_HEALTH_ENABLED, "false");
81+
System.setProperty(PING_INTERVAL_SECONDS, "1");
82+
ChildChannel child = newChildStream();
83+
try {
84+
ReactorNettyClient.installHttp2PingCloseRewrapHandlerIfNeeded(
85+
child, HttpClientState.STREAM_CONFIGURED, http2Enabled());
86+
87+
assertThat(child.pipeline().get(Http2PingCloseRewrapHandler.HANDLER_NAME)).isNull();
88+
} finally {
89+
releaseAll(child);
90+
}
91+
}
92+
93+
@Test(groups = "unit")
94+
public void skipsInstall_whenPingIntervalNonPositive() {
95+
// A non-positive PING interval disables the PING sender, so there is no
96+
// PING-timeout close signal to rewrap -> install is skipped.
97+
System.setProperty(PING_HEALTH_ENABLED, "true");
98+
System.setProperty(PING_INTERVAL_SECONDS, "0");
99+
ChildChannel child = newChildStream();
100+
try {
101+
ReactorNettyClient.installHttp2PingCloseRewrapHandlerIfNeeded(
102+
child, HttpClientState.STREAM_CONFIGURED, http2Enabled());
103+
104+
assertThat(child.pipeline().get(Http2PingCloseRewrapHandler.HANDLER_NAME)).isNull();
105+
} finally {
106+
releaseAll(child);
107+
}
108+
}
109+
110+
@Test(groups = "unit")
111+
public void skipsInstall_whenHttp2DisabledForClient() {
112+
// PING-health globally on, but HTTP/2 disabled on this client's config ->
113+
// isPingHealthEffectivelyEnabled is false -> no install.
114+
enablePingHealth();
115+
ChildChannel child = newChildStream();
116+
try {
117+
ReactorNettyClient.installHttp2PingCloseRewrapHandlerIfNeeded(
118+
child, HttpClientState.STREAM_CONFIGURED, http2Disabled());
119+
120+
assertThat(child.pipeline().get(Http2PingCloseRewrapHandler.HANDLER_NAME)).isNull();
121+
} finally {
122+
releaseAll(child);
123+
}
124+
}
125+
126+
@Test(groups = "unit")
127+
public void skipsInstall_whenStateNotStreamConfigured() {
128+
// Even with PING-health fully enabled, non-STREAM_CONFIGURED lifecycle events
129+
// (e.g. the parent-channel CONFIGURED state) must not install the child handler.
130+
enablePingHealth();
131+
ChildChannel child = newChildStream();
132+
try {
133+
ReactorNettyClient.installHttp2PingCloseRewrapHandlerIfNeeded(
134+
child, HttpClientState.CONFIGURED, http2Enabled());
135+
136+
assertThat(child.pipeline().get(Http2PingCloseRewrapHandler.HANDLER_NAME)).isNull();
137+
} finally {
138+
releaseAll(child);
139+
}
140+
}
141+
142+
@Test(groups = "unit")
143+
public void skipsInstall_andDoesNotEvaluatePredicate_whenStateNotStreamConfigured() {
144+
// Guards the short-circuit order: the state check must run BEFORE the PING-health
145+
// predicate. A null http2Cfg would NPE inside isPingHealthEffectivelyEnabled (the
146+
// bridge accessor dereferences it), so a non-STREAM_CONFIGURED state must return
147+
// early without touching the predicate -- proving the cheap state gate stays first
148+
// and off the hot path.
149+
enablePingHealth();
150+
ChildChannel child = newChildStream();
151+
try {
152+
ReactorNettyClient.installHttp2PingCloseRewrapHandlerIfNeeded(
153+
child, HttpClientState.CONFIGURED, null);
154+
155+
assertThat(child.pipeline().get(Http2PingCloseRewrapHandler.HANDLER_NAME)).isNull();
156+
} finally {
157+
releaseAll(child);
158+
}
159+
}
160+
161+
@Test(groups = "unit")
162+
public void skipsInstall_whenChannelHasNoParent() {
163+
// Defensive parent() != null guard: a parent-less channel (not a real H2 child
164+
// stream) must not get the rewrap handler even at STREAM_CONFIGURED.
165+
enablePingHealth();
166+
EmbeddedChannel parentless = new EmbeddedChannel();
167+
try {
168+
ReactorNettyClient.installHttp2PingCloseRewrapHandlerIfNeeded(
169+
parentless, HttpClientState.STREAM_CONFIGURED, http2Enabled());
170+
171+
assertThat(parentless.pipeline().get(Http2PingCloseRewrapHandler.HANDLER_NAME)).isNull();
172+
} finally {
173+
parentless.finishAndReleaseAll();
174+
}
175+
}
176+
177+
@Test(groups = "unit")
178+
public void install_isIdempotent() {
179+
// The @Sharable handler must be installed at most once per child pipeline even if
180+
// the observe hook fires STREAM_CONFIGURED more than once for the same stream.
181+
enablePingHealth();
182+
ChildChannel child = newChildStream();
183+
try {
184+
ReactorNettyClient.installHttp2PingCloseRewrapHandlerIfNeeded(
185+
child, HttpClientState.STREAM_CONFIGURED, http2Enabled());
186+
ReactorNettyClient.installHttp2PingCloseRewrapHandlerIfNeeded(
187+
child, HttpClientState.STREAM_CONFIGURED, http2Enabled());
188+
189+
long count = child.pipeline().toMap().values().stream()
190+
.filter(h -> h instanceof Http2PingCloseRewrapHandler)
191+
.count();
192+
assertThat(count).isEqualTo(1);
193+
} finally {
194+
releaseAll(child);
195+
}
196+
}
197+
198+
private static void enablePingHealth() {
199+
System.setProperty(PING_HEALTH_ENABLED, "true");
200+
System.setProperty(PING_INTERVAL_SECONDS, "1");
201+
}
202+
203+
// Explicit enabled/disabled flags so the gate's HTTP/2 condition does not depend on
204+
// the ambient COSMOS.HTTP2_ENABLED system property. Constructing the config also runs
205+
// Http2ConnectionConfig's static initializer, registering the bridge accessor the gate
206+
// reads.
207+
private static Http2ConnectionConfig http2Enabled() {
208+
return new Http2ConnectionConfig().setEnabled(true);
209+
}
210+
211+
private static Http2ConnectionConfig http2Disabled() {
212+
return new Http2ConnectionConfig().setEnabled(false);
213+
}
214+
215+
private static ChildChannel newChildStream() {
216+
ChildChannel child = new ChildChannel();
217+
child.setParentChannel(new EmbeddedChannel());
218+
return child;
219+
}
220+
221+
private static void restore(String key, String prior) {
222+
if (prior == null) {
223+
System.clearProperty(key);
224+
} else {
225+
System.setProperty(key, prior);
226+
}
227+
}
228+
229+
private static void releaseAll(ChildChannel child) {
230+
Channel parent = child.parent();
231+
child.finishAndReleaseAll();
232+
if (parent instanceof EmbeddedChannel) {
233+
((EmbeddedChannel) parent).finishAndReleaseAll();
234+
}
235+
}
236+
237+
/**
238+
* An {@link EmbeddedChannel} whose {@link #parent()} can be set after construction,
239+
* mimicking an HTTP/2 child stream whose parent is the TCP connection channel. The
240+
* parent field is assigned post-construction (never read during the superclass
241+
* constructor), which avoids the captured-variable-before-super pitfall of an
242+
* anonymous subclass.
243+
*/
244+
private static final class ChildChannel extends EmbeddedChannel {
245+
private Channel parentChannel;
246+
247+
@Override
248+
public Channel parent() {
249+
return this.parentChannel;
250+
}
251+
252+
void setParentChannel(Channel parent) {
253+
this.parentChannel = parent;
254+
}
255+
}
256+
}

sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2PingCloseRewrapHandler.java

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -8,11 +8,12 @@
88
import io.netty.channel.ChannelInboundHandlerAdapter;
99

1010
/**
11-
* Per-request HTTP/2 child-stream handler that translates a parent-TCP-channel close
11+
* Per-child-stream HTTP/2 handler that translates a parent-TCP-channel close
1212
* driven by {@link Http2PingHandler} into a typed {@link Http2PingTimeoutChannelClosedException}.
1313
* <p>
1414
* Installed at the head of each H2 child-stream pipeline by
15-
* {@code ReactorNettyClient}'s {@code .doOnRequest(...)} hook. When the parent (TCP)
15+
* {@code ReactorNettyClient}'s {@code .observe(...)} hook on {@code STREAM_CONFIGURED}
16+
* (the per-child-stream lifecycle event). When the parent (TCP)
1617
* channel is closed by {@link Http2PingHandler} after consecutive PING-ACK timeouts
1718
* or consecutive PING-send failures, the H2 multiplex codec propagates
1819
* {@code channelInactive} to every child stream.
@@ -29,7 +30,8 @@
2930
* JVM-wide {@link #INSTANCE} is reused across all H2 child streams.
3031
* <p>
3132
* For non-H2 channels (parent is {@code null}) this handler is never installed; the
32-
* install site in {@code ReactorNettyClient} gates on {@code ch.parent() != null}.
33+
* install site in {@code ReactorNettyClient} only runs at {@code STREAM_CONFIGURED}
34+
* (H2 child streams) and additionally gates on {@code ch.parent() != null}.
3335
*/
3436
@ChannelHandler.Sharable
3537
final class Http2PingCloseRewrapHandler extends ChannelInboundHandlerAdapter {

sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/Http2PingHandler.java

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -52,7 +52,8 @@ public class Http2PingHandler extends ChannelDuplexHandler {
5252
* <p>
5353
* Consumed by {@link Http2PingCloseRewrapHandler}, a {@code @Sharable} handler
5454
* installed at the head of each H2 child-stream pipeline by {@code
55-
* ReactorNettyClient.doOnRequest(...)}. When the parent channel closes with this
55+
* ReactorNettyClient}'s {@code .observe(...)} hook on {@code STREAM_CONFIGURED}. When
56+
* the parent channel closes with this
5657
* attribute set, the rewrap handler fires {@code exceptionCaught} with a typed
5758
* {@link Http2PingTimeoutChannelClosedException} so the in-flight request's
5859
* response {@code Mono} fails with the typed exception (instead of a bare

0 commit comments

Comments
 (0)