Enable default connection health monitoring for CRT HTTP clients #6818
public static HttpMonitoringOptions defaultConnectionHealthConfiguration(AttributeMap config) {
    HttpMonitoringOptions httpMonitoringOptions = new HttpMonitoringOptions();
    httpMonitoringOptions.setMinThroughputBytesPerSecond(1);
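The excerpt stops after the throughput threshold. According to the PR description, the other half of the default is allowableThroughputFailureIntervalSeconds=max(readTimeout, writeTimeout), and the tests include overflow/saturation cases. A minimal sketch of that computation follows, assuming the interval is an int number of seconds and that oversized timeouts saturate at Integer.MAX_VALUE. The class and method names are hypothetical and this is not the code from the PR.

```java
import java.time.Duration;

/** Illustrative sketch only; class and method names are hypothetical, not from the PR. */
final class DefaultHealthConfigSketch {

    /** Threshold taken from the PR: 1 byte per second. */
    static final long MIN_THROUGHPUT_BYTES_PER_SECOND = 1;

    /**
     * Picks the longer of the two timeouts and converts it to whole seconds.
     * Assumption: values that do not fit in an int saturate at Integer.MAX_VALUE.
     */
    static int failureIntervalSeconds(Duration readTimeout, Duration writeTimeout) {
        Duration longer = readTimeout.compareTo(writeTimeout) >= 0 ? readTimeout : writeTimeout;
        long seconds = longer.getSeconds();
        return (int) Math.min(seconds, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        // Equal timeouts of 30 seconds, matching the "30s with default timeouts" note.
        System.out.println(failureIntervalSeconds(Duration.ofSeconds(30), Duration.ofSeconds(30)));
        // Read timeout longer than write timeout.
        System.out.println(failureIntervalSeconds(Duration.ofSeconds(60), Duration.ofSeconds(10)));
        // A timeout far beyond int range saturates instead of overflowing.
        System.out.println(failureIntervalSeconds(Duration.ofSeconds(Long.MAX_VALUE), Duration.ofSeconds(30)));
    }
}
```

Doing the comparison on Duration values before narrowing to int keeps the overflow handling in one place.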
I'm slightly worried about this configuration, though I also don't understand the details of how it is measured and applied. My concern is that a healthy connection carrying, say, a response of only a few bytes might round down to 0 and end up below 1.
Good question! Kiro didn't think it's a risk (response below). 😛 cc @TingDaoK to help confirm.
Traced through the CRT C code to understand exactly how this works. The short answer is: minThroughputBytesPerSecond=1 is safe and won't kill healthy connections with small responses.
How throughput is measured
The monitor runs every 1 second. It doesn't simply divide bytes by 1 second; it divides bytes by the time a stream was actually active during that interval (throughput calculation):
    bytes_per_second = bytes_read * 1000 / pending_read_interval_ms
                     + bytes_written * 1000 / pending_write_interval_ms
So if a 5-byte response completes in 50ms, throughput = 5 * 1000 / 50 = 100 bytes/sec. Even the worst case, 1 byte over a full 1000ms, gives exactly 1 byte/sec, which still passes the threshold.
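To make the arithmetic concrete, here is a small Java model of the formula quoted above. It is only a sketch: the class and method names are hypothetical, and treating a direction with zero pending time as contributing nothing is an assumption made to avoid dividing by zero, not something confirmed from the CRT source.

```java
/** Illustrative model of the quoted formula; names are hypothetical, not the CRT API. */
final class ThroughputSketch {

    /**
     * Bytes per second measured against the time the stream was actually pending.
     * Assumption: a direction with zero pending time contributes nothing.
     */
    static long bytesPerSecond(long bytesRead, long pendingReadMs,
                               long bytesWritten, long pendingWriteMs) {
        long read = pendingReadMs > 0 ? bytesRead * 1000 / pendingReadMs : 0;
        long write = pendingWriteMs > 0 ? bytesWritten * 1000 / pendingWriteMs : 0;
        return read + write;
    }

    public static void main(String[] args) {
        long minThroughput = 1; // the default threshold from the PR

        // 5 bytes read in 50 ms of pending time
        System.out.println(bytesPerSecond(5, 50, 0, 0));   // prints 100
        // 1 byte read across a full 1000 ms tick
        System.out.println(bytesPerSecond(1, 1000, 0, 0)); // prints 1
        // nothing transferred during a full tick: below the threshold
        System.out.println(bytesPerSecond(0, 1000, 0, 0) >= minThroughput); // prints false
    }
}
```

In this model, as long as the pending time within a tick is at most 1000 ms, any non-zero byte count yields at least 1 byte/sec under integer division, so only a stream that transferred nothing falls below the threshold.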
When throughput is checked
There are also guards that prevent false positives:
- HTTP/1: Throughput is only checked if the same stream ID was active in both the current and previous 1-second tick. A short-lived request that starts and completes within a single tick is never checked. A new stream that just started is also skipped (first tick → no previous ID match).
- HTTP/2: Throughput is only checked if there was always at least one active stream throughout the entire interval (was_inactive == false). If the last stream completes mid-interval, the check is skipped.
So for a small/fast response, either:
- It completes within one tick → not checked at all (stream ID mismatch for H1, or was_inactive=true for H2)
- It spans two ticks → the bytes transferred will produce a non-zero throughput since pending_ms is proportionally small
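The guards described above can be modeled as two small predicates. This is a simplified Java sketch of the behavior as summarized in this comment, not a port of the CRT C code: the names are hypothetical, and using 0 to mean "no stream was active in that tick" is an assumption.

```java
/** Simplified model of the check guards; names and the 0 sentinel are assumptions. */
final class ThroughputCheckGuards {

    /** Assumed sentinel meaning no stream was active during a tick. */
    static final long NO_STREAM = 0;

    /** HTTP/1: check only when the same stream was active in this tick and the previous one. */
    static boolean shouldCheckHttp1(long previousTickStreamId, long currentTickStreamId) {
        return currentTickStreamId != NO_STREAM && currentTickStreamId == previousTickStreamId;
    }

    /** HTTP/2: check only when at least one stream was active for the whole interval. */
    static boolean shouldCheckHttp2(boolean wasInactive) {
        return !wasInactive;
    }

    public static void main(String[] args) {
        // New HTTP/1 stream in its first tick: no previous ID match, so it is skipped.
        System.out.println(shouldCheckHttp1(NO_STREAM, 7));
        // Same HTTP/1 stream seen in two consecutive ticks: throughput is checked.
        System.out.println(shouldCheckHttp1(7, 7));
        // HTTP/2 connection whose last stream finished mid-interval: skipped.
        System.out.println(shouldCheckHttp2(true));
        // HTTP/2 connection with a stream active throughout the interval: checked.
        System.out.println(shouldCheckHttp2(false));
    }
}
```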
Nice! In that case I think the 1 byte setting makes sense :-)




Motivation and Context
The CRT HTTP client previously had no connection health monitoring when users didn't explicitly configure ConnectionHealthConfiguration. This meant stalled connections to non-responsive servers could hang indefinitely rather than being proactively terminated.
Modifications
- AwsCrtHttpClientBase: When no ConnectionHealthConfiguration is provided, apply a default instead of null. The default sets minThroughputBytesPerSecond=1 and allowableThroughputFailureIntervalSeconds=max(readTimeout, writeTimeout) (30s with default timeouts).
- AwsCrtConfigurationUtils: Extracted defaultConnectionHealthConfiguration as a static utility method for testability, consistent with existing patterns (buildSocketOptions, resolveCipherPreference).
- AwsCrtHttpClient / AwsCrtAsyncHttpClient: Updated Javadoc for connectionHealthConfiguration to document the new default behavior.
Testing
- AwsCrtConfigurationUtilsTest: Parameterized test covering 5 cases: equal timeouts, read > write, write > read, and two integer overflow/saturation cases.
- NonResponsiveServerTest: ServerSocket-based mock that accepts connections but never responds, verifying that both AwsCrtHttpClient (sync) and AwsCrtAsyncHttpClient (async) throw/complete exceptionally when the health monitor terminates the stalled connection.
Types of changes
Checklist
- mvn clean install -pl :aws-crt-client succeeds
License