Enable default connection health monitoring for CRT HTTP clients by zoewangg · Pull Request #6818 · aws/aws-sdk-java-v2

zoewangg · 2026-03-25T23:24:20Z

Motivation and Context

The CRT HTTP client previously had no connection health monitoring when users didn't explicitly configure ConnectionHealthConfiguration. This meant stalled connections to non-responsive servers may hang indefinitely rather than being proactively terminated.

Modifications

AwsCrtHttpClientBase: When no ConnectionHealthConfiguration is provided, apply a default instead of null. The default sets minThroughputBytesPerSecond=1 and allowableThroughputFailureIntervalSeconds=max(readTimeout, writeTimeout) (30s with default timeouts).
AwsCrtConfigurationUtils: Extracted defaultConnectionHealthConfiguration as a static utility method for testability, consistent with existing patterns (buildSocketOptions, resolveCipherPreference).
AwsCrtHttpClient / AwsCrtAsyncHttpClient: Updated Javadoc for connectionHealthConfiguration to document the new default behavior.

Testing

Unit tests (AwsCrtConfigurationUtilsTest): Parameterized test covering 5 cases — equal timeouts, read > write, write > read, and two integer overflow/saturation cases.
Functional tests (NonResponsiveServerTest): ServerSocket-based mock that accepts connections but never responds, verifying both AwsCrtHttpClient (sync) and AwsCrtAsyncHttpClient (async) throw/complete exceptionally when the health monitor terminates the stalled connection.

Types of changes

Bug fix (non-breaking change which fixes an issue)

Checklist

I have read the CONTRIBUTING document
Local run of mvn clean install -pl :aws-crt-client succeeds
My code follows the code style of this project
My change requires a change to the Javadoc documentation
I have updated the Javadoc documentation accordingly
I have added tests to cover my changes
All new and existing tests passed
I have added a changelog entry

License

I confirm that this pull request can be released under the Apache 2 license

sonarqubecloud · 2026-03-26T00:11:18Z

Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

alextwoods · 2026-03-26T17:27:52Z

...-client/src/main/java/software/amazon/awssdk/http/crt/internal/AwsCrtConfigurationUtils.java


+    public static HttpMonitoringOptions defaultConnectionHealthConfiguration(AttributeMap config) {
+        HttpMonitoringOptions httpMonitoringOptions = new HttpMonitoringOptions();
+        httpMonitoringOptions.setMinThroughputBytesPerSecond(1);


I'm slightly worried about this configuration - but I also don't understand the details of how this is measured and applied. I guess what I'm worried about is cases where healthy connections for say, a response with only a few bytes might round down to 0 and be < 1.

Good question! Kiro didn't think it's a risk (response below). 😛 cc @TingDaoK to help confirm.

Traced through the CRT C code to understand exactly how this works. The short answer is: minThroughputBytesPerSecond=1 is safe and won't kill healthy connections with small responses.

How throughput is measured

The monitor runs every 1 second. It doesn't simply divide bytes by 1 second — it divides bytes by the time a stream was actually active during that interval (throughput calculation):

bytespersecond = bytesread * 1000 / pendingreadintervalms + byteswritten * 1000 / pendingwriteintervalms

So if a 5-byte response completes in 50ms, throughput = 5 * 1000 / 50 = 100 bytes/sec. Even the worst case — 1 byte over a full 1000ms — gives exactly 1
byte/sec, which still passes the threshold.

When throughput is checked

There are also guards that prevent false positives:

HTTP/1: Throughput is only checked if the same stream ID was active in both the current and previous 1-second tick. A short-lived request that starts and completes within a single tick is never checked. A new stream that just started is also
skipped (first tick → no previous ID match).

HTTP/2: Throughput is only checked if there was always at least one active stream throughout the entire interval (was_inactive == false). If the last stream completes mid-interval, the check is skipped.

So for a small/fast response, either:

It completes within one tick → not checked at all (stream ID mismatch for H1, or was_inactive=true for H2)

It spans two ticks → the bytes transferred will produce a non-zero throughput since pending_ms is proportionally small

Nice! In that case I think the 1 byte setting makes sense :-)

github-actions · 2026-03-31T17:24:25Z

This pull request has been closed and the conversation has been locked. Comments on closed PRs are hard for our team to see. If you need more assistance, please open a new issue that references this one.

Enable default connection health monitoring for CRT HTTP clients

4920601

zoewangg requested a review from a team as a code owner March 25, 2026 23:24

alextwoods reviewed Mar 26, 2026

View reviewed changes

alextwoods approved these changes Mar 26, 2026

View reviewed changes

zoewangg added the api-surface-area-approved-by-team Indicate API surface area introduced by this PR has been approved by team label Mar 31, 2026

zoewangg enabled auto-merge March 31, 2026 16:31

zoewangg added this pull request to the merge queue Mar 31, 2026

Merged via the queue into master with commit a5cdcb8 Mar 31, 2026
49 of 52 checks passed

github-actions bot locked as resolved and limited conversation to collaborators Mar 31, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable default connection health monitoring for CRT HTTP clients#6818

Enable default connection health monitoring for CRT HTTP clients#6818
zoewangg merged 1 commit intomasterfrom
zoewang/enableHttpMonitoringByDefault

zoewangg commented Mar 25, 2026

Uh oh!

sonarqubecloud bot commented Mar 26, 2026

Uh oh!

alextwoods Mar 26, 2026

Uh oh!

zoewangg Mar 26, 2026

Uh oh!

alextwoods Mar 26, 2026

Uh oh!

Uh oh!

github-actions bot commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zoewangg commented Mar 25, 2026

Motivation and Context

Modifications

Testing

Types of changes

Checklist

License

Uh oh!

sonarqubecloud bot commented Mar 26, 2026

Quality Gate failed

Uh oh!

alextwoods Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

zoewangg Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

alextwoods Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants