Skip to content

Enable default connection health monitoring for CRT HTTP clients#6818

Merged
zoewangg merged 1 commit intomasterfrom
zoewang/enableHttpMonitoringByDefault
Mar 31, 2026
Merged

Enable default connection health monitoring for CRT HTTP clients#6818
zoewangg merged 1 commit intomasterfrom
zoewang/enableHttpMonitoringByDefault

Conversation

@zoewangg
Copy link
Copy Markdown
Contributor

Motivation and Context

The CRT HTTP client previously had no connection health monitoring when users didn't explicitly configure ConnectionHealthConfiguration. This meant stalled connections to non-responsive servers may hang indefinitely rather than being proactively terminated.

Modifications

  • AwsCrtHttpClientBase: When no ConnectionHealthConfiguration is provided, apply a default instead of null. The default sets minThroughputBytesPerSecond=1 and allowableThroughputFailureIntervalSeconds=max(readTimeout, writeTimeout) (30s with default timeouts).
  • AwsCrtConfigurationUtils: Extracted defaultConnectionHealthConfiguration as a static utility method for testability, consistent with existing patterns (buildSocketOptions, resolveCipherPreference).
  • AwsCrtHttpClient / AwsCrtAsyncHttpClient: Updated Javadoc for connectionHealthConfiguration to document the new default behavior.

Testing

  • Unit tests (AwsCrtConfigurationUtilsTest): Parameterized test covering 5 cases — equal timeouts, read > write, write > read, and two integer overflow/saturation cases.
  • Functional tests (NonResponsiveServerTest): ServerSocket-based mock that accepts connections but never responds, verifying both AwsCrtHttpClient (sync) and AwsCrtAsyncHttpClient (async) throw/complete exceptionally when the health monitor terminates the stalled connection.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read the CONTRIBUTING document
  • Local run of mvn clean install -pl :aws-crt-client succeeds
  • My code follows the code style of this project
  • My change requires a change to the Javadoc documentation
  • I have updated the Javadoc documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed
  • I have added a changelog entry

License

  • I confirm that this pull request can be released under the Apache 2 license

@zoewangg zoewangg requested a review from a team as a code owner March 25, 2026 23:24
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE


public static HttpMonitoringOptions defaultConnectionHealthConfiguration(AttributeMap config) {
HttpMonitoringOptions httpMonitoringOptions = new HttpMonitoringOptions();
httpMonitoringOptions.setMinThroughputBytesPerSecond(1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly worried about this configuration - but I also don't understand the details of how this is measured and applied. I guess what I'm worried about is cases where healthy connections for say, a response with only a few bytes might round down to 0 and be < 1.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Kiro didn't think it's a risk (response below). 😛 cc @TingDaoK to help confirm.


Traced through the CRT C code to understand exactly how this works. The short answer is: minThroughputBytesPerSecond=1 is safe and won't kill healthy connections with small responses.

How throughput is measured

The monitor runs every 1 second. It doesn't simply divide bytes by 1 second — it divides bytes by the time a stream was actually active during that interval (throughput calculation):

   bytespersecond = bytesread * 1000 / pendingreadintervalms
                    + byteswritten * 1000 / pendingwriteintervalms

So if a 5-byte response completes in 50ms, throughput = 5 * 1000 / 50 = 100 bytes/sec. Even the worst case — 1 byte over a full 1000ms — gives exactly 1
byte/sec, which still passes the threshold.

When throughput is checked

There are also guards that prevent false positives:

  • HTTP/1: Throughput is only checked if the same stream ID was active in both the current and previous 1-second tick. A short-lived request that starts and completes within a single tick is never checked. A new stream that just started is also
    skipped (first tick → no previous ID match).
  • HTTP/2: Throughput is only checked if there was always at least one active stream throughout the entire interval (was_inactive == false). If the last stream completes mid-interval, the check is skipped.

So for a small/fast response, either:

  1. It completes within one tick → not checked at all (stream ID mismatch for H1, or was_inactive=true for H2)
  2. It spans two ticks → the bytes transferred will produce a non-zero throughput since pending_ms is proportionally small

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! In that case I think the 1 byte setting makes sense :-)

@zoewangg zoewangg added the api-surface-area-approved-by-team Indicate API surface area introduced by this PR has been approved by team label Mar 31, 2026
@zoewangg zoewangg enabled auto-merge March 31, 2026 16:31
@zoewangg zoewangg added this pull request to the merge queue Mar 31, 2026
Merged via the queue into master with commit a5cdcb8 Mar 31, 2026
49 of 52 checks passed
@github-actions
Copy link
Copy Markdown

This pull request has been closed and the conversation has been locked. Comments on closed PRs are hard for our team to see. If you need more assistance, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 31, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

api-surface-area-approved-by-team Indicate API surface area introduced by this PR has been approved by team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants