
Fix CRT HTTP client connection leak when aborting response stream #6876

Merged
zoewangg merged 2 commits into master from zoewang/crtConnectionFix Apr 18, 2026

@zoewangg (Contributor) commented Apr 17, 2026

Motivation and Context

Fix connection leak in CRT HTTP client when aborting response streams.

The stream manager refactoring replaced connection.shutdown() + connection.close() with
stream.close(), which only releases the refcount without forcing the connection to shut down.

This caused connections to leak when customers called abort() before fully consuming a
GetObject response stream, eventually leading to
HttpException: Connection Manager failed to acquire a connection within the defined timeout.
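To illustrate why releasing only the refcount can leak, here is a toy model. It is hypothetical and does not reflect CRT's actual connection manager. It assumes a pool slot is freed only when the last reference is released and the connection has either finished its response or been shut down. Under that assumption, an aborted stream that only calls close() keeps its slot forever, and with a single slot the next acquire() fails. That failure is the toy analogue of the acquire timeout above. Calling cancel() before close() frees the slot.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not CRT internals): a pool with a fixed number of
// slots. A slot is freed only when its last reference is released AND the
// connection either finished its response or was shut down.
final class ToyPool {
    final class Conn {
        private int refCount = 1;
        private boolean responseDone;
        private boolean shutDown;

        // Happy path: the caller read the whole response body.
        void finishResponse() {
            responseDone = true;
        }

        // Analogue of stream.cancel(): force the connection to shut down.
        void cancel() {
            shutDown = true;
        }

        // Analogue of stream.close(): drop this stream's reference only.
        void close() {
            refCount--;
            if (refCount == 0 && (responseDone || shutDown)) {
                inUse.remove(this);
            }
        }
    }

    private final int maxConnections;
    private final List<Conn> inUse = new ArrayList<>();

    ToyPool(int maxConnections) {
        this.maxConnections = maxConnections;
    }

    // Returns null where a real pool would wait and eventually time out.
    Conn acquire() {
        if (inUse.size() >= maxConnections) {
            return null;
        }
        Conn conn = new Conn();
        inUse.add(conn);
        return conn;
    }
}
```

In this model, `new ToyPool(1)` followed by `acquire().close()` on an aborted read leaves every later `acquire()` returning null. A full read (`finishResponse()` then `close()`) or an abort with `cancel()` then `close()` frees the slot.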

Modifications

  • Split ResponseHandlerHelper.closeStream() into two methods:
    • releaseConnection() — calls stream.close() to return the connection to the pool. Used when the response completes successfully and the stream has been
      fully consumed.
    • closeConnection() — calls stream.cancel() then stream.close() to force-shutdown the connection. Used when:
      • The CRT reports a non-success error code in onResponseComplete
      • The publisher fails to write response data to the subscriber
      • The response stream is closed or aborted by the caller before being fully consumed
      • The request future is cancelled
  • Updated CrtResponseAdapter and InputStreamAdaptingHttpStreamResponseHandler to call the appropriate method on each path
  • Bumped aws-crt version to 0.45.1 for the new stream.cancel() API
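The split can be sketched as follows. This is a minimal illustration, not the SDK's code: `StreamHandle`, `ConnectionReleaser`, and `RecordingStream` are hypothetical names, and `StreamHandle` only stands in for the CRT stream's `cancel()`/`close()` pair. A single `AtomicBoolean` guard is one way to make the two methods idempotent and mutually exclusive, which are the properties the new `ResponseHandlerHelperTest` is described as checking. `closeConnection()` always cancels before it closes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the CRT HTTP stream.
interface StreamHandle {
    void cancel(); // force the underlying connection to shut down
    void close();  // release this stream's reference on the connection
}

// Sketch of the split helper: whichever method runs first wins, and it runs once.
final class ConnectionReleaser {
    private final StreamHandle stream;
    private final AtomicBoolean finished = new AtomicBoolean(false);

    ConnectionReleaser(StreamHandle stream) {
        this.stream = stream;
    }

    // Success path: the body was fully consumed, so the connection may be reused.
    void releaseConnection() {
        if (finished.compareAndSet(false, true)) {
            stream.close();
        }
    }

    // Error/abort/cancel path: cancel first so the connection is shut down, then close.
    void closeConnection() {
        if (finished.compareAndSet(false, true)) {
            stream.cancel();
            stream.close();
        }
    }
}

// Test double that records the order of calls.
final class RecordingStream implements StreamHandle {
    final List<String> calls = new ArrayList<>();

    @Override
    public void cancel() {
        calls.add("cancel");
    }

    @Override
    public void close() {
        calls.add("close");
    }
}
```

With this guard, calling `closeConnection()` twice and then `releaseConnection()` records only `cancel`, `close`. Calling `releaseConnection()` first records only `close`, and any later `closeConnection()` is ignored.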

Testing

  • Added ResponseHandlerHelperTest — unit tests for ordering, idempotency, and mutual exclusion of releaseConnection/closeConnection

  • Updated existing handler tests (BaseHttpStreamResponseHandlerTest, InputStreamAdaptingHttpStreamResponseHandlerTest) to verify cancel() + close()
    ordering on error/abort paths

  • Added GetObjectResponseInputStreamConnectionManagementTest — functional tests with WireMock parameterized across all HTTP clients (Apache, UrlConnection, CRT
    sync, Netty, CRT async) covering both abort and happy-path (fully consumed) scenarios

  • Added unit tests

  • Added functional tests

  • Ran existing tests

  • Verified fix against reproduction case from ticket

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read the CONTRIBUTING document
  • Local run of mvn clean install -pl :{module} succeeds for affected modules
  • My code follows the code style of this project
  • My change requires a change to the Javadoc documentation
  • I have updated the Javadoc documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed
  • I have added a changelog entry
  • My change is to implement 1.11 parity feature and I have updated
    LaunchChangelog

License

  • I confirm that this pull request can be released under the Apache 2 license

@zoewangg requested a review from a team as a code owner April 17, 2026 20:38
responsePublisher.complete().whenComplete((result, failure) -> {
if (failure != null) {
failResponseHandlerAndFuture(failure);
responseHandlerHelper.closeStream();
@zoewangg (Contributor Author):

I removed this because it's a no-op: releaseConnection is invoked at line 125 anyway.


crtResponseHandler.onResponseComplete(httpStream, 0);
assertThatThrownBy(() -> requestFuture.join()).isInstanceOf(CancellationException.class).hasMessageContaining(
assertThatThrownBy(() -> requestFuture.join()).isInstanceOf(CancellationException.class).hasStackTraceContaining(
@zoewangg (Contributor Author):

Changed this to make the build pass on Java 25. On Java 25, CompletableFuture.join() throws a new CancellationException whose message is just "join", with the original exception as its cause:

java.util.concurrent.CancellationException: join

	at java.base/java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:454)
	at java.base/java.util.concurrent.CompletableFuture.join(CompletableFuture.java:2139)
	at software.amazon.awssdk.http.crt.internal.CrtResponseHandlerTest.onResponseComplete_publisherCancelled_closesStream(CrtResponseHandlerTest.java:71)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1604)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1604)
Caused by: java.util.concurrent.CancellationException: subscription has been cancelled.
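AssertJ's `hasStackTraceContaining` inspects the whole printed stack trace, including causes, so it matches on both older JDKs and Java 25. The following plain-JDK sketch shows the same idea by walking the cause chain. `CauseChain` and `containsMessage` are illustrative names, not part of the SDK or its tests. On older JDKs, `join()` rethrows the stored `CancellationException` itself. On Java 25, per the trace above, it throws a wrapper whose cause carries the original message, and the walk finds it either way.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

// Sketch: match on the cause chain instead of the top-level message, which
// differs between JDK versions for CompletableFuture.join().
final class CauseChain {
    private CauseChain() {
    }

    // True if t or any of its causes has a message containing the given text.
    static boolean containsMessage(Throwable t, String text) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains(text)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        future.completeExceptionally(new CancellationException("subscription has been cancelled."));
        try {
            future.join();
        } catch (CancellationException e) {
            // Prints whether the original message was found anywhere in the chain.
            System.out.println(containsMessage(e, "subscription has been cancelled"));
        }
    }
}
```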

@TingDaoK left a comment:

looks good

stubFor(get(anyUrl())
.inScenario("SucceedThenFail")
.whenScenarioStateIs("first request")
.willSetStateTo("second request")
@zoewangg (Contributor Author), Apr 17, 2026:

This test has always been broken: the WireMock scenario only had stubs for two GET requests, but the CRT S3 client retries the faulted request multiple times, so the later retries hit no matching stub and received a 404.

Previously, the tiny buffer pool (from initialReadBufferSizeInBytes(5L)) couldn't hold the 404 error body, so the 404 was masked as a socket error. CRT 0.45.1 no longer reuses pooled buffers for error responses, so the 404 is now correctly parsed as an S3Exception.

Fixed the test so that it exercises the intended behavior.
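The retry problem can be reproduced with a toy model of a stateful stub scenario. This model is hypothetical: it is not WireMock's API, and the state names are made up. Each stub matches one scenario state and may move the scenario to another. If the faulting stub transitions away, the client's retries land in a state with no stub and get a 404. Keeping the faulting stub in its own state makes every retry see the fault. This is one way to structure such a scenario, not necessarily the change made in this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a stateful stub scenario (hypothetical, not WireMock's API).
final class ToyScenario {
    // One stub per state: the response to return and the state to move to (null = stay).
    private record Stub(String response, String nextState) {
    }

    private final Map<String, Stub> stubs = new HashMap<>();
    private String state = "started";

    ToyScenario stub(String whenState, String response, String nextState) {
        stubs.put(whenState, new Stub(response, nextState));
        return this;
    }

    // Serves one request; an unmatched state yields "404", like an unmatched stub.
    String request() {
        Stub stub = stubs.get(state);
        if (stub == null) {
            return "404";
        }
        if (stub.nextState() != null) {
            state = stub.nextState();
        }
        return stub.response();
    }
}
```

With `stub("faulting", "FAULT", "done")`, the third request (the first retry) returns "404". With `stub("faulting", "FAULT", null)`, every retry returns "FAULT".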

@zoewangg enabled auto-merge April 17, 2026 23:07
@zoewangg added this pull request to the merge queue Apr 17, 2026
Merged via the queue into master with commit 7af9713 Apr 18, 2026
39 of 41 checks passed
@github-actions:

This pull request has been closed and the conversation has been locked. Comments on closed PRs are hard for our team to see. If you need more assistance, please open a new issue that references this one.

@github-actions bot locked as resolved and limited conversation to collaborators Apr 18, 2026

4 participants