fix(compose): close docker clients to prevent goroutine leaks#3661
Conversation
Close dockerCli.Client() and dockerClient in Down() to release net/http persistConn.readLoop/writeLoop goroutines that were leaking on every Up+Down cycle. Store the *command.DockerCli instance in the DockerCompose struct so its HTTP transport connections can be closed during teardown. Previously the dockerCli was created in NewDockerComposeWith() but never retained, so its HTTP client connections could not be released. Adds TestDockerComposeGoroutineLeak to demonstrate the fix: it snapshots goroutines before an Up+Down cycle and asserts that no net/http persistConn goroutines remain afterward. Fixes testcontainers#2008 Co-Authored-By: Claude <noreply@anthropic.com>
✅ Deploy Preview for testcontainers-go ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
Summary by CodeRabbit
WalkthroughStored the initialized Docker CLI on DockerCompose, added a Close() method to close the CLI transport and the testcontainers Docker client, added a goroutine-leak unit test using goleak, and added go.uber.org/goleak as a direct module dependency. Changes
Sequence Diagram(s)sequenceDiagram
participant Test
participant DockerCompose
participant DockerCLI
participant DockerClient
participant Goleak
Test->>DockerCompose: NewDockerCompose / Up(ctx, Wait)
DockerCompose->>DockerClient: create client / composeService
DockerCompose->>DockerCLI: initialize & store CLI
Test->>DockerCompose: Down(ctx, RemoveVolumes...)
DockerCompose->>DockerClient: stop/remove services
Test->>DockerCompose: Close()
DockerCompose->>DockerCLI: DockerCli.Client().Close()
DockerCompose->>DockerClient: Close()
DockerCompose-->>Test: Close() returns (err?)
Test->>Goleak: VerifyNone()
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
modules/compose/compose_goroutine_test.go (2)
61-64:RenderComposeSimpleerror andtime.Sleepbaseline.Two small things:
path, _ := RenderComposeSimple(t)drops the returned error — if the helper ever starts returning a non-nil error, the test would silently run against an empty path. Userequire.NoErroron it (or match however sibling tests call it).time.Sleep(200 * time.Millisecond)as a "quiescence" baseline is inherently racy; prefergoleak(see sibling comment) or at least retry the snapshot until stable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/compose/compose_goroutine_test.go` around lines 61 - 64, Replace the silent error drop from RenderComposeSimple by checking its error return (call RenderComposeSimple(t) into path and err and assert with require.NoError or the same helper used in sibling tests) so the test fails on helper errors; also remove the fixed time.Sleep quiescence and either use goleak to assert goroutine stability (as other tests do) or implement a small retry loop that re-takes the snapshot until it is stable to avoid the racy 200ms sleep in the baseline.
60-96: Consider usinggo.uber.org/goleakfor the regression test.The linked issue (
#2008) itself usesgoleak.VerifyNone(t), which is the de-facto standard for this kind of assertion in the Go ecosystem:
- It polls with retry/backoff instead of relying on
time.Sleep(200ms)/time.Sleep(500ms), which are flaky under loaded CI.- It diffs goroutines by stack signature across retries and ignores well-known background goroutines out of the box.
- Much less custom code to maintain (the
goroutineSnapshot/goroutineLeakshelpers go away).Rough shape:
func TestDockerComposeGoroutineLeak(t *testing.T) { defer goleak.VerifyNone(t, goleak.IgnoreTopFunction("..."), // if any known-benign background goroutines surface ) path, _ := RenderComposeSimple(t) compose, err := NewDockerCompose(path) require.NoError(t, err) ctx := context.Background() require.NoError(t, compose.Up(ctx, Wait(true))) require.NoError(t, compose.Down(ctx, RemoveOrphans(true), RemoveVolumes(true))) }This would also pick up any other future transport/goroutine leaks, not just
persistConn.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/compose/compose_goroutine_test.go` around lines 60 - 96, Replace the custom goroutine snapshot/compare logic in TestDockerComposeGoroutineLeak with go.uber.org/goleak.VerifyNone to make the test robust: remove calls to goroutineSnapshot and goroutineLeaks and the manual sleeps and filtering, add a deferred goleak.VerifyNone(t, ...) at the top of TestDockerComposeGoroutineLeak (optionally adding goleak.IgnoreTopFunction entries if there are known benign goroutines), then keep the same flow that calls RenderComposeSimple, NewDockerCompose, compose.Up and compose.Down and let goleak detect any leaked goroutines instead of inspecting "persistConn" stacks.modules/compose/compose_api.go (1)
272-281: Consider a dedicatedClose()method instead of tying cleanup toDown().Tearing down the HTTP transports inside
Down()makes theDockerComposeinstance unusable for any subsequent operation. Re-invokingUp(), callingServiceContainer(), or even callingDown()again will hit a transport whose idle connections have been closed, and any new request from that client will fail (use of closed network connection/http: connection has been closed). This is a behavior change versus the previous API, and it makes the object effectively single-use.The prior plan (captured in an earlier follow-up task) was to add a dedicated
Close()method onDockerComposethat closesdockerClient(and ideallyprovider). That pattern is friendlier tot.Cleanup(compose.Close)and does not conflate lifecycle with stack teardown.Also note:
providerholds the samedockerClientviaSetClient, and is itself not closed here — if you keep the current approach, consider closingprovideras well to avoid a partial cleanup surface.♻️ Suggested shape
// Close releases HTTP transport resources held by the underlying Docker // clients. After Close, the DockerCompose instance must not be reused. func (d *DockerCompose) Close() error { d.lock.Lock() defer d.lock.Unlock() var errs []error if d.dockerCli != nil { if cli := d.dockerCli.Client(); cli != nil { if err := cli.Close(); err != nil { errs = append(errs, fmt.Errorf("close docker cli client: %w", err)) } } } if d.dockerClient != nil { if err := d.dockerClient.Close(); err != nil { errs = append(errs, fmt.Errorf("close docker client: %w", err)) } } // Optionally: if d.provider != nil { errs = append(errs, d.provider.Close()) } return errors.Join(errs...) }Callers would then do
t.Cleanup(func() { _ = compose.Close() })(or similar), andDown()reverts to only tearing down the stack.Based on learnings from PR
#3568: "A follow-up task has been filed to add aClose()method toDockerComposethat callsprovider.Close()and/ordockerClient.Close()."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/compose/compose_api.go` around lines 272 - 281, The defer in DockerCompose.Down currently closes HTTP transports (d.dockerCli.Client().Close() and d.dockerClient.Close()), which makes the DockerCompose unusable after Down; extract that cleanup into a new DockerCompose.Close method that closes d.dockerCli.Client(), d.dockerClient (and optionally d.provider), aggregates/returns errors, and uses d.lock to guard concurrent calls; revert Down to only call d.composeService.Down (no transport closes) so callers can call compose.Close (e.g. t.Cleanup(func(){ _ = compose.Close() })) when they want to release HTTP resources.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/compose/compose_goroutine_test.go`:
- Around line 17-38: The goroutineSnapshot function can silently return
truncated stack data because runtime.Stack(buf, true) may fill the buffer;
update goroutineSnapshot to either (preferred) replace the custom snapshot with
go.uber.org/goleak's snapshot/check utilities, or (if keeping the custom logic)
iteratively grow the buffer and call runtime.Stack until the returned byte count
n is less than len(buf) (e.g., double the buffer on full fill) before parsing
stacks; reference the goroutineSnapshot function and runtime.Stack call when
making the change so keys are built from a complete, non-truncated dump.
---
Nitpick comments:
In `@modules/compose/compose_api.go`:
- Around line 272-281: The defer in DockerCompose.Down currently closes HTTP
transports (d.dockerCli.Client().Close() and d.dockerClient.Close()), which
makes the DockerCompose unusable after Down; extract that cleanup into a new
DockerCompose.Close method that closes d.dockerCli.Client(), d.dockerClient (and
optionally d.provider), aggregates/returns errors, and uses d.lock to guard
concurrent calls; revert Down to only call d.composeService.Down (no transport
closes) so callers can call compose.Close (e.g. t.Cleanup(func(){ _ =
compose.Close() })) when they want to release HTTP resources.
In `@modules/compose/compose_goroutine_test.go`:
- Around line 61-64: Replace the silent error drop from RenderComposeSimple by
checking its error return (call RenderComposeSimple(t) into path and err and
assert with require.NoError or the same helper used in sibling tests) so the
test fails on helper errors; also remove the fixed time.Sleep quiescence and
either use goleak to assert goroutine stability (as other tests do) or implement
a small retry loop that re-takes the snapshot until it is stable to avoid the
racy 200ms sleep in the baseline.
- Around line 60-96: Replace the custom goroutine snapshot/compare logic in
TestDockerComposeGoroutineLeak with go.uber.org/goleak.VerifyNone to make the
test robust: remove calls to goroutineSnapshot and goroutineLeaks and the manual
sleeps and filtering, add a deferred goleak.VerifyNone(t, ...) at the top of
TestDockerComposeGoroutineLeak (optionally adding goleak.IgnoreTopFunction
entries if there are known benign goroutines), then keep the same flow that
calls RenderComposeSimple, NewDockerCompose, compose.Up and compose.Down and let
goleak detect any leaked goroutines instead of inspecting "persistConn" stacks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 283fdd56-afe9-4a5d-b527-b73f15b546f5
📒 Files selected for processing (3)
modules/compose/compose.gomodules/compose/compose_api.gomodules/compose/compose_goroutine_test.go
- Extract Close() method on DockerCompose for explicit resource cleanup, removing the close defer from Down() to keep teardown and resource release concerns separate. Down() is no longer single-use. - Add go.uber.org/goleak as a direct dependency and replace the hand-rolled goroutineSnapshot/goroutineLeaks helpers with goleak.VerifyNone, which handles the buffer-growth loop, exponential retry and cleaner reporting. - Use goleak.IgnoreCurrent() (snapshotted after NewDockerCompose) plus IgnoreTopFunction for Reaper goroutines to scope the leak check to goroutines introduced by the Up+Down lifecycle only. - Call compose.Close() in t.Cleanup, after Down(), to release HTTP transport connections from both dockerCli.Client() and dockerClient. Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (3)
modules/compose/compose_api.go (1)
275-299: Consider makingCloseidempotent to tolerate double-close.
Close()doesn't nil outd.dockerCli/d.dockerClientafter closing, so a second call will attempt to close already-closed transports. Depending on how callers wire cleanup (e.g. adefer compose.Close()plus an explicitcompose.Close()in error paths, or at.Cleanupin addition to manual invocation), this can surface spurious errors. A simple guard — nil the fields once closed, or track aclosedflag — would make the API safer and match the contract documented in the comment.♻️ Possible approach
func (d *DockerCompose) Close() error { d.lock.Lock() defer d.lock.Unlock() var errs []error if d.dockerCli != nil { if cli := d.dockerCli.Client(); cli != nil { if err := cli.Close(); err != nil { errs = append(errs, fmt.Errorf("close docker cli client: %w", err)) } } + d.dockerCli = nil } if d.dockerClient != nil { if err := d.dockerClient.Close(); err != nil { errs = append(errs, fmt.Errorf("close docker client: %w", err)) } + d.dockerClient = nil } return errors.Join(errs...) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/compose/compose_api.go` around lines 275 - 299, Make DockerCompose.Close idempotent by acquiring d.lock, checking and closing d.dockerCli.Client() and d.dockerClient only if non-nil, and then setting d.dockerCli and d.dockerClient to nil (or setting a boolean closed flag) after successful close so subsequent Close() calls do nothing; update the DockerCompose.Close implementation to clear those fields under the lock (and still collect/Join errors via errors.Join) so double-close is tolerated without spurious errors.modules/compose/compose_goroutine_test.go (2)
39-39: Fragile reference to an internal goroutine top-function.
goleak.IgnoreTopFunction("github.com/testcontainers/testcontainers-go.(*Reaper).connect.func1")pins on an unexported closure name that will silently become a no-op if the Reaper implementation is refactored (e.g. the anonymous function gains a sibling and becomesfunc2, orconnectis renamed). Consider linking the tracking issue in the comment at Lines 20–22 so a future maintainer knows to revisit this when the Reaper leak is addressed, and/or adding aTODO(#2008)here so it surfaces when the unrelated Reaper fix lands.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/compose/compose_goroutine_test.go` at line 39, The test currently calls goleak.IgnoreTopFunction("github.com/testcontainers/testcontainers-go.(*Reaper).connect.func1") which references an internal anonymous closure and is fragile; update the test by keeping the IgnoreTopFunction call but add a clear comment referencing the tracking issue and a TODO marker (e.g., TODO(`#2008`)) immediately above the call so future maintainers know to remove or revisit the ignore when the Reaper leak is fixed; mention the exact symbol "goleak.IgnoreTopFunction" and the ignored name "github.com/testcontainers/testcontainers-go.(*Reaper).connect.func1" in the comment for easy location.
35-46: Consider registeringDownviat.Cleanupfor robustness.
compose.Down(...)is called inline at Line 46 andcompose.Close()int.Cleanupat Line 36. If the test later grows assertions betweenUpandDown(or an in-between call panics),Downis skipped and containers/volumes leak until Ryuk reaps them. Sincet.Cleanupruns LIFO, registeringClosefirst and thenDownafter theUpcall yieldsDown → Closeon teardown even on early failure:♻️ Possible refactor
ignoreExisting := goleak.IgnoreCurrent() t.Cleanup(func() { require.NoError(t, compose.Close(), "compose.Close()") goleak.VerifyNone(t, ignoreExisting, goleak.IgnoreTopFunction("github.com/testcontainers/testcontainers-go.(*Reaper).connect.func1"), ) }) ctx := context.Background() require.NoError(t, compose.Up(ctx, Wait(true)), "compose.Up()") - require.NoError(t, compose.Down(ctx, RemoveOrphans(true), RemoveVolumes(true)), "compose.Down()") + t.Cleanup(func() { + require.NoError(t, compose.Down(ctx, RemoveOrphans(true), RemoveVolumes(true)), "compose.Down()") + })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/compose/compose_goroutine_test.go` around lines 35 - 46, The test currently calls compose.Down(ctx, RemoveOrphans(true), RemoveVolumes(true)) inline, which can be skipped on panic and leak containers; instead, after a successful require.NoError(t, compose.Up(ctx, Wait(true))) register a t.Cleanup(func() { require.NoError(t, compose.Down(ctx, RemoveOrphans(true), RemoveVolumes(true)), "compose.Down()") }) so teardown runs LIFO (Down then Close) even on early failures; keep the existing t.Cleanup that calls require.NoError(t, compose.Close(), "compose.Close()") so Close remains the last cleanup step.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@modules/compose/compose_api.go`:
- Around line 275-299: Make DockerCompose.Close idempotent by acquiring d.lock,
checking and closing d.dockerCli.Client() and d.dockerClient only if non-nil,
and then setting d.dockerCli and d.dockerClient to nil (or setting a boolean
closed flag) after successful close so subsequent Close() calls do nothing;
update the DockerCompose.Close implementation to clear those fields under the
lock (and still collect/Join errors via errors.Join) so double-close is
tolerated without spurious errors.
In `@modules/compose/compose_goroutine_test.go`:
- Line 39: The test currently calls
goleak.IgnoreTopFunction("github.com/testcontainers/testcontainers-go.(*Reaper).connect.func1")
which references an internal anonymous closure and is fragile; update the test
by keeping the IgnoreTopFunction call but add a clear comment referencing the
tracking issue and a TODO marker (e.g., TODO(`#2008`)) immediately above the call
so future maintainers know to remove or revisit the ignore when the Reaper leak
is fixed; mention the exact symbol "goleak.IgnoreTopFunction" and the ignored
name "github.com/testcontainers/testcontainers-go.(*Reaper).connect.func1" in
the comment for easy location.
- Around line 35-46: The test currently calls compose.Down(ctx,
RemoveOrphans(true), RemoveVolumes(true)) inline, which can be skipped on panic
and leak containers; instead, after a successful require.NoError(t,
compose.Up(ctx, Wait(true))) register a t.Cleanup(func() { require.NoError(t,
compose.Down(ctx, RemoveOrphans(true), RemoveVolumes(true)), "compose.Down()")
}) so teardown runs LIFO (Down then Close) even on early failures; keep the
existing t.Cleanup that calls require.NoError(t, compose.Close(),
"compose.Close()") so Close remains the last cleanup step.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e36792a4-2ab4-4ac8-a305-e20578bcadfc
📒 Files selected for processing (3)
modules/compose/compose_api.gomodules/compose/compose_goroutine_test.gomodules/compose/go.mod
- Make Close() idempotent by nil-ing dockerCli and dockerClient after closing them, so double-close is a no-op. - Add TODO(testcontainers#2008) comment on the IgnoreTopFunction filter explaining that it pins on an internal closure name that may change if Reaper is refactored, and should be removed when the Reaper leak is fixed. - Register compose.Down in a t.Cleanup (LIFO after Close) so teardown runs even if the test panics after Up succeeds. Co-Authored-By: Claude <noreply@anthropic.com>
What does this PR do?
Closes the dockerCli and dockerClient instances in the compose module to release net/http persistConn.readLoop/writeLoop goroutines that were leaking on every Up+Down cycle.
Why is it important?
Every call to Up() followed by Down() leaks 2 HTTP transport goroutines permanently. In long-running test suites this accumulates unboundedly.
Related issues
Fixes #2008